I got to thinking about SuperIO and how it could be used as a Swiss Army chainsaw to open files, wherever they might be on the net. From there, my fevered mind got to thinking about cat and how the two could be used together. That said, I present ucat: a universal cat, if you will, which does not need to be herded, but rather will do as you ask. It expects to be able to find SuperIO, so you’ll need to make it available.
Other than SuperIO, everything ucat uses is part of the standard Ruby distribution. I’ll admit that this was a good opportunity for me to learn OptionParser; I’ll touch on some of what I’ve learned below.
SuperIO handles the opening of the files, so I first wanted to find the arguments and behaviours I was going to support.
    $ cat --help
    Usage: cat [OPTION] [FILE]...
    Concatenate FILE(s), or standard input, to standard output.

      -A, --show-all           equivalent to -vET
      -b, --number-nonblank    number nonblank output lines
      -e                       equivalent to -vE
      -E, --show-ends          display $ at end of each line
      -n, --number             number all output lines
      -s, --squeeze-blank      never more than one single blank line
      -t                       equivalent to -vT
      -T, --show-tabs          display TAB characters as ^I
      -u                       (ignored)
      -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
          --help     display this help and exit
          --version  output version information and exit

    With no FILE, or when FILE is -, read standard input.

    Report bugs to .
Ok, I have a list of behaviours. Now let’s get parsing the options:
    require 'optparse'

    options = {}
    OptionParser.new do |opts|
      opts.banner = "Usage: ucat.rb [options]"

      opts.on("-A", "--show-all", "equivalent to -vET") do |opt|
        options[:ends] = true
        options[:non_printing] = true
        options[:tabs] = true
      end
      # Long name spelled as in cat's own help: --number-nonblank
      opts.on("-b", "--number-nonblank", "number nonblank output lines") do |opt|
        options[:non_blank] = true
      end
      opts.on("-e", "equivalent to -vE") do |opt|
        options[:ends] = true
        options[:non_printing] = true
      end
      opts.on("-E", "--show-ends", "display $ at end of line") do |opt|
        options[:ends] = true
      end
      opts.on("-n", "--number", "number all output lines") do |opt|
        options[:number] = true
      end
      opts.on("-s", "--squeeze-blank", "never more than one single blank line") do |opt|
        options[:squeeze_blank] = true
      end
      opts.on("-t", "equivalent to -vT") do |opt|
        options[:non_printing] = true
        options[:tabs] = true
      end
      opts.on("-T", "--show-tabs", "display TAB characters as ^I") do |opt|
        options[:tabs] = true
      end
      # Accepted for compatibility with cat, but does nothing.
      opts.on("-u", "(ignored)") do |opt|
      end
      opts.on("-v", "--show-nonprinting", "use ^ and M- notation, except for LFD and TAB") do |opt|
        options[:non_printing] = true
      end
      opts.on_tail("--version", "output version information and exit") do |opt|
        puts Ucat.version.join(".")
        exit
      end
      opts.on_tail("--license", "output license information and exit") do |opt|
        puts "Unless where otherwise noted, Copyright 2008, by Matthew K Williams."
        puts "Released under an MIT License."
        exit
      end
    end.parse!
One of the things I learned is that anything to be treated as a switch can be passed to the opts.on method call as a string beginning with a hyphen: ‘-a’ and ‘--absolutely-nothing’ are both examples. The description shown in the usage message follows the switches. Unlike the example in the OptionParser documentation, you don’t need to give both a “-” and a “--” form for each option.
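To make that concrete, here is a minimal sketch, separate from ucat itself, with one short-only switch, one long-only switch, and one that has both forms. The parse_flags helper, the switch names, and demo.rb are made up for illustration.

```ruby
require 'optparse'

# Hypothetical helper: parses argv and returns the flags that were set,
# plus whatever non-option arguments were left over.
def parse_flags(argv)
  flags = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: demo.rb [options]"
    # Short form only.
    opts.on("-a", "short switch only") { flags[:a] = true }
    # Long form only.
    opts.on("--absolutely-nothing", "long switch only") { flags[:nothing] = true }
    # Both forms for the same option.
    opts.on("-n", "--number", "short and long switch") { flags[:number] = true }
  end
  rest = parser.parse(argv) # non-destructive; returns the non-option arguments
  [flags, rest]
end

flags, rest = parse_flags(["-a", "--absolutely-nothing", "file.txt"])
p flags
p rest
```

Using parse rather than parse! leaves the argument array untouched, which makes a sketch like this easier to experiment with.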
Ok, let’s start the class.
    class Ucat
      # Shared across instances so numbering continues from file to file.
      @@line_number = 0

      def self.version
        [0, 0, 1]
      end

      def initialize(arg, options)
        @options = {
          :ends => false,
          :non_printing => false,
          :tabs => false,
          :non_blank => false,
          :number => false,
          :squeeze_blank => false
        }.merge(options)
        arg = STDIN if arg == "-"
        cat(arg)
      end

      def cat(arg)
        prev_line = nil
        f = SuperIO(arg)
        begin
          # readline raises EOFError at end of input, which ends the loop.
          # chomp strips only a trailing newline, so a final line without
          # one keeps its last character.
          while (line = f.readline.chomp)
            if @options[:squeeze_blank]
              next if (line == "") && (prev_line == "")
            end
            prev_line = line
            # handle the newline
            line = "#{line}$" if @options[:ends]
            # handle tabs
            line.gsub!(/\t/, "^I") if @options[:tabs]
            # handle show non-printing characters
            line = non_printing(line) if @options[:non_printing]
            # line numbering
            line = numbering(line) if (@options[:number] || @options[:non_blank])
            puts line
          end
        rescue EOFError
          f.close
        end
      end
    end
Yes, I know, not all the code is there; we’ll add the missing methods in a moment.
We’re keeping @@line_number as a class variable because we want to be able to keep our count through more than one file without resetting it.
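As a rough illustration of why a class variable does the job, here is a small sketch in which each instance stands in for one file. The Counter class is hypothetical, and the "%06d" format is only an assumption based on the sample output further down, not the actual numbering method.

```ruby
# Hypothetical stand-in for Ucat: each instance "processes" one file's lines.
class Counter
  @@line_number = 0 # shared by every instance of the class

  def number(lines)
    lines.map do |line|
      @@line_number += 1
      # Assumed format: six zero-padded digits, a space, then the line.
      format("%06d %s", @@line_number, line)
    end
  end
end

first  = Counter.new.number(["alpha", "beta"])
second = Counter.new.number(["gamma"])
puts first, second
```

The second instance picks up at 000003 rather than starting over, which is the behaviour wanted when several files are concatenated.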
The initializer just sets things up for processing. One thing that is kind of interesting: the options hash built from the command line is merged over the defaults, so every option always has a “safe” value. Also, we’re following the Unix convention of ‘-’ meaning STDIN, so if Ucat is passed that string, it substitutes STDIN for it.
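The merge-with-defaults idea can be seen in isolation in this short sketch. The DEFAULTS constant and the parsed hash are illustrative names, not part of ucat.

```ruby
# Illustrative defaults: every option starts out switched off.
DEFAULTS = { :ends => false, :tabs => false, :number => false }

# Only the switches the user actually passed show up in the parsed hash.
parsed = { :number => true }

# Values from the argument win; anything missing falls back to its default.
options = DEFAULTS.merge(parsed)
p options
```

Because merge returns a new hash, the defaults themselves are left unchanged and can be reused for the next file.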
cat is where the majority of the work occurs. It opens a file via SuperIO, then reads it line by line. For each line it does the following:
- Removes the newline character
- Checks whether the line is a repeated blank that needs squeezing
- Appends $ to mark the end of the line, if required
- Replaces tabs with ^I, if required
- Converts the non-printing characters, if required
- Adds line numbering, if required
- Finally, outputs the result
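The first few of those steps can be tried without SuperIO. The sketch below is a simplified stand-in, not the real method: process is a hypothetical name, it reads from a StringIO, it uses each_line instead of the readline/EOFError loop, and it returns the lines rather than printing them.

```ruby
require 'stringio'

# Hypothetical stand-in for the loop in Ucat#cat: works on any IO-like
# object and returns the processed lines instead of printing them.
def process(io, options = {})
  out = []
  prev_line = nil
  io.each_line do |raw|
    line = raw.chomp                                # remove the newline
    # squeeze: skip a blank line that directly follows another blank line
    next if options[:squeeze_blank] && line.empty? && prev_line == ""
    prev_line = line
    line = "#{line}$" if options[:ends]             # mark the end of the line
    line = line.gsub("\t", "^I") if options[:tabs]  # make tabs visible
    out << line
  end
  out
end

input = StringIO.new("a\tb\n\n\n\nc\n")
result = process(input, :squeeze_blank => true, :ends => true, :tabs => true)
puts result
```

Here the three consecutive blank lines in the input collapse to a single "$" line, and the tab shows up as ^I.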
Of interest is the non_printing method:
    def non_printing(line)
      out = ""
      line.each_byte do |c|
        out << case c
               # Control characters become ^ plus the character 64 places
               # later. TAB (9) and LFD (10) are left alone.
               when 0..8, 11..31 then "^#{(c + 64).chr}"
               when 127 then "^?"
               # High bytes: "M-" followed by the octal value.
               when 128..255 then "M-0#{c.to_s(8)}"
               else c.chr
               end
      end
      out
    end
It looks at each byte and determines whether anything special needs to be done with it. I suggest looking at an ASCII character set reference to see which characters are output. I am ignoring TAB and LFD, per the spec. For anything above 127, I’m outputting “M-” followed by the octal value of the character.
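For anyone without an ASCII table to hand, the arithmetic can be checked directly. This is just a sketch: caret is a hypothetical helper for a single control byte, and the last line uses the octal form described above.

```ruby
# Hypothetical helper: caret form of a single control byte (0..31).
# The displayed character is the one 64 places later in the ASCII table.
def caret(byte)
  "^#{(byte + 64).chr}"
end

[0, 8, 13, 27, 31].each { |b| puts "#{b} -> #{caret(b)}" }

# A high byte, written as "M-" plus its octal value.
puts "233 -> M-0#{233.to_s(8)}"
```

Byte 8 is backspace, which is why ^H turns up so often when a man page is piped through the -v option.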
Taking something with a number of control characters, you can see it in action:
    $ man man > manpg
    $ head manpg | ruby ucat.rb -vn
    000001 man(1) man(1)
    000002
    000003
    000004
    000005 N^HNA^HAM^HME^HE
    000006 man - format and display the on-line manual pages
    000007
    000008 S^HSY^HYN^HNO^HOP^HPS^HSI^HIS^HS
    000009 m^Hma^Han^Hn [-^H-a^Hac^Hcd^Hdf^HfF^HFh^Hhk^HkK^HKt^Htw^HwW^HW] [-^H--^H-p^Hpa^Hat^Hth^Hh] [-^H-m^Hm _^Hs_^Hy_^Hs_^Ht_^He_^Hm] [-^H-p^Hp _^Hs_^Ht_^Hr_^Hi_^Hn_^Hg] [-^H-C^HC _^Hc_^Ho_^Hn_^Hf_^Hi_^Hg_^H__^Hf_^Hi_^Hl_^He]
    000010 [-^H-M^HM _^Hp_^Ha_^Ht_^Hh_^Hl_^Hi_^Hs_^Ht] [-^H-P^HP _^Hp_^Ha_^Hg_^He_^Hr] [-^H-S^HS _^Hs_^He_^Hc_^Ht_^Hi_^Ho_^Hn_^H__^Hl_^Hi_^Hs_^Ht] [_^Hs_^He_^Hc_^Ht_^Hi_^Ho_^Hn] _^Hn_^Ha_^Hm_^He _^H._^H._^H.
    $
Or, passing in a URL:
$ ruby ucat.rb http://www.google.com Google<!-- body,td,a,p,.h{font-family:arial,sans-serif}.h{font-size:20px}.h{color:#3366cc}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:collapse}#gbar{height:22px;padding-left:2px}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}#gbi,#gbs{background:#fff;left:0;position:absolute;top:24px;visibility:hidden;z-index:1000}#gbi{border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;z-index:1001}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}#gbar{float:left}}.gb2{display:block;padding:.2em .5em}a.gb1,a.gb2,a.gb3{color:#00c !important}.gb2,.gb3{text-decoration:none}a.gb2:hover{background:#36c;color:#fff !important} --><script type="text/javascript"><!--mce:0--></script> <div id="gbar"><strong class="gb1">Web</strong> <a class="gb1" onclick="gbar.qs(this)" href="http://images.google.com/imghp?hl=en&tab=wi">Images</a> <a class="gb1" onclick="gbar.qs(this)" href="http://maps.google.com/maps?hl=en&tab=wl">Maps</a> <a class="gb1" onclick="gbar.qs(this)" href="http://news.google.com/nwshp?hl=en&tab=wn">News</a> <a class="gb1" onclick="gbar.qs(this)" href="http://www.google.com/prdhp?hl=en&tab=wf">Shopping</a> <a class="gb1" href="http://mail.google.com/mail/?hl=en&tab=wm">Gmail</a> <a class="gb3" onclick="this.blur();gbar.tg(event);return !1" href="http://www.google.com/intl/en/options/"><span style="text-decoration: underline;">more</span> <small>?</small></a> <div id="gbi"><a></a> <a class="gb2" onclick="gbar.qs(this)" href="http://video.google.com/?hl=en&tab=wv">Video</a> <a class="gb2" onclick="gbar.qs(this)" href="http://groups.google.com/grphp?hl=en&tab=wg">Groups</a> <a class="gb2" onclick="gbar.qs(this)" href="http://books.google.com/bkshp?hl=en&tab=wp">Books</a> <a class="gb2" onclick="gbar.qs(this)" 
href="http://scholar.google.com/schhp?hl=en&tab=ws">Scholar</a> <a class="gb2" onclick="gbar.qs(this)" href="http://finance.google.com/finance?hl=en&tab=we">Finance</a> <a class="gb2" onclick="gbar.qs(this)" href="http://blogsearch.google.com/?hl=en&tab=wb">Blogs</a> <a></a> <a class="gb2" onclick="gbar.qs(this)" href="http://www.youtube.com/?hl=en&tab=w1">YouTube</a> <a class="gb2" href="http://www.google.com/calendar/render?hl=en&tab=wc">Calendar</a> <a class="gb2" onclick="gbar.qs(this)" href="http://picasaweb.google.com/home?hl=en&tab=wq">Photos</a> <a class="gb2" href="http://docs.google.com/?hl=en&tab=wo">Documents</a> <a class="gb2" href="http://www.google.com/reader/view/?hl=en&tab=wy">Reader</a> <a class="gb2" href="http://sites.google.com/?hl=en&tab=w3">Sites</a> <a></a> <a class="gb2" href="http://www.google.com/intl/en/options/">even more »</a></div> </div> <div id="guser" style="font-size:84%;padding:0 0 4px"><a href="/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg">iGoogle</a> | <a href="https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en">Sign in</a></div> <br id="lgpd" /><a href="/search?q=Large+Hadron+Collider&hl=en"><img title="Large Hadron Collider (LHC)" src="/logos/lhc.gif" border="0" alt="Large Hadron Collider (LHC)" width="330" height="125" /></a> <form action="/search"> <table border="0" cellspacing="0" cellpadding="0"> <tbody> <tr valign="top"> <td width="25%"></td> <td align="center"><input name="hl" type="hidden" value="en" /><input name="ie" type="hidden" value="ISO-8859-1" /><input title="Google Search" maxlength="2048" name="q" size="55" /> <input name="btnG" type="submit" value="Google Search" /><input name="btnI" type="submit" value="I'm Feeling Lucky" /></td> <td width="25%"><span> <a href="/advanced_search?hl=en">Advanced Search</a> <a href="/preferences?hl=en">Preferences</a> <a href="/language_tools?hl=en">Language Tools</a></span></td> 
</tr> </tbody></table> </form> <span><a href="/aclk?sa=L&ai=B_Zkz8EPISJ-8F4PwefW44KAH_bH2dJno_6QJwdmc2RPQhgMQARgBIMFUOABQwbGBu_z_____AWDJBg&num=1&sig=AGiWqtwnx0RpL_ua6vjw5jfFxLo04SiemA&q=http://www.google.com/help/ig/art/">Make your homepage beautiful</a> with art by leading designers</span> <span><a href="/intl/en/ads/">Advertising Programs</a> - <a href="/services/">Business Solutions</a> - <a href="/intl/en/about.html">About Google</a></span> <span>©2008 - <a href="/intl/en/privacy.html">Privacy</a></span> <script type="text/javascript"><!--mce:1--></script><script type="text/javascript"><!--mce:2--></script><script type="text/javascript"><!--mce:3--></script> |
You can download it: ucat.rb