
Sep 10

cat on steroids (or cat on a hot ruby roof)

I got to thinking about SuperIO and how it could be used as a Swiss Army chainsaw to open files, wherever they might be on the net. From there, my fevered mind got to thinking about cat and how the two could be used together. That said, I present ucat: a universal cat, if you will, which does not need to be herded, but rather will do as you ask. It expects to be able to find SuperIO, so you’ll need to make it available.

Other than SuperIO, everything which ucat uses is part of the standard Ruby distribution. I’ll admit that this was a good opportunity for me to learn OptionParser; I’ll touch on some of what I’ve learned below.

SuperIO handles the opening of the files, so I first wanted to find the arguments and behaviours I was going to support.

Ok, I have a list of behaviours. Now let’s get parsing the options:

One of the things which I learned is that anything which is to be considered an argument can be put in the opts.on method call preceded by a ‘-’: ‘-a’ and ‘--absolutely-nothing’ are both examples. The usage explanation follows the arguments. Unlike the example which OptionParser provides, you don’t need to give both a “-” and a “--” form for each option.
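As a rough sketch of that point (not the ucat listing itself), the snippet below registers one short-only switch, one long-only switch, and one switch with both forms. The helper name parse_options and the switch names are assumptions borrowed from common cat flags; only OptionParser itself comes from the standard library.

```ruby
require 'optparse'

# Hypothetical helper: turns an argv-style array into [options, files].
# The switches are illustrative and may not match the set ucat supports.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: ucat [options] [file ...]'

    # Short form only; the description follows the switch.
    opts.on('-n', 'Number all output lines') { options[:number] = true }

    # Long form only.
    opts.on('--squeeze-blank', 'Collapse runs of blank lines') do
      options[:squeeze_blank] = true
    end

    # Giving both forms for one option works as well.
    opts.on('-T', '--show-tabs', 'Display TAB characters as ^I') do
      options[:show_tabs] = true
    end
  end

  # parse returns whatever is left over once the switches are consumed.
  files = parser.parse(argv)
  [options, files]
end

options, files = parse_options(%w[-n --squeeze-blank notes.txt])
p options
p files
```

Calling parser.to_s (or passing an unknown switch and rescuing OptionParser::InvalidOption) is a quick way to check that the usage text reads the way you expect.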

Ok, let’s start the class.

Yes, I know, not all the code is there; we’ll add the missing methods in a moment.

We’re keeping line_number as a class variable because we want to be able to keep our count through more than one file without resetting it.
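To illustrate why a class variable helps here, this minimal sketch uses a hypothetical CounterSketch class (not the real Ucat class) and StringIO objects standing in for files. The counter lives on the class, so a second instance carries on where the first one stopped.

```ruby
require 'stringio'

# Hypothetical stand-in for the real class, reduced to the line counter.
class CounterSketch
  @@line_number = 0 # shared by every instance, so it survives across files

  def initialize(io)
    @io = io
  end

  # Returns each line prefixed with the running line number.
  def numbered_lines
    @io.each_line.map do |line|
      @@line_number += 1
      format('%6d  %s', @@line_number, line.chomp)
    end
  end
end

first  = CounterSketch.new(StringIO.new("alpha\nbeta\n")).numbered_lines
second = CounterSketch.new(StringIO.new("gamma\n")).numbered_lines
puts first
puts second
```

An instance variable would restart the numbering for every file, which is not what cat -n does when given several files.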

The initializer just sets things up for processing. One thing that is kind of interesting: we have options which have been set up based on the command line, and we merge them with the defaults so that we always have a “safe” value. Also, we’re using ‘-’ as the Unix standard for STDIN, so if Ucat is passed that string, it does the modification to handle it properly.
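The two ideas in that paragraph can be sketched as follows. DEFAULTS, effective_options and resolve_source are hypothetical names, and mapping ‘-’ to $stdin is an assumption about what “handle it properly” means; the real initializer may do this differently.

```ruby
# Hypothetical defaults; the real option names in ucat may differ.
DEFAULTS = {
  number: false,
  squeeze_blank: false,
  show_ends: false,
  show_tabs: false,
  show_nonprinting: false
}.freeze

# Parsed options win; anything the user did not set falls back to a default.
def effective_options(parsed)
  DEFAULTS.merge(parsed)
end

# '-' is the conventional name for standard input. Here it maps to $stdin;
# any other name is returned untouched for the opener to deal with.
def resolve_source(name)
  name == '-' ? $stdin : name
end

p effective_options({ number: true })
p resolve_source('notes.txt')
```

Because Hash#merge returns a new hash, the frozen defaults are never modified, and every key is guaranteed to be present afterwards.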

cat is where the majority of the work occurs. It opens a file via SuperIO, then reads it line by line. For each line it does the following:

  1. Removes the newline character
  2. Checks to see if we need to squeeze the blanks
  3. Handles newline if required
  4. Handles tabs if required
  5. Handles the non-printing characters if required
  6. Handles line numbering
  7. Finally outputs the result
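The steps above can be sketched as a single loop. This is an illustration under assumptions, not the ucat method: it is a plain function rather than a method on the class, it reads from any IO-like object instead of SuperIO, the option names are invented, and step 3 is interpreted as marking the end of each line with ‘$’ in the way cat -E does. Step 5 is left as a comment.

```ruby
require 'stringio'

# Illustrative only: reads from `input`, writes to `out`.
def cat_sketch(input, out, opts = {})
  line_number = 0
  previous_blank = false

  input.each_line do |raw|
    line = raw.chomp                                  # 1. drop the newline

    blank = line.empty?                               # 2. squeeze blank runs
    skip = opts[:squeeze_blank] && blank && previous_blank
    previous_blank = blank
    next if skip

    line += '$' if opts[:show_ends]                   # 3. mark the line end
    line = line.gsub("\t", '^I') if opts[:show_tabs]  # 4. make tabs visible
    # 5. non-printing characters would be translated here

    if opts[:number]                                  # 6. number the line
      line_number += 1
      line = format('%6d  %s', line_number, line)
    end

    out.puts line                                     # 7. emit the result
  end
end

sample = StringIO.new("one\ttwo\n\n\n\nthree\n")
cat_sketch(sample, $stdout, { squeeze_blank: true, show_tabs: true, number: true })
```

Taking the output stream as a parameter keeps the loop easy to exercise with a StringIO instead of the terminal.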

Of interest is the non_printing method:

It looks at each byte and determines if anything special needs to be done with it. I suggest looking at an ASCII character set reference to determine what characters are output. I am ignoring TAB and LFD, per the spec. For anything above 127, I’m outputting “M-” followed by the octal value of the character.
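A reconstruction of that behaviour might look like the sketch below. Only the method name and the rules described in the paragraph come from the post; the body is an assumption, including the caret notation for control characters and ‘^?’ for DEL, which follow the usual cat -v convention.

```ruby
# Sketch of the described behaviour; not the original method body.
def non_printing(line)
  result = +''
  line.each_byte do |byte|
    piece = case byte
            when 9, 10    then byte.chr               # TAB and LFD pass through
            when 0..31    then "^#{(byte + 64).chr}"  # control characters: ^@ to ^_
            when 127      then '^?'                   # DEL
            when 128..255 then format('M-%o', byte)   # high bytes: M- plus octal value
            else byte.chr                             # printable ASCII
            end
    result << piece
  end
  result
end

puts non_printing("bell\x07 and escape\x1b here")
```

Working byte by byte rather than character by character means multi-byte input is shown as a series of M- sequences, one per byte.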

Taking something with a number of control characters, you can see it in action:

Or, passing in a url:

You can download it: ucat.rb

