FormatR (Module)

In: format.rb
Name: FormatR
Description: Perl-like formats for Ruby
Author: Paul Rubel (prubel@sourceforge.net)
Release: 1.09
Homepage: formatr.sourceforge.net
Date: 29 January 2005
License: You can redistribute it and/or modify it under the same terms as Ruby. Copyright © 2002, 2003, 2005 Paul Rubel
    THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
    WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    PURPOSE.

To Test this code:

Run test_format.rb with no arguments. If nothing is amiss you should see OK (??/?? tests ?? asserts). This tests the format output against Perl output (which is cached in the test directory in case you don’t have Perl). If you would like to see the format output, run test_format.rb --keep, which will place the tests’ output in the files format_testfile{1-10}.

Usage

Class FormatR::Format in module FormatR provides Perl-like formats for Ruby. For a summary of the methods you’re likely to need, please see FormatR::Format. Formats are used to create output that keeps the same layout while the values change.

For example:

    require "format.rb"
    include FormatR

    top_ex = <<DOT
       Piggy Locations for @<< @#, @###
                         month, day, year

    Number: location              toe size
    -------------------------------------------
    DOT

    ex = <<TOD
    @)      @<<<<<<<<<<<<<<<<       @#.##
    num,    location,             toe_size
    TOD

    body_fmt = Format.new(top_ex, ex)

    body_fmt.setPageLength(10)
    num = 1

    month = "Sep"
    day = 18
    year = 2001
    ["Market", "Home", "Eating Roast Beef", "Having None", "On the way home"].each {|location|
        toe_size = (num * 3.5)
        body_fmt.printFormat(binding)
        num += 1
    }

When run, the above code produces the following output:

       Piggy Locations for Sep 18, 2001

    Number: location              toe size
    -------------------------------------------
    1)      Market                   3.50
    2)      Home                     7.00
    3)      Eating Roast Beef       10.50
    4)      Having None             14.00
    5)      On the way home         17.50

More examples are found in test_format.rb

Supported Format Fields

Standard perl formats

These are explained at www.perldoc.com/perl5.6.1/pod/perlform.html and include:

  • left justified text, @<<<
  • right justified text, @>>
  • centered text, @|||. In all three cases the printed width is the number of characters in the field.
  • Fields may also start with a ^ instead of an @. This signifies that the input is a large string: after each print, the printed portion is removed from the variable’s value.
  • Numeric formats of the form @##.##, which let you decide where you want the decimal point. Extra zeroes are added to the fractional part as needed, but if the whole portion is too big for the field it is written out in full regardless of your specification (the whole part is treated as more important than the fraction).
  • A line that contains a ~ will be suppressed if it would otherwise be blank.
  • A line that contains ~~ will repeat until it is blank. Be sure to use this feature with at least one field starting with a ^.
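The field widths described above can be sketched in plain Ruby. This is only an illustration of the layout rules, not FormatR's implementation; the helper names fill_text_field and fill_numeric_field are hypothetical, and the sketch assumes the leading @ counts toward the width, as the Piggy Locations output suggests.

```ruby
# Plain-Ruby illustration; this does not call FormatR.
# Assumption: the width of a field is the length of its picture, "@" included.
def fill_text_field(picture, value)
  width = picture.length
  case picture[-1]
  when '<' then value.to_s.ljust(width)   # @<<< left justified
  when '>' then value.to_s.rjust(width)   # @>>> right justified
  when '|' then value.to_s.center(width)  # @||| centered
  else raise ArgumentError, "not a text picture: #{picture}"
  end
end

def fill_numeric_field(picture, value)
  # Digits after the decimal point in the picture give the precision.
  decimals = picture.include?('.') ? picture.split('.', 2).last.length : 0
  # A whole part wider than the field is still printed in full.
  format('%*.*f', picture.length, decimals, value)
end

puts "[#{fill_text_field('@<<<', 'ab')}]"       # [ab  ]
puts "[#{fill_text_field('@>>>', 'ab')}]"       # [  ab]
puts "[#{fill_text_field('@|||', 'ab')}]"       # [ ab ]
puts "[#{fill_numeric_field('@#.##', 3.5)}]"    # [ 3.50]
puts "[#{fill_numeric_field('@#.##', 1234.5)}]" # [1234.50]
```

The last line shows the "whole part wins" behaviour: the value overflows the five-character field rather than losing digits.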

Scientific formats of the form @.G##, @.g##, @.E##, and @.e##

  • The use of G, g, E, and e is consistent with their use in printf.
  • If a G or g is specified, the number of characters before the exponent, excluding the decimal point, gives the number of significant figures to be used in the output. For example: @.#G### with the value 1.234e-14 will print 1.23E-14, which has 3 significant figures. The format @##.##g### with the value 123.4567E200 produces 1.23457e+202, with 6 significant figures. The capitalization of G affects whether the e is lower- or upper-case.
  • If an E or e is used, the number of hashes between the decimal point and the E or e tells how many digits to print after the decimal point. The hashes after the E or e just add to the number of spaces available; I can’t see how to reasonably adjust that given the other constraints. For example, the format @##.#E### with the value 123.4567E200 produces 1.2E+202 since there is only one hash after the decimal point.
  • More examples of using the scientific formats can be found in test_format.rb
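Since the scientific fields follow printf conventions, the three example values above can be reproduced with Ruby's own Kernel#format. This sketch does not use FormatR; the helper names are hypothetical, and the precision is passed explicitly instead of being derived from a picture line.

```ruby
# printf-style equivalents of the examples in the list above.
def significant(value, figures, upper)
  # %G / %g: the precision is the number of significant figures.
  format(upper ? '%.*G' : '%.*g', figures, value)
end

def fixed_exponent(value, decimals)
  # %E: the precision is the number of digits after the decimal point.
  format('%.*E', decimals, value)
end

puts significant(1.234e-14, 3, true)      # 1.23E-14
puts significant(123.4567e200, 6, false)  # 1.23457e+202
puts fixed_exponent(123.4567e200, 1)      # 1.2E+202
```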

Reading in output printed by formats, FormatR::FormatReader

The class FormatR::FormatReader can be used to read in text that has been output with a given format. It attempts to extract the values of the variables used as the input. It does a good job with simple formats, though I’m sure that there are complex ones that can confuse it. Multi-line formats are supported, but since the program can’t be sure what the initial input looked like or how it was broken across lines, every piece of a line is given at least one space after it.

For example, if you had the following format:

 ~~^<<
 var

and you fed it the string abcdef you would get the following:

   abc
   def

But when that output is read back in, var would be assigned the value ‘abc def’.

I don’t know how to decide which is better. Perhaps an argument would help.
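The round trip described above can be mimicked in a few lines of plain Ruby. This is not FormatReader's code, only a sketch of where the original spacing is lost; split_into_field and rejoin_pieces are hypothetical names, and the splitting ignores the whitespace handling a real ^ field performs.

```ruby
FIELD_WIDTH = 3 # "^<<" is three characters wide

# Break the text into field-sized pieces, one per printed line.
def split_into_field(text)
  text.scan(/.{1,#{FIELD_WIDTH}}/)
end

# Reading back: every piece is followed by a space, so the join adds one.
def rejoin_pieces(lines)
  lines.join(' ')
end

printed = split_into_field('abcdef')
puts printed                 # abc, then def
puts rejoin_pieces(printed)  # abc def
```

The reader only sees the two printed lines, so it has no way to tell whether the input was "abcdef" or "abc def".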

The classes of variables

It’s not always possible to infer the class of the variable that produced the formatted text. Because the reader doesn’t take in a binding to compare against, many variables will end up as strings. Numeric formats should come out as numbers, but all others will be strings and will need to be converted manually.
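A manual conversion might look like the following sketch. The hash here is hand-written to resemble what the text describes (strings for text fields, a number for a numeric field); it is not actual FormatReader output, and convert_fields is a hypothetical helper.

```ruby
# Hypothetical result: text fields came back as strings, the numeric one as a Float.
def sample_result
  { 'var_one' => '1', 'var_two' => '2', 'var_four' => 4.3 }
end

# Convert the named string fields to integers, leaving everything else alone.
def convert_fields(result, integer_keys)
  result.each_with_object({}) do |(key, value), converted|
    # Integer() raises ArgumentError on text that is not a whole number.
    converted[key] = integer_keys.include?(key) ? Integer(value) : value
  end
end

p convert_fields(sample_result, %w[var_one var_two])
```

Using Integer() rather than to_i means malformed text raises an error instead of silently becoming 0.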

Using FormatR::FormatReader

Using the FormatReader is relatively simple. You pass a format to the constructor, then call readFormat with an array of formatted text. It returns a hash with the key/value pairs of the variables in the format. It can also be called with a block, which is passed the hash.

For example:

   f = []
   # make a format
   f.push( '<?xml version="1.0"?>' )
   f.push( '@@@ Blah @@@ }Blah @< @|| @#.#' )
   f.push( 'var_one,var_one,var_one,var_one,var_one,var_one,' +
          ' var_two, var_three, var_four')
   f.push( '@<<< @<<<')
   f.push( 'var_one,var_one')
   format = Format.new(f)

   #set values and print it out.
   var_one, var_two, var_three, var_four = 1, 2, 3, 4.3
   output_filename = "format_testfile12"
   File.open( output_filename, File::CREAT | File::WRONLY | File::TRUNC ) { |file|
     format.io = file
     format.printFormat(binding)
   }
   # read in the output
   output = []
   File.open( output_filename ){ |file|
     output = file.readlines()
   }

   # make a new FormatReader
   reader = FormatReader.new(format)
   # Read in the values
   res = reader.readFormat(output)
   # Check that the values are correct
   assert(res['var_one'] == var_one.to_s)
   assert(res['var_two'] == var_two.to_s)
   assert(res['var_three'] == var_three.to_s)
   assert(res['var_four'] == var_four)

   # or using a block for reading multiple lines:
   reader.readFormat(output) do |res|
     assert(res['var_one'] == var_one.to_s)
     assert(res['var_two'] == var_two.to_s)
     assert(res['var_three'] == var_three.to_s)
     assert(res['var_four'] == var_four)
   end

Changes:

1.09

  • Added a block form of readFormat that lets you loop through output instead of having to write your own loop.

1.08

  • Moved to Test::Unit from RubyUnit.
  • Made things work with 1.8.0pre releases. Hopefully we’ll be ready for 1.8.0 when it finally comes out while maintaining 1.6.x compatibility.

1.07

  • You can now use formats without having to use eval. If you pass in a hash of names to values, that hash is used instead. There is also an optimization, enabled by calling format.useHash(true), that turns your binding into a hash while the format is being printed. This may speed things up. The default is still to use eval so that things do not break, as some dynamic formats may not work with a hash: when a value is computed using side effects of some other evaluation that takes place while printing the format, a hash won’t work. You can also use the printFormatWithHash method if you want to avoid evaling entirely. test_four in test_format.rb shows one example of how to use hashes to print formats.
  • Page numbers are now working correctly. Before if you had a page number in a header or footer it was problematic. The printing of a page has been refactored and now works much better.
  • Thanks to Amos Gouaux for suggesting the setLinesLeft method!
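The difference between looking values up with eval and using a hash, noted in the 1.07 changes above, can be shown in plain Ruby. This sketch does not use FormatR; lookup_with_eval, snapshot and stale_snapshot_demo are hypothetical names chosen for the illustration.

```ruby
# Look a variable up in the binding each time it is needed.
def lookup_with_eval(name, a_binding)
  eval(name, a_binding)
end

# Copy the named variables out of the binding once, into a hash.
def snapshot(names, a_binding)
  names.each_with_object({}) { |name, hash| hash[name] = eval(name, a_binding) }
end

def stale_snapshot_demo
  count = 1
  b = binding
  frozen = snapshot(%w[count], b)
  count += 1 # a side effect that happens after the snapshot was taken
  [lookup_with_eval('count', b), frozen['count']]
end

p stale_snapshot_demo # [2, 1]
```

The binding sees the updated value while the hash still holds the old one, which is why formats that rely on side effects during printing need the eval path.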

1.06

  • I thought that the ~ had to be at the front of the picture line; this isn’t so. If you place the ~~ anywhere in the line it will repeat until the line is empty.
  • Added the FormatReader to read in formatted text and get values back

1.05

  • Hugh Sasse sent in a patch to clean up warnings. I was sloppy with my spacing but hopefully have learned better. Thanks Hugh!
  • Fixed a bug in repeating lines using ~~ where the last line wouldn’t get placed correctly unless it ended with a ' ' (a space).
  • Fixed a bug where a line that started with a <, >, or | would lose this character if there wasn’t a @ or ^ before it. The parsing of the non-picture parts of a picture line is greatly improved.

1.04

  • Added a scientific notation formatter so you can use @#.##E##, @##.#e##, @#.##G##, or @##.#g##. The use of G and E is consistent with their use in printf. If a G or g is specified, the number of characters before the exponent, excluding the decimal point, gives the number of significant figures to be used in the output. If an E or e is used, the number of hashes between the decimal point and the E tells how many digits to print after the decimal point. The hashes after the E just add to the number of spaces available; I can’t see how to reasonably adjust that given the other constraints.

1.03

  • If Perl isn’t there, use cached output to test against.
  • Better packaging: new versions won’t write over the older ones when you unpack.
  • Changed the Format.new call. In the past you could pass in an IO object as a second parameter. You now need to use the Format.io= method, as the signature of Format.new has changed as shown below. None of the examples used the second parameter so hopefully it’s safe to change.
  • Added optional arguments to Format.new so you can set top, middle, and bottom all at once, like so: Format.new(top, middle, bottom) or even Format.new(top, middle). If you want a bottom without a top you’ll need to either call setBottom or pass nil or an empty format for top, like so: Format.new(nil, middle, bottom).
  • Made the testing script clean up after itself unless you pass the -keep flag
  • Modified setTop and setBottom so you can pass in a string or an array of strings that can be used to specify a format instead of having to create one yourself. Thanks again to Hugh Sasse for not settling for a second rate interface.
  • Moved test_format.rb over to runit.
  • Added functionality so that if you pass in a format string, or array of strings to setTop or setBottom it does the right thing. This way you don’t need to make the extra formats just to pass them in.

1.02

  • Allow formats to be passed in as arrays of strings as well as just long strings
  • Added functionality so that if the first format on a page is too long to fit on that page it will be printed partially with a bottom. Perl seems to just print the whole thing and ignore the page size in this case.
  • Fixed a bug where a number without a fractional part would cause a crash if you used a format that needs a fractional portion, like @##.##.
  • On the recommendation of Hugh Sasse added finishPageWithoutFF(aBinding, io=@io) and finishPageWithFF(aBinding, io=@io) which will print out blank lines until the end of the page and then print the bottom, with and without a ^L. Only works on fixed sized bottoms.

1.01

  • Moved to rdoc for generating documentation.

1.00

  • Bottoms work iff you have a fixed size format and print out a top afterwards. This means that you will only get a bottom if you print a top right after it, so the last format page you print won’t have a bottom. It’s impossible to figure out if you are done with the format and therefore need to print the bottom. Perhaps in a future release we can just take fixed sized bottoms off the available size and get them to work that way.
  • Added support for Format.pageNumber()
  • Support ~ to be a space
  • Support ~ to suppress lines when the variables are empty
  • Support ~~ to repeat until the variables are empty
  • Support comments. If the first character in a line is a # the line is a comment
  • Testing now compares against perl, it’s a bit easier than writing the tests manually.

0.93

  • Added support for the ^ character to start a format

0.92

  • Added end of page characters and introduced line counts.
  • Added the ability to manipulate the line count in case you write to the file handle yourself
  • Added format sizes. They just give the number of lines in the current format. They don’t try to iterate and get some total count including tops and bottoms.

Incompatibilities/Issues

  • If you use bottom be sure to check that you’re happy with the output. It doesn’t currently work with variable sized bottoms. You can use the finishPageWith{out}FF(…) methods to print out a bottom if you’re done printing but haven’t finished a page.
  • Watch out for @#@??? as formats; see [ruby-talk:27782] and [ruby-talk:27734]. This should be fixed in a future version of Ruby. The basic problem is that here documents are equivalent to "" strings and not '' strings, so they will evaluate variables in them. If this is a problem, be sure to just make a long string with '' and pass that in. You can also pass in an array of strings.
  • Rounding seems to be broken in Perl. If you try to print the following format with 123.355 you won’t get the same answer; you’ll get 123.35 and 123.36. FormatR rounds up and plans to keep doing so unless there is a convincing reason not to.
     format TEST_FORMAT =
       ^#.### ^##.##
     $num,  $num
    

    I’m betting that Perl must use round-to-even or round-to-odd. This needs to be looked into.
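The here-document issue mentioned in the list above comes down to interpolation. The following plain-Ruby sketch shows the two quoting styles side by side; @day is a hypothetical instance variable used only for this demonstration, and the sketch shows current Ruby behaviour, which may differ in detail from the versions discussed in the ruby-talk threads.

```ruby
# An unquoted here document behaves like a "" string: #@day is interpolated.
def interpolating_picture
  @day = 18
  <<DOT
day: #@day
DOT
end

# A single-quoted here document behaves like a '' string: the text is kept as-is.
def literal_picture
  @day = 18
  <<'DOT'
day: #@day
DOT
end

puts interpolating_picture  # day: 18
puts literal_picture        # day: #@day
```

Quoting the terminator, as in <<'DOT', is one way to keep picture characters from being interpreted when a format is written as a here document.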

To Do/Think about:

  • Have a format that chops lines that are too long to fit in the specified space
  • Add a setting so that a user can choose whether or not to use FF
  • Watch out for vars that aren’t assigned but try to be used.
  • blank out undefined @##.# values with ~
  • some install mechanism?
  • Is there a better name than resetPage?
  • Hugh Sasse: The only other thing I wanted from Perl formats, which was not there, was a means to set the maximum width, and create picture lines computationally, so I could decide I wanted this and that on the left, such and such on the right, and *the rest* (the middle) filled out with some data without having to bang away on the < key for ages, hoping I got the width right.

    I think an extra line will be useful here, between the vars and the picture line

  • Fix variable sized bottoms better. I’m not sure if this is possible. You could try computing it first but this would cause trouble if it depends upon the body format. I’m currently planning to just live with fixed sized bottoms.
  • The solution to this is probably to buffer the changes to the binding until you know they will work.

Thanks go to

Hugh Sasse for his enlightening comments and suggestions. He has been incredibly helpful in making this package usable. Amos Gouaux has also been helpful with suggestions and code. Thanks to both of you.
