Friday, May 28, 2010

Perl: Easy way to read a fixed length file

To parse a fixed-length string, you can use the unpack() function:
# Name(10), Age(2), Sex(6)
# Sample data (each field is space-padded to its full width):
# "John      20MALE  "
# "Mary      22FEMALE"
my $templateformat = "A10A2A6"; # See "perldoc -f pack" for the template syntax.
my @fields = unpack( $templateformat, "John      20MALE  " );
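To see what unpack() actually returns, here is a minimal sketch that prints each field in brackets. It assumes the sample lines are space-padded to the full 18-character record width (10 + 2 + 6). The variable names @records, $name, $age and $sex are only for illustration.

```perl
use strict;
use warnings;

# Assumed layout: Name(10), Age(2), Sex(6), space-padded to 18 characters.
my $templateformat = "A10A2A6";
my @records = (
  "John      20MALE  ",
  "Mary      22FEMALE",
);

foreach my $record (@records) {
  # The "A" format strips trailing spaces, so "John      " comes back as "John".
  my ($name, $age, $sex) = unpack($templateformat, $record);
  print "[$name] [$age] [$sex]\n";
}
```

Because "A" strips the trailing padding, the values can be printed or compared without trimming them first.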
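Here is a simple script that reads a fixed-length file:
#!/usr/bin/perl
use strict;
use warnings;
# Name(10), Age(2), Sex(6)
my $templateformat = "A10A2A6";
# Open the file named on the command line; stop with a message if that fails.
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $line = <$in>) {
  chomp $line;
  my @fields = unpack( $templateformat, $line );
  print "$fields[0] is $fields[1] years old\n";
}
close($in);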
To run it:
C:\> readfixedlength.pl sampledata.txt
Output:
John is 20 years old
Mary is 22 years old
You can also store the file format as configuration instead of hard-coding the template. This makes the script much easier to read and maintain. The script below keeps the configuration after the __DATA__ token. NOTE: __DATA__ is a token that marks the logical end of the script. You can read the text after it through the DATA filehandle.
#!/usr/bin/perl
use strict;
use warnings;
my @header_names = ();
my @header_lengths = ();
load_template(\@header_names, \@header_lengths);
# Build the unpack() template (e.g. "A10A2A6") from the field lengths.
my $templateformat = "";
foreach my $length (@header_lengths) {
  $templateformat .= "A" . $length;
}
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $line = <$in>) {
  chomp $line;
  my @fields = unpack( $templateformat, $line );
  print "$fields[0] is $fields[1] years old\n";
}
close($in);
# Load from the template __DATA__ section
sub load_template {
  # By-ref parameters
  my ($ref_header_names, $ref_header_lengths) = @_;
  # Load the file template from __DATA__
  while (<DATA>) {
    chomp;
    # Skip comment or empty line.
    next if (/^#/ || /^$/);
    # Each line is "name,length".
    my @fields = split(/,/);
    push(@$ref_header_names, $fields[0]);
    push(@$ref_header_lengths, $fields[1]);
  }
}
__DATA__
FIELD_NAME,10
FIELD_AGE,2
FIELD_SEX,6
It is very simple, right?
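The script above loads the field names but only uses the lengths. One possible extension, sketched here, is to pair the names with the unpacked values through a hash slice, so each field can be looked up by name. The names and lengths match the __DATA__ section, but they are written inline to keep the sketch self-contained. The sample line and the %record hash are assumptions for illustration.

```perl
use strict;
use warnings;

# Field names and lengths, as load_template() would fill them from __DATA__.
my @header_names   = ("FIELD_NAME", "FIELD_AGE", "FIELD_SEX");
my @header_lengths = (10, 2, 6);

# Build "A10A2A6" from the lengths.
my $templateformat = join("", map { "A$_" } @header_lengths);

# Hypothetical sample line, space-padded to the full record width.
my $line = "Mary      22FEMALE";

# Hash slice: assign each unpacked value to its field name.
my %record;
@record{@header_names} = unpack($templateformat, $line);

print "$record{FIELD_NAME} is $record{FIELD_AGE} years old\n";
```

With this approach the print statement no longer depends on the position of each field, only on its name in the configuration.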
