
Disclaimer: I'm new to scripting in Perl, and this is partly a learning exercise (though still a project for work). I have a much stronger grasp of shell scripting, so my examples are written in that mindset, but I would like to implement them in Perl. Sorry in advance for the verbosity; I want to make sure I get my point across clearly.

I have a text file (a reference guide) that was converted from a Word document to text and then switched from Windows to UNIX line endings in Notepad++. The file is uniform: every section has the same fields, formatting, and tables.

My basic plan is to grab each section, keyed by its unique batch job name, and load all of its values into a database (or maybe just an Excel file). That way the fields for each job can be searched and edited much more easily than in the Word file, and I could possibly add a web interface later on.

So what I want to do is grab each section with something like:
sed -n '/job_name_1_regex/,/job_name_2_regex/p' file.txt
How would this be written in a Perl script? (The idea is to grab the whole section, then break it down further from there.)

To read the file in the script I have open FORMAT_FILE, 'test_format.txt'; and then use foreach $line (<FORMAT_FILE>) to parse the file line by line. Is there a better way?

My next problem comes from converting a Word doc with tables. In Word, a table looks like:

 Table Heading 1      Table Heading 2
Heading 1/Value 1    Heading 2/Value 1
Heading 1/Value 2    Heading 2/Value 2

but in the text file it looks like:

Table Heading 1 
Table Heading 2
Heading 1/Value 1
Heading 1/Value 2
Heading 2/Value 1
Heading 2/Value 2

So I want "Heading 1" and "Heading 2" to be column names, with the respective values under each. I'm just not sure how to relate the values to their heading in the text file. The values for Heading 1 always start two lines after the Heading 1 line (Heading 1, Heading 2, then the values for Heading 1). I know this can be done pretty easily in awk/sed; I'm just not sure how to approach it in a Perl script.

---EDIT---
For this I was thinking of using arrays, something like:

my @heading1 = ($value1, $value2);  # etc.
my @heading2 = ($value1, $value2);  # etc.

I just need to associate the correct values with the correct headings, so that the values for heading1 start on the line after heading2. In shell, that would be something like:

x=$(grep -n "Heading 1" file.txt | head -n 1 | cut -d":" -f1) # line number of the first "Heading 1" match
(( x = x + 2 )) # skip both heading lines to reach the first value
# print the values from file.txt, from the line where they start to the
# last one (I'll figure that out at some point before this)
sed -n "${x},${last_line_of_values}p" file.txt

This is super hacked together for the moment, just to illustrate what I want to do. Let me know if it clears things up a little.
---/EDIT---

Once I have all the right values, hooking this up to a database may be an issue as well; I haven't started looking at how Perl interacts with databases yet.

Sorry if this is a bit scatterbrained. It's still not fully formed in my head.

2 Answers

3

http://perlmeme.org/tutorials/connect_to_db.html

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $driver = "mysql";   # Database driver type
my $database = "test";  # Database name
my $user = "";          # Database user name
my $password = "";      # Database user password

my $dbh = DBI->connect(
    "DBI:$driver:$database",
    $user, $password,
    {
        RaiseError => 1,
        PrintError => 1,
    }
) or die $DBI::errstr;

my $sth = $dbh->prepare("
        INSERT INTO test 
                    (col1, col2)
             VALUES (?, ?)
    ") or die $dbh->errstr;

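my $intable = 0;
# Three-argument open keeps the mode separate from the file name.
open my $file, '<', "file.txt" or die "can't open file $!";
while (<$file>)  {
  if (/job_name_1_regex/../job_name_2_regex/) { # job 1 section
    $intable = 1 if /Table Heading 1/; # table start
    if ($intable) {
      my $next_line = <$file>; # the line that follows the current one
      last unless defined $next_line; # hit end of file mid-pair
      chomp; chomp $next_line;
      # insert the current line and the following line as one row
      $sth->execute($_, $next_line) or die $dbh->errstr;
    }
  }
}
close $file or die "can't close file $!";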
$dbh->disconnect;

2 Comments

Awesome, that makes the DB connection process much clearer. Can you explain exactly what the line 'chomp; chomp $next_line;' does? I'm just trying to get a good handle on everything and why certain things are done.
@Sean: chomp removes a trailing $/ (the input record separator, a newline by default) from the string. If no argument is given, it works on the $_ variable.
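
To make the chomp explanation concrete, here is a minimal sketch of both forms. The sample strings are made up for illustration and are not from the actual file:

use strict;
use warnings;

# With an explicit argument, chomp strips one trailing newline from that variable.
my $next_line = "Heading 2/Value 1\n";
chomp $next_line;

# With no argument, chomp works on $_.
$_ = "Heading 1/Value 1\n";
chomp;

# Prints: [Heading 1/Value 1] [Heading 2/Value 1]
print "[$_] [$next_line]\n";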
2

Several things in this post. First, the basic "best practices":

  1. Use modern Perl. Start your scripts with

    use strict; use warnings;

  2. Don't use global filehandles; use lexical filehandles (declared in a variable).

  3. Always check the return value of "open", and prefer the three-argument form.

    open my $file, '<', "/some/file" or die "can't open file: $!";

Then, about pattern matching: I don't understand your example at all, but I suppose you want something like:

# while reads one line at a time instead of loading the whole file first
while ( my $line = <$file> ) {
    if ( $line =~ /regexp1/ ) {
        # do something...
    }
}
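
If the goal is the sed-style range from the question, Perl's range (flip-flop) operator may be what you are after. This is only a rough sketch: grab_section is a made-up helper name, and the JOB_ONE / JOB_TWO markers and the in-memory sample text are assumptions standing in for your real job names and file.

use strict;
use warnings;

# Collect every line from the first match of $start through the next match
# of $end, much like sed -n '/start/,/end/p'. (Hypothetical helper.)
sub grab_section {
    my ($fh, $start, $end) = @_;
    my @section;
    while (my $line = <$fh>) {
        chomp $line;
        # In scalar context, .. is the flip-flop operator: false until $start
        # matches, then true up to and including the line where $end matches.
        push @section, $line if $line =~ $start .. $line =~ $end;
    }
    return @section;
}

# Made-up sample data read through an in-memory filehandle.
my $text = "intro\nJOB_ONE\nfield a\nfield b\nJOB_TWO\nfield c\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my @section = grab_section($fh, qr/^JOB_ONE$/, qr/^JOB_TWO$/);
close $fh;

print "$_\n" for @section;

With this sample it prints the four lines from JOB_ONE through JOB_TWO. Like the sed version, the range includes the line that matches the end pattern, so you may want to drop the last element before parsing the section further.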

Edit: about the table, I suppose the best approach is to build two arrays, one for each column. If I understand correctly, when reading the file you need to split each line and put the first part in the @col1 array and the second part in the @col2 array. The clear and easy way is to use two temporary variables:

my ( $val1, $val2 ) = split /\s+/, $line;
push @col1, $val1;
push @col2, $val2;
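
Going by the edited question, the text file seems to list all the headings first and then each column's values in turn, rather than two values per line. For that layout, a rough sketch could look like the following. It assumes you already know the number of columns and that every column has the same number of values; unflatten_table is a made-up helper name, and the sample lines are copied from the question.

use strict;
use warnings;

# Rebuild columns from a table flattened as: all headings first, then all
# values for column 1, then all values for column 2, and so on.
# Assumes every column holds the same number of values. (Hypothetical helper.)
sub unflatten_table {
    my ($num_cols, @lines) = @_;
    my @headings = splice @lines, 0, $num_cols;   # first N lines are headings
    die "value count is not a multiple of $num_cols\n" if @lines % $num_cols;
    my $rows = @lines / $num_cols;                # values per column
    my %table;
    for my $heading (@headings) {
        # take the next block of values for this heading
        $table{$heading} = [ splice @lines, 0, $rows ];
    }
    return \%table;
}

my @lines = (
    'Table Heading 1',   'Table Heading 2',
    'Heading 1/Value 1', 'Heading 1/Value 2',
    'Heading 2/Value 1', 'Heading 2/Value 2',
);
my $table = unflatten_table(2, @lines);

for my $heading (sort keys %$table) {
    print "$heading: ", join(', ', @{ $table->{$heading} }), "\n";
}

A hash of arrays keyed by heading keeps each value tied to its column name, which should make it easier to build the INSERT statements later.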

1 Comment

Thanks waz, I updated the piece about the tables trying to better explain it.
