My answer is not Java-based, since this is a classic example of a problem that can be solved much more easily with other tools.
All you need is the tool grep. If you're on Windows, you can find it here.
Assuming your logs are in the file log.txt, the solution to your problem is a one-liner:
grep -hE --before-context 1 "^DB2[0-9]+E" log.txt > filtered.txt
Explanation:
-h - don't print file name
-E - use extended regular expressions
--before-context 1 - also print the one line before each matching error message (this works only if every SQL query fits on a single line)
^DB2[0-9]+E - match lines that begin with "DB2", followed by one or more digits and then an "E"
The command above writes every line you need to a new file called filtered.txt.
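To see what the one-liner produces, here is a small sketch that builds a made-up log and runs the same grep on it. The SQL statements and the file names sample_log.txt and sample_filtered.txt are invented for illustration; only the DB2 message lines follow the format quoted in this answer.

```shell
# Build a tiny, made-up log: each statement is one line, followed by its DB2 message.
printf '%s\n' \
  'CREATE TABLE t1 (id INT)' \
  'DB20000I The SQL command completed successfully.' \
  'CREATE TABEL t2 (id INT)' \
  'DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.' \
  > sample_log.txt

# Same command as above, pointed at the sample file.
grep -hE --before-context 1 "^DB2[0-9]+E" sample_log.txt > sample_filtered.txt

# Show the failing statement together with its error message.
cat sample_filtered.txt
```

When a log contains several non-adjacent errors, GNU grep also prints a `--` separator line between the groups; GNU grep's --no-group-separator option suppresses it.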
Update: after some fumbling around, I managed to get what's needed using only standard *nix utilities. Beware, it's not pretty. The final expression:
grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk "/E$/{y=$0;print x, y};{x=$0}" | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" | sed -e "s/$/ >> filtered.txt/g" > run.bat
Explanation:
grep -nE "^DB2[0-9]+" log.txt - prints the lines that begin with DB2..., each prefixed with its line number. Example:
6:DB20000I The SQL command completed successfully.
12:DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.
19:DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.
26:DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.
34:DB20000I The SQL command completed successfully.
41:DB20000I The SQL command completed successfully.
47:DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.
54:DB20000I The SQL command completed successfully.
cut -f 1 -d " " - keeps only the first space-delimited column, that is, removes everything after the message code. Example:
6:DB20000I
12:DB21034E
19:DB21034E
26:DB21034E
34:DB20000I
41:DB20000I
47:DB21034E
54:DB20000I
gawk "/E$/{y=$0;print x, y};{x=$0}" - for every line that ends with "E" (an error line), print the line before it and then the error line. Example:
6:DB20000I 12:DB21034E
12:DB21034E 19:DB21034E
19:DB21034E 26:DB21034E
41:DB20000I 47:DB21034E
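The pairing step is the heart of the pipeline, so here is a minimal sketch of it in isolation, fed the marker lines from the cut example above. It assumes a POSIX shell with single quotes (not the Windows prompt) and uses plain awk in place of gawk; the variable names pairs and prev are mine.

```shell
# Line-number:code markers exactly as in the cut example above.
pairs=$(printf '%s\n' 6:DB20000I 12:DB21034E 19:DB21034E 26:DB21034E \
                      34:DB20000I 41:DB20000I 47:DB21034E 54:DB20000I |
  awk '/E$/ { print prev, $0 }   # error marker: emit the previous marker, then this one
       { prev = $0 }             # remember every marker as the next "previous"')

printf '%s\n' "$pairs"
```

Each output line then holds the marker that closes the previous statement and the marker of the failing one, which is all that is needed to compute the line range of the failing statement.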
sed -e "s/:DB2[[:digit:]]\+[IE]//g" - removes the colon and the message code, leaving only the line numbers. Example:
6 12
12 19
19 26
41 47
gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" - turns each pair into a sed command and increments the first line number by one, so the range starts right after the previous message. Example:
sed -n "7,12p" log.txt
sed -n "13,19p" log.txt
sed -n "20,26p" log.txt
sed -n "42,47p" log.txt
sed -e "s/$/ >> filtered.txt/g" - appends >> filtered.txt to each line, so that every command adds its output to the final file. Example:
sed -n "7,12p" log.txt >> filtered.txt
sed -n "13,19p" log.txt >> filtered.txt
sed -n "20,26p" log.txt >> filtered.txt
sed -n "42,47p" log.txt >> filtered.txt
> run.bat - finally, prints the last lines to a batch file named run.bat
After you execute this file, the content you wanted will be in filtered.txt. Because the commands append, delete any old filtered.txt before running it again.
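To check the whole chain on a small input, here is an end-to-end sketch for a POSIX shell. It is a variation on the pipeline above, not a copy of it: it uses plain awk instead of gawk, a portable [0-9]* in the sed expression, and a single awk printf that folds the last two formatting steps into one. The log contents and the demo_* file names are invented for illustration.

```shell
# Made-up log: statements may span several lines, each followed by a DB2 message.
printf '%s\n' \
  'CREATE TABLE t1' \
  '(id INT)' \
  'DB20000I The SQL command completed successfully.' \
  'CREATE TABEL t2' \
  '(id INT)' \
  'DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.' \
  'CREATE TABLE t3 (id INT)' \
  'DB20000I The SQL command completed successfully.' \
  > demo_log.txt

# Generate one sed command per error: from the line after the previous
# message up to and including the error message itself.
grep -nE '^DB2[0-9]+' demo_log.txt |
  cut -f 1 -d ' ' |
  awk '/E$/ { print prev, $0 }
       { prev = $0 }' |
  sed -e 's/:DB2[0-9]*[IE]//g' |
  awk '{ printf "sed -n \"%d,%dp\" demo_log.txt >> demo_filtered.txt\n", $1 + 1, $2 }' \
  > demo_run.sh

rm -f demo_filtered.txt   # the generated commands append, so start clean
sh demo_run.sh
cat demo_filtered.txt
```

As in the pipeline it mirrors, an error on the very first statement of the log has no previous marker, so that case would need separate handling.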
Update 2:
Here is another version that works on Ubuntu (the previous version was written on Windows):
grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk '/E/{y=$0;print x, y};{x=$0}' | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk '{print "sed -n \""$1+1" ,"$2 "p\" log.txt" }' | sed -e "s/$/ >> filtered.txt/g" > run.sh
Two things were not working with the previous version:
- For some reason, gawk '/E$/' wasn't working (it didn't recognize that E is at the end of the line), so I just used /E/, since E won't be found anywhere else in the line.
- Quoting: the " around the gawk programs were changed to ', because inside double quotes the shell expands $0 and $1 before gawk ever sees them; the quoting inside the last gawk expression was then adjusted to match.
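One possible explanation for the /E$/ problem, offered only as a guess since the cause is not established here, is that the log was written on Windows and has CRLF line endings. In that case a carriage return follows the E, so the E is no longer the last character on the line. The sketch below simulates this with a made-up marker line and shows that deleting carriage returns with tr lets the anchored pattern match again; the variable names are mine.

```shell
# With a trailing carriage return, E is not the last character, so /E$/ finds nothing.
with_cr=$(printf '12:DB21034E\r\n' | awk '/E$/' | wc -l)

# Deleting carriage returns first lets the anchored pattern match.
without_cr=$(printf '12:DB21034E\r\n' | tr -d '\r' | awk '/E$/' | wc -l)

echo "matches with CR: $with_cr, after stripping CR: $without_cr"
```

If this is the cause, putting tr -d '\r' in front of the pipeline would allow keeping the stricter /E$/ pattern.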