I am trying to write a log parser for log4j. My regular expression works for normal messages, but when a message includes an exception, it matches only the first line and does not match the stack trace.

How would I write a regular expression that can handle Java exceptions spanning multiple lines?

Here is the regex I am currently using, written as a Java string literal:

^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.+)$

Here is a normal log msg:

2012-01-25 20:10:03,480 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: example.com:1 clusterResources: memory: 1

Here is an example exception log msg:

2012-01-25 00:03:59,565 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -1 namespaceID = 1 cTime = 0 ; clusterId = CID-1 ; blockpoolId = BP-
Expecting respectively: -1; 1; 0; CID-1; BP-1
        at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:111)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:510)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:381)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:344)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:337)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:341)
        at java.lang.Thread.run(Thread.java:619)
    Can you give an example of a Java exception that you want to match? So us regex folk who aren't also java/log4j folk can help you out? :) The only recommendations I can make are to look into the 'DOTALL' regex flag (often 's') which lets . match all characters including \n, and the 'MULTILINE' regex flag (often 'm') which lets ^ and $ match start/end of line as well as start/end of string. Commented Jan 25, 2012 at 0:00
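
To illustrate the two flags mentioned in the comment above, here is a small sketch in Java. Pattern.DOTALL and Pattern.MULTILINE are the java.util.regex constants for those flags; the class RegexFlagDemo and its two helper methods are hypothetical names made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how DOTALL and MULTILINE change matching.
final class RegexFlagDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Counts how many times the regex matches in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "ERROR first line\n\tat second line";

        // Without DOTALL, '.' stops at the line break.
        System.out.println(firstMatch("ERROR.*", 0, text));
        // With DOTALL, '.' also matches the line break, so the match runs on.
        System.out.println(firstMatch("ERROR.*", Pattern.DOTALL, text));

        // Without MULTILINE, '^' and '$' anchor only at the ends of the input.
        System.out.println(countMatches("^.+$", 0, text));
        // With MULTILINE, '^' and '$' also anchor at every line boundary.
        System.out.println(countMatches("^.+$", Pattern.MULTILINE, text));
    }
}
```

Neither flag alone decides where one log entry ends and the next begins; they only change what '.', '^' and '$' are allowed to match.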

1 Answer

This should match an ERROR entry together with its stack trace:

(.*\\bERROR\\b.*)\\r?\\n(.*\\r?\\n)*(.*\\bat\\b.*)*(\\d{1,4}\\)\\r?\\n)

I'm assuming that you read the whole log file into a CharSequence and pass that to the pattern matcher, rather than reading the file line by line.
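
Because the quantifiers in the pattern above are greedy, a single match may run past the end of one entry when the input contains several. As an alternative, here is a minimal sketch that treats every line not starting with a date as a continuation of the previous entry. It assumes that each entry begins with a yyyy-MM-dd date and that the logger name is followed by a colon, as in the two samples in the question. The class Log4jEntryParser, its Entry type and the parse method are hypothetical names, not part of log4j. It also needs Java 8 or later for the \R line-break matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of log4j: splits a log into entries and
// attaches stack-trace lines to the entry that precedes them.
final class Log4jEntryParser {

    /** One parsed entry; stackTrace is empty for single-line messages. */
    static final class Entry {
        final String date, time, level, logger, message, stackTrace;

        Entry(String date, String time, String level,
              String logger, String message, String stackTrace) {
            this.date = date;
            this.time = time;
            this.level = level;
            this.logger = logger;
            this.message = message;
            this.stackTrace = stackTrace;
        }
    }

    // Assumption: a new entry always starts with a yyyy-MM-dd date.
    private static final String DATE = "\\d{4}-\\d{2}-\\d{2}";

    private static final Pattern ENTRY = Pattern.compile(
            // Header line: date, time, level, logger (assumed to end in ':'), message.
            "^(" + DATE + ")[ \\t]+(\\S+)[ \\t]+(\\S+)[ \\t]+(\\S+):[ \\t]*(.*)"
            // Continuation lines: every following line that does not start with a date.
            + "((?:\\R(?!" + DATE + "[ \\t]).*)*)",
            Pattern.MULTILINE);

    static List<Entry> parse(CharSequence log) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            // trim() drops the line break that separates header and continuation.
            entries.add(new Entry(m.group(1), m.group(2), m.group(3),
                    m.group(4), m.group(5), m.group(6).trim()));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Shortened versions of the two sample messages from the question.
        String log = String.join("\n",
                "2012-01-25 20:10:03,480 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: example.com:1 clusterResources: memory: 1",
                "2012-01-25 00:03:59,565 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint",
                "java.io.IOException: Inconsistent checkpoint fields.",
                "        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:510)",
                "        at java.lang.Thread.run(Thread.java:619)");

        for (Entry e : parse(log)) {
            int traceLines = e.stackTrace.isEmpty() ? 0 : e.stackTrace.split("\\R").length;
            System.out.println(e.level + " | " + e.message + " | continuation lines: " + traceLines);
        }
    }
}
```

The negative lookahead is what ends an entry: the continuation group stops as soon as the next line starts with a date. It also uses [ \t]+ instead of \s+ between the header fields, so a header match cannot spill over a line break. For very long stack traces, reading line by line and starting a new entry whenever a line begins with a date may be a safer variation on the same idea.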
