I'm writing a script to search a logfile for a given python regex pattern. Setting aside the fact that this would be much easier to do using a simple Bash script, can it be done in Python? Here's what I've run into:

Assumptions:

  • I'm trying to analyze the file /var/log/auth.log
    • (for the sake of simplicity, I'm omitting the ability to choose a file.)
  • the name of my CLI module is logscour.
  • for the sake of argument, logscour takes only one arg called regex_in.

Intended usage:

[root@localhost]: # logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

Should return the lines inside of /var/log/auth.log that contain an IPv4 address.

I want to find a sort of anti-re.escape(), as I am in backslash-hell. Here's a snippet:

import re
import argparse

def main(regex_in, logfile='/var/log/auth.log'):
    ## Herein lies the problem!
    # user_regex_string = re.escape(regex_in) #<---DOESN'T WORK, EVEN MORE ESCAPE-SLASHES
    # user_regex_string = r'{}'.format(regex_in) #<---DOESN'T WORK
    user_regex_string = regex_in                 #<---DOESN'T WORK EITHER GAHHH
    
    with open(logfile, 'rb+') as authlog:
        for aline in authlog:
            if re.match(user_regex_string, aline):
                print aline

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("regex_in", nargs="?", help="enter a python-compliant regex string. Parentheses & matching groups not supported.", default=None)
    
    args = parser.parse_args()
    if not args.regex_in:
        parser.error("you must supply a regex string")  # ArgumentError expects an argparse Action, not a str
    main(args.regex_in)

This is giving me back nothing, as one would expect, since I'm using Python 2.7 and dealing with bytestrings.
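For what it's worth, in Python 2.7 a line read in 'rb' mode is an ordinary str, so bytes versus text is not why nothing prints. The distinction does matter in Python 3, where a str pattern cannot be applied to bytes. A minimal Python 3 sketch of the two usual workarounds, using a made-up sample line:

```python
import re

# Hypothetical sample line, as it would come from a file opened in 'rb' mode.
raw_line = b"May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158\n"
pattern_text = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

try:
    re.search(pattern_text, raw_line)  # str pattern against a bytes subject
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: compile a bytes pattern from the str pattern.
bytes_pattern = re.compile(pattern_text.encode("ascii"))
print(bytes_pattern.search(raw_line).group())  # b'123.200.20.158'

# Option 2: decode the line and keep the str pattern.
text_pattern = re.compile(pattern_text)
print(text_pattern.search(raw_line.decode("utf-8", errors="replace")).group())
```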

Does anyone know a way to convert 'foo' to r'foo', or an "opposite" for re.escape()?
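A quick check suggests there is nothing to undo: r'' only changes how the interpreter reads a literal in source code, and a string taken from sys.argv already holds single backslashes. The doubled backslashes you may see come from repr(), not from the string itself. A minimal Python 3 sketch:

```python
import re

# The same characters (backslash, then 'd', ...) written two ways in source code.
raw_literal = r"\d{1,3}\.\d{1,3}"
plain_literal = "\\d{1,3}\\.\\d{1,3}"

print(raw_literal == plain_literal)  # True: r'' is only a source-code notation
print(len(raw_literal))              # 16: each backslash is a single character
print(repr(raw_literal))             # repr() doubles the backslashes for display only

# A value read from sys.argv can be used as-is. re.escape() would instead
# turn every metacharacter into a literal, so the pattern stops matching IPs.
print(re.search(raw_literal, "from 123.200.20.158").group())     # 123.200
print(re.search(re.escape(raw_literal), "from 123.200.20.158"))  # None
```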

  • You shouldn't have to do anything; the shell string is properly quoted and there are no Python string literals involved. Does each line of your log file start with an IP address? re.match implicitly anchors the regex to the start of the line. You might want re.search instead. Commented May 30, 2017 at 18:13
  • Also, why are you opening a text file in binary mode? Commented May 30, 2017 at 18:14
  • 'foo' and r'foo' are the same thing. The purpose of the r'' prefix isn't to turn a string into a regex; it's to keep the Python interpreter from treating escape sequences like \n inside the string specially and instead pass them through raw. Commented May 30, 2017 at 18:16
  • @chepner, this is an OS-dependent module for a larger, OS-independent log-auditing package I'm building. I open everything in binary-plus mode for consistency & to fit my assertion modules & loggers... which I guess you could call meta-loggers lol. Consequently, I also don't want to rely on a particular shell command being present, so no sh or subprocess allowed. : [ Commented May 30, 2017 at 18:38
  • Also, my shell was including the ''s from my input arg. I removed those & @Eric Dunhill's advice worked. Commented May 30, 2017 at 18:40
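The stray quotes mentioned in the last comment are easy to reproduce: however they get into the argument, they become part of the pattern, and the regex then demands literal ' characters around the address. A hedged Python 3 sketch; the strip_quotes helper is hypothetical and not part of the original script:

```python
import re

line = "May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158"
ip_regex = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def strip_quotes(arg):
    """Hypothetical helper: drop one pair of matching surrounding quotes."""
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "'\"":
        return arg[1:-1]
    return arg


quoted_arg = "'" + ip_regex + "'"  # the quotes ended up inside the argument
print(re.search(quoted_arg, line))  # None: the pattern now requires literal quotes
print(re.search(strip_quotes(quoted_arg), line).group())  # 123.200.20.158
```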

1 Answer

user_regex = re.compile(regex_in)

and

user_regex.search(aline)

should work fine. You need re.search instead of re.match because the IP address isn't necessarily at the start of a line.
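A small illustration of the difference, using a line in auth.log format (Python 3 syntax, but match() and search() behave the same way in 2.7). Passing a compiled pattern to re.search() works, though calling the pattern's own .search() method, as here, is the more usual spelling:

```python
import re

ip_pattern = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
line = "May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158"

# match() only tries position 0, where this line starts with "May".
print(ip_pattern.match(line))  # None
# search() scans forward until it finds the address.
print(ip_pattern.search(line).group())  # 123.200.20.158
# match() does succeed when the address really is at the start.
print(ip_pattern.match("123.200.20.158 - login").group())  # 123.200.20.158
```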

I always find re.match very convenient for introducing subtle bugs into my code. :)

On my server, logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' outputs:

May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth]
May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158
...
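Putting the pieces together, a minimal sketch of logscour might look like this. Assumptions: Python 3 rather than 2.7, the log is read in text mode (a deliberate change from the question's 'rb+'), and the scour() helper name is hypothetical:

```python
#!/usr/bin/env python3
"""logscour: print lines of a log file that match a user-supplied regex (sketch)."""
import argparse
import re
import sys


def scour(pattern, lines):
    """Yield each line in which the compiled pattern matches anywhere."""
    for line in lines:
        if pattern.search(line):  # search, not match: the IP can be anywhere
            yield line


def main(argv=None, logfile="/var/log/auth.log"):
    parser = argparse.ArgumentParser(prog="logscour")
    parser.add_argument("regex_in", help="a Python regular expression, used as-is")
    args = parser.parse_args(argv)

    try:
        pattern = re.compile(args.regex_in)  # no escaping or unescaping needed
    except re.error as exc:
        parser.error("invalid regex: {}".format(exc))  # prints usage and exits

    # errors="replace" keeps undecodable bytes in the log from aborting the run.
    with open(logfile, encoding="utf-8", errors="replace") as log:
        for line in scour(pattern, log):
            sys.stdout.write(line)  # lines already end with a newline
    return 0


if __name__ == "__main__":
    sys.exit(main())
```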

That being said, grep -P 'pattern' file would also work:

grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" /var/log/auth.log

-P stands for:

   -P, --perl-regexp
          Interpret PATTERN as a Perl regular expression (PCRE, see below).  This is highly experimental and  grep  -P  may  warn  of unimplemented features.

-P is needed so that \d is interpreted as [0-9]; without it, grep does not treat \d as a digit class.
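A related caveat when the same pattern runs under Python 3: for str patterns, \d matches any Unicode decimal digit, not only [0-9]. The re.ASCII flag, or an explicit [0-9], restores the narrower meaning. (In Python 2.7 byte-string patterns, \d is ASCII-only unless re.UNICODE is given.) A short demonstration:

```python
import re

arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

print(bool(re.fullmatch(r"\d", arabic_indic_three)))            # True
print(bool(re.fullmatch(r"\d", arabic_indic_three, re.ASCII)))  # False
print(bool(re.fullmatch(r"[0-9]", arabic_indic_three)))         # False
```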

1 Comment

Thanks for this! For the first time in my life, re.compile() is actually useful. Also, after I stripped out the single quotes that came with the regex, your note about re.search() vs. re.match() fixed my problem.
