
So I created a simple server on an Ubuntu box on port 8000 by doing this:

python -m SimpleHTTPServer

10.127.11.18 - - [14/Aug/2014 15:11:55] "GET / HTTP/1.1" 200 -
10.127.11.18 - - [14/Aug/2014 15:11:55] code 404, message File not found
10.127.11.18 - - [14/Aug/2014 15:11:55] "GET /favicon.ico HTTP/1.1" 404 -
10.127.11.18 - - [14/Aug/2014 15:12:02] "GET /crazysean/ HTTP/1.1" 200 -
10.127.11.18 - - [14/Aug/2014 15:12:37] "GET /crazysean/ HTTP/1.1" 200 -
10.127.11.18 - - [14/Aug/2014 15:12:52] "GET /crazysean/?url=www.google.com&x=200&y=400 HTTP/1.1" 301 -
10.127.11.18 - - [14/Aug/2014 15:12:52] "GET /crazysean/?url=www.google.com&x=200&y=400/ HTTP/1.1" 200 -
10.127.11.18 - - [14/Aug/2014 15:13:10] "GET /crazysean/?url=www.google.com&x=200&y=400/ HTTP/1.1" 200 -

I am trying to parse out the GET data that is sent, such as URL, x position and y position.

I assume my first step should be creating a new script like so:

import SimpleHTTPServer
import SocketServer

PORT = 8000

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler

httpd = SocketServer.TCPServer(("", PORT), Handler)

print "serving at port", PORT
httpd.serve_forever()

But I am unsure of how to make amendments to this script to capture the GET data, because eventually I want to dump the data into a sqlite3 db.

  • Are you trying to save the parsed logs or are you trying to do something else? Commented Aug 14, 2014 at 19:28
  • @slipjack Saving the logs would be a good start I suppose. Commented Aug 14, 2014 at 19:29
  • @slipjack I guess a better way to put what I am looking for is I want to parse the GET requests as they come in. Commented Aug 14, 2014 at 19:33
  • You would greatly benefit from using a higher level library like flask. Commented Aug 14, 2014 at 19:33
  • The actual command line is GET /crazysean/?url=www.google.com&x=200&y=400 HTTP/1.1. You can get the pieces of that by subclassing the handler. If you want the rest of the log information, like the 301, that's not part of the request; that's the handler explaining what it did about the request. Commented Aug 14, 2014 at 19:52

1 Answer


I think this is an XY problem. You have no interest in parsing GET requests, or doing anything with the logs; what you want is to "capture the GET data", in exactly the same way that SimpleHTTPServer is using that data to serve the requests, so you can store it in a database. And you just thought the only way to do that was to parse something, somewhere, but you weren't sure what.

Clearly, SimpleHTTPServer must already be parsing the GET data, and must have exactly what you want available. So, where is it?

As the docs say right at the top:

A lot of the work, such as parsing the request, is done by the base class BaseHTTPServer.BaseHTTPRequestHandler. This class implements the do_GET() and do_HEAD() functions.

Follow that link, and you'll see:

The handler will parse the request and the headers, then call a method specific to the request type. The method name is constructed from the request. For example, for the request method SPAM, the do_SPAM() method will be called with no arguments. All of the relevant information is stored in instance variables of the handler…

So, everything has been parsed into instance variables; there's a nice list of them below that paragraph.

So:

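import sqlite3

class DBLoggingHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def setup(self):
        # The base __init__ handles the whole request before returning, so
        # open the connection in setup(), which runs before do_GET().
        SimpleHTTPServer.SimpleHTTPRequestHandler.setup(self)
        self.db = sqlite3.connect(dbpath)  # dbpath: path to your database file
    def do_GET(self):
        self.db.execute("INSERT INTO GetLog (command, vers, path) VALUES (?, ?, ?)",
                        (self.command, self.request_version, self.path))
        self.db.commit()  # otherwise the INSERT is not persisted
        # The Python 2 handler is an old-style class, so super() does not work.
        return SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)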
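
To see a handler like this working end to end, here is a minimal sketch rather than a definitive implementation. Several things in it are my assumptions, not part of your setup: the GetLog schema (three TEXT columns), the temporary DB_PATH, the init_db, fetch and demo helper names, and opening a short-lived connection per request instead of keeping one on the handler. The try/except import only exists so the same file runs on Python 2 and Python 3, where the modules were renamed.

```python
import os
import socket
import sqlite3
import tempfile
import threading

try:  # Python 3 module names
    from http.server import HTTPServer, SimpleHTTPRequestHandler
except ImportError:  # Python 2 module names, as used in the question
    from BaseHTTPServer import HTTPServer
    from SimpleHTTPServer import SimpleHTTPRequestHandler

# Hypothetical database location; any writable path would do.
DB_PATH = os.path.join(tempfile.mkdtemp(), "getlog.db")


def init_db(path):
    """Create the (assumed) GetLog table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS GetLog "
               "(command TEXT, vers TEXT, path TEXT)")
    db.commit()
    db.close()


class DBLoggingHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        # One short-lived connection per request: simple, not fast.
        db = sqlite3.connect(DB_PATH)
        db.execute("INSERT INTO GetLog (command, vers, path) VALUES (?, ?, ?)",
                   (self.command, self.request_version, self.path))
        db.commit()
        db.close()
        # Fall through to the normal file-serving behaviour.
        return SimpleHTTPRequestHandler.do_GET(self)

    def log_message(self, format, *args):
        pass  # silence the console log for this demo


def fetch(address, target):
    """Send a bare HTTP/1.0 GET over a socket and return the raw reply."""
    sock = socket.create_connection(address)
    sock.sendall(("GET %s HTTP/1.0\r\n\r\n" % target).encode("ascii"))
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    return reply


def demo(target="/?url=www.google.com&x=200&y=400"):
    """Serve one request on an ephemeral port and return the logged rows."""
    init_db(DB_PATH)
    httpd = HTTPServer(("127.0.0.1", 0), DBLoggingHandler)
    httpd.timeout = 5  # do not wait forever if the client never connects
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    fetch(httpd.server_address, target)
    worker.join()
    httpd.server_close()
    db = sqlite3.connect(DB_PATH)
    rows = db.execute("SELECT command, vers, path FROM GetLog").fetchall()
    db.close()
    return rows
```

Calling demo() returns the rows stored so far. For a long-running server you would call httpd.serve_forever() instead of handle_request(), exactly as in your original script.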

If you want to parse the path into separate components, you can use urlparse for that:

    def do_GET(self):
        bits = urlparse.urlparse(self.path)  # requires "import urlparse"
        self.db.execute("""INSERT INTO GetLog (command, vers, scheme, netloc, 
                                               path, params, query, fragment,
                                               username, password, hostname, port)
                           VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                        (self.command, self.request_version, bits.scheme, bits.netloc,
                         bits.path, bits.params, bits.query, bits.fragment,
                         bits.username, bits.password, bits.hostname, bits.port))
        self.db.commit()  # otherwise the INSERT is not persisted
        # The Python 2 handler is an old-style class, so super() does not work.
        return SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)
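
Note that self.path is normally just a path plus query string (for example /crazysean/?url=www.google.com&x=200&y=400), so fields like scheme, netloc and hostname will usually be empty. If what you really want is the url, x and y values, parse_qs on the query component gets you there. The following is a rough sketch; extract_params and _to_int are hypothetical helper names of mine, and returning None for a missing or non-numeric value is an assumption about what you would want.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def _to_int(value):
    """Return value as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_params(path):
    """Return (url, x, y) from a request path such as self.path."""
    query = parse_qs(urlparse(path).query)
    # parse_qs maps each name to a list of values; take the first one.
    url = query.get("url", [None])[0]
    x = _to_int(query.get("x", [None])[0])
    y = _to_int(query.get("y", [None])[0])
    return url, x, y
```

Inside do_GET you would call extract_params(self.path) and insert the three values into your table. The redirected requests in your log end in y=400/, which is not a number, so this sketch would store None for y there.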

Also, remember that requests can have more than just a command line; they usually have headers, and they may have a body (although usually not for GET). See headers and rfile for that. And for information that isn't part of the HTTP request, but part of the socket connection, or information about the server, etc., there are attributes for that too.
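
As a rough illustration of those other attributes, here is a sketch that records the client address and one header for each GET. The MetadataHandler name, the in-memory seen list, the demo helper and the demo-client User-Agent value are all assumptions of mine for the example; a real script would more likely write these fields to sqlite3.

```python
import socket
import threading

try:
    from http.server import BaseHTTPRequestHandler, HTTPServer  # Python 3
except ImportError:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer  # Python 2

seen = []  # hypothetical in-memory sink, standing in for a database


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        seen.append({
            "client_ip": self.client_address[0],          # from the socket
            "user_agent": self.headers.get("User-Agent"),  # from the headers
            "path": self.path,                             # from the request line
        })
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the console log for this demo


def demo():
    """Serve one request on an ephemeral port; return (raw reply, record)."""
    httpd = HTTPServer(("127.0.0.1", 0), MetadataHandler)
    httpd.timeout = 5  # do not wait forever if the client never connects
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    sock = socket.create_connection(httpd.server_address)
    sock.sendall(b"GET /crazysean/ HTTP/1.0\r\n"
                 b"User-Agent: demo-client\r\n\r\n")
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    worker.join()
    httpd.server_close()
    return reply, seen[-1]
```

This one subclasses BaseHTTPRequestHandler directly, so it does not serve files; it only shows where the header and connection data live on the handler.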


1 Comment

This is a great answer, thank you very much. I've never used a class before, so I have some researching to do.
