Parsing query parameters in Python

Question

The problem lies somewhere in how I'm parsing and/or reassembling URLs. I'm losing the ?id=1 and getting ?d=1.

What I'm trying to do is manipulate any query parameter and reassemble the URL before sending it back out modified. That is, the dictionaries would be modified, then using urlencode(modified_dict) I would reassemble url + query.

Can someone give me a pointer on what I'm doing wrong here?

from urlparse import parse_qs, urlparse , urlsplit
from urllib import urlencode
import os
import sys
import mechanize
from collections import OrderedDict
import urllib2
scrape_post_urls = []
get_inj_tests = []

#check multiple values to  strip out duplicate and useless checks
def parse_url(url):
    parsed = urlparse(url,allow_fragments=False)

    if parsed.query:


        if url not in get_inj_tests:
           get_inj_tests.append(url)
           #print url
           '''get_inj_tests.append(url)
           print url
           #print 'scheme  :', parsed.scheme
           #print 'netloc  :', parsed.netloc
           print 'path    :', parsed.path
           print 'params  :', parsed.params
           print 'query   :', parsed.query
           print 'fragment:', parsed.fragment
           #print 'hostname:', parsed.hostname, '(netloc in lower case)'
           #print 'port    :', parsed.port
           '''
    else:
        if url not in scrape_post_urls:
           scrape_post_urls.append(url)
           #print url




def main():
    unparsed_urls = open('in.txt','r')
    for urls in unparsed_urls:
        try:
           parse_url(urls)
        except:
            pass

    print(len(scrape_post_urls))
    print(len(get_inj_tests))
    clean_list = list(OrderedDict.fromkeys(get_inj_tests))
    reaasembled_url = ""
    #print clean_list
    for query_test in clean_list:
        url_object = urlparse(query_test,allow_fragments=False)
        #parse query paramaters
        url = query_test.split("?")[1]
        dicty = {x[0] : x[1] for x in [x.split("=") for x in url[1:].split("&") ]}
        query_pairs = [(k,v) for k,vlist in dicty.iteritems() for v in vlist]
        reaasembled_url = "http://" + str(url_object.netloc) + str(url_object.path) +  '?'
        reaasembled_query = urlencode(query_pairs)
        full_url = reaasembled_url + reaasembled_query
        print dicty




main()

Can you share your input, output, and expected output?

– akshat, Commented Apr 27, 2018 at 8:22

bruno desthuilliers · Accepted Answer · 2018-04-27 08:41:32Z

2

Can someone give me a pointer on what I'm doing wrong here.

Well, quite simply, you're not using the existing tools:

1/ to parse a query string, use urllib.parse.parse_qsl().

2/ to reassemble the query string, use urllib.parse.urlencode().

And forget about dicts: query strings can have multiple values for the same key, i.e. ?foo=1&foo=2 is perfectly valid.
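
As a minimal sketch of the two calls above (Python 3 names; on Python 2, as in the question, the same functions live at urlparse.parse_qsl and urllib.urlencode), the following rewrites one parameter while keeping repeated keys. The helper name replace_param and the example URL are assumptions made for illustration, not part of the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_param(url, name, new_value):
    """Return url with every occurrence of query parameter `name` set to new_value.

    Hypothetical helper, for illustration only.
    """
    # strip() drops the trailing newline that comes with lines read from a file
    parts = urlsplit(url.strip())
    # parse_qsl returns a list of (key, value) pairs, so repeated keys survive
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    modified = [(k, new_value if k == name else v) for k, v in pairs]
    # urlsplit returns a namedtuple, so _replace swaps in the new query string
    return urlunsplit(parts._replace(query=urlencode(modified)))


print(replace_param("http://example.com/page.php?id=1&foo=1&foo=2\n", "id", "42"))
# http://example.com/page.php?id=42&foo=1&foo=2
```

Working on a list of pairs rather than a dict is what keeps both foo values in the result.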


1 Comment

wabafet Over a year ago

Yes, but even so they would all still be broken up and then modified, then reassembled with injection parameters. I'll display an updated version when I'm done. Could you explain what you meant further?

maguri · Accepted Answer · 2018-04-27 08:30:47Z

0

First of all, url is a bad name for the variable holding the params, and this could create confusion.

>>> url = "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1]
>>> url
'id=22&param1=1&param2=2'

>>> "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1].split("&")
['id=22', 'param1=1', 'param2=2']

The error is in url[1:].split("&"): the [1:] slice drops the first character of the query string, so id=1 becomes d=1.

Solution:

>>> dicty = {x[0] : x[1] for x in [x.split("=") for x in url.split("&") ]}
>>> dicty
{'id': '22', 'param1': '1', 'param2': '2'}
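
To see the effect of the slice in isolation, and for anyone hand-rolling the parsing to learn how it works, here is a small sketch in Python 3. The function name split_query and the example URL are made up for this example; it returns a list of pairs instead of a dict (so repeated keys are kept) and does no percent-decoding.

```python
def split_query(query):
    """Split a raw query string into (key, value) pairs without decoding."""
    pairs = []
    for field in query.strip().split("&"):
        if not field:
            continue  # skip empty fields such as the one left by a trailing "&"
        # partition splits on the first "=" only, so "a=b=c" keeps "b=c" as the value
        key, _, value = field.partition("=")
        pairs.append((key, value))
    return pairs


query = "http://example.com/page.php?id=1\n".split("?", 1)[1]
print(split_query(query[1:]))  # [('d', '1')]   the slice ate the "i"
print(split_query(query))      # [('id', '1')]
```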


5 Comments

bruno desthuilliers Over a year ago

Typical SquaredWheel solution - Python has all you need to properly parse query strings in its stdlib, and dicts are the wrong tool here since a query string can have multiple occurrences of the same key (with different values).

wabafet Over a year ago

Thank you. The point of avoiding a parsing library like parse_qs is to learn the inner workings etc. I gave up on parse_qs last night; let me try this answer and see where I get. I'll also make sure to avoid confusing var names.

wabafet Over a year ago

{'id': '1\n'} {'id': '1\n'} {'id': '1'} Thanks bro!

wabafet Over a year ago

Also note that when I use query_pairs instead of dicty with urlencode, it doubles the params, as it's a list and I think urlencode expects a dict. So I passed dicty. Perfect, now I just need to strip the trailing %0A.
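
A possible explanation for both symptoms, sketched in Python 3 with urllib.parse.urlencode (the value '1\n' is assumed from the output shown above): urlencode accepts a list of pairs as well as a dict, but the "for v in vlist" loop in the question iterates over the characters of the string, producing one pair per character. The %0A is the newline read from the input file, which strip() removes.

```python
from urllib.parse import urlencode

dicty = {"id": "1\n"}  # value as read from a file line, newline included

# The comprehension from the question: iterating a string yields its characters
query_pairs = [(k, v) for k, vlist in dicty.items() for v in vlist]
print(query_pairs)             # [('id', '1'), ('id', '\n')]
print(urlencode(query_pairs))  # id=1&id=%0A
print(urlencode(dicty))        # id=1%0A

# Stripping the values (or the line itself, before parsing) removes the newline
clean = {k: v.strip() for k, v in dicty.items()}
print(urlencode(clean))        # id=1
```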

wabafet Over a year ago

All set, guys. I was able to do what I wanted and manipulate it. This can be closed.
