
The problem lies somewhere in how I'm parsing and/or reassembling URLs: I'm losing the ?id=1 and getting ?d=1 instead.

What I am trying to do is be able to manipulate any query parameter and reassemble the URL before sending it back out modified. That is, the dictionaries would be modified, and then I would reassemble url + query using urlencode(modified_dict).

Can someone give me a pointer on what I'm doing wrong here?

from urlparse import parse_qs, urlparse, urlsplit
from urllib import urlencode
import os
import sys
import mechanize
from collections import OrderedDict
import urllib2
scrape_post_urls = []
get_inj_tests = []

#check multiple values to strip out duplicate and useless checks
def parse_url(url):
    parsed = urlparse(url, allow_fragments=False)

    if parsed.query:
        if url not in get_inj_tests:
            get_inj_tests.append(url)
            #print url
            '''get_inj_tests.append(url)
            print url
            #print 'scheme  :', parsed.scheme
            #print 'netloc  :', parsed.netloc
            print 'path    :', parsed.path
            print 'params  :', parsed.params
            print 'query   :', parsed.query
            print 'fragment:', parsed.fragment
            #print 'hostname:', parsed.hostname, '(netloc in lower case)'
            #print 'port    :', parsed.port
            '''
    else:
        if url not in scrape_post_urls:
            scrape_post_urls.append(url)
            #print url




def main():
    unparsed_urls = open('in.txt','r')
    for urls in unparsed_urls:
        try:
            parse_url(urls)
        except:
            pass

    print(len(scrape_post_urls))
    print(len(get_inj_tests))
    clean_list = list(OrderedDict.fromkeys(get_inj_tests))
    reaasembled_url = ""
    #print clean_list
    for query_test in clean_list:
        url_object = urlparse(query_test,allow_fragments=False)
        #parse query parameters
        url = query_test.split("?")[1]
        dicty = {x[0] : x[1] for x in [x.split("=") for x in url[1:].split("&") ]}
        query_pairs = [(k,v) for k,vlist in dicty.iteritems() for v in vlist]
        reaasembled_url = "http://" + str(url_object.netloc) + str(url_object.path) +  '?'
        reaasembled_query = urlencode(query_pairs)
        full_url = reaasembled_url + reaasembled_query
        print dicty




main()
  • Can you share your input, output and expected output? (commented Apr 27, 2018 at 8:22)

2 Answers


Can someone give me a pointer on what I'm doing wrong here?

Well, quite simply, you're not using the existing tools:

1/ To parse a query string, use urllib.parse.parse_qsl() (urlparse.parse_qsl() in Python 2, which the question uses).

2/ To reassemble the query string, use urllib.parse.urlencode() (urllib.urlencode() in Python 2).

And forget about dicts: query strings can have multiple values for the same key, i.e. ?foo=1&foo=2 is perfectly valid.
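A minimal sketch of that round trip, assuming Python 3 (in Python 2 the same functions live in urlparse and urllib). The example URL and the helper name set_param are hypothetical, for illustration only. The query is kept as a list of (key, value) pairs, so repeated keys and their order survive, and urlunsplit rebuilds the URL instead of hardcoding "http://".

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def set_param(url, key, new_value):
    """Return url with every occurrence of `key` in its query set to new_value.

    Hypothetical helper for illustration: the query is handled as a list of
    (key, value) pairs, so repeated keys and their order are preserved.
    """
    parts = urlsplit(url)
    # keep_blank_values=True keeps parameters such as "flag=" instead of dropping them
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs = [(k, new_value if k == key else v) for k, v in pairs]
    # urlencode accepts a sequence of two-element tuples as well as a dict
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "https://example.com/page?id=1&foo=1&foo=2"
print(parse_qsl(urlsplit(url).query))
# [('id', '1'), ('foo', '1'), ('foo', '2')]
print(set_param(url, "id", "2"))
# https://example.com/page?id=2&foo=1&foo=2
```

Note that urlencode() percent-encodes the values it writes (using quote_plus by default), so any modified value is escaped on the way back out.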


1 Comment

Yes, but even so they would all still be broken up and then modified, then reassembled with injection parameters. I'll post the updated version when I'm done. Could you explain further what you meant?

First of all, url is a bad name for the variable that holds the query parameters; it could create confusion.

>>> url = "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1]
>>> url
'id=22&param1=1&param2=2'

>>> "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1].split("&")
['id=22', 'param1=1', 'param2=2']

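The error is in url[1:].split("&"). After split("?")[1], url already starts at the first parameter, so the [1:] slice cuts off the first character of the query string: id=1 becomes d=1.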

Solution:

>>> dicty = {x[0] : x[1] for x in [x.split("=") for x in url.split("&") ]}
>>> dicty
{'id': '22', 'param1': '1', 'param2': '2'}
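If the goal is to learn how parsing works rather than to call parse_qs, a hand-rolled version can still avoid the dict pitfalls. This is a sketch assuming Python 3; the function name split_query and the sample query are hypothetical. It keeps the parameters as a list of pairs, splits each pair only on the first "=" so values containing "=" survive, and decodes escapes with urllib.parse.unquote_plus. The last lines show the dict comprehension above keeping only the last value of a repeated key.

```python
from urllib.parse import unquote_plus


def split_query(query):
    """Hypothetical hand-rolled parser: returns a list of (key, value) pairs."""
    pairs = []
    for chunk in query.split("&"):
        if not chunk:
            continue  # skip empty pieces such as a trailing "&"
        # partition splits on the first "=" only, so "a=b=c" gives ("a", "b=c")
        key, _, value = chunk.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


query = "id=22&param1=1&param1=2&token=a%3Db"
print(split_query(query))
# [('id', '22'), ('param1', '1'), ('param1', '2'), ('token', 'a=b')]

# The dict comprehension keeps only the last value for a repeated key
# and leaves percent-escapes undecoded:
dicty = {x[0]: x[1] for x in [x.split("=") for x in query.split("&")]}
print(dicty)
# {'id': '22', 'param1': '2', 'token': 'a%3Db'}
```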

5 Comments

Typical SquaredWheel solution: Python's stdlib has everything you need to parse query strings properly, and dicts are the wrong tool here, since a query string can have multiple occurrences of the same key (with different values).
Thank you. The point of avoiding a parsing library like parse_qs is to learn the inner workings. I gave up on parse_qs last night; let me try this answer and see where I get. I'll also make sure to avoid confusing variable names.
{'id': '1\n'} {'id': '1\n'} {'id': '1'} Thanks bro!
Also note: when I pass query_pairs instead of dicty to urlencode, it doubles the params, since it's a list and I think urlencode expects a dict. So I passed dicty; perfect, now I just need to strip the trailing %0A.
All set, guys: I was able to do what I wanted and manipulate the parameters. This can be closed.
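For the record, urlencode() accepts a sequence of two-element tuples as well as a dict, so a list is not the problem. The doubling comes from the question's query_pairs comprehension: dicty maps each key to a string, so for v in vlist iterates over that string's characters (the pattern fits parse_qs(), whose values are lists). With '1\n' that yields ('id', '1') and ('id', '\n'), which is also where %0A comes from: it is the newline left on each line read from in.txt. A small sketch, assuming Python 3 (items() instead of Python 2's iteritems()); the sample values are hypothetical.

```python
from urllib.parse import urlencode

dicty = {"id": "1\n"}  # value as read from a file line, newline included

# Iterating a string yields its characters, so one value becomes two pairs
query_pairs = [(k, v) for k, vlist in dicty.items() for v in vlist]
print(query_pairs)             # [('id', '1'), ('id', '\n')]
print(urlencode(query_pairs))  # id=1&id=%0A

# Strip the line when reading it, and build exactly one pair per parameter
line = "http://example.com/page?id=1\n".strip()
query = line.split("?", 1)[1]
pairs = [(k, v) for k, _, v in (p.partition("=") for p in query.split("&"))]
print(urlencode(pairs))        # id=1
```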
