How to parse a string in Python

Question

Without any third-party libraries (such as Beautiful Soup), what is the cleanest way to parse a string in Python?

Given the text below, I'd like the value of "uber_token" to be parsed out, i.e. "123456789".

....

<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info">

....

Thanks!

Do you need to tokenize all the elements and attributes, or simply extract the value="XXX" part? If it's just the latter, use a regex.
– Matt Coubrough, Commented Jun 26, 2014 at 4:18
I just need the value="xxx". But there are multiple instances of value="**", each of which may have a different associated name.
– user1144251, Commented Jun 26, 2014 at 4:20
If the attributes and their ordering are consistent in every element, you can use a regex for that. But why are you averse to using a library?
– Matt Coubrough, Commented Jun 26, 2014 at 4:22
Note that if you need the names that accompany the values too, you may want to update your question.
– Matt Coubrough, Commented Jun 26, 2014 at 4:24
If there is one <input type="hidden" id="" name="uber_token" value="123456789"/> per line, you can just search for name and read the text between the two quotation marks that follow. If it equals uber_token, then find value and read the text between the two quotation marks after it.
– 1478963, Commented Jun 26, 2014 at 4:58

Suku · Accepted Answer · 2014-06-26 04:24:21Z

2

A regular expression is the solution.

Use the re module:

>>> import re
>>> s = '<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info"'
>>> regex = re.search(r'name="uber_token" value="([0-9]+)"', s)
>>> print(regex.group(1))
123456789
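The pattern above assumes that name comes immediately before value and that the token is all digits. As a rough sketch for the case raised in the comments (several inputs, each with its own name), the following collects every name/value pair first and then looks up the one you want. It assumes attribute values are double-quoted and contain no ">" character; the helper name input_values and the second input in the sample string are made up for illustration.

```python
import re

# Matches one <input ...> tag; assumes no ">" inside attribute values.
INPUT_TAG = re.compile(r'<input\b[^>]*>')
# Matches attr="value" pairs; assumes double-quoted values.
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def input_values(html):
    """Return a dict mapping each input's name to its value (hypothetical helper)."""
    result = {}
    for tag in INPUT_TAG.findall(html):
        attrs = dict(ATTR.findall(tag))
        # Attribute order inside the tag does not matter here.
        if 'name' in attrs and 'value' in attrs:
            result[attrs['name']] = attrs['value']
    return result


s = ('<form id="blah" action="/p-submi.html" method="post">'
     '<input type="hidden" id="" name="uber_token" value="123456789"/>'
     '<input type="hidden" value="abc" name="other"/>')  # second input is invented
print(input_values(s)['uber_token'])  # 123456789
```

This is still a regex over HTML, so it is only suitable when you control or trust the shape of the markup.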

edited Jun 26, 2014 at 4:24

answered Jun 26, 2014 at 4:19

Suku


leewz · Accepted Answer · 2014-06-26 05:18:22Z

Disclaimer: This answer is for quick-and-dirty scripts and may lack robustness and efficiency. The suggestions here should probably not be used for code that needs to survive more than a few hours.

If you're unwilling to learn regex (and you should be willing to learn regex!), you can split on value=". It is probably really inefficient, but simple code is easier to debug.

values = []

with open('myfile.txt') as infile:
    for line in infile:
        candidates = line.split('value="')
        for s in candidates[1:]:  # the first piece precedes any value="
            try:  # keep the value only if it is an integer
                val = int(s.split('"')[0])
            except ValueError:
                continue
            values.append(val)

If you're specifically looking at HTML or XML, Python has libraries for both.

HTMLParser: https://docs.python.org/2/library/htmlparser.html
ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html

Then, for example, you can write code to search through the tree for a node with an attribute "name" that has value "uber_token", and get the "value" attribute from it.

Very dumb Python 2 example that doesn't require learning too much about ElementTrees (may need simple corrections):

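import xml.etree.ElementTree as ET

tree = ET.parse('myfile.xml')  # the file must be well-formed XML
root = tree.getroot()

values = []

# iter() walks the whole tree, not only the root's direct children
for element in root.iter():
    # get() returns None when the attribute is absent instead of raising KeyError
    if element.get('name') == 'uber_token':
        values.append(element.get('value'))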
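For the HTMLParser route, here is a minimal sketch, assuming Python 3 (where the module lives at html.parser; in Python 2 it is the HTMLParser module linked above). Unlike ElementTree, it does not need the markup to be well-formed XML, so the unclosed tags in the question's snippet should not be a problem. The class name InputCollector is made up for illustration.

```python
from html.parser import HTMLParser  # Python 3 module path


class InputCollector(HTMLParser):
    """Collects name -> value for every <input> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) tuples.
        # Self-closing tags such as <input ... /> also end up here by default.
        if tag != 'input':
            return
        attr_map = dict(attrs)
        if 'name' in attr_map:
            self.values[attr_map['name']] = attr_map.get('value')


html = ('<form id="blah" action="/p-submi.html" method="post">'
        '<input type="hidden" id="" name="uber_token" value="123456789"/>'
        '<div class="container-info">')

collector = InputCollector()
collector.feed(html)
collector.close()
print(collector.values.get('uber_token'))  # 123456789
```

Because every input is collected into a dict, the same pass also gives you the other name/value pairs if there are several hidden fields.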

omu_negru · Accepted Answer · 2014-06-26 05:51:46Z

0

Python comes with its own XML parsing module (https://docs.python.org/3.2/library/xml.html?highlight=xml#xml), so you don't have to use any third-party parsing library. If you're unwilling or not allowed to use that, you can always fall back to regex, but I'd steer clear of that when it comes to parsing XML.
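As a small sketch of what that could look like with xml.etree.ElementTree, parsing from a string rather than a file: note that the input here is a hypothetical, well-formed version of the question's snippet with every tag closed. The fragment exactly as posted has unclosed tags, and ElementTree raises ParseError on input like that.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the snippet: every tag is closed.
xml_text = ('<form id="blah" action="/p-submi.html" method="post">'
            '<input type="hidden" id="" name="uber_token" value="123456789"/>'
            '<div class="container-info"></div>'
            '</form>')

root = ET.fromstring(xml_text)

token = None
# iter('input') visits every <input> element anywhere under the root.
for element in root.iter('input'):
    if element.get('name') == 'uber_token':
        token = element.get('value')
        break

print(token)  # 123456789
```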

answered Jun 26, 2014 at 5:51

omu_negru
