XML parsing in Python

Question

My XML, fetched over the network, looks like this:

<?xml version='1.0' ?><liverequestresponse><liverequesttime>180</liverequesttime><livemessage></livemessage></liverequestresponse>

and my Python minidom code is:

import urllib, urllib2, time
from xml.dom.minidom import parse
response = urllib2.urlopen(req)
the_page = response.read() 
#print the_page 
dom = parse(response)
name = dom.getElementsByTagNameNS('liverequestresponse')
print name[0].nodeValue

This gives some errors, whereas

print the_page

works fine.

Or, if there are any other libraries that are better than minidom, please tell me. I would prefer one that comes pre-installed on Linux.

UPDATE

errors

Traceback (most recent call last):
  File "logout.py", line 18, in <module>
    dom = parse(response)
  File "/usr/lib64/python2.7/xml/dom/minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 211, in parseFile
    parser.Parse("", True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0

mata · Accepted Answer · 2012-05-30 21:35:43Z

3

If you call response.read() before parse(response), you have already consumed the content of the response. A second read (which is what parse does internally) returns an empty string, so the parser sees no document and raises the "no element found" error.

The simplest solution is to just drop the first response.read call. But if you really need the response string for some reason, you could try:

import urllib, urllib2, time
import StringIO
from xml.dom.minidom import parse

response = urllib2.urlopen(req)
the_page = response.read()  # consumes the response stream
#print the_page
# wrap the string in a file-like object so parse() can read it
dom = parse(StringIO.StringIO(the_page))
name = dom.getElementsByTagName('liverequesttime')
text = name[0].firstChild  # the text node inside <liverequesttime>
print text.nodeValue
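
For readers on Python 3, here is a minimal sketch of the same idea. It is not part of the original answer: io.BytesIO is used as a hypothetical stand-in for the network response so the snippet runs offline, and parseString is used instead of wrapping the string in a file-like object.

```python
import io
from xml.dom.minidom import parseString

# Sample payload copied from the question.
XML = (b"<?xml version='1.0' ?><liverequestresponse>"
       b"<liverequesttime>180</liverequesttime>"
       b"<livemessage></livemessage></liverequestresponse>")

# Assumption: BytesIO stands in for the object returned by urlopen(req).
response = io.BytesIO(XML)
the_page = response.read()   # first read consumes the whole stream
leftover = response.read()   # second read finds nothing left

# Parse the bytes already in hand instead of the exhausted stream.
dom = parseString(the_page)
node = dom.getElementsByTagName("liverequesttime")[0]
live_request_time = node.firstChild.nodeValue

print(repr(leftover), live_request_time)
```

The empty second read is exactly what parse(response) received in the question, which is why expat reported "no element found".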

edited May 30, 2012 at 21:35

answered May 30, 2012 at 21:16

mata


3 Comments

pahnin Over a year ago

It prints None! I tried dropping response.read too. It's not that important, so I commented it out and ran the script; the output was None.

mata Over a year ago

It prints None because the liverequestresponse node has no value. It only contains a child node, which in turn contains a text node that has a value. minidom isn't the most user-friendly XML parsing library. lxml is much better, and xml.etree is also nicer.
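
To illustrate the node structure described in this comment, here is a small Python 3 sketch (an illustration, not code from the thread) that parses the question's XML from a literal string. The variable names are made up for the example.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<liverequestresponse><liverequesttime>180</liverequesttime>"
    "<livemessage></livemessage></liverequestresponse>"
)

root = dom.documentElement
element_value = root.nodeValue        # element nodes carry no value of their own

time_elem = root.getElementsByTagName("liverequesttime")[0]
text_value = time_elem.firstChild.nodeValue   # the value lives on the text child

# An empty element has no text child at all, so guard before using firstChild.
message_child = root.getElementsByTagName("livemessage")[0].firstChild

print(element_value, text_value, message_child)
```

Note that firstChild on the empty livemessage element is None, so calling .nodeValue on it directly would raise an AttributeError.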

pahnin Over a year ago

That worked. I had tried with childNode, but it didn't work. Thanks!

Diego Navarro · Accepted Answer · 2012-05-30 21:28:17Z

1

An approach with lxml, which is widely used in Python to parse XML, with very good results and performance:

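import urllib2
from contextlib import closing
from lxml import etree

# urllib2 responses are not context managers in Python 2, so wrap with closing()
with closing(urllib2.urlopen(req)) as f:
    xml = etree.parse(f)

print xml.find('.//liverequesttime').text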

The output of the last line would be: 180
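
If installing lxml is not an option, the standard library's xml.etree.ElementTree accepts the same find call for this document. The following Python 3 sketch is an illustration under that assumption; io.BytesIO is a hypothetical stand-in for the network response so the snippet runs without a server.

```python
import io
import xml.etree.ElementTree as ET

# Sample payload copied from the question.
XML = (b"<?xml version='1.0' ?><liverequestresponse>"
       b"<liverequesttime>180</liverequesttime>"
       b"<livemessage></livemessage></liverequestresponse>")

# Assumption: BytesIO stands in for the file-like object returned by urlopen(req).
tree = ET.parse(io.BytesIO(XML))

live_request_time = tree.find(".//liverequesttime").text
live_message = tree.find(".//livemessage").text   # empty element has no text

print(live_request_time, live_message)
```

Here .text returns the element's text directly, so there is no need to step down to a text node as with minidom.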

edited May 30, 2012 at 21:28

answered May 30, 2012 at 21:23

Diego Navarro


7 Comments

pahnin Over a year ago

lxml has to be installed. Are there any built-in libraries that are better than minidom?

Diego Navarro Over a year ago

lxml needs to be installed, but it is pre-packaged on a lot of Linux distributions, and you can always install it with easy_install.

pahnin Over a year ago

I don't want to take chances. I'm writing an HTTP login client for a minimalistic Linux system, and I may have to use it on Arch Linux core.

Diego Navarro Over a year ago

Well, it seems it is available for Arch Linux in the extra repository.

pahnin Over a year ago

It is in the repos, but my script logs in to a server on the intranet, and only after that can I access the internet.
