3

So first the string

'<?xml version="1.0" encoding="UTF-8"?><metalink version="3.0" xmlns="http://www.metalinker.org/" xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" pubdate="Fri, 11 Oct 2013 12:46:10 GMT"><files><file name="/lhcb/L"><size>173272912</size><resources><url type="https">https://test-kit.test.de:2880/pnfs/test.file</url><url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url></resources></file></files></metalink>'

What I want to extract is the url text. Following code works but has flaws because it's hard coded:

root = ET.fromstring( xml_string )
for entry in root[0][0][1].iter():
  print entry.text

So this only works if the xml structure is the same. I tried to use xpath but I never got it working or with tags. I never got any results.

Is it a problem with the format of the xml string or am I doing something wrong?

2 Answers 2

3

You can use xpath (and findall function of Node) to get the urls , but since you have used xmlns="http://www.metalinker.org/" for the root element, you will need to use that xmlns in the xpath as well.

Example -

>>> root = fromstring(xml_string)
>>> urls = root.findall('.//{http://www.metalinker.org/}url')
>>> for url in urls:
...     print(url.text)
...
https://test-kit.test.de:2880/pnfs/test.file
https://test.grid.sara.nl:2882/pnfs/test.file

The above xpath will find all urls in the xml.

Sign up to request clarification or add additional context in comments.

2 Comments

Damn you, you were 30 seconds faster :)
Thanks to both to you!
3

You used namespaces, so you need to use them in XPath:

for entry in root.findall('.//{http://www.metalinker.org/}url'):
    print entry.text

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.