3

I wish to extract all the tag names and their corresponding data from a multi-purpose xml file. Then save that information into a python dictionary (e.g tag = key, data = value). The catch being the tags names and values are unknown and of unknown quantity.

    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name>

I'm using ElementTree and can successfully extract the root tag and can extract values by referencing the tag names, but haven't been able to find a way to simply iterate over the tags and data without referencing a tag name.

Any help would be great.

Thank you.

4 Answers 4

7
from lxml import etree as ET

xmlString = """
    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name> """

document = ET.fromstring(xmlString)
for elementtag in document.getiterator():
   print "elementtag name:", elementtag.tag

EDIT: To read from file instead of from string

document = ET.parse("myxmlfile.xml")
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the reply, that should work well. I am using .xml files (not an xml string). Do I need to convert the file to a string before I can iterate through it? If so, could you please tell me how to do it? StringIO? Thanks again.
from xml.etree should be from lxml.etree, no ?
or use import xml.etree.ElementTree as ET ... unlike lxml, this and its faster C-coded sibling cElementTree comes bundled with Python.
2
>>> import xml.etree.cElementTree as et
>>> xml = """
...    <some_root_name>
...         <tag_x>bubbles</tag_x>
...         <tag_y>car</tag_y>
...         <tag...>42</tag...>
...     </some_root_name>
... """
>>> doc = et.fromstring(xml)
>>> print dict((el.tag, el.text) for el in doc)
{'tag_x': 'bubbles', 'tag_y': 'car', 'tag...': '42'}

If you really want 42 instead of '42', you'll need to work a little harder and less elegantly.

Comments

1

You could use xml.sax.handler to parse the XML:

import xml.sax as sax
import xml.sax.handler as saxhandler
import pprint

class TagParser(saxhandler.ContentHandler):
    # http://docs.python.org/library/xml.sax.handler.html#contenthandler-objects
    def __init__(self):
        self.tags = {}
    def startElement(self, name, attrs):
        self.tag = name
    def endElement(self, name):
        if self.tag:
            self.tags[self.tag] = self.data
            self.tag = None
            self.data = None
    def characters(self, content):
        self.data = content

parser = TagParser()
src = '''\
<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag...>42</tag...>
</some_root_name>'''
sax.parseString(src, parser)
pprint.pprint(parser.tags)

yields

{u'tag...': u'42', u'tag_x': u'bubbles', u'tag_y': u'car'}

3 Comments

Thanks for the reply, I'm not familiar with xml.sax. Is it possible to get an output that is more like {'tag_x:bubbles','tag_y:car','tag...:42'}?
@Markus: Of course it is. unutbu didn't read your question properly. You should be able to initialise self.tags as a dict and change the self.tags.append line to what you want.
@JohnMachin Ok, that's pretty straight forward. Thanks for all your answers John.
0

This could be done using lxml in python

from lxml import etree

myxml = """
          <root>
             value
          </root> """

doc = etree.XML(myxml)

d = {}
for element in doc.iter():
      key = element.tag
      value = element.text
      d[key] = value

print d

2 Comments

Another great answer and it looks a bit more compact, thank you. The same question I asked Kristofer, do I need to convert the XML file I'm trying to read into a xml string before using iter? Is that easy to do?
-1 It's NOT a great answer. Instead of d= {key:value}, it should have d[key] = value.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.