Relative python xml node traversal

Question

I am horrified that I can do the below in VBA but not in Python. I am trying to parse XML returned from an API into a usable format. Based on the sample structure below, this requires nested looping. The trouble is that the outermost loop seems to return an Element detached from the tree, so findall and iterfind find nothing and the nested loops never run. I tried this on Python 3.4.1 and 2.7.8 and got the same results. This makes no sense to me.

import xml.etree.ElementTree as ET
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1>
                <p2>P2 Value 1</p2>
            </p1>
            <p1>
                <p2>P2 Value 2</p2>
            </p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1>
                <p2>P2 Value 3</p2>
            </p1>
        </s1>
    </c1>
</root>
"""
def use_et():
    doc = ET.fromstring(data)
    result = ['','','']
    for c in doc.findall('.//c2'):
        result[0] = c.text
        # nothing here executes
        # c is a detached Element. list(c) = []
        for s in c.findall('..//s2'):
            result[1] = s.text
            for p in s.iterfind('..//p2'):
                result[2] = p.text
                print(','.join(result))
use_et()

There are two 's2' and 'p2' tags. Which do you need: the first, or all of them?
– salparadise, Commented Apr 30, 2015 at 6:46

Peter Gibson · Accepted Answer · 2015-04-30 06:25:00Z

2

Yes, that does seem like strange behaviour from xml.etree. It works with the third-party lxml module, though, which I believe is faster anyway:

>>> import lxml.etree as ET
>>> doc = ET.fromstring(data)
>>> c = doc.find('.//c2')
>>> c
<Element c2 at 0x10bdc3ef0>
>>> c.findall('..//s2')
[<Element s2 at 0x10bdc8a28>, <Element s2 at 0x10bdc8950>]
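
For what it's worth, the standard-library result looks like a design difference rather than a bug: xml.etree elements keep no reference to their parent, so a path that starts with `..` has nothing to climb to when it is evaluated on `c` itself, whereas lxml elements do know their parent. Below is a minimal stdlib sketch, assuming a child-to-parent lookup is acceptable. The name `parent_map` is illustrative, and `data` is a trimmed copy of the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (assumption: same nesting).
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1><s2> S2 Value 1</s2><p1><p2>P2 Value 1</p2></p1></s1>
        <s1><s2> S2 Value 2</s2><p1><p2>P2 Value 3</p2></p1></s1>
    </c1>
</root>
"""

doc = ET.fromstring(data)

# Build a child -> parent lookup once; stdlib elements do not store their parent.
parent_map = {child: parent for parent in doc.iter() for child in parent}

c = doc.find('.//c2')

# Empty: '..' cannot climb above the element the search starts from.
from_c = c.findall('..//s2')

# Search from c's parent instead, looked up through the map.
from_parent = parent_map[c].findall('.//s2')

print(len(from_c), [s.text for s in from_parent])
```

The map costs one pass over the tree and has to be rebuilt if the tree is modified.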


Steve Green Over a year ago

Thank you! The exact same code ran as expected with a different import line. I am still taken aback that the built-in xml.etree module wasn't able to handle that. I did attempt to use the expat processor, but that started getting messy.

salparadise · Accepted Answer · 2015-04-30 07:07:31Z

1

Assuming you are looking for the first value, you can do this without having to loop:

import xml.etree.ElementTree as ET
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1>
                <p2>P2 Value 1</p2>
            </p1>
            <p1>
                <p2>P2 Value 2</p2>
            </p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1>
                <p2>P2 Value 3</p2>
            </p1>
        </s1>
    </c1>
</root>
"""
doc = ET.fromstring(data)
print(','.join(doc.findtext(_) for _ in ['.//c2', './/c2/../s1/s2', './/c2/../s1/p1/p2']))

result:

C2 Value 1, S2 Value 1,P2 Value 1

+1 to the other answer recommending lxml: it has much better XPath support if you need something more advanced.


Steve Green Over a year ago

Unfortunately this xml is returned from an API so there is no guarantee how many of what will appear where. Only the basic structure for how they are nested is consistent so the dynamic looping is needed. Thank you for the reply though.
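
If the nesting is the only thing that stays consistent, one option that stays within xml.etree is to loop over the container elements (c1, then s1) from the top down rather than stepping up from the leaves with `..`. This is only a sketch under that assumption. The function name `rows` is made up for illustration, and the sample is the one from the question, written compactly.

```python
import xml.etree.ElementTree as ET

# Sample from the question, written compactly (assumption: same nesting).
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1><p2>P2 Value 1</p2></p1>
            <p1><p2>P2 Value 2</p2></p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1><p2>P2 Value 3</p2></p1>
        </s1>
    </c1>
</root>
"""

def rows(xml_text):
    """Return one 'c2,s2,p2' string per p2, walking the containers top-down."""
    doc = ET.fromstring(xml_text)
    out = []
    for c1 in doc.iter('c1'):                    # every c1 container
        c2 = c1.findtext('c2', default='')       # its own c2 value
        for s1 in c1.findall('s1'):              # each s1 directly under this c1
            s2 = s1.findtext('s2', default='')
            for p2 in s1.findall('p1/p2'):       # each p2 under this s1
                out.append(','.join([c2, s2, p2.text or '']))
    return out

for line in rows(data):
    print(line)
```

Because every search starts at a container and only moves downward, it copes with any number of c1, s1 and p1 elements and never needs a parent lookup.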
