I am horrified that I can do the below using VBA and not Python. I am trying to parse XML returned from an API into a usable format. Based on the sample structure below, this requires nested looping. The trouble is that the outermost loop returns what looks like a detached Element, so findall and iterfind find nothing and the nested loops never run. I tried this with 3.4.1 and 2.7.8 and got the same results. This makes no sense to me.

import xml.etree.ElementTree as ET
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1>
                <p2>P2 Value 1</p2>
            </p1>
            <p1>
                <p2>P2 Value 2</p2>
            </p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1>
                <p2>P2 Value 3</p2>
            </p1>
        </s1>
    </c1>
</root>
"""
def use_et():
    doc = ET.fromstring(data)
    result = ['','','']
    for c in doc.findall('.//c2'):
        result[0] = c.text
        # nothing here executes
        # c is a detached Element. list(c) = []
        for s in c.findall('..//s2'):
            result[1] = s.text
            for p in s.iterfind('..//p2'):
                result[2] = p.text
                print(','.join(result))
use_et()
  • There are two 's2' and 'p2' tags; which one do you need: the first, or all? Commented Apr 30, 2015 at 6:46

2 Answers

Yes, that does seem like strange behaviour from xml.etree. It works with the third-party lxml module though, which I believe is faster anyway:

>>> import lxml.etree as ET
>>> doc = ET.fromstring(data)
>>> c = doc.find('.//c2')
>>> c
<Element c2 at 0x10bdc3ef0>
>>> c.findall('..//s2')
[<Element s2 at 0x10bdc8a28>, <Element s2 at 0x10bdc8950>]
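If you would rather stay on the standard library, the behaviour has an explanation: ElementTree elements keep no reference to their parent, so a path starting with `..` cannot climb above the element that findall was called on. A minimal sketch that avoids `..` entirely by looping over the containers (c1, then s1, then p1) instead of the leaf values is below. The function name `rows` is made up for illustration, and stripping the whitespace from each value is my own assumption about what is wanted:

```python
import xml.etree.ElementTree as ET

# Same structure as the sample in the question.
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1><p2>P2 Value 1</p2></p1>
            <p1><p2>P2 Value 2</p2></p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1><p2>P2 Value 3</p2></p1>
        </s1>
    </c1>
</root>
"""

def rows(xml_text):
    """Return one 'c2,s2,p2' string per p2 element (hypothetical helper)."""
    doc = ET.fromstring(xml_text)
    out = []
    # Walk the containers top-down so every lookup is relative to a parent
    # we already hold; no '..' step is needed.
    for c1 in doc.iter('c1'):
        c2 = c1.findtext('c2', default='').strip()
        for s1 in c1.findall('s1'):
            s2 = s1.findtext('s2', default='').strip()
            for p2 in s1.findall('p1/p2'):
                out.append(','.join([c2, s2, (p2.text or '').strip()]))
    return out

for line in rows(data):
    print(line)
```

Because each inner loop starts from the enclosing container, it copes with any number of s1 and p1 blocks.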

1 Comment

Thank you! The exact same code ran as expected with a different import line. I am still taken aback that the built-in xml.etree module wasn't able to handle that. I did attempt to use the expat parser, but that started getting messy.

Assuming you are looking for the first value, you can do this without having to loop:

import xml.etree.ElementTree as ET
data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1>
            <s2> S2 Value 1</s2>
            <p1>
                <p2>P2 Value 1</p2>
            </p1>
            <p1>
                <p2>P2 Value 2</p2>
            </p1>
        </s1>
        <s1>
            <s2> S2 Value 2</s2>
            <p1>
                <p2>P2 Value 3</p2>
            </p1>
        </s1>
    </c1>
</root>
"""
doc = ET.fromstring(data)
# print() as a function works on both Python 2.7 and 3.x
print(','.join(doc.findtext(path) for path in ['.//c2', './/c2/../s1/s2', './/c2/../s1/p1/p2']))

result:

C2 Value 1, S2 Value 1,P2 Value 1

+1 to the other answer recommending lxml; it has much better XPath support if you need something more advanced.
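If lxml is not an option and you do need to go from an element up to its parent, one common workaround with the standard library is to build a child-to-parent map once and look parents up in it. This is only a sketch: the names `parent_of` and `find_from_parent` are made up for illustration, and the XML is a shortened version of the sample above.

```python
import xml.etree.ElementTree as ET

data = """
<root>
    <c1>
        <c2>C2 Value 1</c2>
        <s1><s2>S2 Value 1</s2></s1>
        <s1><s2>S2 Value 2</s2></s1>
    </c1>
</root>
"""

doc = ET.fromstring(data)

# ElementTree elements do not store their parent, so record it ourselves.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def find_from_parent(elem, path):
    """Run `path` from elem's parent, emulating '../<path>' (hypothetical helper)."""
    parent = parent_of.get(elem)
    return [] if parent is None else parent.findall(path)

c2 = doc.find('.//c2')

# '..' cannot climb above the element findall was called on, so this is empty.
print(c2.findall('..//s2'))

# Going through the parent map finds both s2 elements.
print([s.text for s in find_from_parent(c2, './/s2')])
```

The map is built in a single pass over the tree, so it needs rebuilding if the tree is modified afterwards.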

1 Comment

Unfortunately this XML is returned from an API, so there is no guarantee how many of each element will appear or where. Only the basic nesting structure is consistent, so the dynamic looping is needed. Thank you for the reply though.