0

I have this XML file

<Source>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
    <example2>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </example2>
</Source>

I'd like to create, through Python, two XML files, that is one for each child:

  • one xml file for segment1
  • one xml file for segment2

xml1 should look like this:

<Source>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
</Source>

and xml2 should look like this:

<Source>
    <example2>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </example2>
</Source>

The splitting criteria is this: to create one separate xml file for each element. In the example above there are two elements (segment1 and example2): therefore i'd like to create two xml files for each one of those.

I have checked this answer Split one large .xml file in more .xml files (python) but in that example the children have the same name so I guess the findall function doesn't apply to my case, as the children have different names (segment1 and segment2). Is it possible to create a single xml file based on the order of the elements from the root?

8
  • 1
    How should xm1 & xml2 look like - please update the post with the requested output. Commented Sep 30, 2021 at 9:05
  • 1
    Are you aware that in xml1 you have document & version and in xml2 not? Commented Sep 30, 2021 at 9:12
  • yes.. I don't need to write them out. Commented Sep 30, 2021 at 9:21
  • 1
    so you need them once or you dont need them at all ? Commented Sep 30, 2021 at 9:23
  • I don't need them at all Commented Sep 30, 2021 at 9:27

1 Answer 1

1

The below seems to work. The main point here is to loop over the elements and chack which one of them start with segment

import xml.etree.ElementTree as ET

xml = '''<Source>
    <document>response</document>
    <version>2.0</version>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
    <segment2>
        <userRefNumber>test2</userRefNumber>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment2>
</Source>'''

root = ET.fromstring(xml)
counter = 1
for child in list(root):
    if child.tag.startswith('segment'):
        src = ET.Element('Source')
        src.append(child)
        with open(f'out_{counter}.xml','w') as f:
            tree = ET.ElementTree(src)
            tree.write(f,encoding="unicode")
        counter += 1
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks. Is there another option instead of startswith? for example: is there something like "contains"? In my file there will not always be "segment"
Come with a concrete example please. In general you can use : 'segment' in child.tag
I have update the post with an example
Are you aware to 1) You have posted a new example as answer 2) You have changed the basic conditions of your original answer.
Sorry about that. I have updated the question in the original post.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.