How to parse XML with Python

Question

I have this XML:

<?xml version="1.0" encoding="utf-8"?>
<TAB xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="FOLD.xsd">
    <FOLD SERVER="APPLE" VERSION="520" OPERATIVE_SYSTEM="HPUX" FOLD_NAME="CAR" MODIFIED="False" UPL="20211123135822UTC" FOLD_ORDER_METHOD="SYSTEM" REAL_FOLD_ID="154" TYPE="1" USED_BY_CODE="0">
        <JOB ID="443" APPLICATION="CAR" SUB_APPLICATION="SENDGEST" NAMEJO="SESA" CREATED_BY="USERA" USER="DMMM" CRITICAL="0" TASKTYPE="Dummy" CON="0" MXX="0" MRU="0" WD="0,1,2,3,4,5,6" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="USERA" CREATION_DATE="20190829" CREATION_TIME="172439" CHANGE_USERID="USERA" CHANGE_DATE="20200826" CHANGE_TIME="103905" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" CV="Y" VERSION_SERIAL="5">
            <OUT NAME="SESA-TO-SESB" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB ID="444" APPLICATION="CAR" SUB_APPLICATION="SENDGEST" NAMEJO="SESB" CREATED_BY="USERA" USER="TO_CAR_P" CRITICAL="0" TASKTYPE="Job" CYCLIC="1" HOST="AFBFTP" INT="00001M" CON="0" RET="0" MW="0" RR="0" AUTOARCH="1" MXX="0" MRU="0" TIMEFROM="0500" TIMETO="0455" WD="0,1,2,3,4,5,6" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="USERA" CREATION_DATE="20190829" CREATION_TIME="172439" CHANGE_USERID="USERA" CHANGE_DATE="20200826" CHANGE_TIME="103905" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="FILE_TRANS">
            <VAR NAME="PATH" VALUE="NOTAPPLICABLE" />
            <VAR NAME="ACC" VALUE="TO_CAR_P" />
        </JOB>
    </FOLD>
</TAB>

I'm trying to get the VAR element whose NAME is PATH (only some jobs have this element) with Python, but I can't extract it. This is what I tried:

with open(file1, 'rt') as f:

    tree = ElementTree.parse(f)
for movie in root.iter('JOB.PATH'):
    print(movie.attrib)

Any help please? Thanks

Hai Vu · Accepted Answer · 2021-12-23 13:23:10Z

3

Once you have the tree, you can use XPath notation to search:

for node in tree.iterfind(".//JOB/VAR[@NAME='PATH']"):
    print(node.attrib)

Output:

{'NAME': 'PATH', 'VALUE': 'NOTAPPLICABLE'}

Update:

If you want to filter by both "NAME" and "VALUE" attributes:

for node in tree.iterfind(".//JOB/VAR[@NAME='PATH'][@VALUE='NOTAPPLICABLE']"):
    print(node.attrib)
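
If only the value itself is needed, as asked in the comments, one option is to read the VALUE attribute with Element.get instead of printing the whole attrib dict. A minimal sketch, assuming a trimmed-down copy of the question's XML held in a string (SAMPLE and path_values are illustrative names, not from the original):

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file; most attributes are omitted.
SAMPLE = """
<TAB>
    <FOLD FOLD_NAME="CAR">
        <JOB ID="443" NAMEJO="SESA">
            <OUT NAME="SESA-TO-SESB" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB ID="444" NAMEJO="SESB">
            <VAR NAME="PATH" VALUE="NOTAPPLICABLE" />
            <VAR NAME="ACC" VALUE="TO_CAR_P" />
        </JOB>
    </FOLD>
</TAB>
"""

root = ET.fromstring(SAMPLE)

# Element.get returns the attribute value, or None if the attribute is missing.
path_values = [
    node.get("VALUE")
    for node in root.iterfind(".//JOB/VAR[@NAME='PATH']")
]
print(path_values)  # ['NOTAPPLICABLE']
```

With a real file, ElementTree.parse(file1) followed by the same iterfind call should behave the same way.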

edited Dec 23, 2021 at 13:23

answered Dec 23, 2021 at 0:47

Hai Vu


2 Comments

defekas17 Over a year ago

Thanks! One question: what if I only want the value, i.e. just "NOTAPPLICABLE" in this example? How could I do that?

Hai Vu Over a year ago

Please see my update.

kosciej16 · Accepted Answer · 2021-12-23 00:17:20Z

1

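You could iterate over the VAR nodes and check their NAME attribute:

root = tree.getroot()  # tree comes from ElementTree.parse(...)
for var in root.iter("VAR"):
    # .get() returns None instead of raising KeyError if NAME is missing
    if var.get("NAME") == "PATH":
        print("you got me!")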

Or using findall:

for var in root.findall(".//VAR[@NAME='PATH']"):
    print(var.attrib)
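
Since only some jobs have a PATH variable, it may help to walk the JOB elements and look inside each one, so jobs without it are still reported. A rough sketch, again assuming a trimmed-down copy of the question's XML (SAMPLE and paths_by_job are illustrative names; keying on the NAMEJO attribute is just one possible choice):

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file; most attributes are omitted.
SAMPLE = """
<TAB>
    <FOLD FOLD_NAME="CAR">
        <JOB ID="443" NAMEJO="SESA">
            <OUT NAME="SESA-TO-SESB" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB ID="444" NAMEJO="SESB">
            <VAR NAME="PATH" VALUE="NOTAPPLICABLE" />
            <VAR NAME="ACC" VALUE="TO_CAR_P" />
        </JOB>
    </FOLD>
</TAB>
"""

root = ET.fromstring(SAMPLE)

paths_by_job = {}
for job in root.iter("JOB"):
    # find() searches direct children only and returns None when nothing matches.
    var = job.find("VAR[@NAME='PATH']")
    # Compare with None explicitly: an Element without children is falsy.
    paths_by_job[job.get("NAMEJO")] = var.get("VALUE") if var is not None else None

print(paths_by_job)  # {'SESA': None, 'SESB': 'NOTAPPLICABLE'}
```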

edited Dec 23, 2021 at 0:17

answered Dec 23, 2021 at 0:09

kosciej16


1 Comment

Oli Over a year ago

You could also use root.iterfind

Q. Qiao · Accepted Answer · 2021-12-23 00:21:41Z

1

I think BeautifulSoup is much easier (its "xml" parser requires lxml to be installed):

from bs4 import BeautifulSoup

with open(file1, 'rt') as f:
    soup = BeautifulSoup(f, "xml")
for var in soup.find_all("VAR", NAME="PATH"):
    print(var)

answered Dec 23, 2021 at 0:21

Q. Qiao

