Find and Replace CDATA Attribute Values in XML - Python

Question

I am attempting to find and replace XML attribute values, similar to a related question (Find and Replace XML Attributes by Indexing - Python), but for content contained within a CDATA string. Specifically, I would like to know whether it is possible to find and replace CDATA attribute values with new values via indexing. I want to replace the first and second attribute values within the first set of 'td' subelements, and the second and third attribute values within the second set of 'td' subelements. Below are the XML, the script I am using with the new values, and the desired output XML:

The XML ("foo_bar_CDATA.xml"):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <tr id="source">
                        <td class="raster">Source</td>
                        <td class="number">1800</td>
                        <td class="number">2100</td>
                    </tr>
                    <tr id="preview">
                        <td class="raster">Preview</td>
                        <td class="number">1100</td>
                        <td class="number">1500</td>
                    </tr>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

The script:

import lxml.etree as ET
xml = ET.parse("C:\\Users\\mdl518\\Desktop\\bar_foo_CDATA.xml")
tree=xml.getroot().getchildren()[0][1]

val_1 = 1900
val_2 = 2000
val_3 = 3000
val_4 = 4000

# Find and replace the "td" subelement attribute values with the new values (val_"x") 
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Source',val_1)
    if elem.text:
        elem.text=elem.text.replace('1800',val_2)
    if elem.text:
        elem.text=elem.text.replace('1100',val_3)
    if elem.text:
        elem.text=elem.text.replace('1500',val_4)
    print(elem.text)

    output = ET.tostring(tree, 
                 encoding="UTF-8",
                 method="xml", 
                 xml_declaration=True, 
                 pretty_print=True)

    print(output.decode("utf-8"))

The Desired Output XML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <tr id="source">
                        <td class="raster">1900</td>
                        <td class="number">2000</td>
                        <td class="number">2100</td>
                    </tr>
                    <tr id="preview">
                        <td class="raster">Preview</td>
                        <td class="number">3000</td>
                        <td class="number">4000</td>
                    </tr>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

My main issue is indexing/reading the attributes correctly rather than hard-coding the desired values; indexing them properly to find/replace with new values would be ideal. The above approach appears viable for XML files without CDATA strings, but I cannot determine how to correctly parse the CDATA content or how to write the resulting XML to a file properly. Additionally, the opening and closing angle brackets (<, >) are being incorrectly written as &lt; and &gt; within the XML. Any assistance is most appreciated!
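For context on the escaping symptom described above, here is a minimal sketch of how it can arise. It assumes lxml's default parser, which hands the CDATA payload back as ordinary text; the tiny `SRC` document is a hypothetical stand-in, not the full file from the question.

```python
from lxml import etree

# Hypothetical stand-in document, much smaller than the real KML file.
SRC = b"<root><description><![CDATA[<td>1800</td>]]></description></root>"

root = etree.fromstring(SRC)
desc = root.find("description")

# With the default parser, the CDATA payload arrives as a plain string.
inner = desc.text  # '<td>1800</td>'

# Assigning a plain string back: the serializer escapes the markup.
desc.text = inner.replace("1800", "2000")
escaped = etree.tostring(root, encoding="unicode")

# Wrapping the string in etree.CDATA keeps the markup verbatim on output.
desc.text = etree.CDATA(inner.replace("1800", "2000"))
wrapped = etree.tostring(root, encoding="unicode")

print(escaped)
print(wrapped)
```

The first print shows the escaped entities, the second shows the CDATA section preserved, which suggests the fix is to re-wrap the modified string in `etree.CDATA` before serializing.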

Jack Fleeting · Accepted Answer · 2021-02-16 13:16:18Z

Since the CDATA is an HTML string, I would extract it from the XML, make the changes to it, and then reinsert it into the XML:

from lxml import etree

doc = etree.parse("foo_bar_CDATA.xml")

# first edit: pull the CDATA string out of the XML and parse it on its own
desc = doc.xpath('//*[local-name()="description"]')[0]
cd = etree.fromstring(desc.text)

vals = ["1900", "2000", "3000", "4000"]
rems = ["Source", "1800", "1100", "1500"]
# replace each matching <td> text with the new value at the same position
for target in cd.xpath('//tr//td'):
    if target.text in rems:
        target.text = vals[rems.index(target.text)]

# second edit: serialize the modified HTML and put it back into the XML as CDATA
desc.text = etree.CDATA(etree.tostring(cd, encoding="unicode"))

print(etree.tostring(doc, encoding="unicode"))

The output should be your expected output.
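The question also asks about writing the result to a file. The following is a self-contained sketch of the same extract, edit, reinsert approach that ends with a write to disk. The trimmed `KML` string, the `replacements` mapping, and the temporary output path are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile
from lxml import etree

# Trimmed, hypothetical version of the question's file.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
    <description><![CDATA[
    <html><body>
        <tr id="source"><td class="raster">Source</td><td class="number">1800</td></tr>
    </body></html>
    ]]></description>
</Overlay></kml>"""

doc = etree.ElementTree(etree.fromstring(KML))
desc = doc.xpath('//*[local-name()="description"]')[0]

# Parse the CDATA payload as its own small document and edit it.
cd = etree.fromstring(desc.text)
replacements = {"Source": "1900", "1800": "2000"}
for td in cd.iter("td"):
    if td.text in replacements:
        td.text = replacements[td.text]

# Put the edited markup back as a CDATA section so it is not escaped.
desc.text = etree.CDATA(etree.tostring(cd, encoding="unicode"))

# Write to a temporary location (assumed path, for illustration only).
out_path = os.path.join(tempfile.mkdtemp(), "foo_bar_CDATA_out.xml")
doc.write(out_path, encoding="UTF-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as fh:
    written = fh.read()
print(written)
```

Note that the whitespace inside the rewritten CDATA section may differ from the input, since the inner HTML is re-serialized rather than copied byte for byte.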

edited Feb 16, 2021 at 13:16

answered Feb 16, 2021 at 2:24

4 Comments

mdl518 Over a year ago

Thanks, @Jack, your update works beautifully! I did, however, add one item to my original post, and that pertains to referencing namespaces... I regretfully neglected to include this, unaware of its significance. I am now attempting to tweak the 'cd' function, referencing a namespace 'ns' function as ns = {'kml': 'opengis.net/kml/2.2'} and cd = ET.fromstring(tree.xpath('//kml:description')[0].text, namespaces=ns), but now get an "IndexError: list index is out of range". I think this just needs a minor tweak and the full solution will be working - Thanks again!!

Jack Fleeting Over a year ago

@mdl518 Ah, the dreaded namespaces... There are a couple of ways of handling them. I used one of them (local-name()) in the edits.
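As a rough illustration of the two ways of handling the namespace mentioned here, the sketch below compares `local-name()` with a prefix map passed to `xpath()`. The small `KML` string is a hypothetical stand-in, and the final lookup shows one plausible (not confirmed) cause of an empty result: a namespace URI that does not match the document's URI exactly.

```python
from lxml import etree

# Hypothetical stand-in with the same default namespace as the question's file.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay><description><![CDATA[<td>1800</td>]]></description></Overlay>
</kml>"""
root = etree.fromstring(KML)

# Option 1: match on the local name and ignore the namespace entirely.
by_local_name = root.xpath('//*[local-name()="description"]')

# Option 2: bind a prefix to the namespace URI and pass the map to xpath().
# The URI has to match the one in the document character for character.
ns = {"kml": "http://www.opengis.net/kml/2.2"}
by_prefix = root.xpath("//kml:description", namespaces=ns)

# A URI without the "http://" part matches nothing, so indexing [0]
# on the result would raise IndexError.
wrong_ns = {"kml": "opengis.net/kml/2.2"}
no_match = root.xpath("//kml:description", namespaces=wrong_ns)

print(len(by_local_name), len(by_prefix), no_match)
```

The `namespaces` mapping belongs on the `xpath()` call; `etree.fromstring()` only parses the extracted string and does not need it.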

mdl518 Over a year ago

Thanks, @Jack, you are the MAN!! The updated solution is perfect, it even handles the dreaded namespaces no problem! Is there otherwise a way to reference the attribute text values (i.e. the "rems") via indexing as opposed to hard coding them into a list? I will otherwise confirm your updates as the correct solution, many thanks!

Jack Fleeting Over a year ago

@mdl518 Yes, there is, but you should probably post it as a separate question, per SO policy.
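For readers wondering what the index-based variant raised in the comments might look like, here is one possible sketch. It is not the answerer's method: the `updates` mapping of zero-based (row, cell) positions to new text is an assumption made for illustration, and `HTML` stands in for the string already extracted from the CDATA section.

```python
from lxml import etree

# Stand-in for the HTML string extracted from the CDATA section.
HTML = """<html><body>
<tr id="source"><td class="raster">Source</td><td class="number">1800</td><td class="number">2100</td></tr>
<tr id="preview"><td class="raster">Preview</td><td class="number">1100</td><td class="number">1500</td></tr>
</body></html>"""
cd = etree.fromstring(HTML)

# Hypothetical mapping: (row index, cell index) -> new text, zero-based.
updates = {
    (0, 0): "1900",  # first row, first cell
    (0, 1): "2000",  # first row, second cell
    (1, 1): "3000",  # second row, second cell
    (1, 2): "4000",  # second row, third cell
}

rows = cd.xpath("//tr")
for (row_idx, cell_idx), new_text in updates.items():
    cells = rows[row_idx].findall("td")
    cells[cell_idx].text = new_text

result = [[td.text for td in tr.findall("td")] for tr in rows]
print(result)
```

Addressing cells by position avoids hard-coding the old values, at the cost of depending on the row and cell order staying fixed.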
