Find and Replace tags in XML using Python

Question

I have asked a similar question before, but this one is slightly different. I want to find and replace XML tags using Python. I am using the XML files as metadata for some GIS shapefiles. In the metadata editor, I can choose how to record the dates on which data was collected. The options are 'single date', 'multiple dates' and 'range of dates'. The first XML, which contains tags for a range of dates, has a 'rngdates' tag with the subelements 'begdate', 'begtime', 'enddate' and 'endtime'. I want to replace these tags so that the file looks like the second XML, which contains multiple single dates. The new tags are 'mdattim', 'sngdate' and 'caldate'. I hope this is clear enough, but please ask for more info if needed. XML is a weird beast, and I still don't fully understand it.

Thanks, Mike

First XML:

<idinfo>
  <citation>
    <citeinfo>
      <origin>My Company Name</origin>
      <pubdate>05/04/2009</pubdate>
      <title>Feature Class Name</title>
      <edition>0</edition>
      <geoform>vector digital data</geoform>
      <onlink>.</onlink>
    </citeinfo>
  </citation>
  <descript>
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
  </descript>
  <timeperd>
    <timeinfo>
      <rngdates>
        <begdate>7/13/2010</begdate>
        <begtime>unknown</begtime>
        <enddate>7/15/2010</enddate>
        <endtime>unknown</endtime>
      </rngdates>
    </timeinfo>
    <current>ground condition</current>
  </timeperd>
</idinfo>

Second XML:

<idinfo>
  <citation>
    <citeinfo>
      <origin>My Company Name</origin>
      <pubdate>03/07/2011</pubdate>
      <title>Feature Class Name</title>
      <edition>0</edition>
      <geoform>vector digital data</geoform>
      <onlink>.</onlink>
    </citeinfo>
  </citation>
  <descript>
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
  </descript>
  <timeperd>
    <timeinfo>
      <mdattim>
        <sngdate>
          <caldate>08-24-2009</caldate>
          <time>unknown</time>
        </sngdate>
        <sngdate>
          <caldate>08-26-2009</caldate>
        </sngdate>
        <sngdate>
          <caldate>08-26-2009</caldate>
        </sngdate>
        <sngdate>
          <caldate>07-07-2010</caldate>
        </sngdate>
      </mdattim>
    </timeinfo>
  </timeperd>
</idinfo>

This is my Python code so far:

folderPath = "Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

for filename in glob.glob(os.path.join(folderPath, "*.xml")):

    fullpath = os.path.join(folderPath, filename)

    if os.path.isfile(fullpath):
        basename, filename2 = os.path.split(fullpath)

        root = ElementTree(file=r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\Run_Metadata_2009\\" + filename2)

        iter = root.getiterator()
        #Iterate
        for element in iter:
            print element.tag

            if element.tag == "begdate":
                element.tag.replace("begdate", "sngdate")

Also, show us the rules for converting one to the other, i.e. show the input and the expected output generated from that input.
– Jim Garrison, Commented Aug 2, 2011 at 22:07
The first XML is the input. I have a number of template XML files that have keywords embedded between certain tags. The second is the output, which I edited manually. I want to edit the first XML so that everything between its timeinfo tags is replaced by everything between those same tags in the second XML. I am using Python because this is an ArcGIS function and Python is the preferred language. I am using this script in conjunction with their Python tools. My script will batch process XML files to be used as metadata for a large number of GIS shapefiles.
– Mike, Commented Aug 3, 2011 at 16:51
Is this impossible? I've posted this on a couple of sites and it doesn't seem like anyone viewing my question can offer a decent answer...
– Mike, Commented Aug 3, 2011 at 17:57

twasbrillig · Accepted Answer · 2014-11-13 08:01:26Z

I believe I succeeded in making the code work. This lets you edit certain tags in an existing XML file. I needed it for a batch processing script that creates metadata for some GIS shapefiles and changes certain date values depending on whether they are single dates, multiple dates or a range of dates.
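As a side note on the attempt in the question: `element.tag.replace("begdate", "sngdate")` returns a new string and leaves the element untouched, because Python strings are immutable. To rename a tag, assign to `element.tag` instead. Here is a minimal sketch of that using the standard library's `xml.etree.ElementTree`; the helper name `rename_tags` and the inline sample XML are made up for illustration and are not part of the original script.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for a metadata file (assumed, for illustration only)
SAMPLE = "<timeinfo><rngdates><begdate>7/13/2010</begdate></rngdates></timeinfo>"


def rename_tags(root, old, new):
    """Rename every element called `old` to `new` and return how many changed."""
    # Materialize the matches first so the tree is not altered mid-iteration
    matches = list(root.iter(old))
    for element in matches:
        # Assign to .tag; str.replace() would only build a new string
        element.tag = new
    return len(matches)


root = ET.fromstring(SAMPLE)
renamed = rename_tags(root, "begdate", "caldate")
print(renamed, ET.tostring(root, encoding="unicode"))
```

Renaming only covers one-to-one tag changes. Going from a date range to a list of single dates changes the structure as well, so the subtree has to be rebuilt.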

This webpage helped a lot: http://lxml.de/tutorial.html

I have some more work to do, but this was the answer I was looking for from my original question :) I'm sure this can be used in many other applications.

import glob
import os
import xml.etree.ElementTree as ET  # lxml.etree provides the same calls used here

# Set workspace location for XML files (raw string keeps the backslashes literal)
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

# Loop through each file with an .xml extension; glob already returns full paths
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if not os.path.isfile(fullpath):
        continue

    # Parse the XML file
    tree = ET.parse(fullpath)

    # Find the "timeinfo" tag anywhere below the root
    timeinfo = tree.find(".//timeinfo")
    if timeinfo is None:
        continue

    # Clear all tags below the "timeinfo" tag
    timeinfo.clear()
    # Append the new "mdattim" element and create its SubElements
    mdattim = ET.SubElement(timeinfo, "mdattim")
    sngdate = ET.SubElement(mdattim, "sngdate")
    caldate = ET.SubElement(sngdate, "caldate")
    time = ET.SubElement(sngdate, "time")
    # Set text values for tags
    caldate.text = "08-24-2009"
    time.text = "unknown"

    # Write the modified tree back to the same file so the change is kept
    tree.write(fullpath)
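The second XML in the question holds several `sngdate` entries, only some of which carry a `time`. A minimal sketch of generating that structure from a list of dates could look like the following. The function name `set_multiple_dates`, the `(caldate, time)` tuple layout, and the inline sample XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def set_multiple_dates(timeinfo, dates):
    """Replace the children of <timeinfo> with one <sngdate> per entry.

    `dates` is a list of (caldate, time) pairs; a time of None omits <time>.
    """
    timeinfo.clear()
    mdattim = ET.SubElement(timeinfo, "mdattim")
    for caldate_text, time_text in dates:
        sngdate = ET.SubElement(mdattim, "sngdate")
        ET.SubElement(sngdate, "caldate").text = caldate_text
        if time_text is not None:
            ET.SubElement(sngdate, "time").text = time_text
    return mdattim


# Inline sample (assumed) in place of a metadata file on disk
root = ET.fromstring(
    "<timeperd><timeinfo><rngdates>"
    "<begdate>7/13/2010</begdate><enddate>7/15/2010</enddate>"
    "</rngdates></timeinfo><current>ground condition</current></timeperd>"
)
set_multiple_dates(
    root.find(".//timeinfo"),
    [("08-24-2009", "unknown"), ("08-26-2009", None)],
)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Keeping the dates in a plain list means the same function can be called once per file in the batch loop, with the list coming from wherever the collection dates are stored.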
