I have a master XML file called vs_original_M.xml. I want to add every instance of a certain child element, taken from the other XML files in a directory, to it:

<location>  
</location>
<location>
</location>
.
.
.
<location>
</location>

until all the files are looked at.

I am doing this by first opening the directory, then making a list of all the files in it and checking whether each one is an XML file, and then taking a certain child out of each. Then (here's where I am stuck) I need to open the master file and insert this child right under the last child of the same name. Finally, when all the files are done, I need to save the master XML file.

Here is the code:

# List the xml files in the directory
from xml.dom import minidom
from xml.etree import ElementTree as ET
import glob
import os
import sys


def is_xml(HART_filename):
 string_length = len(HART_filename)
 suffix = '.xml'
 if HART_filename.endswith(suffix):
    return True 
 else:
    return False 

#add the directory to the python script
os.chdir("c:/Users/ME/Documents/XML_Parasing_Python")

#List all the files in an array
xml_list = os.listdir("c:/Users/ME/Documents/XML_Parasing_Python")
print xml_list
xml_list_length = len(xml_list)
print xml_list_length
number = 1

for number in range(1,xml_list_length):
    string_length = len(xml_list[number])
    #print string_length
    print xml_list[number]
    #check to see if file is .xml
    if is_xml(xml_list[number]) == True: 
        xmldoc = minidom.parse(xml_list[number])
        reflist = xmldoc.getElementsByTagName('location')
        var_ref = reflist[0]
        print reflist[0].toxml()
        #Add to master .xml file
        tree = ET.parse('vs_original_M.xml')
        number += 1
    else:
        number += 1
        print 'wasn''t true'
  • Why are you mixing ET and minidom in the same program? This would be a lot simpler to do just using one XML library. Commented Aug 26, 2013 at 17:17
  • I couldn't actually tell you; I am stumbling through this, as this is the first Python program I've written... ever. What should I be using? Commented Aug 26, 2013 at 17:25
  • One or the other, not both. Commented Aug 26, 2013 at 17:38

1 Answer

There's probably a better way to do what you actually want to do. In particular, there's a good chance your real XML has a single <locations> tag that all the <location> tags go underneath, in which case there's no reason to search for the last <location> tag at all.
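If the master does have such a wrapper, the merge reduces to a plain append. Here is a minimal sketch of that case, assuming a hypothetical <locations> element directly under a hypothetical <config> root, and using in-memory strings in place of the real files:

```python
from xml.etree import ElementTree as ET

# Hypothetical master document with a <locations> wrapper (an assumption,
# not taken from the real vs_original_M.xml).
master_root = ET.fromstring(
    "<config><locations><location>a</location></locations></config>"
)
# Hypothetical child document whose root is the <location> to merge.
child_root = ET.fromstring("<location>b</location>")

# No search for the "last" location is needed: append under the wrapper.
locations = master_root.find("locations")
locations.append(child_root)

merged = [loc.text for loc in locations.findall("location")]
print(merged)
```

The new element lands after the existing ones simply because append always adds at the end of the wrapper's children.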

But here's how you'd do it.

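import os
from xml.etree import ElementTree as ET

os.chdir('c:/Users/ME/Documents/XML_Parasing_Python/')
origname = 'vs_original_M.xml'
master = ET.parse(origname)
for path in os.listdir('.'):
    # Skip the master itself and anything that is not an .xml file
    if path != origname and os.path.splitext(path)[-1] == '.xml':
        child = ET.parse(path)
        root = child.getroot()
        # Find the parent of the existing tags that match the child's root tag
        last_location_parent = master.find('.//*[{}][last()]'.format(root.tag))
        last_location_parent.append(root)
# Note: master.xml is written into the same directory, so a second run
# would pick it up as a child file unless it is moved or skipped by name
master.write('master.xml')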

Most of this is pretty simple. You have to find the parent of the last location node, then you can append another node to it.

The only tricky bit there is the XPath expression in the find, so let me break it down for you (but you will have to read the docs to really understand it!):

  • .// means "descendants of the current node". (Technically you should be able to just use // for "descendants of the root", but there are bugs in earlier versions of etree, so it's safer this way.)
  • * means "with any tag name".
  • [location] means "with a child location tag". (Of course I'm filling in the child's root tag using the format method. If you know that all of your children have location as the root, you can hardcode the tag name, and move the find out of the loop as well.)
  • [last()] means "the last one".

So, putting it all together, this is the last descendant of the root with any name with a child "location" tag.
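To see the expression at work, here is a small sketch on an in-memory document. The <config>, <name>, and <locations> names are made up for illustration. Keep in mind that ElementTree implements only a subset of XPath, so its handling of [last()] may not match a full XPath engine in every case:

```python
from xml.etree import ElementTree as ET

# Hypothetical master: the <location> tags sit under a <locations> parent.
master_root = ET.fromstring(
    "<config>"
    "<name>demo</name>"
    "<locations><location>a</location><location>b</location></locations>"
    "</config>"
)
new_location = ET.fromstring("<location>c</location>")

# Same expression as in the answer, with the tag name filled in by format().
parent = master_root.find(".//*[{}][last()]".format(new_location.tag))
print(parent.tag)

# Appending to that parent puts the new element after the existing ones.
parent.append(new_location)
texts = [loc.text for loc in parent.findall("location")]
print(texts)
```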


If you don't understand XPath, you can always iterate things manually to get the same effect, but it's going to be longer, and easier to introduce subtle bugs, so it's really worth learning XPath.
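For comparison, a manual version might look like the sketch below. The helper name find_last_parent is hypothetical, and the behaviour differs in one respect: this version also considers the root itself as a possible parent, whereas the .//* search only looks at descendants of the root.

```python
from xml.etree import ElementTree as ET


def find_last_parent(root, tag):
    """Return the parent of the last `tag` element in document order, or None."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    last = None
    for elem in root.iter(tag):  # iter() walks the tree in document order
        last = elem
    if last is None:
        return None
    # If the last match is the root itself, it has no parent: returns None.
    return parent_of.get(last)


# Hypothetical master where the <location> tags sit directly under the root.
master_root = ET.fromstring(
    "<config><location>a</location><location>b</location><name>demo</name></config>"
)
parent = find_last_parent(master_root, "location")
parent.append(ET.fromstring("<location>c</location>"))
tags = [child.tag for child in master_root]
print(tags)
```

Note that append adds at the end of the parent's children, so if other tags follow the last location, the new element ends up after them.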


I changed a bunch of other things in your program. Let me explain:

There's no reason to do if foo: return True else: return False; you can just do return foo. But that means your whole function is just return HART_filename.endswith('.xml'), so you don't even really need a function. And it's better to use path functions like os.path.splitext than string functions on paths.
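As a sketch of that simplification, if you did want to keep a helper, it could shrink to one line. The name is_xml is kept from the question; the comparison is case-sensitive, as in the original:

```python
import os.path


def is_xml(filename):
    # splitext("name.xml") returns ("name", ".xml"), so compare the second part.
    return os.path.splitext(filename)[1] == ".xml"


print(is_xml("vs_original_M.xml"))
print(is_xml("notes.txt"))
```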

If you do for number in range(1, xml_list_length), you don't need number = 1 at the start and number += 1 in the loop; the for statement already does that for you.

But you don't want to start at 1 anyway; Python lists are indexed starting at 0. If you're using that to skip over vs_original_M.xml, that only works if you get lucky; the order in which listdir returns things is unspecified and arbitrary. The only way to skip a file with a certain name is to check its name.

You almost never want to loop over range(len(foo)). If you just need the elements of foo, just do for element in foo. If you need the index for each element as well, do for index, element in enumerate(foo).
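The three loop points above can be sketched together. The file names here are a made-up directory listing, standing in for whatever os.listdir returns:

```python
import os.path

# Hypothetical directory listing; os.listdir returns names in arbitrary order,
# so the master is not guaranteed to be first.
names = ["b.xml", "vs_original_M.xml", "notes.txt", "a.xml"]
master_name = "vs_original_M.xml"

# Iterate over the names themselves and skip the master by name, not position.
children = [
    name for name in names
    if name != master_name and os.path.splitext(name)[1] == ".xml"
]

# enumerate supplies an index only when one is actually needed.
for index, name in enumerate(children):
    print(index, name)
```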

Finally, you should almost never check if foo == True. In Python, many things are "truthy" besides just True (the number 74, the string "hello", etc.), and you can just use if foo to check whether foo is truthy. Only use == True if you explicitly want to make sure it fails for other truthy values; if you just want to check the result of a boolean function like is_xml or endswith or the == operator, just check it directly.
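A short illustration of the difference, using a few arbitrary sample values:

```python
truthy_values = [True, 74, "hello", [0]]

# Every one of these passes a plain truthiness check...
passes_if = [bool(value) for value in truthy_values]

# ...but only True itself compares equal to True.
equals_true = [value == True for value in truthy_values]

print(passes_if)
print(equals_true)
```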

7 Comments

Thanks! Upon compiling and running the script IDLE complains about 'aapend' as in... AttributeError: 'NoneType' object has no attribute 'append' Should I rename this?
Rename what? There's no aapend in the code anywhere, so if you've made a typo, of course you'll need to fix it.
But meanwhile, if there are no existing location tags, the find is obviously not going to be able to find the parent of the last location tag, so it's going to return None, which means the append will fail (even without the typo) with exactly that problem. The way you specified things, you were trying to add after the last location, so it's not clear what you should do if there aren't any. (And that's what my first paragraph was getting at. Maybe you really want to add it as the last child of a locations tag or something else simple that wouldn't have problems like this?)
And finally, in order to debug cases like this, you really need to either use the debugger, or log enough information to see where it's going wrong. For example, just print the path, root.tag, etc. values so you know whether the problem is that it found some unrelated .xml file, or tried to add the master into itself, or searched for loaction because of a typo in the child file, or …
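The last two comments can be sketched together: print what the search saw, and decide explicitly what happens when find returns None. The helper name append_location and the fallback to the root are assumptions for illustration, not something the question specifies. The sketch also shows that locations sitting directly under the root are not found, because .//* only considers descendants of the root:

```python
from xml.etree import ElementTree as ET


def append_location(master_root, child_root, source="<unknown>"):
    """Append child_root under the parent that find() selects, else the root."""
    parent = master_root.find(".//*[{}][last()]".format(child_root.tag))
    # Log enough to see which file and tag were involved and what was found.
    print("source=%s tag=%s parent=%s" % (
        source, child_root.tag, None if parent is None else parent.tag))
    if parent is None:
        # Assumption: falling back to the master's root is acceptable here.
        parent = master_root
    parent.append(child_root)
    return parent


# Hypothetical master with no <location> tags at all.
master_root = ET.fromstring("<config><name>demo</name></config>")
first = append_location(master_root, ET.fromstring("<location>a</location>"), "a.xml")
second = append_location(master_root, ET.fromstring("<location>b</location>"), "b.xml")
tags = [child.tag for child in master_root]
print(tags)
```

The check uses `is None` rather than plain truthiness because an element with no children can itself evaluate as false.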
@abarnert that's fantastic! Almost like a blogpost!