I'm new to Python and I'd like to remove the openingHours element, together with its child elements, from an XML file.

I have this input

<Root>
   <stations>
      <station id= "1">
          <name>whatever</name>
          <openingHours>
               <openingHour>
                    <entrance>main</entrance>
                       <timeInterval>
                         <from>05:30</from>
                         <to>21:30</to>
                       </timeInterval>
                <openingHour/>
          <openingHours>
      <station/>
      <station id= "2">
          <name>foo</name>
          <openingHours>
               <openingHour>
                    <entrance>main</entrance>
                       <timeInterval>
                         <from>06:30</from>
                         <to>21:30</to>
                       </timeInterval>
                <openingHour/>
          <openingHours>
       <station/>
   <stations/>
  <Root/>

I'd like this output

  <Root>
   <stations>
      <station id= "1">
          <name>whatever</name>
      <station/>
      <station id= "2">
          <name>foo</name>
      <station/>
   <stations/>
  <Root/>

So far I've tried this, taken from another thread ("How to remove elements from XML using Python"):

from lxml import etree

doc=etree.parse('stations.xml')
for elem in doc.xpath('//*[attribute::openingHour]'):
   parent = elem.getparent()
   parent.remove(elem)
print(etree.tostring(doc))

However, it doesn't seem to be working. Thanks.

2 Answers

I took your code for a spin, but at first the parser rejected your XML: the / in a closing tag has to come at the beginning (like </...>), not at the end (<.../>), which is the syntax for an empty element.
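To see the difference, here is a minimal sketch that feeds both spellings of a closing tag to a parser. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra installs (lxml reports the same problem through its own XMLSyntaxError). The helper name parses and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Closing tags written as in the question: slash at the end.
malformed = "<station><name>whatever<name/><station/>"
# The same fragment with proper closing tags: slash at the start.
well_formed = "<station><name>whatever</name></station>"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(malformed))    # False
print(parses(well_formed))  # True
```

In the malformed string, <name/> and <station/> are read as new empty elements, so the opening <station> and <name> tags are never closed.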

That aside, your code isn't working because the XPath expression looks for elements that have an attribute named openingHour, while what you want are the elements named openingHours. I got it to work by changing the expression to //openingHours, which gives this code:

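from lxml import etree

doc = etree.parse('stations.xml')
# Select every <openingHours> element anywhere in the document
for elem in doc.xpath('//openingHours'):
    # An element can only be removed through its parent
    parent = elem.getparent()
    parent.remove(elem)
# tostring() returns bytes, so decode before printing
print(etree.tostring(doc).decode())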
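As a rough illustration of why the original expression matched nothing, the sketch below counts what each kind of selector finds. It is written against the standard library's xml.etree.ElementTree so it runs on its own; that module only understands the short @name form, which in XPath means the same as attribute::name. The sample document is a shortened, well-formed version of the one in the question.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Root>
  <stations>
    <station id="1">
      <name>whatever</name>
      <openingHours>
        <openingHour><entrance>main</entrance></openingHour>
      </openingHours>
    </station>
  </stations>
</Root>
"""
root = ET.fromstring(xml_text)

# Elements that HAVE an attribute called openingHour: there are none.
by_attribute = root.findall(".//*[@openingHour]")
# Elements that ARE called openingHours: one per station.
by_element = root.findall(".//openingHours")
# For comparison, an attribute that does exist in the document.
by_id = root.findall(".//*[@id]")

print(len(by_attribute))          # 0
print(len(by_element))            # 1
print([e.tag for e in by_id])     # ['station']
```

Because the attribute selector returns an empty list, the loop in the question never runs and the document is printed unchanged.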

You want to remove the <openingHours> elements, not elements that have an attribute named openingHour:

from lxml import etree

doc = etree.parse('stations.xml')
for elem in doc.findall('.//openingHours'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
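If lxml is not available, roughly the same thing can be done with the standard library, with one difference: xml.etree.ElementTree elements have no getparent(), so the sketch below walks over every possible parent and removes matching direct children. The function name remove_elements and the inline sample are assumptions for illustration, not part of either library.

```python
import xml.etree.ElementTree as ET


def remove_elements(root, tag):
    """Remove every descendant of root named tag; return how many were removed."""
    removed = 0
    # Snapshot the elements first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        # findall() with a bare tag name matches direct children only
        # and returns a list, so removing inside the loop is safe.
        for child in parent.findall(tag):
            parent.remove(child)
            removed += 1
    return removed


xml_text = """
<Root>
  <stations>
    <station id="1">
      <name>whatever</name>
      <openingHours><openingHour><entrance>main</entrance></openingHour></openingHours>
    </station>
    <station id="2">
      <name>foo</name>
      <openingHours><openingHour><entrance>main</entrance></openingHour></openingHours>
    </station>
  </stations>
</Root>
"""
root = ET.fromstring(xml_text)
count = remove_elements(root, "openingHours")

print(count)  # 2
print(ET.tostring(root, encoding="unicode"))
```

With lxml, getparent() makes the loop shorter, which is why the versions above are simpler.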
