
I have an XML file that looks like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="http://data.treasury.gov:8001/Feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">DailyTreasuryYieldCurveRateData</title>
  <id>http://data.treasury.gov:8001/feed.svc/DailyTreasuryYieldCurveRateData</id>
  <updated>2015-08-30T15:17:09Z</updated>
  <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
  <entry>
    <id>http://data.treasury.gov:8001/Feed.svc/DailyTreasuryYieldCurveRateData(6404)</id>
    <title type="text"></title>
    <updated>2015-08-30T15:17:09Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6404)" />
    <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">6404</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">2015-08-03T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double">0.03</d:BC_1MONTH>
        <d:BC_3MONTH m:type="Edm.Double">0.08</d:BC_3MONTH>
        <d:BC_6MONTH m:type="Edm.Double">0.17</d:BC_6MONTH>
        <d:BC_1YEAR m:type="Edm.Double">0.33</d:BC_1YEAR>
        <d:BC_2YEAR m:type="Edm.Double">0.68</d:BC_2YEAR>
        <d:BC_3YEAR m:type="Edm.Double">0.99</d:BC_3YEAR>
        <d:BC_5YEAR m:type="Edm.Double">1.52</d:BC_5YEAR>
        <d:BC_7YEAR m:type="Edm.Double">1.89</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">2.55</d:BC_20YEAR>
        <d:BC_30YEAR m:type="Edm.Double">2.86</d:BC_30YEAR>
        <d:BC_30YEARDISPLAY m:type="Edm.Double">2.86</d:BC_30YEARDISPLAY>
      </m:properties>
    </content>
  </entry>
  <entry>
    <id>http://data.treasury.gov:8001/Feed.svc/DailyTreasuryYieldCurveRateData(6405)</id>
    <title type="text"></title>
    <updated>2015-08-30T15:17:09Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6405)" />
    <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">6405</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">2015-08-04T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double">0.05</d:BC_1MONTH>
        <d:BC_3MONTH m:type="Edm.Double">0.08</d:BC_3MONTH>
        <d:BC_6MONTH m:type="Edm.Double">0.18</d:BC_6MONTH>
        <d:BC_1YEAR m:type="Edm.Double">0.37</d:BC_1YEAR>
        <d:BC_2YEAR m:type="Edm.Double">0.74</d:BC_2YEAR>
        <d:BC_3YEAR m:type="Edm.Double">1.08</d:BC_3YEAR>
        <d:BC_5YEAR m:type="Edm.Double">1.6</d:BC_5YEAR>
        <d:BC_7YEAR m:type="Edm.Double">1.97</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">2.23</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">2.59</d:BC_20YEAR>
        <d:BC_30YEAR m:type="Edm.Double">2.9</d:BC_30YEAR>
        <d:BC_30YEARDISPLAY m:type="Edm.Double">2.9</d:BC_30YEARDISPLAY>
      </m:properties>
    </content>
  </entry>
</feed>

How can I parse out the '2.16' for 'BC_10YEAR'? I've been looking at other examples with ElementTree and lxml, and I just can't seem to match up the XML format in those examples with that of my file.

The last thing I've tried was:

from lxml import etree
doc = etree.parse(yield_xml)
memoryElem = doc.find('content')
print memoryElem.text        # element text
print memoryElem.get('type') # attribute

I get an error: AttributeError: 'NoneType' object has no attribute 'text'

Is there a simple way to do this?

2 Answers


You could try the built-in str.split method (here xml_file holds the XML document's text as a string):

>>> [data.split('>')[1].split('<')[0] for data in str(xml_file).split('<d:') if 'BC_10YEAR' in data][0]
'2.16'

3 Comments

I tried 'with open('test.xml', 'rb') as xml_file: [data.split('>')[1].split('<')[0] for data in str(xml_file).split('<d:') if 'BC_10YEAR' in data][0]' but I get an "IndexError: list index out of range" error. What am I doing wrong?
It means your test.xml file object differs from the example above.
That's strange, I'm pretty sure my file has exactly what I pasted above. Anyway, I modified it to this to get it to work:

with open(yield_xml, 'r') as yield_file:
    for line in yield_file:
        if 'BC_10YEAR' in line:
            cur_yield = float(line.split('>')[1].split('<')[0])
            break
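The IndexError in this thread comes from str(xml_file): calling str() on a file object returns a short description of the object (such as its name), not the file's contents, so no piece of the split contains 'BC_10YEAR' and [0] indexes an empty list. Reading the file first makes the one-liner work. A minimal sketch, assuming Python 3; the sample is a trimmed copy of the feed from the question, written to a temporary file so the example runs on its own, and first_value is a hypothetical helper name:

```python
import os
import tempfile

# Trimmed copy of the feed from the question (only the part the split needs).
SAMPLE = """<m:properties>
        <d:BC_7YEAR m:type="Edm.Double">1.89</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">2.55</d:BC_20YEAR>
      </m:properties>"""


def first_value(xml_text, tag):
    """Return the text of the first <d:TAG ...> element using plain string splits."""
    pieces = [data.split('>')[1].split('<')[0]
              for data in xml_text.split('<d:') if tag in data]
    return pieces[0]  # raises IndexError if the tag never appears


# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

with open(path) as xml_file:
    # str() of a file object describes the object; it does not contain the XML.
    print('BC_10YEAR' in str(xml_file))  # False
    text = xml_file.read()               # the actual file contents

print(first_value(text, 'BC_10YEAR'))    # 2.16
os.remove(path)
```

The line-by-line loop above avoids the problem for the same reason: iterating over the file yields its contents. It only works while each element sits on its own line, though; an XML parser does not depend on layout.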

I'd suggest using lxml's xpath() method, which supports full XPath 1.0 expressions (find() only understands a limited subset of XPath):

from lxml import etree

doc = etree.parse(yield_xml)

#register prefixes to be used in xpath
ns = {"foo": "http://www.w3.org/2005/Atom",
      "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
      "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"}

#select element <d:BC_10YEAR>, and convert the value to number
result = doc.xpath("number(//foo:content/m:properties/d:BC_10YEAR)", namespaces=ns)

#print the result
print(result)
print(type(result))

Output (Python 2; under Python 3 the second line reads <class 'float'>):

2.16
<type 'float'>

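In case you wonder why the XPath expression above uses foo:content instead of just content: the content element implicitly inherits the default namespace declared on the root element (xmlns="http://www.w3.org/2005/Atom"), and in the code above that default namespace URI is mapped to the prefix foo. XPath 1.0 has no notion of a default namespace, so an unprefixed name such as content only matches elements that are in no namespace. Related question: parsing xml containing default namespace to get an element value using lxml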
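The same prefix mapping also works with find(), which makes it easy to see why doc.find('content') in the question returned None. Two things go wrong there: the name has no namespace, and find() with a bare name only looks at direct children of the root, while content sits inside entry. A minimal sketch using the standard library's xml.etree.ElementTree (lxml's find() follows the same path rules), assuming Python 3 and a trimmed one-entry copy of the feed embedded as a string:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (one entry is enough here).
FEED = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
</feed>"""

# Same prefix-to-URI mapping as in the xpath() answer.
ns = {"foo": "http://www.w3.org/2005/Atom",
      "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
      "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"}

root = ET.fromstring(FEED)

# The parser stores names as {uri}localname, so <content> is really this:
print(root.find(".//foo:content", ns).tag)  # {http://www.w3.org/2005/Atom}content

# An unprefixed name means "no namespace", so nothing matches, even searching
# the whole tree with .// (this is the None behind the AttributeError).
print(root.find(".//content"))              # None

# With the prefixes mapped and a descendant search, find() reaches the value.
value = float(root.find(".//foo:content/m:properties/d:BC_10YEAR", ns).text)
print(value)                                # 2.16
```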

2 Comments

Thanks, the code works. Unfortunately my knowledge of XML is very limited, so I could not understand much of what you said. I do have a question, though: how does the code differentiate between the two 'BC_10YEAR' values in the XML file? The first one is 2.16 but there is another one that's 2.23.
The code returns only the first one: number() converts just the first node of the selected node-set, in document order. Getting the other one, or all BC_10YEAR values, is possible with a small change to the XPath argument.
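A few variations of the XPath argument cover those cases. This is a minimal sketch, assuming lxml is installed and using a trimmed two-entry copy of the feed embedded as bytes (only NEW_DATE and BC_10YEAR are kept); the variable names are illustrative:

```python
from lxml import etree

# Trimmed copy of the feed from the question: two entries, only the fields used here.
FEED = b"""<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns="http://www.w3.org/2005/Atom">
  <entry><content type="application/xml"><m:properties>
    <d:NEW_DATE m:type="Edm.DateTime">2015-08-03T00:00:00</d:NEW_DATE>
    <d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:NEW_DATE m:type="Edm.DateTime">2015-08-04T00:00:00</d:NEW_DATE>
    <d:BC_10YEAR m:type="Edm.Double">2.23</d:BC_10YEAR>
  </m:properties></content></entry>
</feed>"""

ns = {"foo": "http://www.w3.org/2005/Atom",
      "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
      "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"}

doc = etree.fromstring(FEED)

# Every BC_10YEAR value in document order: text() yields a list of strings.
all_values = [float(v) for v in doc.xpath(
    "//foo:content/m:properties/d:BC_10YEAR/text()", namespaces=ns)]

# The second value: the parentheses make [2] index the whole result set,
# rather than "the second BC_10YEAR child of each parent".
second = doc.xpath(
    "number((//foo:content/m:properties/d:BC_10YEAR)[2])", namespaces=ns)

# The value for a specific date, filtering on the sibling NEW_DATE element.
on_aug_4 = doc.xpath(
    "number(//m:properties[d:NEW_DATE='2015-08-04T00:00:00']/d:BC_10YEAR)",
    namespaces=ns)

print(all_values)  # [2.16, 2.23]
print(second)      # 2.23
print(on_aug_4)    # 2.23
```

Filtering on NEW_DATE is usually more robust than a positional index, since it does not depend on the order in which the feed lists its entries.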
