
I'm trying to scrape news data from the Forex Factory calendar, but I have a small problem: the XML file wraps several of its values in CDATA sections, and my code prints them as empty.

import requests
from bs4 import BeautifulSoup

def get_news_calendar():
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    soup = BeautifulSoup(r.text, 'lxml')
    events = soup.find_all('event')
    for event in events:
        print(event.find('title').text, event.find('country').text, event.find('date'),
              event.find('time').text, event.find('impact').text,
              event.find('forecast').text, event.find('previous').text)

Output:

Current Account EUR <date></date>    
Retail Sales m/m GBP <date></date>    
MPC Member Saunders Speaks GBP <date></date>    
Core CPI m/m CAD <date></date>    
CPI m/m CAD <date></date>    
Trimmed CPI y/y CAD <date></date>    
Median CPI y/y CAD <date></date>    
Common CPI y/y CAD <date></date>    
FOMC Member Kashkari Speaks USD <date></date>    
Flash Manufacturing PMI USD <date></date>    
Flash Services PMI USD <date></date>    
Existing Home Sales USD <date></date>    
IMF Meetings ALL <date></date>    
IMF Meetings ALL <date></date>    
Treasury Sec Mnuchin Speaks USD <date></date>    
French Presidential Election EUR <date></date>

Example XML file:

<event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
</event> 

How can I print the values inside the CDATA sections?

2 Answers


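You appear to have the parser name wrong. You are parsing an XML document, so you need to use lxml-xml (alias xml) instead of lxml. With BeautifulSoup, 'lxml' selects lxml's HTML parser, which drops the CDATA content; that is why <date>, <time> and the other CDATA fields come out empty.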

Try replacing

soup = BeautifulSoup(r.text, 'lxml')

with

soup = BeautifulSoup(r.text, 'lxml-xml')

After making this change to your get_news_calendar function, I get the following output when running it on your example XML file:

German Flash Manufacturing PMI EUR <date>04-21-2017</date> 7:30am Medium 58.1 58.3
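A quick way to confirm that the parser type is what matters: a genuine XML parser hands back CDATA content as ordinary element text. The sketch below is a swap-in for illustration, not what this answer uses: it runs the example event from the question through the standard library's xml.etree.ElementTree, so it needs neither lxml nor bs4. It also reads .text on every field, including date; the question's code prints the whole <date> tag because it calls event.find('date') without .text. The helper name event_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example <event> from the question, unchanged.
SAMPLE = """<event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
</event>"""


def event_fields(xml_text):
    """Map each child tag of an <event> to its text (hypothetical helper).

    The XML parser unwraps <![CDATA[...]]> sections, so .text is the bare value.
    """
    event = ET.fromstring(xml_text)
    return {child.tag: child.text for child in event}


fields = event_fields(SAMPLE)
print(fields["title"], fields["country"], fields["date"], fields["time"],
      fields["impact"], fields["forecast"], fields["previous"])
```

This prints German Flash Manufacturing PMI EUR 04-21-2017 7:30am Medium 58.1 58.3, the same line as above but with the bare date value.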

Consider using lxml directly and running XPath over all <event> nodes; with lxml's XML parser, an element's .text attribute already contains the CDATA content.

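import requests
import lxml.etree as et

def get_news_calendar():
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
    # Pass the raw bytes so lxml decodes them according to the XML declaration;
    # lxml refuses str input that carries an encoding declaration.
    data = et.fromstring(r.content)

    # With the XML parser, CDATA content comes back as ordinary element text.
    events = data.xpath('//event')
    for event in events:
        print(event.find('title').text, event.find('country').text,
              event.find('date').text, event.find('time').text,
              event.find('impact').text, event.find('forecast').text,
              event.find('previous').text)

get_news_calendar()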

# Bank Holiday NZD 04-16-2017 9:00pm Holiday None None
# Bank Holiday AUD 04-16-2017 10:00pm Holiday None None
# GDP q/y CNY 04-17-2017 2:00am High 6.8% 6.8%
# Industrial Production y/y CNY 04-17-2017 2:00am High 6.2% 6.3%
# Fixed Asset Investment ytd/y CNY 04-17-2017 2:00am Medium 8.8% 8.9%
# NBS Press Conference CNY 04-17-2017 2:00am Medium None None
# Retail Sales y/y CNY 04-17-2017 2:00am Low 9.7% 9.5%
# Bank Holiday CHF 04-17-2017 6:00am Holiday None None
# BOJ Gov Kuroda Speaks JPY 04-17-2017 6:15am High None None
# Bank Holiday GBP 04-17-2017 7:00am Holiday None None
# French Bank Holiday EUR 04-17-2017 7:00am Holiday None None
# ...
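The event.find(...).text chains above raise AttributeError if an event lacks one of the fields, because find then returns None. A more defensive sketch, assuming you want one dict per event: findtext(tag, default) returns the default for a missing element (and an empty string for an element that is present but empty). It uses the standard library's xml.etree.ElementTree; lxml.etree elements provide the same iter and findtext methods, so swapping the import should also work. The two-event FEED document, including its weeklyevents root name, is made-up sample data, and parse_events is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Field names taken from the question's <event> elements.
FIELDS = ("title", "country", "date", "time", "impact", "forecast", "previous")

# Made-up sample feed: the second event deliberately has no <forecast> or <previous>.
FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<weeklyevents>
  <event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
  </event>
  <event>
    <title>Bank Holiday</title>
    <country>NZD</country>
    <date><![CDATA[04-16-2017]]></date>
    <time><![CDATA[9:00pm]]></time>
    <impact><![CDATA[Holiday]]></impact>
  </event>
</weeklyevents>"""


def parse_events(xml_bytes, default=None):
    """Return one dict per <event>; fields missing from an event map to `default`."""
    root = ET.fromstring(xml_bytes)  # bytes input, so the declared encoding is honoured
    return [
        {name: event.findtext(name, default) for name in FIELDS}
        for event in root.iter("event")
    ]


events = parse_events(FEED)
for e in events:
    print(e["title"], e["impact"], e["forecast"])
```

With this sample feed, the second dict has forecast and previous set to None rather than raising. Returning plain dicts also makes it easy to filter afterwards, for example by e["impact"] == "High".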
