
How do I parse the start date and end date values using BeautifulSoup?

<h2 name="PRM-013113-21017-0FSNS" class="pointer">
    <a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br>
       <span>February 8, 2013 - February 10, 2013</span>
    </a>
</h2>
  • I want output like date_start = February 8, 2013 and date_end = February 10, 2013. How do I do that? Commented Feb 4, 2013 at 8:25

1 Answer

Something like this should work:

import re
from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date_span = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = re.findall(r'<span>(.+?)</span>', str(date_span))[0]

(PS: instead of a regex, you can also pass text=True to findAll to get the text directly, as follows.)

from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]

Update:

To get the start and end dates as separate variables, you can simply split the date variable as follows:

from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]
# Get start and end date separately
date_start, date_end = date.split(' - ')

Now the date_start variable contains the start date and the date_end variable contains the end date.
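Note that date_start and date_end are still plain strings at this point. If you need real date objects (for comparison or arithmetic), here is a minimal sketch using only the standard library. The helper name parse_date_range and the format string "%B %d, %Y" are my own assumptions based on the sample text, and month-name parsing with %B assumes an English locale:

```python
from datetime import datetime


def parse_date_range(text, fmt="%B %d, %Y"):
    """Split 'START - END' on ' - ' and parse both halves into date objects.

    Hypothetical helper for illustration; raises ValueError if the text
    does not contain exactly one ' - ' separator or a half does not match fmt.
    """
    start_text, end_text = (part.strip() for part in text.split(" - "))
    start = datetime.strptime(start_text, fmt).date()
    end = datetime.strptime(end_text, fmt).date()
    return start, end


# The string below is the text extracted from the <span> in the question.
date_start, date_end = parse_date_range("February 8, 2013 - February 10, 2013")
```

With date objects you can, for example, compute the length of the sale as (date_end - date_start).days.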


2 Comments

Thanks @Amyth, but I want each date as a separate output: date_start = February 8, 2013 and date_end = February 10, 2013.
How about simply splitting the date output on ` - `? Check the updated answer.
