I've been trying to scrape an HTML table with Python, but I can't get it to print anything. Bear with me: I've only been using Python for two days and have barely scratched the surface. This is also my first Stack Overflow post, so I'll try to be as descriptive as possible.

I'm pretty sure this question has been asked before, and I'm sorry if that's the case.

Anyways! Here's the code:

import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://premierleague.com/en-gb/matchday/league-table.html').read())
for row in soup('table', {'class': 'leagueTable'})[0].tbody('tr'):
    tds = row('td')

http://premierleague.com/en-gb/matchday/league-table.html

I'm new to Python and not sure this code is right for this kind of scrape, but as far as I can tell, the printing is the part I can't get to work. I've tried several different ways of printing, without success.

1 Answer

Make it simpler: use a CSS selector to get the desired rows, that is, the tr elements with the club-row class inside the table with the leagueTable class. For each row, get the text of all the cells. Working example:

import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.premierleague.com/en-gb/matchday/league-table.html'))

for row in soup.select("table.leagueTable tr.club-row"):
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    print cells

Prints:

[u'1', u'', u'(1)', u'Manchester City', u'5', u'5', u'0', u'0', u'11', u'0', u'11', u'15']
[u'2', u'', u'(2)', u'Leicester City', u'5', u'3', u'2', u'0', u'11', u'7', u'4', u'11']
[u'3', u'', u'(3)', u'Manchester United', u'5', u'3', u'1', u'1', u'6', u'3', u'3', u'10']
[u'4', u'', u'(4)', u'Arsenal', u'5', u'3', u'1', u'1', u'5', u'3', u'2', u'10']
[u'5', u'', u'(10)', u'West Ham United', u'5', u'3', u'0', u'2', u'11', u'6', u'5', u'9']
[u'6', u'', u'(5)', u'Crystal Palace', u'5', u'3', u'0', u'2', u'8', u'6', u'2', u'9']
[u'7', u'', u'(6)', u'Everton', u'5', u'2', u'2', u'1', u'8', u'5', u'3', u'8']
[u'8', u'', u'(7)', u'Swansea City', u'5', u'2', u'2', u'1', u'7', u'5', u'2', u'8']
[u'9', u'', u'(8)', u'Norwich City', u'5', u'2', u'1', u'2', u'8', u'9', u'-1', u'7']
[u'10', u'', u'(9)', u'Liverpool', u'5', u'2', u'1', u'2', u'3', u'6', u'-3', u'7']
[u'11', u'', u'(11)', u'Southampton', u'5', u'1', u'3', u'1', u'5', u'5', u'0', u'6']
[u'12', u'', u'(12)', u'Tottenham Hotspur', u'5', u'1', u'3', u'1', u'4', u'4', u'0', u'6']
[u'13', u'', u'(13)', u'Watford', u'5', u'1', u'3', u'1', u'3', u'4', u'-1', u'6']
[u'14', u'', u'(14)', u'West Bromwich Albion', u'5', u'1', u'2', u'2', u'3', u'6', u'-3', u'5']
[u'15', u'', u'(15)', u'Aston Villa', u'5', u'1', u'1', u'3', u'6', u'8', u'-2', u'4']
[u'16', u'', u'(16)', u'Bournemouth', u'5', u'1', u'1', u'3', u'6', u'9', u'-3', u'4']
[u'17', u'', u'(17)', u'Chelsea', u'5', u'1', u'1', u'3', u'7', u'12', u'-5', u'4']
[u'18', u'', u'(19)', u'Stoke City', u'5', u'0', u'2', u'3', u'3', u'7', u'-4', u'2']
[u'19', u'', u'(20)', u'Sunderland', u'5', u'0', u'2', u'3', u'6', u'11', u'-5', u'2']
[u'20', u'', u'(18)', u'Newcastle United', u'5', u'0', u'2', u'3', u'2', u'7', u'-5', u'2']

And now we can clearly see - that's a terrible start for Chelsea.
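
For anyone on Python 3, here is a minimal sketch of the same selector approach. It parses a small inline sample rather than the live page, so SAMPLE_HTML and the extract_rows helper are hypothetical names made up for illustration, and the sample markup is only an assumption about what the real page looks like. The parser is passed explicitly ("html.parser" ships with Python) so the result does not depend on which parsers happen to be installed. On Python 3 the live page would be fetched with urllib.request.urlopen instead of urllib2.urlopen.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page is not reproduced here.
SAMPLE_HTML = """
<table class="leagueTable">
  <tbody>
    <tr class="club-row"><td>1</td><td> Manchester City </td><td>15</td></tr>
    <tr class="club-row"><td>2</td><td>Leicester City</td><td>11</td></tr>
    <tr class="other-row"><td>ignored</td></tr>
  </tbody>
</table>
"""


def extract_rows(html, parser="html.parser"):
    """Return the stripped cell text of every club row in the league table."""
    soup = BeautifulSoup(html, parser)
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in soup.select("table.leagueTable tr.club-row")
    ]


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Rows without the club-row class are skipped by the selector, and strip=True removes the whitespace around each cell's text.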

4 Comments

Cheers mate! I can't get it to print more than 4 rows though, and the fourth one is, ironically, Arsenal. Haha, hope it stays that way for Chelsea. Also, how do I put this output into an HTML table? Is it an easy process, or should I read up on it first? Thanks in advance, brother!
@smokeyblunts we are experiencing the differences between the parsers that bs4 chooses automatically from those available in the current Python environment. Install html5lib or lxml and call the BeautifulSoup constructor with the parser named explicitly: BeautifulSoup(urllib2.urlopen('http://premierleague.com/en-gb/matchday/league-table.html'), "html5lib") or BeautifulSoup(urllib2.urlopen('http://premierleague.com/en-gb/matchday/league-table.html'), "lxml").
Hi again alecxe, I never really got the parsing problem sorted out, sorry for bothering you again. As I mentioned before, I managed to get 4 rows working. I didn't have a lot of time yesterday, so I figured I'd make it work today. However, today nothing prints: the code is exactly the same as it was yesterday, but it doesn't print a single line. I would've tried to PM you about it, but I can't find out whether there's a PM function or not. Thanks in advance, yet again!
@smokeyblunts sure, you need to fix your url - it should be http://www.premierleague.com/en-gb/matchday/league-table.html (note the www there)
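
On the earlier question in this thread about putting the scraped rows into an HTML table, here is a minimal sketch in Python 3 using only the standard library. The rows_to_html_table helper and its css_class parameter are hypothetical names invented for this example, and the two sample rows are trimmed from the output shown in the answer. It is one simple way to do it, not the only one.

```python
from html import escape


def rows_to_html_table(rows, css_class="leagueTable"):
    """Render a list of rows (each a list of strings) as an HTML table."""
    lines = ['<table class="{}">'.format(escape(css_class, quote=True))]
    for row in rows:
        # Escape each cell so characters like < and & cannot break the markup.
        cells = "".join("<td>{}</td>".format(escape(cell)) for cell in row)
        lines.append("  <tr>{}</tr>".format(cells))
    lines.append("</table>")
    return "\n".join(lines)


rows = [
    ["1", "Manchester City", "15"],
    ["2", "Leicester City", "11"],
]
table_html = rows_to_html_table(rows)
print(table_html)
```

The returned string can be written to a file or dropped into a larger page template.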
