Scraping a table from webpage with Python

Question

from bs4 import BeautifulSoup
from urllib import urlopen

player_code = open("/Users/brandondennis/Desktop/money/CF_Name.txt").read()
player_code = player_code.split("\r")


for player in player_code:

    html = urlopen("https://www.capfriendly.com/players/"+player+"")

    soup = BeautifulSoup(html, 'html.parser')

    for section in soup.findAll('div',{"class": "table_c"}):
        table = section.findChildren()[10].text
        print player, table

Here is a link to a sample player page: https://www.capfriendly.com/players/patrik-elias

Here is a sample of player names that I am adding from a text file to the base url.

This is ultimately what I am wanting to do for my text file of 1000+ players

@keatinge I would like to eventually have them in CSV format, like the table as formatted on the website (link). I have a list of about 1000 names that I would like to get.
– denn9268, Commented Jun 17, 2016 at 18:21
@PadraicCunningham I would like to get everything back to 2007-08. At the moment I am just trying to get the code to work for any table with salary information.
– denn9268, Commented Jun 17, 2016 at 22:45

alecxe · Accepted Answer · 2016-06-17 15:49:00Z

1

Aside from what the others mentioned, take a look at this line:

table = soup.findAll('table_c')[2]

Here, BeautifulSoup would try to locate elements whose tag name is table_c. But table_c is the value of a class attribute, not a tag:

<div class="table_c"><div class="rel navc column_head3 cntrct"><div class="ofh"><div>HISTORICAL SALARY </div><div class="l cont_t mt4">SOURCE: The Hockey News, USA Today</div></div></div>
    <table class="cntrct" id="contractinsert" cellpadding="0" border="0" cellspacing="0">
    ...
    </table>
</div>

Use the class_ argument instead:

table = soup.find_all(class_='table_c')[2]

Or, you may get directly to the table by id:

table = soup.find("table", id="contractinsert")
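The difference between the three lookups can be seen on a small inline document. This is only a sketch: it assumes bs4 is installed, and SAMPLE_HTML is a trimmed, hypothetical stand-in for the page markup rather than the live page.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page markup (hypothetical; nothing is fetched).
SAMPLE_HTML = """
<div class="table_c">
  <table class="cntrct" id="contractinsert">
    <tr><td>2018-19</td><td>$2,750,000</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A bare string is treated as a tag name, and there is no <table_c> tag.
by_tag_name = soup.find_all("table_c")

# class_ matches against the class attribute instead.
by_class = soup.find_all(class_="table_c")

# Looking the table up by id skips the wrapper div entirely.
table = soup.find("table", id="contractinsert")
cells = [td.get_text() for td in table.find_all("td")]
```

Since find_all returns an empty list when nothing matches, indexing that result with [2] fails with IndexError: list index out of range.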

DeepSpace · Accepted Answer · 2016-06-17 15:17:03Z

0

It is hard to answer when there is almost no context in your question (what exactly isn't working, and what exactly are you trying to scrape?), but take a look at these lines:

first_columns.append(row.findAll('td'))[0]
third_columns.append(row.findAll('td'))[2]

Since append returns None, indexing its return value with [0] or [2] raises a TypeError.

I believe they are meant to be:

first_columns.append(row.findAll('td')[0])
third_columns.append(row.findAll('td')[2])
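A small sketch of why the bracket placement matters, using a plain list in place of row.findAll('td'). The cell values are made up for illustration.

```python
# Stand-in for row.findAll('td'); the values are hypothetical.
cells = ["2018-19", "NHL", "$2,750,000"]

first_columns = []

# list.append mutates the list in place and returns None.
result = first_columns.append(cells)

# Indexing that None return value is what raises the exception.
try:
    [].append(cells)[0]
except TypeError as exc:
    error_name = type(exc).__name__

# Indexing before appending stores just the one cell.
fixed_columns = []
fixed_columns.append(cells[0])
```

Note that the misplaced version still appends the whole list of cells before the exception is raised, so first_columns ends up holding one nested list rather than a single cell.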

3 Comments

denn9268 Over a year ago

I continue to see this error:

(u'2018-19', u'$2,750,000')
Traceback (most recent call last):
  File "/Users/bd/Desktop/untitled-14.py", line 15, in <module>
    table = soup.findAll('table')[0]
IndexError: list index out of range

DeepSpace Over a year ago

@denn9268 Update your question with the exact code you are executing and the exact error you are getting. This bit of code works for me:

from urllib import urlopen
from bs4 import BeautifulSoup

url = 'https://www.capfriendly.com/players/patrik-elias'
soup = BeautifulSoup(urlopen(url))
table = soup.findAll('table')[0]

denn9268 Over a year ago

That works for me as well with one name. I get the error when I try to pull from multiple names. I have updated the code and added a sample list of names.

Tezirg · Accepted Answer · 2016-06-17 15:17:29Z

0

Your parentheses and brackets look misplaced.

Does this do what you want?

first_columns = []
third_columns = []
for row in rows[1:]:
    first_columns.append(row.findAll('td')[0])
    third_columns.append(row.findAll('td')[2])

Here I no longer append all the td elements to each list and then index the result of append with [0] and [2], which was discarded anyway. Instead, only the first and third td of each row are appended.

Padraic Cunningham · Accepted Answer · 2016-06-17 22:06:49Z

0

"It seems to work fine for one player at a time, but when I change to my text file of a list of players, that is where I am having trouble."

I think the issue is how you are parsing the file. If you have one player per line, just iterate over the file object, stripping any whitespace:

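from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
import csv

with open("/Users/bd/Desktop/testfolder/Player_Code_Test.txt") as f:
    # Iterate over the file line by line, stripping the line ending from each name.
    for player in map(str.strip, f):
        # "{}" is the placeholder that format() fills with the player name.
        html = urlopen("https://www.capfriendly.com/players/{}".format(player))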
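To illustrate why stripping each line helps, here is a sketch that compares it with splitting on "\r". It assumes the names file uses "\r\n" line endings, which the question does not state. The file contents are simulated in memory, and example-player is a made-up name.

```python
import io

BASE_URL = "https://www.capfriendly.com/players/"

# Hypothetical file contents; only patrik-elias appears in the thread.
raw = "patrik-elias\r\nexample-player\r\n\r\n"

# Splitting on "\r" leaves a stray "\n" on later names, plus empty entries.
split_names = raw.split("\r")

# Iterating line by line and stripping removes every line-ending character;
# blank lines are skipped so no URL is built for an empty name.
with io.StringIO(raw) as f:
    players = [line.strip() for line in f if line.strip()]

urls = [BASE_URL + player for player in players]
```

Under that assumption, only the first name from split("\r") yields a usable URL, which would be consistent with the code working for a single player and failing on a list.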
