from bs4 import BeautifulSoup
from urllib import urlopen

player_code = open("/Users/brandondennis/Desktop/money/CF_Name.txt").read()
player_code = player_code.split("\r")


for player in player_code:
    html = urlopen("https://www.capfriendly.com/players/" + player)
    soup = BeautifulSoup(html, 'html.parser')

    for section in soup.findAll('div', {"class": "table_c"}):
        table = section.findChildren()[10].text
        print player, table

Here is a link to a sample player page: https://www.capfriendly.com/players/patrik-elias

Here is a sample of player names that I am adding from a text file to the base url.


This is ultimately what I am wanting to do for my text file of 1000+ players

7
  • Can you give an example of the output you are expecting? Commented Jun 17, 2016 at 15:32
  • @keatinge I would eventually like to have them in CSV format, laid out like the table on the website (link). I have a list of about 1000 names that I would like to get. Commented Jun 17, 2016 at 18:21
  • What table do you want? Commented Jun 17, 2016 at 22:03
  • @PadraicCunningham I would like to get everything back to 2007-08. At the moment I am just trying to get the code to work for any table with salary information. Commented Jun 17, 2016 at 22:45
  • Add a Sample of your file, exactly as you see it. Commented Jun 17, 2016 at 22:57

4 Answers

1

Aside from what the others mentioned, take a look at this line:

table = soup.findAll('table_c')[2]

Here, BeautifulSoup would try to locate elements whose tag name is table_c. But table_c is the value of a class attribute:

<div class="table_c"><div class="rel navc column_head3 cntrct"><div class="ofh"><div>HISTORICAL SALARY </div><div class="l cont_t mt4">SOURCE: The Hockey News, USA Today</div></div></div>
    <table class="cntrct" id="contractinsert" cellpadding="0" border="0" cellspacing="0">
    ...
    </table>
</div>

Use the class_ argument instead:

table = soup.find_all(class_='table_c')[2] 

Or, you may get directly to the table by id:

table = soup.find("table", id="contractinsert")
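
The difference between the three lookups can be checked offline. The following is only a sketch: SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the markup quoted above, and the salary figure in it is a placeholder rather than real data. It is written to run on Python 3 with bs4 installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the player page; the salary is a placeholder value.
SAMPLE_HTML = """
<div class="table_c">
  <table class="cntrct" id="contractinsert">
    <tr><th>SEASON</th><th>SALARY</th></tr>
    <tr><td>2007-08</td><td>$1,000,000</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A bare string is treated as a tag name, so this searches for <table_c> tags.
by_tag_name = soup.find_all("table_c")

# class_ matches against the class attribute instead.
by_class = soup.find_all(class_="table_c")

# Looking the table up by id returns the element itself (or None if absent).
table = soup.find("table", id="contractinsert")

# Collect the text of every header and data cell, one list per table row.
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

print(len(by_tag_name), len(by_class))
print(rows)
```

Because find returns None when nothing matches, checking the result before calling find_all on it is a reasonable guard when looping over many player pages.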

0

It is hard to answer when there is almost no context in your question (what exactly isn't working, and what exactly are you trying to scrape?), but take a look at these lines:

first_columns.append(row.findAll('td'))[0]
third_columns.append(row.findAll('td'))[2]

Since list.append returns None, indexing its return value with [0] or [2] raises a TypeError.

I believe they are meant to be:

first_columns.append(row.findAll('td')[0])
third_columns.append(row.findAll('td')[2])
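
The behaviour of append can be confirmed without BeautifulSoup. In this small sketch a plain Python list (the name cells is an assumption) stands in for what row.findAll('td') returns, and the middle value is a placeholder.

```python
# A plain list stands in for the <td> cells of one table row.
cells = ["2018-19", "placeholder", "$2,750,000"]

first_columns = []

# list.append mutates the list in place and returns None.
result = first_columns.append(cells)

# Indexing that None return value is what raises the exception.
error = None
try:
    first_columns.append(cells)[0]
except TypeError as exc:
    error = exc

# Corrected order: index the cells first, then append only the wanted one.
fixed_first, fixed_third = [], []
fixed_first.append(cells[0])
fixed_third.append(cells[2])

print(result, type(error).__name__)
print(fixed_first, fixed_third)
```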

3 Comments

I continue to see this error. (u'2018-19', u'$2,750,000') Traceback (most recent call last): File "/Users/bd/Desktop/untitled-14.py", line 15, in <module> table = soup.findAll('table')[0] IndexError: list index out of range
@denn9268 Update your question with the exact code you are executing and the exact error you are getting. This bit of code works for me: from urllib import urlopen; from bs4 import BeautifulSoup; url = 'https://www.capfriendly.com/players/patrik-elias'; soup = BeautifulSoup(urlopen(url)); table = soup.findAll('table')[0]
That works for me as well with one name. I get the error when I try to pull from multiple names. I have updated the code and added a sample list of names.

0

Your parentheses and brackets look misplaced.

Does this do what you want?

first_columns = []
third_columns = []
for row in rows[1:]:
    first_columns.append(row.findAll('td')[0])
    third_columns.append(row.findAll('td')[2])

Here I index the list of td elements first and append only the [0] and [2] cells, instead of appending every td element to each list and then indexing the return value of append, which was discarded anyway.


0

You wrote: "It seems to work fine for one player at a time, but when I change to my text file of a list of players, that is where I am having trouble."

I think the issue is how you are parsing the file. If you have one player per line, just iterate over the file object, stripping any whitespace:

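from bs4 import BeautifulSoup
from urllib import urlopen
import csv

with open("/Users/bd/Desktop/testfolder/Player_Code_Test.txt") as f:
    # One player per line; strip the newline and any stray whitespace.
    for player in map(str.strip, f):
        # The {} placeholder is where format() inserts the player name.
        html = urlopen("https://www.capfriendly.com/players/{}".format(player))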
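
The line-by-line idea can be tried without touching the network. This is a sketch under assumptions: the actual line endings in the text file are not known, player_urls is a hypothetical helper, and example-player is a made-up name used only for illustration. It shows why splitting on "\r" alone can leave the whole file as a single string, which then produces one malformed URL.

```python
BASE_URL = "https://www.capfriendly.com/players/"


def player_urls(text):
    """Return one URL per non-empty line of text (hypothetical helper)."""
    urls = []
    # splitlines() handles \n, \r\n and \r line endings.
    for line in text.splitlines():
        slug = line.strip()
        if slug:  # skip blank lines
            urls.append(BASE_URL + slug)
    return urls


# Assumed file contents: newline-terminated names plus a trailing blank line.
sample = "patrik-elias\nexample-player\n\n"

# Splitting on "\r" finds no separator here, so everything stays in one string.
naive = sample.split("\r")

print(len(naive))
print(player_urls(sample))
```

Skipping empty lines also avoids requesting the bare base URL, which has no player table to index into.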
