I am trying to scrape an interactive graph from a website and this script works and gathers the text from the chart, but it is quite slow. In order for the text to appear the cursor has to hover over the graph in certain positions. Does anyone have any suggestions on how to make it more efficient?
Right now with each offset movement it runs through all the previous movements before it progresses to the next one. Does anyone know how to bypass that? (for example it goes 0 to 5 then instead of progressing from 5 to 10, it goes back again to 0 then 5 then 10)
# set the pace at which the cursor will move and the limit to which it will move to
# (which should be the current date or the x-axis limit of the image)
# set the limit to be current date
limit = full_length
pace = 5
count = 0
while count <= limit:
value = driver.find_element_by_class_name('highcharts-tooltip').text
date_price = value.split("\n")
date = date_price[0]
price = date_price[1].split(": ")
price = price[1]
# take values at current point and add to dictionary
dp = {'date': date,
'price': price }
archived_prices.append(dp)
# move to the next date
action.move_by_offset(pace, 0).perform()
# set up a counter to figure out when we will reach the limit
count = count + pace
print(count)
NEW CODE:
# adding new column with complete url for api call
full_urls = []
for value in dataframe['urlKeys']:
full = 'https://stockx.com/api/products/'+value+'?includes=market,360¤cy=EUR&country=IT'
full_urls.append(full)
dataframe['urlFull'] = full_urls
def get_shoe_info(url_list):
for url in url_list:
headers = {
"accept-encoding": "gzip, deflate, br",
"sec-fetch-mode": "cors",
"sec=fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
"x-requested-with": "XMLHttpRequest"
}
response = requests.get(url, headers=headers)
response.raise_for_status()
product = response.json()["Product"]
for p in product:
id_num = p["id"]
brand = p["brand"]
colorway = p["colorway"]
release_date = p["releaseDate"]
retail_price = p["retailPrice"]
shoe_name = p["shoe"]
volatility = p["market"]["volatility"]
change_percentage = p["market"]["changePercentage"]
gender = p["gender"]
print(f'ID: {id_num}\n'
f'Brand: {brand}\n'
f'Colorway: {colorway}\n'
f'Release Date: {release_date}\n'
f'Retail Price: {retail_price}\n'
f'Shoe Name: {shoe_name}\n'
f'Volatility: {volatility}\n'
f'Change Percentage: {change_percentage}\n'
f'Gender: {gender}')
#print(p)
return 0
if __name__ == "__main__":
import sys
sys.exit(get_shoe_info(full_urls))
I still have some difficulty with understanding how to pass in variables, so I am not sure I did this correctly. The first part is me taking the url keys for all the shoes and creating a url list to iterate through. Then I am trying to pass in the list to the get_shoe_info function. My error is popping up as "TypeError: string indices must be integers" and I investigated and saw that when I tried to print(p) to look at the path I was only getting strings of the key portions. I'm not sure what to do to get the values that I want.
I have added everything to (my github) in case you need to see anything else.