I'm trying to scrape a web page into Excel, but I can't get the list of courses to align properly: everything ends up as one long string on a single row. I would like each course on its own row, under the proper category (see image for reference).

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import re
import csv

driver = webdriver.Chrome("drivers/chromedriver")

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")
driver.find_element_by_xpath("//input[@id='city']").send_keys("Moultonborough")
driver.find_element_by_xpath("//input[@id='name']").send_keys("Moultonborough Academy")
driver.find_element_by_xpath("//input[@value='Search']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='hsCode']"))).click()

x = driver.find_elements_by_xpath("(//tr[th[@class='header']])[1]/th")
head = [re.sub(r'\s+', ' ', el.text) for el in x]  # collapse runs of whitespace in the header text
# Text of every cell in the first approved-course table, as one flat list.
courses = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1 td")))]

with open('out.csv', 'w', newline='', encoding='utf-8') as csvfile:  # newline='' avoids blank lines between rows on Windows
    writer = csv.writer(csvfile)
    writer.writerow(head)
    writer.writerow(courses)

Currently it looks like this: [image]

But I would like it to look like this: [image]

  • What are you using as separator character? Commented Aug 3, 2020 at 20:05
  • I guess I don't know enough about separator characters, so I'll look into that. I'd appreciate any pointers or direction if possible. Thank you. Commented Aug 3, 2020 at 20:34

1 Answer

I took the liberty of using a different library, pandas. If you really need CSV output, it should be easy enough to tweak the pandas part to serve your needs: you can use to_csv() instead of to_excel().

from io import StringIO

import pandas as pd  # Writing .xlsx with engine='openpyxl' also requires the openpyxl package.

main_table = driver.find_element_by_id("NcaaCrs_ApprovedCategory_All")  # Parent table.
titles = iter([title.text for title in main_table.find_elements_by_class_name("hs_tableHeader")][1:])  # Get titles above the tables, ignoring the general title in the first entry.

tables = main_table.find_elements_by_css_selector("table[id*='approvedCourseTable']")  # Find all tables by partial id; this is a CSS attribute selector, so it needs the CSS lookup rather than the tag-name one.
tables_html = "\n".join([table.get_attribute('outerHTML') for table in tables])  # Combine html of all tables.
tables_df = pd.read_html(StringIO(tables_html))  # Creates a list of table DataFrames.

with pd.ExcelWriter("temp.xlsx", engine='openpyxl') as writer:  # The context manager saves the Excel file on exit.
    for table_df in tables_df:  # Loop through DataFrames
        title = next(titles).replace("/", "-")  # Select next title and clean it up for excel sheet name.
        table_df.to_excel(writer, sheet_name=title, index=False)  # Send DataFrame to writer on individual sheet.
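
If CSV output is preferred, the Excel loop could be swapped for something like the sketch below. It assumes `titles` and `tables_df` have the same shape as in the code above (one title per table DataFrame). The hard-coded stand-ins, including the category names, course names and the "Category" column name, are hypothetical and only there so the snippet runs without a browser.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the scraped `titles` and `tables_df`.
titles = ["English", "Mathematics"]
tables_df = [
    pd.DataFrame({"Course Title": ["ENGLISH 9"], "Max Units": [1.0]}),
    pd.DataFrame({"Course Title": ["ALGEBRA 1", "GEOMETRY"], "Max Units": [1.0, 1.0]}),
]

frames = []
for title, table_df in zip(titles, tables_df):
    labelled = table_df.copy()
    labelled.insert(0, "Category", title)  # tag every course row with its category
    frames.append(labelled)

# Stack all category tables into one DataFrame with a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)

buffer = io.StringIO()  # stands in for a file path such as "out.csv"
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing a file name instead of `buffer` to `to_csv()` would write the same content to disk. Putting the category in its own column keeps everything in a single flat file, since CSV has no equivalent of Excel sheets.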

Hope it achieves what it was supposed to.
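
As for why the question's csv-based code produces a single long row: it passes the flat list of every cell to one writerow() call, and writerow() always writes exactly one line. A minimal sketch of one way around that is shown below. It assumes every table row has as many cells as there are headers, which may not hold for rows with merged cells. `chunk_rows` is a hypothetical helper, and the sample `head` and `courses` values are made up for illustration.

```python
import csv
import io


def chunk_rows(cells, width):
    """Split a flat list of cell texts into rows of `width` cells each."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Hypothetical stand-ins for the scraped `head` and `courses` lists.
head = ["Course Title", "Title Notes", "Max Units"]
courses = ["ENGLISH 9", "", "1.00", "ENGLISH 10", "", "1.00"]

rows = chunk_rows(courses, len(head))

buffer = io.StringIO()  # stands in for open('out.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(buffer)
writer.writerow(head)   # one header line
writer.writerows(rows)  # one CSV line per course
csv_text = buffer.getvalue()
```

Selecting the `tr` elements and reading the cells of each row separately would avoid the fixed-width assumption, at the cost of more Selenium calls.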
