I want to write a really simple script in Python that gets the contents of the title tags of a specified web page and then puts them into a MySQL database.

I have very (and I mean very) little experience with Python, but this needs to be done for my project. How can I do this in the simplest way possible?

I hope you are able to understand what I'm trying to ask.

2 Answers

  1. Study urllib2 to see how to download the webpage.
  2. Study BeautifulSoup to parse the HTML and pull out the title.
  3. Study the Python Database API Specification to insert rows into the MySQL database.

Here is some example code to get you started:

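import urllib2
import BeautifulSoup
import MySQLdb

# Download the page and pull out the <title> element.
f = urllib2.urlopen('http://www.python.org/')
soup = BeautifulSoup.BeautifulSoup(f.read())
title = soup.find('title')
print(title.string)

# Replace HOST, USER, PASS and MYDB with your own connection details.
connection = MySQLdb.connect(
    host='HOST', user='USER',
    passwd='PASS', db='MYDB')
cursor = connection.cursor()

sql = '''CREATE TABLE IF NOT EXISTS foo (
           fooid int(11) NOT NULL AUTO_INCREMENT,
           title varchar(100) NOT NULL,
           PRIMARY KEY (fooid)
       )'''
cursor.execute(sql)

# Pass the title as a parameter so the driver escapes it.
sql = 'INSERT INTO foo (title) VALUES (%s)'
args = [title.string]
cursor.execute(sql, args)
# MySQLdb does not autocommit by default; commit so the row is saved
# (needed for transactional tables such as InnoDB).
connection.commit()
cursor.close()
connection.close()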
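
The code above is Python 2 only: urllib2 was split into urllib.request and urllib.error in Python 3. As a rough sketch of the download-and-parse steps on Python 3 using only the standard library, something like the following might work. Note the assumptions: html.parser is swapped in for BeautifulSoup, the names TitleParser, extract_title and fetch_title are made up for illustration, and the page is assumed to be UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TITLE> also arrives as "title".
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped title text, or None if the page has no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


def fetch_title(url):
    """Download a page and return its title."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

Calling fetch_title('http://www.python.org/') would then give a string that can be passed as the INSERT parameter in the same way as title.string.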
1 Comment

@unbuntu's example code will get you started. urllib2 is part of the Python standard library, but you need to install the other two packages (BeautifulSoup and MySQLdb) from pypi.python.org/pypi.

Use urllib2 to open the web page, then parse the returned text with a regular expression to retrieve the title.

1 Comment

No. Never try to use regular expressions on HTML; it is not a regular language. Also, since the poster said they have no Python experience, this would be unhelpful even if it were right.
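
To make the objection concrete, here is a small sketch of how an obvious title regex misses markup that browsers accept. The pattern and the sample strings are made up for illustration; the answer above does not give a pattern.

```python
import re

# A naive pattern of the kind one might write first (an assumption, not from the answer).
NAIVE_TITLE = re.compile(r"<title>(.*)</title>")

samples = [
    "<title>Plain</title>",                            # matched
    "<TITLE>Upper-case tag</TITLE>",                   # missed: pattern is case-sensitive
    '<title lang="en">Tag with an attribute</title>',  # missed: attribute not allowed for
    "<title>Split\nacross lines</title>",              # missed: "." does not match a newline
]

# True where the naive pattern finds a title, False where it does not.
matched = [bool(NAIVE_TITLE.search(s)) for s in samples]
print(matched)  # [True, False, False, False]
```

Flags such as re.IGNORECASE and re.DOTALL patch some of these cases, but the pattern would still match a title that sits inside an HTML comment, which is why a parser is the safer tool.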
