I'm just trying to get some data from a webpage like this one:

[ . . . ]

<p class="special-large">Lorem Ipsum 01</p>
<p class="special-large">Lorem Ipsum 02</p>
<p class="special-large">Lorem Ipsum 03</p>
<p class="special-large">Lorem Ipsum 04</p>
<p class="special-large">Lorem Ipsum 05</p>

[ . . . ]

I would like to end up with a Python list like this one:

myArrayWebPage = ["Lorem Ipsum 01","Lorem Ipsum 02","Lorem Ipsum 03","Lorem Ipsum 04","Lorem Ipsum 05"]

This is my Python script:

import urllib.request

urlAddress = "http:// ... /" # my url address
getPage = urllib.request.urlopen(urlAddress)
outputPage = getPage.read()
print(outputPage)

How can I build that list from "outputPage"?

1 Answer

This appears to do what you want:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> html = '''<p class="special-large">Lorem Ipsum 01</p>
<p class="special-large">Lorem Ipsum 02</p>
<p class="special-large">Lorem Ipsum 03</p>
<p class="special-large">Lorem Ipsum 04</p>
<p class="special-large">Lorem Ipsum 05</p>'''
>>> import re
>>> re.findall('<p class="special-large">([^<]+)</p>', html)
['Lorem Ipsum 01', 'Lorem Ipsum 02', 'Lorem Ipsum 03', 'Lorem Ipsum 04', 'Lorem Ipsum 05']
>>> 
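
To apply the same pattern to the page from the question, keep in mind that getPage.read() returns bytes, so the content has to be decoded to a str before a str pattern can match it. Here is a minimal sketch, assuming the page is UTF-8 encoded (the real encoding may differ). The names extract_special_large and sample_page are hypothetical and only used for illustration; sample_page stands in for outputPage so the example runs without a network connection.

```python
import re

# Same pattern as in the session above, compiled once for reuse.
PATTERN = re.compile(r'<p class="special-large">([^<]+)</p>')


def extract_special_large(page_bytes, encoding="utf-8"):
    """Decode the raw response bytes and return the text of each matching <p>."""
    html = page_bytes.decode(encoding)  # assumption: the page is UTF-8
    return PATTERN.findall(html)


# Stand-in for outputPage = getPage.read() from the question (no network needed).
sample_page = (
    b'<p class="special-large">Lorem Ipsum 01</p>\n'
    b'<p class="special-large">Lorem Ipsum 02</p>\n'
)

myArrayWebPage = extract_special_large(sample_page)
print(myArrayWebPage)  # ['Lorem Ipsum 01', 'Lorem Ipsum 02']
```

In the original script, the call would be extract_special_large(outputPage).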

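Please note that regular expressions are generally a poor fit for parsing HTML: the pattern above stops matching as soon as the markup changes slightly (an extra attribute, a nested tag, single quotes instead of double quotes). An HTML parsing library such as Beautiful Soup is the more robust choice.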
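
As an illustration of the parser-based approach, here is a minimal sketch that uses the standard library's html.parser module in place of Beautiful Soup, so it runs without installing anything. The class name SpecialLargeParser and the sample markup are made up for this example. Unlike the regular expression, it still matches when the paragraph has more than one class or contains nested tags.

```python
from html.parser import HTMLParser


class SpecialLargeParser(HTMLParser):
    """Collect the text of every <p> whose class list contains 'special-large'."""

    def __init__(self):
        super().__init__()
        self.results = []    # finished paragraph texts
        self._buffer = None  # text pieces of the current matching <p>, or None

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, hence the "or ''".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "special-large" in classes:
            self._buffer = []

    def handle_data(self, data):
        # Only keep text while inside a matching paragraph.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.results.append("".join(self._buffer).strip())
            self._buffer = None


html = '''<p class="special-large">Lorem Ipsum 01</p>
<p class="other special-large">Lorem <b>Ipsum</b> 02</p>
<p>ignored</p>'''

parser = SpecialLargeParser()
parser.feed(html)
parser.close()
print(parser.results)  # ['Lorem Ipsum 01', 'Lorem Ipsum 02']
```

If Beautiful Soup is installed, the same result should be obtainable more compactly with something like [p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p", class_="special-large")].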

3 Comments

Thank you! Can I ask what you mean by "regular expressions"?
You can click on the term now, and a Wikipedia article will show up. Next time, try searching Google for a term you are not familiar with.
@JoeHunter Please take this opportunity to read the wildly entertaining answers on why regexes are insufficient to parse HTML: stackoverflow.com/questions/1732348/…
