HTTP request and regex in Python for HTML parsing

Question

When I execute the script, the result is empty. Why? The script connects to a site and parses the HTML <a> tags:

#!/usr/bin/python3

import re
import socket
import urllib, urllib.error
import http.client
import sys

conn = http.client.HTTPConnection('www.guardaserie.online');
headers = { "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Content-type": "application/x-www-form-urlencoded; charset=UTF-8" }
params = urllib.parse.urlencode({"s":"hannibal"})
conn.request('GET', '/',params, headers)
response = conn.getresponse();

site = re.search('<a href="(.*)" class="box-link-serie">', str(response.read()), re.M|re.I)
if(site):
  print(site.group())

Possible duplicate of "RegEx match open tags except XHTML self-contained tags"
– Lex Scarisbrick, Commented Aug 4, 2016 at 18:14

l'L'l · Accepted Answer · 2016-08-04 18:13:21Z

It's likely that the pattern you are searching for does not occur in the response body as read, or that the regex trips up somewhere while scanning the HTML. A looser pattern without the leading <a is worth trying:

re.search( 'href="(.*)" class="box-link-serie"', str(response.read()), re.M | re.I )

Using a more generic pattern, or a real HTML parser instead of a regex, will likely get you the result you want.
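As a rough illustration of the "another parser method" suggestion, here is a minimal sketch using the standard library's html.parser. The class name SerieLinkParser, the helper extract_links, and the sample markup (including the hannibal URL) are hypothetical; only the box-link-serie class and the ray-donovan-a URL come from this thread, and the real page may be structured differently.

```python
from html.parser import HTMLParser


class SerieLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, wanted_class="box-link-serie"):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.wanted_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_links(html_text, wanted_class="box-link-serie"):
    """Return every matching href found in html_text, in document order."""
    parser = SerieLinkParser(wanted_class)
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for the decoded response body.
SAMPLE_HTML = """
<div>
  <a href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie">Ray Donovan</a>
  <a href="/about/" class="menu">About</a>
  <a class="box-link-serie big" href="http://www.guardaserie.online/hannibal/">Hannibal</a>
</div>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

With a live response, the text passed in would be something like response.read().decode("utf-8", errors="replace") rather than str(response.read()). Unlike the regex, this approach does not depend on the order of the href and class attributes.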

8 Comments

l'L'l Over a year ago

If you try the pattern above, it should return a result. I would recommend trying these imports: import re, httplib, socket, urllib, sys, and changing params = urllib.urlencode, as well as conn = httplib.HTTPConnection ...

faserx Over a year ago

The pattern returns the entire HTML page.

faserx Over a year ago

The result is always that.

l'L'l Over a year ago

I get href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie" when using print(site.group()) ... Python code here: gist.github.com/anonymous/43026f7262b2fddfb7643169f0d558b2

l'L'l Over a year ago

See comment #4.
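
On the report that the pattern returns the entire page: one plausible cause, stated as an assumption since the real page is not reproduced here, is the combination of str(response.read()) and the greedy (.*). In Python 3, str() on a bytes object yields its repr, in which each newline becomes the two characters backslash and n, so the page collapses into one long line and (.*) can run from the first href to the last matching class attribute. The sketch below uses a made-up response body to show the effect, alongside a tighter pattern applied to decoded text.

```python
import re

# Hypothetical response body standing in for response.read(); not the real page.
raw = (
    b'<html>\n'
    b'<a href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie">A</a>\n'
    b'<p>filler</p>\n'
    b'<a href="http://example.invalid/other/" class="box-link-serie">B</a>\n'
    b'</html>\n'
)

GREEDY = r'href="(.*)" class="box-link-serie"'
TIGHT = r'href="([^"]*)" class="box-link-serie"'

# str() on bytes gives the repr ("b'...'"): newlines turn into a literal
# backslash followed by n, so the whole body is a single line for the regex.
as_repr = str(raw)
greedy_match = re.search(GREEDY, as_repr, re.I)

# Decoding gives real text, and [^"]* cannot run past the closing quote.
text = raw.decode("utf-8", errors="replace")
tight_links = re.findall(TIGHT, text, re.I)

if __name__ == "__main__":
    print(greedy_match.group(1))  # spans several tags, not one URL
    print(tight_links)
```

The tighter pattern still assumes href comes immediately before class, so it remains more fragile than a proper parser.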
