Get HTML contents from URL using Python sockets

Question

Is there a way to retrieve a webpage's contents from a full URL (not just a hostname) using sockets in Python? socket.connect() only accepts a (host, port) address, so I can get the contents of www.python.org but not www.python.org/about.
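For reference, here is a minimal sketch of splitting a URL into the part a raw socket connects to and the part that goes into the request, using the standard library's urllib.parse.urlsplit. The helper name split_url is hypothetical. It assumes an http:// URL: without the scheme, urlsplit treats www.python.org/about as a path and leaves the hostname empty.

```python
from urllib.parse import urlsplit


def split_url(link):
    """Split an http:// URL into (host, port, path) for use with a raw socket."""
    parts = urlsplit(link)
    host = parts.hostname          # the part socket.connect() needs
    port = parts.port or 80        # fall back to the default HTTP port
    path = parts.path or "/"       # the part that goes into the GET request line
    if parts.query:
        path += "?" + parts.query  # a query string travels with the path
    return host, port, path


print(split_url("http://www.python.org/about/"))  # ('www.python.org', 80, '/about/')
```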

Thanks!

Show the community the code you use to retrieve www.python.org with the socket API. — georgexsh, Jan 21, 2018 at 20:23
@MattiLyra, I've just been learning socket programming in Python and couldn't work out why I could only get contents from some websites and not others. — Buttlet, Jan 22, 2018 at 21:52

Buttlet · Accepted Answer · 2018-01-22 22:00:10Z


OK, found the answer. I needed to include the path in the GET request sent to the server.

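In www.python.org/about/, www.python.org is the hostname and /about/ is the path. You connect to the hostname, and the path goes into the request line, so the request to send starts with "GET /about/ HTTP/1.0", followed by a Host header and a blank line. Something like: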

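import socket
from urllib import parse  # for separating the hostname from the path

link = "http://www.python.org/about/"  # include the scheme so urlparse finds the hostname
url = parse.urlparse(link)
# The request line carries the path; the Host header names the site
msg = "GET " + (url.path or "/") + " HTTP/1.0\r\nHost: " + url.hostname + "\r\n\r\n"
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((url.hostname, 80))  # connect to the hostname only, never the full URL
    s.sendall(msg.encode("ascii"))  # Python 3 sockets send bytes, not str
    response = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:  # an HTTP/1.0 server closes the connection when it is done
            break
        response += chunk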
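A slightly fuller sketch, assuming plain HTTP (no TLS): it keeps reading until the server closes the connection, since a single recv(4096) may return only part of the page, and then splits the status line, headers and body at the first blank line. The helper name fetch is hypothetical. Note that www.python.org serves its pages over HTTPS, so a plain-HTTP request like this will typically get a redirect (a 3xx status with a Location header) rather than the page itself.

```python
import socket
from urllib.parse import urlsplit


def fetch(link):
    """Fetch an http:// URL over a raw TCP socket.

    Returns (status_line, headers, body); header names are lower-cased.
    """
    parts = urlsplit(link)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when it is not the default
    host_header = host if port == 80 else host + ":" + str(port)
    request = "GET " + path + " HTTP/1.0\r\nHost: " + host_header + "\r\n\r\n"

    chunks = []
    with socket.create_connection((host, port)) as s:
        s.sendall(request.encode("ascii"))
        while True:
            chunk = s.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(chunk)
    raw = b"".join(chunks)

    # The header section ends at the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```

Called as fetch("http://www.python.org/about/"), the returned status line and headers.get("location") show whether the server answered with the page or with a redirect.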
