Is there a way to retrieve the webpage contents from a URL (not hostname) using sockets in Python? socket.connect() only works with a host name. I can get the contents from www.python.org but not www.python.org/about.
Thanks!
OK, found the answer. I was supposed to put the path in the GET request sent to the server; the socket itself only ever connects to the hostname.
In www.python.org/about/, www.python.org is the hostname and /about/ is the path. So the string to send would be "GET /about/ HTTP...". Something like:
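import socket
from urllib import parse  # for separating hostname and path

link = "http://www.python.org/about/"  # example URL; urlparse needs the scheme to find the hostname
url = parse.urlparse(link)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((url.hostname, 80))  # connect to the hostname only
# the path goes in the request line; many servers also expect a Host header
msg = "GET " + (url.path or "/") + " HTTP/1.0\r\nHost: " + url.hostname + "\r\n\r\n"
s.sendall(msg.encode("ascii"))  # sockets send bytes, not str
data = s.recv(4096)  # first chunk of the response (up to 4096 bytes)
s.close()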
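
A single recv() call usually returns only part of the response, so here is a slightly fuller sketch of the same idea that keeps reading until the server closes the connection. It assumes plain HTTP (no TLS) and a URL that includes the scheme. The helper names build_request and fetch are made up for illustration, not part of any library.

```python
import socket
from urllib import parse


def build_request(link):
    """Split a URL into (host, port, request bytes) for a plain-HTTP GET."""
    url = parse.urlparse(link)
    host = url.hostname
    port = url.port or 80  # assumption: plain HTTP, so default to port 80
    path = url.path or "/"  # an empty path must be sent as "/"
    if url.query:
        path += "?" + url.query
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    return host, port, request.encode("ascii")


def fetch(link, timeout=10):
    """Send the request and read until the server closes the connection."""
    host, port, request = build_request(link)
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(request)
        while True:
            chunk = s.recv(4096)
            if not chunk:  # empty bytes means the peer closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Example use (needs network access):
# print(fetch("http://www.python.org/about/")[:200])
```

The returned bytes include the status line and headers as well as the body. A site that only serves HTTPS will most likely answer with a redirect here, since this sketch does not speak TLS.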