Is there a way to retrieve the webpage contents from a URL (not hostname) using sockets in Python? socket.connect() only works with a host name. I can get the contents from www.python.org but not www.python.org/about.

Thanks!

  • Is there some reason you absolutely want to use sockets? Commented Jan 21, 2018 at 18:46
  • It's better to use the requests module for getting HTML. Commented Jan 21, 2018 at 18:55
  • Please show the code you use to retrieve www.python.org with the socket API. Commented Jan 21, 2018 at 20:23
  • @MattiLyra, I've just been learning socket programming in Python, and couldn't work out why I could get contents from some websites and not others. Commented Jan 22, 2018 at 21:52

1 Answer

OK, found the answer. I was supposed to put the path in the GET request sent to the server, not in the address passed to socket.connect().

In www.python.org/about/, www.python.org is the hostname and /about/ is the path. The socket connects to the hostname only, and the path goes in the request line, so the string to be sent would be "GET /about/ HTTP...". Something like:

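import socket
from urllib import parse  # for separating hostname and path

link = "http://www.python.org/about/"  # urlparse needs the scheme to find the hostname
url = parse.urlparse(link)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((url.netloc, 80))  # connect to the hostname only
# The path goes in the request line; the Host header names the site
msg = "GET " + url.path + " HTTP/1.0\r\nHost: " + url.netloc + "\r\n\r\n"
s.sendall(msg.encode("ascii"))  # sockets send bytes, not str
print(s.recv(4096))  # first chunk of the response (headers, maybe some body)
s.close()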
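
A single recv(4096) call returns at most 4096 bytes, so longer pages come back truncated. Below is a slightly fuller sketch of the same idea that keeps reading until the server closes the connection. The function names build_request and fetch are my own, not part of any library, and it assumes a plain http:// URL on port 80 unless the URL says otherwise. It does not handle HTTPS, redirects, or chunked responses.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of url.

    Hypothetical helper for illustration; assumes url starts with http://.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    # An empty path means the site root, which is requested as "/"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when it is not the default
    host_header = host if port == 80 else f"{host}:{port}"
    request = f"GET {path} HTTP/1.0\r\nHost: {host_header}\r\n\r\n"
    return host, port, request.encode("ascii")


def fetch(url):
    """Send the request and read until the server closes the connection."""
    host, port, request = build_request(url)
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch("http://www.python.org/about/") returns the raw response as bytes, status line and headers included. A site that only serves HTTPS will typically answer a request like this with a redirect rather than the page itself; in that case the socket would need to be wrapped with the ssl module, or a higher-level library such as requests used instead.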