I am not able to retrieve web data using socket programming in Python:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org',80))

cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.1\r\n\r\n'.encode()
mysock.send(cmd)
while True:
    data = mysock.recv(100)
    if(len(data) < 1):
        break
    print(data.decode(),end='')
mysock.close()

Error

HTTP/1.1 400 Bad Request
Date: Sat, 02 Nov 2019 08:41:58 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 308
Content-Type: text/html; charset=iso-8859-1
Via: HTTP/1.1 forward.http.proxy:3128
Connection: close

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at do1.dr-chuck.com Port 80</address>
</body></html> 

1 Answer

This is not a valid HTTP/1.1 request. It is missing the mandatory Host header, and a request sent directly to the server should give only the path in the request line, not the absolute URL:

  cmd = 'GET /intro-short.txt HTTP/1.1\r\nHost: data.pr4e.org\r\n\r\n'.encode()

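For more information, please read the HTTP standard rather than guessing what HTTP looks like. HTTP is likely far more complex than you imagine. For example, even with the proper request this program will hang after it has received the response: HTTP/1.1 connections are persistent by default, so the server keeps the socket open and recv() keeps blocking until the server eventually drops the idle connection.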
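
As a minimal sketch of a request that avoids both the 400 response and the hang, the code below sends the Host header plus `Connection: close`, which asks the server to close the socket once the response is complete so that the `recv` loop can end. The helper names `build_request` and `fetch`, the 10-second timeout, and the 4096-byte buffer are illustrative choices and not part of the original question. It still does not parse the response, so treat it as a starting point rather than a full HTTP client.

```python
import socket


def build_request(host, path):
    """Return a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line with the path only
        f"Host: {host}",          # mandatory in HTTP/1.1
        "Connection: close",      # ask the server to close after responding
        "",                       # the two empty strings produce the
        "",                       # terminating blank line (\r\n\r\n)
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80):
    """Send the request and return the raw response bytes (hypothetical helper)."""
    # The timeout is an assumption; it keeps a stalled connection from blocking forever.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))  # sendall retries partial sends
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty bytes means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    response = fetch("data.pr4e.org", "/intro-short.txt")
    # iso-8859-1 maps every byte to a character, so decoding cannot fail here.
    print(response.decode("iso-8859-1"))
```

Using `sendall` instead of `send` matters for larger requests, because `send` may transmit only part of the buffer. For anything beyond experimentation, a library such as `urllib.request` from the standard library handles these details for you.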
