Tags: python, sockets, python-requests, network-programming, http-1.1

Python Socket GET request from URL not understood with correct format?


I'm trying to set up an HTTP request for an assignment that connects to a web server and counts how many times a given word occurs in the page. I'm working on the first half of this, and every time I try to send the request for the header information with the last-modified date, it gives back a 400 Bad Request error.

import socket,sys 

client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('162.246.156.195', 80)
client_socket.connect(server_address)

request = ''' GET / tests/a.html HTTP/1.1
HOST: 162.246.156.195
IF-MODIFIED-SINCE: <>
Conncection: keep-alive

''';
client_socket.send(request.encode())
mod_request = client_socket.recv(2048).decode()
print(mod_request)
client_socket.close()

This is what I get back:

HTTP/1.1 400 Bad Request
Date: Sat, 08 Feb 2020 20:53:36 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Length: 328
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.29 (Ubuntu) Server at 2605:fd00:4:1000:f816:3eff:fe1e:9b1a Port 80</address>
</body></html>

I'm currently stuck here, and I can only use the socket and sys modules, so third-party libraries can't help here. If anyone can point out where I went wrong, it would be greatly appreciated, as would any tips for counting the words. Thanks in advance!


Solution

  • ... not understood with correct format?

    This isn't a correct format, i.e. your initial assumption when asking the question is wrong. And the server is correct in treating that as a bad request.

    request = ''' GET / tests/a.html HTTP/1.1
    HOST: 162.246.156.195
    IF-MODIFIED-SINCE: <>
    Conncection: keep-alive
    
    '''
    

    First, your request starts with <space>GET ... instead of GET .... Second, there is a space in the path component, i.e. /<space>tests/a.html instead of /tests/a.html. Third, you use a bare newline (\n) instead of \r\n as the line delimiter. Fourth, the value given in the If-Modified-Since field (<>) is invalid: it must be an HTTP date in the same format as the Date header in the server's response. Finally, your Connection field has a typo in the field name (Conncection).
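
As a minimal sketch of a request that avoids the problems listed above: the helper name build_get_request and the date value are placeholders of mine, not part of the assignment, and I chose Connection: close only because it makes the response easier to read with plain sockets. Treat it as an illustration, not a complete HTTP client.

```python
def build_get_request(host, path, if_modified_since=None):
    """Build a minimal HTTP/1.1 GET request as bytes (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # no leading space, no space inside the path
        f"Host: {host}",
    ]
    if if_modified_since is not None:
        # Must be a valid HTTP date, e.g. "Sat, 08 Feb 2020 00:00:00 GMT"
        lines.append(f"If-Modified-Since: {if_modified_since}")
    lines.append("Connection: close")  # assumption: one request per connection
    # Every line ends with CRLF; an empty line terminates the header section
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder date, only to show the expected format
request = build_get_request(
    "162.246.156.195", "/tests/a.html", "Sat, 08 Feb 2020 00:00:00 GMT"
)
```

The resulting bytes can be passed to client_socket.sendall(request) in place of the triple-quoted string, which is what introduced the leading space and the bare \n line endings.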

    Please note that HTTP is far more complex than you think. It looks simple because it is text-based, but it has many pitfalls and details one needs to know. If you don't want to use an existing library to handle the complexity, please read the actual HTTP standard (which is long) and don't guess how HTTP works from a few examples you've seen.
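
One example of such a pitfall: a single recv(2048) call is not guaranteed to return the whole response. The sketch below reads until the server closes the connection and then splits the status line, headers, and body. It assumes the request was sent with Connection: close and that the body is not chunk-encoded. The function names recv_all and split_response are hypothetical, and the header parsing is deliberately simplified (repeated header fields overwrite each other).

```python
def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty bytes means the server closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw):
    """Split a complete raw HTTP response into (status_code, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 400 Bad Request
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

With these, status, headers, body = split_response(recv_all(client_socket)) gives you the status code to check (for example 200 or 304) and the body bytes, which you can decode and search for the word you need to count.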