
Web client in Python not working


I wrote a Python program that fetches data from web servers by connecting to port 80 and sending an HTTP GET request. Instead of the page content, however, it returns an HTML page saying 'The document has moved'.

Please help me with this.

Below are the code and a sample output:

import socket

def web_client():
    host = input("\nEnter the site from which you want to recieve data \n\n -> ").strip()
    port = 80
    s = socket.socket()
    ip = socket.gethostbyname(host)
    s.connect((ip, port))
    print("\nconnection successful with " + host + " on ip " + ip)
    # HTTP/1.1 requests are expected to include a Host header; this one sends none
    msg = "GET / HTTP/1.1\r\n\r\n"
    encoded_msg = bytes(msg, "utf-8")
    s.send(encoded_msg)
    # A single recv() returns at most 2048 bytes, so a longer response is cut off
    data = s.recv(2048)
    decoded_data = data.decode("utf-8")
    print("\n" + decoded_data)
    s.close()

web_client()
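Two details of the request above are worth fixing regardless of the redirect: HTTP/1.1 expects a Host header, and one recv(2048) call returns at most 2048 bytes, so a longer page is truncated. A minimal sketch of both fixes, assuming the server honours "Connection: close" so that the response ends when the socket closes; build_get_request, recv_all and fetch are hypothetical helper names, not part of any library.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    # HTTP/1.1 expects a Host header; "Connection: close" asks the server
    # to close the socket once the whole response has been sent.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Keep reading from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty bytes means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send one GET request and return the complete raw response."""
    # create_connection resolves the host name and connects in one call
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(build_get_request(host, path))
        return recv_all(s)
```

Calling print(fetch("www.google.com").decode("utf-8", errors="replace")) reads the whole response, but the server can still answer with a 302, because this sketch does not follow redirects.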

This is the output I get when I enter 'www.google.com':

Enter the site from which you want to recieve data 

 -> www.google.com

connection successful with www.google.com on ip 216.58.220.36

HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.in/?gfe_rd=cr&ei=k09IVbiMKq_v8wez3oGICw
Content-Length: 261
Date: Tue, 05 May 2015 05:05:23 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=1

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.co.in/?gfe_rd=cr&amp;ei=k09IVbiMKq_v8wez3oGICw">here</A>.
</BODY></HTML>
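The two pieces of this response a client needs in order to follow the redirect are the status code on the first line (302) and the Location header. A small sketch that splits the header block from the body and extracts them; parse_response is a hypothetical name, and the sketch ignores chunked transfer encoding and repeated headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # Headers and body are separated by the first blank line
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 302 Found"
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

For the response shown above, the status code is 302 and headers["location"] holds the www.google.co.in URL the client is being sent to.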

Solution

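  • Google.com redirects you to a regional domain (in the output above, www.google.co.in) with a 302 response. The socket module works below the HTTP layer, so it does not follow HTTP redirects; you would have to implement that yourself. The simplest solution is to install the Requests library: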

    pip install requests
    

    Making HTTP requests with this library is straightforward:

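    import requests

    site = input("\nEnter the site from which you want to receive data \n\n -> ").strip()
    if not site.startswith(("http://", "https://")):
        site = "http://" + site  # requests needs a full URL, including the scheme
    r = requests.get(site, allow_redirects=True)  # GET follows redirects by default anyway
    print(r.headers)
    print(r.text)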
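The remark that you would have to implement redirects yourself can be illustrated with the standard library alone. A minimal sketch, assuming plain http:// URLs, a server that closes the connection after each response, and no chunked transfer encoding; http_get, parse_response and get_following_redirects are hypothetical names. It cannot follow a redirect to an https:// URL (it raises ValueError), which is one more reason Requests is the simpler route.

```python
import socket
from urllib.parse import urljoin, urlsplit

# Status codes whose Location header points at the new URL
REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_get(url):
    """Send one plain-HTTP GET request and return the raw response bytes."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs, not " + url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=10) as s:
        s.sendall(request.encode("ascii"))
        chunks = []
        while chunk := s.recv(4096):  # empty bytes means the server closed
            chunks.append(chunk)
    return b"".join(chunks)


def parse_response(raw):
    """Split a raw response into (status_code, lowercase-keyed headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def get_following_redirects(url, fetch=http_get, max_redirects=5):
    """Fetch url, following up to max_redirects redirects; return (url, status, body)."""
    for _ in range(max_redirects + 1):
        status, headers, body = parse_response(fetch(url))
        if status in REDIRECT_CODES and "location" in headers:
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, headers["location"])
            continue
        return url, status, body
    raise RuntimeError("too many redirects")
```

The fetch parameter exists so the redirect loop can be exercised with canned responses instead of a live connection.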