
Accessing NOAA FTP server in Python


I am trying to access the NOAA FTP server to download multiple datasets. There are 365 files per year of daily data, so downloading them manually is a little cumbersome. I tried to use ftplib, but got:

gaierror: [Errno 11001] getaddrinfo failed

Below is my code snippet:

from ftplib import FTP
ftp = FTP("https://gml.noaa.gov/aftp/data/radiation/surfrad/Boulder_CO/2020/")
ftp.login()

# Get all files
files = ftp.nlst()

# Print out the files:
for file in files:
    print("Downloading..." + file)
    ftp.retrbinary("RETR" + file, open("..../NOAA/surfrad/Boulder_CO/2020/" + file, 'wb').write)
ftp.close()

Any help with this would be appreciated. I also tried to ping the server, and it only responds when I use:

ping gml.noaa.gov

When I try to ping the full FTP link:

ping https://gml.noaa.gov/aftp/data/radiation/surfrad/Boulder_CO/2020

it doesn't respond. I'm not sure why that is.

The full traceback is:

---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
<ipython-input-102-ea6ae149ac16> in <module>
      1 start = datetime.now()
----> 2 ftp = FTP("ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/Boulder_CO/2020")
      3 # ftp.login('your-username', 'your-passwor')
      4 ftp.login()
      5 

c:\users\smnge\anaconda3\envs\dlgpu\lib\ftplib.py in __init__(self, host, user, passwd, acct, timeout, source_address)
    115         self.timeout = timeout
    116         if host:
--> 117             self.connect(host)
    118             if user:
    119                 self.login(user, passwd, acct)

c:\users\smnge\anaconda3\envs\dlgpu\lib\ftplib.py in connect(self, host, port, timeout, source_address)
    150             self.source_address = source_address
    151         self.sock = socket.create_connection((self.host, self.port), self.timeout,
--> 152                                              source_address=self.source_address)
    153         self.af = self.sock.family
    154         self.file = self.sock.makefile('r', encoding=self.encoding)

c:\users\smnge\anaconda3\envs\dlgpu\lib\socket.py in create_connection(address, timeout, source_address)
    705     host, port = address
    706     err = None
--> 707     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    708         af, socktype, proto, canonname, sa = res
    709         sock = None

c:\users\smnge\anaconda3\envs\dlgpu\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags)
    750     # and socket type values to enum constants.
    751     addrlist = []
--> 752     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    753         af, socktype, proto, canonname, sa = res
    754         addrlist.append((_intenum_converter(af, AddressFamily),

gaierror: [Errno 11001] getaddrinfo failed

Solution

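  • The link you posted is a website (HTTPS) link, not an FTP link. Also, ftplib's FTP() expects only a host name, not a URL, so the whole string is handed to the DNS lookup (getaddrinfo), which fails. ping also takes a host name, which is why it only answers for the bare gml.noaa.gov.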

    However, this would work at the start of your script:

    from ftplib import FTP
    ftp = FTP("ftp.gml.noaa.gov")
    ftp.login()
    ftp.cwd('data/radiation/surfrad/Boulder_CO/2020')
    
    # Get all files
    files = ftp.nlst()
    
    # etc ...
    

    Note that the https:// is gone, ftp. has been added to the start of the domain, and the directory is changed with a separate ftp.cwd() call that leaves out the aftp/ root.

    The https:// was simply a mistake: it marks the URI as a website URL, to be retrieved over HTTPS.

    The ftp. at the start of the domain was just a guess, but it's a very common convention to host an FTP server at ftp.example.com, just as you used to see www.example.com for websites (and still do).

    Removing the aftp/ was another guess, made after the server didn't allow changing into that folder. Since the original URL was a website address, it made sense to assume the aftp folder was really just the root for anonymous FTP, which is what you are using when you log in without credentials.

    A working solution:

    from ftplib import FTP
    from pathlib import Path
    
    ftp = FTP("ftp.gml.noaa.gov")
    ftp.login()
    ftp.cwd('data/radiation/surfrad/Boulder_CO/2020')
    
    # Get all files
    files = ftp.nlst()
    
    # Download all the files to C:\Temp
    for file in files:
        print("Downloading..." + file)
        # "with" closes (and flushes) each local file once its transfer is done
        with open(Path(r'C:\Temp') / file, 'wb') as f:
            ftp.retrbinary(f'RETR {file}', f.write)
    # quit() sends QUIT before closing the connection, the polite way to end the session
    ftp.quit()
    

    Or, if you don't like the complication of pathlib:

        with open(rf'C:\Temp\{file}', 'wb') as f:
            ftp.retrbinary(f'RETR {file}', f.write)
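The host/path split described in the answer can also be done programmatically when you start from an ftp:// URL, such as the one in the traceback. This is a sketch: split_ftp_url is a hypothetical helper, not part of ftplib. It uses urllib.parse.urlsplit to separate the bare host name that FTP() expects from the directory to pass to cwd(), treating the path as relative to the login directory (which is how RFC 1738 describes FTP URL paths). It deliberately rejects https:// links, because mapping those to an FTP server (the ftp. prefix, dropping aftp/) was a guess, not a rule.

```python
from urllib.parse import unquote, urlsplit


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory) for ftplib (hypothetical helper).

    FTP() only wants the host name; the directory goes to cwd() after login.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp:// URL: {url!r}")
    # Drop the single separator slash so the path is relative to the login dir,
    # then decode percent-escapes such as %2F.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return parts.hostname, unquote(path)


# Example:
# host, path = split_ftp_url("ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/Boulder_CO/2020")
# host == "aftp.cmdl.noaa.gov", path == "data/radiation/surfrad/Boulder_CO/2020"
# ftp = FTP(host); ftp.login(); ftp.cwd(path)
```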
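Since the question involves 365 daily files per year and possibly several years, the download loop can be wrapped in a reusable function. This is a sketch: download_year is a hypothetical name, and it assumes other stations and years follow the same data/radiation/surfrad/<station>/<year> layout as the Boulder_CO/2020 directory, which is not verified here. It takes a logged-in ftplib.FTP object, or any object offering the same pwd/cwd/nlst/retrbinary methods.

```python
from ftplib import FTP
from pathlib import Path


def download_year(ftp, station, year, target_root):
    """Download every file in one station/year directory (hypothetical helper).

    Assumes the remote layout data/radiation/surfrad/<station>/<year>,
    relative to the anonymous-FTP login directory, as in the Boulder_CO example.
    Returns the list of downloaded file names.
    """
    local_dir = Path(target_root) / station / str(year)
    local_dir.mkdir(parents=True, exist_ok=True)

    start_dir = ftp.pwd()
    ftp.cwd(f"data/radiation/surfrad/{station}/{year}")
    downloaded = []
    try:
        for name in ftp.nlst():
            print("Downloading..." + name)
            # Close each local file as soon as its transfer finishes
            with open(local_dir / name, "wb") as f:
                ftp.retrbinary(f"RETR {name}", f.write)
            downloaded.append(name)
    finally:
        # Go back so the next relative cwd() starts from the same place
        ftp.cwd(start_dir)
    return downloaded


# Example usage (needs network access; the years are assumed to exist on the server):
# with FTP("ftp.gml.noaa.gov") as ftp:
#     ftp.login()
#     for year in (2019, 2020):
#         download_year(ftp, "Boulder_CO", year, r"C:\Temp\surfrad")
```

Returning to the starting directory after each call keeps the relative cwd() path valid when the function is called in a loop, and using FTP as a context manager sends QUIT automatically when the block ends.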