OK, first things first: there is little configuration available on this FTP server; it is part of an industrial HMI control panel. I have tested via FileZilla that I can download a file without PASV, and it works fine. Here is FileZilla's verbose log; you can see it takes about one second, because we're talking kilobytes here.
Status: Starting download of /USB_Pen_Memory/Log/Data Log Trend_Ops_Data_Log_230802.txt
Trace: CFtpChangeDirOpData::Send() in state 0
Trace: CFtpChangeDirOpData::Send() in state 2
Command: CWD /USB_Pen_Memory/Log
Trace: CFtpControlSocket::OnReceive()
Response: 250 CWD command successful.
Trace: CFtpChangeDirOpData::ParseResponse() in state 2
Trace: CFtpControlSocket::ResetOperation(0)
Trace: CControlSocket::ResetOperation(0)
Trace: CFtpChangeDirOpData::Reset(0) in state 2
Trace: CFtpFileTransferOpData::SubcommandResult(0) in state 1
Trace: CFtpControlSocket::SetAsyncRequestReply
Trace: CControlSocket::SendNextCommand()
Trace: CFtpFileTransferOpData::Send() in state 5
Trace: CFtpRawTransferOpData::Send() in state 0
Trace: CFtpRawTransferOpData::Send() in state 1
Command: TYPE A
Trace: CFtpControlSocket::OnReceive()
Response: 200 Type set to A.
Trace: CFtpRawTransferOpData::ParseResponse() in state 1
Trace: CControlSocket::SendNextCommand()
Trace: CFtpRawTransferOpData::Send() in state 2
Command: PASV
Trace: CFtpControlSocket::OnReceive()
Response: 502 Invalid command.
Trace: CFtpRawTransferOpData::ParseResponse() in state 2
Trace: CControlSocket::SendNextCommand()
Trace: CFtpRawTransferOpData::Send() in state 2
Command: PORT 192,168,0,41,204,135
Trace: CFtpControlSocket::OnReceive()
Response: 200 PORT command successful.
Trace: CFtpRawTransferOpData::ParseResponse() in state 2
Trace: CControlSocket::SendNextCommand()
Trace: CFtpRawTransferOpData::Send() in state 4
Command: RETR Data Log Trend_Ops_Data_Log_230802.txt
Trace: CTransferSocket::OnAccept(0)
Trace: CTransferSocket::OnConnect
Trace: CFtpControlSocket::OnReceive()
Response: 150 Opening ASCII mode data connection for Data Log Trend_Ops_Data_Log_230802.txt.
Trace: CFtpRawTransferOpData::ParseResponse() in state 4
Trace: CControlSocket::SendNextCommand()
Trace: CFtpRawTransferOpData::Send() in state 5
Trace: CFtpControlSocket::OnReceive()
Response: 226 Transfer complete.
Trace: CFtpRawTransferOpData::ParseResponse() in state 5
Trace: CControlSocket::SendNextCommand()
Trace: CFtpRawTransferOpData::Send() in state 8
Trace: CTransferSocket::TransferEnd(1)
Trace: CFtpControlSocket::TransferEnd()
Trace: CFtpControlSocket::ResetOperation(0)
Trace: CControlSocket::ResetOperation(0)
Trace: CFtpRawTransferOpData::Reset(0) in state 8
Trace: CFtpFileTransferOpData::SubcommandResult(0) in state 7
Trace: CFtpControlSocket::ResetOperation(0)
Trace: CControlSocket::ResetOperation(0)
Trace: CFtpFileTransferOpData::Reset(0) in state 7
Status: File transfer successful, transferred 48.20 KB in 1 second
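For reference, the six numbers in the PORT argument are the client address and port that the server is told to connect back to; decoding the one FileZilla sent (a quick sketch):

h = (192, 168, 0, 41, 204, 135)    # from 'PORT 192,168,0,41,204,135'
addr = '.'.join(map(str, h[:4]))   # '192.168.0.41', my PC on the LAN
port = h[4] * 256 + h[5]           # 204*256 + 135 = 52359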
Here is my code, running in AWS Lambda on Python 3.10:
from datetime import datetime as dt
from datetime import timedelta
from io import BytesIO
import ftplib
import logging
import os
import csv
import time
import boto3
def ftp_connection(HOST, USER, PASS):
    ftp = ftplib.FTP(source_address=())
    ftp.connect(HOST)
    ftp.login(USER, PASS)
    ftp.set_pasv(False)
    ftp.set_debuglevel(1)
    return ftp
def lambda_handler(event, context):
    yesterday = dt.strftime(dt.today() - timedelta(days=1), '%y%m%d')
    timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    config_s3_bucket_name = os.environ['S3_CONFIG_LOC']
    data_s3_bucket_name = os.environ['S3_DATA_LOC']
    config_file_key = 'ftp_config.csv'
    crawler_name = 'FTPLogs'
    logging.basicConfig(level=logging.ERROR, format='%(asctime)s %(levelname)s %(name)s %(message)s')
    logger = logging.getLogger(__name__)

    # Step 1: Connect to S3 and download the config file
    s3 = boto3.client('s3', config=boto3.session.Config(signature_version='s3v4'))
    config_file_obj = s3.get_object(Bucket=config_s3_bucket_name, Key=config_file_key)
    config_file_data = config_file_obj['Body'].read().decode('utf-8').splitlines()
    config_reader = csv.DictReader(config_file_data)

    # Step 2: Loop through each row in the config file and load the latest log to S3
    for row in config_reader:
        ftp_site_name = row['ftp_site_name']
        ftp_ip = row['ftp_ip_address']
        ftp_username = row['ftp_username']
        ftp_password = row['ftp_password']
        file_directory = row['ftp_log_directory']
        filename_convention = row['filename_convention']
        conn = ftp_connection(ftp_ip, ftp_username, ftp_password)
        # Change the directory
        conn.cwd(file_directory)
        try:
            with BytesIO() as output_buffer:
                conn.transfercmd('RETR ' + filename_convention + yesterday + '.txt', output_buffer.write)
                output_buffer.write(f',{ftp_site_name},{timestamp}\n'.encode('utf-8'))
                output_buffer.seek(0)
                s3_key = f'{ftp_site_name}/dt={timestamp}/{filename_convention}{yesterday}.txt'
                s3.upload_fileobj(output_buffer, Bucket=data_s3_bucket_name, Key=s3_key)
            # Close the connection
            conn.close()
        # Log any error
        except ftplib.all_errors as e:
            logger.error(str(e))
            continue

    logging.shutdown()
    glue = boto3.client('glue', config=boto3.session.Config(signature_version='s3v4'))
    glue.start_crawler(Name=crawler_name)
    return {
        'statusCode': 200,
        'body': 'Function executed successfully. Crawler Started'
    }
The problem is that it just sits there:
2023-08-02T19:58:15.620-07:00 *cmd* 'CWD /USB_Pen_Memory/Log'
2023-08-02T19:58:15.675-07:00 *resp* '250 CWD command successful.'
2023-08-02T19:58:15.676-07:00 *cmd* 'PORT 169,254,76,1,132,71'
2023-08-02T19:58:15.730-07:00 *resp* '200 PORT command successful.'
2023-08-02T19:58:15.730-07:00 *cmd* 'REST <built-in method write of _io.BytesIO object at 0x7f4ff15a53a0>'
2023-08-02T19:58:15.790-07:00 *resp* '350 Restarting at 0.'
2023-08-02T19:58:15.790-07:00 *cmd* 'RETR Data Log Trend_Ops_Data_Log_230801.txt'
2023-08-02T20:00:15.274-07:00 2023-08-03T03:00:15.273Z f68b8176-a188-4a50-ae45-6c12167dcad6 Task timed out after 120.05 seconds
The PORT command caught my eye here, since its address is different from what FileZilla shows. I have tried both retrbinary and transfercmd; the result is the same. What am I doing wrong?
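Side note on the stray REST line in that log: ftplib's transfercmd(cmd, rest=None) takes a REST offset as its second argument and returns the data socket, whereas retrbinary(cmd, callback) takes a write callback, so passing output_buffer.write to transfercmd sends it as the REST argument. A rough sketch of both call styles, with conn as in the script above and filename just a sample name taken from the logs:

from io import BytesIO

filename = 'Data Log Trend_Ops_Data_Log_230802.txt'  # sample name from the logs

# retrbinary drives the transfer and feeds each block to the callback:
with BytesIO() as buf:
    conn.retrbinary('RETR ' + filename, buf.write)

# transfercmd returns the data socket; the caller reads it directly and then
# collects the final 226 reply from the control connection:
conn.voidcmd('TYPE I')  # transfercmd does not set the transfer type for you
with BytesIO() as buf:
    with conn.transfercmd('RETR ' + filename) as data_sock:
        while True:
            block = data_sock.recv(8192)
            if not block:
                break
            buf.write(block)
    conn.voidresp()  # read the trailing '226 Transfer complete.'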
Edit:
Since Tim pointed out that S3 is slow, here is the version that downloads to local /tmp instead; it hits the exact same problem. Also, I just remembered that I have two sites with different IPs, and both show the same thing: this strange, unroutable PORT IP.
from datetime import datetime as dt
from datetime import timedelta
from io import BytesIO
import ftplib
import logging
import os
import csv
import time
import boto3
def ftp_connection(HOST, USER, PASS):
    ftp = ftplib.FTP(source_address=())
    ftp.connect(HOST)
    ftp.login(USER, PASS)
    ftp.set_pasv(False)
    ftp.set_debuglevel(1)
    return ftp
def lambda_handler(event, context):
    yesterday = dt.strftime(dt.today() - timedelta(days=1), '%y%m%d')
    timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    config_s3_bucket_name = os.environ['S3_CONFIG_LOC']
    data_s3_bucket_name = os.environ['S3_DATA_LOC']
    config_file_key = 'ftp_config.csv'
    crawler_name = 'FTPLogs'
    logging.basicConfig(level=logging.ERROR, format='%(asctime)s %(levelname)s %(name)s %(message)s')
    logger = logging.getLogger(__name__)

    # Step 1: Connect to S3 and download the config file
    s3 = boto3.client('s3', config=boto3.session.Config(signature_version='s3v4'))
    config_file_obj = s3.get_object(Bucket=config_s3_bucket_name, Key=config_file_key)
    config_file_data = config_file_obj['Body'].read().decode('utf-8').splitlines()
    config_reader = csv.DictReader(config_file_data)

    # Step 2: Loop through each row in the config file
    for row in config_reader:
        ftp_site_name = row['ftp_site_name']
        ftp_ip = row['ftp_ip_address']
        ftp_username = row['ftp_username']
        ftp_password = row['ftp_password']
        file_directory = row['ftp_log_directory']
        filename_convention = row['filename_convention']
        conn = ftp_connection(ftp_ip, ftp_username, ftp_password)
        # Change the directory
        conn.cwd(file_directory)
        try:
            # Define the local temporary directory
            local_temp_directory = '/tmp'
            os.makedirs(local_temp_directory, exist_ok=True)
            # Define the local temporary file path
            local_temp_file = os.path.join(local_temp_directory, f'{filename_convention}{yesterday}.txt')
            # Download the file to the local temporary directory
            with open(local_temp_file, 'wb') as local_file:
                conn.retrbinary(f'RETR {filename_convention}{yesterday}.txt', local_file.write)
                local_file.write(f',{ftp_site_name},{timestamp}\n'.encode('utf-8'))
            # Close the connection (the with block closes the file)
            conn.close()
            # Upload the file from the local temporary directory to S3
            s3 = boto3.client('s3')
            s3_key = f'dt={timestamp}/{filename_convention}{yesterday}.txt'
            s3.upload_file(local_temp_file, Bucket=data_s3_bucket_name, Key=s3_key)
        # Log any error
        except ftplib.all_errors as e:
            logger.error(str(e))
            continue

    logging.shutdown()
    glue = boto3.client('glue', config=boto3.session.Config(signature_version='s3v4'))
    glue.start_crawler(Name=crawler_name)
    return {
        'statusCode': 200,
        'body': 'Function executed successfully. Crawler Started'
    }
Edit 3:
Yes, I used print statements because... that really is all.
# Download the file to the local temporary directory
with open(local_temp_file, 'wb') as local_file:
    print('opened local file')
    conn.retrbinary(f'RETR {filename_convention}{yesterday}.txt', local_file.write)
    print(f'{filename_convention}{yesterday} written to local tmp')
    local_file.write(f',{ftp_site_name},{timestamp}\n'.encode('utf-8'))
    print(f'{ftp_site_name} and {timestamp} written to local file')
# Close the connection (the with block closes the file)
conn.close()
print('local file and connection closed')
# Upload the file from the local temporary directory to S3
s3 = boto3.client('s3')
s3_key = f'dt={timestamp}/{filename_convention}{yesterday}.txt'
s3.upload_file(local_temp_file, Bucket=data_s3_bucket_name, Key=s3_key)
Logs from CloudWatch:
2023-08-02T21:31:02.647-07:00 *cmd* 'CWD /SD1/Log'
2023-08-02T21:31:02.795-07:00 *resp* '250 CWD command successful.'
2023-08-02T21:31:02.795-07:00 opened local file
2023-08-02T21:31:02.795-07:00 *cmd* 'TYPE I'
2023-08-02T21:31:02.907-07:00 *resp* '200 Type set to I.'
2023-08-02T21:31:02.907-07:00 *cmd* 'PORT 169,254,76,1,145,9'
2023-08-02T21:31:03.062-07:00 *resp* '200 PORT command successful.'
2023-08-02T21:31:03.062-07:00 *cmd* 'RETR Data Log Trend_Ops_Data_Log_230802.txt'
It never even gets to writing the local file. I hope that is enough for @TimRoberts.
Latest:
I added conn.sendcmd('SITE CHMOD 777 ' + filename_convention + yesterday + '.txt') to my script as a test, and it started working locally. I deployed the same script to AWS, and the only difference is STILL the following:
Local:
*cmd* 'PORT 192,168,0,41,250,76'
AWS
*cmd* 'PORT 169,254,76,1,156,157'
This might mean there is some kind of networking setup issue on the AWS side that I need to research: 169.254.x.x is a link-local, unroutable range, and in active mode the server has to connect back to whatever address the client advertises in PORT.
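For anyone chasing the same thing: ftplib builds the PORT argument from the local address of the control socket, so you can check what address a given environment will advertise before even attempting a transfer. A minimal check, with HOST as in ftp_connection above:

import ftplib

ftp = ftplib.FTP()
ftp.connect(HOST)  # HOST as in ftp_connection above
# ftplib fills the PORT command from the control socket's local address:
print(ftp.sock.getsockname())  # in Lambda this shows a link-local 169.254.x.x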
Ultimately, the answer is that this is a convergence of conditions into the perfect edge case for not working: a server that only supports active mode (it rejects PASV with 502), combined with a client environment whose advertised PORT address the server cannot reach.
So, if anyone ever runs into this perfect storm again, here is what I ended up doing:
I am working on another project for the same site that involves a Raspberry Pi. I moved this script, with a few other pieces, to the Pi and will use it to retrieve the logs and then send them to S3.
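A minimal sketch of the Pi-side flow, reusing ftp_connection from the script above (filename_convention, yesterday, and timestamp as defined in the handler); the panel address, credentials, and bucket name here are placeholders, not the real values:

import boto3

# Runs on the Pi inside the panel's LAN, so the address it advertises in
# the PORT command is one the panel can actually connect back to.
conn = ftp_connection('PANEL_IP', 'USER', 'PASS')  # placeholders
conn.cwd('/USB_Pen_Memory/Log')
local_path = f'/tmp/{filename_convention}{yesterday}.txt'
with open(local_path, 'wb') as f:
    conn.retrbinary(f'RETR {filename_convention}{yesterday}.txt', f.write)
conn.close()

boto3.client('s3').upload_file(
    local_path, 'my-data-bucket',  # placeholder bucket
    f'dt={timestamp}/{filename_convention}{yesterday}.txt')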
As for AWS options that I did not explore but that may have worked: