Today I was handed a CSV file containing 300 million netflow records, and my objective is to convert the netflow data into synthesized packets by any means necessary. After a bit of research, I've decided Scapy would be an incredible tool for this. I've been fiddling with some of the commands and trying to create packets that accurately represent the netflow data, but I'm struggling and would really appreciate help from someone who's dabbled with Scapy before.
Here is an example entry from my dataset:
1526284499.233,1526284795.166,157.239.11.35,41.75.41.198,443,55915,6,1,24,62,6537,1419,1441,32934,65535,
Below is what I believe each comma-separated value represents:
Start Timestamp (Epoch Format): 1526284499.233
End Timestamp (Epoch Format): 1526284795.166
Source IP: 157.239.11.35
Destination IP: 41.75.41.198
IP Header Protocol Number: 443 (HTTPS)
Source Port Number: 55915
Destination Port Number: 6 (TCP)
TOS Value in IP Header: 1 (FIN)
TCP Flags: 24 (ACK & PSH)
Number of Packets: 62
Number of Bytes: 6537
Router Ingress Port: 1419
Router Egress Port: 1441
Source Autonomous System: 32934 (Facebook)
Destination Autonomous System: 65535
My Current Scapy Representation of this Entry:
>>> size = bytes(6537)
>>> packet = IP(src="157.240.11.35", dst="41.75.41.200", chksum=24, tos=1, proto=443) / TCP(sport=55915, dport=6, flags=24) / Raw(size)
packet.show():
###[ IP ]###
version= 4
ihl= None
tos= 0x1
len= None
id= 1
flags=
frag= 0
ttl= 64
proto= 443
chksum= 0x18
src= 157.240.11.35
dst= 41.75.41.200
\options\
###[ TCP ]###
sport= 55915
dport= 6
seq= 0
ack= 0
dataofs= None
reserved= 0
flags= PA
window= 8192
chksum= None
urgptr= 0
options= []
###[ Raw ]###
load= '6537'
My Confusion:
Frankly, I'm not sure if this is right. Where I get confused is that the IP header protocol number is 443, indicating HTTPS, while the destination port is 6, indicating TCP. So I'm not sure whether I should include the TCP layer at all, or whether setting the proto attribute on IP is redundant. I'm also not sure whether Raw() is the correct way to represent the size of each packet, let alone whether I defined size properly.
Please be so kind as to let me know where I've gone wrong, or if I actually miraculously created a perfect synthesized packet for this particular entry. Thank you so much!
I think the column labels might be wrong. HTTPS is (usually) TCP port 443, and the IP protocol field is only 8 bits wide, so it can't hold 443 anyway. The protocol number should be 6 (TCP), and one of the ports should be 443. My GUESS is that 443 is the source port, since the source IP belongs to Facebook, making 55915 the destination port. So I think the columns go: source IP, dest IP, source port, dest port, protocol.
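To make that reading concrete, here is a rough sketch of how the example row could be parsed with the reordered columns and turned into per-packet payload sizes. Several things in it are my assumptions, not facts about your exporter: the names Flow, parse_flow, payload_sizes and build_packets are made up for illustration; I assume the byte count covers the IP and TCP headers (40 bytes per packet, no options); and I assume the bytes and timestamps are spread evenly across the 62 packets, which real traffic would not do. Note also that on Python 2, bytes(6537) is just the 4-character string '6537' (which is what your show() output has in load), so the sketch builds the payload explicitly as b"\x00" * n.

```python
import csv
from collections import namedtuple

# Hypothetical record type; field order follows the GUESSED column layout:
# start, end, src IP, dst IP, src port, dst port, protocol, TOS, TCP flags,
# packets, bytes (the remaining router/AS columns are ignored here).
Flow = namedtuple(
    "Flow", "start end src dst sport dport proto tos flags packets octets"
)

ROW = ("1526284499.233,1526284795.166,157.239.11.35,41.75.41.198,"
       "443,55915,6,1,24,62,6537,1419,1441,32934,65535,")

def parse_flow(line):
    """Parse one CSV line into a Flow, using the reordered columns."""
    f = next(csv.reader([line]))
    return Flow(float(f[0]), float(f[1]), f[2], f[3],
                int(f[4]), int(f[5]), int(f[6]), int(f[7]), int(f[8]),
                int(f[9]), int(f[10]))

# ASSUMPTION: the flow's byte count includes a 20-byte IP header and a
# 20-byte TCP header per packet (no options).
HEADER_LEN = 40

def payload_sizes(flow, header_len=HEADER_LEN):
    """Split the flow's payload bytes as evenly as possible over its packets."""
    if flow.packets <= 0:
        return []
    total = max(flow.octets - flow.packets * header_len, 0)
    base, extra = divmod(total, flow.packets)
    # The first `extra` packets carry one additional byte each.
    return [base + (1 if i < extra else 0) for i in range(flow.packets)]

def build_packets(flow):
    """Build Scapy packets for a TCP flow (requires Scapy to be installed)."""
    from scapy.all import IP, TCP, Raw
    sizes = payload_sizes(flow)
    # ASSUMPTION: packets are spaced evenly between start and end time.
    step = (flow.end - flow.start) / max(len(sizes) - 1, 1)
    pkts = []
    for i, n in enumerate(sizes):
        # proto is left unset: Scapy fills in 6 when a TCP layer follows IP,
        # and it computes len and both checksums when the packet is built.
        p = (IP(src=flow.src, dst=flow.dst, tos=flow.tos)
             / TCP(sport=flow.sport, dport=flow.dport, flags=flow.flags)
             / Raw(load=b"\x00" * n))
        p.time = flow.start + i * step
        pkts.append(p)
    return pkts

flow = parse_flow(ROW)
sizes = payload_sizes(flow)
```

If that looks plausible for your data, build_packets(flow) gives you a list you can hand to Scapy's wrpcap() to write a capture file. I'd also drop the chksum=24 from your IP() call, since a hard-coded checksum will almost certainly be wrong; leaving it unset lets Scapy calculate it.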