pythoncsvdictionarysetdefault

Parsing CSV using Python


I have the following csv file that has three fields Vulnerability Title, Vulnerability Severity Level , Asset IP Address which shows vulnerabilities name , level of vulnerability and IP address that is having that vulnerability. I am trying to print a report that would list vulnerability in a column severity next to it and last column list of IP address having that vulnerability.

Vulnerability Title Vulnerability Severity Level    Asset IP Address
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.164
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.50.82
TLS/SSL Server Supports Weak Cipher Algorithms  6   10.103.65.164
Weak Cryptographic Key  3   10.103.64.10
Unencrypted Telnet Service Available    4   10.10.30.81
Unencrypted Telnet Service Available    4   10.10.50.82
TLS/SSL Server Supports Anonymous Cipher Suites with no Key Authentication  6   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.100
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.10.30.81

and I would like to recreate a csv file that uses Vulnerability Title tab as the key and creates a second tab called Vulnerability Severity Level and last tab would contain all the ip addresses that has the vulnerabilities

import csv
from pprint import pprint
from collections import defaultdict
import glob
x= glob.glob("/root/*.csv")

d = defaultdict()
n = defaultdict()
for items in x:
        with open(items, 'rb') as f:
                reader = csv.DictReader(f, delimiter=',')
                for row in reader:
                        a = row["Vulnerability Title"]
                        b = row["Vulnerability Severity Level"], row["Asset IP Address"]
                        c = row["Asset IP Address"]
        #               d = row["Vulnerability Proof"]
                        d.setdefault(a, []).append(b)
        f.close()
pprint(d)
with open('results/ipaddress.csv', 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for key, value in d.items():
                for x,y in value:
                        n.setdefault(y, []).append(x)
#                       print x
                        writer.writerow([key,n])

with open('results/ipaddress2.csv', 'wb') as csv2_file:
        writer = csv.writer(csv2_file)
        for key, value in d.items():
             n.setdefault(value, []).append(key)
             writer.writerow([key,n])

Since I cant explain very well. let me try to simplify

lets say I have the following csv

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James
Chevy   Green   Tom

I am trying to create this csv as the following:

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James,Tom

both of the solutions are correct. here is my final script as well

import csv
import pandas as pd

df = pd.read_csv('test.csv', names=['Vulnerability Title', 'Vulnerability Severity Level','Asset IP Address'])
#print df
grouped = df.groupby(['Vulnerability Title','Vulnerability Severity Level'])

groups = grouped.groups
#print groups
new_data = [k + (v['Asset IP Address'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['Vulnerability Title' ,'Vulnerability Severity Level', 'Asset IP Address'])

print new_df
new_df.to_csv('final.csv')

thank you


Solution

  • When manipulate structured date, especially large data set. I would like to suggest you to use pandas.

    For your problem, I will give you an example of pandas groupby feature as solution. Suppose you have the data:

    data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'], 
            ['vt2', 4, '10.0.10.10']]
    

    The pandas to operate date is very fensy:

    import pandas as pd
    
    df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])
    grouped = df.groupby(['title', 'level'])
    

    Then

    groups = grouped.groups
    

    will be a dict that is almost you need.

    print(groups)
    {('vt1', 3): [0, 1], ('vt2', 4): [2]}
    

    [0,1] represents the row label. Actually you can iterate on these groups to apply any operation you want. For example, If you want to save them into csv file:

    new_data = [k + (v['ip'].tolist(),) for k, v in grouped]
    new_df = pd.DataFrame(new_data, columns=['title', 'level', 'ips'])
    

    Let's see what is new_df now:

      title  level                   ips
    0   vt1      3  [10.0.0.1, 10.0.0.2]
    1   vt2      4          [10.0.10.10]
    

    That's what you need. And finally, save to file:

    new_df.to_csv(filename)
    

    I strongly suggest that you should learn pandas data manipulation. You may find that was much easier and cleaner.