pythonpython-3.xpandasexcel-automation

How to code the exception for a column in excel using pandas?


Sample data:

|   | Status                  | Failed | In Progress | Passed | Untested |
|---|-------------------------|--------|-------------|--------|----------|
| 2 | P0 Dry Run - 13/02/18   | 2.0    |             | 143.0  | 5.0      |
| 3 | P1 Test Plan - 06/02/18 | 4.0    |             | 247.0  | 367.0    |
| 4 | P2 Test plan - 03/01/18 | 22.0   | 2.0         | 496.0  | 54.0     |

Code:

msft = pd.read_csv("C:\\Users\\gomathis\\Downloads\\week_071.csv") 
msft = msft[['Passed', 'Failed', 'Blocked', 'In Progress', 'Not_Implemented', 'Not Applicable', 'Clarification Opened', 'Untested']]
msft.to_csv("C:\\Users\\gomathis\\Downloads\\week_072.csv")

Error:

KeyError: "['Blocked'] not in index"

Expected result:

I need an exception for a column which may not be available now but in future it may come. So help me accordingly to solve this.


Solution

  • Use the csv.DictReader.fieldnames attribute, figure out what columns are present in your CSV, and then find the intersection of those.

    First, specify the columns you want.

    columns = ['Passed', 
               'Failed', 
               'Blocked', 
               'In Progress', 
               'Not_Implemented', 
               'Not Applicable', 
               'Clarification Opened', 
               'Untested']
    
    path = "C:\\Users\\gomathis\\Downloads\\week_071.csv"   # we'll use this later
    

    Next, use the csv.DictReader to read the CSV's headers (this does NOT read the entire file!).

    import csv
    with open(path, 'r') as f:
        reader = csv.DictReader(f)
        df_columns = reader.fieldnames
    

    Now, find the set intersection, and pass it to usecols in pd.read_csv:

    df = pd.read_csv(path, usecols=set(columns).intersection(df_columns))
    

    Finally, to fill in missing columns, take the set difference and call df.assign:

    df = df.assign(**dict.fromkeys(set(columns).difference(df_columns), np.nan))