phparrayscsvfgetcsvfileparsing

Parse a CSV file which may have columns out of the expected order


I use the fgetcsv() function to break a .CSV file into an array so I can insert the values into a database.

The problem is that sometimes the corporation that I download these files from change columns around which changes their array number. Here is an example of a print_r of the CSV file:

Array ( [0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price [4] => Price Chng % (Price Change %) [5] => Price Chng (Price Change ) [6] => Day Chng % (Day Change %) [7] => Day Chng (Day Change ) [8] => Cost Basis [9] => Gain % (Gain/Loss %) [10] => Gain (Gain/Loss ) [11] => Reinvest? [12] => Reinvest Capital Gains? [13] => Last Div (Last Dividend) [14] => Volume [15] => Security Type ) 
Array ( [0] => Test1 [1] => test desc [2] => 820 [3] => 19.505 [4] => -1.84% [5] => -0.365 [6] => -1.84% [7] => -300.37 [8] => 601.73 [9] => -18.4% [10] => -607.63 [11] => No [13] => 0.72025 [14] => 7041528 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test2 [1] => test desc again [2] => 110 [3] => 49.715 [4] => -1.83% [5] => -0.925 [6] => -1.76% [7] => -98.00 [8] => 95.2 [9] => -8.78% [10] => -526.55 [11] => Yes [14] => 28668328 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test3 [1] => test desc example [2] => 740 [3] => 21.71 [4] => -3.98% [5] => -0.9 [6] => -3.93% [7] => -657.59 [8] => 242.13 [9] => -20.63% [10] => -4176.73 [11] => No [13] => 2.0216 [14] => 2759846 [15] => ETFs & Closed End Funds ) 

You can see in the second array it has titles such as $column[0] is Symbol and $column[3] is Price.

[0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price

Can I add code so that $column[0] for example will always display the symbol name? If the corporation decides to add a column or rearrange stuff, then it will change the Price column into another one like moving it from $column[3] to $column[5]. I am guessing there is a way I can have it search for whichever key that matches a column like Volume for example, then whatever key that is it could make sure all of the keys in every array for Volume will match that key number.

I was manually just changing the array numbers when they did this, but they change it a lot so I am wondering if I can add some code to where the Symbol column would always be 0 and Price would always be in column 3 and the same applies to all of the column names. If they get rearranged they still have the same number as before?

A small example of the CSV file:

"Symbol","Description","Qty (Quantity)","Price","Price Chng % (Price Change %)","Price Chng $ (Price Change $)","Day Chng % (Day Change %)","Day Chng $ (Day Change $)","Cost Basis","Gain % (Gain/Loss %)","Gain $ (Gain/Loss $)","Reinvest?","Reinvest Capital Gains?","Last Div (Last Dividend)","Volume","Security Type"
"Test1","ETF","820","$19.39","-2.42%","-$0.48","-2.42%","-$393.6","$601.73","-18.89%","-$3701.93","No","--","$0.72025","14,626,464","ETFs & Closed End Funds"
"Test2"," ETF","110","$49.43","-2.39%","-$1.21","-2.39%","-$133.1","$995.2","-9.31%","-$557.9","Yes","--","N/A","59,351,095","ETFs & Closed End Funds"
"Test3"," ETF","760","$21.77","-3.72%","-$0.84","-3.72%","-$638.4","$687.73","-20.02%","-$142.53","No","--","$2.0216","6,402,084","ETFs & Closed End Funds"

Solution

  • If your incoming data is not reliably structured, then use a whitelist of expected columns in your desired order to filter and map the parsed data. Demo >= PHP7.4, Demo PHP7 - PHP8.3, Demo PHP5.2 - PHP8.3

    $whitelist = [
        "Symbol",
        "Description",
        "Qty (Quantity)",
        "Price",
        "Price Chng % (Price Change %)",
        "Price Chng $ (Price Change $)",
        "Day Chng % (Day Change %)",
        "Day Chng $ (Day Change $)",
        "Cost Basis",
        "Gain % (Gain/Loss %)",
        "Gain $ (Gain/Loss $)",
        "Reinvest?",
        "Reinvest Capital Gains?",
        "Last Div (Last Dividend)",
        "Volume",
        "Security Type"
    ];
    
    $result = [];
    if (($handle = fopen("file.csv", "r")) !== false) {
        $headers = fgetcsv($handle, escape: '') ?: [];
        $map = array_flip($headers);
        while (($values = fgetcsv($handle, escape: '')) !== false) {
            $row = [];
            foreach ($whitelist as $col) {
                $row[$col] = $values[$map[$col]] ?? null;
            }
            $result[] = $row;
        }
        fclose($handle);
    }
    var_export($result);
    

    File Contents:

    "Price","Bogus","Symbol","Description","Qty (Quantity)","Price Chng % (Price Change %)","Price Chng $ (Price Change $)","Day Chng % (Day Change %)","Day Chng $ (Day Change $)","Cost Basis","Gain % (Gain/Loss %)","Gain $ (Gain/Loss $)","Reinvest?","Reinvest Capital Gains?","Last Div (Last Dividend)","Volume","Security Type"
    "$19.39","foo1","Test1","ETF","820","-2.42%","-$0.48","-2.42%","-$393.6","$601.73","-18.89%","-$3701.93","No","--","$0.72025","14,626,464","ETFs & Closed End Funds"
    "$49.43","foo2","Test2"," ETF","110","-2.39%","-$1.21","-2.39%","-$133.1","$995.2","-9.31%","-$557.9","Yes","--","N/A","59,351,095","ETFs & Closed End Funds"
    "$21.77","foo3","Test3"," ETF","760","-3.72%","-$0.84","-3.72%","-$638.4","$687.73","-20.02%","-$142.53","No","--","$2.0216","6,402,084","ETFs & Closed End Funds"
    

    Output:

    array (
      0 => 
      array (
        'Symbol' => 'Test1',
        'Description' => 'ETF',
        'Qty (Quantity)' => '820',
        'Price' => '$19.39',
        'Price Chng % (Price Change %)' => '-2.42%',
        'Price Chng $ (Price Change $)' => '-$0.48',
        'Day Chng % (Day Change %)' => '-2.42%',
        'Day Chng $ (Day Change $)' => '-$393.6',
        'Cost Basis' => '$601.73',
        'Gain % (Gain/Loss %)' => '-18.89%',
        'Gain $ (Gain/Loss $)' => '-$3701.93',
        'Reinvest?' => 'No',
        'Reinvest Capital Gains?' => '--',
        'Last Div (Last Dividend)' => '$0.72025',
        'Volume' => '14,626,464',
        'Security Type' => 'ETFs & Closed End Funds',
      ),
      1 => 
      array (
        'Symbol' => 'Test2',
        'Description' => ' ETF',
        'Qty (Quantity)' => '110',
        'Price' => '$49.43',
        'Price Chng % (Price Change %)' => '-2.39%',
        'Price Chng $ (Price Change $)' => '-$1.21',
        'Day Chng % (Day Change %)' => '-2.39%',
        'Day Chng $ (Day Change $)' => '-$133.1',
        'Cost Basis' => '$995.2',
        'Gain % (Gain/Loss %)' => '-9.31%',
        'Gain $ (Gain/Loss $)' => '-$557.9',
        'Reinvest?' => 'Yes',
        'Reinvest Capital Gains?' => '--',
        'Last Div (Last Dividend)' => 'N/A',
        'Volume' => '59,351,095',
        'Security Type' => 'ETFs & Closed End Funds',
      ),
      2 => 
      array (
        'Symbol' => 'Test3',
        'Description' => ' ETF',
        'Qty (Quantity)' => '760',
        'Price' => '$21.77',
        'Price Chng % (Price Change %)' => '-3.72%',
        'Price Chng $ (Price Change $)' => '-$0.84',
        'Day Chng % (Day Change %)' => '-3.72%',
        'Day Chng $ (Day Change $)' => '-$638.4',
        'Cost Basis' => '$687.73',
        'Gain % (Gain/Loss %)' => '-20.02%',
        'Gain $ (Gain/Loss $)' => '-$142.53',
        'Reinvest?' => 'No',
        'Reinvest Capital Gains?' => '--',
        'Last Div (Last Dividend)' => '$2.0216',
        'Volume' => '6,402,084',
        'Security Type' => 'ETFs & Closed End Funds',
      ),
    )