php foreach iteration fputcsv

Weirdness with foreach() iteration of array data


I have a csv file on the web server (e.g. 5GDPR6LR-1.csv).

The csv file consists of data from an associative array written to file using fputcsv() within a foreach() loop.

The problem occurs when I try to write a third row to the csv file. Somehow, on its second iteration the foreach() reuses the data from the first array row instead of moving on to the second row. Then it happily continues with the third row.

Here's the code doing the iteration and writing to file (note: I added the $row !== $lastrow check to prevent the duplicate rows that this problem was creating, but this check shouldn't be necessary):

<?php
$userID = $_POST['UserID'];
$batchID = $_POST['BatchID'];
$batchItemID = $_POST['BatchItemID'];
$comment = $_POST['UserComment'];
$categories = isset($_POST['Categories']) ? $_POST['Categories'] : [];

echo '<pre>POST:<br>'; print_r($_POST); echo '</pre><hr>';

$responseFile = "data/responses/{$userID}-{$batchID}.csv";
$exists = file_exists($responseFile);
$data = [];

if ($exists) {
    $rows = array_map('str_getcsv', file($responseFile));
    $headers = array_shift($rows);
    $data = array_map(fn($r) => array_combine($headers, $r), $rows);
    echo '<pre>EXISTING FILE DATA:<br>'; print_r($data); echo '</pre><hr>';
} else {
    $headers = ['UserID', 'BatchID', 'BatchItemID', 'Categories', 'UserComment'];
}

$found = false;
foreach ($data as &$row) {
    if ($row['UserID'] === $userID && $row['BatchID'] === $batchID && $row['BatchItemID'] === $batchItemID) {
        $row['Categories'] = json_encode($categories);
        $row['UserComment'] = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
        $found = true;
        break;
    }
}

// echo 'Found:' . ($found ? 'Y' : 'N') . '<br/>';

if (!$found) {
    $data[] = [
        'UserID' => $userID,
        'BatchID' => $batchID,
        'BatchItemID' => $batchItemID,
        'Categories' => json_encode($categories),
        'UserComment' => htmlspecialchars($comment, ENT_QUOTES, 'UTF-8')
    ];
}

echo '<pre>ARRAY TO WRITE TO FILE:<br>'; print_r($data); echo '</pre><hr>';

$fp = fopen($responseFile, 'w');
fputcsv($fp, $headers, ",", "\"", "\\", "\n");
$lastrow = array();
foreach ($data as $row) {
    echo '<pre>LASTROW:'; print_r($lastrow); echo '</pre>';
    echo '<pre>THISROW:'; print_r($row); echo '</pre>';

    if ($row !== $lastrow) { 
        fputcsv($fp, $row, ",", "\"", "\\", "\n"); 
        echo '<pre>ROW WRITTEN TO CSV:'; print_r($row); echo '</pre>';
    }
    $lastrow = $row;
}
fclose($fp);
exit();

This is the data log. Form Data is received via POST array. This data should either add a new row to the csv file or update an existing row if it already exists. In this run I'm trying to add the third entry to the csv file that already contains the first two. The result of this run is that the first and third row are stored in the csv file and the second row is skipped...

POST:
Array
(
    [Categories] => Array
        (
            [0] => 5
            [1] => 6
        )

    [BatchID] => 1
    [UserID] => 5GDPR6LR
    [BatchItemID] => 3
    [UserComment] => third comment
)

------------

EXISTING FILE DATA:
Array
(
    [0] => Array
        (
            [UserID] => 5GDPR6LR
            [BatchID] => 1
            [BatchItemID] => 1
            [Categories] => ["1","2"]
            [UserComment] => first comment
        )

    [1] => Array
        (
            [UserID] => 5GDPR6LR
            [BatchID] => 1
            [BatchItemID] => 2
            [Categories] => ["3","4"]
            [UserComment] => second comment
        )

)

------------

ARRAY TO WRITE TO FILE:
Array
(
    [0] => Array
        (
            [UserID] => 5GDPR6LR
            [BatchID] => 1
            [BatchItemID] => 1
            [Categories] => ["1","2"]
            [UserComment] => first comment
        )

    [1] => Array
        (
            [UserID] => 5GDPR6LR
            [BatchID] => 1
            [BatchItemID] => 2
            [Categories] => ["3","4"]
            [UserComment] => second comment
        )

    [2] => Array
        (
            [UserID] => 5GDPR6LR
            [BatchID] => 1
            [BatchItemID] => 3
            [Categories] => ["5","6"]
            [UserComment] => third comment
        )

)

------------
HERE IS THE ITERATION OF THE ABOVE ARRAY DATA...
------------

LASTROW:Array
(
)

THISROW:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 1
    [Categories] => ["1","2"]
    [UserComment] => first comment
)

ROW WRITTEN TO CSV:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 1
    [Categories] => ["1","2"]
    [UserComment] => first comment
)

LASTROW:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 1
    [Categories] => ["1","2"]
    [UserComment] => first comment
)

THISROW:Array     '<<<<----- HERE IS THE PROBLEM (IN THIS EXAMPLE) IT HASNT SKIPPED TO ARRAY ROW 2!!!'
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 1
    [Categories] => ["1","2"]
    [UserComment] => first comment
)

LASTROW:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 1
    [Categories] => ["1","2"]
    [UserComment] => first comment
)

THISROW:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 3
    [Categories] => ["5","6"]
    [UserComment] => third comment
)

ROW WRITTEN TO CSV:Array
(
    [UserID] => 5GDPR6LR
    [BatchID] => 1
    [BatchItemID] => 3
    [Categories] => ["5","6"]
    [UserComment] => third comment
)

Any ideas what could be causing this?

I was experiencing this issue on my dev server so moved it to a live server but the same issue occurs.

It's a basic script. I'm lost as to what to check.


Solution

  • Always unset($row) after a foreach (… as &$row)

    A reference to a variable can be a nice way to modify the items of an array in place, but it should always be handled with caution, as various references point out (the pun on "references" was originally not intended, but now it's there).

    Because references were introduced to mimic objects at a time when PHP did not have them yet, they have always had a (somewhat merited) reputation as a hacky and dangerous feature.

    Life-saving reflex
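A minimal sketch of that reflex, assuming a hypothetical two-row $data array in place of the real CSV contents: the unset($row) goes on the line right after the closing brace of the by-reference loop, so $row can be reused safely afterwards.

```php
<?php
// Hypothetical sample rows standing in for the CSV data.
$data = [
    ['BatchItemID' => '1', 'UserComment' => 'first comment'],
    ['BatchItemID' => '2', 'UserComment' => 'second comment'],
];

// Modify every row in place through a reference...
foreach ($data as &$row) {
    $row['UserComment'] = strtoupper($row['UserComment']);
}
unset($row); // ...and drop the reference on the very next line.

// $row can now be reused as an ordinary variable without touching $data.
$row = ['BatchItemID' => '9', 'UserComment' => 'unrelated'];

echo $data[1]['UserComment'], "\n"; // prints "SECOND COMMENT"
```

Without the unset($row), the last assignment would have overwritten $data[1].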
    The why

    foreach ($data as &$row) creates a reference named $row, successively pointing to each item of $data.
    However, if you do not unset($row); after the loop, $row keeps its reference nature and still points to the last item it walked through (like a dangling pointer). Here the first loop found no match and never hit break, so that item is $data[1] ("second comment"),
    and any later non-reference use of $row, including an assignment to it, simply works on the contents the reference points to.

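    Thus, when the second loop starts without an &, each iteration's assignment to $row writes through the reference into $data[1]:

      • Iteration 1 copies $data[0] into $row, that is, into $data[1]: the second row is overwritten with the first.
      • Iteration 2 copies $data[1] into itself: it now holds the first row, hence the duplicate shown in the log.
      • Iteration 3 copies $data[2] into $data[1], and the third row is written as expected.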

    unset($row); makes sure PHP forgets everything about $row: the next time it is set (either by an = or a foreach (… as $row)) it will be re-created from scratch, by default as a conventional variable not pointing anymore to the contents of $data.
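The mechanism described above can be replayed on a small array. This is only a sketch: iterateTwice() and its $unsetReference flag are hypothetical names, and plain strings stand in for the CSV rows. The first loop mirrors the question's by-reference search that finds no match, and the second mirrors the by-value loop that writes the file.

```php
<?php
/**
 * Replays the two loops from the question on a small array.
 * iterateTwice() and $unsetReference are hypothetical, for illustration only.
 */
function iterateTwice(array $data, bool $unsetReference): array
{
    // First loop: by reference, no match found, so no break.
    foreach ($data as &$row) {
        // nothing to update
    }
    // $row is still a reference to the last element, $data[1].

    if ($unsetReference) {
        unset($row); // detach $row from $data[1]
    }

    $data[] = 'third comment'; // the new row appended by the script

    // Second loop: by value. Without the unset(), every assignment
    // to $row writes into $data[1].
    $seen = [];
    foreach ($data as $row) {
        $seen[] = $row;
    }
    return $seen;
}

$rows = ['first comment', 'second comment'];

print_r(iterateTwice($rows, false)); // first, first, third
print_r(iterateTwice($rows, true));  // first, second, third
```

The first call reproduces the sequence from the question's log (row 1, row 1 again, row 3); the second call shows that a single unset($row) restores the expected order.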

    Naming conventions can help

    One additional visual hint I use: to make sure none of my references bleed, in addition to unsetting them after use, I prefix them with ptr, for example foreach ($data as &$ptrRow). Not only does the name hint at its nature, but I'm pretty sure I won't have the dumb idea to reuse that typed (and harder to type) name for a conventional variable later.

    As for function calls: I've never understood why call-time pass-by-reference was deprecated in PHP 5.3 (and removed in 5.4), as it made it obvious at the call site that a function intended to modify the contents of its arguments.
    To mitigate this loss in readability, I like to write the & back as a comment when calling such a function, e.g.:

    $table = [ … ];
    usort(/*&*/$table, …); // Here it's clear that $table's contents get changed.