phpmysqlcsvbyte-order-markfgetcsv

Using PHP mySQL with CSV data containing BOM


I have a database that holds stock levels for certain items that are supplied by different suppliers. Each supplier sends me a daily CSV file with their current stock levels. I am trying to update the stock levels into my database.

The problem I am having is that when I extract the data from the CSV and send it through queries, it is not being working properly.

I have echoed the queries prior to sending them and the output is fine. Using phpMyAdmin, if I just paste the code as it is echoed, it works fine. This has led me to believe that it is an encoding problem.

Viewing the CSV file in cPanel File Manager I see there is an odd character at the beginning of the file. (I believe this is caleld a BOM). If I delete this characted and save the CSV file then my code works perfectly and the databse updates as expected.

enter image description here

Editing the file in cPanel File Manager, the Encoding opens as ansi_x3.110-1983. While manually deleting the character will fix the issue, it is not an option as I want this to be a fully automated daily process.

My code to open the file and extract the data from CSV:

// Open File        
$csvData = fopen($file, "r");
       
if($csvData !== FALSE)
{
  while(!feof($csvData))
  {
      $csvRow[] = fgetcsv($csvData, 100);
  }
}

// Close file
fclose($csvData);

My code to build a simple search query

foreach($csvRow as $row)
{
  $searchQuery = "SELECT * FROM supplier WHERE supplierItemCode = '".$row[0]."'";
  $result = $conn->query($searchQuery);
  echo "<br>".$searchQuery;
  if($result->num_rows > 0)
  {
      // CODE NEVER REACHES HERE
  }

As mentioned, if I simply paste the echo of $searchQuery into phpMyAdmin and run the query it works fine.

I have tried using fseek($csvData, 2) which successfully removes the BOM characters from the first row of data, but that is having no effect.

As suggested, I have tried using

$csvData = fopen($file, "r");
$BOM = null;
if($csvData !== FALSE)
{
   $BOM = fread($csvData, 3);
   if($BOM !==  FALSE)
   {
      if($BOM != "\xef\xbb\xbf")
      {
         echo "<h5>BOM: ".$BOM; // This code is executed every time
         fseek($csvData, 0);
      }
   }
   //fseek($csvData, 2); // This was my earlier attempts without the above BOM filter
   while(!feof($csvData))
   {
      $csvRow[] = fgetcsv($csvData, 100);
   }
}

Using the BOM filter method produces this output. Sample output

As a further note, you'll notice that in my Update query output, there is a blank space in the SET quantity column. This space is not visible in the csv file.

This query is built with

$updateQuery = "UPDATE supplier SET ".$supplier." = '".$row[2]."' WHERE supplierItemCode = '".$row[0]."'";

Any suggestions on what exactly is causing this issue and how I can get around it.

Thanks in advance.


Solution

  • I finally got a solution to work. After doing a lot of investigating, I believed it was encoded in UTF-16, despite what the BOM characters may have been saying.

    I just wrote a simple function to convert each CSV value I was passing to the SQL.

    function Convert($str)
        {
            return mb_convert_encoding($str, "UTF-8", "UTF-16BE");
        }
    
    ........
    
    $updateQuery = "UPDATE supplier SET ".$supplier." = '".Convert($row[2])."' WHERE supplierItemCode = '".Convert($row[0])."'";
                    
    

    I'm not sure why the BOM was causing such issue and why removing it entirely was not working. Thanks for everyone's help that lead me to discover the encoding problem.