php csv text-parsing sanitization

Sanitize a CSV string containing double quoted values


I am trying to parse and sanitize a dynamic CSV string ($line) which has quoted values.

echo "<pre>", print_r($line, 1), "</pre>";

The string is in this format:

"AARON, ELVIA J",WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00

I need to remove the commas, spaces, and quotes from the quoted values only, so that the string looks like this:

AARONELVIAJ,WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00

What I have done:

$re1 = '(")';   # Any Single Character 1
$re2 = '((?:[a-z][a-z]+))'; # Word 1
$re3 = '(,)';   # Any Single Character 2
$re4 = '(\\s+)';    # White Space 1
$re5 = '((?:[a-z][a-z]+))'; # Word 2
$re6 = '(\\s+)';    # White Space 2
$re7 = '.*?';   # Non-greedy match on filler
$re8 = '(")';

$reg1 = "/" . $re1 . $re2 . "/";
$reg2 = "/" . $re3 . $re4 . "/";
$reg3 = "/" . $re5 . $re6 . $re7 . $re8 . "/";
$line = preg_replace("/" . '($reg1)$reg2($reg3)' . "/", "$1$2", $line); //this is also generating the same array "AARON,  ELVIA J",WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00
echo "<pre>", print_r($line, 1), "</pre>";
        
$pattern = "/" . $re1 . $re2 . $re3 . $re4 . $re5 . $re6 . $re7 . $re8 . "/";
$replacement = "/" . $re2 . $re5 . $re7 . "/";

$values = preg_replace($pattern, $replacement, $line);
$values = explode(',',$line);
echo "<br>";
$values = preg_replace('/[^A-Za-z0-9\-]/', '', $values);
$values = implode(',',$values);
echo "<pre>", print_r($values), "</pre>";
echo "<pre>", print_r($values, 1), "</pre>";

What I get instead is:

AARON,ELVIAJ,WATERRATETAKER,WATERMGMNT,8100000,73862001

I only need the comma inside the quote-wrapped value removed; the unquoted fields should keep their spaces and punctuation.


Solution

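  • Try this: split the line on the double quotes, then strip commas and spaces only from the pieces that do not start or end with a comma (the quoted values).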

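        $line = '"AARON,  ELVIA J",WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00';

        // Splitting on the double quote separates the quoted values from the rest.
        $pieces = explode('"', $line);

        $result = '';
        foreach ($pieces as $value) {
            if (substr($value, 0, 1) === ',' || substr($value, -1) === ',') {
                // Starts or ends with a comma: unquoted part of the line, keep as-is.
                $result .= $value;
            } else {
                // Otherwise treat it as a quoted value: drop commas and spaces.
                $result .= str_replace([',', ' '], '', $value);
            }
        }
        echo $result; // AARONELVIAJ,WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00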
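
  • Alternatively, the same idea can be sketched with preg_replace_callback(), which only touches text between a pair of double quotes. This should also cover a quoted value that itself starts or ends with a comma. The function name sanitizeQuotedValues is made up for illustration, and the sketch assumes the quoted values never contain escaped quotes ("" or \").

```php
<?php
// Hypothetical helper: strip commas and whitespace from every
// double-quoted value and drop the surrounding quotes.
// Assumption: quoted values contain no escaped quotes ("" or \").
function sanitizeQuotedValues(string $line): string
{
    $result = preg_replace_callback(
        '/"([^"]*)"/',                  // one quoted value, captured without its quotes
        static function (array $m): string {
            // Remove commas and any whitespace inside the quoted value.
            return preg_replace('/[,\s]+/', '', $m[1]);
        },
        $line
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $line;
}

$line = '"AARON,  ELVIA J",WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00';
echo sanitizeQuotedValues($line), PHP_EOL;
// AARONELVIAJ,WATER RATE TAKER,WATER MGMNT,$81000.00,$73862.00
```

    Unquoted fields such as WATER RATE TAKER never match the pattern, so their spaces are left alone.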