php performance fwrite fputs

fputs slowly writing to disk


I have a PHP script that writes CSV files to disk. This is the function:

function fputcsv_content($fp, $array, $delimiter=",", $eol="\n") {

    $line = "";
    foreach($array as $value) {
        $value = trim($value);
        $value = str_replace("\r\n", "\n", $value);
        if(preg_match("/[$delimiter\"\n\r]/", $value)) {
            $value = '"'.str_replace('"', '""', $value).'"';
        }
        $line .= $value.$delimiter;
    }
    $eol = str_replace("\\r", "\r", $eol);
    $eol = str_replace("\\n", "\n", $eol);
    $line = substr($line, 0, (strlen($delimiter) * -1));
    $line .= $eol;
    return fputs($fp, $line);
}

The server is an AWS instance running CentOS 7 and PHP 7.2.

Server specs: 4 GB RAM, 32 GB swap, 2 cores at 2.5 GHz.

When files are large (3 GB, 4 GB), the writing process is very slow (1 MB every 2 or 3 seconds).

Is there any setting in php.ini or the Apache config that controls how fputs/fwrite behaves?

I've seen an output_buffering setting in php.ini (currently set to 4096), but I doubt it has anything to do with this.

Thanks!


Solution

  • Don't use .= to build the line. Add the values to an array, then implode the array. As written, the function fills memory with constantly discarded strings: every time you use .=, the old string is kept in memory and new space is reserved for the new string, and the GC only runs once the function has finished. With a file of 3-4 GB the total can end up being many multiples of the file size, which pushes the process into swap, and swap is slow.

    Try refactoring it to the array approach, together with some memory-saving techniques, and see whether that alleviates your issues a bit.

    I also used static function variables so they are assigned only once instead of on every call, which saves a marginal bit of memory, setting aside whatever optimisations PHP may or may not do.

    See it online: https://ideone.com/dNkxIE

    function fputcsv_content($fp, $array, $delimiter=",", $eol="\n") 
    {
        static $find = ["\\r","\\n"];
        static $replace = ["\r","\n"];
        static $cycles_count = 0;
        $cycles_count++;
        
        $array = array_map(function($value) use($delimiter) {
          return clean_value($value, $delimiter);
        }, $array);
        $eol = str_replace($find, $replace, $eol);
    
        $line = implode($delimiter, $array) . $eol;
        
        $return_value = fputs($fp, $line);
    
        /** purposefully free up the ram **/
        $line = null;
        $eol = null;
        $array = null;
    
        /** trigger gc_collect_cycles() every 250th call of this method **/
        if($cycles_count % 250 === 0) gc_collect_cycles();
    
        return $return_value;
    }
    
    /** Use a second function so the GC can be triggered here
      * when it returns the value and all intermediate values are free.
      */
    function clean_value($value, $delimiter) 
    {
        /**
          *  use static values to prevent reassigning the same
          *  values to the stack over and over
          */
        static $regex = [];
        static $find = "\r\n";
        static $replace = "\n";
        static $quote = '"';
        /** build the regex once per delimiter; preg_quote() keeps the delimiter literal **/
        if(!isset($regex[$delimiter])) {
            $regex[$delimiter] = "/[" . preg_quote($delimiter, "/") . "\"\n\r]/";
        }
        $value = trim($value);
        $value = str_replace($find, $replace, $value);
        if(preg_match($regex[$delimiter], $value)) {
            $value = $quote.str_replace($quote, '""', $value).$quote;
        }
        return $value;
    }
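
To try the array-then-implode idea in isolation, here is a minimal, self-contained sketch. The helper name csv_line() and the php://memory stream are assumptions made for illustration only and are not part of the code above. It only shows how a line is built and written; it does not measure performance.

```php
<?php
// Hypothetical helper (not from the answer above): build one CSV line by
// collecting the cleaned fields in an array and joining them once at the end.
function csv_line(array $fields, string $delimiter = ",", string $eol = "\n"): string
{
    // preg_quote() keeps a delimiter such as "|" or "^" literal inside the character class.
    $needs_quotes = '/[' . preg_quote($delimiter, '/') . "\"\n\r]/";

    $cleaned = [];
    foreach ($fields as $value) {
        // Same cleaning steps as the original function: trim, normalise line endings.
        $value = str_replace("\r\n", "\n", trim((string) $value));
        // Quote the field and double any embedded quotes when it needs escaping.
        if (preg_match($needs_quotes, $value)) {
            $value = '"' . str_replace('"', '""', $value) . '"';
        }
        $cleaned[] = $value;
    }

    return implode($delimiter, $cleaned) . $eol;
}

// Usage: write a few rows to an in-memory stream and read them back.
$fp = fopen('php://memory', 'w+');
foreach ([['id', 'note'], [1, 'plain'], [2, 'has "quotes", and a comma']] as $row) {
    fwrite($fp, csv_line($row));
}
rewind($fp);
$output = stream_get_contents($fp);
fclose($fp);

echo $output;
```

Swapping php://memory for a real file path lets you time the same loop against the disk and see whether string building or the write itself is the bottleneck.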