
Finding all types of end-of-line delimiters in a file


I have the following function to get all of the different types of end-of-line delimiters in a file. There may be one or more, so I want to return an array of all types.

function ddtt_get_file_eol( $file_contents, $incl_code = true ) {
    $types = [
        '\r\n',
        '\n',
        '\r'
    ];
    $found = [];
    foreach ( $types as $type ) {
        if ( $type == '\r\n' ) {
            $regex = "/\r\n/";
        } elseif ( $type == '\n' ) {
            $regex = "/(?<!\r)\n/";
        } else {
            $regex = "/\r(?!\n)/";
        }
        if ( preg_match( $regex, $file_contents ) ) {
            $found[] = ( $incl_code ) ? '<code class="hl">'.$type.'</code>' : $type;
        }
    }
    return $found;
} // End ddtt_get_file_eol()

The problem I am having is that it is recognizing \r\n as two separate types and outputting [ '\n', '\r' ]. I want to output [ '\r\n' ] if it is just using that type, or [ '\r\n', '\n' ] if using both types, etc. How do I modify my code to correctly fetch all types used?


Solution

  • Let me guess: you are a developer who wants perfect identification of newline sequences regardless of the environment AND you want to keep all of your hair?

    PHP has had a solution for this for a long time, and it doesn't involve Minoxidil: just use \R. This escape matches a generic newline sequence, and it consumes \r\n as a single match instead of a \r followed by a separate \n, while still matching a lone \n or \r. I'll replace each newline sequence with an asterisk to show that Windows, Linux, and old Mac line breaks, even when mixed in one string, are each treated as exactly one newline.

    Code:

    $inputs = [
        'Windows' => "Dog\r\nCat\r\nMouse",
        'Linux' => "Bicycle\nCar\nTrain\nAirplane",
        'Mac' => "iPhone\riPod\rMacBook",
        'Win + Linux' => "int main() {\n   return 0;\r\n}\n",
        'All mixed up' => "This is a Windows new line\r\n, followed by a Linux new line\n and finally an old Mac with a single carriage return\rat the end",
    ];
    
    var_export(
        preg_replace('/\R/', '*', $inputs)
    );
    

    Output:

    array (
      'Windows' => 'Dog*Cat*Mouse',
      'Linux' => 'Bicycle*Car*Train*Airplane',
      'Mac' => 'iPhone*iPod*MacBook',
      'Win + Linux' => 'int main() {*   return 0;*}*',
      'All mixed up' => 'This is a Windows new line*, followed by a Linux new line* and finally an old Mac with a single carriage return*at the end',
    )
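To make the "whole sequence" point concrete, here is a small sketch (the variable names are illustrative only) comparing a character class, which matches one character at a time, with \R, which matches an entire newline sequence. On a single Windows line break, the class finds two matches while \R finds one, which is exactly the split the question runs into.

```php
<?php
// Sketch: compare a character class with \R on one Windows line break.
$windows = "Dog\r\nCat";

// [\r\n] matches ONE character at a time, so CR+LF yields two matches.
$perChar = preg_match_all('/[\r\n]/', $windows, $charMatches);

// \R matches the whole "\r\n" pair as a single newline sequence.
$perSeq = preg_match_all('/\R/', $windows, $seqMatches);

echo $perChar, ' vs ', $perSeq, PHP_EOL; // prints "2 vs 1"
```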
    

    If you need an array of the newline sequences themselves, use preg_match_all() with the same pattern on its own.

    foreach ($inputs as $env => $input) {
        preg_match_all('/\R/', $input, $matches);
        var_dump(
            $env,
            json_encode($matches[0])
        );
    }
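Applied to the question, a minimal sketch of the function could collect every \R match and keep only the distinct sequences. It reuses the question's function name and $incl_code flag and assumes the whole file is already in a string; turning the raw bytes into readable labels with addcslashes() is my own choice, not part of the question's code. Because \R consumes \r\n as one match, a Windows-only file yields just '\r\n' rather than '\n' and '\r'.

```php
<?php
// Sketch only: reuses the question's function name and $incl_code flag and
// assumes the whole file has already been read into $file_contents.
function ddtt_get_file_eol(string $file_contents, bool $incl_code = true): array
{
    // \R consumes "\r\n" as ONE match, so a Windows line break is never
    // split into a separate "\r" and "\n".
    preg_match_all('/\R/', $file_contents, $matches);

    $found = [];
    // array_unique() keeps the first occurrence of each sequence, in order.
    foreach (array_unique($matches[0]) as $sequence) {
        // Turn raw control bytes into a readable label: CR+LF becomes the
        // four-character text \r\n. Any other control character that \R
        // matches gets a C-style or octal label the same way.
        $label = addcslashes($sequence, "\0..\37");
        $found[] = $incl_code ? '<code class="hl">' . $label . '</code>' : $label;
    }
    return $found;
}

// Mixed Windows + Linux line breaks, as in the 'Win + Linux' input.
echo implode(', ', ddtt_get_file_eol("int main() {\n   return 0;\r\n}\n", false)), PHP_EOL;
// prints: \n, \r\n
```

The labels are listed in order of first appearance, so a file that starts with Linux line breaks and later switches to Windows ones reports '\n' before '\r\n'.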
    

    Relevant reading on implementations of \R: