php regex string text-parsing fileparsing

Parse the lines of a predictably formatted text file


I am trying to extract some formatted information from files.

Sample data

2011/09/20  00:57       367,044,608 S1E04 - Cancer Man.avi
2012/03/12  03:01       366,991,496 Family Guy - S09E01 - And Then There Were Fewer.avi
2012/03/25  00:27        53,560,510 Avatar- The Legend of Korra S01E01.avi

What I would like to extract is the date, the file size and the name of the file, bearing in mind that the file name can start with basically anything and the file size is different on every line.

What I have currently:

$dateModifyed = substr($file, 0, 10); 
$fileSize = preg_match('[0-9]*/[0-9]*/[0-9]*/s[0-9]*:[0-9]*/s*', $file, $match)
$FileName = 

The full code that I am working on:

function recursivePrint($folder, $subFolders, $Jsoncounter) {
    $f = fopen("file.json", "a");
    
    echo '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", "parent" : "' . "#" . '", "Text" : "' . $folder . '" },' . "\n";
    $PrintString = '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", "parent" : "' . "#" . '", "Text" : "' . $folder . '" },' . "\n";
    fwrite($f, $PrintString);
    $foldercount = $GLOBALS['Jsoncounter'];
    $GLOBALS['Jsoncounter']++;
    foreach($subFolders->files as $file) {


        // '#' is used as the delimiter so the slashes in the date need no escaping
        preg_match('#^(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2})\s+([\d,]+)\s+(.*)$#', $file, $match);
        $dateModified = $match[1];
        $fileSize = str_replace(',', '', $match[2]);
        $fileName = $match[3];
        echo $dateModified . $fileSize . $fileName;


        echo '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", "parent" : "' . $foldercount . '", "Text" : "' . $file . '" },';
        $PrintString ='{ "id" : "' . $GLOBALS['Jsoncounter'] . '", "parent" : "' . $foldercount . '", "Text" : "' . $file . '" },';
        fwrite($f, $PrintString);
        $GLOBALS['Jsoncounter']++;
    }
    
    foreach($subFolders->folders as $folder => $subSubFolders) {
        recursivePrint($folder, $subSubFolders, $Jsoncounter);
    }
    fclose($f); 
}

Solution

  • There are several problems in your regex:

    preg_match('[0-9]*/[0-9]*/[0-9]*/s[0-9]*:[0-9]*/s*', $file, $match)
                ^--missing delimiter ^            ^-- asterisk instead of plus
                                     |--literal s instead of \s
    

    and of course you haven't used anchors or capturing groups, and the regex isn't finished yet.

    Try the following:

    preg_match_all(
        '%^                     # Start of line
        ([0-9]+/[0-9]+/[0-9]+)  # Date (group 1)
        \s+                     # Whitespace
        ([0-9]+:[0-9]+)         # Time (group 2)
        \s+                     # Whitespace
        ([0-9,]+)               # File size (group 3)
        \s+                     # Whitespace
        (.*)                    # Rest of the line%mx', 
        $file, $result, PREG_SET_ORDER);
    for ($matchi = 0; $matchi < count($result); $matchi++) {
        for ($backrefi = 0; $backrefi < count($result[$matchi]); $backrefi++) {
            # Matched text = $result[$matchi][$backrefi];
        }
    }
    

    So, for example, $result[0][1] will contain 2011/09/20, and $result[2][4] will contain Avatar- The Legend of Korra S01E01.avi.
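
If you are handling one line at a time, as in your foreach loop, a small helper may be easier to work with than preg_match_all. The following is only a sketch: parseListingLine is a made-up name, not something from your code, and it assumes PHP 7.1 or later (for the ?array return type) and that every line follows the date, time, size, name layout of your sample data.

```php
<?php
// Hypothetical helper (not part of the original code): parses one listing
// line into its parts, or returns null when the line does not match.
function parseListingLine(string $line): ?array
{
    // '#' is the delimiter, so the slashes in the date need no escaping.
    $pattern = '#^(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2})\s+([\d,]+)\s+(.*)$#';

    // Strip a trailing line break first so it cannot end up in the file name.
    if (preg_match($pattern, rtrim($line, "\r\n"), $m) !== 1) {
        return null;
    }

    return [
        'date' => $m[1],
        'time' => $m[2],
        'size' => (int) str_replace(',', '', $m[3]), // "367,044,608" becomes 367044608
        'name' => $m[4],
    ];
}

$lines = [
    '2011/09/20  00:57       367,044,608 S1E04 - Cancer Man.avi',
    '2012/03/25  00:27        53,560,510 Avatar- The Legend of Korra S01E01.avi',
    'not a listing line',
];

foreach ($lines as $line) {
    $parsed = parseListingLine($line);
    if ($parsed === null) {
        continue; // skip anything that does not look like a listing line
    }
    echo $parsed['date'], ' | ', $parsed['size'], ' | ', $parsed['name'], "\n";
}
```

Checking the return value of preg_match before touching the match array avoids undefined-index notices on lines that do not fit the format. Note that the integer cast assumes file sizes fit in a PHP int; on a 32-bit build, sizes above roughly 2 GB would need to stay as strings.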