
Testing for disk cache buffers being flushed


I currently have a video file that's being converted to a different format via a shell_exec() call. The call and the format conversion both work correctly; my next step is to push that file up to an S3 bucket.

However, I'd noticed that the filesystem caching won't necessarily flush my newly-written file immediately, so I was pushing a 0-byte file to S3, even though whenever I looked at the file on the filesystem it was the correct length. Inserting an arbitrary 5-second sleep in my code between the call to shell_exec and the S3 push solved the problem, but it feels very hacky, and I have no way of knowing whether 5 seconds will always be enough, especially when working with larger video files or when the system is under load.

I'm pretty sure that I can't force a disk cache flush unless I execute a sync call (via shell_exec again), but I don't want to use that approach because it will affect every file on the server with buffered data, not just the single file that I'm manipulating.
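Note: newer PHP does offer a middle ground. PHP 8.1 added fsync(), which asks the OS to commit a single open stream's data to disk without a system-wide sync. A minimal sketch, assuming PHP 8.1+:

// Flush only this file's buffered data to disk (requires PHP 8.1+)
$fp = fopen($myFileName, 'r+');
fsync($fp);  // commits this file's dirty pages, not the whole system
fclose($fp);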

So I wrote this simple bit of code to monitor the filesize until any disk cache flush is completed:

$prevSize = -1;
$size = filesize($myFileName);
while ($prevSize < $size) {
    sleep(1);
    // Drop PHP's cached stat data so the next filesize() call is fresh
    clearstatcache(true, $myFileName);
    if ($size > 0) {
        $prevSize = $size;
    }
    $size = filesize($myFileName);
}

Basically, it just loops until at least something has been flushed to the file and the reported file size has been stable for at least a second.
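If the polling approach stays, it's worth bounding it so it can't spin forever when the conversion fails and the file never grows. A sketch of a bounded variant (the 60-second cap is illustrative):

function waitForStableSize($fileName, $maxWaitSeconds = 60) {
    $prevSize = -1;
    for ($waited = 0; $waited < $maxWaitSeconds; $waited++) {
        clearstatcache(true, $fileName);
        $size = file_exists($fileName) ? filesize($fileName) : 0;
        // Done once the file is non-empty and hasn't grown for a second
        if ($size > 0 && $size === $prevSize) {
            return true;
        }
        $prevSize = $size;
        sleep(1);
    }
    return false; // timed out: the file never stabilised
}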

What I don't know is whether a disk flush will update the size only once the entire file cache has been successfully flushed to disk, or whether it will flush a few blocks at a time, in which case I might find myself pushing a partially flushed file to S3 and ending up with a corrupted object.

Any advice would be appreciated.

EDIT

The existing code looks something like:

private static function pushToS3($oldFilePath, $s3FileName, $newFilePath) {
    self::testFileFlush($newFilePath);
    // Copy the converted file to S3 ($s3FileName is expected to be a
    // writable S3 stream path)
    file_put_contents(
        $s3FileName,
        file_get_contents($newFilePath)
    );
}

private function processVideo($oldFilePath, $s3FileName, $newFilePath) {
    // Start the conversion in the background; shell_exec returns
    // immediately with the PID of the detached ffmpeg process
    $command = "ffmpeg -i \"$oldFilePath\" -y -ar 44100 \"$newFilePath\"";
    $processID = shell_exec("nohup ".$command." >/dev/null & echo $!");

    self::pushToS3($oldFilePath, $s3FileName, $newFilePath);
    unlink($newFilePath);
    unlink($oldFilePath);
}
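For the file_put_contents() call in pushToS3 to reach the bucket, $s3FileName has to resolve to S3 somehow; one common way (an assumption here, the original code doesn't show it) is the AWS SDK's S3 stream wrapper:

// Register the s3:// stream wrapper once at bootstrap (AWS SDK for PHP)
$s3Client = Aws\S3\S3Client::factory(array('region' => 'us-east-1')); // region illustrative
$s3Client->registerStreamWrapper();

// S3 objects can then be addressed as ordinary stream paths, e.g.
// $s3FileName = 's3://my-bucket/videos/converted.flv';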

This is a mod to old legacy code that ran on a single server, simply storing the files in the server's filesystem; I've since changed the infrastructure to run on multiple AWS EC2 app servers for resilience, using S3 to share file resources between the EC2 instances. Files are uploaded to the app servers by our users, converted to FLV, and pushed to S3 so that they're available to all EC2 instances.

The longer-term solution is going to be AWS Elastic Transcoder, at which point I can simply push the originals to S3 and submit a queued request to Elastic Transcoder, but that's a while away yet.
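For reference, the Elastic Transcoder flow reduces to a single createJob call against a pipeline; a sketch using the AWS SDK for PHP (the region, pipeline ID, preset ID, and object keys below are all placeholders):

$transcoder = Aws\ElasticTranscoder\ElasticTranscoderClient::factory(array(
    'region' => 'us-east-1', // placeholder
));
$transcoder->createJob(array(
    'PipelineId' => '1111111111111-abcde1',      // placeholder pipeline ID
    'Input'      => array('Key' => 'uploads/source.mpg'),
    'Outputs'    => array(array(
        'Key'      => 'converted/output.flv',
        'PresetId' => '1351620000001-100070',    // placeholder preset ID
    )),
));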


Solution

  • Unless you're doing one of the following, the behaviour you're describing should be impossible:

    1. Writing your data to a temp file, then copying/moving it to the location you're trying to upload.
    2. Mounting the same partition with two different machines, one writing the file and the other attempting to upload it.
    3. Some sort of hacky software buffering is happening.

    Otherwise the FS cache should be completely transparent to anything running on the OS: any read of data that has been cached but not yet written to disk will be served from the cache by the OS.

    In the case of #2 you should be able to get somewhat around it by changing the caching method to write-through instead of write-back. Your write performance goes down, but data is always written immediately, and you're at much less risk of data loss.

    edit

    ffmpeg is probably touching the filename you give it, using temp file(s) to do the conversion, and then moving the finished file to the destination. I'm assuming that the script that fires off the conversion backgrounds the process, since otherwise there wouldn't be any confusion as to whether the completed file exists or not.

    What I would suggest is that instead of forking just ffmpeg into a background process and then testing whether the end file exists, you fork a separate PHP script into the background, call ffmpeg from it without backgrounding it, and then trigger the upload once the conversion is complete.

    eg:

    //user-facing.php
    <?php
    echo "Queueing your file for processing...";
    // Redirect output and detach, so shell_exec returns immediately
    // instead of waiting for the background process to close stdout
    shell_exec("/usr/bin/php /path/to/process.php /path/to/source.mpg /path/to/dest.mpg >/dev/null 2>&1 &");
    echo "Done!";
    

    and:

    //process.php
    <?php
    // Run ffmpeg in the foreground and capture its output and exit code;
    // escapeshellarg() guards against spaces and shell metacharacters
    exec(sprintf("/path/to/ffmpeg -options %s %s",
        escapeshellarg($argv[1]), escapeshellarg($argv[2])), $output, $exit_code);
    if($exit_code === 0) {
      upload_to_s3($argv[2]);
    } else {
      //notify someone of the error
    }
    

    This also lets you capture the output and exit code from ffmpeg and act on them, instead of wondering why some videos just silently fail to convert.
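    For instance, the error branch above could persist ffmpeg's output somewhere visible rather than discarding it (the log path is illustrative):

    //in place of "notify someone of the error"
    error_log(
        sprintf("ffmpeg failed (exit %d) for %s:\n%s",
            $exit_code, $argv[1], implode("\n", $output)),
        3,                             // message_type 3 = append to a file
        '/var/log/video-convert.log'   // illustrative log destination
    );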