I created a script for my website that is supposed to erase the oldest entry in the cache when a new item needs to be cached. My website is very large, with 500,000 photos, and the cache space is set to 2 GB.
These functions are what cause the trouble:
function cache_tofile($fullf, $c)
{
    error_reporting(0);
    if(strpos($fullf, "/") === FALSE)
    {
        $fullf = "./".$fullf;
    }
    $lp = strrpos($fullf, "/");
    $fp = substr($fullf, $lp + 1);
    $dp = substr($fullf, 0, $lp);
    $sz = strlen($c);
    cache_space_make($sz);
    mkdir($dp, 0755, true);
    cache_space_make($sz);
    if(!file_exists($fullf))
    {
        $h = @fopen($fullf, "w");
        if(flock($h, LOCK_EX))
        {
            ftruncate($h, 0);
            rewind($h);
            $tmo = 1000;
            $cc = 1;
            $i = fputs($h, $c);
            while($i < strlen($c) || $tmo-- > 1)
            {
                $c = substr($c, $i);
                $i = fwrite($h, $c);
            }
            flock($h, LOCK_UN);
            fclose($h);
        }
    }
    error_reporting(7);
}
function cache_space_make($sz)
{
    $ct = 0;
    $cf = cachefolder();
    clearstatcache();
    $fi = shell_exec("df -i ".$cf." | tail -1 | awk -F\" \" '{print \$4}'");
    if($fi < 1)
    {
        return;
    }
    if(($old = disk_free_space($cf)) === false)
    {
        return;
    }
    while($old < $sz)
    {
        $ct++;
        if($ct > 10000)
        {
            error_log("Deleted over 10,000 files. Is disk screwed up?");
            break;
        }
        $fi = shell_exec("rm \$(find ".$cf."cache -type f -printf '%T+ %p\n' | sort | head -1 | awk -F\" \" '{print \$2}');");
        clearstatcache();
        $old = disk_free_space($cf);
    }
}
cachefolder() is a function that returns the correct folder name with a / appended to it.
When these functions are executed, Apache's CPU usage sits between 95% and 100%, and other services on the server are extremely slow to access during that time. I also noticed in WHM that cache disk usage is at 100% and refuses to drop until I clear the cache; I was expecting something more like 90%.
What I am trying to do in the cache_tofile function is free disk space so the folder can be created, and then free disk space again so the cache file can be written. The cache_space_make function takes one parameter, the amount of disk space to free up.
In that function I use shell commands to find the oldest file in the entire cache directory tree, because I was unable to find native PHP functions to do so.
The cache file format is as follows:
/cacherootfolder/requestedurl
For example, if one requests http://www.example.com/abc/def, then the folder abc is created and the file def is written inside it, so the full path on the system will be:
/cacherootfolder/abc/def
If one requests http://www.example.com/111/222, then the folder 111 is created and the file 222 is written inside it:
/cacherootfolder/111/222
In both cases the file contains the same content the user would get for that URL (for example, /cacherootfolder/111/222 contains the same content one would see when viewing the source of http://www.example.com/111/222).
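In other words, the cache path is just the cache root with the request path appended, along these lines (purely an illustration; $content stands for the generated page and the real code may differ):

$uri = ltrim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), "/");  // e.g. "abc/def"
cache_tofile(cachefolder().$uri, $content);                           // -> /cacherootfolder/abc/def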
The intent of the caching system is to deliver all web pages at optimal speed.
My question, then, is: how do I prevent the system from locking up when the cache is full? Is there better code I can use than what I provided?
I would start by replacing the || in your code by &&, which was most likely the intention. Currently, the loop will always run at least 1000 times - I very much hope the intention was to stop trying after 1000 times.
Also, drop the ftruncate and rewind. From the PHP Manual on fopen (emphasis mine):

'w' Open for writing only; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.

So your truncate is redundant, as is your rewind.
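Putting those two changes together, the writing part could look roughly like this (just a sketch, keeping the rest of your logic as it is; I also added a check that fopen actually succeeded):

$h = @fopen($fullf, "w");                  // "w" already truncates and rewinds
if($h !== false && flock($h, LOCK_EX))
{
    $tmo = 1000;
    $i = fwrite($h, $c);
    while($i < strlen($c) && $tmo-- > 1)   // && stops as soon as everything is written
    {
        $c = substr($c, $i);
        $i = fwrite($h, $c);
    }
    flock($h, LOCK_UN);
    fclose($h);
}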
Next, review your shell_exec's. The one outside the loop doesn't seem too much of a bottleneck to me, but the one inside the loop...
Let's say you have 1'000'000 files in that cache folder. find will happily list all of them for you, no matter how long it takes. Then you sort that list. And then you flush 999'999 entries of that list down the toilet, and only keep the first one. Then you do some stuff with awk that I don't really care about, and then you delete the file.
On the next iteration, you'll only have to go through 999'999 files, of which you discard only 999'998. See where I'm going?
I consider calling shell scripts out of pure convenience bad practice anyway, but if you do it, do it as efficiently as possible, at least!
Do one shell_exec without head -1, store the resulting list in a variable, and iterate over it.
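Something along these lines, for instance (a rough sketch that reuses your find invocation; $cf and $sz are the same variables as in your function):

// One find over the whole tree, oldest files first, then delete from PHP until enough space is free.
$list = shell_exec("find ".$cf."cache -type f -printf '%T+ %p\n' | sort");
foreach(explode("\n", trim((string)$list)) as $line)
{
    if($line === "" || disk_free_space($cf) >= $sz)
    {
        break;
    }
    $path = substr($line, strpos($line, " ") + 1);   // strip the timestamp column
    unlink($path);
    clearstatcache();
}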
Although it might be better to abandon shell_exec altogether and instead program the corresponding routines in PHP (one could argue that find and rm are machine code, and therefore faster than code written in PHP to do the same task, but there sure is a lot of overhead for all that IO redirection).
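A pure-PHP cache_space_make could look something like this (only a sketch: it walks the cache tree once with the SPL iterators, sorts by modification time, and deletes oldest-first until enough space is free; I left out your df -i inode check for brevity):

function cache_space_make($sz)
{
    $cf = cachefolder();
    clearstatcache();
    if(disk_free_space($cf) >= $sz)
    {
        return;
    }
    // Collect all cache files with their modification times in a single pass.
    $files = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cf."cache", FilesystemIterator::SKIP_DOTS)
    );
    foreach($it as $f)
    {
        if($f->isFile())
        {
            $files[$f->getPathname()] = $f->getMTime();
        }
    }
    asort($files);                       // oldest first
    foreach($files as $path => $mtime)
    {
        unlink($path);
        clearstatcache();
        if(disk_free_space($cf) >= $sz)
        {
            break;
        }
    }
}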
Please do all that, and then see how bad it still performs.
If the results are still unacceptable, I suggest you put in some code to measure the time certain parts of those functions require (tip: microtime(true)) or use a profiler, like XDebug, to see where exactly most of your time is spent.
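Something as simple as this around the suspicious calls already tells you a lot:

$t0 = microtime(true);
cache_space_make($sz);
error_log("cache_space_make took ".round(microtime(true) - $t0, 3)." seconds");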
Also, why did you turn off error reporting for that block? Looks more than suspicious to me.
And as a little bonus, you can get rid of $cc since you're not using it anywhere.