I am writing a PHP script to scrape the images and their recommended dimensions from https://gtmetrix.com/reports/example.com/a_unique_code.
After extracting each image path and its suggested new width and height, I will optimize my images programmatically.
The following is the relevant portion of the HTML returned from that URL:
<tr class="rules-details" style="display: none">
<td colspan="4">
<a href="/serve-scaled-images.html" class="rule-help btn hover-tooltip" data-tooltip-interactive data-tooltip-max-width="450" title="<h4>Serve scaled images</h4><p>Serving appropriately-sized images can save many bytes of data and improve the performance of your webpage, especially on low-powered (eg. mobile) devices.</p><p class="rule-help-tooltip-more"><a href="/serve-scaled-images.html">Read more</a></p>"><i class="sprite-question"></i><span class="resp-hidden">What's this mean?</span></a>
<div>
<p>The following images are resized in HTML or CSS. Serving scaled images could save 1.3MiB (45% reduction).
<ul>
<li><a href="https://www.example.com/Pictures/thumbs/0029.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0029.jpg</a> is resized in HTML or CSS from 300x623 to 123x200. Serving a scaled image could save 51.3KiB (86% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0133.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0133.jpg</a> is resized in HTML or CSS from 300x578 to 135x200. Serving a scaled image could save 44.0KiB (84% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0075.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0075.jpg</a> is resized in HTML or CSS from 300x390 to 176x200. Serving a scaled image could save 43.2KiB (69% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0057.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0057.jpg</a> is resized in HTML or CSS from 300x436 to 174x200. Serving a scaled image could save 35.0KiB (73% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 31.4KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.9KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0093.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0093.jpg</a> is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).</li>
</ul>
</p>
</div>
</td>
</tr>
After advice from John Conde to use a DOM parser, here is my coding attempt:
$html = file_get_contents('https://gtmetrix.com/reports/example.com/a_unique_code');
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
$stack = array();
$expression = './/tr[contains(concat(" ", normalize-space(@class), " "), " rules-details ")]';
foreach ($xpath->evaluate($expression) as $tr)
{
array_push($stack, $tr->nodeValue);
}
$i=0;
foreach ($stack as $string)
{
$search_string = $string;
$find = 'reduction';
$pos = strpos($search_string, $find);
if($pos===false){}
else
{
$string = str_replace("What's this mean?","",$string);
$string = trim(preg_replace("/\s+/", " ", $string));
$string_array = explode(').', $string);
foreach ($string_array as $sentence)
{
// Skip the summary sentence that precedes the list of images
if (strpos($sentence, 'The following images are resized in HTML or CSS.') !== false)
{
continue;
}
// Stop at the next rule's text; everything after it is unrelated to scaled images
if (strpos($sentence, 'Optimize the following images to reduce their size by') !== false)
{
break;
}
echo '<pre>'.$sentence.'</pre>';
}
}
}
The question is: given the following string, how do I extract the URL and the second (target) image dimensions?
example.com/Pictures/thumbs/0093.jpg is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).
I need:
example.com/Pictures/thumbs/0093.jpg
138x200
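One way to pull both values out of a single sentence like the one above is one regular expression with capture groups. This is only a minimal sketch: the function name `extract_resize_hint` and the array keys are hypothetical, and it assumes the wording "is resized in HTML or CSS from WxH to WxH." stays exactly as shown.

```php
<?php
// Hypothetical helper: extract the URL and the target size from one sentence.
function extract_resize_hint(string $sentence): ?array
{
    // Group 1: the URL (everything up to the first space).
    // Groups 2 and 3: the width and height that follow " to ".
    $pattern = '/^(\S+) is resized in HTML or CSS from \d+x\d+ to (\d+)x(\d+)\./';
    if (preg_match($pattern, $sentence, $m) !== 1) {
        return null; // The sentence does not follow the expected wording
    }
    return ['url' => $m[1], 'width' => (int) $m[2], 'height' => (int) $m[3]];
}

$sentence = 'example.com/Pictures/thumbs/0093.jpg is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).';
$hint = extract_resize_hint($sentence);
printf("%s\n%dx%d\n", $hint['url'], $hint['width'], $hint['height']);
```

Returning `null` on a non-matching sentence lets the caller skip unrelated text instead of aborting.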
I will be optimizing this prototype script, but this is how I am implementing John Conde's answer:
<?php
// #########################################
// AUTOMATED IMAGE OPTIMIZATION
// #########################################
class Image
{
public $image_url;
public $image_name;
public $image_path;
public $image_full_path;
public $original_size;
public $new_size;
}
$debugging = true;
if($debugging === true){echo '<ul class="Results" style="display:block; height:auto;">';}
try
{
$HTML = file_get_contents('https://gtmetrix.com/reports/www.example.com/a_unique_code');// Get Webpage
switch($HTML)
{
case false:
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## FATAL ERROR - PROGRAM ABORTED ##</b></span>';
echo '<span><b>Message:</b> Could not retrieve the HTML document</span>';
echo '</li>';
error_clear_last();
exit;
}
break;
default:// START OF WRAPPER
$DOMdoc = new DOMDocument();// Object to store an HTML document
libxml_use_internal_errors(true);// Collect libxml parse warnings internally instead of emitting them
$html = @$DOMdoc->loadHTML($HTML);// Parse the HTML
$racks = (new DOMXPath($DOMdoc))->query('//tr/td/div//ul/li');// Select every matching list item with an XPath query
$images_info_array = array();// Array for storing image details objects
$document_root = $_SERVER['DOCUMENT_ROOT'];// Define the document root
foreach($racks as $rack)// Traverse over the HTML structure
{
// Define a pattern to search for
$expression = "/https?:\/\/[^\",]+ is resized in HTML or CSS from \d{1,4}x\d{1,4} to \d{1,4}x\d{1,4}\./";
if(preg_match_all($expression, $rack->nodeValue, $matched) == 1)// If the pattern is found then
{
$url = $rack->firstChild->nodeValue;// Get the URL from the string
preg_match_all('/\d{1,4}x\d{1,4}/', $rack->nodeValue, $matches);// Get the image dimensions from the string
[$original_size, $new_size] = $matches[0];// Unpack the original and the recommended size
$url_parts = parse_url($url);// Break the URL up into sections
$directory_path = $url_parts['path'];// Get the directory path without the domain
$path_parts = pathinfo($directory_path);// Get information about a file path
$position = strpos($directory_path, '/');// Find the first / in the file path
if ($position !== false)// If found
{
$new_directory_path = substr_replace($directory_path, "", $position, strlen('/'));// Remove the /
$image_info = new Image();// Create a new Image Object
$image_info->image_url = $url;// Store the image URL
$image_info->image_name = basename($url);// Store just the image name
$image_info->image_path = $path_parts['dirname'];// Store image directory without domain & file name
$image_info->image_full_path = $new_directory_path;// Store the file path relative to the document root
$image_info->original_size = $original_size;// Store the original image size
$image_info->new_size = $new_size;// Store the new image size
array_push($images_info_array, $image_info);// Add the image information to an array
}else{
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Warning_Msg">';
echo '<span><b>## WARNING - FILE PATH CHARACTER MISSING ##</b></span>';
echo '<span><b>Message:</b> / in the file path not found</span>';
echo '</li>';
error_clear_last();
}
}
}else{// If the pattern is not found then
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## FATAL ERROR - PROGRAM ABORTED ##</b></span>';
echo '<span><b>Message:</b> Could not find the pattern required to extract the URL & size information</span>';
echo '</li>';
error_clear_last();
exit;
}
}
}
foreach($images_info_array as $image_info)// Traverse the image info array
{
if(file_exists($document_root.'/'.$image_info->image_full_path))// Check if the image exists under the document root
{
$temp_path = $document_root.$image_info->image_path.'/temp/';// Define a temporary folder location
switch(file_exists($temp_path))// Check if the temporary folder exists
{
case true:// If it does recursively delete it
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($temp_path, RecursiveDirectoryIterator::SKIP_DOTS), RecursiveIteratorIterator::CHILD_FIRST);
foreach ($files as $fileinfo)
{
$todo = ($fileinfo->isDir() ? 'rmdir' : 'unlink');
$todo($fileinfo->getRealPath());
}
rmdir($temp_path);
break;
case false:// If it does not exist create it
mkdir($temp_path, 0777);// If it doesnt create the temporary folder
break;
}
// Define the convert command for recommended optimization of the image
$command = 'convert -thumbnail '.escapeshellarg($image_info->new_size).' '.escapeshellarg($document_root.'/'.$image_info->image_full_path).' '.escapeshellarg($document_root.$image_info->image_path.'/temp/'.$image_info->image_name).' 2>&1';
$last_line = system($command, $return_value);// Run the defined command
if($debugging === true)
{
switch ($return_value === 0)// system() reports the exit status; 0 means the command succeeded
{
case true:
echo '<li class="Normal_Message">';
echo '<span><b>MESSAGE - THE COMMAND COMPLETED SUCCESSFULLY</b></span>';
echo '<span><b>Command:</b> '.$command.'</span>';
echo '<span><b>Directory:</b> '.$image_info->image_full_path.'</span>';
echo '<span><b>Resized:</b> '.$image_info->new_size.'</span>';
echo '<span><b>Returned:</b> '.$return_value.'</span>';
echo '<span><b>Output:</b> '.$last_line.'</span>';
echo '</li>';
break;
case false:
$error = error_get_last();
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## ERROR - THE COMMAND DID NOT COMPLETE ##</b></span>';
echo '<span><b>TYPE:</b> '.$error['type'].'</span>';
echo '<span><b>MESSAGE:</b> '.$error['message'].'</span>';
echo '<span><b>FILE:</b> '.$error['file'].'</span>';
echo '<span><b>LINE:</b> '.$error['line'].'</span>';
echo '</li>';
error_clear_last();
break;
default:
break;
}
}
}
else// If the file does not exist
{
echo '<li class="Warning_Message" style="display:block; height:auto;">The file doesn\'t exist</li>';
}
}
break;// END OF WRAPPER
}
}
catch(Exception $Error_Message)
{
echo $Error_Message;
}
if($debugging === true){echo '</ul>';}
?>
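The path handling in the script above (`parse_url`, `pathinfo`, and stripping the leading slash) can be isolated into one small function, which makes it easier to test on its own. This is a sketch under assumptions: the function name `url_to_paths` and the array keys are hypothetical and do not come from the original script.

```php
<?php
// Hypothetical helper: derive the file name and paths from an image URL.
function url_to_paths(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH); // e.g. "/Pictures/thumbs/0093.jpg"
    if (!is_string($path) || $path === '') {
        return null; // The URL has no path component
    }
    return [
        'name'      => basename($path),   // File name only
        'directory' => dirname($path),    // Directory, without domain or file name
        'relative'  => ltrim($path, '/'), // Path relative to the document root
    ];
}

$paths = url_to_paths('https://www.example.com/Pictures/thumbs/0093.jpg');
print_r($paths);
```

Using `ltrim` removes the leading slash in one step, which is what the `strpos`/`substr_replace` pair in the script does for paths that start with a single `/`.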
This will parse that HTML and output the text you are looking for:
$html = '<tr class="rules-details" style="display: none">
<td colspan="4">
<a href="/serve-scaled-images.html" class="rule-help btn hover-tooltip" data-tooltip-interactive data-tooltip-max-width="450" title="<h4>Serve scaled images</h4><p>Serving appropriately-sized images can save many bytes of data and improve the performance of your webpage, especially on low-powered (eg. mobile) devices.</p><p class="rule-help-tooltip-more"><a href="/serve-scaled-images.html">Read more</a></p>"><i class="sprite-question"></i><span class="resp-hidden">What\'s this mean?</span></a>
<div>
<p>The following images are resized in HTML or CSS. Serving scaled images could save 1.3MiB (45% reduction).
<ul>
<li><a href="https://www.example.com/Pictures/thumbs/0029.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0029.jpg</a> is resized in HTML or CSS from 300x623 to 123x200. Serving a scaled image could save 51.3KiB (86% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0133.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0133.jpg</a> is resized in HTML or CSS from 300x578 to 135x200. Serving a scaled image could save 44.0KiB (84% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0075.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0075.jpg</a> is resized in HTML or CSS from 300x390 to 176x200. Serving a scaled image could save 43.2KiB (69% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0057.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0057.jpg</a> is resized in HTML or CSS from 300x436 to 174x200. Serving a scaled image could save 35.0KiB (73% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 31.4KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.9KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0093.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0093.jpg</a> is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).</li>
</ul>
</p>
</div>
</td>
</tr>';
$doc = new DOMDocument();
$html = @$doc->loadHTML($html);
$items = (new DOMXPath($doc))->query('//tr/td/div//ul/li');
foreach ($items as $item) {
$url = $item->firstChild->nodeValue;
preg_match_all('/\d{1,3}x\d{1,3}/', $item->nodeValue, $matches);
[$original, $resized] = $matches[0];
printf('URL:%s Original: %s Resized: %s%s', $url, $original, $resized, PHP_EOL);
}
Outputs
URL:https://www.example.com/Pictures/thumbs/0029.jpg Original: 300x623 Resized: 123x200
URL:https://www.example.com/Pictures/thumbs/0133.jpg Original: 300x578 Resized: 135x200
URL:https://www.example.com/Pictures/thumbs/0075.jpg Original: 300x390 Resized: 176x200
URL:https://www.example.com/Pictures/thumbs/0057.jpg Original: 300x436 Resized: 174x200
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/0093.jpg Original: 300x458 Resized: 138x200
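Note that thumb.png appears four times in the output above. If the results feed a resizing step, it may help to collapse duplicates and split each size into integers first. The following is a minimal sketch, assuming the loop collects rows into an array instead of printing them; the `$rows` shape and the function name `unique_targets` are hypothetical.

```php
<?php
// Hypothetical input: rows shaped like the values printed above.
$rows = [
    ['url' => 'https://www.example.com/Pictures/thumbs/thumb.png', 'resized' => '68x46'],
    ['url' => 'https://www.example.com/Pictures/thumbs/thumb.png', 'resized' => '68x46'],
    ['url' => 'https://www.example.com/Pictures/thumbs/0093.jpg', 'resized' => '138x200'],
];

// Hypothetical helper: keep one target size per URL, split into width and height.
function unique_targets(array $rows): array
{
    $targets = [];
    foreach ($rows as $row) {
        [$width, $height] = array_map('intval', explode('x', $row['resized']));
        // A later row with the same URL overwrites the earlier one.
        $targets[$row['url']] = ['width' => $width, 'height' => $height];
    }
    return $targets;
}

$targets = unique_targets($rows);
foreach ($targets as $url => $size) {
    printf("%s => %dx%d\n", $url, $size['width'], $size['height']);
}
```

Keying the array by URL means each image is resized once, even when the report lists it several times.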