This is a very simple-to-understand question.
I'm having a user submit a URL, for example "http://example.com/path/filename.html".
I'm using PHP's dirname() function to get the so-called "base" of this URL. For the above example, that would be "http://example.com/path".
My problem arises when the user enters this:
http://example.com/blog
If you type the above into your browser, you will see the index.php or index.html page in the folder called "blog". However, PHP's dirname() will return only "http://example.com".
I'm not sure whether it treats "blog" as an extension-less file (if such a thing exists), but I can't find a solution.
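For reference, dirname() works purely on the string it is given: it removes the last path component and knows nothing about how a web server would resolve the URL. A minimal sketch of the two cases described above:

```php
<?php
// dirname() treats its argument as a plain path string and strips the last component.
echo dirname('http://example.com/path/filename.html'), "\n"; // http://example.com/path
echo dirname('http://example.com/blog'), "\n";               // http://example.com
```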
I first tried getting the extension of the URL using this quick method:
$url = 'http://example.com/index.php';
$parts = explode('.', $url);
$file_extension = end($parts); // end() takes its argument by reference, so it needs a variable
Then I would check whether the extension exists using PHP's empty(). If it exists, a filename was entered after the folder, such as "http://example.com/path/file.html", and dirname() is perfect. If it doesn't, no file was entered and the last item in the path is a folder, so the URL is already "the base".
However, in the case of simply "http://example.com/path/", the above would return "com/path/" as the file extension, which is clearly not a real extension. Because that value is not empty, I would then wrongly use the dirname() function and cut off "/path/".
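One way to keep the dots in the host name out of the extension check is to isolate the path first. The sketch below is an assumption, not part of the attempts described above: pathExtension() is a hypothetical helper, and it relies on the URL including a scheme so that parse_url() can separate the host from the path.

```php
<?php
// Hypothetical helper: return the extension of the URL's path only, ignoring the host.
function pathExtension(string $url): string
{
    // parse_url() returns null when the URL has no path and false when it cannot parse it.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    return pathinfo($path, PATHINFO_EXTENSION);
}

var_dump(pathExtension('http://example.com/path/file.html')); // string(4) "html"
var_dump(pathExtension('http://example.com/path/'));          // string(0) ""
var_dump(pathExtension('http://example.com'));                // string(0) ""
```

Without a scheme (for example "example.com/path"), parse_url() treats the whole string as a path, so the host's ".com" can still be picked up as an extension.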
EDIT:
Taking the extension of basename($url) won't work either, because if the user enters "http://example.com", basename() returns "example.com", the extension of which is supposedly ".com".
EDIT: OK, last attempt before I give up:
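function getPath($url) {
    $parts = explode("/", $url);
    // dirname values that mean the URL is just a domain with no path
    $patharray = array(".", "http:", "https:");
    // Drop the last segment only if it looks like a file (contains a dot)
    if (!in_array(pathinfo($url, PATHINFO_DIRNAME), $patharray) && strpos($parts[count($parts) - 1], ".") !== false) {
        unset($parts[count($parts) - 1]);
    }
    $url = implode("/", $parts);
    // Always end with a trailing slash
    if (substr($url, -1) != '/') {
        $url .= "/";
    }
    return $url;
}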
echo getPath("http://www.google.com/blog/testing.php")."\n";
echo getPath("www.google.com/blog/testing.php")."\n";
echo getPath("http://www.google.com/blog/")."\n";
echo getPath("http://www.google.com/blog")."\n";
echo getPath("http://www.google.com")."\n";
echo getPath("http://www.google.com/")."\n";
echo getPath("www.google.com/")."\n";
echo getPath("www.google.com")."\n";
If the last portion of a URL contains a ".", that portion is treated as a filename and removed; otherwise the URL is left alone. The function uses pathinfo() to detect when the URL is just a domain ("google.com" or "http://www.google.com"), and in that case it leaves the last portion alone even though it contains a ".".
Here is the script output:
http://www.google.com/blog/
www.google.com/blog/
http://www.google.com/blog/
http://www.google.com/blog/
http://www.google.com/
http://www.google.com/
www.google.com/
www.google.com/
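The dot check that getPath() applies to the last segment is a heuristic, so it is worth seeing in isolation. The sketch below uses a hypothetical helper, lastSegmentHasDot(), and the "releases/v1.2" folder is a made-up example of a folder whose name contains a dot.

```php
<?php
// Hypothetical helper isolating the test getPath() applies to the last URL segment.
function lastSegmentHasDot(string $url): bool
{
    $parts = explode('/', $url);
    return strpos(end($parts), '.') !== false;
}

var_dump(lastSegmentHasDot('http://example.com/blog/testing.php')); // bool(true)
var_dump(lastSegmentHasDot('http://example.com/blog'));             // bool(false)
var_dump(lastSegmentHasDot('http://example.com/releases/v1.2'));    // bool(true)
```

In the last call a dotted folder name is indistinguishable from a file, so getPath() would strip it. From the URL string alone there is no reliable way to tell the two apart.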