
PHP dirname() loses last folder


The Problem

This is a very simple-to-understand question.

I let users submit a URL, for example "http://example.com/path/filename.html".

I'm using PHP's dirname() function to get the so-called "base" of this URL. For the above example, that would be "http://example.com/path".

My problem arises when the user enters this:

http://example.com/blog

If you type the above into your browser, you will see the index.php or index.html page in the folder called "blog". However, PHP's dirname() returns only "http://example.com".

I'm not sure whether it treats "blog" as a file without an extension (if such a thing exists), but I can't find a solution.
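For reference, the behaviour is easy to reproduce. dirname() works purely on the string: it strips trailing slashes, then removes the last path component, and it never asks the server whether "blog" is a folder or a file. A minimal sketch using the example URLs from above:

```php
<?php
// dirname() only manipulates the string; it knows nothing about the server.
$urls = [
    'http://example.com/path/filename.html', // gives http://example.com/path
    'http://example.com/blog',               // gives http://example.com
    'http://example.com/blog/',              // trailing slash ignored: http://example.com
];
foreach ($urls as $url) {
    echo $url, ' -> ', dirname($url), "\n";
}
```

From dirname()'s point of view, "/blog" and "/filename.html" are the same thing: the last component, which it removes.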

Things I've Tried

I first tried getting the extension of the URL using this quick method:

$url = 'http://example.com/index.php';
$pieces = explode('.', $url);   // end() takes its argument by reference
$file_extension = end($pieces); // "php"

Then I would check whether the extension is empty with PHP's empty(). If it isn't, a filename was entered after the folder, such as "http://example.com/path/file.html", and dirname() works perfectly. If it is empty, no file was entered and the last item in the path is a folder, so the URL is already "the base".

However, for plain "http://example.com/path/", the above returns "com/path/" (everything after the dot in the domain) as the file extension, which obviously isn't one. Because it is not empty, I would then call dirname() and wrongly cut off "/path/".
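The heuristic described above can be sketched as follows; naiveExtension() is a hypothetical name for the explode()/end() idea, not a built-in. It suggests that any dotted host name makes the "extension" non-empty, so folders end up classified as files:

```php
<?php
// Hypothetical helper: everything after the last "." in the whole URL.
function naiveExtension(string $url): string
{
    $pieces = explode('.', $url);
    return end($pieces); // end() needs a variable, not a function's return value
}

$urls = [
    'http://example.com/path/file.html', // "html"      -> file (correct)
    'http://example.com/path/',          // "com/path/" -> file (wrong)
    'http://example.com/blog',           // "com/blog"  -> file (wrong)
];
foreach ($urls as $url) {
    $ext = naiveExtension($url);
    echo $url, ' => "', $ext, '" => ', empty($ext) ? 'folder' : 'file', "\n";
}
```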

EDIT:

Taking the extension of basename($url) won't work either: if the user enters "http://example.com", basename() returns "example.com", whose "extension" is supposedly "com".
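The same check with basename() and pathinfo() (basenameExtension() is a hypothetical wrapper) bears out the edit: a bare domain reports "com" as its extension, while a real folder such as "blog" reports none.

```php
<?php
// Hypothetical wrapper: extension of the last component of the URL.
function basenameExtension(string $url): string
{
    // basename('http://example.com') is 'example.com', so this returns 'com'
    return pathinfo(basename($url), PATHINFO_EXTENSION);
}

echo basenameExtension('http://example.com/path/file.html'), "\n"; // html
echo basenameExtension('http://example.com/blog'), "\n";           // empty line
echo basenameExtension('http://example.com'), "\n";                // com
```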

Hopefully someone has had the same problem and knows the solution. I'm still looking, but any answers are greatly appreciated!


Solution

  • EDIT: OK, one last attempt before I give up:

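    function getPath($url){
        $parts = explode("/", $url);
        // pathinfo($url, PATHINFO_DIRNAME) gives "." for "www.google.com" and
        // "http:"/"https:" for "http://www.google.com": the URL is a bare domain
        $patharray = array(".", "http:", "https:");
        $last = count($parts) - 1;
        // Drop the last segment only if it contains a "." (looks like a file)
        // and the URL is not a bare domain whose last segment is the host
        if (!in_array(pathinfo($url, PATHINFO_DIRNAME), $patharray) && strpos($parts[$last], ".") !== false)
            unset($parts[$last]);
        $url = implode("/", $parts);
        // Always return the base with a trailing slash
        if (substr($url, -1) != '/')
            $url .= "/";
        return $url;
    }
    echo getPath("http://www.google.com/blog/testing.php")."\n";
    echo getPath("www.google.com/blog/testing.php")."\n";
    echo getPath("http://www.google.com/blog/")."\n";
    echo getPath("http://www.google.com/blog")."\n";
    echo getPath("http://www.google.com")."\n";
    echo getPath("http://www.google.com/")."\n";
    echo getPath("www.google.com/")."\n";
    echo getPath("www.google.com")."\n";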
    

    Any URL whose last segment contains a "." has that segment stripped off; otherwise the URL is left alone. pathinfo() is used to detect a bare domain ("www.google.com" or "http://www.google.com"), in which case the last segment is kept even though it contains a ".". Finally, a trailing slash is added if missing. Here is the script output:

    http://www.google.com/blog/
    www.google.com/blog/
    http://www.google.com/blog/
    http://www.google.com/blog/
    http://www.google.com/
    http://www.google.com/
    www.google.com/
    www.google.com/
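If the URLs always include a scheme (as in the question's examples), a different approach, not the answer's, is to let parse_url() split off the host first, so "example.com" can never look like a file. The sketch below keeps the answer's assumptions (a last path segment containing "." is a file; the result ends with "/"). getBase() is a hypothetical name, and user info, query strings and fragments are simply dropped:

```php
<?php
// Hypothetical alternative to getPath() based on parse_url().
// Assumes an absolute URL with a scheme; returns null otherwise.
function getBase(string $url): ?string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null; // e.g. "www.google.com/blog" has no scheme or host
    }
    $base = $p['scheme'] . '://' . $p['host'];
    if (isset($p['port'])) {
        $base .= ':' . $p['port'];
    }
    // With a host present, the path is either missing or starts with "/"
    $path = $p['path'] ?? '/';
    $slash = strrpos($path, '/');
    $lastSegment = substr($path, $slash + 1);
    if (strpos($lastSegment, '.') !== false) {
        // Looks like a file name: keep everything up to and including the "/"
        $path = substr($path, 0, $slash + 1);
    } elseif (substr($path, -1) !== '/') {
        // Looks like a folder: just add the trailing slash
        $path .= '/';
    }
    return $base . $path;
}

foreach (['http://example.com/path/filename.html', 'http://example.com/blog',
          'http://example.com', 'https://example.com:8080/a/b.php?x=1'] as $url) {
    echo $url, ' -> ', getBase($url) ?? 'null', "\n";
}
```

Unlike getPath(), this rejects scheme-less input such as "www.google.com/blog" instead of guessing. Both versions share one blind spot: a folder whose name contains a dot (for example "/v1.2") is treated as a file.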