In a now-migrated question about human-readable URLs, I allowed myself to elaborate on a little hobby-horse of mine:
When I encounter URLs like
http://www.example.com/product/123/subpage/456.html
I always think that this is an attempt at creating meaningful, hierarchical URLs which, however, is not entirely hierarchical. What I mean is that you should be able to slice off one level at a time. The above URL violates this principle in two ways:
1. /product/123 is one piece of information represented as two levels. It would be more correctly represented as /product:123 (or whatever delimiter you like).
2. /subpage is very likely not an entity in itself; i.e., you cannot go up one level from 456.html, as http://www.example.com/product/123/subpage is "nothing".

Therefore, I find the following more correct:
http://www.example.com/product:123/456.html
Here, you can always navigate up one level at a time:
http://www.example.com/product:123/456.html — The subpage
http://www.example.com/product:123 — The product page
http://www.example.com/ — The root

Following the same philosophy, the following would make sense [and provide an additional link to the products listing]:
http://www.example.com/products/123/456.html
Where:
http://www.example.com/products/123/456.html — The subpage
http://www.example.com/products/123 — The product page
http://www.example.com/products — The list of products
http://www.example.com/ — The root
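As a minimal sketch of what this scheme implies for routing, every prefix of the path resolves to a real page (assuming Flask; the handler names and return values are illustrative):

    from flask import Flask

    app = Flask(__name__)

    # Every level of /products/123/456.html is a page in its own right,
    # so truncating the URL always lands somewhere meaningful.

    @app.route("/")
    def root():
        return "The root"

    @app.route("/products")
    def product_list():
        return "The list of products"

    @app.route("/products/<int:product_id>")
    def product_page(product_id):
        return f"Product page {product_id}"

    @app.route("/products/<int:product_id>/<int:subpage_id>.html")
    def subpage(product_id, subpage_id):
        return f"Subpage {subpage_id} of product {product_id}"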
My primary motivation for this approach is that if every "path element" (delimited by /) is self-contained¹, you will always be able to navigate to the "parent" by simply removing the last element of the URL. This is what I (sometimes) do in my file explorer when I want to go to the parent directory. Following the same line of logic, the user (or a search engine / crawler) can do the same. Pretty smart, I think.
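As a minimal sketch of that "remove the last element" navigation (parent_url is my own illustrative helper, built on Python's standard library):

    from urllib.parse import urlsplit, urlunsplit
    import posixpath

    def parent_url(url: str) -> str:
        """Go up one hierarchy level by dropping the last path segment."""
        scheme, netloc, path, _query, _fragment = urlsplit(url)
        parent_path = posixpath.dirname(path.rstrip("/"))
        return urlunsplit((scheme, netloc, parent_path, "", ""))

    # Walking up from the subpage:
    url = "http://www.example.com/product:123/456.html"
    while urlsplit(url).path not in ("", "/"):
        url = parent_url(url)
        print(url)
    # http://www.example.com/product:123
    # http://www.example.com/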
On the other hand (and this is the important bit of the question): while I can never prevent a user from trying to access a URL he has amputated himself, am I wrong in asserting (and honouring the assertion) that a search engine might do the same? I.e., is it reasonable to expect that no search engine (or really: Google) would try to access http://www.example.com/product/123/subpage (point 2, above)? (Or am I really only taking the human factor into account here?)
This is not a question about personal preference. It's a technical question about what I can expect of a crawler / indexer, and to what extent I should take non-human URL manipulation into account when designing URLs.
Also, the structural "depth" of http://www.example.com/product/123/subpage/456.html is 4, whereas http://www.example.com/products/123/456.html is only 3. Rumour has it that this depth influences search engine ranking; at least, so I was told. (It is now evident that SEO is not what I know most about.) Is this (still?) true: does the hierarchical depth (number of directories) influence search ranking?
So, is my "hunch" technically sound or should I spend my time on something else?
Example: Doing it (almost) right
Good ol' SO gets this almost right. Case in point: profiles, e.g., http://stackoverflow.com/users/52162:

http://stackoverflow.com/users/52162 — Single profile
http://stackoverflow.com/users — List of users
http://stackoverflow.com/ — Root

However, the canonical URL for the profile is actually http://stackoverflow.com/users/52162/jensgram, which seems redundant (the same end-point represented on two hierarchical levels). Alternative: http://stackoverflow.com/users/52162-jensgram (or any other delimiter consistently used).
¹ Carries a complete piece of information not dependent on "deeper" elements.
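As a sketch of how the canonical /users/52162/jensgram pattern is typically handled: the numeric id alone identifies the resource, and a missing or wrong slug is corrected by redirect (assuming Flask; the user table and names are illustrative):

    from flask import Flask, redirect

    app = Flask(__name__)
    USERS = {52162: "jensgram"}  # id -> canonical slug (dummy data)

    @app.route("/users/<int:user_id>")
    @app.route("/users/<int:user_id>/<slug>")
    def profile(user_id, slug=None):
        canonical = USERS.get(user_id)
        if canonical is None:
            return "No such user", 404
        # /users/52162 and /users/52162/anything both 301 to the canonical
        # URL, so the id stays the only significant part of the path.
        if slug != canonical:
            return redirect(f"/users/{user_id}/{canonical}", code=301)
        return f"Profile of {canonical}"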
Hierarchical URLs of this kind, "http://www.example.com/product:123/456.html", are as useless as "http://www.example.com/product/123/subpage", because when users see your URLs they don't care about identifiers from your database; they want meaningful paths. This is why Stack Overflow puts question titles into URLs: "http://stackoverflow.com/questions/4017365/human-readable-urls-preferably-hierarchical-too".
Google advises against replacing ordinary query strings like "http://www.example.com/?product=123&page=456", because when every site develops its own scheme, the crawler doesn't know what each part means or whether it is important. Google has built sophisticated mechanisms to find the important arguments and ignore the unimportant ones, which means you'll get more pages into the index and there will be fewer duplicates. But these algorithms often fail when web developers invent their own schemes.
If you care about both users and crawlers, you should instead use URLs that put meaningful keywords into the path, as in the Stack Overflow example above. Also, search engines give a higher rating to pages with keywords in the URL.
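As a minimal sketch of how such keyword slugs are commonly derived from a title (slugify is my own illustrative helper; real implementations also handle non-ASCII characters):

    import re

    def slugify(title: str) -> str:
        """Reduce a title to a lowercase, hyphen-separated URL slug."""
        slug = title.lower()
        slug = re.sub(r"[^a-z0-9]+", "-", slug)  # runs of other chars -> one hyphen
        return slug.strip("-")

    print(slugify("Human-readable URLs (preferably hierarchical too)"))
    # human-readable-urls-preferably-hierarchical-too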