seopermalinksrel

Is rel=self the correct rel tag to use for forum permalinks?


I have been building a forum from scratch with my friends just for fun, and we're starting to see bots and scrapers go by. The problem we're having is that you can load a page /post/1 with four replies, and each reply includes a little permalink to itself /reply/1#reply-1. If I am on /post/1 and navigate to /reply/1, I'll end up right back where I started, just with the anchor to the reply. But! Scrapers have no idea this is the case, so they're opening every /post link and then following every /reply link, and it's causing performance issues, so I've been looking around SEO sites to try to fix it.

I've started using rel=canonical on the /reply page, to tell the bots they're all the same, but as far as I can tell that doesn't help me until the bot has already loaded the page, and thus I wind up with tons of traffic. Would it be correct to change my

<a href="/reply/1#reply-1">Permalink</a>

tags to

<a href="/reply/1#reply-1" rel="self">Permalink</a>

since they should be the same content? Or would this be misusing rel="self" and there's another, better rel tag I should be using instead?


Solution

  • The self link type is not defined for HTML (but for Atom), so it can’t be used in HTML5 documents.

    The canonical link type is appropriate for your case (if you make sure that it always points to the correct page, in case the thread is paginated), but it doesn’t prevent bots from crawling the URLs.

    If you want to prevent crawling, no link type will help (not even the nofollow link type, but it’s not appropriate for your case anyway). You’d have to use robots.txt, e.g.:

    User-agent: *
    Disallow: /reply/
    

    That said, you might want to consider changing the permalink design. I think it’s not useful (neither for your users nor for bots) to have such an architecture. It’s a good practice to have exactly one URL per document, and if users want to link to a certain post, there is no reason to require a new page load if it’s actually the same document.

    So I would either use the "canonical" URL and add a fragment component (/post/1#reply-1, or what might make more sense: /threads/1#post-1), or (if you think it can be useful for your users) I would create a page that only contains the reply (with a link back to the full thread).