I'm quite interested in Facebook's BigPipe technique for improving user experience when displaying web pages. The downside is that it is heavily JavaScript-based and not at all search-engine friendly.
When developing a similar technique on my own website, I designed it so it can very easily be disabled server-side to serve more standard pages, without BigPipe enabled. Now, I'm looking for a way to make it crawler-friendly.
The easy way would be to serve non-BigPipe content to search engine crawlers / bots, and pipelined content to everyone else. This should not be considered cloaking: the content is exactly the same and the layout is the same (after BigPipe's JavaScript has been executed). The only thing that changes is the way it is delivered, to make it more crawler-friendly. But will Google see this as legitimate?
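To make that first option concrete, here is a rough sketch, assuming a Node.js/Express-style front controller purely for illustration; renderFullPage and renderBigPipe are stand-ins for the non-BigPipe and BigPipe rendering paths I already have:

const express = require('express');
const app = express();

const BOT_PATTERN = /googlebot|bingbot|slurp|duckduckbot|baiduspider/i;

function renderFullPage(req, res) {
  // Stand-in for the existing non-BigPipe renderer: one fully assembled page.
  res.send('<html><body><div id="pagelet_foo">Some content to be indexed here</div></body></html>');
}

function renderBigPipe(req, res) {
  // Stand-in for the pipelined renderer: flush the skeleton, then the pagelets.
  res.write('<html><body><div id="pagelet_foo"></div>');
  // ... flush the pagelet payloads and onPageletArrive() calls here ...
  res.end('</body></html>');
}

app.use((req, res) => {
  const ua = req.headers['user-agent'] || '';
  // Same content, same layout; only the delivery mechanism differs.
  return BOT_PATTERN.test(ua) ? renderFullPage(req, res) : renderBigPipe(req, res);
});

app.listen(3000);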
The second way would be to use more JavaScript to solve the problem. On the first request, send a non-BigPipe page that includes some JavaScript that sets a cookie. On subsequent requests, send BigPipe content only if the cookie is present. The very first page load will not be optimized, but the others will be. It looks like a great solution, but I don't really like multiplying cookies.
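Sketched in the same hypothetical Express setup as above, and reusing the placeholder renderers, the cookie variant would look something like this (the bp cookie name is just illustrative):

app.use((req, res) => {
  // A bare "bp=1" cookie marks a client that has already executed JavaScript.
  const jsCapable = /\bbp=1\b/.test(req.headers.cookie || '');
  if (jsCapable) {
    renderBigPipe(req, res); // returning JS-capable client: pipeline it
  } else {
    // First load: plain page plus a one-liner that sets the flag for next time.
    res.send(
      '<html><body><div id="pagelet_foo">Some content to be indexed here</div>' +
      "<script>document.cookie = 'bp=1; path=/';</script></body></html>"
    );
  }
});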
The third way would be to stream BigPipe content not inside HTML comments, as Facebook does, but inside <noscript> tags. This would make a pagelet look like this:
<noscript id="pagelet_payload_foo">Some content to be indexed here</noscript>
<script>onPageletArrive({id:'foo', [...]})</script>
instead of Facebook's approach:
<code id="pagelet_payload_foo"><!-- Some content to be indexed here --></code>
<script>onPageletArrive({id:'foo', [...]})</script>
This looks great and simple: both crawler-friendly and user-friendly. But it seems a little hackish to me, and it does not work in IE 7/8, because the contents of the noscript tag are ignored in the DOM. That would involve some dirty special-casing for these browsers.
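For completeness, here is roughly the client side I have in mind for that noscript variant (the pagelet_foo / pagelet_payload_foo id convention and the onPageletArrive body are my own sketch, not Facebook's actual code):

function onPageletArrive(info) {
  var payload = document.getElementById('pagelet_payload_' + info.id);
  var target = document.getElementById('pagelet_' + info.id);
  if (!payload || !target) return;

  // With scripting enabled, browsers expose the <noscript> contents as plain
  // text, so read it back as text and let innerHTML re-parse it as markup.
  // (IE 7/8 drop the contents entirely, which is the special case above.)
  target.innerHTML = payload.textContent;
}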
Then I looked more closely at what Facebook does, and it seems they are doing the same thing: pages are optimized in my browser, but the copies in Google's cache are not. I tried clearing all my browser cache and cookies and requested the page again. No matter what, I keep getting the content through BigPipe, so they are not using any cookie-based technique.
So the question is simple: how does Facebook do that? Would the first method be considered cloaking, or does it only work for Facebook because it is Facebook? Or did I miss something else?
Thanks.
The easy answer is that Facebook discriminates between search bots and regular users and serves them different content. That can be done via the user agent (as I think you're implying in your question) or by looking up the IP address to see whether it belongs to a known Google address range.
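If you go down the IP route, one way to implement that check (sketched here with Node's dns module purely as an illustration, since I don't know your stack) is the reverse-then-forward DNS verification Google describes for identifying Googlebot:

// Reverse-resolve the client IP, check the host is under googlebot.com or
// google.com, then forward-resolve that host to confirm it maps back to the
// same IP (which guards against spoofed reverse DNS records).
const dns = require('dns').promises;

async function isGooglebot(ip) {
  try {
    const [hostname] = await dns.reverse(ip);
    if (!/\.google(bot)?\.com$/.test(hostname)) return false;

    const { address } = await dns.lookup(hostname);
    return address === ip;
  } catch (err) {
    return false; // unresolvable or mismatched DNS: treat as a normal visitor
  }
}

You would want to cache the result per IP, of course, since doing two DNS lookups on every request would be slow.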
The fully static version would be my preference, because it also lets you optimise for speed, something that Google (and perhaps others) includes in its indexing.