Our website uses the Facebook Comments Box plugin. We include the comments box on our staging site, which sits behind our firewall, so Facebook can't reach it and generates the "URL is unreachable" error. That part I understand.
However, once a page is published and is reachable by Facebook, the error is still displayed. It can be fixed easily by clicking the debug link provided along with the error, but my content editors don't want to do this for every page, and they sometimes forget.
It seems like the reachable status is cached and only reset once you run the debugger. Can anyone think of another explanation?
I suppose I could omit the Facebook comments box from the staging site, but would prefer not to. Any other ideas?
The Like Button documentation explains when a page gets scraped:
When does Facebook scrape my page?
Facebook needs to scrape your page to know how to display it around the site.
Facebook scrapes your page every 24 hours to ensure the properties are up to date. The page is also scraped when an admin for the Open Graph page clicks the Like button and when the URL is entered into the Facebook URL Linter. Facebook observes cache headers on your URLs - it will look at "Expires" and "Cache-Control" in order of preference. However, even if you specify a longer time, Facebook will scrape your page every 24 hours.
The user agent of the scraper is: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
Here are three options:
You can trigger the debugger yourself by issuing a simple HTTP request from your server when you publish an article (or whatever you're publishing), so nobody has to open the debugger tool by hand.
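A minimal sketch of that publish hook in Python, using only the standard library. The `scrape=true` parameter on the Graph API is documented behaviour equivalent to pasting the URL into the linter; the `on_publish` hook name is a placeholder for wherever your CMS fires a "page published" event, and newer Graph API versions may additionally require an access token:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GRAPH_API = "https://graph.facebook.com/"

def build_rescrape_request(page_url: str) -> Request:
    """Build a POST to the Graph API asking Facebook to re-scrape page_url.

    Equivalent to submitting the URL to the debugger/linter by hand.
    Newer API versions may also require an access_token parameter.
    """
    body = urlencode({"id": page_url, "scrape": "true"}).encode()
    return Request(GRAPH_API, data=body, method="POST")

def on_publish(page_url: str) -> None:
    # Placeholder hook: call this when the article goes live so the
    # cached "unreachable" status is refreshed immediately.
    urlopen(build_rescrape_request(page_url))
```

Wiring this into the publish workflow means editors never see the stale error, because the cache is refreshed the moment a page becomes reachable.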
You can check the user-agent string of incoming requests and, if it's the Facebook scraper, let it through so that it can scrape and cache the page.
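That check can be sketched as a small request filter. The user-agent prefix comes from the string quoted in the documentation above; the `allow_request` policy helper is a hypothetical stand-in for however your firewall or staging server decides what to admit:

```python
# Prefix of the user-agent string Facebook's scraper sends; matching on
# the prefix means minor version bumps (1.0, 1.1, ...) still pass.
FACEBOOK_SCRAPER_PREFIX = "facebookexternalhit/"

def is_facebook_scraper(user_agent: str) -> bool:
    """True when a request's user-agent identifies Facebook's scraper."""
    return user_agent.startswith(FACEBOOK_SCRAPER_PREFIX)

def allow_request(user_agent: str, behind_firewall: bool) -> bool:
    # Hypothetical staging policy: everything is allowed in production;
    # behind the firewall, only the Facebook scraper gets through.
    return not behind_firewall or is_facebook_scraper(user_agent)
```

Note that user-agent strings are trivially spoofable, so this only keeps casual traffic out of staging; it is not an access-control mechanism.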
You can use different URLs for production and staging; that way the cached status of the staging pages won't affect production.