My __NEXT_DATA__ sometimes includes content that looks like a URL. Google then crawls it and reports a 404 error in Google Search Console.
Is there a simple way to block Google from looking into __NEXT_DATA__?
I tried using <!--googleoff: index--> but apparently it doesn't really work.
Edit: I even tried to base64-encode it, but by that point the page has already been rendered and sent to the client. If I change it, I get a hydration error.
I've written about this Google behavior on the Webmasters Stack Exchange site: Google follows JavaScript string as relative path - produces 404 error. I find this feature of Googlebot annoying, but it doesn't seem to hurt your site when it happens.
Google doesn't penalize sites for having 404 errors. In fact, Google prefers when sites appropriately return 404 errors for URLs that are not supposed to contain content.
Google's crawl budget is generous. Googlebot is usually willing to crawl ten or even one hundred times as many pages as it indexes. Your crawl budget will increase as your site gains reputation (in the form of links from other sites). I wouldn't worry about this eating into your crawl budget unless Googlebot is trying to crawl thousands of these URLs.
Google needs to be able to access your JavaScript to render and index your site. If Google can't see your hydrated pages, it won't be able to index your content, so you shouldn't try to block your JavaScript from Googlebot. The only way to do it would be to put this JavaScript in a separate .js file and block that file with robots.txt. See Preventing robots from crawling specific part of a page.
One way to prevent this is to change your JavaScript so the content looks less like URLs. Google seems to crawl string literals that look like a URL path, either because they contain a / or because they end in .html. If your own JS code is triggering these heuristics, you can rewrite it to break up the string literals. For example, use var foo = "/"+"path"+"/"+"file"+"."+"html" rather than var foo = "/path/file.html".
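To find which values in your page data trip these heuristics, a minimal sketch like the one below can scan the parsed __NEXT_DATA__ JSON. The function names looksLikeUrlPath and findUrlLikeStrings are hypothetical, and the heuristic itself is an assumption: Google doesn't publish its rules, so this only mirrors the two patterns described above (a string containing a / or ending in .html). The sample object is made up for illustration.

```javascript
// Hypothetical heuristic (an assumption, not Google's published rule):
// flag strings without whitespace that contain "/" or end in ".html".
function looksLikeUrlPath(value) {
  if (typeof value !== "string" || /\s/.test(value)) return false;
  return value.includes("/") || /\.html?$/i.test(value);
}

// Recursively walk parsed JSON and collect the path and value of every
// string that matches the heuristic.
function findUrlLikeStrings(node, path = "$", hits = []) {
  if (typeof node === "string") {
    if (looksLikeUrlPath(node)) hits.push({ path, value: node });
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => findUrlLikeStrings(item, `${path}[${i}]`, hits));
  } else if (node !== null && typeof node === "object") {
    for (const [key, child] of Object.entries(node)) {
      findUrlLikeStrings(child, `${path}.${key}`, hits);
    }
  }
  return hits;
}

// Made-up sample shaped loosely like __NEXT_DATA__.
const sample = {
  props: {
    pageProps: {
      title: "Hello world",
      note: "see products/widget", // contains whitespace, so not flagged
      link: "/path/file.html",
      tags: ["a", "b/c"],
    },
  },
  page: "/[slug]",
};

for (const hit of findUrlLikeStrings(sample)) {
  console.log(`${hit.path} = ${hit.value}`);
}

// The split-literal rewrite: same runtime value, but the source text no
// longer contains "/path/file.html" contiguously. A minifier may
// constant-fold it back into one literal, so inspect the built bundle.
const joined = "/" + "path" + "/" + "file" + "." + "html";
console.log(joined === "/path/file.html"); // true
```

In a Pages Router app you could run the scanner in the browser console against JSON.parse(document.getElementById("__NEXT_DATA__").textContent). Expect Next.js's own fields such as page to show up as well; the entries under props.pageProps are the ones you control.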
Another thing that would help is rendering your site server side. See Rendering: Fundamentals | Next.js. That will cause the initial page load to contain rendered HTML rather than being built from __NEXT_DATA__. Rendering your site server side can also have additional performance and SEO benefits.