Tags: web-crawler, blazor-server-side, google-crawlers

How can I write dynamic text to the <head> that the Google crawler will see?


It looks like the Google crawler only gets the static prerendered page, as it is before OnInitializedAsync has been called. This is for a Blazor Interactive Server app.

I need to populate both the <PageTitle> and <HeadContent> with data from the database. In my testing, the page in the browser shows what I expect:

<title>event 1112 • 8/12/2024 • , </title> 
<script type="application/ld+json">
{
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "event 1112",
    "description": "sdfsdfsdfsdf\r\nasdasd\r\n111\r\nqweqwe",
    "image": [
        "https://louishowe20240425.blob.core.windows.net/organizations/TestOrganization/1587542280_3_30.jpeg",
        "https://louishowe20240425.blob.core.windows.net/organizations/TestOrganization/central_bank_46.png"
    ],
    "url": "https://louishowe-dev.azurewebsites.net:443/Event/Profile%3Fid=20524",
    "endDate": "2024-08-12T12:00:00+03:00",
    "eventStatus": "https://schema.org/EventScheduled",
    "organizer": {
        "@type": "Organization",
        "name": "TestOrganization",
        "url": "https://louishowe-dev.azurewebsites.net:443/Organization/Profile%3Fid=30017",
        "email": "Test@gmail.com",
        "telephone": "+1996441551"
    },
    "startDate": "2024-08-12T11:30:00+03:00"
}
</script>

But the Google Rich Result Test shows:

<title> &#x2022;  &#x2022; </title>
<script type="application/ld&#x2B;json">{}</script>

Here, {} is what my code-behind returns when it has not yet read the data from the DB.

I asked about this on Microsoft Q&A (2nd answer) and got this answer (from an individual there who knows Blazor very well):

remember a web crawler will only have access to the pre-render html produced by blazor (no OnInitializedAsync() or events)

So, how can I give the crawler the fully rendered page instead of the initial static html?


Solution

  • The solution has three parts:

    1. Populate the page on the pre-render call if the request is from a crawler.
    2. To know when the request is from a crawler, add a crawler=true query parameter (for example, &crawler=true) to the URLs in your sitemap. Reading the request headers from inside a SignalR circuit is not easy, hence this approach.
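    3. If the JSON content is generated in your code, make the @StructuredDataProperty getter return a MarkupString, not a string. This makes no difference to the page as rendered in the browser, but it makes a big difference to what Google reads. I don't know exactly why; a likely cause is that Blazor HTML-encodes a plain string in the prerendered HTML, whereas a MarkupString is emitted as-is.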
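
The three parts above could be sketched in a code-behind roughly like this. It is a minimal sketch, not my exact code: EventProfile, IEventStore, EventInfo and FindAsync are hypothetical names, and it assumes the app normally waits until the page is interactive before reading from the DB.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Components;

// Hypothetical stand-ins for the real database access code.
public record EventInfo(string Name, DateTimeOffset StartDate);
public interface IEventStore { Task<EventInfo?> FindAsync(int id); }

// Code-behind for a hypothetical EventProfile.razor whose markup uses
// @Title inside <PageTitle> and @StructuredData inside the JSON-LD <script>.
public partial class EventProfile : ComponentBase
{
    [Inject] private IEventStore Events { get; set; } = default!;

    [SupplyParameterFromQuery(Name = "id")]
    private int Id { get; set; }

    // True only for URLs that carry crawler=true, i.e. the sitemap URLs.
    [SupplyParameterFromQuery(Name = "crawler")]
    private bool Crawler { get; set; }

    private EventInfo? _event;

    protected string Title =>
        _event is null ? "" : $"{_event.Name} • {_event.StartDate:d}";

    // A MarkupString is written to the output as-is;
    // a plain string would be HTML-encoded by Blazor.
    protected MarkupString StructuredData => new(BuildJsonLd());

    protected override async Task OnInitializedAsync()
    {
        // This also runs during prerendering, so for crawler requests the
        // data is already in the HTML that the crawler receives.
        if (Crawler)
        {
            _event = await Events.FindAsync(Id);
        }
    }

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        // Not called during prerendering: regular visitors get the data
        // once the circuit is up (assumed to be the app's normal behaviour).
        if (firstRender && _event is null)
        {
            _event = await Events.FindAsync(Id);
            StateHasChanged();
        }
    }

    private string BuildJsonLd()
    {
        if (_event is null) return "{}"; // same placeholder as in the question

        var data = new Dictionary<string, object?>
        {
            ["@context"] = "https://schema.org",
            ["@type"] = "Event",
            ["name"] = _event.Name,
            ["startDate"] = _event.StartDate.ToString("o"),
        };

        // The default System.Text.Json encoder escapes '<' and '>', so the
        // raw output cannot close the surrounding <script> element early.
        return JsonSerializer.Serialize(data);
    }
}
```

In the sitemap, the URL then carries the extra parameter, for example https://example.com/Event/Profile?id=20524&amp;crawler=true (the & has to be written as &amp; inside sitemap XML; example.com is a placeholder). Keying off a query parameter is a trade-off: anyone can add it, so it should only change when the data is loaded, never what data is shown.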