This is not an SEO question.
I am curious how to mark up HTML in a semantically correct way with regard to the language used. Please correct me if my markup is mistaken.
My question is: do I need the lang attribute on the html element when I already use the hreflang attribute on the link element?
Are the two attributes semantically different? I mean: will the self-referencing link element in both examples be understood semantically as indicating the language of the document?
The code samples below might clarify my question a bit:
Example of an English webpage
http://example.com/en/
<!DOCTYPE html>
<html lang="en">
<head>
<title>English webpage</title>
<link rel="canonical" href="http://example.com/en">
<link rel="alternate" href="http://example.com/en/" hreflang="en">
<link rel="alternate" href="http://example.com/nl/" hreflang="nl">
<link rel="alternate" href="http://example.com/en/" hreflang="x-default">
</head>
<body>
<p>This is a webpage written in English.
This page is also available in Dutch.
The default language of this page is English.
</body>
</html>
Example of a Dutch webpage
http://example.com/nl/
<!DOCTYPE html>
<html lang="nl">
<head>
<title>Nederlandse webpagina</title>
<link rel="canonical" href="http://example.com/nl">
<link rel="alternate" href="http://example.com/en/" hreflang="en">
<link rel="alternate" href="http://example.com/nl/" hreflang="nl">
<link rel="alternate" href="http://example.com/en/" hreflang="x-default">
</head>
<body>
<p>Dit is een Nederlandstalige web pagina.
Deze pagina is beschikbaar in het Engels.
De standaardtaal van deze pagina is Engels.
</body>
</html>
You should always provide the lang attribute on the html element.
Two reasons relevant to your case:
The HTML spec describes how the language of a node gets determined. The hreflang attribute plays no role here.
If you don’t provide lang on the html element, this node has no language.
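As a rough illustration of that lookup, here is a minimal Python sketch. The Node class is a hypothetical stand-in for a DOM element, and the function is a simplification that only walks up the ancestor chain looking for a lang attribute; it ignores xml:lang and any fallback a user agent might apply (such as a Content-Language header). Note that hreflang is never consulted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element (not a real DOM API)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def node_language(node: Node) -> Optional[str]:
    """Return the nearest lang value on the node or its ancestors.

    Simplified on purpose: xml:lang and header-based fallbacks are ignored.
    """
    current = node
    while current is not None:
        if "lang" in current.attrs:
            return current.attrs["lang"]
        current = current.parent
    return None  # no lang attribute anywhere on the ancestor chain


# A document whose root element carries lang="en".
html_with_lang = Node("html", {"lang": "en"})
p_in_english_doc = Node("p", parent=html_with_lang)

# A document without lang on the root; it only has an hreflang link.
html_without_lang = Node("html")
link = Node("link", {"rel": "alternate", "hreflang": "en"}, parent=html_without_lang)
p_without_lang = Node("p", parent=html_without_lang)

print(node_language(p_in_english_doc))  # en
print(node_language(p_without_lang))    # None
```

The second result is None even though the document contains a link with hreflang="en", because that attribute describes the link target, not the document it appears in.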
An alternate+hreflang link is only interpreted as pointing to a translation of the current document if the hreflang value of the link element differs from the lang value of the html element:
"If the alternate keyword is used with the hreflang attribute, and that attribute’s value differs from the root element’s language, it indicates that the referenced document is a translation."
If you don’t provide lang on the html element, the alternate+hreflang links are not considered to point to translations.
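The rule can be sketched in Python with the standard library’s html.parser. This is only an illustration under stated assumptions: the class and function names are made up for this example, the comparison is a plain case-insensitive string match (real language-tag matching may be more subtle), and a missing root lang is treated as "no translations", following the reading given above.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects the root element's lang and all alternate+hreflang links."""

    def __init__(self):
        super().__init__()
        self.root_lang = None
        self.alternates = []  # list of (href, hreflang) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.root_lang = attrs.get("lang")
        elif tag == "link":
            rel_keywords = (attrs.get("rel") or "").lower().split()
            hreflang = attrs.get("hreflang")
            if hreflang and "alternate" in rel_keywords:
                self.alternates.append((attrs.get("href"), hreflang))


def translations(markup):
    """Return hrefs of alternate links whose hreflang differs from the root lang."""
    parser = AlternateLinkParser()
    parser.feed(markup)
    if parser.root_lang is None:
        return []  # assumption: without a root language, nothing counts as a translation
    root = parser.root_lang.lower()
    return [href for href, hreflang in parser.alternates if hreflang.lower() != root]


ENGLISH_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<title>English webpage</title>
<link rel="alternate" href="http://example.com/en/" hreflang="en">
<link rel="alternate" href="http://example.com/nl/" hreflang="nl">
</head>
</html>"""

# Same page, but with the lang attribute removed from the root element.
NO_LANG_PAGE = ENGLISH_PAGE.replace('<html lang="en">', "<html>")

print(translations(ENGLISH_PAGE))  # ['http://example.com/nl/']
print(translations(NO_LANG_PAGE))  # []
```

Under this plain string comparison, a value such as x-default would also differ from en, so the sketch would list that link as well.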
Even if a user agent deduces the language of the document by taking self-referential¹ alternate+hreflang links into account, there are situations in which this could fail:
If the HTML document gets opened locally, it no longer has an HTTP URL, so a user agent can’t deduce that the alternate+hreflang link refers to this document.
If the HTML document gets retrieved over a different URL (e.g., with tracking parameters), the alternate+hreflang link no longer refers to the current URL, so a user agent can’t deduce that it applies to this URL, too.
(With a canonical link, both situations could be mitigated, but that’s one more thing a user agent would have to support. Not all do.)
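To make the two failure cases concrete, here is a small Python sketch of how a user agent might decide whether an alternate+hreflang link is self-referential. The function name, the exact-match comparison, the file path, and the tracking parameter are all assumptions made for illustration; no particular user agent is known to work this way.

```python
def refers_to_current_document(link_href, document_url, canonical_href=None):
    """Naive check: does the link's href match a URL known for this document?

    The document's own URL always counts; a canonical URL counts only if
    the (hypothetical) user agent supports canonical links and passes it in.
    """
    known_urls = {document_url}
    if canonical_href is not None:
        known_urls.add(canonical_href)
    return link_href in known_urls


link_href = "http://example.com/en/"

# Retrieved over the expected URL: the link is recognised as self-referential.
print(refers_to_current_document(link_href, "http://example.com/en/"))  # True

# Opened locally (hypothetical path): there is no HTTP URL to match.
print(refers_to_current_document(link_href, "file:///home/user/en.html"))  # False

# Retrieved with a tracking parameter: the URLs no longer match.
tracked = "http://example.com/en/?utm_source=newsletter"
print(refers_to_current_document(link_href, tracked))  # False

# A user agent that also honours a canonical link can recover the match.
print(refers_to_current_document(link_href, tracked,
                                 canonical_href="http://example.com/en/"))  # True
```

None of this guesswork is needed when the html element carries a lang attribute.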
¹ Strictly speaking, a self-referential alternate+hreflang hyperlink is not semantic, because alternate is defined to refer to "an alternate representation of the current document", but a document is of course not an alternate representation of itself. However, as Google Search documents its use, it’s now common to see this markup.