While searching for specific information on robots.txt, I stumbled upon a Yandex help page‡ on this topic. It suggests that I could use the Host directive to tell crawlers my preferred mirror domain:
User-Agent: *
Disallow: /dir/
Host: www.example.com
Also, the Wikipedia article states that Google understands the Host directive too, but there wasn’t much (i.e. no) further information.
At robotstxt.org, I didn’t find anything on Host (or Crawl-delay, as stated on Wikipedia).
Do search engines other than Yandex support the Host directive at all? Or is it specific to Yandex’s robots.txt handling?

‡ At least since the beginning of 2021, the linked entry no longer deals with the directive in question.
The original robots.txt specification says:

"Unrecognised headers are ignored."
They call it "headers", but this term is not defined anywhere. Still, as it’s mentioned in the section about the format, in the same paragraph as User-agent and Disallow, it seems safe to assume that "headers" means "field names".
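To make that reading concrete, here is a minimal sketch in Python (my own illustration, not code from any robots.txt library) of a line parser that treats each line as a field: value pair and skips field names it doesn’t recognise, which is exactly what "Unrecognised headers are ignored" prescribes:

# Minimal robots.txt line parser (illustrative sketch only).
KNOWN_FIELDS = {"user-agent", "disallow"}  # the two fields from the original spec

def parse_robots_txt(text):
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue  # blank or malformed lines carry no record
        field, value = line.split(":", 1)
        field = field.strip().lower()
        if field in KNOWN_FIELDS:
            records.append((field, value.strip()))
        # Anything else (Host, Crawl-delay, ...) is ignored, not an error.
    return records

robots = """User-Agent: *
Disallow: /dir/
Host: www.example.com
"""

print(parse_robots_txt(robots))
# -> [('user-agent', '*'), ('disallow', '/dir/')]

Run on the Yandex example from the question, only the User-agent and Disallow records survive; the Host line is silently dropped.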
So yes, you can use Host or any other field name.
But keep in mind: as such fields are not specified by the robots.txt project, you can’t be sure that different parsers support them in the same way. So you’d have to check each parser that claims to support the field manually.
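For example, Python’s standard urllib.robotparser behaves exactly as the spec allows: it understands User-agent, Disallow, and Allow (plus a few extensions such as Crawl-delay in recent versions) and silently skips a Host line, so if you want the Host value you have to extract it yourself. A quick check:

from urllib.robotparser import RobotFileParser

robots = """User-Agent: *
Disallow: /dir/
Host: www.example.com
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

# The Disallow rule is applied; the unknown Host line was skipped silently.
print(rp.can_fetch("MyBot", "https://example.com/dir/page"))  # False
print(rp.can_fetch("MyBot", "https://example.com/other"))     # True

# RobotFileParser exposes no Host value, so pull it out by hand if needed
# (this one-off extraction is my own sketch, not part of the stdlib API).
host = next(
    (line.split(":", 1)[1].strip()
     for line in robots.splitlines()
     if line.lower().startswith("host:")),
    None,
)
print(host)  # www.example.com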