In RFC 5646, Tags for Identifying Languages, § 3.1.2 Record and Field Definitions, the following explanation is given for the semantics of the Preferred-Value field when appearing in a record whose Type is "variant":
- For fields of type 'script', 'region', or 'variant', 'Preferred-Value' contains the subtag of the same type that is preferred for forming the language tag.
My initial interpretation of this was that if the Type of the record is variant, then the value of a Preferred-Value is also a variant — "a subtag of the same type". In other words, I read "of the same type" as "of the same type as the record itself".
However, there are records in the current version of the language tag registry (2018-04-23 at the time I write this — it doesn’t seem there are versioned links) which do not match this interpretation. For example:
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
The Preferred-Value here is not a variant: a variant must be either 5 to 8 alphanumeric ASCII characters, or one digit followed by three alphanumeric characters. Here it clearly refers to the Armenian language (the first segment of a language tag) rather than to a variant.
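To make the mismatch concrete, here is a minimal sketch that parses the record above and tests both subtags against the variant shape. The regular expression is my transcription of the variant production in the RFC 5646 ABNF (section 2.1); the names parse_record and is_variant_shaped are made up for illustration, and the parser deliberately ignores folded lines.

```python
import re

# variant = 5*8alphanum / (DIGIT 3alphanum), transcribed from RFC 5646 section 2.1
VARIANT_RE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")

RECORD = """\
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
"""

def parse_record(text):
    """Parse 'Field: value' lines into a dict.

    Simplification: folded (continuation) lines are not handled, and a
    repeated field such as Prefix overwrites the earlier value.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields

def is_variant_shaped(subtag):
    """Return True if the subtag matches the variant grammar."""
    return VARIANT_RE.fullmatch(subtag) is not None

record = parse_record(RECORD)
print(is_variant_shaped(record["Subtag"]))           # True
print(is_variant_shaped(record["Preferred-Value"]))  # False
```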
However, when looking through other entries, most Preferred-Value values do conform to my initial interpretation. For example:
Type: region
Subtag: YD
Description: Democratic Yemen
Added: 2005-10-16
Deprecated: 1990-08-14
Preferred-Value: YE
Here, Preferred-Value does appear to be another region code. The rules for script/region/variant types are given together — the Preferred-Value is the "same type" for all of these. If for a region record a "same type Preferred-Value" means "also a region", how is it that for a variant record Preferred-Value may point at a different type? More importantly, if this is possible, is the only way to determine the type of the Preferred-Value field to test its grammar?
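For what it's worth, testing the grammar does not always give a single answer. The sketch below lists every subtag type whose shape a given value matches. The patterns are my transcription of the RFC 5646 ABNF (section 2.1), with the three language alternatives (2*3ALPHA, 4ALPHA, 5*8ALPHA) folded into one pattern; SHAPES and candidate_types are hypothetical names.

```python
import re

# Subtag shapes, transcribed from the ABNF in RFC 5646 section 2.1.
SHAPES = {
    "language": re.compile(r"[A-Za-z]{2,8}"),   # 2*3ALPHA / 4ALPHA / 5*8ALPHA
    "extlang": re.compile(r"[A-Za-z]{3}"),      # 3ALPHA
    "script": re.compile(r"[A-Za-z]{4}"),       # 4ALPHA
    "region": re.compile(r"[A-Za-z]{2}|[0-9]{3}"),
    "variant": re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"),
}

def candidate_types(subtag):
    """Return, sorted, every subtag type whose grammar the value matches."""
    return sorted(kind for kind, shape in SHAPES.items() if shape.fullmatch(subtag))

print(candidate_types("hy"))       # ['language', 'region']
print(candidate_types("YE"))       # ['language', 'region']
print(candidate_types("arevela"))  # ['language', 'variant']
print(candidate_types("1996"))     # ['variant']
```

By shape alone, "hy" could be either a language or a region, so a grammar test narrows the candidates without settling the type. Presumably one would also have to look the value up in the registry to see which Type actually has a record for it.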
You are right. That arevela entry does not conform to the registry specification. It seems as though they noticed this; the registry as of 2021-02-23 has a new entry for arevela without that Preferred-Value. It instead has the comment "Preferred tag is hy".
Your comments also seem to be correct (and my initial interpretation of the section in the spec was wrong). They've changed those entries too, so all extlangs now have a Preferred-Value that is a primary language subtag identical to the extlang.