
SPARQL on Wikidata: Official languages of a country sorted by commonality


I am trying to get the languages spoken in a country, returned in a meaningful order. This order should be either the official ordering of the languages given by the country, if obtainable, or the order given by the number of people speaking each language. The ordering after the first language is not particularly important, but the first language returned must be the main official language or, if none is defined, the most commonly spoken language.

Taking the example of Switzerland, the official languages order (given, likely, by the number of people speaking it) is, according to Wikipedia: German, French, Italian, Romansh.

The Wikidata page on Switzerland shows the official languages in the following order: German, Italian, French, Romansh. This is not the order shown in Wikipedia. However, on the Wikidata pages I've looked at, the first language listed is consistently the main one (see example for Spain).

The following SPARQL query retrieves the official languages for Switzerland:

SELECT *
{
  BIND(wd:Q39 as ?country)
  
  OPTIONAL {
    ?country wdt:P37 ?officialLanguages.
    ?officialLanguages wdt:P424 ?officialLanguagesCode.
  }
}

This query returns yet another ordering of the languages, different from the Wikidata page ordering. The result is: French, German, Italian, Romansh. The first language is not the main language anymore, unlike what is shown in the Wikidata page.

First question

Why is the ordering returned by this query different than the ordering of the languages in the Wikidata page?

Second question

How can one get the ordered list of official languages spoken in a country?


Solution

  • Why is the ordering returned by this query different than the ordering of the languages in the Wikidata page?

    A query returns a set of results, so the output order carries no meaning unless you request one explicitly. The only way to get what you want is to make this information explicit by adding further data to the knowledge graph.
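
    As a small illustration of that point, the sketch below adds an explicit ORDER BY to the query from the question. Sorting by the language code is only an assumption chosen to make the output reproducible: it yields an alphabetical order, not an order of importance.

    ```sparql
    SELECT ?country ?officialLanguage ?officialLanguageCode
    WHERE {
      BIND(wd:Q39 AS ?country)
      ?country wdt:P37 ?officialLanguage .
      ?officialLanguage wdt:P424 ?officialLanguageCode .
    }
    # Without an ORDER BY clause the endpoint may return the rows in any order.
    ORDER BY ?officialLanguageCode
    ```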

  • How can one get the ordered list of official languages spoken in a country?

    The only way is to manually add more information to Wikidata. For example, in Q39#P37 I've just added the property proportion (P1107) as a qualifier on each official language statement. Now you can submit the following query (sorting by DESC(?proportion)):

    SELECT *
    {
      BIND(wd:Q39 as ?country)
      
      OPTIONAL {
        ?country p:P37 [
          ps:P37 ?officialLanguage ;
          pq:P1107 ?proportion
        ] .
        ?officialLanguage wdt:P424 ?officialLanguageCode .
      }
    }
    ORDER BY ?country DESC(?proportion)
    

    There are other alternatives for adding data to Wikidata. Another (I think sub-optimal) option to add ordering to the official languages could be to add a series ordinal (P1545) qualifier to indicate an order of importance (1, 2, 3, etc.).
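
    A minimal sketch of the series ordinal variant, assuming P1545 qualifiers have been added to the P37 statements (this answer has not added them, so the qualifier here is hypothetical). Series ordinal values are stored as strings, hence the cast to xsd:integer before sorting.

    ```sparql
    SELECT ?country ?officialLanguage ?officialLanguageCode ?ordinal
    WHERE {
      BIND(wd:Q39 AS ?country)
      ?country p:P37 [
        ps:P37 ?officialLanguage ;
        pq:P1545 ?ordinal    # hypothetical qualifier: must be added to Wikidata first
      ] .
      ?officialLanguage wdt:P424 ?officialLanguageCode .
    }
    # Cast to integer so that "10" sorts after "2" rather than before it.
    ORDER BY ?country ASC(xsd:integer(?ordinal))
    ```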

    Currently, you don't see such qualifiers on other items because, evidently, no one has raised the issue so far. Hence you can either be bold and add such qualifiers yourself, or open a new topic somewhere on Wikidata and see how consensus evolves.
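
    Until such qualifiers are widespread, a query can treat them as optional, so that countries without them still return their languages. The following is a sketch under that assumption. Under the SPARQL ordering rules unbound values sort lowest, so with DESC the rows lacking ?proportion should come last, and the order among those rows remains arbitrary.

    ```sparql
    SELECT ?country ?officialLanguage ?officialLanguageCode ?proportion
    WHERE {
      BIND(wd:Q39 AS ?country)
      ?country p:P37 ?statement .
      ?statement ps:P37 ?officialLanguage .
      # The qualifier is optional: statements without it are still returned.
      OPTIONAL { ?statement pq:P1107 ?proportion . }
      OPTIONAL { ?officialLanguage wdt:P424 ?officialLanguageCode . }
    }
    ORDER BY ?country DESC(?proportion)
    ```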