sh sparql wikimedia wikidata-query-service

Validate language code for Wikimedia languages


I have a shell script that uses the Wikidata Query Service (WDQS) to fetch the data it needs. The SPARQL query that runs on WDQS takes a language code as an input parameter.

Is there a way to check in the shell script whether the input language code is a valid Wikimedia language code, i.e. one that appears in the first column of the list at https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all ?


Solution

  • These codes are possible values of wdt:P424. From the property proposal:

    - Is there a big difference to ISO 639-1?
    - Many of them are the same as ISO, but it is not done in a consistent way. Some language codes have two letters, some three, and a few even more. And there are also a few cases where it is completely different (als: ISO: Tosk Albanian, Wikimedia: Alemannic).

    You could retrieve all these codes using the following simple SPARQL query:

    SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code
    

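    Since that query returns every code, one way to use it from a script is to download the list once and validate against the local copy. The following is a minimal sketch, not a tested recipe: CODES_FILE, fetch_codes and is_known_code are hypothetical names, and it assumes WDQS returns CSV for the Accept: text/csv header and that no code needs CSV quoting.

```shell
#!/bin/sh
# Sketch: cache all P424 values locally, then validate codes offline.
# CODES_FILE and both function names are hypothetical.
CODES_FILE="${CODES_FILE:-./wikimedia-codes.txt}"

# Download the code list as CSV, dropping the header row and carriage returns.
fetch_codes() {
    curl -s -G 'https://query.wikidata.org/sparql' \
        -H 'Accept: text/csv' \
        --data-urlencode 'query=SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code' |
        tail -n +2 | tr -d '\r' > "$CODES_FILE"
}

# Succeed only if the argument equals a whole line of the cached list.
is_known_code() {
    grep -qxF -- "$1" "$CODES_FILE"
}
```

    After calling fetch_codes once, is_known_code "$code" can be used in an if statement without further network requests. The -x and -F options make grep compare the whole line literally, so "e" does not match "en".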

    In fact, the list you have linked to is periodically generated by a bot. The full query is:

    SELECT ?item ?c
    (CONCAT("{","{#language:",?c,"}","}") as ?display)
    (CONCAT("{","{#language:",?c,"|","en}","}") as ?displayEN)
    (CONCAT("{","{#language:",?c,"|","fr}","}") as ?displayFR)
    {
      ?item wdt:P424 ?c .
      MINUS{?item wdt:P31/wdt:P279* wd:Q14827288} #--exclude Wikimedia projects
      MINUS{?item wdt:P31/wdt:P279* wd:Q17442446} #--exclude Wikimedia internal stuff
    }
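    If the check should mirror the linked list rather than every value of wdt:P424, the same two MINUS clauses can be reused in an ASK query. The sketch below only builds the query text; build_ask_query is a hypothetical helper name, and the caller is assumed to pass a code that contains no double quote or backslash.

```shell
#!/bin/sh
# Hypothetical helper: print an ASK query for one code,
# reusing the two exclusions from the bot's query.
build_ask_query() {
    printf 'ASK { ?item wdt:P424 "%s" . ' "$1"
    printf 'MINUS { ?item wdt:P31/wdt:P279* wd:Q14827288 } '
    printf 'MINUS { ?item wdt:P31/wdt:P279* wd:Q17442446 } }\n'
}
```

    The result could then be passed to the endpoint, for example with curl -s -G 'https://query.wikidata.org/sparql' --data-urlencode "query=$(build_ask_query als)".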
    

    You could:

    I would prefer the third option, checking the code against WDQS with an ASK query:

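    #!/bin/sh
    # Ask WDQS whether any item has the given Wikimedia language code (P424).
    printf 'Enter language code: '
    read -r code

    # Run curl directly and let it URL-encode the query, instead of storing
    # the command in a string and relying on word splitting.
    if curl -s -G 'https://query.wikidata.org/sparql' \
            --data-urlencode "query=ASK { ?lang wdt:P424 \"$code\" }" |
       grep -q "true"; then
        echo "Valid code"
    else
        echo "Invalid code"
    fi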
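
    Because the code is pasted into a quoted SPARQL literal, it may be worth rejecting input that could change the query before sending it. Here is a minimal sketch of a non-interactive variant; is_safe_literal and check_code are hypothetical names, and the guard only covers the empty string, double quotes and backslashes.

```shell
#!/bin/sh
# Sketch of a non-interactive wrapper; both function names are hypothetical.

# Fail for empty input or input containing a double quote or backslash,
# either of which would alter the quoted SPARQL string literal.
is_safe_literal() {
    case $1 in
        '' | *\"* | *\\*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed if WDQS answers "true" for the code. Fails for unsafe input,
# unknown codes, and also when the request itself fails.
check_code() {
    is_safe_literal "$1" || return 1
    curl -s -G 'https://query.wikidata.org/sparql' \
        --data-urlencode "query=ASK { ?lang wdt:P424 \"$1\" }" |
        grep -q 'true'
}
```

    A caller could then write: if check_code "$1"; then echo "Valid code"; else echo "Invalid code"; fi. Note that a network error is reported as an invalid code in this sketch.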