Parsing actual IANA language codes

Submitted by Frederic Marand on

While working on i18n in Go with the golang.org/x/text package, I had to dive into the subtleties of language tags (RFC 5646) and their filtering and lookup (RFC 4647) and, would you believe it, things are much less simple than we usually take them to be, even in a real-life i18n environment like Drupal 8/9/10.
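To give a feel for what RFC 4647 lookup involves, here is a minimal, standard-library-only Go sketch of its "Lookup" scheme (section 3.4). Each language range in the user's priority list is truncated from the end, one subtag at a time, until it matches an available tag. If nothing matches, a default is returned. The `lookup` function and the sample tags are hypothetical. The sketch ignores the `*` wildcard, performs no canonicalization, and is not meant to mirror the Matcher in golang.org/x/text/language.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup sketches the RFC 4647 section 3.4 "Lookup" scheme: every range in
// the priority list is progressively truncated until it matches one of the
// available tags; def is returned when no range matches at all.
// Wildcards ("*") and tag canonicalization are deliberately not handled.
func lookup(priority []string, available []string, def string) string {
	for _, r := range priority {
		subtags := strings.Split(r, "-")
		for len(subtags) > 0 {
			candidate := strings.Join(subtags, "-")
			for _, tag := range available {
				// Tag comparison is case-insensitive in BCP 47.
				if strings.EqualFold(tag, candidate) {
					return tag
				}
			}
			// Drop the last subtag, then also drop a now-trailing singleton
			// such as "x" or "u", which cannot stand on its own.
			subtags = subtags[:len(subtags)-1]
			if n := len(subtags); n > 0 && len(subtags[n-1]) == 1 {
				subtags = subtags[:n-1]
			}
		}
	}
	return def
}

func main() {
	available := []string{"en", "fr", "fr-CA", "zh-Hant"}
	fmt.Println(lookup([]string{"fr-CA-x-quebec"}, available, "en"))
	fmt.Println(lookup([]string{"zh-Hant-TW", "fr"}, available, "en"))
	fmt.Println(lookup([]string{"de-CH", "de"}, available, "en"))
}
```

Note the second truncation step. In `fr-CA-x-quebec`, removing `quebec` leaves a trailing singleton `x`. That singleton is removed as well, so the next candidate is `fr-CA`, which matches.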

Now, for decades, I had been accustomed to using tags like en_US or fr_FR, more than half-believing that the language+region format, relying on ISO 639 language codes and the ISO 3166 country list, was enough for all needs; and indeed, in Drupal projects, it always proved to be enough.
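One visible difference is the separator. BCP 47 tags use a hyphen (`fr-FR`), while `fr_FR` is the POSIX locale spelling. The hypothetical `toBCP47` helper below is a minimal Go sketch that converts such language+region names. It assumes input of the form `ll_CC` with an optional `.charset` or `@modifier` suffix, and it checks nothing against ISO 639 or ISO 3166. Dropping the modifier also loses information in cases like `sr_RS@latin`, where the modifier really denotes a script.

```go
package main

import (
	"fmt"
	"strings"
)

// toBCP47 is a hypothetical helper turning a POSIX-style locale name such as
// "fr_FR" or "en_US.UTF-8" into a BCP 47 language+region tag like "fr-FR".
// It only handles the language+region shape, not scripts, variants or
// extensions, and it does not validate the codes against any registry.
func toBCP47(posix string) string {
	// Drop a ".charset" or "@modifier" suffix: BCP 47 has no slot for them.
	if i := strings.IndexAny(posix, ".@"); i >= 0 {
		posix = posix[:i]
	}
	lang, region, hasRegion := strings.Cut(posix, "_")
	// BCP 47 separates subtags with "-"; by convention the language subtag
	// is lowercase and the region uppercase (matching is case-insensitive).
	tag := strings.ToLower(lang)
	if hasRegion && region != "" {
		tag += "-" + strings.ToUpper(region)
	}
	return tag
}

func main() {
	for _, name := range []string{"fr_FR", "en_US.UTF-8", "de_DE@euro", "pt"} {
		fmt.Printf("%-12s -> %s\n", name, toBCP47(name))
	}
}
```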

Handling the same topic with the official Go package golang.org/x/text meant discovering the intricacies of tags and subtags, regions, extlangs, macrolanguages, scopes, and more, all of them built on the official IANA Language Subtag Registry at iana.org/assignments/language-subtag-registry/language-subtag-registry (forget about iana.org/assignments/language-tags/language-tags.xhtml: it is now as obsolete as its XHTML format).
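To see how these pieces fit together, the sketch below labels each subtag of a tag by its shape and position. It follows the `langtag` production of RFC 5646, section 2.1, in this order: language, up to three extlangs, script, region, variants, then extensions and private use. It is a simplified, standard-library-only illustration with hypothetical names (`classify`, `Subtag`), not a validator. It never consults the registry, so a well-shaped but unregistered code passes. It also rejects grandfathered tags such as `i-klingon` and private-use-only tags such as `x-foo`.

```go
package main

import (
	"fmt"
	"strings"
)

// Subtag pairs a subtag with the role its shape and position give it.
type Subtag struct {
	Kind, Value string
}

func isAlpha(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool { return c >= '0' && c <= '9' }
func isAlnum(c byte) bool { return isAlpha(c) || isDigit(c) }

// all reports whether len(s) is within [lo, hi] and every byte satisfies ok.
func all(s string, lo, hi int, ok func(byte) bool) bool {
	if len(s) < lo || len(s) > hi {
		return false
	}
	for i := 0; i < len(s); i++ {
		if !ok(s[i]) {
			return false
		}
	}
	return true
}

// classify labels subtags by shape and position only (RFC 5646 section 2.1).
func classify(tag string) ([]Subtag, error) {
	parts := strings.Split(tag, "-")
	// Primary language: 2-3 letters, or 4-8 letters (reserved or registered).
	if !all(parts[0], 2, 8, isAlpha) {
		return nil, fmt.Errorf("invalid language subtag %q", parts[0])
	}
	out := []Subtag{{"language", parts[0]}}
	i := 1
	// Up to three 3-letter extlangs, allowed only after a 2-3 letter language.
	for n := 0; n < 3 && len(parts[0]) <= 3 && i < len(parts) && all(parts[i], 3, 3, isAlpha); n++ {
		out = append(out, Subtag{"extlang", parts[i]})
		i++
	}
	// Optional script: 4 letters, e.g. "Latn" or "Teng".
	if i < len(parts) && all(parts[i], 4, 4, isAlpha) {
		out = append(out, Subtag{"script", parts[i]})
		i++
	}
	// Optional region: 2 letters (ISO 3166-1) or 3 digits (UN M.49).
	if i < len(parts) && (all(parts[i], 2, 2, isAlpha) || all(parts[i], 3, 3, isDigit)) {
		out = append(out, Subtag{"region", parts[i]})
		i++
	}
	// Variants: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
	for i < len(parts) && (all(parts[i], 5, 8, isAlnum) || len(parts[i]) == 4 && isDigit(parts[i][0]) && all(parts[i], 4, 4, isAlnum)) {
		out = append(out, Subtag{"variant", parts[i]})
		i++
	}
	// Extensions start with a singleton; "x" starts private use, which runs to
	// the end of the tag. A singleton with nothing after it is not rejected.
	inExt, inPrivate := false, false
	for ; i < len(parts); i++ {
		p := parts[i]
		switch {
		case inPrivate && all(p, 1, 8, isAlnum):
			out = append(out, Subtag{"privateuse", p})
		case strings.EqualFold(p, "x"):
			inPrivate = true
			out = append(out, Subtag{"privateuse", p})
		case len(p) == 1 && isAlnum(p[0]):
			inExt = true
			out = append(out, Subtag{"extension", p})
		case inExt && all(p, 2, 8, isAlnum):
			out = append(out, Subtag{"extension", p})
		default:
			return nil, fmt.Errorf("unexpected subtag %q", p)
		}
	}
	return out, nil
}

func main() {
	for _, tag := range []string{"sjn-Teng", "zh-yue-HK", "de-DE-1901", "en-US-u-ca-gregory-x-foo", "i-klingon"} {
		subtags, err := classify(tag)
		fmt.Println(tag, subtags, err)
	}
}
```

With `sjn-Teng` (Sindarin written in Tengwar), `sjn` comes out as the language and `Teng` as the script. `zh-yue-HK` shows an extlang, and `de-DE-1901` a digit-initial variant.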

A quick look at that page shows wonders like the handling of deprecated languages or regions, or the official codes for Klingon, Sindarin or Tengwar. All of it comes in a format visibly meant to be both human-readable and machine-consumable, yet not plain JSON, XML, or YAML (RFC 5646 calls it "record-jar"). Tantalizing...

So I decided to parse it into a more readily usable form: that parser is now Free Software at github.com/fgm/iana_lang_registry_tools.


By all means, use it and suggest extensions based on your needs.