Parsing actual IANA language codes

Submitted by Frederic Marand on

While working on i18n in Go with the package, I had to dive into the subtleties of language tags (RFC 5646) and their filtering and lookup (RFC 4647) and, would you believe it, things are much less simple than we usually take them to be, even in a real-life i18n environment like Drupal 8/9/10.
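To give a taste of what RFC 4647 lookup means in practice, here is a minimal sketch of its "Lookup" scheme (section 3.4), hand-rolled with the Go standard library only. It is not the API of the Go package mentioned above, and the `lookup` function and its parameters are hypothetical names chosen for illustration. Each range in the user's priority list is truncated from the right until it matches an available tag; a trailing singleton such as `x` is dropped along with the subtag after it, and the wildcard `*` is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup sketches the RFC 4647 section 3.4 "Lookup" scheme: each language
// range in the priority list is progressively truncated from the end until
// it equals one of the available tags (compared case-insensitively).
// defaultTag is returned when no range matches at all.
func lookup(priorityList, available []string, defaultTag string) string {
	avail := make(map[string]string, len(available))
	for _, t := range available {
		avail[strings.ToLower(t)] = t
	}
	for _, r := range priorityList {
		if r == "*" {
			continue // the wildcard is ignored by the Lookup scheme
		}
		subtags := strings.Split(strings.ToLower(r), "-")
		for len(subtags) > 0 {
			if t, ok := avail[strings.Join(subtags, "-")]; ok {
				return t
			}
			subtags = subtags[:len(subtags)-1]
			// Never leave a dangling singleton such as "x" or "u" at the end.
			if n := len(subtags); n > 0 && len(subtags[n-1]) == 1 {
				subtags = subtags[:n-1]
			}
		}
	}
	return defaultTag
}

func main() {
	available := []string{"en", "fr", "fr-CA", "de-CH"}
	fmt.Println(lookup([]string{"fr-CA-x-foo", "en"}, available, "en")) // fr-CA
	fmt.Println(lookup([]string{"de-AT", "fr-FR"}, available, "en"))    // fr
	fmt.Println(lookup([]string{"ja"}, available, "en"))                // en
}
```

Note the second call: `de-AT` does not fall back to the sibling `de-CH`, because lookup only ever removes subtags; that kind of surprise is exactly what makes the RFCs worth reading.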

Now, for decades, I had been accustomed to using tags like en_US or fr_FR, more than half-believing that the language+region format, relying on ISO 639 language codes and the ISO 3166 country list, was enough for all needs; and indeed, in Drupal projects, it always proved to be enough.
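The underscore form is the POSIX locale convention, whereas RFC 5646 tags use hyphens and a specific casing (section 2.1.1: lowercase language, titlecase four-letter script, uppercase two-letter region, everything after a singleton lowercase). A minimal sketch of the conversion follows; `posixToBCP47` is a hypothetical helper, and it deliberately simplifies by dropping any `.codeset` or `@modifier` suffix instead of mapping it to a script or variant subtag.

```go
package main

import (
	"fmt"
	"strings"
)

// posixToBCP47 converts a POSIX-style locale name such as "en_US.UTF-8"
// into a hyphen-separated tag using the RFC 5646 casing conventions.
// Simplification: the codeset and any "@modifier" are simply discarded.
func posixToBCP47(locale string) string {
	if i := strings.IndexAny(locale, ".@"); i >= 0 {
		locale = locale[:i] // drop ".UTF-8", "@euro" and the like
	}
	subtags := strings.Split(strings.ReplaceAll(locale, "_", "-"), "-")
	afterSingleton := false
	for i, s := range subtags {
		s = strings.ToLower(s)
		// The first subtag and anything after a singleton stay lowercase.
		if i > 0 && !afterSingleton {
			switch len(s) {
			case 2:
				s = strings.ToUpper(s) // region, e.g. "US"
			case 4:
				s = strings.ToUpper(s[:1]) + s[1:] // script, e.g. "Hant"
			}
		}
		if len(s) == 1 {
			afterSingleton = true // extension or private-use part starts here
		}
		subtags[i] = s
	}
	return strings.Join(subtags, "-")
}

func main() {
	fmt.Println(posixToBCP47("en_US.UTF-8")) // en-US
	fmt.Println(posixToBCP47("FR_fr"))       // fr-FR
	fmt.Println(posixToBCP47("zh_hant_tw"))  // zh-Hant-TW
}
```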

Handling the same topic with the official Go package meant discovering the intricacies of tags, subtags, extlangs, macrolanguages, regions, scopes, and more, all of them building on the official IANA list at (forget about it is now as obsolete as its XHTML format).
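To see how those subtags fit together, here is a syntax-only sketch of the `langtag` production from RFC 5646 section 2.1: language, up to three extlangs, script, region, variants, extensions, then private use. The `Tag` type and `parseTag` function are hypothetical illustrations built on the standard library, not part of the Go package discussed here. They reject grandfathered and private-use-only tags, do not check for repeated singletons, lowercase everything (tags are case-insensitive), and never consult the IANA registry, so a well-formed tag is not necessarily a valid one.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag holds the subtags of a well-formed RFC 5646 language tag.
type Tag struct {
	Language   string
	Extlangs   []string
	Script     string
	Region     string
	Variants   []string
	Extensions []string // each entry keeps its singleton, e.g. "u-ca-gregory"
	PrivateUse string   // e.g. "x-private"
}

func all(s string, ok func(rune) bool) bool {
	for _, r := range s {
		if !ok(r) {
			return false
		}
	}
	return s != ""
}

func isAlpha(r rune) bool { return r >= 'a' && r <= 'z' }
func isDigit(r rune) bool { return r >= '0' && r <= '9' }
func isAlnum(r rune) bool { return isAlpha(r) || isDigit(r) }

// isVariant matches 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
func isVariant(s string) bool {
	return all(s, isAlnum) && ((len(s) >= 5 && len(s) <= 8) || (len(s) == 4 && isDigit(rune(s[0]))))
}

// parseTag splits a tag following the "langtag" ABNF of RFC 5646 (syntax only).
func parseTag(tag string) (Tag, error) {
	var t Tag
	parts := strings.Split(strings.ToLower(tag), "-")
	i := 0
	next := func() string { // current subtag, or "" past the end
		if i < len(parts) {
			return parts[i]
		}
		return ""
	}
	if l := next(); len(l) < 2 || len(l) > 8 || !all(l, isAlpha) {
		return t, fmt.Errorf("invalid primary language subtag %q", l)
	}
	t.Language = next()
	i++
	// Extlangs (3 letters) may only follow a 2- or 3-letter language subtag.
	for len(t.Language) <= 3 && len(t.Extlangs) < 3 && len(next()) == 3 && all(next(), isAlpha) {
		t.Extlangs = append(t.Extlangs, next())
		i++
	}
	if s := next(); len(s) == 4 && all(s, isAlpha) {
		t.Script = s
		i++
	}
	if s := next(); (len(s) == 2 && all(s, isAlpha)) || (len(s) == 3 && all(s, isDigit)) {
		t.Region = s
		i++
	}
	for isVariant(next()) {
		t.Variants = append(t.Variants, next())
		i++
	}
	for i < len(parts) { // extensions, then an optional private-use part
		single := next()
		if len(single) != 1 || !all(single, isAlnum) {
			return t, fmt.Errorf("unexpected subtag %q", single)
		}
		minLen := 2 // extension subtags are 2-8 characters long
		if single == "x" {
			minLen = 1 // private-use subtags are 1-8 characters long
		}
		j := i + 1
		for j < len(parts) && len(parts[j]) >= minLen && len(parts[j]) <= 8 && all(parts[j], isAlnum) {
			j++
		}
		if j == i+1 {
			return t, fmt.Errorf("singleton %q has no subtags", single)
		}
		if part := strings.Join(parts[i:j], "-"); single == "x" {
			t.PrivateUse = part
		} else {
			t.Extensions = append(t.Extensions, part)
		}
		i = j
	}
	return t, nil
}

func main() {
	for _, s := range []string{"zh-yue-Hant-HK", "tlh-Piqd", "sl-rozaj-biske", "en-US-u-ca-gregory-x-private"} {
		t, err := parseTag(s)
		fmt.Printf("%s: %+v %v\n", s, t, err)
	}
}
```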

A quick look at that page reveals wonders like the handling of deprecated languages and regions, or the official codes for Klingon, Sindarin and Tengwar. All of it comes in a format visibly meant to be both human-readable and machine-consumable, yet not plain JSON, XML, or YAML. Tantalizing...

So I decided to parse it into a more readily usable form: that parser is now Free Software at

iana_lang_registry_tools

By all means, use it and suggest extensions based on your needs.