
How it works: translations

3/23/2024

The key trick

Google Sheets has a function that calls Google Translate: GOOGLETRANSLATE(text, source_language, target_language). That's it. That's the magic.

The rest is just clever design.

Spreadsheets

All translatable tokens on my site (basically everything, minus blog contents) are stored in a Google Sheet that looks like this, but with more languages and more tokens:

A fragment of a Google Sheet, showing language tags as column headers and Google Translate functions that reference their column header, the content of the American English column, and the column header of the American English column to produce text in that column's language.
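|             | en-US       | es                               | fr                               |
|-------------|-------------|----------------------------------|----------------------------------|
| USE_COOKIES | Use cookies | =GOOGLETRANSLATE($B2, $B$1, C$1) | =GOOGLETRANSLATE($B2, $B$1, D$1) |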

I picked this trick up at my last company and improved upon it. They referenced tokens by row number, meaning that you could break everything by shuffling translations around. Here, tokens have a specific, stable string ID so I can reorganize and group as I want.

Ingestion

Google Sheets has an API to let you fetch a sheet as JSON. I get it as nested arrays, with each sub-array being a column.

From there, assembling an ID-to-string dictionary is a matter of finding the column with the appropriate locale and pairing it with the ID column.
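Here's a minimal sketch of that pairing step, not the code this site actually runs. It assumes the ID column comes first and that every column starts with its header cell. The names sheetValuesUrl, buildDictionary, and sampleColumns are made up for illustration, and the Spanish string is hand-written sample data. The URL helper targets the Sheets API v4 values endpoint with majorDimension=COLUMNS, which is one way to get column-major arrays; the spreadsheet ID, range, and API key are placeholders you'd supply yourself.

```typescript
// Hypothetical shape: each inner array is one sheet column, header cell first.
type Columns = string[][];
type Dictionary = Record<string, string>;

// Builds a Sheets API v4 "values" URL that asks for column-major data.
// spreadsheetId, range, and apiKey are placeholders supplied by the caller.
function sheetValuesUrl(spreadsheetId: string, range: string, apiKey: string): string {
  return (
    "https://sheets.googleapis.com/v4/spreadsheets/" +
    encodeURIComponent(spreadsheetId) +
    "/values/" +
    encodeURIComponent(range) +
    "?majorDimension=COLUMNS&key=" +
    encodeURIComponent(apiKey)
  );
}

// Pairs the ID column (assumed to be the first column) with the column
// whose header cell equals the requested locale.
function buildDictionary(columns: Columns, locale: string): Dictionary {
  const [ids, ...languages] = columns;
  if (ids === undefined) {
    throw new Error("The sheet has no columns");
  }
  const column = languages.find((col) => col[0] === locale);
  if (column === undefined) {
    throw new Error(`No column found for locale "${locale}"`);
  }
  const dictionary: Dictionary = {};
  // Start at 1 to skip the header row.
  for (let row = 1; row < ids.length; row++) {
    const id = ids[row];
    // Skip blank ID cells; a missing translation becomes an empty string.
    if (id) dictionary[id] = column[row] ?? "";
  }
  return dictionary;
}

// Made-up sample data in the same layout as the sheet fragment above.
const sampleColumns: Columns = [
  ["", "USE_COOKIES"],
  ["en-US", "Use cookies"],
  ["es", "Usar cookies"],
];
const spanish = buildDictionary(sampleColumns, "es");
// spanish is { USE_COOKIES: "Usar cookies" }
```

Throwing on an unknown locale is a choice made for the sketch; falling back to the default column would work just as well.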

Locale matching

Your browser tells every server what languages you speak, and in what order you prefer them, through the Accept-Language header. It's then up to the server to figure out what works best. HTTP doesn't mandate a single algorithm for figuring this out, so it's up to the implementer. Here's mine:

  1. Get the list of supported languages from my table header
  2. Get the user's list of languages from the Accept-Language header
  3. Sort the user's list by preference (the q-value, a float in the range [0, 1])
  4. Go down the user's list in descending order, looking for an exact match in our list of supported languages, and return the first match.
  5. If no match, return the default (American English)
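The steps above can be sketched roughly like this. It's an illustration under a few assumptions, not the site's real code: parseAcceptLanguage and pickLocale are hypothetical names, tags are compared case-insensitively, a missing q-value counts as 1, and entries with q=0 or the "*" wildcard are dropped.

```typescript
interface LanguagePreference {
  tag: string;
  quality: number;
}

// Parses an Accept-Language value such as "fr-CA,fr;q=0.8,en;q=0.5"
// into tags with their q-values. A missing q-value counts as 1.
function parseAcceptLanguage(header: string): LanguagePreference[] {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [tag = "", ...params] = part.split(";").map((piece) => piece.trim());
      const qParam = params.find((param) => param.toLowerCase().startsWith("q="));
      const parsed = qParam === undefined ? 1 : parseFloat(qParam.slice(2));
      // An unparseable q-value is treated as 0, which removes the entry below.
      return { tag, quality: isNaN(parsed) ? 0 : parsed };
    })
    // Drop empty tags, the "*" wildcard, and anything marked q=0.
    .filter((pref) => pref.tag.length > 0 && pref.tag !== "*" && pref.quality > 0);
}

// Steps 3 to 5: sort by preference, return the first exact match,
// otherwise return the default.
function pickLocale(header: string, supported: string[], fallback = "en-US"): string {
  // Array.prototype.sort is stable, so equal q-values keep header order.
  const preferences = parseAcceptLanguage(header).sort((a, b) => b.quality - a.quality);
  for (const { tag } of preferences) {
    const match = supported.find((locale) => locale.toLowerCase() === tag.toLowerCase());
    if (match !== undefined) return match;
  }
  return fallback;
}

// In practice the supported list would come from the sheet's header row (step 1).
const supportedLocales = ["en-US", "es", "fr"];
const chosen = pickLocale("fr-CA,fr;q=0.8,en;q=0.5", supportedLocales);
// chosen is "fr": fr-CA has no exact match, fr does.
```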

Yes, I know this has flaws. It doesn't neatly handle, for example, folks whose language is set to British English (en-GB). It should try to find en-GB and, when that fails, fall back to the first supported en-* language. I haven't done it yet because IETF language tags are complicated to parse.

Nobody's complained about it yet, and most browsers will send both the country-specific language and then the base language, so it works out anyway.
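For the curious, the missing fallback could look something like the sketch below. It deliberately sidesteps real IETF tag parsing: it treats everything before the first hyphen as the base language, which is good enough for tags like en-GB but ignores script and variant subtags. The names baseLanguage and matchWithFallback are hypothetical, and the input is assumed to be the user's tags already sorted by preference.

```typescript
// Returns the primary language subtag in lowercase: "en-GB" -> "en".
function baseLanguage(tag: string): string {
  return (tag.split("-")[0] ?? tag).toLowerCase();
}

// For each preferred tag, try an exact match first, then the first
// supported locale that shares its base language.
function matchWithFallback(
  preferred: string[],
  supported: string[],
  fallback = "en-US"
): string {
  for (const tag of preferred) {
    const wanted = tag.toLowerCase();
    const exact = supported.find((locale) => locale.toLowerCase() === wanted);
    if (exact !== undefined) return exact;
    const sameLanguage = supported.find(
      (locale) => baseLanguage(locale) === baseLanguage(tag)
    );
    if (sameLanguage !== undefined) return sameLanguage;
  }
  return fallback;
}

const british = matchWithFallback(["en-GB"], ["en-US", "es", "fr"]);
// british is "en-US": no exact en-GB column, but en-US shares the base "en".
```

One consequence of checking the base language per tag is that a higher-priority regional tag wins over a lower-priority exact match, which may or may not be what you want.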

Applying the language

This site has both server and client components, and they have to consume this language dictionary differently.

Server components directly call a getTranslations() function that handles all of the above and returns the dictionary, from which they can grab individual tokens.

Client components consume a React context provided at the site's root layout (a server component), and that context is seeded with the same getTranslations() output.