How it works: translations
3/23/2024
The key trick
Google Sheets has a function to use Google Translate: GOOGLETRANSLATE(text, source_language, target_language). That's it. That's the magic.
The rest is just clever design.
Spreadsheets
All translatable tokens on my site (basically everything, minus blog contents) are stored in a Google Sheet that looks like this, but with more languages and more tokens:
| | en-US | es | fr |
|---|---|---|---|
| USE_COOKIES | Use cookies | =GOOGLETRANSLATE($B2, $B$1, C$1) | =GOOGLETRANSLATE($B2, $B$1, D$1) |
I picked this trick up at my last company and improved upon it. They referenced tokens by row number, meaning that you could break everything by shuffling translations around. Here, tokens have a specific, stable string ID so I can reorganize and group as I want.
Ingestion
Google Sheets has an API to let you fetch a sheet as JSON. I get it as nested arrays, with each sub-array being a column.
From there, assembling an ID:string dictionary is a matter of finding the column with the appropriate locale and pairing it with the ID column.
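As a rough illustration of that pairing step, here is a minimal TypeScript sketch. The names buildDictionary, Columns and Dictionary are hypothetical, and it assumes the layout of the example table above: the first column holds the IDs, and every other column starts with its locale code in the header row. The Spanish string in the sample data is made up for the example.

```typescript
// Column-major sheet data: each inner array is one column, read top to bottom.
type Columns = string[][];
type Dictionary = Record<string, string>;

// Assumption: columns[0] is the ID column (blank header cell) and every other
// column starts with its locale code, as in the example table.
function buildDictionary(columns: Columns, locale: string): Dictionary {
  const [idColumn, ...localeColumns] = columns;
  const column = localeColumns.find((col) => col[0] === locale);
  if (!idColumn || !column) {
    throw new Error(`No column found for locale "${locale}"`);
  }

  const dictionary: Dictionary = {};
  // Start at row 1 to skip the header row.
  for (let row = 1; row < idColumn.length; row++) {
    const id = idColumn[row];
    // Skip blank ID cells; a missing translation cell becomes an empty string.
    if (id) dictionary[id] = column[row] ?? "";
  }
  return dictionary;
}

// Made-up sample data in the same shape as the example table.
const sampleColumns: Columns = [
  ["", "USE_COOKIES"],
  ["en-US", "Use cookies"],
  ["es", "Usar cookies"],
];

const spanish = buildDictionary(sampleColumns, "es");
```

Looking up spanish["USE_COOKIES"] then gives the Spanish cell for that row, regardless of where the row sits in the sheet.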
Locale matching
Your browser tells every server what languages you speak, and in what order you prefer them, through the Accept-Language header. It's then up to the server to figure out what works best. There's no standard algorithm for figuring this out, so it's up to the implementer. Here's mine:
- Get the list of supported languages from my table header
- Get the user's list of languages from the Accept-Language header
- Sort the user's list by preference (the q value, a float in the range [0, 1])
- Go down the user's list in descending order of preference, looking for an exact match in our list of supported languages, and return the first match
- If no match, return the default (American English)
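The steps above can be sketched in TypeScript roughly as follows. This is an illustrative sketch, not the site's actual code: parseAcceptLanguage and matchLocale are hypothetical names, comparing tags case-insensitively is my own choice, and entries with q=0 or the * wildcard are simply ignored.

```typescript
interface LanguagePreference {
  tag: string;
  quality: number;
}

const DEFAULT_LOCALE = "en-US";

// Parse a header such as "fr-CA,fr;q=0.9,en;q=0.8" into tags sorted by
// descending q value. A tag with no explicit q counts as q=1.
function parseAcceptLanguage(header: string): LanguagePreference[] {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [tag = "", ...params] = part.split(";").map((s) => s.trim());
      const qParam = params.find((p) => p.toLowerCase().startsWith("q="));
      const parsed = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      return { tag, quality: Number.isNaN(parsed) ? 0 : parsed };
    })
    // Drop empty tags, the "*" wildcard, and anything marked q=0.
    .filter((pref) => pref.tag !== "" && pref.tag !== "*" && pref.quality > 0)
    // Array.prototype.sort is stable, so ties keep their header order.
    .sort((a, b) => b.quality - a.quality);
}

// Return the first supported locale that exactly matches one of the user's
// tags, or the default when nothing matches.
function matchLocale(
  header: string | null | undefined,
  supported: string[],
): string {
  const byLowercase = new Map(
    supported.map((locale) => [locale.toLowerCase(), locale] as const),
  );
  for (const { tag } of parseAcceptLanguage(header ?? "")) {
    const match = byLowercase.get(tag.toLowerCase());
    if (match) return match;
  }
  return DEFAULT_LOCALE;
}
```

With the supported list ["en-US", "es", "fr"], a header of "fr-CA,fr;q=0.9,en;q=0.8" resolves to fr: fr-CA has no exact match, so the loop moves on to the next preference.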
Yes, I know this has flaws. It doesn't neatly handle, for example, folks whose language is set to British English (en-GB). It should try to find en-GB, and then when that fails find the first en-* supported language. I haven't done it yet because IETF language tags are complicated to parse.
Nobody's complained about it yet, and most browsers will send both the country-specific language and then the base language, so it works out anyway.
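For what it's worth, the missing fallback could be approximated without a full tag parser by comparing only the primary language subtag (the part before the first hyphen). The sketch below is a hypothetical helper, not something this site does today, and it deliberately ignores script and variant subtags, so it would treat zh-Hant and zh-Hans as interchangeable.

```typescript
// Hypothetical helper: for each preferred tag, try an exact match first, then
// fall back to the first supported locale with the same primary language.
// `preferred` is assumed to be already sorted by descending preference.
function matchWithFallback(
  preferred: string[],
  supported: string[],
  fallback: string = "en-US",
): string {
  // "en-GB" -> "en"; a tag without a hyphen is returned as-is, lowercased.
  const primary = (tag: string): string =>
    (tag.split("-")[0] ?? tag).toLowerCase();

  for (const tag of preferred) {
    const exact = supported.find(
      (locale) => locale.toLowerCase() === tag.toLowerCase(),
    );
    if (exact) return exact;

    const sameLanguage = supported.find(
      (locale) => primary(locale) === primary(tag),
    );
    if (sameLanguage) return sameLanguage;
  }
  return fallback;
}
```

With supported locales ["en-US", "es", "fr"], a user who only sends en-GB would land on en-US because the two share the en subtag, rather than because en-US happens to be the default.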
Applying the language
This site has both server and client components, and they have to consume this language dictionary differently.
Server components directly call a getTranslations() function that handles all of the above and returns the dictionary, from which they can grab individual tokens.
Client components consume a React context provided at the site's root layout (a server component), and that context is seeded with the same getTranslations() output.