Internationalizing Bike Index

For more context on this project, see the feature announcement on Bike Index’s blog.

Unless your Rails app has been internationalized since its inception, internationalizing it minimally entails three broad efforts:

  1. Adding the ability to detect and set the locale for a given web request.
  2. Externalizing user-facing strings: with the default Rails i18n framework, this means moving them out of views (mainly, but not exclusively) and into YAML.
  3. Translating your now-externalized strings to other languages.

For Bike Index, we did some research into the approaches taken by other internationalized open-source Rails projects – in particular, Discourse and GitLab. This work was useful in developing a mental model of the work to be done, although naturally we deviated from them where different needs or constraints demanded it.

Locale detection

There are a few ways to detect a user’s locale:

  1. from a locale query param (settable via a UI element),
  2. from a database value tied to the user’s account (settable via a user preferences UI),
  3. from the Accept-Language header sent with a request (settable via the user’s browser preferences), and
  4. from the user’s geocoded location.

To minimize complexity, we implemented only (1) through (3).

# app/controllers/application_controller.rb L81-87 (f3cb4792)

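# Wraps each controller action (note the yield), so it is presumably
# registered as an around_action. Admin pages always render in the
# default locale; everything else uses the requested locale.
def set_locale
  if controller_namespace == "admin"
    return I18n.with_locale(I18n.default_locale) { yield }
  end

  I18n.with_locale(requested_locale) { yield }
end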

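Wrapping each request in I18n.with_locale, rather than assigning I18n.locale directly, means the previous locale is restored when the block exits, even if the action raises, so one request’s locale can’t leak into the next request served by the same thread. The sketch below reproduces that save-and-restore pattern in plain Ruby; MiniI18n and its thread-local key are hypothetical stand-ins, not the i18n gem’s internals.

```ruby
# Hypothetical stand-in for I18n's locale handling. It is not the i18n
# gem's implementation, only the same save-and-restore pattern.
module MiniI18n
  DEFAULT_LOCALE = :en
  KEY = :mini_i18n_locale # thread-local storage key (an assumption)

  def self.locale
    Thread.current[KEY] || DEFAULT_LOCALE
  end

  def self.locale=(value)
    Thread.current[KEY] = value
  end

  # Sets the locale for the duration of the block and restores the
  # previous value afterwards, even if the block raises.
  def self.with_locale(temp_locale)
    previous = Thread.current[KEY]
    self.locale = temp_locale
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

# Usage: wrap one "request" in a locale, as an around_action would.
MiniI18n.with_locale(:nl) { puts MiniI18n.locale } # prints "nl"
puts MiniI18n.locale                                # prints "en"
```

Because ensure runs on every exit path, the locale is back to :en after both a normal return and an exception.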

# app/controllers/application_controller.rb L65-79 (f3cb4792)

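def requested_locale
  # Memoize the result for the lifetime of the request.
  return @requested_locale if defined?(@requested_locale)

  # Precedence: locale query param, then the user's saved preference,
  # then the Accept-Language header.
  requested_locale =
    locale_from_request_params.presence ||
    current_user&.preferred_language.presence ||
    locale_from_request_header.presence

  # Only honor locales the app supports; otherwise use the default.
  @requested_locale =
    if I18n.available_locales.include?(requested_locale.to_s.to_sym)
      requested_locale
    else
      I18n.default_locale
    end
end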

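locale_from_request_header isn’t shown above. As a hedged sketch of the whole precedence chain, assuming a simple Accept-Language parser and a hypothetical two-locale list, the following mirrors requested_locale: the first non-blank candidate wins, and an unsupported candidate falls back to the default locale rather than to the next candidate.

```ruby
# Hypothetical supported locales; Bike Index's actual list may differ.
AVAILABLE_LOCALES = %i[en nl].freeze
DEFAULT_LOCALE = :en

# Returns the language tags in an Accept-Language header, highest quality
# value first. Tags without a q-value get 1.0; ties keep header order;
# tags with q=0 ("not acceptable") are dropped.
def parse_accept_language(header)
  header.to_s.split(",").each_with_index.filter_map do |entry, index|
    tag, *params = entry.split(";").map(&:strip)
    next if tag.nil? || tag.empty?

    q_param = params.find { |param| param.start_with?("q=") }
    quality = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    next if quality <= 0

    [tag, quality, index]
  end.sort_by { |_tag, quality, index| [-quality, index] }.map(&:first)
end

# First supported primary language ("nl" from "nl-NL") in the header, or nil.
def locale_from_header(header)
  parse_accept_language(header)
    .map { |tag| tag.split("-").first.downcase.to_sym }
    .find { |locale| AVAILABLE_LOCALES.include?(locale) }
end

# Mirrors requested_locale: the first non-blank candidate wins, and an
# unsupported candidate falls back to the default locale.
def resolve_locale(param: nil, user_preference: nil, header: nil)
  candidate = [param, user_preference, locale_from_header(header)]
    .map { |value| value.to_s.strip }
    .find { |value| !value.empty? }
  AVAILABLE_LOCALES.include?(candidate&.to_sym) ? candidate.to_sym : DEFAULT_LOCALE
end
```

For example, resolve_locale(param: "de", header: "nl") returns :en, just as requested_locale returns the default locale when an unsupported locale param is present.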

Translation management

Translation management is a “buy vs. build” decision point. The central questions to engage with here are:

  1. Do we want developers to be the gatekeepers to updating translations? (In our case, no.)
  2. Do we want to accept translation contributions from users via the site UI? (In our case, a nice-to-have but infeasible for a v1.)
  3. Are we able to invest the resources into building our own translation management solution? (In our case, likely not.)

That left us pricing several translation management services we’d seen used elsewhere, including Transifex, LingoHub, and Phrase, and researching their feature sets.

All of them required a subscription ranging from $19 to $180 per month, plus the cost of translation itself, which we estimated would total $8,000-$10,000 for an initial Dutch translation.

Some more digging surfaced Translation.io, which is lightweight, focused on Rails (and Laravel) projects, and pushed all the right buttons for us:

  1. It’s free for open-source projects
  2. It automatically integrates translations by Google Translate (imperfect but a cost-effective 80-90% solution), further reducing the costs involved

As a non-profit, we’re relatively price-sensitive and don’t want to use funds inefficiently, so the potential savings gave Translation.io a big leg up in our deliberations.

Its most significant feature gap relative to its alternatives – automated GitHub PRs to sync translations – could be filled with a shell script integrated into our build pipeline, so we had a clear winner.

String externalization

The key decision for this stage is what format to use for translation files, the choices being YAML (the Rails default) and GetText (broadly popular beyond the Rails ecosystem).

GetText has several advantages over the Rails default, especially for large projects – the most compelling arguably being that strings don’t need to be externalized from templates to a translation file. Instead, the source string lives in the template but is merely wrapped in a special method.
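To make the contrast concrete, here is a minimal pure-Ruby sketch of gettext-style lookup (not the API of any particular gettext library): the English source string doubles as the lookup key, and a missing translation falls back to the source string itself. The CATALOGS hash is a hypothetical stand-in for compiled translation catalogs.

```ruby
# Hypothetical gettext-style catalogs: the English source string itself
# is the lookup key (the msgid), so templates keep readable copy inline.
CATALOGS = {
  nl: { "You'll soon receive your delivery" => "U ontvangt binnenkort uw levering" }
}.freeze

# Returns the translation of msgid for the locale, falling back to the
# source string when no translation exists.
def _(msgid, locale: :en)
  CATALOGS.fetch(locale, {}).fetch(msgid, msgid)
end

puts _("You'll soon receive your delivery", locale: :nl) # prints "U ontvangt binnenkort uw levering"
puts _("You'll soon receive your delivery")              # prints the English source string
```

The trade-off is visible in the catalog: editing the English copy changes the key, so existing translations of that string stop matching until the catalogs are updated.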

But, as is often the case in a Rails context, the defaults are collectively better optimized for the needs of a moderately scaled project like Bike Index than the alternatives, even if each alternative is individually better in one respect or another. That is the “Rails way”: sometimes a set of individually sub-optimal but complementary tools and practices is collectively optimal along the dimensions that matter most to you today.

There is pre-existing tooling that both mitigates the disadvantages of the default Rails i18n framework and amplifies its benefits, so we chose not to stray too far from Rails conventions, in order to leverage as much open-source prior art as possible. Additionally, the YAML approach lets non-developers (marketing, say) edit source copy without diving into the source code.

String externalization is by far the most time-consuming and labor-intensive part of a translation project.

We automated as much as possible using a variety of code-gen and text wrangling tools:

Pretty Good Practices

Some learnings and conventions that emerged while scanning through and extracting strings from ~15,000 lines of template, controller, and React code:

Retain semantic completeness

To aid translation, externalize strings in a form as close to semantically complete as possible, i.e., so that a translation entry can be interpreted without depending on words outside the entry.

As a corollary, use coarse conditional branching in templates. Duplicating user-facing strings is desirable (DRY 👏 is 👏 not 👏 the 👏 summum 👏 bonum 👏 of 👏 software 👏 development) whenever the alternative is to break up a string into units that, in isolation, might lose their context or encode an assumption about word order that may not hold in another language. Yet another reason to keep templates as “dumb” as possible.

For example, instead of

- timing = expedited? ? "soon" : "eventually"
= "You'll #{timing} receive your delivery"

prefer

- if expedited?
  = "You'll soon receive your delivery"
- else
  = "You'll eventually receive your delivery"

Externalizing the former produces Dutch translation entries like these:

soon: spoedig
youll_receive_delivery: U ontvangt %{when} uw levering ontvangt

The resulting copy in Dutch is

U ontvangt spoedig uw levering ontvangt

but the translation we’d want here might be:

youll_soon_receive_delivery: U ontvangt binnenkort uw levering
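The same principle in plain Ruby, as a sketch: the branch selects a complete sentence, so translators always see whole sentences and can reorder words freely. The hash stands in for the locale files, the key names follow the youll_soon_receive_delivery entry above, and the Dutch “eventually” sentence is our own illustrative translation.

```ruby
# Translations keyed by complete sentences (a stand-in for locale files).
# The Dutch "eventually" sentence is an illustrative translation.
DELIVERY_MESSAGES = {
  en: {
    youll_soon_receive_delivery: "You'll soon receive your delivery",
    youll_eventually_receive_delivery: "You'll eventually receive your delivery"
  },
  nl: {
    youll_soon_receive_delivery: "U ontvangt binnenkort uw levering",
    youll_eventually_receive_delivery: "U ontvangt uiteindelijk uw levering"
  }
}.freeze

# Coarse branching: choose a whole-sentence key per condition instead of
# interpolating a translated fragment into a shared sentence.
def delivery_message(locale, expedited:)
  key = expedited ? :youll_soon_receive_delivery : :youll_eventually_receive_delivery
  DELIVERY_MESSAGES.fetch(locale).fetch(key)
end

puts delivery_message(:nl, expedited: true) # prints "U ontvangt binnenkort uw levering"
```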

Mailer translation format

Mailer translations can be namespaced by mailer name, email name, and email format as follows (note the .text and .html segments in the translation keys):

# config/locales/en.yml

en:
  organized_mailer:
    geolocated_message:
      html:
        is_located_at_html: 'Is located at: <strong>%{address}</strong>'
      text:
        your_bike_is_at: 'Your %{bike} is at: %{address}'

-# app/views/organized_mailer/geolocated_message.html.haml

%p= t(".html.is_located_at_html", address: @organization_message.address)

-# app/views/organized_mailer/geolocated_message.text.haml

= t(".text.your_bike_is_at",
  bike: @bike.type,
  address: @organization_message.address)
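To see why the .html and .text segments appear in the keys, here is a rough pure-Ruby approximation of view lazy lookup (the helper names are hypothetical and this is not Rails’ own code): the dot-prefixed key is prefixed with the template’s virtual path, so both formats of the same email share the organized_mailer.geolocated_message scope, and the format segment keeps their strings apart.

```ruby
# Rough approximation of how a Rails view expands a lazy (dot-prefixed)
# key: the template's virtual path becomes the scope, with "/" (and a
# partial's leading "_") replaced by ".". Helper names are hypothetical.
def expand_lazy_key(key, virtual_path)
  return key unless key.start_with?(".")

  virtual_path.gsub(%r{/_?}, ".") + key
end

# Resolves a dotted key against a nested hash (as loaded from en.yml)
# and fills %{name} placeholders with Kernel#format.
def translate(translations, key, virtual_path, **values)
  full_key = expand_lazy_key(key, virtual_path)
  template = translations.dig(*full_key.split(".")) or raise KeyError, "missing: #{full_key}"
  format(template, **values)
end

EN = {
  "organized_mailer" => {
    "geolocated_message" => {
      "html" => { "is_located_at_html" => "Is located at: <strong>%{address}</strong>" },
      "text" => { "your_bike_is_at" => "Your %{bike} is at: %{address}" }
    }
  }
}.freeze

path = "organized_mailer/geolocated_message"
puts translate(EN, ".text.your_bike_is_at", path, bike: "tandem", address: "1 Main St")
# prints "Your tandem is at: 1 Main St"
```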

Controllers

Controllers typically set user-visible strings in flash messages. We defined a translation helper method in a ControllerHelpers mixin; it wraps I18n.translate and infers the scope following our convention of scoping translations by their lexical location in the codebase.

Both :scope and :controller_method can be overridden using the corresponding keyword args. Calls made from base controllers should pass :scope or :controller_method explicitly, since controller_name reflects whichever subclass is handling the request. See the translation method’s docstring for implementation details.

# app/controllers/concerns/controller_helpers.rb L145-157 (ee68edb1)

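def translation(key, scope: nil, controller_method: nil, **kwargs)
  # Infer the calling method from the call stack, skipping a
  # "rescue in ..." frame when called from inside a rescue clause.
  if scope.blank? && controller_method.blank?
    controller_method =
      caller_locations
        .slice(0, 2)
        .map(&:label)
        .reject { |label| label =~ /rescue in/ }
        .first
  end

  # Default scope: controllers.<namespace>.<controller>.<method>;
  # compact drops any nil segment, e.g. a missing namespace.
  scope ||= [
    :controllers,
    controller_namespace,
    controller_name,
    controller_method.to_sym
  ]

  I18n.t(key, **kwargs, scope: scope.compact)
end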

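A stripped-down, framework-free sketch of the same inference, for illustration only: FakeBikesController and its flat CATALOG hash are hypothetical stand-ins for a real controller and the I18n backend. caller_locations(1, 1) returns the frame of whichever method called translation, and we use its base_label (the bare method name, without decorations such as “block in”) as the last scope segment.

```ruby
# Hypothetical stand-in for a Rails controller, modeling only what the
# scope inference needs. CATALOG replaces the I18n backend.
class FakeBikesController
  CATALOG = {
    "controllers.bikes.create.success" => "Bike registered!",
    "controllers.bikes.update.success" => "Bike updated!"
  }.freeze

  def controller_name
    "bikes"
  end

  def create
    translation(:success)
  end

  def update
    translation(:success)
  end

  private

  # Defaults the scope to controllers.<controller>.<calling method>.
  # caller_locations(1, 1) is the frame of the method that called us.
  def translation(key, scope: nil)
    scope ||= ["controllers", controller_name, caller_locations(1, 1).first.base_label]
    CATALOG.fetch([*scope, key].join("."))
  end
end

controller = FakeBikesController.new
puts controller.create # prints "Bike registered!"
puts controller.update # prints "Bike updated!"
```

The same key (:success) resolves differently per action, which is what lets flash messages live under their action’s scope without repeating it at every call site.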

JavaScript

Client-side translations are defined under a :javascript keyspace in en.yml.

# config/locales/en.yml

javascript:
  bikes_search:

The translation method can be invoked directly as I18n.t() in your JS and passed a complete scope:

<span className="attr-title">
  {I18n.t("javascript.bikes_search.registry")}
</span>

Equivalently, a curried instance of I18n.t can be created locally (by convention, bound to t) with the local keyspace set as needed:

// app/javascript/packs/external_registry_search/components/ExternalRegistrySearchResult.js

const t = BikeIndex.translator("bikes_search");
// . . .
<span className="attr-title">
  {t("registry")}
</span>

A client-side JS translations file is generated when the prepare_translations rake task is run. See PR #1353 for implementation details.
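PR #1353 isn’t reproduced here, so the following is only a hypothetical sketch of the general idea: keep each locale’s javascript subtree and serialize it into a JS file the client-side translator can read. The function name, the window.BikeIndexTranslations global, and the sample_label fixture key are all assumptions.

```ruby
require "json"
require "yaml"

# Hypothetical sketch of a prepare_translations-style step: keep only each
# locale's "javascript" subtree and emit it as a JS assignment. The global
# name window.BikeIndexTranslations is an assumption.
def js_translations(locale_yaml_documents)
  subtrees = locale_yaml_documents.each_with_object({}) do |yaml, result|
    YAML.safe_load(yaml).each do |locale, tree|
      result[locale] = { "javascript" => tree.fetch("javascript", {}) }
    end
  end
  "window.BikeIndexTranslations = #{JSON.generate(subtrees)};\n"
end

# Made-up fixture: server-only keys (bikes.show) are left out of the output.
en_yml = <<~YAML
  en:
    javascript:
      bikes_search:
        sample_label: Sample label
    bikes:
      show:
        title: Bike
YAML

print js_translations([en_yml])
# prints: window.BikeIndexTranslations = {"en":{"javascript":{"bikes_search":{"sample_label":"Sample label"}}}};
```

Keeping the javascript wrapper in the output lets client code use the same full keys (javascript.bikes_search...) as the YAML.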

Pre-Deployment Translation Syncing

When building master, we check for un-synced translations and, if any are found, stop the build and open a PR to master with the translation updates.

# .circleci/config.yml L102-104 (465b3072)

- run:
    name: Sync translations (only on master by default)
    command: bin/check_translations


# bin/check_translations L73-88 (e36adcd5)

output "${YELLOW}" "Creating Update Pull Request"

if [[ "${TRANSLATION_BRANCH}" != "master" ]]; then
  PR_URL="Related PR: ${CIRCLE_PULL_REQUEST}"
fi

git push -u origin "${BRANCH}"

hub pull-request \
    -m "[i18n] Translation update: Build ${CIRCLE_BUILD_NUM}

        Merge to unblock CI job ${CIRCLE_BUILD_NUM}: ${CIRCLE_BUILD_URL}
        ${PR_URL}"

output "${GREEN}" "Translation update PR created."
# Fail the build on purpose: CI stays blocked until the translation PR is merged.
exit 1

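The excerpt above covers only opening the PR; how bin/check_translations decides that translations are un-synced isn’t shown. As one hypothetical way to make that call (not necessarily what the script does), here is a sketch that flattens nested locale hashes and reports keys present in the source locale but missing from a target locale.

```ruby
# Flattens nested translations into dotted keys:
# {"a" => {"b" => "x"}} becomes {"a.b" => "x"}.
def flatten_keys(tree, prefix = nil)
  tree.each_with_object({}) do |(key, value), flat|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flat.merge!(flatten_keys(value, path))
    else
      flat[path] = value
    end
  end
end

# Keys present in the source locale but absent from the target locale.
def missing_keys(source_tree, target_tree)
  flatten_keys(source_tree).keys - flatten_keys(target_tree).keys
end

en = { "bikes" => { "show" => { "title" => "Bike", "edit" => "Edit" } } }
nl = { "bikes" => { "show" => { "title" => "Fiets" } } }

missing = missing_keys(en, nl)
puts missing.inspect # prints ["bikes.show.edit"]
# A CI step could then fail the build, e.g. `exit 1 unless missing.empty?`
```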