Translating Website Content: The Benefits of Pseudo-Localization Techniques

…or, as they might say in Pseudoland, “Psëëùùdøø-løøçåålîîzååtîîøøñ Tëëçhñîîqùùëës.”

Pseudo-localization (also written “pseudo localization” or “pseudolocalization”) is the practice of simulating the localization process before jumping headfirst into the real translation work. It can be an excellent, cost-effective part of any well-designed localization strategy, helping to ward off nasty (and potentially expensive) surprises later, such as text expansion, character-encoding problems, and clipped text.

Pseudo-localization is not a single technique but a collection of them. However, the basic strategy behind them all is the same:

Replace the localizable text within the material in question with something identifiable.

This can be done via script, macro, or through a tool, but shouldn’t need to involve actual human translators. In fact, one of its values is in its ability to allow some level of linguistic testing by non-native readers of the target language. The various pseudo-localization techniques themselves range from the very simple to the quite elaborate, and should ideally target those areas of most risk and uncertainty.

Likely pseudo-localization objectives include:

  • Getting a “feel” for what the localized versions will look like.
  • Identifying text-expansion issues and other post-translation UI problems.
  • Uncovering character-encoding problems with the translation process or character-encoding handling problems with the material itself.
  • Discovering pre-processing errors with the translation process (i.e. not all translatable material is getting “pulled” for translation).
  • Identifying erroneously hard-coded strings.
  • Revealing latent functionality issues in the material itself (that only manifest after translation).
  • Detecting problems with automatic sorting.
  • Confirming that the files-to-be-localized have been correctly identified.

Now, some options. These techniques may be combined in order to address different pseudo-localization objectives:

Replacing Characters with Xs
This technique replaces all characters within a string with an “X” or a similar easily identifiable character. Since the resulting string is unrecognizable as a derivative of the source string, it can really be considered a form of greeking, its use limited to identifying hard-coded strings and localization pre-processing errors.

The technique is reasonably simple to implement, and I’ve seen several tools that have such functionality built-in.
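As a minimal sketch of how such a pass might be scripted, here is one way to do it in Python. The `pseudo_x` name and the decision to protect format placeholders like `%s` are illustrative assumptions, not the behavior of any particular tool:

```python
import re

# Protect common format specifiers (%s, {name}) so runtime substitution
# still works in the pseudo-localized build. This pattern is an assumption;
# real projects would match their own placeholder syntax.
PLACEHOLDER = re.compile(r"%\w|\{\w+\}")

def pseudo_x(text: str) -> str:
    """Replace every letter with 'X', preserving punctuation, whitespace,
    and format placeholders."""
    parts = []
    last = 0
    for m in PLACEHOLDER.finditer(text):
        parts.append(re.sub(r"[A-Za-z]", "X", text[last:m.start()]))
        parts.append(m.group())  # keep the placeholder intact
        last = m.end()
    parts.append(re.sub(r"[A-Za-z]", "X", text[last:]))
    return "".join(parts)
```

Because word lengths and punctuation survive, a tester can still spot which on-screen strings were (and were not) run through the process.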

Pseudoese
“Pseudoese” is a fictional written language in which Roman characters are replaced by similar-looking diacritical characters. All vowels become pairs of diacritical vowels, allowing Pseudoese to emulate the kind of character expansion that is typical when English is translated into Western European languages. See the following “before and after” example.

The original string:

Pseudo-localization Techniques

The same string pseudo-translated into “Pseudoese”:

Psëëùùdøø-løøçåålîîzååtîîøøñ Tëëçhñîîqùùëës

One advantage of using Pseudoese is that the source text is still readable, but easily identifiable as being localized. This technique would not be appropriate for identifying potential Asian-language-specific issues, however.
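A rough sketch of such a transformation in Python, using a hypothetical character map that reproduces the doubled-vowel style shown in this article's subtitle (any set of visually similar diacritical characters would work):

```python
# Hypothetical Pseudoese mapping: vowels become diacritical pairs
# (simulating expansion), a few consonants get accented look-alikes.
PSEUDO_MAP = {
    "a": "åå", "e": "ëë", "i": "îî", "o": "øø", "u": "ùù",
    "A": "ÅÅ", "E": "ËË", "I": "ÎÎ", "O": "ØØ", "U": "ÙÙ",
    "c": "ç", "n": "ñ", "C": "Ç", "N": "Ñ",
}

def to_pseudoese(text: str) -> str:
    """Replace mapped characters, leaving everything else untouched."""
    return "".join(PSEUDO_MAP.get(ch, ch) for ch in text)
```

Because unmapped characters pass through unchanged, placeholders and markup survive, and the result remains readable to an English speaker while being unmistakably “translated.”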

Lorem Ipsum
Lorem Ipsum is similar to the “greeking” technique, but is designed to emulate the look-and-feel of actual text. This technique is mostly useful in the context of document publishing, and can be used to help identify what impact changing the text will have on the layout of a document. For example, will a +30% expansion of the text wreak havoc on the plan to keep a document as a single-fold brochure for all languages?

Note that because Lorem Ipsum text is composed entirely of Roman characters, this technique won’t help identify potential encoding issues at all.
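As an illustration of sizing filler to an expansion factor, a small helper (hypothetical, not part of any publishing tool) could generate lorem ipsum text roughly 1.3 times the length of the source:

```python
# Standard lorem ipsum opening sentence, repeated as needed.
LOREM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua. ")

def lorem_for(text: str, expansion: float = 1.3) -> str:
    """Return lorem ipsum filler roughly `expansion` times the source length,
    e.g. 1.3 to simulate a +30% expansion into a Western European language."""
    target = int(len(text) * expansion)
    filler = (LOREM * (target // len(LOREM) + 1))[:target]
    return filler.rstrip()
```

Dropping the result into the layout in place of each paragraph shows whether the brochure still fits before any translation money is spent.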

String Identifiers
This technique is especially useful for answering the question “where did this string come from?” when looking at a runtime version of a piece of software – especially if there are lots of potential places from which the string might be getting pulled.

Consider the following string from the file “misc_strings.rc”:

IDS_STRING_GET_COFFEE “This process is going to take a while. Good time for coffee?”

Adding a string identifier to the string itself would transform it to something like this:

IDS_STRING_GET_COFFEE “misc_strings.rc:IDS_STRING_GET_COFFEE:This process is going to take a while. Good time for coffee?”

If you create a runtime build from a set of resources processed like this, you can identify exactly where the text in a runtime dialog is coming from. While potentially useful outside the context of a localization project, this technique is also handy during testing of a localized user interface, since it explicitly identifies which string, from a series of identical-looking but unique strings, needs to be changed.

Note that adding string identifiers is probably best used in combination with other, more language-oriented techniques (like Pseudoese or Machine Translation).
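As a sketch, the transformation shown above could be scripted in Python over a parsed resource table; the `tag_strings` name and the dict-based representation are assumptions for illustration:

```python
def tag_strings(filename: str, strings: dict) -> dict:
    """Prefix each string value with 'file:ID:' so that any string seen
    at runtime reveals exactly which file and identifier it came from."""
    return {sid: f"{filename}:{sid}:{text}" for sid, text in strings.items()}
```

A usage example, mirroring the coffee string above:

```python
tagged = tag_strings(
    "misc_strings.rc",
    {"IDS_STRING_GET_COFFEE": "Good time for coffee?"},
)
# tagged["IDS_STRING_GET_COFFEE"] is now
# "misc_strings.rc:IDS_STRING_GET_COFFEE:Good time for coffee?"
```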

Simple Prefixes / Suffixes
Similar to string identifiers, adding prefixes and suffixes to strings is a technique you’d probably want to use in conjunction with something else. Its utility is most apparent in projects where translated strings are at risk of being cut off in the display – because their widths are explicitly defined – and where the truncation might go unnoticed by someone who doesn’t read the language.

The same coffee string with a prefix and suffix:

IDS_STRING_GET_COFFEE “X_This process is going to take a while. Good time for coffee?_X”
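A minimal sketch of the idea in Python (both helper names and the `X_` / `_X` sentinels are illustrative assumptions):

```python
def bracket(text: str, prefix: str = "X_", suffix: str = "_X") -> str:
    """Wrap a string in visible sentinels before building the test version."""
    return f"{prefix}{text}{suffix}"

def looks_truncated(displayed: str, suffix: str = "_X") -> bool:
    """If the trailing sentinel is missing from what actually rendered,
    the string was probably clipped by a fixed-width control."""
    return not displayed.endswith(suffix)
```

Even a tester who can’t read the target language can scan for a missing `_X` at the end of any control.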

Machine Translation
Probably the closest thing to emulating what would actually happen during the real translation work – and the best way to test for Asian-language-specific issues – would be to run the text through an actual machine-translation pass. With the wide world of machine-translation options available (again ranging from the very simple to the quite elaborate), the actual details for this technique will be dependent on what machine translation tool is used.

Like all of these techniques, however, machine translation should be applied in a manner representative of the process that will be used during the real translation work.

If translators will be translating XLIFF files, for example, it makes sense to introduce the pseudo-localization step on the XLIFF files as opposed to the source files themselves. If translators are going to be making use of translation memories, it may make sense to introduce the pseudo-translated string by way of a pseudo-translated TM.

That’s the high-level tour! As always, Lionbridge is willing and able to help you to develop and execute an appropriate pseudo-localization strategy for your next localization project. And remember to visit the Lionbridge Knowledge Center whenever you’re looking for more translation and localization strategies.

Your feedback is encouraged and appreciated. Thanks!
