Real-Time Translation Blog

Lionbridge Translation and Localization

Machine Translation: What Is It?

Please welcome a newcomer to our Translation Team Blog: Stuart Sklair, Lionbridge Solution Architect. Stuart covers Machine Translation today and introduces the latest Lionbridge FAQ, "What's Machine Translation?"


Machine Translation (MT) is the use of computer software to translate one language into another. It's been around since the 1960s, with several ups and downs in research and development. Apocryphal stories abound from the early days of MT, with phrases such as "out of sight, out of mind" being translated as "blind fool," and "the spirit is willing but the flesh is weak" becoming "the vodka is strong but the meat is rotten."

Today, MT is gaining public exposure as the Web has become an essential part of global commercial communication. MT "engines" are being incorporated into browsers and search engines. Improvements in MT's output quality have helped drive this resurgence, both for general purposes and for use by professional language service providers (LSPs).

MT systems first came about through a combination of two things: computational linguists' attempts to understand human language through computer models, and the US Department of Defense's need to translate Russian documents into English during the Cold War. Since then, a number of different approaches to MT have emerged:

  • Rules-based. In this approach, much as you might have learned to diagram the grammatical structure of sentences in school, the software attempts to deconstruct the grammar of the input language to build a grammatical model of each sentence. The grammatical model of the input language is then mapped to the grammatical model of the output language.
  • Statistically-based. Here, the MT engine is trained on large volumes of existing content paired with its translations, known as "bilingual text corpora." The MT engine uses this data to create statistical rules. Given a certain word, phrase, or sentence in one language, these rules select the translation that has the highest probability of being correct in the target language. While this approach is not language specific, large volumes of electronic text of similar content are required to get the best quality output from the MT engine.
  • Example-based. Similar to the "statistically-based" MT approach, a bilingual text corpus is required. However, in the example-based approach, the corpus is used as a knowledge base to derive translations directly from examples of parallel structures of source text and translation found in the corpus.
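
To make the statistically-based approach more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular MT engine works: the phrase pairs are invented, the names ALIGNED_PHRASES, train_phrase_table, and best_translation are hypothetical, and real engines also have to handle alignment, word reordering, and the fluency of the output. The sketch simply counts how often each target phrase is paired with a source phrase, turns the counts into probabilities, and picks the most probable candidate.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made phrase pairs standing in for a bilingual text corpus.
# A real corpus would hold millions of aligned segments.
ALIGNED_PHRASES = [
    ("bank", "banco"),
    ("bank", "banco"),
    ("bank", "orilla"),
    ("river bank", "ribera"),
    ("thank you", "gracias"),
]


def train_phrase_table(pairs):
    """Estimate P(target | source) by relative frequency of the observed pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table


def best_translation(table, source):
    """Return (translation, probability) for the likeliest candidate, or None."""
    candidates = table.get(source)
    if not candidates:
        return None  # phrase never seen in the training data
    return max(candidates.items(), key=lambda item: item[1])


PHRASE_TABLE = train_phrase_table(ALIGNED_PHRASES)

if __name__ == "__main__":
    for phrase in ("bank", "thank you", "hello"):
        print(phrase, "->", best_translation(PHRASE_TABLE, phrase))
```

With this toy data, "bank" is paired with "banco" twice and "orilla" once, so the sketch prefers "banco" with an estimated probability of two thirds. A phrase that never appears in the data, such as "hello," gets no translation at all, which is one way to see why large volumes of similar content matter for output quality.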

Although there are frequent, heated debates about which MT philosophy is most effective, when it comes to MT for commercial translations, it's probably not worth getting too hung up on which approach is used. Typically, language service providers use MT as one element of a complete quality translation process.


To learn more about MT and how it works, read our latest Translation FAQ, "What's Machine Translation?"


Lionbridge publishes Translation FAQs regularly.

Comments

Nice explanation of how translation software works. I would imagine that no one method is best, but each method has its own strengths and weaknesses. Is there software that plays to the strengths of each method?
Posted @ Wednesday, March 17, 2010 10:49 AM by traductor
Hat tip to Rafa Moral, our MT expert, for his help in drafting this response.
 
There are quite a number of MT engines out there using the methods discussed in the blog in their purest form, as a variation of each approach, or in some hybrid version.
 
For the purpose of replying to your comment, we have picked out just three MT systems:
 
Google: has a purely statistical (SMT) engine.

Systran: used to be purely rule-based (RBMT); the latest enterprise version, 7, is a hybrid, that is, both statistical and rule-based.
 
Lionbridge: Our own MT engine is purely RBMT. It clearly separates out the linguistic elements from the core MT engine. This allows our linguists to modify the dictionaries and grammar rules easily without the need for deeper knowledge of the MT engine itself. 
 
Most MT companies are starting to say that they have or will soon have hybrid systems, but we believe it will be easier for RBMT systems like ours to add a statistical component than for SMT systems to add a rule-based component.
Posted @ Thursday, March 25, 2010 4:59 AM by Stuart Sklair