In neural machine translation, the system is trained on large numbers of texts in one language and their corresponding translations in another, producing a model for moving between the two. But when it’s fed nonsense inputs, Rush said, the system can “hallucinate” bizarre outputs, not unlike the way Google’s DeepDream identifies and accentuates patterns in images.
“The models are black-boxes, that are learned from as many training instances that you can find,” Rush said. “The vast majority of these will look like human language, and when you give it a new one it is trained to produce something, at all costs, that also looks like human language. However if you give it something very different, the best translation will be something still fluent, but not at all connected to the input.”
Sean Colbath, a senior scientist at BBN Technologies who works on machine translation, agreed that the strange outputs are probably due to Google Translate’s algorithm looking for order in chaos. He also pointed out that the languages that generate the strangest results (Somali, Hawaiian and Maori) have smaller bodies of translated text than more widely spoken languages like English or Chinese. As a result, he said, it’s possible that Google used religious texts like the Bible, which has been translated into many languages, to train its model in those languages, which would explain the religious content.
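
The effect Rush and Colbath describe can be illustrated with a deliberately tiny sketch. This is not a neural network and not how Google Translate works; it is a toy lookup "translator" built on a made-up five-sentence corpus (the names `PAIRS`, `train` and `translate` are hypothetical). It always returns some fluent sentence from its training data, so when the input shares nothing with what it has seen, it falls back on whichever output was most common in training.

```python
from collections import Counter

# Hypothetical toy parallel corpus: (source sentence, target sentence).
# One target sentence is deliberately over-represented, standing in for
# a training set dominated by a single kind of text.
PAIRS = [
    ("hola mundo", "hello world"),
    ("buenos dias", "good morning"),
    ("en el principio", "in the beginning"),
    ("en el principio", "in the beginning"),
    ("en el principio", "in the beginning"),
]


def train(pairs):
    """Count how often each target appears and which source words co-occur with it."""
    target_counts = Counter(tgt for _, tgt in pairs)
    word_links = Counter()
    for src, tgt in pairs:
        for word in src.split():
            word_links[(word, tgt)] += 1
    return target_counts, word_links


def translate(source, target_counts, word_links):
    """Always return *some* known target sentence, even for nonsense input."""
    words = source.split()

    def score(tgt):
        # Evidence linking the input to this target, normalised by how
        # often the target occurs so frequent targets get no unfair boost.
        overlap = sum(word_links[(w, tgt)] for w in words) / target_counts[tgt]
        # With no overlap at all, every target ties on the first element,
        # and the tie is broken by how common the target was in training.
        return (overlap, target_counts[tgt])

    return max(target_counts, key=score)


target_counts, word_links = train(PAIRS)
print(translate("hola mundo", target_counts, word_links))
print(translate("ag ag ag ag", target_counts, word_links))
```

With a recognisable input, the word overlap decides the answer. With gibberish, nothing in the input matters, and the output is simply the most heavily represented sentence in the training data, fluent but unconnected to what was typed.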