Date: 2024-01-30
In an academic paper recently published on arXiv, researchers highlight the far-reaching impact of Machine Translation (MT) tools on the content of the World Wide Web. The study found that a significant fraction of online content, approximately 2.19 billion of the 6.38 billion sentences scrutinized (about 34%), has been translated into multiple languages using MT tools. These multi-way translations, especially in less common languages, are often of lower quality, shorter, and more predictable than their two-way counterparts.
English: The Dominant Language of Machine Translations
The study indicates that content is predominantly written in English and then translated into other languages. This suggests that machine-generated material dominates the web in less frequently used languages. Because web content in those languages consists largely of machine translations, the problems AI already faces with English-language material could be amplified there.
Averting the Perils of 'Model Collapse'
The paper also brings to light the potential problem of 'model collapse' in generative AI. This phenomenon occurs when AI-generated content is used for training, resulting in defects in the newly created AI models. The issue is more pronounced for languages with less online material available for training. While detecting multi-way sentences can help identify AI-produced content for filtering, such a process would require more energy, increasing the environmental impact of training generative AI systems.
John Tinsley's Insight on AI in Language Industry
In related developments, John Tinsley, the VP of AI Solutions at language AI agency Translated, has shared his views on the challenges, advancements, and future trajectories of AI in the language industry. Tinsley has particularly emphasized the influence of large language models on translation and the complexities associated with multilingual content generation. He also unveiled a new product initiative, 'Human in the Loop', which aims to automate the process of enhancing machine translation through continuous fine-tuning based on user feedback and human data.