Machine translation has come a long way in the past few years. Where output in some languages was once barely intelligible gibberish, its quality now sometimes rivals that of human translators. The quality that you get from Microsoft Azure Cognitive Services is much higher than what you get from the free translation engines.
SharePoint’s language support has also grown in recent years, with a few more languages being added. This post was originally written in the context of specific SharePoint products, PointFire 365 and PointFire Power Translator, which both support a lot of languages, but has been edited to be less about those products and more about Microsoft language technology.
SharePoint itself supports 51 languages, as does PointFire 365. When a modern Communication or Teams site is created, all 51 languages are activated automatically. In classic sites, by contrast, alternate languages had to be added individually, and on-premises, 50 individual language packs first had to be installed on all the servers in the farm.
The Azure Translator Text API v3, part of Cognitive Services, supports even more languages. Its list is not the same as SharePoint’s, although there is considerable overlap. If you are unfamiliar with the PointFire products, PointFire 365 is the one that handles localizing the user interface and filtering content by language, while PointFire Power Translator is the one that carries out machine translation of documents, classic and modern pages, and metadata in SharePoint Online and OneDrive. The overlap, where a language is supported by both SharePoint and Azure, means that documents and SharePoint pages can immediately show up in any of those languages. However, this table is also useful for any project combining SharePoint and Azure text translation.
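For readers who have not used the Translator Text API v3 directly, here is a minimal sketch of how a translate request is put together. It only builds the URL, headers and JSON body and does not send anything, so it runs without a subscription. The function name `build_translate_request` and the placeholder key are ours, for illustration only, and this is not a description of how PointFire Power Translator calls the service.

```python
import json
from urllib.parse import urlencode

# Global endpoint of the Translator Text API v3 "translate" operation.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, to_langs, key, region=None, from_lang=None):
    """Return (url, headers, body) for a v3 translate call, without sending it.

    Hypothetical helper for illustration; the key and region are placeholders.
    """
    params = [("api-version", "3.0")]
    if from_lang:
        # If "from" is omitted, the service detects the source language.
        params.append(("from", from_lang))
    # One "to" parameter per target language.
    params.extend(("to", lang) for lang in to_langs)
    url = f"{ENDPOINT}?{urlencode(params)}"

    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json; charset=UTF-8",
    }
    if region:
        # Needed for regional (non-global) Translator resources.
        headers["Ocp-Apim-Subscription-Region"] = region

    # The body is a JSON array with one {"Text": ...} object per string.
    body = json.dumps([{"Text": t} for t in texts], ensure_ascii=False)
    return url, headers, body


url, headers, body = build_translate_request(
    ["Hello, world"], ["fr", "de"], key="<your-key>"
)
print(url)
print(body)
```

To actually call the service, the three values would be passed to an HTTP POST with whatever client you prefer.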
The quality of machine translation can vary by language. In the back end, PointFire Power Translator uses one of four different translation technologies, all powered by Azure. The technology that most people are used to, which powered the old Bing and Google translation engines, is statistical machine translation. This was the state of the art until a few years ago, using syntax-based statistical translation models with a few additional tricks to improve translation quality. It trains on large corpora of text that have already been translated, trying to mimic the translation process using statistics. This technology got better and better over the years. For several languages, these statistical models still give the best quality available. For others, there are even higher quality models.
|Chinese Simplified||Neural HP||Y|
|Portuguese (Portugal)||Neural (Portuguese)||Y|
|Serbian (Latin, Serbia)||Statistical||Y|
Around 2015, based in large part on algorithms developed at the University of Toronto and Université de Montréal, neural network models emerged as a better alternative to statistical models. These deep networks require enormous amounts of computing power to train. Interestingly, large software companies like Microsoft and Google tend to publish their results and make their insights and tools available to each other, so that they can each improve on one another’s work. Because of this, neural machine translation technology has progressed quickly. Training such models still requires massive amounts of computing power, something like 100 processors for a week for each training run, but Microsoft is a leader in techniques that reduce processing requirements, so it uses a fraction of that. For most major languages, the best quality model uses neural translation.
Then in March 2018 Microsoft announced it had achieved “Human Parity” for some translation tasks. Mind you, this is a controversial claim, and neither the humans with whom parity was achieved nor the people rating the translation quality were professional translators, but we use the “HP” label to refer to the technology being used, not necessarily to the quality level. These initial engines were not suitable for production because they were far too massive. Microsoft then improved on the size and performance of these complex models, using groundbreaking techniques. For example, they use a large deep neural network to train a much faster wide shallow network, gaining a huge performance improvement and improved translation quality. They train a separate neural network to detect and correct errors in the input data. They use the trick that so many of us have used to hilarious effect: they translate sentences from English to another language, then translate that translation back to English to see whether it is the same. These new “human parity” engines have been sweeping international competitions of translation quality and, as opposed to a lot of the other entries, they are available in production. Chinese and German were released to general availability in late November 2018, and French, Hindi, Italian, Spanish, Japanese, Korean and Russian, to and from English, are now available. There are more details on the Microsoft Translator blog if you are interested. In the August 2019 competition, in which two groups from Microsoft entered, “superhuman” quality started being achieved, that is to say machine translation had a higher quality score than the reference humans.
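The round-trip trick is easy to illustrate. The sketch below uses two toy word-by-word dictionaries standing in for real translation engines; the dictionaries and function names are made up for the example, and Microsoft’s actual training procedure is far more elaborate than a simple equality check.

```python
# Toy "engines": deliberately imperfect word-by-word dictionaries.
# "big" -> "grand" -> "tall" shows how a round trip can drift.
EN_TO_FR = {"the": "le", "big": "grand", "cat": "chat", "sleeps": "dort"}
FR_TO_EN = {"le": "the", "grand": "tall", "chat": "cat", "dort": "sleeps"}


def word_by_word(table):
    """Build a toy translator that looks up each word in `table`."""
    def translate(sentence):
        # Unknown words are passed through unchanged.
        return " ".join(table.get(w, w) for w in sentence.lower().split())
    return translate


def round_trip(sentence, forward, backward):
    """Translate there and back; report whether the original is recovered."""
    there = forward(sentence)
    back = backward(there)
    # Compare case-insensitively, ignoring extra whitespace.
    same = back.split() == sentence.lower().split()
    return same, there, back


to_french = word_by_word(EN_TO_FR)
to_english = word_by_word(FR_TO_EN)

for text in ["The cat sleeps", "The big cat sleeps"]:
    same, there, back = round_trip(text, to_french, to_english)
    print(f"{text!r} -> {there!r} -> {back!r}  consistent={same}")
```

The first sentence survives the round trip and the second does not, which is the kind of signal that can be used to flag a questionable translation.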
The highest quality of any translation engine on earth is still not good enough for you? We offer an even higher quality. Using Microsoft’s Custom Translator, you can re-train an existing neural machine translation engine using your own professionally translated documents so that it adopts your vocabulary and your style. If you’re keen, you can even train it on a different dialect or language. PointFire Power Translator supports these custom models as well as the standard ones in the table above.
The table shows there is a lot of overlap between the languages supported by SharePoint and those supported by the Azure Translator Text API. Some of them are an exact match, but some of them have a mapping that you may need to be aware of. For example, SharePoint supports Dari but not Persian, while the translator API supports Persian and not Dari. Written Persian and written Dari are close enough that we have declared them to be the same. Keep this in mind when you check the translations. SharePoint supports two versions of Portuguese, while the translator supports only one. It is a hybrid, but it looks more like Brazilian Portuguese. We use the same engine for both, but it’s a good idea to check the translations. For Norwegian, SharePoint explicitly uses Bokmål, while the translator uses a different language code that may refer to Nynorsk. From the limited knowledge of Norwegian available to our team, the language code looks like a mistake, and we believe that both use Bokmål.
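If you are building your own integration, the mapping described above can be kept in a small lookup table. The sketch below is an illustration under assumptions: the SharePoint culture names and Translator codes shown (`prs-AF`, `fa`, `pt`) are what we would expect, but should be verified against the current language lists, and the `SUPPORTED` set is a made-up sample rather than the real list. Norwegian is left out because the code in question is not named here.

```python
# SharePoint culture name -> Translator language code, for the cases
# where the two lists do not line up one-to-one.
# ASSUMPTION: these codes are illustrative; check the current lists.
SPECIAL_CASES = {
    "prs-AF": "fa",  # Dari is treated as Persian
    "pt-BR": "pt",   # both Portuguese variants share one engine
    "pt-PT": "pt",
}

# Hypothetical sample of codes the translator accepts (not the full list).
SUPPORTED = {"fa", "pt", "fr", "de", "zh-Hans"}


def translator_code(sharepoint_culture, supported=SUPPORTED):
    """Return the translator code for a SharePoint culture, or None."""
    code = SPECIAL_CASES.get(sharepoint_culture)
    if code is None:
        if sharepoint_culture in supported:
            # Exact match, e.g. "zh-Hans".
            code = sharepoint_culture
        else:
            # Fall back to the primary subtag, e.g. "fr-FR" -> "fr".
            code = sharepoint_culture.split("-")[0]
    return code if code in supported else None


for culture in ["prs-AF", "pt-PT", "fr-FR", "xx-XX"]:
    print(culture, "->", translator_code(culture))
```

Returning `None` for an unmapped culture makes it easy to skip translation, or to warn the user, rather than sending an unsupported code to the service.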
There are a number of languages for which PointFire Power Translator can provide translations of SharePoint or OneDrive documents, but which SharePoint itself does not support. These languages include Icelandic, Kiswahili, and Maltese. In addition to Earth languages, there are in theory two versions of Klingon, one that uses the Latin alphabet and the other that uses the Klingon script. We regret that the version using the Klingon script, even if you have installed the correct Klingon font, is no longer working 🙁 and we don’t know why. The other one does work. Qapla’!