Language and speech recognition technology equal to humans gets closer

This is our last post of the year aside from our job posts. We thank all of our subscribers and readers for a great year.

Note: The Posse List team will be posting jobs (the market is busy) over the holiday period.

For those that can, enjoy the holidays.

                                         – Angela Gambetta and The Posse List Management Team



Language machine translation

24 December 2014 – Ah, technology. So hard to keep up. On the medical front alone, the cost of manipulating life’s building blocks is falling at such a rapid clip … faster than Moore’s Law predictions for silicon … that we can now tailor drugs to match a person’s genome, and fuse technology with living matter.

And so it goes in the legal field, especially language translation.

This past year we attended numerous Google, Microsoft and general technology events focused on language translation. We have run several posts on Google’s quest to end the language barrier and the possible effect on non-English e-discovery document reviews. We wrote about how far companies had come toward making automated language translation easier, faster and more reliable, even though a world of seamless and immediate translation was still out of our grasp.

But … things are getting better and better. This year Microsoft announced that nearly instantaneous speech translation was available on Skype under plans by Microsoft to “cross the language boundary” on several of its products. Rival services are being developed by Google and NTT DoCoMo. Microsoft has been experimenting with machine translation for more than a decade and has combined it with separate work on speech recognition accuracy. Microsoft’s Cortana voice assistant for Windows Phone already understands voice patterns. Microsoft has been developing a “deep neural net” that can recognize speech and learn from languages.

Easier, faster and more reliable translations, a world of seamless and immediate translation within our grasp. Yes, the usual issues remain: context, syntax, intonation and ambiguity. Because a computer system is not context aware, it can grab the wrong word. It also doesn’t understand the language at all. It just tries to decode words instead of decoding the meaning. And many languages are not similar at all: they lack corresponding common words, or they use those words in different ways.

But the technology continues to improve. In Paris we attended a Microsoft Research event for its Natural Language Processing group and saw the further developments (over last year) on their Machine Translation (MT) project, which is focused on creating MT systems and technologies that cater to the multitude of translation scenarios today, including legal. The key is Statistical Machine Translation (SMT), which breaks down into areas such as syntax-based SMT and phrase-based SMT. Plus there are Word Alignment and Language Modeling technologies. This summer we were in Israel at a Google workshop on advanced language modeling.
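For readers who want a feel for what “language modeling” means in practice, here is a minimal sketch in Python. It is our own toy illustration, not anything shown by Microsoft or Google: a bigram model counts which word pairs occur in a tiny made-up corpus and uses those counts (with add-one smoothing) to score how plausible a sentence is. The corpus, the function names and the smoothing choice are all assumptions made for the example. In an SMT system, a score like this is one of the signals that helps pick the more fluent of several candidate translations.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count single words and adjacent word pairs in whitespace-tokenized text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # <s> and </s> are hypothetical start/end markers added for the example.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_probability(sentence, unigrams, bigrams):
    """Score a sentence as a product of add-one-smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob


# Tiny illustrative corpus (invented for this sketch).
corpus = [
    "the court granted the motion",
    "the court denied the motion",
    "the judge granted the request",
]
unigrams, bigrams = train_bigram_model(corpus)

fluent = sentence_probability("the court granted the request", unigrams, bigrams)
scrambled = sentence_probability("court the the granted request", unigrams, bigrams)
print(fluent > scrambled)
```

The well-ordered sentence scores higher than the scrambled one even though the model never saw either sentence in full, which is the whole point: word order that resembles the training text is rewarded. Production systems use far larger corpora and more sophisticated models.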

These toolkits mean that problems with morphology, syntax, semantics and word sense disambiguation are being tackled. Not solved yet, but solutions are coming. For the vendors and the multinational companies who need it, the business model is a no-brainer. The value of an automated, instant, seamless translation platform to a corporation means the vendor that solves it could charge a substantial amount of money for such a tool.

And not just legal. At the Mobile World Congress in Barcelona and at the Consumer Electronics Show in Las Vegas we saw the recent breakthroughs in speech recognition and artificial intelligence that will soon make gadgets dramatically better at understanding people. This new breed of highly competent machines, able not only to hear us but to understand context and nuance, is estimated to be 2-3 years away according to engineers at Google.

Google is probably ahead of the pack. They are working on what we think is a pretty ambitious research project … creating speech systems that plug into a company’s databases. One project currently being tested in the lab allows computers to hear and essentially “think” about what people say into Google’s digital ear. Recent inventions in the field of speech and machine learning should lead to major changes in how we murmur, shout, question and interrogate our devices.

And over at Apple, Siri engineers are working toward speech recognition that’s smart enough to engage in authentic conversations with users. This kind of conversational interaction is where the leading edge is right now.

Most analysts tell us that the “Big Bang” in this area occurred two and a half years ago when researchers from Google and the University of Toronto published an influential paper about using “deep neural networks” to model speech in computers, and followed this up several months later with another paper resulting from a collaboration with Microsoft and IBM. This led to what Google engineers call the biggest single improvement in 20 years of speech research. The findings resurrected a decades-old invention around digital neural networks. The technology tested well in the 1980s at predicting and analyzing large fields of data, but performance was hindered by the wimpy speed of computers at the time. Neural networks only became a viable option recently, following a massive speed-up in computer processing and the development of new software approaches.

Google’s lab projects have built on this research. They have moved on from an older method, called feed-forward neural networks, in favor of recurrent neural networks. Why is that important? The switch allows the system to store more information, and process longer and more complex sequences. Google’s breakthrough results from a simplification of the underlying code that will let its software hold more ideas and concepts within the same system, making it easier to ask complicated questions and get sensible answers.
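To make the feed-forward versus recurrent distinction concrete, here is a deliberately tiny Python sketch of our own. It is not Google’s code, and the weights (`w_in`, `w_rec`) are arbitrary numbers we picked rather than anything learned from data. A feed-forward unit looks only at the input in front of it, while a recurrent unit also feeds its own previous output back in, so something it “heard” earlier still influences what it produces later.

```python
import math


def feed_forward_step(x, w_in=0.5):
    """Feed-forward unit: the output depends only on the current input."""
    return math.tanh(w_in * x)


def recurrent_run(inputs, w_in=0.5, w_rec=0.9):
    """Simple recurrent unit: a hidden state is carried from step to step."""
    h = 0.0  # hidden state, i.e. the unit's memory of earlier inputs
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


# Two sequences that differ only in their FIRST element.
seq_a = [1.0, 0.0, 0.0]
seq_b = [0.0, 0.0, 0.0]

# The feed-forward unit sees only the last input, so it cannot tell them apart.
ff_a = feed_forward_step(seq_a[-1])
ff_b = feed_forward_step(seq_b[-1])

# The recurrent unit still carries a trace of the first input at the last step.
rnn_a = recurrent_run(seq_a)[-1]
rnn_b = recurrent_run(seq_b)[-1]

print(ff_a == ff_b, rnn_a != rnn_b)
```

That lingering trace is what lets recurrent networks handle longer and more complex sequences such as speech, where the meaning of a sound often depends on what came before it. Real systems use many such units with trained weights.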

And it explains why all of this tech has moved beyond written word translation to today’s new mantra: use context, physical location and certain other things the system knows about the speaker to make assumptions about where a conversation is going and what it all means. Why, just like humans do!! What it means … and this is very simplified … is that a computer, hearing a word, will be able to recognize the context from the sentence and then layer that information over geography. Extremely difficult, and it will take time to perfect.

And for Google, which has the money/people firepower for an unprecedented number of technological evolutions in speech recognition, its projects have been yielding techniques that are making their way into all of the other elements of Google’s mammoth brain. Pretty much like space travel research in the 1960s and 1970s, for those that remember: you build something to go to the moon, and in the meantime, you develop a hundred other technologies that are useful.

In the contract attorney world, predictive coding platforms … the technology making document review faster, better, cheaper … have been tagged as “the great disruptor” in the e-discovery market, assuming the power struggle (read: financial vested interests) among law firm, vendor and corporate client ever resolves itself. But the technology is driving change. Staffing agencies, e-discovery vendors and even corporate in-house legal departments are utilizing “data swat-teams” composed of contract attorneys who possess the tech skills + the analysis ability, with a greater emphasis on data search specialists who have the ability to conduct complex searches, analyze information and generate reports.

It has been said the only contract attorneys deemed “safe” are those who have fluency in one or more non-English languages. Non-English document reviews were up 38% last year and comprised 72% of all document review jobs.

Ah, but times might just be a-changin’ even for them. Just as predictive coding has the potential to rend the English language document review market, those nasty algorithms and artificial intelligence manifestations are making their way across all languages. As Marc Andreessen said several years ago in his prescient essay “Software is eating the world”: “all of the technology required to transform industries through software is finally working and can be widely delivered at global scale … don’t be on the wrong side of software-based disruption”.

About the Author Gregory P. Bufithis, Esq.

Gregory P. Bufithis is the Founder & Chairman of The Posse List. He has over 25 years of experience in intellectual property law and digital media in the U.S. and Europe.
