We’re about to hit a gold rush of NLP startups: here’s why – TechCrunch

Remember natural language processing? NLP has been around for years, but only in 2018 did AI researchers prove that it is possible to train a neural network once on a large amount of data and reuse it for many different tasks. In 2019, OpenAI’s GPT-2 and Google’s T5 came out, showing that the approach was startlingly good (it is now integrated into Google Duplex, pictured). Concerns were even raised about its possible misuse.

But since then, things have gone very well.

2021 saw a veritable ‘Cambrian explosion’ of NLP startups and large language models.

This year, Google released LaMDA, a large language model for chatbot applications. Then DeepMind released AlphaCode and, later, Flamingo, a language model capable of visual comprehension. In July of this year alone, the BigScience project released BLOOM, a massive open source language model, and Meta announced that it had trained a single language model capable of translating between 200 languages.

We’ve now reached a kind of tipping point where we’ll see many commercial applications of NLP, some of them built on these open source and publicly available platforms, hit the market. You could almost say a gold rush has started among startups trying to build on this technology, with an arms race developing between providers of large language models.

One of those startups is Humanloop, a University College London (UCL) spinout that claims to make it easier for companies to adopt this new wave of NLP technology through a suite of tools that help humans “teach” AI algorithms. This means that a lawyer, doctor or banker can feed a portion of their knowledge into the platform, which the software then applies at scale across a large data set, allowing for a broader application of AI across various industries.

It has now raised a $2.6 million seed funding round led by Index Ventures, with participation from Y Combinator, LocalGlobe and Albion.

Humanloop was founded in 2020 by a team of prominent computer scientists from UCL and Cambridge, and alumni of Google and Amazon. Its applications, the company says, could include building a picture of the national real estate market from unstructured data on the Internet; reading electronic health records to identify people who may be candidates to try new treatments; and even moderating comments on Facebook groups.

“People would be shocked if they knew what language-based AI is capable of now,” CEO Raza Habib said in a statement. “But getting data in a form that an algorithm can use is the biggest challenge. With Humanloop, we want to democratize access to AI and enable the next generation of smart, self-service applications, by allowing any company to take their domain expertise and efficiently distill it into a machine learning model.”

Humanloop claims that key to its success is the growth of “probabilistic deep learning,” where algorithms can work out what they don’t know by tuning out the noise in data sets, finding the good stuff, and asking humans for help on the parts they don’t understand.
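For readers curious about what “asking humans for help” can look like in practice, here is a minimal sketch of uncertainty-based routing, one common way to build a human-in-the-loop system. It is not Humanloop’s implementation, which the company has not detailed here; the function names, the document IDs and the 0.8 entropy threshold are all assumptions made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a list of class probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def route_for_review(predictions, threshold=0.8):
    """Split items into auto-accepted ones and ones to send to a human.

    `predictions` maps an item id to the model's class probabilities.
    `threshold` is a hypothetical entropy cut-off chosen for this example.
    """
    auto, ask_human = [], []
    for item_id, probs in predictions.items():
        if entropy(probs) > threshold:
            ask_human.append(item_id)  # the model is unsure: ask an expert
        else:
            auto.append(item_id)       # the model is confident: keep its label
    # Put the most uncertain items first so expert time goes where it helps most.
    ask_human.sort(key=lambda i: entropy(predictions[i]), reverse=True)
    return auto, ask_human

# Made-up model outputs for four documents in a two-class task.
predictions = {
    "doc-1": [0.97, 0.03],
    "doc-2": [0.55, 0.45],
    "doc-3": [0.50, 0.50],
    "doc-4": [0.90, 0.10],
}
auto, ask_human = route_for_review(predictions)
```

In this toy run the two confident predictions are accepted automatically, while the two near coin-flip ones are queued for a human, most uncertain first. A real system would then retrain the model on the expert’s answers and repeat.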

Other startups that are building their own large language models and putting them behind APIs include Cohere AI ($164.9 million in funding) and OpenAI, with GPT-3. Snorkel AI ($135.3 million in funding) is also a new startup in this field.

However, Humanloop says it is less focused on developing models and more on the tools needed to adapt them to specific use cases.

Erin Price-Wright, the partner at Index Ventures who led the investment, adds: “In fact, machine learning itself is becoming increasingly commoditized and available, but it’s still really difficult for non-technical people to transfer their knowledge to a machine and help the algorithm improve its model.” This is why Humanloop allows people to edit the data.

If the NLP gold rush is indeed on its way, expect a whole host of other startups to appear soon…