Textgain is developing an LLM to detect hate speech

In a world where online discourse can quickly devolve into a digital battlefield, one AI startup is stepping up to play referee.

The Belgian AI startup Textgain is developing a language model named CaLICO, designed to detect hate speech in all 24 official languages of the European Union. To support the project, the European Commission has provided a budget of €250,000 and two million hours of computing time on the LUMI supercomputer in Finland. The model is expected to be ready by the summer of 2025.

Textgain, a spin-off from the University of Antwerp, has taken on this significant challenge after winning the Large AI Grand Challenge, a prestigious competition organized by the European Commission. CEO Guy De Pauw emphasizes Textgain’s unique approach, stating, “We have a unique position in the saturated market of AI providers because we develop our own language model instead of building on existing AI models.”

Hate speech detection is a particular challenge for existing language models such as those developed by OpenAI and Google. These commercial models are built to avoid engaging with toxic language, which makes them difficult to use for hate speech detection. Textgain aims to close this gap with a model that can process and analyze toxic content without generating it.

The development of CaLICO involves a team of annotators from different countries mapping the cultural nuances of hate speech. This approach ensures that the language model considers cultural contexts when identifying hate speech, which is critical in a multilingual and multicultural region like Europe.

According to De Pauw, Textgain’s affiliation with the University of Antwerp is a significant advantage. “This connection allows us to develop technologies addressing social problems,” he explains. The startup has already demonstrated its capabilities with previous AI tools like Klare.ai, which processes company documents without sharing sensitive information, and Rhetoric, which detects hate speech in Flemish media reports.

The European Union’s Digital Services Act requires online platforms to act quickly against the spread of hate speech. CaLICO is intended to help these platforms comply with such legislation. Textgain’s focus on a reliable and precise model, developed in collaboration with policymakers, security services, social organizations, and scientists, sets it apart from generic solutions offered by larger commercial entities.

Redouan el Hamouchi, co-founder of Textgain, highlights the importance of multilingualism in moderating online content. “In our digitalized world, there is a growing need for advanced tools to moderate content. Multilingualism is essential in this respect,” he says. The computing time allotted on the LUMI supercomputer is meant to enable CaLICO to handle the different languages and cultural contexts of the European Union effectively.

Textgain’s efforts align with the EU’s strict standards for AI technology, focusing on transparency, explainability, and ethics. De Pauw emphasizes the company’s commitment to maintaining the highest ethical standards: “Reliability and precision are more important than generic solutions and unrealistic promises.”

Posted by Alex Ivanovs

Alex is the lead editor at Stack Diary and covers stories on tech, artificial intelligence, security, privacy and web development. He previously worked as a lead contributor for Huffington Post for their Code column.