ROSTOCK — The German Research Center for Artificial Intelligence (DFKI) and the Hessian Center for Artificial Intelligence (hessian.AI) have launched the Occiglot initiative and released open-source language models tailored to European languages. The project aims to lower the barriers that less widely spoken languages and non-commercial projects face in accessing advanced language models.
Since the emergence of generative language models such as ChatGPT, there has been a growing need to democratize access to linguistic resources. However, the dominance of English on the internet and the high resource cost of developing language models have created significant barriers for other languages. The Occiglot initiative seeks to close this gap by fostering collaboration among researchers, language experts, software developers, and users.
Supported by DFKI, hessian.AI, and the Federal Ministry of Education and Research (BMBF), Occiglot represents a collaborative effort to promote linguistic diversity and digital sovereignty in Europe. “The development of European language models is key to maintaining Europe’s academic and economic competitiveness and its digital and AI sovereignty,” commented Prof. Dr. Georg Rehm, Principal Researcher at DFKI.
European Research Collective and Call for Collaboration
Occiglot is an open European research collective that brings together researchers from institutions including DFKI, hessian.AI, TU Darmstadt, and the Catholic University of Leuven. With a focus on inclusivity and collaboration, the initiative is actively seeking partners within the international AI and NLP community.
Kristian Kersting, head of the Fundamentals of Systemic AI research department at DFKI, emphasized the importance of collaboration in driving innovation: “We need more synergies to exploit the enormous potential for Germany and Europe. We need a strong AI ecosystem with accessible models and computing infrastructure.”
Occiglot-LLM Release v0.1
The release of Occiglot-LLM v0.1 marks the initiative's first major milestone. Ten language models with seven billion parameters each have been published, focusing on the five largest European languages: English, German, French, Spanish, and Italian. The models are available under the Apache 2.0 license on the Hugging Face platform and represent a first step towards comprehensive language models for European languages.
Roadmap
Looking ahead, Occiglot aims to develop language models that support all 24 official languages of the European Union as well as several unofficial and regional languages. Following its roadmap, the initiative plans to expand its training corpus and build an active community of contributors.
By widening access to language models beyond English, Occiglot reaffirms its commitment to promoting linguistic diversity and innovation in artificial intelligence.