Language Onboarding and Development
Language Onboarding & Development is an initiative led by the Wikimedia Language and Product Localization team. It facilitates interventions and engagements for new language communities, and for communities with a smaller presence on our platform, by providing tooling and support systems that help them move towards their goals. This initiative contributes to existing efforts to make it as easy to consume and contribute useful knowledge in all the world’s languages as it is in the largest ones. We work closely with Wikimedia communities on language- and region-specific needs in products intended for multilingual use. We also identify gaps in product workflows, periodically report findings that affect the use of a product in particular languages or regions, and surface the special needs that must be met for a uniform multilingual product experience.
Background
The Knowledge Gaps Taxonomy developed by the Research team categorizes content gaps into "Representation" and "Interaction" gaps. Representation gaps are further divided by geography, language, socio-economic status, important topics, and more (see the full paper, pages 22 onward). Page 23 highlights the language gap, defined as the difference in content coverage across languages. In addition, the Movement Strategy recommendation "Improve User Experience" calls for “Clear pathways for advancing new wiki proposals (including new language versions) and for reusing community-developed software features on them.”
State of Language Wikis
As of January 2025, 342 languages have at least one content project hosted by Wikimedia (source), compared to the 7,000+ languages in use today, according to Ethnologue. (NB: many of these languages may be unwritten, or their communities may not want a Wikimedia content project.) Hundreds more languages are currently being developed in the Incubator with the goal of graduating. WMF’s vision asks us to imagine a world in which humans can “freely share in the sum of all knowledge.” Because we regard language both as knowledge and as a means of sharing knowledge, understanding our gaps in language coverage and representation is vital.
As of February 2025, according to the List of Wikipedias, more than 100 languages hosted on the Wikimedia platform have 20 or fewer active editors, even though some of these languages are spoken by many millions of people. The reasons these languages have a low Wikimedia presence, and a generally low presence in online and offline written publishing, are systemic, complicated, and diverse. Still, there are things we can do to make developing Wikimedia content in these languages more accessible. Some lie in the space of product design and infrastructure; others lie in the space of human-to-human community support.

New language versions go through several stages before becoming wikis (reference). At each stage they face various challenges, both social and technical. Existing research and materials reveal technical challenges in every phase of language onboarding: adding new languages to the Incubator, the complexity of developing and reviewing content, and the slow process of creating a wiki site once a language graduates from the Incubator. Once a wiki is created, the real life of the community begins: writing articles, communicating with each other, inviting more writers, growing content, and so on.
Areas of Work
- Strategic interventions that contribute to the Foundation's Product and Tech pillars and the department's strategic goals, and that use tooling and support to help communities develop their languages, primarily through content growth.
- Technical interventions and engagements that address existing gaps in technical workflows or features related to language wikis, such as handling requests appropriately and ensuring important bugs are fixed. See here. This work includes adding new languages (specifically to Language data, Translatewiki, and MediaWiki core, when requested by speakers of the language), occasionally removing languages, and supporting new languages through keyboard support, namespace configuration, and similar work.
Disclaimer: The process of onboarding languages involves several stages, as shown in the diagram above. Some areas are outside the scope of this initiative, including evaluating the feasibility of new languages (owned by the Language committee), wiki creation, and Incubator maintenance.
Resources
- Languages Onboarding Experiment 2024 – Executive Summary: a cross-departmental experiment by WMF teams.
- Research findings from the Language Diversity Hub examining challenges faced by contributors to small language versions of Wikipedia.
- Session on increasing language diversity on Wikimedia projects by Sadik Shahadu at WikiIndaba 2023.
- Slides from a Celtic Knot 2024 session on the current and future state of language incubation, research findings, and potential improvements.
- Slides from a Wikimania 2024 session on the state of language technology and onboarding at Wikimedia, with a link to the video recording.
- WMF's study on Incubator and language representation across Wikimedia projects.
- A Language Onboarding & Incubator meetup at WikiIndaba, organized by User:MMunyoki (WMF).