Wikimedia Language and Product Localization/Newsletter/2024/January/zh

Highlights

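Fon Wikipedia goes live after five years in the Wikimedia Incubator
Born at the 2018 Wikimedia Hackathon in Barcelona, the Fon Wikipedia has now graduated from the Incubator and is officially live! Millions of people in Benin and Togo speak Fon, many of them as their native language. In Benin, Fon is also widely used as a national language. It took five years to create this new Fon Wikipedia. Because many people do not write in Fon, and because African indigenous languages receive less attention than other languages, building a community to support the project was a daunting challenge for the community members who started it.[1] Also, learn more about the four new Wikimedia language projects that were recently approved (Dagaare Wikipedia, Standard Moroccan Tamazight Wikipedia, Toba Batak Wikipedia, and Banjar Wikiquote).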
Introducing Sentencex, a tool for enhanced Natural Language Processing (NLP) and multilingual sentence extraction
The language team has just launched a new tool called Sentencex, now available in both Python and JavaScript. Sentence segmentation, an essential part of natural language processing, involves breaking down a text into individual sentences. This process has various uses and helps improve language functionality and speed, especially in Wikimedia's new machine translation system (MinT) and the section translation project.[2]
You can find the tool on GitHub and see it in action.
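To make the idea of sentence segmentation concrete, here is a minimal sketch in Python. It is not Sentencex and does not use its API: the function name `naive_segment` and the short abbreviation list are hypothetical, chosen only to illustrate why splitting on punctuation alone is not enough.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def naive_segment(text: str) -> list[str]:
    """Split text at '.', '!' or '?' followed by whitespace,
    skipping boundaries that directly follow a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        words = candidate.split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # not a real sentence end; keep scanning
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing sentence without following whitespace
    return sentences


print(naive_segment("Dr. Smith arrived. He was late! Why?"))
# → ['Dr. Smith arrived.', 'He was late!', 'Why?']
```

A rule-based splitter like this breaks down quickly across languages with different punctuation and abbreviation conventions, which is the kind of problem a dedicated multilingual tool is meant to address.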
MinT translation service available to 55 new Wikipedias, doubles content, ranks second in usage

The new machine translation service, MinT, which now offers machine translation for the first time to 55 Wikipedias, has had a positive impact on Wikimedia language communities. This extensive language support has nearly doubled published translations, and articles created using MinT have a low deletion rate (1.72%). MinT is now used in 8% of the translations published with Content Translation, making it the second most used translation service in Wikipedia, after Google Translate, in just a few short months.[3]
Open language identification service now available for 200+ languages
The Language team created an open language identification service to automatically detect the language in which a given text is written to simplify users' interaction with Wikimedia platforms. The service supports the detection of 201 languages, and anyone can access the API to use the service. Currently, the final checks for the service and the evaluation of its ability to withstand high traffic are underway.[4]

Wikisource now recognizes handwritten texts with Transkribus
Handwritten text recognition is now active on Wikisource through the Transkribus OCR Engine. Transkribus, an AI-powered platform, simplifies the handling of handwritten or printed manuscripts by offering various models tailored to different writing scripts, historical periods, and other factors. The Transkribus engine is now available as an option alongside Google and Tesseract, and it is currently operational on the Wikisources listed on this page.[5]
Unified section translation dashboard for desktop and mobile users
The Language team is working towards a unified section translation dashboard for both desktop and mobile users. Originally designed for mobile in Content Translation, the dashboard is now being refined to work across platforms and provide an improved translation environment. It is currently in beta: you can test it on Test Wikipedia or any Section Translation-enabled wiki by adding the URL parameter "unified-dashboard=true" (e.g., ig.wikipedia.org/wiki/Special:ContentTranslation?unified-dashboard=true).
This unified dashboard offers a seamless cross-platform translation experience. Users can start translating on their desktop and continue on a mobile device, or vice versa. It also supports section translations on the desktop, giving users flexibility across devices.
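For anyone who wants to test the beta dashboard on several wikis, the test URL can be built programmatically. The sketch below is only an illustration: the helper name `unified_dashboard_url` is hypothetical, and the `https` scheme is an assumption, since the example above gives the address without one.

```python
from urllib.parse import urlencode, urlunsplit


def unified_dashboard_url(wiki_host: str) -> str:
    """Build the Special:ContentTranslation URL with the beta flag set.

    wiki_host is a wiki domain such as "ig.wikipedia.org".
    The "https" scheme is an assumption, not stated in the newsletter.
    """
    query = urlencode({"unified-dashboard": "true"})
    return urlunsplit(
        ("https", wiki_host, "/wiki/Special:ContentTranslation", query, "")
    )


print(unified_dashboard_url("ig.wikipedia.org"))
# → https://ig.wikipedia.org/wiki/Special:ContentTranslation?unified-dashboard=true
```

The flag only has an effect on wikis where Section Translation is enabled, as noted above.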
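- The next Language Community Meeting will take place on Wednesday, February 21, from 12:00 to 13:00 UTC. If you would like to join, please sign up at this link. Have a technical update to share? Feel free to add it to the Technical updates section of the agenda document.
- If you missed the first Language Community Meeting in November 2023, you can catch up by watching the video recording and reading the notes.
- If you are looking for technical tasks, check out the unassigned easy tasks in the language project repositories on Wikimedia Phabricator.
- If you are looking for tools to edit and translate articles and interface messages, you can use Content Translation and the Special:Translate tool on Translatewiki.net. These tools make it easier to work with content in different languages.
Stay tuned for the next issue! You can subscribe to this newsletter.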
References
- [1] https://diff.wikimedia.org/2023/10/04/welcome-to-the-fon-wikipedia/
- [2] https://diff.wikimedia.org/2023/10/23/sentencex-empowering-nlp-with-multilingual-sentence-extraction/
- [3] https://diff.wikimedia.org/2023/11/20/unlocking-the-worlds-languages-in-wikipedia-a-look-into-mints-impact-so-far/
- [4] https://diff.wikimedia.org/2023/10/24/open-language-identification-api-for-200-languages/
- [5] https://diff.wikimedia.org/2023/07/13/enabling-handwritten-text-recognition-on-wikisource-using-transkribus-ocr-engine/