File:Beyond Automatic Translation, Aligning Wikipedia sections across multiple languages.pdf

Summary

Description
English: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2018 Sections are the building blocks of Wikipedia articles. For editors, they serve as an entry point for creating and expanding articles. For readers, they enhance the readability of Wikipedia content. In this talk, we present ongoing research on aligning article sections across Wikipedia languages. We show that the available technology for automatic translation is not good enough for translating section titles. We then show a complementary approach to section alignment that uses Wikidata and cross-lingual word embeddings. We also present some use cases for a methodology that aligns sections across languages, including improved section recommendation, especially in medium-sized and smaller languages, where the language itself may not contain enough signal about the structure of articles and that signal can be inferred from larger Wikipedia languages.
Date
Source Own work
Author Diego (WMF)

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
Category:CC-BY-SA-4.0
Category:Self-published work Category:Wiki Research Category:Documents in English Category:English-language PDF files Category:Wikimedia Research Showcase Category:Knowledge Gaps Category:Wikimedia multilingualism