Edit check/Tone Check
This page is currently a draft.
Tone Check
Prompt people to write using a neutral tone.
Category:WMF Projects, Category:WMF Projects 2025q1, Category:WMF Projects 2025q2
This page holds the work the Editing Team is doing in collaboration with the Machine Learning Team to develop Tone Check (formerly Peacock Check).
Tone Check is an Edit Check that uses a language model to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing.
A notable aspect of this project: Tone Check is the first Edit Check that uses artificial intelligence, in this case a BERT language model, to identify biased language within the new text people are attempting to publish to Wikipedia.
To participate in and follow this project's development, we recommend adding this page to your watchlist.
Status
Currently being worked on
Last update:
- Evaluating model performance in English, Spanish, French, Japanese, and Portuguese.
- Defining which aspects of Tone Check will be configurable on-wiki.
- Instrumenting what information about Tone Check (and how people engage with it) will be logged.
Feedback opportunities
- User experience: in what ways do you think the Tone Check user experience could be improved? See talk page for testing instructions.
- Model performance: in what cases do the responses the model offers differ from Wikipedia policies? Sign up here to help evaluate the model.
- Configurability: what aspects of Tone Check will be configurable on-wiki? | T393820
- Logging: what information will be logged (and made available to volunteers on-wiki) about when Tone Check becomes activated? | T395166, T395175
Planning
An A/B test to evaluate the impact of Tone Check.
Please visit Edit check#Status to gain a more granular understanding of where the development stands.
Objectives
Tone Check is intended to simultaneously:
- Cause newer volunteers acting in good faith to add new information to Wikipedia's main namespace that is written in a neutral tone
- Reduce the effort and attention experienced volunteers need to allocate towards ensuring text in the main namespace is written in a neutral tone.
Design
This section is currently a draft. Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
User experience
This section will contain a general description of the UX (e.g. what needs to be true for Tone Check to be shown, where is it presented, what choice does it invite people to make, what design principles have shaped the current approach, etc.), screenshots of the proposed user experience, and a link to the latest prototype with instructions for people to try.
Language selection
This section will include the languages we're prioritizing for the initial experiment, the languages we're planning to scale to next, and why we selected these languages. See phab:T388471.
Model
Tone Check leverages a Small Language Model (SLM) to detect the presence of promotional, derogatory, or otherwise subjective language. The SLM we are using is a BERT model, which is open source and has openly available weights.
The model works by being fine-tuned on examples of Wikipedia revisions. It learns from instances where experienced editors have applied a specific template ("peacock") to flag tone violations, as well as instances where that template was removed. This process teaches the BERT model to identify patterns associated with appropriate and inappropriate tone based on Wikipedia's editorial standards. Under the hood, SLMs transform text into high-dimensional vectors, which are compared against the labels, allowing the model to find a decision boundary (a hyperplane) that separates positive cases from negative ones. A minimal sketch of this fine-tuning setup follows the training-data list below.
The model was trained using 20,000 data points from 10 languages consisting of:
- Positive examples: Revisions on Wikipedia that were marked with the "peacock" template, indicating a tone policy violation.
- Negative examples: Revisions where the "peacock" template had been removed (signifying no policy violation).
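Below is a minimal sketch of what this kind of fine-tuning setup can look like. It is an illustration only: the Hugging Face transformers and datasets libraries, the multilingual checkpoint, the revisions.csv file, its text/label columns, and the hyperparameters are all assumptions, not the team's actual pipeline.

```python
# Hypothetical sketch: fine-tuning a BERT-style model as a binary tone classifier.
# Assumes a CSV with a "text" column (the added text) and a "label" column
# (1 = revision tagged with the "peacock" template, 0 = template removed).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("csv", data_files="revisions.csv")["train"]
dataset = dataset.train_test_split(test_size=0.1)

# A multilingual checkpoint is assumed, since the check targets several languages.
checkpoint = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tone-check-model", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
trainer.train()
```

At inference time, a fine-tuned classifier of this kind returns a probability for the "non-neutral" class for each piece of new text; that probability score is what the threshold findings described later in this page operate on.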
Small Language Models (like the one being used for Tone Check) differ from Large Language Models (LLMs) in that the former are adapted to particular use cases by learning from a focused dataset. In the case of Tone Check, this means the SLM learns directly from the expertise of experienced Wikipedia volunteers. As a result, SLMs offer more explainability and flexibility than LLMs, and they require significantly fewer computational resources than their larger counterparts.
LLMs, on the other hand, are designed for general-purpose use, with limited context and through a chat or prompting interface. LLMs require a huge amount of computational resources, and their behavior is difficult to explain due to the large number of parameters involved.
Evaluating impact
Model
Before measuring the impact of the overall Tone Check experience through a controlled experiment in production, the team conducted two evaluations comparing the model's predictions to human-provided labels.
Outlined below is information about the purpose of each evaluation and what we found.
Internal evaluation
Goals
The first evaluation we conducted was internal, involving just the WMF product teams who were working on this feature. This review was meant to:
- Evaluate whether the model aligned with human decisions often enough for us to consider its predictions reliable and move forward with a community-involved evaluation process
- Figure out a prediction probability score threshold above which we could consider the model's predictions fairly accurate
- Expose any edge cases or specific types of edits in which the model consistently does not perform well
Process
To assess the above, the team:
- Created a list of 300 sample edits from English Wikipedia.
- Assigned about 30 edits to each of the participants from our teams.
- Asked each participant to go through the sample edits and indicate whether or not they contained promotional, derogatory, or otherwise subjective language that should be flagged by the Tone Check.
- Compared the model's predictions to the human-provided labels.
- Analyzed the cases where the model's predictions differed from the human-provided labels.
Findings
- In English, false negatives (cases where the model predicts there isn't a tone check issue, but a human says there is) are very easily filtered out if we only return predictions with a probability score over 0.55.
- In English, most false positives (cases where the model predicts that there is a tone check issue, but a human says there isn't) can be filtered out if we only return predictions with a probability score over 0.8; a minimal thresholding sketch follows this list.
- There are some types of edits that the model has a hard time with, such as edits that include a quote where the quoted language is non-neutral in tone. In these cases, the model's predictions had a lower probability score.
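To make the threshold findings concrete, here is a minimal sketch of how a probability cutoff could be applied before a check is surfaced. The function and constant names are hypothetical; only the 0.55 and 0.8 values come from the evaluation above, and this is not the production implementation.

```python
# Hypothetical sketch: gating Tone Check on the model's probability score.
# Per the internal evaluation, a 0.55 cutoff filtered out false negatives in
# English, and a 0.8 cutoff filtered out most false positives.
TONE_CHECK_THRESHOLD = 0.8

def should_show_tone_check(probability: float,
                           threshold: float = TONE_CHECK_THRESHOLD) -> bool:
    """Only surface the check when the model is sufficiently confident."""
    return probability >= threshold

# A prediction scored at 0.62 would pass the looser 0.55 cutoff
# but not the stricter 0.8 cutoff used to suppress false positives.
print(should_show_tone_check(0.62, threshold=0.55))  # True
print(should_show_tone_check(0.62))                  # False
```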
Volunteer evaluation

The results of the internal evaluation gave us confidence to move forward with an external review involving experienced volunteers. We had enough positive examples (as defined above) to continue evaluating the model in English, French, Japanese, Portuguese, and Spanish.
Goals
This second review was meant to:
- Help us confirm that experienced volunteers agree with what the model identifies as promotional, derogatory, or otherwise subjective language
- Evaluate whether the model's predictions about edits in French, Japanese, Portuguese, and Spanish are as reliable as they are about edits in English
Process
To assess the above, the team:
- Created a list of 100 sample edits from each of the aforementioned Wikipedias.
- Invited participants from each Wikipedia community to sign up and participate.
- Provided the participants with a tool they could use to review and label each of the sample edits in the language(s) they were helping with.
- Asked each participant to review and label at least 30 sample edits.
- Compared the model's predictions to the human-provided labels.
- Analyzed the cases where the model's predictions differed from the human-provided labels.
Findings
#TODO
User experience
The viability of Tone Check, like the broader Edit Check project, depends on the feature being able to simultaneously:
- Reduce the moderation workload experienced volunteers carry
- Increase the rate at which new(er) volunteers contribute constructively
To evaluate the extent to which Tone Check is effective at the above, the team will be conducting qualitative and quantitative experiments.
Below you will find:
- Impacts the features introduced as part of the Edit Check are intended to cause and avert
- Data we will use to help[1] determine the extent to which a feature has/has not caused a particular impact
- Evaluation methods we will use to gather the data necessary to determine the impact of a given feature
ID | Outcome | Data | Evaluation Method(s) |
---|---|---|---|
1. | Key performance indicator: The quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will not contain peacock language | | A/B test, qualitative feedback (e.g. talk page discussions, false positive reporting) |
2. | Key performance indicator: Newcomers and Junior Contributors will experience Peacock Check as encouraging because it will offer them more clarity about what is expected of the new information they add to Wikipedia | Proportion of new content edits started (defined as reaching the point at which Peacock Check was or would be shown) that are successfully published (not reverted). | A/B test, qualitative feedback (e.g. usability tests, interviews, etc.) |
3. | New account holders will be more likely to publish an unreverted edit to the main namespace within 24 hours of creating an account because they will be made aware that the new text they're attempting to publish needs to be written in a neutral tone, when they don't first think/know to write in this way themselves | Proportion of newcomers who publish ≥1 constructive edit in the Wikipedia main namespace on a mobile device within 24 hours of creating an account (constructive activation). | A/B test |
4. | Newcomers and Junior Contributors will be more aware of the need to write in a neutral tone when contributing new text because the visual editor will prompt them to do so in cases where they have written text that contains peacock language. | The proportion of newcomers and Junior Contributors that publish at least one new content edit that does not contain peacock language. | A/B test |
5. | Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that does not include peacock language because Peacock Check will have caused them to realize when they are at risk of this not being true. | | A/B test |
ID | Outcome | Data | Evaluation Method(s) |
---|---|---|---|
1. | Edit quality decreases | Proportion of published edits that add new content and are still reverted within 48 hours. Note: will include a breakdown of the revert rate of published new content edits with and without non-neutral language. | A/B test and leading indicators analysis |
2. | Edit completion rate drastically decreases | Proportion of new content edits started (defined as reaching the point at which Peacock Check was or would be shown) that are published. Note: will include a breakdown by the number of checks shown to identify whether a lower completion rate corresponds with a higher number of checks shown. | A/B test and leading indicators analysis |
3. | Edit abandonment rate drastically increases | Proportion of edits that are started (event.action = init) that are successfully published (event.action = saveSuccess). | A/B test and leading indicators analysis |
5. | People shown Tone Check are blocked at higher rates | Proportion of contributors blocked after publishing an edit where Tone Check was shown compared to contributors not shown the Tone Check. | A/B test and leading indicators analysis |
6. | High false positive rates | Proportion of contributors that decline revising the text they've drafted and indicate that it was irrelevant. | A/B test, leading indicators analysis, and qualitative feedback |
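As an illustration of how the completion and abandonment proportions listed above might be computed from logged editing events, the sketch below groups events into editing sessions and compares init with saveSuccess actions. The event export, the column names (editing_session_id, action, checks_shown), and the pandas-based approach are assumptions for illustration, not the team's actual instrumentation.

```python
# Hypothetical sketch: edit completion rate from logged editing events,
# broken down by how many checks were shown in the session.
import pandas as pd

# Hypothetical export of editing events; column names are assumptions.
events = pd.read_json("editing_events.json")

sessions = events.groupby("editing_session_id").agg(
    started=("action", lambda a: (a == "init").any()),
    published=("action", lambda a: (a == "saveSuccess").any()),
    checks_shown=("checks_shown", "max"),
)

started = sessions[sessions["started"]]
print(f"Completion rate: {started['published'].mean():.1%}")

# Breakdown by number of checks shown, to see whether a lower completion
# rate corresponds with more checks being displayed.
print(started.groupby("checks_shown")["published"].mean())
```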
Findings
This section will include the findings from the experiments described in #Evaluating impact.
Configurability
Tone Check will be implemented – like all Edit Checks – in a way that enables volunteers to explicitly configure how it behaves and who Tone Check is made available to.
Configurability happens on a per project basis so that volunteers can ensure the Tone Check experience is aligned with local policies and conventions.
The particular facets of Tone Check that will be community configurable are still being decided. If there are particular aspects of Tone Check that you think need to be configured on-wiki, we ask that you share what you are thinking in T393820 or on the talk page.
ID | Configurable facet | Potential value(s) | Default value | Notes |
---|---|---|---|---|
Timeline
Time | Activity | Status | Notes |
---|---|---|---|
Peter to populate this section with a high-level timeline of the project: background analysis, initial model development, community conversations/consultations, usability study, pre-mortem, internal model evaluation, volunteer model evaluation, development, pilot experiment, etc.
Background
Writing in a neutral tone is an important part of Wikipedia's neutral point of view policy.
Writing in a neutral tone is also a practice many new volunteers find to be unintuitive. An October 2024 analysis of the new content edits newer volunteers[2] published to English Wikipedia found:
- 56% of the new content edits newer volunteers published contained peacock words.
- 22% of the new content edits newer volunteers published that contained peacock words were reverted.
- New content edits containing peacock words were 46.7% more likely to be reverted than new content edits without peacock words.
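A rough implication of these figures (a back-of-the-envelope inference from the numbers above, not part of the original analysis): if 22% of new content edits containing peacock words were reverted, and that rate is 46.7% higher than the rate for edits without peacock words, then the revert rate for edits without peacock words was roughly 22% / 1.467 ≈ 15%.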
History
Tone Check, and the broader Edit Check initiative, is a response to a range of community conversations and initiatives, some of which are listed below. For more historical context, please see Edit check#Background.
- Editing Team Community Conversation (April 2025)
- New page patrol/Reviewers (en.wiki) (April 2025)
- ESEAP Strategy Summit 2025
- Wikimedia CEE annual planning conversation (April 2025)
- Afrika Baraza meeting (May 2025)
- Supporting moderators at the Wikimedia Foundation (August 2023)
- Editing the Wiki Way software and the future of editing (August 2021)
- Existing maintenance templates
- ar.wiki: تحيز, تعارض مصالح, تعظيم, دعاية, رأي منحاز, عبارة محايدة؟, غير متوازن, مصدر منحاز, وجهة نظر معجب, أهمية مبالغ بها, استشهاد منشور ذاتي, تلاعب بالألفاظ, حيادية خريطة, سيرة شخصية ذاتية, سيرة شخصية نشر ذاتي فقط, مبهمة, مساهمة مدفوعة غير مصرح عنها, مصادر متحزبة, نشر ذاتي سطري, نظرية هامشية, وجهات نظر قليلة
- cs.wiki: https://cs.wikipedia.org/wiki/%C5%A0ablona:NPOV, https://cs.wikipedia.org/wiki/%C5%A0ablona:Vyh%C3%BDbav%C3%A1_slova
- de.wiki: https://de.wikipedia.org/wiki/Vorlage:Neutralität
- en.wiki: https://en.wikipedia.org/wiki/Category:Neutrality_templates
- es.wiki: https://es.wikipedia.org/wiki/Plantilla:No_neutralidad, https://es.wikipedia.org/wiki/Plantilla:Promocional, https://es.wikipedia.org/wiki/Plantilla:PVfan, https://es.wikipedia.org/wiki/Plantilla:Globalizar
- fa.wiki: https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%A7%D9%84%DA%AF%D9%88:%D8%AF%DB%8C%D8%AF%DA%AF%D8%A7%D9%87_%D8%A8%DB%8C%E2%80%8C%D8%B7%D8%B1%D9%81
- fr.wiki: Non-neutre, Désaccord de neutralité, Section non neutre, Dithyrambe, Curriculum vitae, Catalogue de vente, Promotionnel, section promotionnelle, Name dropping, Passage promotionnel, Passage lyrique, Passage non neutre
- id.wiki: Tak netral, Berbunga-bunga, Iklan, Seperti resume, Fanpov, Peacock, Autobiografi, Konflik kepentingan
- it.wiki: https://it.wikipedia.org/wiki/Template:P
- ja.wiki: Template:観点, Template:宣伝, Template:大言壮語
- lv.wiki: https://lv.wikipedia.org/wiki/Veidne:Pov, https://lv.wikipedia.org/wiki/Veidne:Konfl, https://lv.wikipedia.org/wiki/Veidne:Autobiogr%C4%81fija
- no.wiki: https://no.wikipedia.org/wiki/Mal:Objektivitet-seksjon, https://no.wikipedia.org/wiki/Mal:Objektivitet
- pl.wiki: https://pl.wikipedia.org/wiki/Szablon:Dopracowa%C4%87 – {{Dopracować{{!}}param_name=...}} (Template Dopracować is a general template for article issues, made specific with parameters; the relevant parameters are: pov, neutralność, reklama, spam, polonocentryzm, povpol, zależne, wieszak, źródła promocyjne, źródła zależne – case insensitive)
- ro.wiki: https://ro.wikipedia.org/wiki/Format:PDVN, https://ro.wikipedia.org/wiki/Format:Jv, https://ro.wikipedia.org/wiki/Format:Ton_nepotrivit, also the template https://ro.wikipedia.org/wiki/Format:Problemearticol with the parameters ton, ton nepotrivit, or PDVN
- ru.wiki: https://ru.wikipedia.org/wiki/Шаблон:Проверить_нейтральность, https://ru.wikipedia.org/wiki/Шаблон:Конфликт_интересов, https://ru.wikipedia.org/wiki/Шаблон:Реклама, https://ru.wikipedia.org/wiki/Шаблон:Автобиография, https://ru.wikipedia.org/wiki/Шаблон:Недостаточно_критики, https://ru.wikipedia.org/wiki/Шаблон:Нейтральность_раздела_под_сомнением, https://ru.wikipedia.org/wiki/Шаблон:Нейтральность%3F (inline one)
- zh.wiki: Advert, Fanpov, Newsrelease, Review, Tone, Unencyclopedic, Trivia, Autobiography, COI, BLPdispute, POV, Copy edit
Edit Check
This initiative sits within the larger Edit Check project – an effort to meet people while they are editing with actionable feedback about Wikipedia policies.
Edit Check is intended to simultaneously deliver impact for two key groups of people.
Experienced volunteers who need:
- Relief from repairing preventable damage
- Capacity to confront complexity
New(er) volunteers who need:
- Actionable feedback
- Compelling opportunities to contribute
- Clarity about what is expected of them
FAQ
Why AI?
- AI increases Wikipedia projects' ability to detect promotional/non-neutral language before people publish it.
Which AI model do you use?
- We use an open-source model called BERT. The model we use is not a large language model (LLM); it is a smaller language model, which the Machine Learning team prefers because it tells us how probable each of its predictions is and it is easier to adapt to our custom data.
What language(s) does/will Tone Check support?
What – if any – on-wiki logging will be in place so volunteers can see when Tone Check was shown?
- To start, we're planning for an edit tag to be appended to all edits in which ≥1 Tone Check is shown.
- This approach follows what was implemented for Reference Check.
Why did you not implement Tone Check as an Abuse filter?
What will we do to ensure Tone Check does not cause people to publish more subtle forms of promotional, derogatory, or otherwise subjective language that is more difficult for the model and people to detect?
What control will volunteers have over how Tone Check behaves and who it is available to?
ADD questions from the internal pre-mortem we conducted.
What control do volunteers have over how the model behaves?