Wikimedia Quality Services

The Quality Services team is a sub-team of the Developer Experience group within the Wikimedia Foundation. Established in November 2024 after a split of the QTE team, we improve the overall quality of the software developed by the Foundation.

Join our IRC channel: #wikimedia-qte.

Check out our blog posts at phab:phame/blog/view/21/.

Key guiding principles

  • Ensuring Quality is a part of everyone's work.
  • Ensuring Quality is in every part of the SDLC.

Things we do

Test Engineering Support
Team | Test Engineering Contacts | OKR Alignment
Abstract-Wikipedia | Elena Tonkovidova | WE2 - Vital Knowledge
Trust and Safety | Dom Walden / Derrick Jackson | WE4 - Safety and Security
Connection | Vaughn Walters | WE1.4 - Task prioritization
Community Tech | Dom Walden / George Mikesell | WE1 - Contributor Experiences, PES1.2 - Improve product prioritization via community signals
Editing | Rummana Yasmeen / Esther Akinloose | WE1 - Contributor Experiences
Contributor Growth | Elena Tonkovidova | WE1 - Contributor Experiences
Data Products | Emeka Chukwukere |
Mobile Apps | Anthony Borba | WE3 - Consumer Experiences
Reader Experience | Edward Tadros / Elena Tonkovidova | WE3 - Consumer Experiences
Reader Growth | Edward Tadros / Elena Tonkovidova | WE3 - Consumer Experiences
Language and Product Localization | George Mikesell | WE2 - Vital Knowledge

How we work

The Quality Services team currently follows an embedded model, operating as a unified team under the Developer Experience organization.

The Quality Services team supports 11 developer teams, each with its own processes, structure, unique challenges, and expertise.

The Test Engineer embedded with each team is currently responsible for ensuring the quality of that team's work. This involves both monitoring Phabricator and reporting on the overall health of the team's features.

How we test

Software testing happens within each team individually, so the QS testing strategy has evolved team by team to serve each team's needs. As a result, teams differ in numerous aspects:

  • Automation
  • Manual / Exploratory Testing
  • Tools used for testing
  • Documenting tests
  • Documenting bugs
  • Maintaining metrics

This section details how we currently test for each supported team.

Abstract-Wikipedia (Elena)

Trust and Safety (Dominic and Derrick)

  • Test Strategy doc: No single document describing the team's approach
  • Bug Tracking: Trust and Safety Phabricator
  • Automation Strategy: Trust and Safety commits code directly to other teams' repositories, so it follows those teams' strategies
  • QA Process: Tickets come into the QA column, and a QA engineer tests them.
  • Test plan documentation: Per project, User:DWalden (WMF)/Test2wiki k8s migration
  • Metrics tracked: None QA related

Connection (Vaughn)

Community Tech (Dom/George)

  • Test Strategy doc: No doc that I am aware of, and it would vary depending on the project anyway.
  • Bug Tracking: Community Tech Phabricator
  • Automation Strategy: Community Tech delivers code directly to the repo and follows that repository's testing protocols/directions from there
  • QA Process: Split into two time-zone-friendly teams (listed below). Testers pick up any tasks in the QA column. If an issue related to the task is found, it is moved back to the 'In Development' column with a ping to the engineer on the Phabricator task and in Slack as a heads-up.
  1. USA - Sea Lion Squad
  2. World - Fox Squad
  • Test plan documentation: Test Plan in general, but it will differ per project
  • Metrics tracked: None that I am aware of for QA

Editing (Rummana/Esther)

  • Test Strategy doc:
  • Bug Tracking:
  • Automation Strategy:
  • QA Process:
  • Test plan documentation:
  • Metrics tracked:

Contributor Growth (Elena)

Data Products (Emeka)

  • Test Strategy doc:
  • Bug Tracking:
  • Automation Strategy:
  • QA Process:
  • Test plan documentation:
  • Metrics tracked:

Mobile Apps (Anthony)

  • Test Strategy doc: In draft
  • Bug Tracking: Phabricator board - iOS and Android
  • Automation Strategy: Android: smoke tests (Espresso); iOS: unit/integration tests (XCTest)
  • QA Process: QA handles regression/smoke test suites per RC, records bugs and retests as necessary
  • Test plan documentation: Smoke test documented in Google Sheets, with per-ticket testing tracked in Phabricator.
  • Metrics tracked: # of bugs found weekly, # of smoke tests run, # of consumer reported bugs
  • Smoke Test Suite: Smoke Test Documentation

Structured Data (Elena) - Archived

Reader Experience (Edward / Elena)

  • Test Strategy doc: TBD
  • Bug Tracking: TBD
  • Automation Strategy: Define smoke test suite per project, automate repeatable tests.
  • QA Process: Currently only testing what makes it to the QA column. Deployment validation testing has not been implemented.
  • Test plan documentation: Individual testing is documented in Phab tasks. Plan is still evolving.
  • Metrics tracked: TBD

Reader Growth (Edward / Elena)

  • Test Strategy doc: TBD
  • Bug Tracking: TBD
  • Automation Strategy: Define smoke test suite per project, automate repeatable tests.
  • QA Process: TBD
  • Test plan documentation: TBD
  • Metrics tracked: TBD

Language and Product Localization (George)

  • Test Strategy doc: No doc that I am aware of, and it would vary depending on the project anyway
  • Bug Tracking: LPL Phabricator
  • Automation Strategy: None
  • QA Process: Test what comes into the QA column
  • Test plan documentation: Test Plan in general, but it will differ per project
  • Metrics tracked: None

Testing Concepts

Quality assurance (QA), like all engineering disciplines, has its own vocabulary and concepts. QA engineers and test engineers use terms like "regression testing", "smoke testing", and "automation", acronyms like "UAT" and "CUJ", and concepts like risk-based testing. This section clarifies what these mean in the context of the Wikimedia Foundation.

Smoke testing

Smoke testing is a small set of tests to determine that core functionality performs as expected.
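
As a loose sketch only (assuming a plain Node.js 18+ script; the endpoints below are placeholders rather than any team's real suite), a smoke check might look like this:

// Minimal smoke-check sketch in Node.js 18+ (global fetch available).
// The endpoints below are placeholders, not a real team's suite.
const coreEndpoints = [
  'https://en.wikipedia.org/wiki/Main_Page',
  'https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json'
];

async function smokeCheck() {
  for ( const url of coreEndpoints ) {
    const response = await fetch( url );
    // A smoke test only asserts that core functionality responds at all;
    // deeper behavior is left to regression and integration tests.
    if ( !response.ok ) {
      throw new Error( `Smoke check failed for ${ url }: HTTP ${ response.status }` );
    }
  }
  console.log( 'Smoke check passed' );
}

smokeCheck();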

Regression testing

Regression testing is a large set of tests (typically all available) that verifies that previously developed functionality still works correctly.

Automated testing

Automated testing uses software to execute tests and check their results without manual intervention, making the checks repeatable and cheap to run on every change.

Unit and Integration testing

Unit testing is the smallest possible test. It tests assumptions about code written by an engineer. For example, a unit test validates that a function returns the correct value or that a method modifies the underlying object in an expected way. As engineers, we can use unit tests to ensure that "negative" cases (unexpected input values or added latency in the form of pauses) do not inadvertently harm the functionality.
Integration testing validates that two or more distinct pieces work together.
// Example in JavaScript (previously pseudocode)

function addNumbers( a, b ) {
  if ( typeof a !== 'number' || typeof b !== 'number' ) {
    throw new TypeError( 'addNumbers expects two numbers' );
  }
  return a + b;
}

function subNumbers( a, b ) {
  return a - b;
}

// Unit test: validates that addition works
console.assert( addNumbers( 1, 2 ) === 3 );

// Negative unit test: validates that bad input fails correctly
let threw = false;
try {
  addNumbers( 1, 'potato' );
} catch ( e ) {
  threw = true;
}
console.assert( threw === true );

// Integration test: shows two distinct pieces of code working together
let finalValue = addNumbers( 1, 2 ) + subNumbers( 1, 2 ); // 3 + (-1)
console.assert( finalValue === 2 );

Critical User Journey (CUJ)

The most important user journeys within a given scope (feature, full site).

User Acceptance Testing (UAT)

A type of testing done by users, guided by open-ended questions, to determine whether the software solves the proposed need. It typically surfaces not only bugs but also design improvement opportunities.

End to End testing (E2E)

A scope of testing that typically includes pieces outside the scope being worked on, marked by a distinct "start" and "end" point.

Risk-based testing

Risk-based testing is an approach to testing that prioritizes what to validate based on the ultimate risk presented to the organization or consumers. For example, if you do not follow risk-based testing, you may test only the "code last touched" as a way to limit the number of bugs introduced. In risk-based testing, you may instead prioritize testing primary consumer user journeys over the item last touched.
Risk-based testing requires agreements between engineering, QA, product management and other leaders to ensure that risk is fully understood, as there is typically a de-prioritization of "non-risky" changes.
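
As an illustration only (the test names, impact scores, and likelihood scores below are hypothetical), risk-based prioritization can be thought of as ordering checks by a simple risk score:

// Conceptual sketch of risk-based prioritization; tests and scores are hypothetical.
const tests = [
  { name: 'Edit and save an article', impact: 5, likelihood: 4 }, // primary user journey
  { name: 'Rarely used admin preference', impact: 2, likelihood: 1 },
  { name: 'Recently touched helper function', impact: 3, likelihood: 2 }
];

// Risk = impact x likelihood; run the riskiest checks first,
// accepting that low-risk changes may get less coverage.
const prioritized = tests
  .map( ( t ) => ( { ...t, risk: t.impact * t.likelihood } ) )
  .sort( ( a, b ) => b.risk - a.risk );

console.log( prioritized.map( ( t ) => `${ t.name } (risk ${ t.risk })` ) );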

Automated tests available

Selenium
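
Wikimedia's Selenium browser tests are generally written in Node.js with WebdriverIO; the sketch below assumes that setup (with browser, describe, and it provided as globals by the test runner), and the target page and title check are illustrative only, not an actual QS test.

// WebdriverIO-style spec; browser, describe and it are globals supplied by the runner.
// The URL and title check are illustrative, not an actual QS test.
describe( 'Main Page', () => {
  it( 'loads and shows the expected title', async () => {
    await browser.url( 'https://en.wikipedia.org/wiki/Main_Page' );
    const title = await browser.getTitle();
    if ( !title.includes( 'Wikipedia' ) ) {
      throw new Error( 'Unexpected page title: ' + title );
    }
  } );
} );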

Cypress
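
A comparable sketch in Cypress (again, the page and search-input selector are illustrative assumptions, not an actual QS test):

// Cypress spec; cy, describe and it are globals supplied by the Cypress runner.
describe( 'Search', () => {
  it( 'reaches an article from the main page', () => {
    cy.visit( 'https://en.wikipedia.org/wiki/Main_Page' );
    // The search input selector assumes the default Wikipedia skin.
    cy.get( 'input[name="search"]' ).type( 'Software testing{enter}' );
    // Cypress retries assertions until they pass or time out.
    cy.title().should( 'include', 'Software testing' );
  } );
} );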

See also

Category:WMF Projects Category:WMF Projects 2024q4 Category:WMF Projects 2025q1 Category:WMF Projects 2025q2 Category:WMF Projects 2025q3