MariaDB/PII

This page describes the procedure for removing Personally Identifiable Information (PII) from Wiki Replicas.

This procedure must be followed every time a new wiki is added to the database. The full procedure for adding a new wiki is described at Add a wiki.

Sanitize the Wiki Data

Note: As of Jun 2024 - the default section for new wikis is s5 but this might not be the case in the future.

Use the `sre.mysql.sanitize-wiki` cookbook from cumin:

sudo cookbook [GLOBAL_ARGS] sre.mysql.sanitize-wiki [-h] --wiki WIKI [--check-only] --task TASK [--only-grant-and-view]
cookbook sre.mysql.sanitize-wiki --wiki myfirstwiki --wiki mysecondwiki
Will sanitize those wikis according to what is documented above.

cookbook sre.mysql.sanitize-wiki --wiki mywiki --check-only
This will run the first part of the procedure, limited to checking if there is some sanitization to perform.

Optional arguments:

 -h, --help            show this help message and exit
 --wiki WIKI           Name of the wiki (can be repeated) (default: None)
 --check-only          Only perform checks without making changes (default: False)
 --task TASK           Phabricator task (default: None)
 --only-grant-and-view
                       Only perform grant permissions and create view database. This should be quicker than a typical run. It will skip checking existing needs for sanitization in dbs, and focus on the last steps as in
                       https://wikitech.wikimedia.org/wiki/MariaDB/PII#Grant_Permissions_for_SQL_Views (default: False)

Notify Relevant Teams

Once the database is sanitized and the view database is created, the work managed by DBAs is complete.

The next step is to create the views, and that is managed by the Wikimedia Cloud Services team.

Until we define a better process, assign the task to fnegri.

We should not rely on a single person. There is an open discussion about this in phab:T382607.

Restarting sanitarium instances

To restart all instances on sanitarium hosts use the `sre.mysql.sanitarium_restart` cookbook:

 sudo cookbook --dry-run sre.mysql.sanitarium_restart

It shows a summary before and asks for confirmation before restarting the instances. It supports `-t` to log events into Phabricator and `--dc` and `--hostnames` to limit which hosts to operate on.

Optional arguments:

-t TASK_ID, --task-id TASK_ID     Phabricator task ID
--dc DC                           Datacenter e.g. --dc=eqiad
--hostnames [HOSTNAMES ...]       Limit restarts: --hostnames db0000 db1234


Manual PII cleanup process

The process used before the cookbook is described below:

Sanitize the Wiki Data

Go to the sanitarium hosts in both data centers and run the following command to clean up the data:

redact_sanitarium.sh -d $NEW_WIKI_NAME -S /run/mysqld/mysqld.s5.sock | mysql -S /run/mysqld/mysqld.s5.sock

Run the Private Data Check Script

Execute the check_private_data.py script to identify which table/columns need to be dropped:

check_private_data.py -S /run/mysqld/mysqld.s5.sock

If the output makes sense, you can pipe it directly to MySQL to drop the necessary table/columns:

check_private_data.py -S /run/mysqld/mysqld.s5.sock | mysql -S /run/mysqld/mysqld.s5.sock

Verify changes on each wikireplicas host

As of December 2024, the wikireplicas hosts are clouddb10[13-20] (owned by the Cloud Services team), plus an-redacteddb1001 (owned by the Data Platform SRE team)

Ensure that you run check_private_data.py on each replicas host after the whole process on your side is done to avoid leaking Personally Identifiable Information:

check_private_data.py -S /path/to/socket

Grant Permissions for SQL Views

Identify the wikireplicas hosts that belong to the relevant database section (e.g., s5). Grant the labsdbuser role the appropriate grants by running:

GRANT SELECT, SHOW VIEW ON `NEW_WIKI_NAME_p`.* TO `labsdbuser`;

This needs to be done on all wikireplicas hosts that have the relevant database section.

Create the View Database

Create the view database on all the wikireplicas hosts that belong to the relevant database section:

CREATE DATABASE NEW_WIKI_NAME_p;