Extension:JsonConfig/Transforms
JsonConfig can already expose tabular data, including data loaded remotely from another site, to Lua modules used via {{#invoke:}}. As of May 2025, we are adding a transform pipeline that loads data and processes it with Lua scripts on the central store wiki (Commons, in Wikimedia production).
This is initially aimed at building a richer data pipeline for Charts, so that data formats can be modified, or data subsetted or combined, before rendering. It is also available to other workflows that target Data: pages, whether they hold tabular data or other formats.
Background
- Architectural Decision Record
- link to patch in code review
- Examples:
Security considerations
On Commons, you can emulate the execution model with a wikipage parse: a second Lua module loads the data and calls your code, and you can trigger that through the web interface (preview) or the web API (action=parse). We therefore don't expect the additional data-specific pipeline helper API to add much attack surface.
Consumers of tabular data should nonetheless check the validity of transformed data before using it, just as when loading non-transformed user-provided data.
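As an illustration of that kind of check, here is a minimal Lua sketch of a structural validator for transformed tabular data. The helper name validate_tabular and the mapping from tabular field types to Lua value types are assumptions for this example, not part of JsonConfig; it also assumes the transformed data keeps the JSON shape (schema.fields plus data rows) and that empty cells appear as nil.

```lua
-- Hypothetical helper (not part of JsonConfig): checks that a table still has
-- the shape of tabular data before a consumer relies on it.
-- Returns true, or false plus a message describing the first problem found.

-- Assumed mapping from tabular field types to Lua value types.
local EXPECTED_LUA_TYPE = {
	number = "number",
	string = "string",
	boolean = "boolean",
	localized = "table", -- e.g. { en = "January" }
}

local function validate_tabular(tab)
	if type(tab) ~= "table" then
		return false, "result is not a table"
	end
	if type(tab.schema) ~= "table" or type(tab.schema.fields) ~= "table" then
		return false, "missing schema.fields"
	end
	if type(tab.data) ~= "table" then
		return false, "missing data"
	end
	for r, row in ipairs(tab.data) do
		if type(row) ~= "table" then
			return false, "row " .. r .. " is not a table"
		end
		for c, field in ipairs(tab.schema.fields) do
			local v = row[c]
			local want = EXPECTED_LUA_TYPE[field.type]
			-- nil stands for an empty cell (JSON null), which is allowed
			if v ~= nil and want ~= nil and type(v) ~= want then
				return false, "row " .. r .. ", column " .. c .. " should be " .. field.type
			end
		end
	end
	return true
end
```

Returning a message rather than raising an error lets the caller decide whether to fall back, skip the chart, or surface the problem to the editor.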
External APIs
The API module action=jsontransform has been added to run data fetches and transforms. It is distinct from action=jsondata so that it can be language-agnostic and require POST requests (so it always fetches up-to-date data for rendering). It will be available only on the store wiki (Commons in Wikimedia production) and is marked for internal use, but it is open if you want to poke at it for testing or previewing, and it will likely be used by future editor helper tools.
Try a live demo on Commons (NOT YET DEPLOYED)!
As with Scribunto's {{#invoke:}}, you pass a module, a function, and a list of parameters:
- title=Example.tab
- jtmodule=Example
- jtfunction=scream
- jtargs=what=AAARGH!
The jtargs parameter is a multi-value field whose values are separated by the pipe character "|". If a value needs to contain a pipe, start the field with U+001F UNIT SEPARATOR and use that character as the separator instead (this is the standard way of escaping multi-value fields in the MediaWiki API).
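The escaping rule above can be sketched as a small Lua function that builds a jtargs value from a list of "name=value" strings. The function name encode_multivalue is hypothetical; it only illustrates the rule (use "|" when possible, otherwise prefix with and separate by U+001F), not any client library's API.

```lua
-- Hypothetical helper for building the jtargs parameter value.
local UNIT_SEPARATOR = "\31" -- U+001F, written as a decimal escape

local function encode_multivalue(values)
	for _, v in ipairs(values) do
		-- plain find (4th argument true) so "|" is not treated as a pattern
		if string.find(v, "|", 1, true) then
			-- some value contains a pipe: prefix with U+001F and use it as separator
			return UNIT_SEPARATOR .. table.concat(values, UNIT_SEPARATOR)
		end
	end
	-- no pipes anywhere: the plain "|" form is enough
	return table.concat(values, "|")
end

-- encode_multivalue({ "what=AAARGH!", "count=3" }) returns "what=AAARGH!|count=3"
```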
Lua module APIs
Transform functions receive two parameters: the data JSON converted to a Lua table, and the arguments as an associative table of key-value pairs. As with Scribunto arguments to {{#invoke:}}, the argument names and values are strings.
The return value from a transform function should be the modified table: an identity function can simply return its original input, or you are free to modify it or create a new object in the same format.
For the initial primary use case, data transformations for Charts, you'll be working with input and output data in tabular data format, but other types of data may become transformable in future front-end features.
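A minimal sketch of that calling convention: an identity transform, plus a hypothetical subsetting transform named first_rows that keeps the first N rows. The function name and its "rows" argument are invented for illustration; since arguments arrive as strings, the count is converted with tonumber.

```lua
local p = {}

-- Identity transform: hands the data back unchanged.
function p.identity(tab, args)
	return tab
end

-- Hypothetical subsetting transform: keep only the first N rows of tabular
-- data, where N comes from the "rows" argument (a string, like all arguments).
function p.first_rows(tab, args)
	local n = tonumber(args.rows)
	if n == nil or n < 0 then
		error("rows must be a non-negative number")
	end
	local kept = {}
	for i, row in ipairs(tab.data) do
		if i > n then
			break
		end
		kept[i] = row
	end
	tab.data = kept
	return tab
end

-- A real Module: page would end with `return p`.
```

Modifying the input table in place, as here, is allowed by the convention above; building a fresh table in the same format works just as well.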
Here's an example that converts the temperatures stored in the "low" and "high" columns of a monthly temperature dataset from Celsius to Fahrenheit on request:
Module:Weekly average temperature chart
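local p = {}

local function celsius_to_fahrenheit(val)
	if val == nil then
		-- empty cells (JSON null) stay empty
		return nil
	end
	return val * 1.8 + 32
end

--
-- input data:
-- * tabular JSON structure with 1 label column and 2 temperature columns stored in C
--
-- arguments:
-- * units: "F" or "C"
--
-- exercises for the reader:
-- * try adding "K" or "R" output units
-- * try adding alternate input units
--
function p.convert_temps(tab, args)
	if args.units == "C" then
		-- Stored data is already in Celsius
		return tab
	elseif args.units == "F" then
		-- Have to convert if asked for Fahrenheit
		for _, row in ipairs(tab.data) do
			-- first column is the month label; columns 2 and 3 are low and high
			row[2] = celsius_to_fahrenheit(row[2])
			row[3] = celsius_to_fahrenheit(row[3])
		end
		-- keep the column titles in sync with the new units
		for _, field in ipairs(tab.schema.fields) do
			if type(field.title) == "table" then
				for lang, title in pairs(field.title) do
					field.title[lang] = (string.gsub(title, "%(C%)", "(F)"))
				end
			end
		end
		return tab
	else
		error("Units must be either 'C' or 'F'")
	end
end

return p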
Data:Sample weekly temperature dataset.tab
(just a stub with two entries for the demo)
{
"license": "CC0-1.0",
"description": {
"en": "Sample monthly temperature data"
},
"schema": {
"fields": [
{
"name": "month",
"type": "localized",
"title": {
"en": "Month"
}
},
{
"name": "low",
"type": "number",
"title": {
"en": "Low temp (C)"
}
},
{
"name": "high",
"type": "number",
"title": {
"en": "High temp (C)"
}
}
]
},
"data": [
[
{
"en": "January"
},
5,
20
],
[
{
"en": "July"
},
15,
30
]
]
}
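To make the JSON-to-Lua mapping concrete, here is the sample dataset written as the Lua table a transform would receive, assuming the JSON is converted directly (arrays become 1-based sequences, objects become string-keyed tables). The helper column_index is hypothetical; it looks columns up by their schema name so a transform does not depend on hard-coded positions like row[2].

```lua
-- The sample dataset as a Lua table (assumed direct JSON conversion).
local tab = {
	license = "CC0-1.0",
	description = { en = "Sample monthly temperature data" },
	schema = {
		fields = {
			{ name = "month", type = "localized", title = { en = "Month" } },
			{ name = "low", type = "number", title = { en = "Low temp (C)" } },
			{ name = "high", type = "number", title = { en = "High temp (C)" } },
		},
	},
	data = {
		{ { en = "January" }, 5, 20 },
		{ { en = "July" }, 15, 30 },
	},
}

-- Hypothetical helper: return the position of the column with the given
-- schema name, or nil if there is no such column.
local function column_index(t, name)
	for i, field in ipairs(t.schema.fields) do
		if field.name == name then
			return i
		end
	end
	return nil
end

-- Convert by column name instead of fixed index.
local low, high = column_index(tab, "low"), column_index(tab, "high")
for _, row in ipairs(tab.data) do
	row[low] = row[low] * 1.8 + 32
	row[high] = row[high] * 1.8 + 32
end
```

After the loop, January's low/high become 41/68 and July's become 59/86 (°F).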
todo: link to Special:ApiSandbox demo
Internal APIs and data formats
The feature is implemented in JsonConfig and is agnostic to the data type, as long as the transform produces JSON-serializable objects. It should therefore also work for cases other than Charts and tabular data, such as filtering map JSON, but this has not yet been tested.
JCSingleton::getContent() is supplemented by JCSingleton::getContentLoader(), which returns a builder that accepts additional options; so far the only one is the transform method. Calling load() on the builder returns a status object: on success its value is a JCContentWrapper holding the rehydrated content object and some metadata, and on failure it carries localizable errors from lower in the stack.
Errors in Lua execution or backend loading should be reported in a reasonably readable form.
use JsonConfig\JCSingleton;
use JsonConfig\JCTransform;

// Example of loading optionally transformed content;
// whether the backend is local or remote is transparent to the caller.
$page = "Charts/Wikibase_property/test.tab";
$title = JCSingleton::parseTitle( $page, NS_DATA );

$module = "Charts/Wikibase_property/test";
$name = "transform";
$args = [
	"entity" => "Q65", // Los Angeles
	"property" => "P1082", // population
	"qualifier" => "P585", // point in time
];
$transform = new JCTransform( $module, $name, $args );

$status = JCSingleton::getContentLoader( $title )
	->transform( $transform )
	->load();

if ( $status->isOK() ) {
	$wrapper = $status->getValue(); // JCContentWrapper
	$content = $wrapper->getContent();
	$expiry = $wrapper->getExpiry(); // TTL in seconds
	$deps = $wrapper->getDependencies(); // TitleValue[] of pages on Commons
	// @todo apply these to our ParserOutput to record dependencies
} else {
	// Report the error message(s) carried by $status
}
These dependencies should be tracked on page renderings by adding them to the ParserOutput; a helper function will be added to simplify this.
Using transforms in Charts
See Extension:Chart/Transforms for Chart-specific user documentation on using transforms.