Extension:MathSearch
The MathSearch extension integrates the MathWeb Search (MWS) engine as well as the BaseX search engine into MediaWiki.
Prerequisites
This extension requires some preconfiguration effort. You should plan 15 to 20 minutes for the installation of the prerequisites.
- Extension:Math in LaTeXML mode
- A local or Docker installation of the BaseX REST search
Only MySQL is supported as the database type.
Some features require Extension:SyntaxHighlight.
Installation
- Download and move the extracted MathSearch folder to your extensions/ directory.
  Developers and code contributors should install the extension from Git instead, using:
  cd extensions/
  git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/MathSearch
- Add the following code at the bottom of your LocalSettings.php file:
  wfLoadExtension( 'MathSearch' );
- Run the update script (php maintenance/update.php, from the MediaWiki root), which will automatically create the database tables that this extension needs.
- Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.
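If you script the installation, the LocalSettings.php step can be made safe to repeat. The following is a minimal POSIX shell sketch; the helper name ensure_mathsearch_loaded is hypothetical and not part of MathSearch. It appends the wfLoadExtension line only when no uncommented call for MathSearch is already present, and it does not run the update script for you.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MathSearch): add the
# wfLoadExtension line to LocalSettings.php unless it is already there.
ensure_mathsearch_loaded() {
  settings="${1:-LocalSettings.php}"   # assumed default: run from the MediaWiki root
  line="wfLoadExtension( 'MathSearch' );"
  # Matches an uncommented call with optional whitespace and either quote style.
  pattern="^[[:space:]]*wfLoadExtension\([[:space:]]*['\"]MathSearch['\"][[:space:]]*\)"

  if [ ! -f "$settings" ]; then
    echo "not found: $settings" >&2
    return 1
  fi
  if grep -Eq "$pattern" "$settings"; then
    echo "already loaded"
    return 0
  fi
  # Make sure the appended line does not get glued to an unterminated last line.
  if [ -n "$(tail -c 1 "$settings")" ]; then
    printf '\n' >> "$settings"
  fi
  printf '%s\n' "$line" >> "$settings"
  echo "added"
}
```

For example, ensure_mathsearch_loaded /path/to/mediawiki/LocalSettings.php prints "added" the first time and "already loaded" on later runs.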
Configuration parameters
$wgMathSearchBaseX: the URL of your (local) BaseX REST endpoint, e.g. "http://localhost:8080/rest/sampleHarvest", where 8080 is the port used by the BaseX server (8080 by default in our setup).
To enable debugging one can set:
$wgMathDebug = true;
Text search
Text search works best with CirrusSearch, but the default MySQL search works as well.
Testing the Web Search
With the MathSearch extension installed, your wiki provides the page Special:MathSearch. There, wiki pages can be searched by the TeX or MathML expressions entered in the structured search fields. Make sure that the indexing steps below have been completed and that MWS or BaseX is activated.
Indexing
To use the MathSearch capabilities, you have to create indexes for the formulas in your wiki. Index updates are currently not supported, so you have to re-index every time the equations change. If formulas change frequently, consider running the re-indexing as a cron job.
From the mediawiki root, run:
php extensions/MathSearch/maintenance/UpdateMath.php -m latexml
As a result, the mathindex and mathlog tables in your database should be filled. Check that mathlog contains MathML entries of the following form:
<math xmlns="http://www.w3.org/1998/Math/MathML" id="p1.1.m1.1" class="ltx_Math" alttext="{\displaystyle{\displaystyle E=mc^{2}}}" display="inline">
<semantics id="p1.1.m1.1a">
<mrow id="p1.1.m1.1.6" xref="p1.1.m1.1.6.cmml">
<mi id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml">E</mi>
<mo id="p1.1.m1.1.2" xref="p1.1.m1.1.2.cmml">=</mo>
<mrow id="p1.1.m1.1.6.1" xref="p1.1.m1.1.6.1.cmml">
<mi id="p1.1.m1.1.3" xref="p1.1.m1.1.3.cmml">m</mi>
<mo id="p1.1.m1.1.6.1.1" xref="p1.1.m1.1.6.1.1.cmml"></mo>
<msup id="p1.1.m1.1.6.1.2" xref="p1.1.m1.1.6.1.2.cmml">
<mi id="p1.1.m1.1.4" xref="p1.1.m1.1.4.cmml">c</mi>
<mn id="p1.1.m1.1.5.1" xref="p1.1.m1.1.5.1.cmml">2</mn>
</msup>
</mrow>
</mrow>
<annotation-xml encoding="MathML-Content" id="p1.1.m1.1b">
<apply id="p1.1.m1.1.6.cmml" xref="p1.1.m1.1.6">
<eq id="p1.1.m1.1.2.cmml" xref="p1.1.m1.1.2"/>
<ci id="p1.1.m1.1.1.cmml" xref="p1.1.m1.1.1">𝐸</ci>
<apply id="p1.1.m1.1.6.1.cmml" xref="p1.1.m1.1.6.1">
<times id="p1.1.m1.1.6.1.1.cmml" xref="p1.1.m1.1.6.1.1"/>
<ci id="p1.1.m1.1.3.cmml" xref="p1.1.m1.1.3">𝑚</ci>
<apply id="p1.1.m1.1.6.1.2.cmml" xref="p1.1.m1.1.6.1.2">
<csymbol cd="ambiguous" id="p1.1.m1.1.6.1.2.1.cmml" xref="p1.1.m1.1.6.1.2">superscript</csymbol>
<ci id="p1.1.m1.1.4.cmml" xref="p1.1.m1.1.4">𝑐</ci>
<cn type="integer" id="p1.1.m1.1.5.1.cmml" xref="p1.1.m1.1.5.1">2</cn>
</apply>
</apply>
</apply>
</annotation-xml>
<annotation encoding="application/x-tex" id="p1.1.m1.1c">{\displaystyle{\displaystyle E=mc^{2}}}</annotation>
</semantics>
</math>
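To automate that check, for instance after exporting a few mathlog entries to files, a small POSIX shell sketch can look for the parts visible in the example above: the MathML namespace, the semantics wrapper, the Content MathML annotation and the TeX annotation. The helper name check_mathml is hypothetical, and exporting rows from mathlog is not shown because it depends on your database schema. It is a substring check, not a MathML validator.

```shell
#!/bin/sh
# Hypothetical helper: reads one MathML string from stdin and reports
# which of the parts expected in LaTeXML output are missing.
# This is a plain substring check, not an XML or MathML validator.
check_mathml() {
  input=$(cat)
  status=0
  # Expected parts, in order: MathML namespace on <math>, the <semantics>
  # wrapper, the Content MathML annotation, and the original TeX source.
  for needle in \
    'xmlns="http://www.w3.org/1998/Math/MathML"' \
    '<semantics' \
    'encoding="MathML-Content"' \
    'encoding="application/x-tex"'
  do
    case "$input" in
      *"$needle"*) ;;                          # part present
      *) echo "missing: $needle"; status=1 ;;  # part absent
    esac
  done
  if [ "$status" -eq 0 ]; then
    echo "ok"
  fi
  return "$status"
}
```

Fed the example entry above on stdin (e.g. check_mathml < entry.xml, with entry.xml as a placeholder file name), it prints "ok"; otherwise it lists each missing part and returns a non-zero status.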
then run:
mkdir my_harvests
php extensions/MathSearch/maintenance/CreateMWSHarvest.php ./my_harvests 30000 --mwsns="mws:"
The default harvest path is MWS_HARVEST_PATH="../data/wiki". If you have changed this path in /your/path/to/mediawiki/extensions/MathSearch/mws/config/mws_services.conf, you have to specify your path instead. Then start the formula search service on the harvest directory, e.g. with Docker:
docker run -e HARVESTS_PATH=/my_harvests -p 1985:1985 -v ./my_harvests:/my_harvests ghcr.io/mardi4nfdi/formulasearch:main
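Because the index has to be rebuilt whenever formulas change, the two indexing steps above (UpdateMath.php, then CreateMWSHarvest.php) can be chained in one script for a cron job. The following is a minimal POSIX shell sketch that reuses the arguments from those commands unchanged; reindex_math, MW_ROOT, DRY_RUN and the default harvest directory are assumptions made for illustration. It does not restart the Docker container, and whether the running service picks up new harvests without a restart is not covered here.

```shell
#!/bin/sh
# Hypothetical re-indexing wrapper; not part of MathSearch.
# MW_ROOT: your MediaWiki root directory (assumption, must be set).
# DRY_RUN=1: print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The body runs in a subshell ( ... ), so "cd" does not leak to the caller
# and "exit" only ends this function.
reindex_math() (
  if [ -z "${MW_ROOT:-}" ]; then
    echo "set MW_ROOT to your MediaWiki root" >&2
    exit 1
  fi
  harvest_dir="${1:-$MW_ROOT/my_harvests}"   # assumed default location
  run cd "$MW_ROOT" || exit 1
  # 1. Render the formulas with LaTeXML and fill mathindex/mathlog.
  run php extensions/MathSearch/maintenance/UpdateMath.php -m latexml || exit 1
  # 2. Regenerate the MWS harvest files from the database.
  run mkdir -p "$harvest_dir" || exit 1
  run php extensions/MathSearch/maintenance/CreateMWSHarvest.php \
    "$harvest_dir" 30000 --mwsns="mws:"
)
```

With the functions saved in a file, a crontab entry could then look like this (paths are placeholders):
0 3 * * * . /path/to/reindex_math.sh && MW_ROOT=/path/to/mediawiki reindex_math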
Further Configuration
Set the BaseX backend URL in LocalSettings.php:
$wgMathSearchBaseXBackendUrl = 'http://localhost:8087/rest/sampleHarvest';
replacing 8087 with the port of your BaseX server, and also set the BaseX password:
$wgMathSearchBaseXRequestOptions['password'] = "Your Password";
Testing
After adding some data, e.g. from the file ~/math.xml (replace admin:Password with your BaseX user name and password):
curl -u admin:Password -i -X PUT -T ~/math.xml "http://localhost:8087/rest/sampleHarvest"
you can check the indexed formulas in the browser at
http://localhost:8080/index.php/Special:MathIndex
again replacing 8080 with the appropriate port (here, that of your MediaWiki web server).
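Besides Special:MathIndex, the upload can also be checked on the BaseX side: the BaseX REST interface evaluates an XQuery passed as the query parameter of a GET request against the database named in the URL. The sketch below wraps that in a hypothetical helper basex_query; BASEX_URL, BASEX_USER and BASEX_PASS are assumed variable names, and the default query count(//*:math) assumes your uploaded file contains elements named math, which may not hold for every harvest format.

```shell
#!/bin/sh
# Hypothetical helper: run an XQuery against the BaseX database through
# the BaseX REST interface (GET .../rest/<database>?query=...).
basex_query() {
  query="${1:-}"
  if [ -z "$query" ]; then
    query='count(//*:math)'   # assumes the uploaded data contains <math> elements
  fi
  url="${BASEX_URL:-http://localhost:8087/rest/sampleHarvest}"
  if [ -z "${BASEX_PASS:-}" ]; then
    echo "set BASEX_PASS to your BaseX password" >&2
    return 1
  fi
  # -G sends the --data-urlencode value as a URL-encoded query string on a GET request;
  # -f makes curl fail on HTTP errors such as 401 (wrong credentials).
  curl -fsS -G -u "${BASEX_USER:-admin}:$BASEX_PASS" \
    --data-urlencode "query=$query" "$url"
}
```

For example, BASEX_PASS='Your Password' basex_query should print the number of matching elements; pass another XQuery as the first argument to inspect the data differently.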