Alternative parsers/de

This page collects links, descriptions and status reports for the various alternative MediaWiki parsers, that is, programs and projects other than MediaWiki itself that can translate, or are intended to translate, MediaWiki's text markup syntax into something else. Some of them serve a fairly narrow purpose, while others are possible candidates to replace the somewhat labyrinthine code that currently drives MediaWiki itself.

Many of the pages linked here are probably outdated, insufficiently maintained, or even abandoned. However, to avoid duplicating the same work over and over, it seemed sensible to gather what exists "out there". In addition, although so many alternative parsers exist, almost no unofficial parser powers any wiki site, the exceptions being Parsoid, which powers some Wikimedia Foundation wikis, and wikitextparser, which powers the OpenTTD wiki through TrueWiki.

Parsers that build an abstract syntax tree (AST) and provide access to it are listed under #Parsers that provide an AST; parsers that do not build an AST but extract some information are listed under #Parsers that extract some information; the remaining parsers are listed under #Other parsers.
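To make the "AST" output and the "can convert output back to markup" (round-trip) property concrete, here is a minimal, hypothetical sketch that is not taken from any of the parsers listed below. It is a recursive-descent parser covering only two constructs, templates `{{...}}` and internal links `[[...]]`; everything else is kept as plain text. Real wikitext also has tables, tag extensions, parser functions and context-dependent rules that this sketch ignores. The class and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Text:
    value: str

    def to_wikitext(self) -> str:
        return self.value


@dataclass
class Link:
    target: str
    label: Optional[str] = None  # None means "[[target]]" without a pipe

    def to_wikitext(self) -> str:
        if self.label is None:
            return f"[[{self.target}]]"
        return f"[[{self.target}|{self.label}]]"


@dataclass
class Template:
    name: str
    params: List[List["Node"]] = field(default_factory=list)  # each parameter is a node list

    def to_wikitext(self) -> str:
        parts = [self.name] + [to_wikitext(p) for p in self.params]
        return "{{" + "|".join(parts) + "}}"


Node = Union[Text, Link, Template]


def to_wikitext(nodes: List[Node]) -> str:
    """Serialize a node list back to markup (the round-trip direction)."""
    return "".join(n.to_wikitext() for n in nodes)


def parse(src: str) -> List[Node]:
    nodes, _ = _parse_nodes(src, 0, stops=())
    return nodes


def _parse_nodes(src: str, pos: int, stops: Tuple[str, ...]) -> Tuple[List[Node], int]:
    """Read nodes until the end of input or until one of the stop strings."""
    nodes: List[Node] = []
    buf: List[str] = []

    def flush() -> None:
        if buf:
            nodes.append(Text("".join(buf)))
            buf.clear()

    while pos < len(src) and not any(src.startswith(s, pos) for s in stops):
        if src.startswith("{{", pos):
            flush()
            node, pos = _parse_template(src, pos)
            nodes.append(node)
        elif src.startswith("[[", pos):
            flush()
            node, pos = _parse_link(src, pos)
            nodes.append(node)
        else:
            buf.append(src[pos])
            pos += 1
    flush()
    return nodes, pos


def _parse_template(src: str, pos: int) -> Tuple[Template, int]:
    start = pos
    pos += 2  # skip "{{"
    name_start = pos
    while pos < len(src) and src[pos] != "|" and not src.startswith("}}", pos):
        pos += 1
    template = Template(src[name_start:pos])
    # Parameters may themselves contain links and nested templates.
    while pos < len(src) and src[pos] == "|":
        param, pos = _parse_nodes(src, pos + 1, stops=("|", "}}"))
        template.params.append(param)
    if not src.startswith("}}", pos):
        raise ValueError(f"unterminated template starting at offset {start}")
    return template, pos + 2


def _parse_link(src: str, pos: int) -> Tuple[Link, int]:
    end = src.find("]]", pos + 2)
    if end == -1:
        raise ValueError(f"unterminated link starting at offset {pos}")
    target, sep, label = src[pos + 2:end].partition("|")
    return Link(target, label if sep else None), end + 2
```

For well-formed input within this subset, `to_wikitext(parse(s)) == s`, which is the property the "can convert output back to markup" column asks about; a full-featured round-tripping parser such as Parsoid has to preserve it across the entire syntax, including whitespace and malformed markup.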

Parsers that provide an AST

Free software

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Can convert output back to markup | Comments / other information | License
Parsoid | Started by Gabriel Wicke; maintained by the Content Transform Team at the Wikimedia Foundation | WikiPEG / PHP (formerly Node.js) | markup, XML dumps, test cases | tokens, HTML5 DOM with RDFa and round-trip data | Yes | Yes | Full-featured round-tripping parser/runtime that powers VisualEditor on Wikipedia. Default parser for some WMF wikis. Will become the default parser for MediaWiki. See Parser Unification. | GPLv2+
DizzyLogic Wiki Parser | Dizzy Logic | C++ | XML dumps | Syntax tree in XML, plain text | No | No | Fast, datamining-oriented parser for English Wikipedia. Capable of processing all of English Wikipedia into plain text and XML in 2-3 hours on a modern processor. Convenient graphical interface. Windows installer available (64-bit). | MIT license
mwparserfromhell | The Earwig | Python | markup | AST | almost | Yes | A Python library to convert wiki markup to a navigable string, which can be used to examine and manipulate templates. Compatible with Python 3.8+, with no runtime dependencies. | MIT License
wikiapi | kanashimi | JavaScript | markup | JavaScript native object | almost | Yes | Parses sections, templates with parameters, links, images and categories; converts wiki tables to JS arrays and JS arrays back to wiki tables; and much more. You can modify parts of the wikitext, then regenerate the page simply with parsed.toString(). Runs on Node.js and in the browser. | MIT
Sweble Wikitext Parser | Hannes Dohrn | Java | markup | AST, XML, HTML | almost | ? | Claims to be very thorough. There are three papers about the Sweble Wikitext Parser. | Apache License 2.0
wikitextparser | 5j9 | Python | markup | AST | almost | Yes | Provides several accessor methods in an object tree to navigate to structural elements like sections, tables, links, etc. Supports extracting table data as a list of lists. Available via pip; supports Python 3. | GPLv3
mwlib | PediaPress.com | Python with C library | markup and other | parse tree, HTML, PDF, XML, OpenDocument | No | ? | Used by MediaWiki's "Print/export" feature; see Reading/Web/PDF Functionality. | BSD
wb2pdf | Dirk Hünniger | Haskell | online article | LaTeX, PDF, parse tree, HTML, OpenDocument, EPUB | No | ? | Recursive descent based on monadic parser combinators. Allows for non-context-free input, especially the malformed HTML often found on Wikipedia. | GPL
XWiki Rendering Framework | XWiki dev team | Java | various WikiMarkups | Well-formed sequence of events, HTML/XHTML, other WikiMarkups | No | No | XWiki can be used as a full-fledged wiki supporting several WikiMarkups (including MediaWiki's markup). It also offers a standalone Rendering Engine that can be used as a Java library for parsing/rendering WikiMarkups. As of March 2016, however, it cannot output MediaWiki format. | LGPL
mediawiki-parser | Peter Potrowl, Erik Rose | Python | markup | XHTML, raw text, AST | No | No | GSoC 2011 project; the use of a PEG parser makes it easy to improve. Parser functions are not supported yet. | GPLv3
smc.mw | Marcus Brinkmann | Python | markup | AST, HTML | No | No | Stateful PEG parser based on Grako (archived 2014-03-09 at the Wayback Machine), with a very clean separation of parsing stages, grammars and semantic transformations. | BSD
Pandoc | John MacFarlane | Haskell | markup | many formats & AST | No | not identical | Can convert a subset of MediaWiki markup to ~35 different formats (5 of which are flavors of Markdown). | GPLv2
MwParserFromScratch | CXuesong | C# | markup | AST | No | Yes | A portable .NET library that parses wikitext into an abstract syntax tree. For now it supports most of the common markup expressions except file links, double-underscore magic words, and tables. | Apache License
mediawiki-parser | Ben Gamari | Haskell | markup or MediaWiki XML | AST | almost | No | mediawiki-parser served as the basis of the extraction pipeline of the NIST TREC Complex Answer Retrieval information retrieval track. It is a PEG parser capable of producing an abstract syntax tree representing most of the MediaWiki syntax. | BSD-3-Clause
parse_wiki_text | Fredrik Portström | Rust | markup | AST | No | No | Parse Wiki Text attempts to take all uncertainty out of parsing wiki text by converting it to another format that is easy to work with. The target format is Rust objects that can be processed ergonomically using iterators and match expressions. | modified MIT
wikitextprocessor | Tatu Ylonen | Python | XML dumps | AST | ? | ? | Can expand templates and Lua macros. | MIT unless otherwise noted in individual files (see LICENSE)
wikiparser-node | Bhsd | TypeScript | markup | AST, HTML | almost | Yes | Parsing, modifying, and linting wikitext. Runs in Node.js and in the browser (online playground). | GPLv3


Proprietary

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Can convert output back to markup | Comments / other information | License
WikiTaxi | Ralf Junker | Delphi / Pascal | MediaWiki markup, page or fragment | Node tree, HTML, potentially others | almost | | Hand-crafted parser with template expansion, parser functions (core and extended), tag extensions (<ref>, <source>) and wikitext parsing. Used for the WikiTaxi offline reader. | No sources available

Abandoned

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Can convert output back to markup | Comments / other information | License
DKPro JWPL parser | Torsten Zesch, Richard Eckart de Castilho, Oliver Ferschke, Elisabeth Niemann | Java | XML dump | API to access pages, outlinks, inlinks and more | No | | "JWPL (Java Wikipedia Library) is a free, Java-based application programming interface that allows to access all information contained in Wikipedia." "JWPL is for you: If you need structured access to Wikipedia in Java." The older parser is no longer maintained; JWPL now uses Sweble. | LGPL
FlexBisonParse | Timwi | flex, bison and C | markup fragment | Custom XML | No | | Intended as an eventual replacement for the parsing code inside MediaWiki itself. |
sanskrit-coders/wiki-tools | Vishvas Vasuki | Scala | MediaWiki text | MediaWiki text and section tree | No | | Parses only MediaWiki sections, nothing else. You can parse a wiki page with multiple sections, get a section tree, and add, access and delete sections. | Creative Commons
Perl Wikipedia Toolkit | Michal Jurosz | Perl | XML dump, SQL dump | Own parse tree, MediaWiki markup | No | | Perl Wikipedia Toolkit, developed for computer-assisted Wikipedia translation. (Of limited functionality.) |
WikiOnCD (archived 2006-01-15 at the Wayback Machine) | Andrew Rodland | Perl | SQL dump or markup | HTML, parse tree (eventually?) | No | | Started out as an offline wiki browser, but grew a parser when Wiki2static turned out to be too limiting. No web presence yet; the code is in SVN. | GPL
WikiPress Publisher [dead link] | Erwin Jurschitza | Delphi 7 | XML dump | DocBook XML, Digibib XML, HTML | No | | Used for the German DVD; generates lists of bad markup. | No sources available
Saya.Parser.Wiki [dead link] | Nana Sakisaka | C++ | markup | AST | No | | Pure C++11 parser implemented with Boost.Spirit.Qi. | Boost Software License 1.0

Parsers that extract some information

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Can convert output back to markup | Comments / other information | License
PHP-Wikipedia-Syntax-Parser | Don Wilson | PHP | markup | Associative array | No | | Parses top-level sections, w:Wikipedia:Persondata, infoboxes, external links, categories, and interlanguage links. | GPL
Wiki-infobox-parser | Zhipeng Jiang | JavaScript | markup | JSON | No | | A lightweight Wikipedia infobox parser written in JavaScript. | MIT
wiktextract | Tatu Ylonen | Python | XML dumps | JSON | | ? | Parses most of the English Wiktionary into JSON. Can expand templates and Lua macros. You can run it locally, or directly grab the JSON output hosted at . | MIT
ParseWiki | Gerges | PHP | wikitext | Associative array | Yes | | A library that helps parse wikitext data. | GPL-3.0
wtf_wikipedia | Spencer Kelly | JavaScript | markup | JSON | almost | No | Supports recursive links and templates, parses infoboxes and links, resolves special templates, and parses images and categories. Runs server-side and in the browser. | MIT
gensim.segment_wiki | RaRe Technologies | Python | MediaWiki XML | JSON | No | No | Gensim is a robust open-source vector space modeling and topic modeling toolkit implemented in Python; segment_wiki is its script for Wikipedia parsing and extraction. | LGPLv2.1
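Unlike the AST parsers above, the tools in this table pull specific fields out of the markup. A minimal sketch of that approach, assuming plain regular expressions are good enough (for real-world wikitext they are not: localized namespace names, links inside templates, comments and `<nowiki>` all break it), collects section titles, categories and link targets into a dictionary, in the spirit of the "associative array" output listed above. The function name `extract_info` and its output keys are hypothetical.

```python
import re
from typing import Dict, List

# Simplified, hypothetical patterns: level 2-6 headings and [[target|label]] links.
_HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)
_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_info(wikitext: str) -> Dict[str, List[str]]:
    """Collect section titles, categories and ordinary link targets."""
    info: Dict[str, List[str]] = {"sections": [], "categories": [], "links": []}
    for _level, title in _HEADING.findall(wikitext):
        info["sections"].append(title)
    for target in _LINK.findall(wikitext):
        target = target.strip()
        # "[[Category:X]]" assigns a category; everything else is treated as a link.
        if target.lower().startswith("category:"):
            info["categories"].append(target.split(":", 1)[1].strip())
        else:
            info["links"].append(target)
    return info
```

Because no tree is built, the result cannot be turned back into the original markup, which is why the round-trip column is mostly "No" in this table.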

Other parsers

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Comments / other information | License
Mylyn WikiText | David Green | Java | Local files | HTML, DocBook, Eclipse Help, DITA, extensible | No | Integration with Ant and the Eclipse runtime. | EPL
wikipedia-js | kenshiro_o | Node.js | markup | HTML | No | A simple client that lets you query Wikipedia articles in English. The results are formatted in basic HTML. You can retrieve either a summary of an article (i.e. the part before the table of contents) or the full article. | MIT
WikiExtractor | Giuseppe Attardi, Antonio Fuschetto | Python | XML dumps | text | No | Simple and fast tool for extracting plain text from Wikipedia dumps. It performs template expansion and handles parser functions (core and extended). | GPL
Mediawiki2HTML Machine | Johannes Buchner | PHP | markup | HTML | No | Project for parsing without the MediaWiki engine. | AGPLv3 or any later version
Java API (Bliki engine) | Axel Kramer | Java | markup fragment | HTML, PDF | almost | Java Wikipedia API (supports ParserFunctions, Lua/Scribunto, ...). | EPLv1.0 or GPLv2.1+
WikiCloth | nricciar | Ruby | markup | HTML | No | Ruby implementation of the MediaWiki markup language, including a fair number of the parser functions. | MIT
YaCy | YaCy dev team | Java | XML dump | XML with Dublin Core metadata | No | YaCy is a search engine, and a MediaWiki parser is included as one of its import modules. MediaWiki XML dumps are first converted to Dublin Core XML as an intermediate format and then inserted into the search index using the built-in Dublin Core importer. | GPL
WiktionaryParser | dev team | Python | markup | JSON | No | Wiktionary parser. As of October 2019, it downloads the article on the fly and parses "etymologies, definitions, pronunciations, examples, audio links and related words". | MIT
LuaWiki | Alexander Misel | Lua, PEG | markup | HTML | No | LuaWiki has a parser that supports the most common syntax used in the article namespace; however, it defines a different grammar for templates. | GPLv3
wiktionary-dumps | excarnateSojourner | Python | XML dump | various | No | A collection of scripts for extracting various information specifically from database dumps of the English Wiktionary. Only one or two may be more broadly useful. Active as of 2023. | CC0
wikiparser-java | javalc6 | Java | XML dump | various | No | The library was developed to parse and render the English Wiktionary. Several other languages are supported in addition to English. | Apache License
other Wiktionary parsers | various | various | markup | various | No | See the list at <stackoverflow.com/q/3364279> | various

Abandoned

Name and link | Main author(s) | Language | Input | Output | Complete implementation | Comments / other information | License
libmwparser | Saitmoh | C | XML dumps, markup | XML, XHTML, expanded wikitext | almost | Primarily an offline reader for Wikimedia wikis, with interwiki support. Libmwparser is a source-independent library that supports most of the MediaWiki syntax and some extensions such as math or gallery. | GPL
Wiky.php | Toni Lähdekorpi | PHP, regular expressions | markup | HTML | No | A tiny PHP library that uses only regular expressions to convert wiki markup to HTML. | Apache License/GPL/LGPL/MPL/CC
Wiky | Tanin Na Nakorn | Ruby | markup | HTML | No | A simple Ruby library to convert wiki markup to HTML. | Apache License
Wiky.js | Tanin Na Nakorn | JavaScript | markup | HTML | No | A simple JavaScript library to convert wiki markup to HTML (limited subset). | Apache License
txtwiki.js | Joao Sa | JavaScript | markup | Text | No | A JavaScript library to convert MediaWiki markup to plain text. | MIT License
mw2html | Connelly Barnes | Python | Wiki URL | HTML | No | Minimal setup; gets the basic job of creating a static copy of the wiki done. | Public Domain
PHP5 WP | Dan Goldsmith | PHP | markup | HTML | No | Parser with a plugin framework for adding syntax. Configurable for alternative markup, e.g. PmWiki. | MPL 2.0
JAMWiki | Ryan | Java | JAMWiki front end | HTML | No | Java wiki engine that supports MediaWiki syntax. The roadmap also calls for XML import and export compatible with MediaWiki. | LGPLv2
InstaView | Pilaf | JavaScript | markup fragment | HTML | No | Provides an instant preview while editing a page (without reloading). | BSD
InstaView | C. Scott Ananian | JavaScript | markup fragment | HTML | No | Port of Pilaf's code to Node.js, volo, and the browser. | BSD
Tero-dump | Tero Karvinen | ? | Local wiki installation, including MySQL, PHP, web server | HTML | No | Scripts for grabbing the whole wiki; images are not included. |
Text_Wiki_Mediawiki | Multiple | PHP | markup | HTML, LaTeX, plain text | No | Part of the Text_Wiki library. | LGPL
TomeRaider export | Erik Zachte | Perl | XML dump | TomeRaider database | No | See en:Wikipedia:TomeRaider database for more details. |
Waikiki | Magnus Manske | C++ | SQL dump (via SQLite) | HTML | No | Abandoned in favour of "flexbisonparse", but has been used inside some experimental "front ends". |
Wikiwyg (archived 2008-12-16 at the Wayback Machine) | Jim Higson | JavaScript | A live installation of MediaWiki | HTML (via XML) | No | More than just a parser; attempts to create a fully functional client-side interface. |
wik2dict | Guaka | Python | SQL dump | DICT | No | |
wiki2pdf | Stephan Walter | Python (and PHP) | markup fragment or set of online articles | LaTeX, PDF | No | Project is incomplete and dormant. |
WikiPDF | Felipe Sanches | Python (and PHP) | One selected article | LaTeX based on templates, PDF | No | MediaWiki extension that uses Stephan Walter's wiki2pdf as its backend. |
Wikifilter | ? | C++ (VS) | XML dumps | HTML | No | A Windows program that uses Apache/IIS to serve the pages. Abandoned in 2006, before ParserFunctions were available. |
Wikipedia Dump Reader | Benjamin Thyreau | Python | XML dumps | On screen | No | Cross-platform viewer. | GPLv2/~BSD license
Marker | Ryan Blue | Ruby | markup (subset) | HTML or formatted text | No | Marker is a Ruby implementation of a subset of the MediaWiki markup language, intended to bring MediaWiki's markup language to non-wiki applications with multiple output formats. | GPL
Kiwi | Thomas Luce, Karl Matthias, AboutUs.org | C, Ruby, PEG | markup | HTML | almost | Kiwi is a PEG-based C implementation with Ruby bindings and a command-line parser. It is very fast and supports most of the MediaWiki syntax. | BSD
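Several entries above (for example Wiky.php) convert markup to HTML with regular expressions alone rather than with a grammar. A minimal, hypothetical sketch of that style in Python, covering only headings, bold and italics, shows both how compact the approach is and where it stops: it does not handle links, lists, tables, paragraphs, unbalanced quotes or runs such as `'''''x'''''`, all of which MediaWiki treats with dedicated logic. The name `wiki_to_html` is illustrative, not any listed library's API.

```python
import html
import re

# Hypothetical regex-only converter for a tiny subset of wikitext.
_HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)
_BOLD = re.compile(r"'''(.+?)'''")
_ITALIC = re.compile(r"''(.+?)''")


def _heading(m: "re.Match[str]") -> str:
    level = len(m.group(1))  # number of "=" signs gives the heading level
    return f"<h{level}>{m.group(2)}</h{level}>"


def wiki_to_html(text: str) -> str:
    out = html.escape(text, quote=False)  # neutralize raw <, > and & first
    out = _HEADING.sub(_heading, out)
    out = _BOLD.sub(r"<b>\1</b>", out)    # bold must run before italics
    out = _ITALIC.sub(r"<i>\1</i>", out)
    return out
```

Each rule is applied to the whole string in a fixed order, so correctness depends on rule ordering rather than on a parse of the document's structure; that ordering sensitivity is the main weakness of regex-only converters.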


A non-parser dumper

One common use of alternative parsers is to dump wiki content into a static form, such as HTML or PDF. Tim Starling has written a script that is not a parser but uses MediaWiki's internal code to dump an entire wiki to HTML from the command line; see Extension:DumpHTML. It was used years ago to create the static dumps at https://dumps.wikimedia.org

There are also similar dumpers in the Kiwix project, for example mwoffliner, and you can query the RESTBase API to obtain HTML output that includes semantic information (such as transclusions).
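A minimal sketch of fetching such HTML with only the Python standard library. It assumes the Wikimedia RESTBase route `/api/rest_v1/page/html/{title}` on a Wikimedia host; other MediaWiki installations may not expose this API, and availability of the route can change, so check the target wiki's API documentation first. The helper names and the User-Agent string are placeholders.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def restbase_html_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build the (assumed) RESTBase URL for the Parsoid HTML of a page."""
    # Titles use underscores instead of spaces; "/" must be percent-encoded as well.
    path_title = quote(title.replace(" ", "_"), safe="")
    return f"https://{host}/api/rest_v1/page/html/{path_title}"


def fetch_html(title: str, host: str = "en.wikipedia.org") -> str:
    """Download the HTML; Wikimedia asks clients to send a descriptive User-Agent."""
    req = Request(
        restbase_html_url(title, host),
        headers={"User-Agent": "example-dumper/0.1 (placeholder contact)"},
    )
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

In the returned Parsoid HTML, transcluded template output is marked with `typeof="mw:Transclusion"` and a `data-mw` attribute describing the template call, which is the semantic information referred to above.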

Related topics

Category:Parser/de