Extension:ConvertPDF2Wiki

MediaWiki extensions manual
ConvertPDF2Wiki
Release status: stable
Implementation: Special page, API
Description: Allows users to import a PDF and convert it to a wiki page, including embedded images
Author(s): Bertrand Gorge (BertrandGorge)
Latest version: 1.0 (2024-12-09)
Compatibility policy: Master maintains backward compatibility.
MediaWiki: 1.39+
PHP: 8.1+
License: GNU General Public License 2.0 or later
Download: https://github.com/neayi/mw-convertPDF2Wiki/blob/main/README.md

The ConvertPDF2Wiki extension imports a PDF as a wiki page, extracting as much of its text and images as possible.

Usage

This extension adds a special page, "Special:Import_PDF", where you can upload a PDF file (or give the URL of a PDF file on the web). The extension then converts the PDF to wikitext and creates a new page from it.

The process is as follows:

  1. Go to the new special page: "Special:Import_PDF"
  2. Select the PDF file
  3. Choose the images you want to keep (get rid of logos or other nonessential images)
  4. Rotate images that might be upside down
  5. Select a title for the new page in the wiki (a default title is guessed from the PDF document)
  6. Edit your page to polish the details (tables might need to be recreated, etc.)

The selected images are imported with a name that matches the page title and added at the bottom of the page in case they do not appear in the text's flow.

If the title matches an existing page, the converted text is added at the bottom of the existing page.

Installation

Dependencies

The extension relies on the following three utilities, which must also be installed:

ImageMagick

ImageMagick is used to rotate the images. See: https://imagemagick.org/

To install the Imagick PHP extension (the PHP bindings for ImageMagick):

$ pecl install imagick
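
To check that the imagick module is visible to PHP after installation, something like the following could be used. This is a minimal sketch: the function name `has_php_module` is hypothetical, and it assumes the `php` binary on the PATH uses the same configuration as the PHP that serves the wiki, which is not always the case.

```shell
# has_php_module NAME: succeed if the PHP CLI lists NAME among its loaded modules.
# The function name is illustrative and not part of the extension.
has_php_module() {
    # `php -m` prints one module name per line; match the whole line, ignoring case.
    php -m 2>/dev/null | grep -qixF -- "$1"
}

if has_php_module imagick; then
    echo "imagick is loaded"
else
    echo "imagick is not loaded (or php is not on the PATH)"
fi
```

If the module is reported as missing even though the installation succeeded, the extension may still need to be enabled in the relevant php.ini.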

pdftohtml

PDFtoHTML is used to convert the PDF to an HTML document. See: https://poppler.freedesktop.org

To install (on Debian or Ubuntu):

$ apt-get install poppler-utils
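
To get a feel for what this step produces, pdftohtml can be run by hand on a sample PDF. The sketch below shows one reasonable invocation, not necessarily the options the extension itself uses. The helper name `pdf_to_html` and the output name `document` are placeholders.

```shell
# pdf_to_html PDF OUTDIR: convert PDF to HTML, writing the results under OUTDIR.
# Hypothetical helper; the chosen options are an illustrative assumption.
pdf_to_html() {
    pdf=$1
    outdir=$2
    # Refuse to run on a missing input file.
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    mkdir -p "$outdir" || return 1
    # -noframes: produce a plain HTML document instead of a frameset;
    # -q: suppress informational messages.
    # Images are extracted as well unless -i (ignore images) is given.
    pdftohtml -noframes -q "$pdf" "$outdir/document"
}
```

For example, `pdf_to_html report.pdf /tmp/report` would write the HTML and any extracted images into `/tmp/report`.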

Pandoc

Pandoc is used to convert from HTML to Wikitext. See https://pandoc.org/installing.html

To install (on Debian or Ubuntu):

$ apt-get install pandoc
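
The HTML-to-wikitext step can also be tried on its own. The following is a minimal sketch using Pandoc's `html` reader and `mediawiki` writer. The helper name `html_to_wikitext` is hypothetical, and the extension may pass additional options.

```shell
# html_to_wikitext [FILE]: convert HTML (from FILE, or from standard input
# when no file is given) to MediaWiki markup on standard output.
# Hypothetical helper, not part of the extension.
html_to_wikitext() {
    # -f html: input format; -t mediawiki: Pandoc's MediaWiki writer.
    pandoc -f html -t mediawiki "$@"
}
```

For example, `html_to_wikitext page.html > page.wiki` writes the converted markup to `page.wiki`, which can then be pasted into the wiki editor for inspection.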

See also

Here are some other extensions that do a similar job with Docx documents (PDF can also easily be converted to docx):
