Commons:Machine-readable data/eo

Shortcut: COM:MRD

On Wikimedia Commons, much metadata (including license and author) is not machine-readable. An API module, iiprop=extmetadata, can retrieve some values, but because the information is entered as free text on the file description page itself, the results are imperfect. The ongoing Structured data on Commons project aims to move this metadata into fully structured data and will eventually supersede the machine-readable data presented on this page.

In the meantime, and to ease a future transition towards more structured data, Wikimedia Commons uses a set of standard templates that have been made machine-readable in some ways through HTML elements. Some scripts already make use of this. Note that this data is available to any wiki using Wikimedia Commons, where it can be read from the HTML of the File: page just like other local data.

Machine-readable data

Machine readable data set by infobox templates

Several standard infobox templates tag their individual elements so that the information can be parsed. Several different styles of tags are used:

  • Microformat tags follow industry standards and can be parsed by already existing tools.
  • <td> id attributes (identifiers) are custom markings which allow more complete tagging but have to be read by custom tools. Most universal infoboxes have a two-column structure: column #1 holds the name of the field and column #2 holds the value.
    • Traditionally, <td> id attributes were used to tag the name cell in the first column of a row. To get the data, you need to read the contents of the following <td> cell in the second column.
    • The {{Creator}} and {{Institution}} templates have a more complicated structure, so the cells holding the actual data are tagged directly; in the table below, most of these identifiers end in _value.
| Template | Template parameter name | Description | <td> id attribute | Microformat | Comment |
| {{Information}} | description | description of the file | fileinfotpl_desc | hProduct.description | Often contains multiple languages annotated with {{Lang}}. |
| {{Information}} | date | date the original work was created | fileinfotpl_date | hCalendar vevent.dtstart | Sometimes additionally, or only, contains the publication date. These two dates have different meanings for copyright. When used, {{Date context}} can indicate the difference. Microformat added by the {{Date}} template. |
| {{Information}} | source | source of the file | fileinfotpl_src | | Often contains entire tables. There is no good way to deal with source templates yet. Source templates often have references to catalogue IDs, but these are also not machine-readable. |
| {{Information}} | author | author of the file | fileinfotpl_aut | | This can be the author, creator and/or copyright holder, and the usages are mixed. Often contains the {{Creator}} template, which is described below. |
| {{Information}} | permission | permission for the file | fileinfotpl_perm | | |
| {{Information}} | other versions | other versions of the file | fileinfotpl_ver | | |
| {{Artwork}} | description | description of the artwork | fileinfotpl_desc | hProduct.description | |
| {{Artwork}} | date | date the artwork was created | fileinfotpl_date | hCalendar vevent.dtstart | Microformat added by the {{Date}} template. |
| {{Artwork}} | source | source of the file | fileinfotpl_src | | |
| {{Artwork}} | artist | creator of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
| {{Artwork}} | author | author of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
| {{Artwork}} | permission | permission for the file and the artwork | fileinfotpl_perm | | |
| {{Artwork}} | other versions | other versions of the file | fileinfotpl_ver | | |
| {{Artwork}} | title | title of the artwork | fileinfotpl_art_title | hProduct.fn | |
| {{Artwork}} | object type | type of the artwork | fileinfotpl_art_object_type | | |
| {{Artwork}} | medium | technique or medium of the artwork | fileinfotpl_art_medium | | |
| {{Artwork}} | dimensions | size of the artwork | fileinfotpl_art_dimensions | | |
| {{Artwork}} | gallery | institution holding the artwork | fileinfotpl_art_gallery | | |
| {{Artwork}} | location | location of the artwork within the institution | fileinfotpl_art_location | hProduct.locality | |
| {{Artwork}} | accession number | accession number of the artwork | fileinfotpl_art_id | hProduct.identifier | |
| {{Artwork}} | object history | history of the artwork | fileinfotpl_art_object_history | | |
| {{Artwork}} | exhibition history | exhibition history of the artwork | fileinfotpl_art_exhibition_history | | |
| {{Artwork}} | credit line | credit line of the artwork | fileinfotpl_art_credit_line | | |
| {{Artwork}} | inscriptions | inscriptions on the artwork | fileinfotpl_art_inscriptions | | |
| {{Artwork}} | notes | notes about the artwork | fileinfotpl_art_notes | | |
| {{Artwork}} | references | references about the artwork | fileinfotpl_art_references | | |
| {{Book}} | Author | author of the book | fileinfotpl_author | | |
| {{Book}} | Editor | editor of the book | fileinfotpl_book_editor | | |
| {{Book}} | Translator | translator of the book | fileinfotpl_book_translator | | |
| {{Book}} | Illustrator | illustrator of the book | fileinfotpl_book_illustrator | | |
| {{Book}} | Title | title of the book | fileinfotpl_book_title | | |
| {{Book}} | Subtitle | subtitle of the book | fileinfotpl_book_subtitle | | |
| {{Book}} | Series title | title of the book series to which the book belongs | fileinfotpl_book_series-title | | |
| {{Book}} | Authority file | authority control data | fileinfotpl_book_authority | | |
| {{Book}} | Publisher | publisher of the book | fileinfotpl_book_publisher | | |
| {{Book}} | Printer | printer of the book | fileinfotpl_book_printer | | |
| {{Book}} | Year of publication | date or year the book was published | fileinfotpl_date | | |
| {{Book}} | Place of publication | place where the book was published | fileinfotpl_book_place-of-publication | | |
| {{Book}} | Language | language of the book | fileinfotpl_book_language | | |
| {{Book}} | Description | description of the book | fileinfotpl_desc | | |
| {{Creator}} | Name | name of the creator | creator | vCard.fn | |
| {{Creator}} | Alternative names | other name(s) of the creator | fileinfotpl_creator_alt-name_value | vCard.nickname | |
| {{Creator}} | Description | nationality(ies) and occupation(s) of the creator | fileinfotpl_creator_desc_value | vCard.note | |
| {{Creator}} | Date of death | date of death of the creator | fileinfotpl_creator_deathdate_value | | |
| {{Creator}} | Date of birth | date of birth of the creator | fileinfotpl_creator_birthdate_value | vCard.bday | |
| {{Creator}} | Location of birth/death | place of death of the creator | fileinfotpl_creator_deathloc_value | | |
| {{Creator}} | Location of birth | place of birth of the creator | fileinfotpl_creator_birthloc_value | | |
| {{Creator}} | Work period | work period of the creator | fileinfotpl_creator_work-period_value | | |
| {{Creator}} | Work location | work location of the creator | fileinfotpl_creator_work-location_valuev | | |
| {{Creator}} | Image | portrait or photograph depicting the creator | fileinfotpl_creator_image | | |
| {{Creator}} | Authority file | authority control related to the creator | fileinfotpl_creator_authority_value | | |
| {{FileContentsByBot}} | (various) | depends; see {{FileContentsByBot}} | (various) | hproduct-by-bot | Large data set, still growing; see {{FileContentsByBot}}. |
| {{Photograph}} | title | title of the photograph | fileinfotpl_art_title | hProduct.fn | |
| {{Photograph}} | description | description of the photograph | fileinfotpl_desc | hProduct.description | |
| {{Photograph}} | original description | original archival description of the photograph | fileinfotpl_desc | hProduct.description | |
| {{Photograph}} | date | date of creation of the original artwork | fileinfotpl_date | hCalendar vevent.dtstart | Microformat added by the {{Date}} template. |
| {{Photograph}} | medium | technique or medium of the photograph | fileinfotpl_art_medium | | |
| {{Photograph}} | dimensions | height and width of the photograph | fileinfotpl_art_dimensions | | |
| {{Photograph}} | artist | creator of the photograph | fileinfotpl_aut | "hProduct.fn value" | |
| {{Photograph}} | institution | institution holding the artwork | fileinfotpl_art_gallery | | |
| {{Photograph}} | location | location of the photograph within the institution | fileinfotpl_art_location | hProduct.locality | |
| {{Photograph}} | source | source of the file | fileinfotpl_src | | |
| {{Photograph}} | permission | permission/license for the file and artwork | fileinfotpl_perm | | |
| {{Photograph}} | other versions | other versions of the file | fileinfotpl_ver | | |
| {{Photograph}} | accession number | accession number of the photograph | | hProduct.identifier | |
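
The traditional id-based layout (the id sits on the label cell, the value is in the next <td>) can be read with a small parser. The following is a minimal sketch using only the Python standard library; the class name InfoFieldParser, the helper parse_info_fields and the sample fragment are made up for illustration, and real File: pages are more complex (for instance, {{Creator}} cells tag the value cell directly, which this sketch does not handle).

```python
from html.parser import HTMLParser


class InfoFieldParser(HTMLParser):
    """Map each fileinfotpl_* id on a label cell to the text of the next <td>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._pending_id = None  # id seen on a label cell; value cell not yet opened
        self._current_id = None  # id whose value cell is currently open
        self._depth = 0          # <td> nesting depth inside the value cell
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self._current_id is not None:
            self._depth += 1  # a nested table cell inside the value
            return
        cell_id = dict(attrs).get("id") or ""
        if cell_id.startswith("fileinfotpl_"):
            self._pending_id = cell_id
        elif self._pending_id is not None:
            self._current_id, self._pending_id = self._pending_id, None
            self._depth, self._chunks = 1, []

    def handle_endtag(self, tag):
        if tag != "td" or self._current_id is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace; formatting tags are dropped.
            text = " ".join("".join(self._chunks).split())
            self.fields[self._current_id] = text
            self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self._chunks.append(data)


def parse_info_fields(html):
    """Hypothetical helper: return {id: value text} for one HTML fragment."""
    parser = InfoFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up fragment shaped like a two-column infobox.
SAMPLE = """
<table>
<tr><td id="fileinfotpl_desc">Description</td><td>A <b>red</b> barn</td></tr>
<tr><td id="fileinfotpl_aut">Author</td><td>Jane Doe</td></tr>
</table>
"""

print(parse_info_fields(SAMPLE))
```

Because an id can occur only once per page, this approach returns at most one value per field, which is one reason the class-based format is preferred for new markup.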

Alternative format for CommonsMetadata

Because the table- and id-based format proved very hard to add to templates that are not laid out like the Commons information template, CommonsMetadata accepts an alternative format, similar to that of license templates: the whole information template must be enclosed in an element with the class fileinfotpl, and the element containing a specific piece of information must carry a fileinfotpl_* class (same names as above, but as a class rather than an id).
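
As an illustration of the class-based format, the sketch below collects the text of every element carrying a fileinfotpl_* class, but only while inside a fileinfotpl wrapper. It assumes well-nested HTML; the names ClassFieldParser and parse_class_fields and the sample markup are hypothetical, and this is not how CommonsMetadata itself is implemented.

```python
from html.parser import HTMLParser

WRAPPER = "fileinfotpl"
# Elements without a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassFieldParser(HTMLParser):
    """Collect text of fileinfotpl_* elements found inside a fileinfotpl wrapper."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # field class -> list of values (classes may repeat)
        self._stack = []   # one (marker, text chunks) pair per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if WRAPPER in classes:
            marker = WRAPPER
        elif any(m == WRAPPER for m, _ in self._stack):
            marker = next((c for c in classes if c.startswith("fileinfotpl_")), None)
        self._stack.append((marker, []))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        marker, chunks = self._stack.pop()
        if marker is not None and marker != WRAPPER:
            text = " ".join(" ".join(chunks).split())
            self.fields.setdefault(marker, []).append(text)

    def handle_data(self, data):
        # Text belongs to every field element that is currently open.
        for marker, chunks in self._stack:
            if marker is not None and marker != WRAPPER:
                chunks.append(data)


def parse_class_fields(html):
    """Hypothetical helper: return {field class: [values]} for one fragment."""
    parser = ClassFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up fragment; the last span sits outside the wrapper and is ignored.
SAMPLE = """
<div class="fileinfotpl">
  <p>Description: <span class="fileinfotpl_desc">A red barn<br>in winter</span></p>
  <p>Author: <span class="fileinfotpl_aut">Jane Doe</span></p>
</div>
<span class="fileinfotpl_desc">outside the wrapper</span>
"""

print(parse_class_fields(SAMPLE))
```

Values are kept in lists because, unlike ids, a class can legitimately occur several times on one page.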

Machine-readable data set by license templates

Introduced in October 2010, this data is carried by classes of the form <span class="licensetpl_XXX">:

licensetpl
An element identifying a license. Wraps the entire license code and should be a SINGLE license, not a multi license.
licensetpl_short
Short name of the license: “Public domain”, “CC BY-SA 3.0”, “CC by 2.0 fr”, etc.
licensetpl_long
Long name of the license: “Public domain”, “Creative Commons Attribution-Share Alike 3.0”, etc.
licensetpl_attr_req
Whether attribution is required. “true” or “false”.
licensetpl_attr
The requested attribution, as free text.
licensetpl_link_req
Whether a link to the license is required for this license. “true” or “false”.
licensetpl_link
The link to the license deed. “www.creativecommons.org/licenses/by-sa/XXX/YYY”
licensetpl_nonfree
“true” if this is a non-free license (not used on Commons, only on wikis with an EDP).

Multiple licensetpl blocks for the same work might be wrapped in a block using the class licensetpl_wrapper.
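
The classes above can be gathered into one record per license. The sketch below is a minimal illustration, assuming that every licensetpl_* element appears after the opening tag of the licensetpl block it belongs to; the names LicenseParser and parse_licenses and the sample markup are made up, and the "true"/"false" fields are converted to booleans.

```python
from html.parser import HTMLParser

BOOLEAN_FIELDS = {"attr_req", "link_req", "nonfree"}


class LicenseParser(HTMLParser):
    """Collect one dict of licensetpl_* values per licensetpl block."""

    def __init__(self):
        super().__init__()
        self.licenses = []
        self._field = None      # field currently being read, e.g. "short"
        self._field_tag = None  # tag name of the element carrying that field
        self._depth = 0         # nesting depth of same-named tags inside it
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._field_tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "licensetpl" in classes:
            self.licenses.append({})  # a new license block starts here
            return
        if not self.licenses:
            return  # field outside any licensetpl block: ignored
        for name in classes:
            if name.startswith("licensetpl_") and name != "licensetpl_wrapper":
                self._field = name[len("licensetpl_"):]
                self._field_tag, self._depth, self._chunks = tag, 1, []
                break

    def handle_endtag(self, tag):
        if self._field is None or tag != self._field_tag:
            return
        self._depth -= 1
        if self._depth == 0:
            value = " ".join("".join(self._chunks).split())
            if self._field in BOOLEAN_FIELDS:
                value = value == "true"
            self.licenses[-1][self._field] = value
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


def parse_licenses(html):
    """Hypothetical helper: return a list with one dict per license block."""
    parser = LicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


# Made-up markup for a dual-licensed file.
SAMPLE = """
<div class="licensetpl_wrapper">
<table class="licensetpl"><tr><td>
<span class="licensetpl_short">CC BY-SA 3.0</span>
<span class="licensetpl_long">Creative Commons Attribution-Share Alike 3.0</span>
<span class="licensetpl_attr_req">true</span>
<span class="licensetpl_link_req">true</span>
</td></tr></table>
<table class="licensetpl"><tr><td>
<span class="licensetpl_short">GFDL</span>
</td></tr></table>
</div>
"""

print(parse_licenses(SAMPLE))
```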

Templates setting this information

  • Templates setting licensetpl include:

{{PD-Layout}}, {{Cc-by-sa-3.0-migrated}}, {{Cc-by-layout}}, {{Cc-by-sa-layout}}, {{Cc-zero}}, {{FAL}}, {{GFDL}}, {{GFDL-1.2}}, {{GPL}} and {{LGPL}}.

Machine readable data set by style formatting templates

Style formatting templates, meant to provide uniform styles to different families of non-license templates, carry machine readable data identifying these families.

| Template | Purpose | class name |
| {{Restriction-Layout}} | used by Restriction tags | restrictiontemplate |
| {{FoP-Layout}} | used by freedom of panorama tags | foptemplate |
| {{Partnership-Layout}} | used by Partnership templates | partnershiptemplate |
| {{Source-Layout}} | used by generic Source templates | sourcetemplate |
| {{Created with}} | used by Created with ... templates | createdwithtemplate |

Templates regarding non-copyright legal restrictions carry these classes to identify specific types of restrictions.

| Template(s) | Purpose | class name |
| {{Trademarked}} | Trademarked images | restriction-trademarked |
| {{Copydesign}} | Copyrighted designs | restriction-design |
| {{Communist symbol}} | Communist symbols | restriction-communist |
| {{Italy-MiBAC-disclaimer}} {{Soprintendenza}} | Italian cultural property | restriction-ita-mibac |
| {{Australian Commonwealth reserve}} | Australian reserves | restriction-aus-reserve |
| {{Personality rights}} {{Romania personality rights}} | Personality rights | restriction-personality |
| {{2257}} | Child Protection and Obscenity Enforcement Act warning (United States) | restriction-2257 |
| {{Costume}} | Costumes | restriction-costume |
| {{Fan art}} | Fan art | restriction-fan-art |
| {{Currency}} | Currency | restriction-currency |
| {{IHL Symbol}} | Symbols restricted by International Humanitarian Law | restriction-ihl |
| {{Nazi symbol}} | Nazi or fascist symbols | restriction-nazi |
| {{Insignia}} | Official insignia | restriction-insignia |
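
A tool that only needs to know which non-copyright restrictions apply to a file can simply look for these class names. The following is a minimal sketch: the lookup table copies the class names listed above, while the names RestrictionFinder and find_restrictions and the sample markup (including the way classes are combined on one element) are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Class names and purposes as listed in the table above.
RESTRICTIONS = {
    "restriction-trademarked": "Trademarked images",
    "restriction-design": "Copyrighted designs",
    "restriction-communist": "Communist symbols",
    "restriction-ita-mibac": "Italian cultural property",
    "restriction-aus-reserve": "Australian reserves",
    "restriction-personality": "Personality rights",
    "restriction-2257": "Child Protection and Obscenity Enforcement Act warning",
    "restriction-costume": "Costumes",
    "restriction-fan-art": "Fan art",
    "restriction-currency": "Currency",
    "restriction-ihl": "Symbols restricted by International Humanitarian Law",
    "restriction-nazi": "Nazi or fascist symbols",
    "restriction-insignia": "Official insignia",
}


class RestrictionFinder(HTMLParser):
    """Record every known restriction-* class seen on any element."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name in (dict(attrs).get("class") or "").split():
            if name in RESTRICTIONS:
                self.found.add(name)


def find_restrictions(html):
    """Hypothetical helper: return sorted (class name, purpose) pairs."""
    finder = RestrictionFinder()
    finder.feed(html)
    finder.close()
    return [(name, RESTRICTIONS[name]) for name in sorted(finder.found)]


# Made-up markup: two restriction tags on one file page.
SAMPLE = """
<table class="restrictiontemplate restriction-trademarked"><tr><td>...</td></tr></table>
<table class="restrictiontemplate restriction-personality"><tr><td>...</td></tr></table>
"""

for class_name, purpose in find_restrictions(SAMPLE):
    print(class_name, "->", purpose)
```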

Machine-readable data set by specific templates

Other templates set further machine-readable data. Here is a non-exhaustive list:

{{Personality rights}}
<span class="commons-template-name" style="display:none" id="commons-template-personality-rights">Personality rights</span>
{{Credit line}}
<td id="fileinfotpl_credit" class="fileinfo-paramfield fileinfotpl_credit" style=""></td>

Machine-readable data set by location templates

{{Location}} and similar templates add machine-readable geocodes in the following format: <span class="geo">12.34;24.68</span> (latitude and longitude as floating-point numbers, separated by a semicolon). The coordinates use the WGS84 system (the same as GPS and most online maps). See Commons:Geocoding for more details.
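
Reading such a geocode amounts to splitting on the semicolon and converting both parts to floats. A minimal sketch follows; it uses a regular expression that assumes the span carries exactly class="geo" and nothing else, and the function name parse_geo is made up for illustration. A more robust tool would use an HTML parser instead.

```python
import re

# Matches <span class="geo">LAT;LON</span> with plain decimal numbers.
GEO_RE = re.compile(
    r'<span class="geo">\s*(-?\d+(?:\.\d+)?)\s*;\s*(-?\d+(?:\.\d+)?)\s*</span>'
)


def parse_geo(html):
    """Return a list of (latitude, longitude) tuples found in the HTML."""
    coords = []
    for lat_text, lon_text in GEO_RE.findall(html):
        lat, lon = float(lat_text), float(lon_text)
        # Discard values outside the valid WGS84 ranges.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            coords.append((lat, lon))
    return coords


print(parse_geo('<p><span class="geo">12.34;24.68</span></p>'))
```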

Usage

MediaWiki API

The MediaWiki API now serves a limited set of metadata through the iiprop=extmetadata module mentioned above. A query of this kind returns some useful fields such as Credit, Artist, LicenseUrl and Copyrighted, and is used by Media Viewer, for example.
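
As a rough illustration, the sketch below builds such a query URL and picks the fields named above out of a decoded JSON response. It makes no network request. The file title File:Example.jpg, the helper names and the sample response are assumptions for illustration only; the sample mimics the general shape of an imageinfo response (pages keyed by page id, each extmetadata entry holding a "value") rather than quoting real output.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_url(file_title):
    """Build an imageinfo query asking for the extmetadata property."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": file_title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_fields(response, wanted=("Credit", "Artist", "LicenseUrl", "Copyrighted")):
    """Return {page title: {field: value}} from a decoded JSON response."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            result[page.get("title")] = {
                key: meta[key]["value"] for key in wanted if key in meta
            }
    return result


# Made-up response with the same general shape as an API reply.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "Artist": {"value": "Jane Doe", "source": "commons-desc-page"},
                            "Copyrighted": {"value": "True", "source": "commons-desc-page"},
                        }
                    }
                ],
            }
        }
    }
}

print(build_extmetadata_url("File:Example.jpg"))
print(extract_fields(SAMPLE_RESPONSE))
```

Fields that the API does not return for a given file are simply left out of the result, since the free-text origin of the data means any of them may be missing.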

Scripts using machine-readable data

External tools

See also

Defining new machine-readable data

  • Do NOT use HTML ids; use classes. An id can only be used once per page, and most of these fields can occur multiple times per page. Consider, for instance, descriptions of derivative works, which can include information about both the original and the derivative.
  • When possible, wrap the actual data, not a field header. Wrapping the field header is the method historically used for all our Information templates, but it is much harder to support in the long run.
  • Wrap data, not the way the data is formatted.
  • Expect that formatting is lost when converting to data. Visual dress-up is not part of the information.
  • Don't wrap multiple units of information inside one field. There is a difference between a publication date and a creation date: both are dates, but they are different 'data fields'. Likewise, CC BY-SA-4.0-3.0-2.5 is not a license name; it stands for three licenses, each with a name of the form CC BY-SA-##.
  • Make sure that the data value has one unit, or outputs one consistent unit.

Problems

The following are not yet machine-readable:

  • Derivative works
  • Works included in works. See also Category:FoP_templates
  • Licenses of derivative works or of works included in other works, which are a mess
  • Author vs. copyright holder
  • Usernames vs. 'real names'
  • Catalogue IDs etc.
  • VRTS permissions
  • Publication date vs. creation date
Category:Commons help/eo