Module:category tree/topic/Places
- The following documentation is generated by Template:topic cat data submodule documentation.
Introduction
This is the documentation page for the main data module for the Module:category tree/topic category tree subsystem, as well as for its submodules. Collectively, these modules handle generating the descriptions and categorization for topic pages such as Category:en:Birds, Category:es:France and Category:zh:State capitals of Germany, and the corresponding non-language-specific pages such as Category:Birds, Category:France and Category:State capitals of Germany. (All other categories handled through the {{auto cat}} system are handled by the Module:category tree/poscatboiler subsystem.)
The main data module at Module:category tree/topic does not contain data itself, but rather imports the data from its submodules, and applies some post-processing.
- To find which submodule implements a specific category, use the search box on the right.
- To add a new data submodule, copy an existing submodule and modify its contents. Then, add its name to the subpages list at the top of Module:category tree/topic.
Concepts
Per-language and umbrella categories
The topic cat system internally makes a distinction based on which languages a category applies to:
- Per-language categories. These are of the form langcode:label (e.g. Category:es:Birds and Category:de:States of the United States). Here, langcode is the language code of a recognized full Wiktionary language (see WT:LOL for the list of all such languages and their codes), and label is a topic, generally one that can apply to multiple languages. The intended contents of the category are terms in the language in question that are either related to, instances of or types of the topic in question (depending on the type of category; see below). Associated with each per-language category is an umbrella category; see below. The following restrictions apply to per-language categories:
  - The language mentioned by langcode must currently be a full language, not an etymology-only language. (Etymology-only languages include lects such as Provençal, considered a variety of Occitan, and Biblical Hebrew, considered a variety of Hebrew. See here for the list of such lects.)
  - The category label specified by label as found in the category name always begins with a capital letter, whether or not the underlying form of the label is capitalized (contrast Category:en:Birds with Category:en:France). Internally, this is different, and the internal form of a label begins with a lowercase or uppercase letter as appropriate (birds but France).
- Umbrella categories. These are of the form label, i.e. a bare category label. As with per-language categories, this label is always capitalized in the category name, regardless of the underlying form of the label. Examples are Category:Birds, Category:France and Category:State capitals of Germany. Umbrella categories serve to group all the per-language categories for a particular topic. They also serve to group more specific subcategories, e.g. under Category:Birds can be found Category:Birds of prey, Category:Freshwater birds, Category:Columbids (which includes doves and pigeons), etc. as well as Category:Eggs and Category:Feathers. Umbrella categories should not normally directly contain any terms.
- Unlike for the poscatboiler system, language-specific categories do NOT currently exist. These would be topics that only make sense for a given language or small set of languages, and which are allowed for that language or those languages. Currently, all topics are cross-language even if in practice they don't make sense except in conjunction with a subset of languages; but this may change in the future.
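The naming rules above can be sketched in a few lines of Lua. The helper names `ucfirst` and `category_name` are hypothetical and are not part of the module; the sketch handles only ASCII first letters, whereas the real code presumably relies on language-aware capitalization.

```lua
-- Hypothetical helper: capitalize the first (ASCII) letter of an internal label.
local function ucfirst(label)
	return (label:gsub("^%l", string.upper))
end

-- Hypothetical helper: build a category name from an internal label.
-- Pass a language code for a per-language category, or nil for the umbrella category.
local function category_name(label, langcode)
	local name = ucfirst(label)
	if langcode then
		return "Category:" .. langcode .. ":" .. name
	end
	return "Category:" .. name
end

print(category_name("birds", "en"))  -- Category:en:Birds
print(category_name("France", "es")) -- Category:es:France
print(category_name("birds"))        -- Category:Birds
```

The internal label keeps its natural casing (birds, France); only the category name forces an initial capital.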
Category types
In addition to the above distinction, the topic cat system divides categories according to the category type, which specifies the relationship between the category and the members of that category:
- Related-to categories (type = "related-to") contain terms that are semantically related to the category topic. For example, Category:en:Chess contains terms such as checkmate, rank (a row on a chessboard), endgame, en passant, Grandmaster, etc. "Related to" is a nebulous criterion, and as a result the terms in the category should be related to the category as directly as possible, to avoid the category becoming a grab bag of random terms.
- Name (type = "name") categories contain terms that are names of individual, specific instances of the category. For example, Category:Chess openings contains names of specific openings, such as Ruy Lopez and Sicilian Defense. Even more clearly, Category:Moons of Jupiter contains names of individual moons that orbit the planet Jupiter.
- Type (type = "type") categories contain terms for types of the entity described by the category name. For example, Category:Checkmate patterns contains types of checkmates, such as ladder mate and smothered mate. Even more clearly, Category:Hobbyists contains terms for types of hobbyists, such as oenophile (a wine enthusiast), numismatist (a coin collector), etc. (If this were a name category, it would contain names of specific, presumably famous, hobbyists, something that would probably not be dictionary-worthy material.)
- Set (type = "set") categories are used when the distinction between names and types of a given topic may not always be clear, but the overall membership is still well-defined. For example, Category:Heraldic charges contains terms for components of coats of arms, e.g. bend sinister (a diagonal band from lower left to upper right), fleur-de-lis (a stylized image of a lily, as is commonly associated with New Orleans) and quatrefoil (a symmetrical shape made from the outline of four circles).
- Grouping (type = "grouping") categories are higher-level categories that are used only to group more specific categories and should not contain elements themselves (but nevertheless sometimes do). An example is Category:Industries, which contains subcategories devoted to particular industries (e.g. Category:Banking, Category:Mining, Category:Music industry, Category:Oil industry, etc.).
- Top-level (type = "toplevel") categories are special high-level categories that list all the categories of one of the above types, and which are always named List of type categories, e.g. Category:List of related-to categories (listing all the "related-to" umbrella categories) or Category:es:List of name categories (listing all the Spanish name-type categories). The number of top-level categories is fixed.
Note that name, type and set categories are conceptually similar to each other, in that each contains terms that have an is-a relationship with the topic in question, whereas related-to categories express a weaker sort of relation between term and topic, merely asserting that the term is in some way "related" or "pertinent" to the topic in question. For this reason, when creating new topics, you should always strive to create name, type or set topics whenever possible, and avoid related-to topics unless there is no alternative and you're convinced this topic is really necessary. Before creating such a category:
- Consider whether there is another category already in existence that will cover this semantic space.
- Consider whether you can convert the category to a name, type or set category.
- Investigate whether there needs to be a category for the semantic concept at all (in particular, abstract concepts often do not merit related-to categories).
- Make sure there are enough terms to fill up this category in at least two languages (one of which should be English). What qualifies as "enough" varies a bit from topic to topic but generally should be at least 10.
- Make sure the terms you add or consider adding to this category are directly related to the topic at hand. Do not add terms merely because the term contains the name of the topic in it (e.g. if you create a category named brick, do not add terms like brick house, thick as a brick or yellow brick road merely because they have the word "brick" in them; instead, use the ===Related terms=== section of the brick lemma to include these terms).
It should also be noted that name, type and set categories typically use the plural in their topic name, while related-to categories often use the singular. This is not a hard and fast rule, however, and there are exceptions in both directions. If it's not obvious what type of category a given topic refers to, consider making this explicit in the topic name, e.g. names of stars or types of stars rather than just stars. (In the future, all, or at least most, topic categories may be named in such a fashion.)
Adding, removing or modifying categories
A sample entry is as follows (in this case, found in Module:category tree/topic/History):
labels["ancient history"] = {
	type = "related-to",
	description = "default",
	parents = {"history"},
}
This generates the description and categorization for all per-language categories of the form langcode:Ancient history (e.g. Category:en:Ancient history) as well as for the umbrella category Category:Ancient history (see above for the definition of per-language and umbrella categories).
The meaning of this snippet is as follows:
- The label itself needs to use proper capitalization or lower case in the first letter of the label, even though the label as it appears in the category name is always capitalized, consistent with the principle that category names begin with a capital letter. In this case, the label is lowercase, and other labels that reference it need to use the same casing (as in the example below). By contrast, a label like Ancient Near East (as in the example below) is capitalized because the label refers to a specific region, and toponyms are capitalized in English.
- The type field specifies the category type, as described above. This label is a "related-to" category.
- The description field gives the description text that will appear when a user visits the category page. Certain special values are recognized, including "default", which generates a default description. The value of the default description depends on the label's name, the language of the category, and the label's type. In this case, it is equivalent to "{{{langname}}} terms related to [[ancient]] [[history]]" (where {{{langname}}} is replaced with the name of the language in question) and "terms related to [[ancient]] [[history]]" for the umbrella category. See #Descriptions below for more information on specifying descriptions.
- The parents field gives the labels of the parent categories. Here, the category specifies a single parent "history". This means that a category such as Category:en:Ancient history will have Category:en:History as its parent. An additional top-level list parent will automatically be added (in this case Category:en:List of related-to categories) as well as the umbrella parent Category:Ancient history.
Another example follows:
labels["places in Romance of the Three Kingdoms"] = {
	type = "name",
	displaytitle = "places in ''Romance of the Three Kingdoms''",
	description = "=places in ''{{w|Romance of the Three Kingdoms}}''",
	parents = {"Romance of the Three Kingdoms", "China"},
}
This is a subcategory of "Romance of the Three Kingdoms" (a 14th-century Chinese historical novel) and accordingly specifies "Romance of the Three Kingdoms" as the parent, along with "China" (note the capitalization, in accordance with the principles laid out above). A description is given explicitly, preceded by = (which in this case prepends "names of specific" to the description). The displaytitle field is also set so that the name of the work is italicized.
Category label fields
The following fields are recognized for the object describing a label:
type
- The type of the label ("related-to", "name", "type", "set", "grouping" or "toplevel", as described above). Mandatory. It is possible to specify multiple comma-separated types, for "mixed" categories that can contain more than one type of term. For example, the label flags currently has type = "related-to,name,type" because it contains a mixture of terms related to flags (e.g. flagpole and grommet), terms for individual flags (e.g. Star-Spangled Banner) and terms for types of flags (e.g. prayer flag, flag of convenience). Mixed categories are strongly dispreferred and should be split into separate per-type categories.
description
- A plain English description for the label. This should generally be no longer than one sentence. Place additional, longer explanatory text in the additional field described below, and put {{wikipedia}} boxes in the topright field described below so that they are correctly right-aligned with the description. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see #Template substitutions in field values below. Certain values are handled specially, including "default" (and variants such as "default with the", "default wikify" and "default no singularize") and phrases preceded by an = sign, as explained in more detail below.
parents
- A table listing one or more parent labels of this label. This controls the parent categories that the category is contained within, as well as the chain of breadcrumbs appearing across the top of the page (see below).
- An item in the table can be either a single string (the parent label), or a table containing (at least) the two elements name and sort. In the latter case, name specifies the parent label name, while the sort value specifies the sort key to use to sort it in that category. The default sort key is the category's label.
- If a parent label begins with Category: it is interpreted as a raw category name, rather than as a label name. It can still have its own sort key as usual.
- The first listed parent controls the category's parent breadcrumb in the chain of breadcrumbs at the top of the page. (The breadcrumb of the category itself is determined by the breadcrumb setting, as described below.)
breadcrumb
- The text of the last breadcrumb that appears at the top of the category page.
- By default, it is the same as the category label, with the first letter capitalized.
- The value can be either a string, or a table containing two elements called name and nocap. In the latter case, name specifies the breadcrumb text, while nocap can be used to disable the automatic capitalization of the breadcrumb text that normally happens.
- Note that the breadcrumbs collectively are the chain of links that serve as a navigation aid for the hierarchical organization of categories. For example, a category like Category:en:Ancient Near East will have a breadcrumb chain similar to "Fundamental » All languages » English » All topics » History » Ancient history » Ancient Near East", where each breadcrumb is a link to a category at the appropriate level. The last breadcrumb here is "Ancient Near East", and its text is controlled by this field.
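As an illustration of the parents and breadcrumb forms described above, the following sketch shows one entry that uses each variant. The label, the sort key, the raw category name and the value `nocap = true` are all assumptions made for the example; they are not taken from an actual data submodule.

```lua
local labels = {}

-- Hypothetical entry, for illustration only.
labels["medieval history"] = {
	type = "related-to",
	description = "default",
	parents = {
		-- First parent: a plain label; it also supplies the parent breadcrumb.
		"history",
		-- A parent label with an explicit sort key (assumed label and key).
		{name = "Middle Ages", sort = "history"},
		-- A raw category name rather than a label (hypothetical category).
		"Category:Example raw category",
	},
	-- Table form of the breadcrumb; nocap is assumed to take a boolean.
	breadcrumb = {name = "medieval", nocap = true},
}
```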
displaytitle
- Apply special formatting such as italics to the category page title, as with the {{DISPLAYTITLE:...}} magic word (see mw:Help:Magic words). The same formatting is also applied to breadcrumbs, descriptions and other mentions of the label in formatted text. The value of this is either a string (which should be the formatted label, e.g. "The Matrix", "people in Romance of the Three Kingdoms" or "Glee (TV series)") or a Lua function to generate the formatted category title. The Lua function is passed two parameters: the raw label (without any preceding language code) and the language object of the category's language (or nil for umbrella categories). It should return the appropriately formatted label. If the value of this field is a string, template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see below. See Module:category tree/topic/Culture for examples of using displaytitle.
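A function-valued displaytitle might look like the following minimal sketch. The function name `italicize_work` is hypothetical, and the sketch assumes only what is stated above: the function receives the raw label and a language object (or nil) and returns the formatted label.

```lua
-- Title of the work to italicize inside the label.
local work = "Romance of the Three Kingdoms"

-- Hypothetical displaytitle function. `label` is the raw label (no language code);
-- `lang` is the language object, or nil for umbrella categories (unused here).
local function italicize_work(label, lang)
	local start_pos, end_pos = label:find(work, 1, true) -- plain-text search
	if not start_pos then
		return label
	end
	return label:sub(1, start_pos - 1) .. "''" .. work .. "''" .. label:sub(end_pos + 1)
end

print(italicize_work("places in Romance of the Three Kingdoms", nil))
-- places in ''Romance of the Three Kingdoms''
```

Such a function would be assigned directly in a label entry, e.g. `displaytitle = italicize_work`.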
topright
- Introductory text to display right-aligned, before the edit and recent-entries boxes on the right side. This field should be used for {{wikipedia}} and other similar boxes. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. Compare the preceding field, which is similar to topright but used for left-aligned text placed above the description.
preceding
- Introductory text to display directly before the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the preceding text will not. For this reason, use preceding instead of description for {{also}} hatnotes and similar text, and keep description relatively short. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. Compare the topright field, which is similar to preceding but is right-aligned, placed above the edit and recent-entries boxes.
additional
- Additional text to display directly after the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the additional text will not. For this reason, use additional instead of description for long explanatory notes, See also references and the like, and keep description relatively short. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below.
wp
- Display a box linking to a Wikipedia entry in the upper right corner. The value can either be true to link to an entry that is the same as the label; a string, to link to that entry; or a list of strings or true, to generate multiple boxes, one per list item. For example, if the label pesäpallo has wp = true, a box will be generated that links to Pesäpallo on Wikipedia, and if the label football (American) has wp = "American football", a box will be generated that links to American football on Wikipedia.
wpcat
- Display a box linking to a Wikipedia category in the upper right corner. This is similar to wp except that the link is to a category (the generated entry or entries is/are prepended with Category:). For example, if the label animals has wpcat = true set, a box will be generated that links to Category:Animals on Wikipedia.
commonscat
- Display a box linking to a Wikimedia Commons category in the upper right corner. This is similar to wpcat except that the link is to Wikimedia Commons instead of Wikipedia. For example, if the label racquet sports has commonscat = true set, a box will be generated that links to Category:Racquet sports on Wikimedia Commons.
topic
- Text indicating the topic being handled by this category. This appears in the auto-generated "additional" message following the description, which indicates what type this category is (based on the type field) and what sorts of terms should go into it. This does not normally need to be specified, as it's derived directly from the label. But it is useful e.g. for the label types of planets, which sets topic = "planets", because the auto-generated "additional" message contains the text " ... It should contain terms for types of {{{topic}}}, ...", and using the label directly will result in redundant text. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. The value of this field can be "default" or "default with the", which will be expanded appropriately based on the label.
umbrella
- A table describing the umbrella category that collects all per-language categories associated with this label. The umbrella category is named using the label, without any language prefix. For example, for the label ancient history, the umbrella category is named Category:Ancient history, and is a parent category (in addition to any categories specified using parents) of Category:en:Ancient history, Category:fr:Ancient history and all other per-language categories for this label. This table contains the following fields:
description
- A plain English description for the umbrella category. By default, it is derived from the description field of the label itself by removing language references (specifically, {{{langname}}}, {{{langcode}}}:, {{{langcode}}} and {{{langcat}}}) and adding "This category concerns the topic:" before the result. Text is automatically added to the end indicating that this category is an umbrella category that only contains other categories, and does not contain pages describing terms.
breadcrumb
- The last breadcrumb in the chain of breadcrumbs at the top of the category page; see above. By default, this is the category label.
topright
- Like the topright field on regular category pages; see above.
preceding
- Like the preceding field on regular category pages; see above.
additional
- Like the additional field on regular category pages; see above.
topic
- Like the topic field on regular category pages; see above.
umbrella_description
- The same as the description subfield of the umbrella field.
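To show how several of the fields above fit together, here is a sketch of a single entry. Only the label types of planets and its topic = "planets" setting come from the text above; every other value (the parent label, the Wikipedia and Commons titles, the additional text and the umbrella description) is an assumption chosen for illustration.

```lua
local labels = {}

-- Illustrative entry; values other than `topic` are assumptions.
labels["types of planets"] = {
	type = "type",
	-- "=" form: expands to "{{{langname}}} terms for types of planets."
	description = "=planets",
	-- Avoids redundant "types of types of planets" in the auto-generated message.
	topic = "planets",
	parents = {"planets"},      -- assumed parent label
	wp = "Planet",              -- box linking to a Wikipedia article (assumed title)
	commonscat = "Planets",     -- box linking to a Commons category (assumed title)
	additional = "See also the categories for individual planets.",
	umbrella = {
		-- Hypothetical wording; by default this is derived from `description`.
		description = "Terms for types of planets, in any language.",
	},
}
```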
Template substitutions in field values
Template invocations can be inserted in the text of description, parents (both name and sort key), breadcrumb, toc_template and toc_template_full values, and will be expanded appropriately. In addition, the following special template-like invocations are recognized and replaced by the equivalent text:
{{PAGENAME}}
- The name of the current page. (Note that two braces are used here, rather than the three used by the other parameters described below.)
{{{langname}}}
- The name of the language that the category belongs to. Not recognized in umbrella fields.
{{{langcode}}}
- The code of the language that the category belongs to (e.g. en for English, de for German). Not recognized in umbrella fields.
{{{langcat}}}
- The name of the language's main category, which adds "language" to the regular name. Not recognized in umbrella fields.
{{{langlink}}}
- A link to the language's main category. Not recognized in umbrella fields.
{{{umbrella_msg}}}
- The message normally at the end of the description for umbrella categories, indicating that the category contains no terms but only subcategories.
{{{topic}}}
- The value of the topic field (or the umbrella.topic field for umbrella categories), if specified; else, the value of displaytitle (if specified) or the label, with "the" added if the description is "default with the" or a variant containing "with the" (such as "default with the wikify").
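The triple-brace references listed above can be pictured as a simple string substitution. The function `substitute` below is a hypothetical sketch, not the module's actual implementation; it does not expand real template invocations or {{PAGENAME}}, and it leaves unknown names untouched.

```lua
-- Hypothetical sketch: replace each {{{name}}} with params[name].
-- Names missing from `params` are left as-is, because a nil return
-- from the replacement function keeps the original match.
local function substitute(text, params)
	return (text:gsub("{{{([%w_]+)}}}", function(name)
		return params[name]
	end))
end

local desc = "{{{langname}}} terms related to [[ancient]] [[history]]"
print(substitute(desc, {langname = "English", langcode = "en"}))
-- English terms related to [[ancient]] [[history]]
```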
Descriptions
The description field is of one of three types:
- An English sentence, ending in a period.
- A phrase preceded by = and not ending in a period.
- The value "default" or one of its variants, such as "default with the" or "default wikify".
If preceded by =, the description is generated from the specified phrase by prepending {{{LANGNAME}}} (which is replaced with the language name) followed by standard type-dependent text, and appending a period. The text prepended is currently as follows:
Type | Text |
---|---|
related-to | terms related to |
set | terms for types or instances of |
name | names of specific |
type | terms for types of |
grouping | categories concerning more specific variants of |
toplevel | N/A |
For example, for the label biblical characters, the description is currently "=characters in the [[Bible]]", which expands to {{{LANGNAME}}} names of specific characters in the [[Bible]]., and in turn is expanded to e.g. French names of specific characters in the [[Bible]]. (if the category is Category:fr:Biblical characters).
Note that no standard text is provided for top-level categories, all of which include a custom description.
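The expansion of =-prefixed descriptions can be sketched as follows. The table `type_prefix` copies the text from the table above; the function `expand_description` is hypothetical, takes the language name directly instead of going through the {{{LANGNAME}}} placeholder, and does not handle "default" values or mixed types.

```lua
-- Type-dependent text, as listed in the table above.
local type_prefix = {
	["related-to"] = "terms related to",
	["set"] = "terms for types or instances of",
	["name"] = "names of specific",
	["type"] = "terms for types of",
	["grouping"] = "categories concerning more specific variants of",
}

-- Hypothetical sketch of expanding a description for a per-language category.
local function expand_description(desc, cat_type, langname)
	local phrase = desc:match("^=(.*)$")
	if not phrase then
		return desc -- a full sentence; used as-is
	end
	local prefix = type_prefix[cat_type]
	if not prefix then
		error("No standard text for category type: " .. tostring(cat_type))
	end
	return langname .. " " .. prefix .. " " .. phrase .. "."
end

print(expand_description("=characters in the [[Bible]]", "name", "French"))
-- French names of specific characters in the [[Bible]].
```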
If "default"
or one of its variants is used as the description, a default description is generated as if the description consisted of = prepended to the label, except that the word the might be added to the beginning of the label, and the words in the label might be wikilinked. Specifically:
- If the description is of the form "default with the" (or a form such as "default with the wikify", "default with the no singularize", etc.), the word the is prefixed to the label.
- If the description is of the form "default wikify" (or a related form), the label is linked to Wikipedia. If the label ends in an -s, the label is linked to a Wikipedia entry based on the singular form of the label (which converts -ies to -y; converts -xes, -ches or -shes, respectively, to -x, -ch or -sh; and otherwise just removes -s), unless the description is "default wikify no singularize" or a related form, in which case the label is linked unchanged.
- Otherwise, the code attempts to link the entire label or the individual words of the label to Wiktionary terms, as follows:
  - If the label ends in -s and no singularize is not specified in the description, and the singular form of the label (generated according to the algorithm described just above) is a Wiktionary term, the label is linked to that term. Note that "is a Wiktionary term" simply means that a page of this name exists; the code does not currently check to see whether there is an English entry or whether the term is a lemma.
  - Otherwise, if the label itself is a Wiktionary term, the label is linked to that term.
  - Otherwise, the label is split into individual words, and each word is checked to see if a page named according to that word exists. If so, the individual words are linked to their corresponding Wiktionary entries; otherwise, the label is left unlinked. Note that the last word is handled specially if it ends in -s and no singularize is not found in the description, in that the code first attempts to link the word to its singular equivalent, falling back to the word itself if the singular equivalent doesn't name a Wiktionary term.
For example, a label video games will be linked as [[video game]]s because the page video game exists, but Arabian deities will be linked as [[Arabian]] [[deity|deities]] because neither Arabian deity nor Arabian deities exists as a page. The use of no singularize is needed with labels such as linguistics, comics and humanities, because their respective singular forms linguistic, comic and humanity exist as Wiktionary pages.
Finally, note that the components of a default-type description (wikify, with the and no singularize) can be given in any order if more than one of them needs to be specified.
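The singularization rule described above (-ies to -y; -xes, -ches and -shes to -x, -ch and -sh; otherwise drop -s) can be written as a short function. The name `singularize` is an assumption for this sketch; the module's real helper may live elsewhere and cover more cases.

```lua
-- Sketch of the singularization rule described in the text.
local function singularize(label)
	if label:match("ies$") then
		return (label:gsub("ies$", "y"))
	end
	-- -xes, -ches, -shes lose their final "es".
	local stem = label:match("^(.*x)es$")
		or label:match("^(.*ch)es$")
		or label:match("^(.*sh)es$")
	if stem then
		return stem
	end
	if label:match("s$") then
		return (label:gsub("s$", ""))
	end
	return label -- no final -s; nothing to do
end

print(singularize("video games")) -- video game
print(singularize("deities"))     -- deity
print(singularize("linguistics")) -- linguistic
```

The last call shows why no singularize exists: the rule turns linguistics into linguistic, which is a valid page name but the wrong link target.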
Handlers
It is also possible to have handlers that can handle arbitrarily-formed labels, e.g. political divisions of country for any country (categories such as Category:tg:Emirates of the United Arab Emirates) or divisions of polity for any division and polity (e.g. Category:fr:Counties of South Korea or Category:pt:Municipalities of Tocantins, Brazil). Currently, handlers exist only in the toponym-handling code in Module:category tree/topic/Places and in Module:category tree/topic/Names. As an example, the following is the handler for script letter names:
table.insert(handlers, function(label)
	local script = label:match("^(.*) letter names$")
	if script then
		local sc = require("Module:scripts").getByCanonicalName(script)
		if sc then
			local script_page
			local appendix = ("Appendix: %s script"):format(script)
			local appendix_title = mw.title.new(appendix)
			if appendix_title and appendix_title.exists then
				script_page = appendix
			else
				script_page = "w:" .. sc:getWikipediaArticle()
			end
			local link = ("[[%s|%s script]]"):format(script_page, script)
			return {
				type = "name",
				description = ("{{{langname}}} terms that serve as names for letters and symbols directly based on letters, " ..
					"such as [[ligature]]s and letters with [[diacritic]]s, of the %s."):format(link),
				parents = {"letter names"},
			}
		end
	end
end)
The handler is passed a single argument (the label), checks if the passed-in label has a recognized form, and if so, returns an object that follows the same format as described above for directly-specified labels. In this case, the handler makes sure the given script name specifies an actual script, and constructs an appropriate link for the script, depending on whether an appendix page for the script exists (falling back to Wikipedia).
NOTE: The handler needs to be prepared to handle both umbrella categories and per-language categories. The label is passed in as it appears in the category; this means the handler may need to handle both uppercase-initial and lowercase-initial variants of the label. (For this handler, this isn't an issue because the script always appears uppercased.) One way to do that is to convert the label to lowercase-initial before further processing, using mw.getContentLanguage():lcfirst().
Note also that if a handler is specified, the module should return a table holding both the label and handler data; see the above modules.
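The lowercase-initial normalization mentioned above can be sketched with a small hypothetical handler. The label pattern "types of X" and the returned fields are invented for illustration, and the local `lcfirst` is an ASCII-only stand-in for mw.getContentLanguage():lcfirst() so that the sketch runs outside Scribunto.

```lua
local handlers = {}

-- ASCII-only stand-in for mw.getContentLanguage():lcfirst().
local function lcfirst(label)
	return (label:gsub("^%u", string.lower))
end

-- Hypothetical handler for labels of the form "types of X".
table.insert(handlers, function(label)
	label = lcfirst(label) -- accept both "Types of stars" and "types of stars"
	local topic = label:match("^types of (.+)$")
	if not topic then
		return nil -- not a label this handler recognizes
	end
	return {
		type = "type",
		description = "=" .. topic,
		topic = topic,
		parents = {topic},
	}
end)

local spec = handlers[1]("Types of stars")
print(spec.description) -- =stars
```

Both capitalizations of the label produce the same specification, and unrelated labels yield nil.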
Subpages
- Animals
- Body
- Buildings and structures
- Communication
- Culture
- Design
- Food and drink
- Games
- History
- Human
- Lifeforms
- Mathematics
- Miscellaneous
- Music
- Names
- Nature
- Numbers
- People
- Philosophy
- Physical actions
- Places
- Plants
- Religion
- Sciences
- Sex
- Society
- Sports
- Technology
- Thesaurus
- Time
- Transport
- data
- documentation
- hierarchy
- hierarchy/documentation
- thesaurus data
- thesaurus data/documentation
- utilities
local labels = {}
local handlers = {}
local m_table = require("Module:table")
local en_utilities_module = "Module:en-utilities"
local string_utilities_module = "Module:string utilities"
local m_locations = require("Module:place/locations")
local m_placetypes = require("Module:place/placetypes")
local placetype_data = m_placetypes.placetype_data
local internal_error = m_locations.internal_error
local dump = mw.dumpObject
local insert = table.insert
local concat = table.concat
local is_callable = require("Module:fun").is_callable
--[==[ intro:
This module is part of the category tree code and contains code to generate the descriptions of place-related categories
such as [[Category:de:Hokkaido Prefecture, Japan]], [[Category:es:Cities in France]],
[[Category:pt:Municipalities of Tocantins, Brazil]], etc. Note that this module doesn't actually create the
categories; that must be done separately, with the text "{{tl|auto cat}}" as the definition of the category. (This
process should automatically happen periodically for non-empty categories, because they will appear in
[[Special:WantedCategories]] and a bot will periodically examine that list and create any needed category.)
There are two ways that category descriptions are specified: (1) by manually adding an entry to the `labels` table,
keyed by the label (the category minus the language code) with a value consisting of a Lua table specifying the
description text and the category's parents; (2) through handlers (pieces of Lua code) added to the `handlers` list,
which recognize labels of a specific type (e.g. `Cities in France`) and generate the appropriate specification for that
label on-the-fly.
See [[Module:place]] for an introduction to the terminology associated with places along with a list of all the relevant
modules, as well as more specific information on types of toponyms and placetypes and how their categorization
works.
]==]
local function lcfirst(label)
return mw.getContentLanguage():lcfirst(label)
end
local function gsub_literally(str, from, to)
local m_strutils = require(string_utilities_module)
return (str:gsub(m_strutils.pattern_escape(from), m_strutils.replacement_escape(to)))
end
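--[==[
Why `gsub_literally` escapes both arguments: a minimal sketch in plain Lua, runnable outside Scribunto. The local
`pattern_escape` and `replacement_escape` below are hypothetical stand-ins written for this example; they are assumed
to behave like the functions of the same name in [[Module:string utilities]] but are not copied from that module.

```lua
-- Hypothetical stand-in: escape every Lua pattern magic character.
local function pattern_escape(s)
	return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Hypothetical stand-in: escape "%" so it is literal in a gsub replacement.
local function replacement_escape(s)
	return (s:gsub("%%", "%%%%"))
end

local function gsub_literally(str, from, to)
	return (str:gsub(pattern_escape(from), replacement_escape(to)))
end

-- "+++" is the placeholder used in `keydesc`; unescaped, "+" would be a quantifier,
-- and an unescaped "%" in the replacement would be read as a capture reference.
local result = gsub_literally("the region of +++", "+++", "100% Somewhere")
print(result) -- the region of 100% Somewhere
```

The outer parentheses around `str:gsub(...)` drop the second return value (the substitution count), so callers get
only the resulting string.
]==]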
local class_to_bare_category_parent = {
["polity"] = "polities",
["subpolity"] = "political divisions",
["settlement"] = "settlements",
["non-admin settlement"] = "settlements",
["capital"] = "capital cities",
["natural feature"] = "natural features",
["man-made structure"] = "man-made structures",
["geographic region"] = "geographic and cultural areas",
}
local class_is_political_division = {
["polity"] = true, -- strictly false but there are placetypes ambiguous between polity and subpolity
["subpolity"] = true,
["settlement"] = true,
["non-admin settlement"] = false,
["capital"] = true,
["natural feature"] = false,
["man-made structure"] = false,
["geographic region"] = false,
["generic place"] = false,
}
local capital_cat_to_placetype = {}
for placetype, capital_cat in pairs(m_placetypes.placetype_to_capital_cat) do
capital_cat_to_placetype[capital_cat] = placetype
end
-- Handler for bare categories for all types of capitals. This needs to precede the handler for bare placetype
-- categories as some of the types of capitals exist as placetypes as well.
insert(handlers, function(label)
label = lcfirst(label)
local capital_placetype = capital_cat_to_placetype[label]
if capital_placetype then
local pl_placetype = m_placetypes.pluralize_placetype(capital_placetype)
local linkdesc = m_placetypes.get_placetype_display_form(pl_placetype, "top-level")
if linkdesc == nil then
internal_error("Unrecognized placetype %s when processing label %s", capital_placetype, label)
end
if linkdesc == false then
mw.log(("Display form for pl_placetype %s is false, can't categorize"):format(dump(pl_placetype)))
return nil
end
return {
type = "name",
topic = label,
description = "{{{langname}}} names of [[capital]]s of " .. linkdesc .. ".",
parents = {"capital cities"},
}
end
end)
-- Handler for bare placetype categories. FIXME: Add wpcat= and commonscat= info. Previously we had it for various
-- so-called "generic" placetypes, but sometimes the categories were wrong.
insert(handlers, function(label)
for _, canon_label in ipairs { lcfirst(label), label } do
local ptdesc, ptdata = m_placetypes.get_placetype_display_form(canon_label, "top-level", "return full")
if ptdesc then
local from_category_props = {
from_category = true,
no_split_qualifiers = true,
}
local bare_category_parent = m_placetypes.get_equiv_placetype_prop(canon_label, function(pt)
local bare_category_parent = m_placetypes.get_placetype_prop(pt, "bare_category_parent")
if bare_category_parent then
return bare_category_parent
end
local class = m_placetypes.get_placetype_prop(pt, "class")
if class then
if class_to_bare_category_parent[class] == nil then
internal_error("Saw unknown category class %s derived from placetype %s",
class, canon_label)
end
return class_to_bare_category_parent[class]
end
end, from_category_props)
if not bare_category_parent then
internal_error("Saw placetype %s without a `class` or `bare_category_parent` setting, either " ..
"directly or through a fallback", canon_label)
end
local addl_bare_category_parents = m_placetypes.get_equiv_placetype_prop(canon_label, function(pt)
return m_placetypes.get_placetype_prop(pt, "addl_bare_category_parents")
end, from_category_props)
local bare_category_breadcrumb = m_placetypes.get_equiv_placetype_prop(canon_label, function(pt)
return m_placetypes.get_placetype_prop(pt, "bare_category_breadcrumb")
end, from_category_props)
if type(bare_category_parent) == "string" and bare_category_breadcrumb then
bare_category_parent = {name = bare_category_parent, sort = bare_category_breadcrumb}
end
local parents = {bare_category_parent}
if addl_bare_category_parents then
m_table.extend(parents, addl_bare_category_parents)
end
return {
type = "name",
topic = canon_label,
description = "{{{langname}}} " .. ptdesc .. ".",
breadcrumb = bare_category_breadcrumb,
parents = parents,
}
elseif ptdesc == false then
mw.log(("Display form for canon_label %s is false, can't categorize"):format(dump(canon_label)))
end
end
end)
local function fetch_primary_placetype(key, spec)
local placetype = spec.placetype
if type(placetype) == "table" then
placetype = placetype[1]
end
if not placetype then
internal_error("No placetype specified or defaulted for key %s, spec %s", key, spec)
end
return placetype
end
--[==[
Construct an appropriately linked location based on the full or elliptical placename, preceded by `"the "` if
appropriate. Specifically:
Fetch the full and elliptical placenames. If they are the same, just link to the placename directly. Otherwise, check if
the full placename exists; if so link to it. Otherwise, if the elliptical placename exists, link to it but display it as
the full placename. Finally, if neither full placename nor elliptical placename exists, fall back to linking to the full
placename. That way, we prefer full placenames to elliptical placenames if both or neither exist as Wiktionary entries,
but if only one exists, we link to that one rather than have a red link.
]==]
local function construct_linked_location(group, key, spec)
local full_placename, elliptical_placename = m_locations.key_to_placename(group, key)
local linked_placename
if elliptical_placename ~= full_placename then
local full_placename_title = mw.title.new(full_placename)
if full_placename_title and full_placename_title.exists then
linked_placename = m_locations.construct_linked_placename(spec, full_placename)
else
local elliptical_placename_title = mw.title.new(elliptical_placename)
if elliptical_placename_title and elliptical_placename_title.exists then
linked_placename = m_locations.construct_linked_placename(spec, elliptical_placename, full_placename)
end
end
end
return linked_placename or m_locations.construct_linked_placename(spec, full_placename)
end
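--[==[
The link-preference rule used by `construct_linked_location` can be sketched on its own. This is an illustration only:
`choose_link`, the `exists` predicate (standing in for `mw.title.new(name).exists`) and the placenames are all
hypothetical, and the real function delegates the actual link construction to `m_locations.construct_linked_placename`.

```lua
-- Returns the page to link to and the text to display.
-- `exists` is a hypothetical predicate; in the module this is a page-existence check.
local function choose_link(full, elliptical, exists)
	if elliptical ~= full and not exists(full) and exists(elliptical) then
		-- Only the elliptical entry exists: link to it, but display the full name.
		return elliptical, full
	end
	-- Same name, full name exists, or neither exists: prefer the full name.
	return full, full
end

-- Made-up set of existing pages for the example.
local pages = {["Example"] = true}
local function exists(name)
	return pages[name] == true
end

local target, display = choose_link("Example Prefecture", "Example", exists)
print(target, display)
```

Falling back to the full name when neither page exists means a red link points at the preferred title.
]==]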
--[==[
Construct the description of a location, including its container trail either to the end or until we encounter a
`no_include_container_in_desc` setting. For example, for the city of [[Birmingham]], the description will read
`"[[Birmingham]], a [[city]] in the [[West Midlands]] (which is a [[county]] of [[England]], which is a
[[constituent country]] of the [[United Kingdom]], which is a [[country]] in [[Europe]])"`. FIXME: Possibly we should
adopt the way city descriptions used to read, which was similar to `"the city of [[Birmingham]], in the county of the
[[West Midlands]], in the [[constituent country]] of [[England]], in the [[country]] of the [[United Kingdom]], in
[[Europe]]"`.
]==]
local function construct_location_desc(group, key, spec)
local parts = {}
local function ins(txt)
insert(parts, txt)
end
ins(construct_linked_location(group, key, spec))
local iteration = 0
local need_closing_paren = false
local containers = {{group = group, key = key, spec = spec}}
local container_iterator = m_locations.iterate_containers(group, key, spec)
while true do
iteration = iteration + 1
local include_container_in_desc = false
for _, container in ipairs(containers) do
if not container.spec.no_include_container_in_desc then
include_container_in_desc = true
break
end
end
if not include_container_in_desc then
break
end
local next_containers = container_iterator()
if not next_containers then
break
end
local is_former = nil
for _, container in ipairs(containers) do
local this_is_former = container.spec.is_former_place
if is_former == nil then
is_former = this_is_former
elseif is_former ~= this_is_former then
internal_error("When processing container trail of key %s, found a mixture of former and non-former " ..
"containers: %s", key, containers)
end
end
if #containers > 1 then
local placetypes = {}
local prepositions = {}
for _, container in ipairs(containers) do
local container_type = fetch_primary_placetype(container.key, container.spec)
m_table.insertIfNot(placetypes, m_placetypes.pluralize_placetype(container_type))
m_table.insertIfNot(prepositions, m_placetypes.get_placetype_entry_preposition(container_type))
end
if iteration == 1 then
ins(", ")
elseif iteration == 2 then
ins(" (which are ")
need_closing_paren = true
else
ins(", which are ")
end
if is_former then
ins("former ")
end
ins(m_table.serialCommaJoin(placetypes))
ins(" ")
ins(concat(prepositions, "/"))
else
if iteration == 1 then
ins(", ")
elseif iteration == 2 then
ins(" (which is ")
need_closing_paren = true
else
ins(", which is ")
end
local container_type = fetch_primary_placetype(containers[1].key, containers[1].spec)
if is_former then
ins("a former ")
else
ins(m_placetypes.get_placetype_article(container_type))
ins(" ")
end
ins(container_type)
ins(" ")
ins(m_placetypes.get_placetype_entry_preposition(container_type))
end
ins(" ")
containers = next_containers
local container_locations = {}
for _, container in ipairs(containers) do
insert(container_locations, construct_linked_location(container.group, container.key,
container.spec))
end
ins(m_table.serialCommaJoin(container_locations))
end
if need_closing_paren then
ins(")")
end
return concat(parts)
end
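--[==[
A simplified sketch of how the container trail is assembled into one description, assuming exactly one container per
level and plain (unlinked) names. The `describe` function and the `article`/`placetype`/`prep`/`container` field names
are hypothetical, chosen for this example; the real code obtains these values from [[Module:place/placetypes]] and the
container iterator, and also handles multiple containers and former places.

```lua
-- Each trail entry says what the previous place is and names its container.
local function describe(place, trail)
	local parts = {place}
	local need_closing_paren = false
	for i, step in ipairs(trail) do
		if i == 1 then
			parts[#parts + 1] = ", "
		elseif i == 2 then
			-- Everything from the second level on goes inside one parenthesis.
			parts[#parts + 1] = " (which is "
			need_closing_paren = true
		else
			parts[#parts + 1] = ", which is "
		end
		parts[#parts + 1] = ("%s %s %s %s"):format(
			step.article, step.placetype, step.prep, step.container)
	end
	if need_closing_paren then
		parts[#parts + 1] = ")"
	end
	return table.concat(parts)
end

local desc = describe("Birmingham", {
	{article = "a", placetype = "city", prep = "in", container = "the West Midlands"},
	{article = "a", placetype = "county", prep = "of", container = "England"},
	{article = "a", placetype = "constituent country", prep = "of", container = "the United Kingdom"},
	{article = "a", placetype = "country", prep = "in", container = "Europe"},
})
print(desc)
```

This reproduces the shape of the Birmingham description quoted in the comment on `construct_location_desc`, minus
the wikilinks.
]==]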
-- Fetch or construct the description of the location specified by `key`. If the `keydesc` property is specified,
-- use it directly but substitute any occurrence of `+++` with the auto-constructed location description, which
-- mentions the placename corresponding to the key, its placetype and container, and repeats the description up
-- the container trail until either there are no more containers or (more usually) the `no_include_container_in_desc`
-- setting is found (which is set on all continents and continent-level regions).
local function fetch_or_construct_location_desc(group, key, spec)
local val = spec.keydesc
if is_callable(val) then
val = val(group, key, spec)
spec.keydesc = val
end
val = val or "+++"
if val:find("%+%+%+") then
val = gsub_literally(val, "+++", construct_location_desc(group, key, spec))
end
return val
end
local function normalize_cat_as(cat_as, div)
if type(cat_as) ~= "table" or cat_as.type then
cat_as = {cat_as}
end
local ret_cat_as = {}
for _, pt_cat_as in ipairs(cat_as) do
if type(pt_cat_as) == "string" then
pt_cat_as = {type = pt_cat_as}
end
insert(ret_cat_as, {type = pt_cat_as.type, prep = pt_cat_as.prep or div.prep or "of"})
end
return ret_cat_as
end
-- Find the specified plural placetype among the divs for a given known location. Return a list of cat_as specs, where
-- each spec is of the form {type = "PLURAL_PLACETYPE", prep = "PREP"} indicating the plural placetype to use when
-- categorizing and the preposition to follow.
local function find_placetype_cat_as(divs, pl_placetype)
if divs then
if type(divs) ~= "table" then
divs = {divs}
end
for _, div in ipairs(divs) do
if type(div) == "string" then
div = {type = div}
end
if div.type == pl_placetype then
local cat_as = div.cat_as or div.type
return normalize_cat_as(cat_as, div)
end
end
end
return nil
end
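--[==[
The shapes accepted in `divs` and `cat_as` are easiest to see on sample data. The sketch below restates the two
functions above in self-contained form (using `#ret + 1` instead of the module's `insert` alias) and runs them on a
made-up `divs` list; the list contents are an assumption for illustration and are not taken from
[[Module:place/locations]].

```lua
local function normalize_cat_as(cat_as, div)
	-- A bare string or a single {type = ...} spec is wrapped into a one-element list.
	if type(cat_as) ~= "table" or cat_as.type then
		cat_as = {cat_as}
	end
	local ret = {}
	for _, item in ipairs(cat_as) do
		if type(item) == "string" then
			item = {type = item}
		end
		ret[#ret + 1] = {type = item.type, prep = item.prep or div.prep or "of"}
	end
	return ret
end

local function find_placetype_cat_as(divs, pl_placetype)
	if not divs then
		return nil
	end
	if type(divs) ~= "table" then
		divs = {divs}
	end
	for _, div in ipairs(divs) do
		if type(div) == "string" then
			div = {type = div}
		end
		if div.type == pl_placetype then
			return normalize_cat_as(div.cat_as or div.type, div)
		end
	end
	return nil
end

-- Hypothetical divs list mixing the three accepted shapes.
local divs = {
	"provinces",
	{type = "boroughs", prep = "in"},
	{type = "prefectures", cat_as = {"prefectures", {type = "departmental capitals", prep = "of"}}},
}

for _, pt in ipairs {"provinces", "boroughs", "prefectures"} do
	for _, cat_as in ipairs(find_placetype_cat_as(divs, pt)) do
		print(pt, cat_as.type, cat_as.prep)
	end
end
```

The preposition is resolved from the most specific source available: the `cat_as` entry, then the enclosing div,
then the default `"of"`.
]==]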
-- Handler for bare placename categories for known locations in `locations` in [[Module:place/locations]].
insert(handlers, function(label)
for _, canon_label in ipairs { label, lcfirst(label) } do
local group, spec = m_locations.find_canonical_key(canon_label)
if group then
-- wp= defaults to true (Wikipedia article matches location's full placename)
local wp = spec.wp
if wp == nil then
wp = true
end
-- wpcat= defaults to wp= (if Wikipedia article has its own name, Wikipedia category and Commons category
-- generally follow)
local wpcat = spec.wpcat
if wpcat == nil then
wpcat = wp
end
-- commonscat= defaults to wpcat= (if Wikipedia category has its own name, Commons category generally
-- follows)
local commonscat = spec.commonscat
if commonscat == nil then
commonscat = wpcat
end
local parents = {}
local bare_label_parents = spec.overriding_bare_label_parents
local container_iterator = m_locations.iterate_containers(group, canon_label, spec)
local containers = container_iterator()
if not bare_label_parents then
bare_label_parents = {"+++"}
end
local full_location_placename, elliptical_location_placename = m_locations.key_to_placename(group, canon_label)
local full_container_placename
if containers then
full_container_placename, _ = m_locations.key_to_placename(containers[1].group, containers[1].key)
end
local inserted_containers = false
for _, parent in ipairs(bare_label_parents) do
if parent == "+++" then
parent = "PL_PLACETYPE PREP CONTAINER"
end
if parent:find("CONTAINER") then
if not containers then
internal_error("Parent category %s needs the container of %s but no containers specified: %s",
parent, canon_label, spec)
end
local location_type = fetch_primary_placetype(canon_label, spec)
local pl_location_type = m_placetypes.pluralize_placetype(location_type)
for _, container in ipairs(containers) do
local per_container_parent = parent
local cat_as_list
if per_container_parent:find("PL_PLACETYPE") then
if spec.bare_category_parent_type then
cat_as_list = normalize_cat_as(spec.bare_category_parent_type, spec)
else
cat_as_list = find_placetype_cat_as(container.spec.divs, pl_location_type) or
find_placetype_cat_as(container.spec.addl_divs, pl_location_type)
end
end
if not cat_as_list then
local canon_placetype, ptdata, ptmatch = m_placetypes.get_placetype_data(location_type, "from category")
if not canon_placetype or not (ptdata.generic_before_non_cities or ptdata.generic_before_cities) then
internal_error("Unable to locate plural location type %s among the divs or addl_divs " ..
"for container key %s spec %s, and the location type is either not in placetype_data or " ..
"not identified as a generic placetype", pl_location_type, container.key, container.spec)
end
cat_as_list = {{type = pl_location_type, prep =
m_placetypes.get_placetype_entry_preposition(location_type)}}
end
local prefixed_key = m_placetypes.get_prefixed_key(container.key, container.spec)
per_container_parent = gsub_literally(per_container_parent, "CONTAINER", prefixed_key)
for _, cat_as in ipairs(cat_as_list) do
local per_container_per_placetype_parent = per_container_parent
per_container_per_placetype_parent = gsub_literally(per_container_per_placetype_parent, "PL_PLACETYPE",
cat_as.type)
per_container_per_placetype_parent = gsub_literally(per_container_per_placetype_parent, "PREP",
cat_as.prep)
m_table.insertIfNot(parents, per_container_per_placetype_parent)
end
end
inserted_containers = true
else
m_table.insertIfNot(parents, parent)
end
end
if not inserted_containers and containers then
-- If we didn't insert the containers above in some form, insert them now as bare categories. Note that
-- these may be different categories from the container categories inserted above.
for _, container in ipairs(containers) do
m_table.insertIfNot(parents, container.key)
end
end
if spec.addl_parents then
for _, parent in ipairs(spec.addl_parents) do
m_table.insertIfNot(parents, parent)
end
end
local function format_boxval(val, specname)
if val == true then
val = "%l"
end
if type(val) == "string" then
val = gsub_literally(val, "%l", full_location_placename)
val = gsub_literally(val, "%e", elliptical_location_placename)
if val:find("%%c") then
if not full_container_placename then
internal_error("Wikipedia/Commons spec %s = %s has %%c in it but key %s has no " ..
"containers: %s", specname, val, canon_label, spec)
end
val = gsub_literally(val, "%c", full_container_placename)
end
end
return val
end
local description = spec.fulldesc or (
"{{{langname}}} terms related to the people, culture, or territory of " ..
fetch_or_construct_location_desc(group, canon_label, spec) .. ".")
local full_placename, _ = m_locations.key_to_placename(group, canon_label)
return {
type = "topic",
description = description,
breadcrumb = full_placename,
parents = parents,
wp = format_boxval(wp, "wp"),
wpcat = format_boxval(wpcat, "wpcat"),
commonscat = format_boxval(commonscat, "commonscat"),
}
end
end
end)
local function find_canonical_key_from_place(place, canon_label)
local has_the = false
local key
if place:find("^the ") then
key = place:gsub("^the ", "")
has_the = true
else
key = place
end
local group, spec = m_locations.find_canonical_key(key)
if group then
local requires_the = spec.the or false
if has_the ~= requires_the then
if has_the then
mw.log(("Mismatch in category name '%s', has 'the' in the category when it should not"):format(
canon_label))
else
mw.log(("Mismatch in category name '%s', should have 'the' in the category but does not"):
format(canon_label))
end
return nil
end
return group, key, spec
end
return nil
end
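--[==[
A minimal sketch of the `"the "` consistency check performed by `find_canonical_key_from_place`, runnable outside
Scribunto. The `known` table is a hypothetical stand-in for `m_locations.find_canonical_key`, and `check_place`
returns a reason string instead of logging with `mw.log`; both are assumptions made for this example.

```lua
-- Hypothetical location data: `the = true` means the category name must include "the".
local known = {
	["Bahamas"] = {the = true},
	["France"] = {},
}

local function check_place(place)
	-- gsub's second return value tells us whether a leading "the " was stripped.
	local key, n = place:gsub("^the ", "")
	local has_the = n > 0
	local spec = known[key]
	if not spec then
		return nil, "unknown location"
	end
	if has_the ~= (spec.the or false) then
		return nil, has_the and "unexpected 'the'" or "missing 'the'"
	end
	return key
end

print(check_place("the Bahamas"))
print(check_place("Bahamas"))
print(check_place("the France"))
```

Rejecting both kinds of mismatch means each location has exactly one valid category spelling.
]==]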
-- Handler for generic placetypes (those whose categories are added through category generation handlers or through
-- explicit category specs in the placetype data) for known locations in [[Module:place/locations]]. All such
-- placetypes have either a `generic_before_non_cities` setting (meaning they can occur before non-city locations) or
-- `generic_before_cities` setting (meaning they can occur before cities), or both. Examples of such categories are
-- "cities in the Bahamas" or "rivers in Western Australia, Australia", or (for city locations)
-- "neighbourhoods of Hong Kong" or "places in Melbourne".
insert(handlers, function(label)
for _, canon_label in ipairs { lcfirst(label), label } do
local placetype, in_of, place = canon_label:match("^([A-Za-z%- ]-) (in) (.*)$")
if not placetype then
placetype, in_of, place = canon_label:match("^([A-Za-z%- ]-) (of) (.*)$")
end
if placetype then
local normalized_placetype = placetype == "neighbourhoods" and "neighborhoods" or placetype
local canon_placetype, ptdata, ptmatch = m_placetypes.get_placetype_data(normalized_placetype, "from category")
if canon_placetype and (ptdata.generic_before_non_cities or ptdata.generic_before_cities) then
local group, key, spec = find_canonical_key_from_place(place, canon_label)
if group then
-- Check whether the location uses British spelling, but also check all containers, because
-- it's too hard to keep in sync the `british_spelling` setting for locations at all different
-- levels (e.g. cities of various countries, first and second level administrative division, etc.),
-- so we just set it at top level on the country.
local uses_british_spelling = spec.british_spelling
if uses_british_spelling == nil then
for containers in m_locations.iterate_containers(group, key, spec) do
local must_outer_break = false
for _, container in ipairs(containers) do
if container.spec.british_spelling ~= nil then
uses_british_spelling = container.spec.british_spelling
must_outer_break = true
break
end
end
if must_outer_break then
break
end
end
end
local allow_cat = true
if placetype == "neighborhoods" and uses_british_spelling or
placetype == "neighbourhoods" and not uses_british_spelling then
mw.log(("Mismatch in spelling of placetype '%s' in category '%s', should be '%s'"):format(
placetype, canon_label, uses_british_spelling and "neighbourhoods" or "neighborhoods"))
allow_cat = false
end
if spec.is_former_place and placetype ~= "places" then
allow_cat = false
end
local expected_prep
if spec.is_city then
expected_prep = ptdata.generic_before_cities
else
expected_prep = ptdata.generic_before_non_cities
end
if not expected_prep then
allow_cat = false
end
if allow_cat then
if expected_prep ~= in_of then
mw.log(("Mismatch in category name '%s', has '%s' when it should have '%s'"):format(
canon_label, in_of, expected_prep))
return nil
end
local linkdesc = m_placetypes.get_placetype_display_form(placetype,
spec.is_city and "city" or "noncity", "return full")
if linkdesc == false then
mw.log(("Display form for placetype %s is false, can't categorize"):format(dump(placetype)))
return nil
end
if not linkdesc then
internal_error("Unrecognized placetype %s when processing key %s, data %s, label %s",
placetype, key, spec, canon_label)
end
local desc = linkdesc .. " " .. in_of .. " " .. fetch_or_construct_location_desc(group, key, spec)
desc = "{{{langname}}} " .. desc .. "."
local parents = {}
insert(parents, key)
if spec.no_container_parent then
-- top-level country, constituent country, continent or the like
insert(parents, {name = normalized_placetype, sort = key})
if spec.placetype == "country" or m_table.contains(spec.placetype, "country") then
local category_class = m_placetypes.get_equiv_placetype_prop(normalized_placetype,
function(pt) return m_placetypes.get_placetype_prop(pt, "class") end, {
from_category = true,
no_split_qualifiers = true,
})
if not category_class then
internal_error("Saw placetype %s that is either unknown or has no `class` " ..
"setting in `placetype_data`", normalized_placetype)
end
if class_is_political_division[category_class] == nil then
internal_error("Saw unknown category class %s derived from placetype %s",
category_class, normalized_placetype)
end
if class_is_political_division[category_class] then
insert(parents, "political divisions of specific countries")
end
end
else
local container_iterator = m_locations.iterate_containers(group, key, spec)
local next_containers = container_iterator()
if next_containers then
for _, container in ipairs(next_containers) do
local container_prep
if container.spec.is_city then
container_prep = ptdata.generic_before_cities
else
container_prep = ptdata.generic_before_non_cities
end
if not container_prep then
internal_error("Container key %s spec %s defines is_city = %s but " ..
"there is no corresponding `generic_before_*` setting in the " ..
"placetype data for placetype %s", container.key, container.spec,
container.spec.is_city, placetype)
end
insert(parents, {
name = placetype .. " " .. container_prep .. " " .. m_placetypes.get_prefixed_key(
container.key, container.spec),
sort = key
})
end
else
-- unrecognized countries or the like
insert(parents, {name = normalized_placetype, sort = key})
end
end
return {
type = "name",
topic = canon_label,
description = desc,
breadcrumb = placetype,
parents = parents,
}
end
end
end
end
end
end)
-- Handler for "state capitals of the United States", "provincial capitals of Canada", etc. This must precede the next
-- handler for specific political and misc (non-political) divisions of polities and subpolities, such as
-- "provinces of the Philippines", because "departmental capitals" is listed in cat_as for French prefectures and so
-- will trigger an error if that handler runs before this one.
insert(handlers, function(label)
label = lcfirst(label)
local capital_cat, place = label:match("^([a-z%- ]- capitals) of (.*)$")
-- Make sure we recognize the type of capital.
if place and capital_cat_to_placetype[capital_cat] then
local placetype = capital_cat_to_placetype[capital_cat]
local pl_placetype = m_placetypes.pluralize_placetype(placetype)
-- Locate the container, fetch its known political divisions, and make sure the placetype corresponding to the
-- type of capital is among the list.
local group, key, spec = find_canonical_key_from_place(place, label)
if group and (spec.divs or spec.addl_divs) then
local saw_match = false
local variant_matches = {}
local divlists = {}
if spec.divs then
insert(divlists, spec.divs)
end
if spec.addl_divs then
insert(divlists, spec.addl_divs)
end
for _, divlist in ipairs(divlists) do
for _, div in ipairs(divlist) do
if type(div) == "string" then
div = {type = div}
end
-- HACK. Currently if we don't find a match for the placetype, we map e.g. 'autonomous region'
-- -> 'regional capitals' and 'union territory' -> 'territorial capitals'. When encountering a
-- political division like 'autonomous region' or 'union territory', chop off everything up
-- through the last space to make things match. To make this clearer, we record all such
-- "variant match" cases, and down below we insert a note into the category text indicating that
-- such "variant matches" are included in the category.
if pl_placetype == div.type or pl_placetype == div.type:gsub("^.* ", "") then
saw_match = true
if pl_placetype ~= div.type then
insert(variant_matches, div.type)
end
end
end
end
if saw_match then
-- Everything checks out, construct the category description.
-- `placetype` is a string here, so test `is_city` on the location spec instead.
local placetype_desc = m_placetypes.get_placetype_display_form(pl_placetype,
spec.is_city and "city" or "noncity")
if placetype_desc == false then
mw.log(("Display form for pl_placetype %s is false, can't categorize"):format(dump(pl_placetype)))
return nil
end
if not placetype_desc then
internal_error("Unrecognized plural placetype %s, generated as the plural of %s, which " ..
"was found as the placetype of capital placetype %s in label %s", pl_placetype,
placetype, capital_cat, label)
end
local variant_match_text = ""
if variant_matches[1] then
local real_variant_match_descs = {}
for i, variant_match in ipairs(variant_matches) do
local variant_match_desc = m_placetypes.get_placetype_display_form(variant_match,
spec.is_city and "city" or "noncity")
if variant_match_desc == nil then
internal_error("Unrecognized variant match plural placetype %s, coming from " ..
"place key %s, data %s in label %s", variant_match, key, spec, label)
end
if variant_match_desc then
-- skip those for which the description is `false`, like `ABBREVIATION_OF states`
-- in the United States divs.
insert(real_variant_match_descs, variant_match_desc)
end
end
if real_variant_match_descs[1] then
variant_match_text = " (including " .. m_table.serialCommaJoin(real_variant_match_descs)
.. ")"
end
end
local desc = "{{{langname}}} names of [[capital]]s of " .. placetype_desc .. variant_match_text ..
" of " .. fetch_or_construct_location_desc(group, key, spec) .. "."
local full_placename, _ = m_locations.key_to_placename(group, key)
return {
type = "name",
topic = label,
description = desc,
breadcrumb = full_placename,
parents = {{name = capital_cat, sort = key}, key},
}
end
end
end
end)
local overriding_category_descriptions = {
["autonomous cities of Spain"] = "the [[w:Autonomous communities of Spain#Autonomous_cities|autonomous cities of Spain]]",
["regions of Greece"] = "the regions ([[periphery|peripheries]]) of [[Greece]]",
["regions of North Macedonia"] = "the regions ([[periphery|peripheries]]) of [[North Macedonia]]",
["subprefectures of Japan"] = "[[subprefecture]]s of [[Japan]]ese [[prefecture]]s",
}
-- Handler for specific political and misc (non-political) divisions of locations (polities, subpolities, cities, etc.),
-- such as "provinces of the Philippines", "counties of Wales", "municipalities of Tocantins, Brazil",
-- "boroughs of New York City", etc. This does not handle categories for generic placetypes (cities, rivers, etc.) of
-- locations, which are handled by different handlers above.
insert(handlers, function(label)
-- The label comes with an initial capitalization but we have to check both lowercase-initial and capital-initial
-- versions of the placetype to handle e.g. [[:Category:en:Indian reserves of Canada]].
for _, canon_label in ipairs { label, lcfirst(label) } do
for _, minimal_placetype in ipairs { true, false } do
local match_quantifier = minimal_placetype and "-" or "+"
-- Some categories have two "of"s in them, and depending on the category, it's correct to do either a greedy
-- ([[:Category:en:Abbreviations of states of the United States]], with placetype `abbreviations of states`)
-- or non-greedy ([[:Category:en:Provinces of the Democratic Republic of the Congo]], with placetype
-- `provinces`) match. We can't know in advance which is correct so we try both possibilities, doing the
-- non-greedy one first as it seems more common (there are many locations with "of" in them, but currently
-- only `abbreviations of states` occurs with a following location).
local placetype, in_of, place = canon_label:match("^([A-Za-z%- ]" .. match_quantifier .. ") (of) (.*)$")
if not placetype then
placetype, in_of, place = canon_label:match("^([A-Za-z%- ]" .. match_quantifier .. ") (in) (.*)$")
end
if placetype then
local group, key, spec = find_canonical_key_from_place(place, canon_label)
if group then
local function find_placetype(divs)
if divs then
if type(divs) ~= "table" then
divs = {divs}
end
for _, div in ipairs(divs) do
if type(div) == "string" then
div = {type = div}
end
local cat_as = div.cat_as or div.type
if type(cat_as) ~= "table" then
cat_as = {cat_as}
end
for _, pt_cat_as in ipairs(cat_as) do
if type(pt_cat_as) == "string" then
pt_cat_as = {type = pt_cat_as}
end
if placetype == pt_cat_as.type then
local div_parent = pt_cat_as.container_parent_type
if div_parent == nil then -- allow false
div_parent = div.container_parent_type
end
if div_parent == nil then
div_parent = placetype
end
return div_parent, pt_cat_as.prep or div.prep or "of"
end
end
end
end
return nil
end
local div_parent, div_prep = find_placetype(spec.divs)
if div_parent == nil then -- allow false
div_parent, div_prep = find_placetype(spec.addl_divs)
end
if div_parent == nil then -- allow false
div_parent, div_prep = find_placetype(spec.addl_divs_for_categorization)
end
if div_parent ~= nil then
if div_prep ~= in_of then
mw.log(("Mismatch in category name '%s', has '%s' when it should have '%s'"):format(
canon_label, in_of, div_prep))
return nil
end
local linkdesc = m_placetypes.get_placetype_display_form(placetype, spec.is_city and "city" or "noncity",
"return full")
if linkdesc == false then
mw.log(("Display form for placetype %s is false, can't categorize"):format(dump(placetype)))
return nil
end
if not linkdesc then
internal_error("Unrecognized placetype %s when processing key %s, data %s, label %s",
placetype, key, spec, canon_label)
end
local desc = overriding_category_descriptions[canon_label]
if not desc then
desc = linkdesc .. " " .. in_of .. " " .. fetch_or_construct_location_desc(group, key, spec)
end
desc = "{{{langname}}} " .. desc .. "."
local parents = {}
insert(parents, key)
if div_parent then -- div_parent may be `false`
if spec.no_container_parent then
-- top-level country, constituent country, continent or the like
insert(parents, {name = placetype, sort = " " .. key})
if spec.placetype == "country" or m_table.contains(spec.placetype, "country") then
insert(parents, "political divisions of specific countries")
end
else
local container_iterator = m_locations.iterate_containers(group, key, spec)
local next_containers = container_iterator()
if next_containers then
for _, container in ipairs(next_containers) do
insert(parents, {
name = div_parent .. " " .. in_of .. " " .. m_placetypes.get_prefixed_key(
container.key, container.spec),
sort = key
})
end
else
-- unrecognized countries or the like
insert(parents, {name = placetype, sort = " " .. key})
end
end
end
return {
type = "name",
topic = canon_label,
description = desc,
breadcrumb = placetype,
parents = parents,
}
end
end
end
end
end
end)
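--[==[
The difference between the non-greedy (`-`) and greedy (`+`) quantifiers in the handler above can be seen by running
the same pattern on the two category names mentioned in its comments. This is a standalone sketch; `split_label` is a
hypothetical helper for the example and only covers the `of` case.

```lua
local function split_label(label, quantifier)
	return label:match("^([A-Za-z%- ]" .. quantifier .. ") (of) (.*)$")
end

local examples = {
	"Provinces of the Democratic Republic of the Congo",
	"Abbreviations of states of the United States",
}

for _, label in ipairs(examples) do
	for _, quantifier in ipairs {"-", "+"} do
		local placetype, _, place = split_label(label, quantifier)
		print(quantifier, placetype, "|", place)
	end
end
```

With `-` the split happens at the first `" of "`, giving `Provinces` and `Abbreviations`; with `+` it happens at the
last one, giving `Provinces of the Democratic Republic` and `Abbreviations of states`. Only one split per label names
a real placetype and location, which is why the handler tries both and keeps whichever one resolves.
]==]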
labels["city nicknames"] = {
type = "name",
-- special-cased description
description = "{{{langname}}} informal alternative names for [[city|cities]] (e.g., [[Big Apple]] for [[New York City]]).",
parents = {"cities", "nicknames"},
}
labels["exonyms"] = {
type = "name",
-- special-cased description
description = "{{{langname}}} [[exonym]]s.",
parents = {"places"},
}
labels["political divisions of specific countries"] = {
type = "grouping",
description = "{{{langname}}} categories for political divisions of specific countries.",
parents = {"places"},
}
-- Misc. FIXME: Remove the need for this.
labels["nomes of Ancient Egypt"] = {
type = "name",
-- special-cased description
description = "{{{langname}}} names of the [[nome]]s of [[Ancient Egypt]].",
breadcrumb = "nomes",
parents = {"Ancient Egypt"},
}
-- FIXME: Everything here has been moved from [[Module:category tree/topic/Earth]]. Most should be removed.
labels["Atlantic Ocean"] = {
type = "related-to",
description = "default with the",
parents = {"Earth"},
}
labels["British Isles"] = {
type = "related-to",
description = "=the people, culture, or territory of [[Great Britain]], [[Ireland]], and other nearby islands",
parents = {"Europe", "islands"},
}
labels["European Union"] = {
type = "related-to",
description = "default with the",
parents = {"Europe"},
}
labels["Gascony"] = {
type = "related-to",
description = "default",
parents = {"Occitania, France"},
}
labels["Indian subcontinent"] = {
type = "related-to",
description = "default with the",
parents = {"South Asia"},
}
labels["Bengal"] = {
type = "related-to",
description = "{{{langname}}} terms related to the people, culture, or territory of [[Bengal]].",
parents = {"Indian subcontinent"},
}
labels["Kashmir"] = {
type = "related-to",
description = "{{{langname}}} terms related to the people, culture, or territory of [[Kashmir]].",
parents = {"Indian subcontinent"},
}
labels["Kashmir, India"] = {
type = "related-to",
description = "{{{langname}}} names of places in {{w|Kashmir, India}}.",
parents = {"India", "Kashmir"},
}
labels["Korea"] = {
type = "related-to",
description = "=the people, culture, or territory of [[Korea]]",
parents = {"Asia"},
}
labels["Languedoc"] = {
type = "related-to",
description = "default",
parents = {"Occitania, France"},
}
labels["Lapland"] = {
type = "related-to",
description = "=[[Lapland]], a region in northernmost Europe",
parents = {"Europe", "Finland", "Norway", "Russia", "Sweden"},
}
labels["Middle East"] = {
type = "related-to",
description = "default with the",
parents = {"Africa", "Asia"},
}
labels["Netherlands Antilles"] = {
type = "related-to",
description = "=the people, culture, or territory of the [[Netherlands Antilles]]",
parents = {"Netherlands", "North America"},
}
labels["Provence"] = {
type = "related-to",
description = "default",
parents = {"Provence-Alpes-Côte d'Azur, France"},
}
labels["South Asia"] = {
type = "related-to",
description = "default",
parents = {"Eurasia", "Asia"},
}
return {LABELS = labels, HANDLERS = handlers}