Help talk:Extension:ParserFunctions/2022
This page used the Structured Discussions extension to provide structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.

Archives
Using in templates
It took me a while to figure out why my Template:IPaddr broke when I moved a {{lc:}} that encompassed the whole template to somewhere deep within, where it was actually needed. Instead of {{lc: [template contents]}} I wrote just {{lc:{{{1|}}}}}, as only the first anonymous argument needed lowercase substitution. At first glance this code looks perfectly right, and it worked like a charm. However, when it was used to display the IPv6 loopback address ('::1'), the template broke, as the wikicode generated a <dl><dd></dd>...</dl> sequence!
Using {{lc:<nowiki>{{{1|}}}</nowiki>}} seems to stop the interpretation of {{{1}}}, as does {{lc:<nowiki></nowiki>{{{1|}}}}}, but I am wondering if there is a nicer way to tell {{lc:}} not to interpret the argument if it starts with a colon. Dandorid (talk) 13:37, 27 January 2022 (UTC)
- Why not use
{{lc:<nowiki/>{{{1|}}}}}
? - Internally the empty
<nowiki/>
tag is first replaced, before preprocessing of the wikitext, by a special marker (delimited by a pair of reserved ASCII controls forbidden in valid HTML, and containing an identifier that references the hidden content of the nowiki tag, which here is empty). This should prevent the colon following this strip marker from being interpreted as part of the "lc:" syntax for calling the builtin parser function, or as part of a namespace prefix, so this colon (at the start of the IPv6 address value coming from the expanded value of parameter {{{1|}}}
) will be passed verbatim to the builtin function (which will also receive the special marker; it may decide to drop it). - Later, that strip marker will be removed when the final HTML is tidied (at the end of the template/function expansion phase), if the builtin function has not already stripped it from its input. Some parser functions first trim their parameters, for example removing leading/trailing spaces or strip markers; others preserve them. This can have surprising results in some parser functions, notably those extracting substrings or computing string lengths, if they are not aware of the possible presence of strip markers, which should be either kept entirely or removed completely, but never subdivided. The same remark applies to invocations of Lua functions stored in Scribunto modules, but it is less visible there, given that function invocations start with
#invoke:
which is immediately followed by a module name that cannot start with a colon, then followed by a vertical bar and the function name, which also cannot start with a colon; all other optional function parameters are prefixed by a single vertical bar. - If this still does not work, a workaround would be to use a Lua function invocation (instead of a wiki template transclusion) to process the IPv6 address (it may even be more efficient, given the many subtle complexities of the variable allowed format, if you want to transform it into canonical form).
- Note also that an IPv6 address starting with :: may also be written with a 0:: prefix instead (so ::1 and 0::1 are both valid and equivalent). This 0:: prefix is currently only used for a few special local addresses, and only the least significant 64-bit part (in fact only 32 bits in almost all existing implementations) may carry a host address part (there's no other assignment in the 0::/16
IPv6 address block). - Note also that there exist two "canonical forms" for IPv6 addresses: one where only the first sequence of /(0{1,4}:)+/ 16-bit fields is replaced by "::" and all other occurrences of /0{1,4}:/ are replaced by "0:"; the other removes any occurrence of "::" and inserts the necessary number of "0:" fields to reach the expected 128-bit length of IPv6 (i.e. eight 16-bit fields). There cannot be two occurrences of "::" in the same IPv6 address specification. Both canonical forms also unify the letter case of the ASCII hexadecimal digits, generally to lowercase. Some applications do not compress leading zeroes in each 16-bit field, using a fixed format instead (so a field may start with 2, 3 or 4 zeroes), but the compressed form, without leading zeroes in fields and with the first run of zero fields collapsed to a remaining "::", is the most widely used, as it is easier to read. - For internal processing this formatting generally does not matter: everything is performed in protocol frame formats using 128-bit binary fields, not text... except for IPv6 addresses between square brackets when they are used as hostnames in URLs. Because URLs are indexed by host name as well as by domain name, canonicalization of the string form is necessary and usually uses the most compact format, even though this does not change the protocol for DNS queries, or for reverse IPv6 DNS lookups, which use a very different "backward" dotted format, grouping hexadecimal digits one by one and not in groups of four...
with an exception for a small range of IPv6 addresses designed for "compatibility", strictly equivalent to IPv4 addresses: in that case the reverse IPv6 DNS lookup falls back to the final least significant 32-bit part, using reverse IPv4 lookup byte by byte. Modern DNS implementations do not need such a fallback and can also do reverse lookups in that space without any special delegation: the IPv4 address is already reindexed inside IPv6 by internet gateways for routing and delegation announcements. But if your DNS provider does not handle this small IPv6 space, the fallback to IPv4 queries still works in that subspace and will keep working for a long time. In practice it is rarely used by final clients, which for now all use a dual stack; it is only used by IPv6-only clients, and generally handled immediately by the router of their upstream ISP or DNS provider. As well, modern OSes all support this specific compatibility space in their existing IPv6 implementation when performing an IPv4 lookup over an IPv6 connection, whether with UDP, TCP, or the new secure HTTP proxying protocol, aka DoH.
- Other forms do not use strings but binary representations: an unsigned 128-bit integer, an ordered pair of unsigned 64-bit integers, or an ordered vector of sixteen unsigned 8-bit bytes. These are generally used for internal purposes, such as indices and the efficient implementation of filters/routers/firewalls (to compress their databases and speed up processing), but of course this is also the format used in the core IPv6 protocol itself.
- Note also that if the IP address was already preprocessed once using
{{lc:}}
in a template, and the template result is then passed as a parameter value to another template, you may have difficulty passing this value if you don't again prefix it with another empty nowiki tag. You should also not embed the IPv6 address inside a nowiki tag. Strip markers for empty nowiki tags are also normally identical in all occurrences, as their content is the same constant empty string, so these markers use the same internal reference identifier (MediaWiki may also optimize these constants internally and automatically reduce sequences of empty nowiki markers to a single one; the internal form of this marker is implementation-dependent but does not matter, what matters is the pair of leading/trailing forbidden controls surrounding them). Verdy p (talk) 14:35, 27 January 2022 (UTC)- Thanks for this elaborate answer. I think that your proposed solution is a valid one. I have, for the moment, settled for putting a <span> inside it, which also works.
- Your explanation about IPv6 addresses is correct, but the point is that we need to describe exactly that on the wiki page for IPv6 addresses. Therefore, the IPaddr template needs to take any form (full, shortened, valid or even invalid, as the text may be) and still try to display it like an IP address. It is not likely that this template would be called from another template, as it is a markup template used directly in the written text. But anyway it is good to know that these issues with {{lc:}} exist, so I can be vigilant. Dandorid (talk) 11:48, 28 January 2022 (UTC)
- IP address sanitization and normalization is really tricky to do with the MediaWiki parser functions in templates. It's a use case where Lua modules called with Scribunto make it much easier to perform and test correctly. Such a module can perfectly well handle the presence of "nowiki" tags (or strip markers) in the given parameters; it can easily manage whitespace trimming, letter case, the various ways to abbreviate an IPv6 address with a variable number of colons, and the possible presence of surrounding square brackets found in URLs, and then return a canonicalized form of the address that is easy to compare (it may return several forms: a short form using "::", an expanded form with static length using only single ":" colons, a hexadecimal form without separators, an IPv4 dotted-decimal form for some compatibility ranges, all with a single letter case for hexadecimal parts...). There are already Lua modules using such techniques. Verdy p (talk) 14:18, 11 December 2022 (UTC)
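The canonicalization described above can be sketched outside MediaWiki. The sketch below uses Python's standard ipaddress module purely for illustration (a Scribunto module would implement the same logic in Lua); the normalize_ipv6 helper name is hypothetical, not part of any existing template or module:

```python
import ipaddress

def normalize_ipv6(text):
    """Return several canonical forms of one IPv6 address string.

    Accepts surrounding whitespace and the square brackets used in URLs.
    Hypothetical helper, for illustration of the forms listed above.
    """
    s = text.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]  # strip URL-style brackets
    addr = ipaddress.IPv6Address(s)
    return {
        "short": addr.compressed,          # first zero run collapsed to "::"
        "long": addr.exploded,             # eight 4-digit fields, no "::"
        "hex": format(int(addr), "032x"),  # 128-bit value, no separators
        # dotted-decimal form, only for IPv4-mapped compatibility addresses:
        "ipv4": str(addr.ipv4_mapped) if addr.ipv4_mapped else None,
    }

# Case and brackets are normalized away:
print(normalize_ipv6("[2001:DB8::1]")["short"])  # 2001:db8::1
# "::1" and "0::1" are two spellings of the same address:
assert ipaddress.IPv6Address("::1") == ipaddress.IPv6Address("0::1")
```

Comparing the "long" (or "hex") forms of two addresses is then a plain string comparison, which is the property the template above needs.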
- Sounds like another interesting case of phab:T14974. I think no way of avoiding it will be nice. Matěj Suchánek (talk) 15:40, 27 January 2022 (UTC)
#if not working?
I'm working with the #if statement inside a cargo_query statement, and the #ifs are not parsing. I tested outside of the cargo_query, and it is still not working. See example here
After the "If Testing" header, you can see the #if not parsing; it just prints to the screen as if it were regular wikitext. What am I doing wrong? The template with the #if testing is Repertoire. Link to the template page here May214 (talk) 14:56, 21 March 2022 (UTC)
#replace multiple strings?
RESOLVED: Use nested #replace s.
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Is it possible to replace multiple different strings within one string?
For example, I would want to do {{#replace:The dog is jumping.|dog,jumping|cat,walking}}
or something similar to receive the output ”The cat is walking.”
Is this possible in any way? V G5001 (talk) 17:39, 14 September 2022 (UTC)
- Yes, just nest
#replace
s:{{#replace:{{#replace:The dog is jumping.|dog|cat}}|jumping|walking}}
- Just be aware of the expansion depth limit (to say nothing of code readability); if you need a lot of separate replaces on the same string, it will probably be better to write it in Lua, as a Scribunto module. (You could also use Extension:Variables, but that extension unfortunately has an uncertain future given the direction the parser is headed in.) 「ディノ奴千?!」☎ Dinoguy1000 17:46, 14 September 2022 (UTC)
- Thanks, this worked V G5001 (talk) 18:09, 14 September 2022 (UTC)
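The effect of nesting #replace calls can be illustrated with chained string replacements (a Python sketch, not MediaWiki code): the inner replace runs first, so the substitutions are applied one after another, and each later replacement sees the output of the earlier ones.

```python
# Equivalent of {{#replace:{{#replace:The dog is jumping.|dog|cat}}|jumping|walking}}
text = "The dog is jumping."
result = text.replace("dog", "cat").replace("jumping", "walking")
print(result)  # The cat is walking.

# Order matters: a replacement string may itself be rewritten by a later step.
print("ab".replace("a", "b").replace("b", "c"))  # cc
```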
If 1979, create category 1970s
Hi! I'm sorry if this is a stupid question but numbers tend to hurt my brain. On my wiki, I have a template that generates links to specific dates based on three parameters: month, day, and year. However, I want the <code>year</code> field to generate a category for the decade that year is a part of. For example, Jan|17|1979 should generate the category 1970. I feel like this can be done via #expr somewhere but it's going over my head.
Any help would be appreciated. AmeliaLH (talk) 15:09, 21 October 2022 (UTC)
- If I'm not mistaken, it sounds like something like
{{#expr:1979-(1979 mod 10)}}
or{{#expr:floor(1979/10)*10}}
would do what you are asking for. There may be a simpler solution, but it's not coming to my head right now, so hopefully this'll do. Aidan9382 (talk) 15:19, 21 October 2022 (UTC) - If you have access to string functions,
{{ #sub: 1979 | 0 | 3 }}0
will also work. (If you need it to work on years that aren't four digits long, you can manage it by replacing the "3" with{{ #expr: {{ #len: 1979 }} - 1 }}
, but by that point you might as well just use one of Aidan's #expr suggestions.) 「ディノ奴千?!」☎ Dinoguy1000 21:50, 21 October 2022 (UTC)
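All three suggestions are plain integer (or string) arithmetic; as a quick sanity check, here is the same computation in ordinary code (Python, illustration only):

```python
year = 1979

# The two #expr suggestions: round the year down to its decade.
print(year - year % 10)      # 1970  ({{#expr:1979-(1979 mod 10)}})
print((year // 10) * 10)     # 1970  ({{#expr:floor(1979/10)*10}})

# The #sub suggestion: keep all digits but the last, then append "0".
print(str(year)[:-1] + "0")  # 1970
```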
#sub, {{#sub:<nowiki>This is a </nowiki>test|1}}
This should return est! Istudymw (talk) 06:32, 23 October 2022 (UTC)
- No, #sub: returns the substring starting at character 1 as specified here (no ending position is specified, so this runs until the end of the string). #sub is a parser function, so its parameters are NOT '''pre'''processed by MediaWiki; they can contain any syntax needed, not necessarily MediaWiki or HTML, and they are also not stripped of leading/trailing whitespace, because the parameter is not named. It is only processed by PHP.
- So #sub will return "<nowiki>This is a </nowiki>test" unchanged!
- (the same would be true if you used a parserfunction call to a Lua module using #invoke).
- Parser functions can do whatever they want with their parameters, and return a string in any format, which can further be used as an argument to call another parser function (or Lua module). Only the parser function itself can decide whether to strip leading/trailing whitespace, HTML comments, or "nowiki" tags. At that level MediaWiki only processes pipes (|) to separate parameters, and "noinclude" or "includeonly" tags.
- Then, after the call, the return value will be processed by MediaWiki: at that time it will process "<nowiki>This is a </nowiki>test" for the rest of the expansion of the page. The "nowiki" tags will then be considered by MediaWiki and will result in "This is a test" being displayed. A "nowiki" tag does not remove its content; it just indicates that the content it surrounds must not be parsed by MediaWiki, in case it contains some wiki syntax (such as "~~~~", which would otherwise be replaced by the user's signature).
- You are most probably confusing it with the "noinclude" tag. Verdy p (talk) 07:23, 23 October 2022 (UTC)
- On both wikis I tried it on, it does return "est". It won't work on Wikimedia wikis, though, as they've disabled the string functions on their wikis. – Robin Hood (talk) 07:40, 23 October 2022 (UTC)
- I don't know where you tested it, but clearly "<nowiki>This is a </nowiki>" should not be deleted at all.
- However, as this "nowiki" tag looks like an HTML tag, it may be stripped by a function that drops HTML tags to make a plain-text-only string. But that implementation would be bogus as well, because it might then transform "<span>This is a </span>test" into "test" (not really what HTML considers as the plain text of the HTML content, which should be "This is a test", e.g. when using the standard DOM API for HTML or XML), and certainly not "est" (which makes no sense at all, in HTML, XML, or in MediaWiki!): why would you drop an extra character AFTER the closing tag?
- So this is likely a bug in the "#sub" parser function implementation (which is, as you said, part of the "string functions", not enabled on Wikimedia wikis). I tested that "#sub" parser function on other wikis where string functions are enabled, and the "t" after the closing tag is NOT removed. The wikis that do that may not have been updated with the correct version of string functions to fix that very undesirable bug (their internal code for HTML tag stripping has a problem, such as a bogus regular expression).
- On which wiki do you see that result? Which version of the "string functions" do that use (look at their Special:Version page)?
- ---
- I found a wiki that has that bogus behavior in #sub: Translatewiki.net, which uses incorrect HTML-stripping code that really strips too much, where it should return either "This is a test" (if it knows and applies the semantics of MediaWiki "nowiki" tags) or "test" (if it strips the whole "nowiki" element with its content, the same way it would strip "ABC<script>...</script>DEF" into "ABCDEF" and not "ABCEF"). Verdy p (talk) 07:47, 23 October 2022 (UTC)
- I tested it on my test bed wikis, which are mostly just past the setup point and nothing more. I specifically tested on both 1.29 and 1.35, since I figured Parsoid might make a difference (not that it should, since this is at the pre-processor level, but I figured it was a good idea to try both). I haven't updated it in a while, so I don't have anything more recent installed yet.
- I'm not sure I follow your logic on what should be stripped, because you would think that stripping one character would either strip the < off of the <nowiki> or, if it had parsed that properly, it would strip the T from This, not the t from test. I'm assuming that's an artifact of the strip item process, though, so the nowiki section gets ignored entirely. – Robin Hood (talk) 07:57, 23 October 2022 (UTC)
- Oh and to answer your question about versions, both wikis have ParserFunctions 1.6.0. – Robin Hood (talk) 07:59, 23 October 2022 (UTC)
- Note that Parsoid has no effect on that. This is purely a bug inside the implementation of the "string functions" extension (which is not supported directly by Wikimedia wikis and core MediaWiki developers). Instead, Wikimedia uses the supported Scribunto extension and implements these functions in Lua (note that Translatewiki.net still does not support Scribunto/Lua...).
- The effect of "#sub" is very weird; avoid it as much as possible on your wikis! (Note that in Lua, string indexing starts at 1, whereas in string functions, string indexing starts at 0.)
- If we assume that "#sub" uses string indexing starting at 0, then "<nowiki>This is a </nowiki>test" is first "HTML-stripped" into "test", and #sub then returns the substring starting at position 1, i.e. drops the first character "t" and returns "est". If string functions were not doing this "HTML stripping", the result would be "nowiki>This is a </nowiki>test", dropping only the first "<".
- I could test it in a sandbox page of Translatewiki.net, and visibly #sub in string functions really uses string indexing starting at 0; it first trims its string parameter, then drops all HTML-like or XML-like elements (including "nowiki", even though it's not really HTML or XML) **with** their content, before computing and returning the substring. Because whitespace trimming is performed before "HTML tag stripping", if you want to disable the whitespace trimming of the parameter, you can surround the value with "<nowiki/>", so:
- "ABC{{#sub:<nowiki/> DEF <nowiki/>|1}}XYZ" returns "ABCDEF XYZ"
- "ABC{{#sub:<nowiki/> <br>DEF <nowiki/>|1}}XYZ" returns "ABCbr>DEF XYZ" (so the "HTML stripping" is not real, apparently it just strips "nowiki" opening and closing tags, **after** the initial whitespace trimming of the argument string) Verdy p (talk) 08:04, 23 October 2022 (UTC)
- As I said, I wouldn't have expected Parsoid to affect the results, since parsing the parameter itself is entirely at the preprocessor level. There was a lot that changed in 1.35 besides Parsoid, though, so I figured it made sense to check both.
- As for what's supported by MediaWiki, it's been my experience that they don't seem to realize that not everybody is on the same update cycle they are or running all the same extensions as they are. Even so, at this point, Scribunto and ParserFunctions are both optional. Until they're a required part of the install process, I would expect WMF to support anything that they're distributing. I just checked and at least as of 1.38.1, both are being distributed as optional components. – Robin Hood (talk) 08:15, 23 October 2022 (UTC)
- You got me curious, so I looked at the version in 1.38 and now I see what's going on. Firstly, it's using the older Parser Function syntax where it parses all of the parameters first and then passes them along to the function that's handling that specific parser function, in this case runsub. So, if I recall correctly, that means the input to the function is converted to "<stripmarker>test". The very first thing runsub does is call killMarkers, so now it's left with just "test". From there, it's obvious why it produces "est".
- Edit: I see you've updated your reply with similar info. At least now we understand. And I agree, for straight text, #sub is fine, but for anything out of the ordinary, avoid #sub at all costs. – Robin Hood (talk) 08:33, 23 October 2022 (UTC)
- You can see the same results at en.uesp.net (which is MW 1.29.3) and starfield.wiki.net (which is on 1.35.2). – Robin Hood (talk) 08:06, 23 October 2022 (UTC)
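The killMarkers explanation above can be illustrated with a toy simulation (Python, not MediaWiki's actual code; the marker text below is a made-up stand-in, not MediaWiki's real strip-marker format):

```python
# Why {{#sub:<nowiki>This is a </nowiki>test|1}} yields "est".
STRIP = "\x7fUNIQ-nowiki-1-QINU\x7f"  # stand-in for a preprocessor strip marker

# 1. The preprocessor replaces the <nowiki>...</nowiki> span with a marker,
#    so the parameter #sub actually receives is:
param = STRIP + "test"

# 2. runsub calls killMarkers, which deletes the marker *and* everything
#    it stood for:
def kill_markers(s):
    return s.replace(STRIP, "")

# 3. #sub then takes the substring starting at index 1 (0-based):
print(kill_markers(param)[1:])  # est
```

This is why the behavior only looks like "HTML stripping": the nowiki span was already gone before #sub counted any characters.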
How to compare text strings ?
Hello!
I am looking for a way to compare two text strings alphabetically in template code. I want to add a parent bilateral-relations category; these are in the format of e.g. commons:Category:Relations of Bangladesh and Myanmar (note the alphabetical order: Bangladesh < Myanmar). What I would like to do is something like this:
{{#parsoid\0fragment:1}}
Of course this does not work, because #ifexpr compares numerical expressions, not alphabetical ones. The way I am currently doing it is (simplified):
{{#parsoid\0fragment:2}}
The trouble is that if parameters are in the reverse alphabetical order (e.g. {{{1}}} is Myanmar and {{{2}}} is Bangladesh) and there is a category redirect (e.g. commons:Category:Relations of Myanmar and Bangladesh softly redirects to commons:Category:Relations of Bangladesh and Myanmar), then this code adds the redirected category instead of the target one.
Does anyone have any idea? The template in question is commons:Template:Aircraft of in category. Place Clichy (talk) 09:19, 31 October 2022 (UTC)
- There's no builtin support in "#ifexpr:" or "#if:" for comparing strings. You need another parser function; for example you can use "#invoke:" via Scribunto to call a function defined by a Lua module. Note however that Lua basically performs a lexicographic comparison of strings with its "<" operator: it does not trim them, does not parse any HTML (not even HTML comments that may be present in parameters), does not convert HTML character entities, does not normalize strings, and does not perform any UCA collation (so "é" would sort *after* "f" and not between "e" and "f").
- You may want to call:
{{#ifeq: {{{1|}}} | {{{2|}}} | <!-- empty --> | {{#ifexpr: {{#invoke:Modulename|compare|{{{1|}}}|{{{2|}}}}} < 0 | {{#ifexist: Category:Relations of {{{1|}}} and {{{2|}}} | [[Category:Relations of {{{1|}}} and {{{2|}}}]] | [[Category:Relations of {{{2|}}} and {{{1|}}}]] }} | {{#ifexist: Category:Relations of {{{2|}}} and {{{1|}}} | [[Category:Relations of {{{2|}}} and {{{1|}}}]] | [[Category:Relations of {{{1|}}} and {{{2|}}}]] }} }} }}
- However, this does not resolve the redirects (the #ifexist calls give false hints). For that you need a Lua module that can not only test the effective existence of either link, but also detect whether one is a redirect and get its target (it has to load the page and parse its beginning, because MediaWiki still does not expose in Lua whether a page is a redirect and what its target is; MediaWiki internally parses pages, detects that, and maintains it in a cache used when loading any page name via links, but it does not index that information in an accessible way; loading and parsing the page manually in Lua is a bit costly and error-prone due to the MediaWiki syntax).
- So the best you can do is to use your template with parameters 1 and 2 and not perform any test on them, but then update the page containing transclusions of your template to pass the explicit parameter values in the correct order (and swap that order if one is a redirect).
- There are other caveats: parameters 1 and 2 may contain disambiguation suffixes that may be removed in the binary relation (e.g. "Paris, Texas" and "Austin, Texas": would you name your category "Relations of Austin, Texas and Paris, Texas", or "Relations of Austin and Paris (Texas)"?). Beware that naming pages automatically is tricky: there are frequently "aliases" (e.g. "Relations of France with the United States" or the reverse; note that there may be other ways to express the combination), and some preferences may change over time (or will need to take into account some decisions, not always the same between countries or languages, and sometimes conflicting). As well, you have to manage the possible insertion of articles (like "the" in English) before some entity names, which may not be present when entity names are used alone in page names (e.g. with "United States": "Relations of France and the United States", "Relations of the United Kingdom and the United States", "Relations of the United States and Vietnam").
- Such binary relations with arbitrary combinations should be avoided: they explode combinatorially and are a nightmare to maintain (e.g. for 200 countries, you get almost 20,000 unordered pairs, and most of them will be empty; for ternary relations you'd reach over 1.3 million!). They should be created manually and added individually where relevant. Verdy p (talk) 09:52, 31 October 2022 (UTC)
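Setting the redirect problem aside, the alphabetical-ordering part of the question is a one-liner once string comparison is available. A Python sketch (the helper name is hypothetical; a Scribunto module would do the same in Lua, and a real template must still handle the redirects discussed below):

```python
def relations_category(a, b):
    """Build the 'Relations of X and Y' category name with the two
    country names in alphabetical (plain lexicographic) order."""
    first, second = sorted((a, b))
    return f"Category:Relations of {first} and {second}"

# Both argument orders map to the same category name:
print(relations_category("Myanmar", "Bangladesh"))
# Category:Relations of Bangladesh and Myanmar
```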
MediaWiki still does not expose in Lua if a page is a redirect and what is its target
- This is blatantly false; the mw.Title library supports finding if a title is a redirect and what title it redirects to, as seen with e.g. w:Template:Target of. 「ディノ奴千?!」☎ Dinoguy1000 11:13, 31 October 2022 (UTC)
- Interesting to know, because the last time I checked, there was no such function in the Scribunto library. So it was added only after many years of asking for it (yes, I know it was present in the internal PHP API, but it was not there originally, and pages had to be parsed to find whether a page was a redirect and find its target; this was added to accelerate navigation, because of course the MediaWiki parser could store the result when parsing a saved page)!
- Also, please moderate your terms and avoid such a fast, unconsidered reply in your first sentence. For many years we had to use a workaround for that (for example on Commons) because there was no such builtin support. And remember that this question is essentially about categories on Commons, rather than the English Wikipedia. Verdy p (talk) 13:26, 31 October 2022 (UTC)
isRedirect
has been part of Lua since at the latest March 2013;redirectTarget
dates to May 2016 (phab:T68974). Hardly "recent" on either account, when we've only had Scribunto/Lua since 2012 or so.Also please moderate your terms and avoid such fast unthought reply in your first phrase.
- Given basically everything I've seen you say/do, and my own past interactions with you, I think I won't, thanks. 「ディノ奴千?!」☎ Dinoguy1000 14:13, 31 October 2022 (UTC)
- Actually, it doesn’t really matter whether Scribunto provides information on what MediaWiki thinks to be a redirect; it won’t catch category redirects using c:Template:Category redirect anyway. Category redirects are rarely if ever real redirects. Tacsipacsi (talk) 14:19, 31 October 2022 (UTC)
- On Commons there are redirects on categories. Especially those given in the example above.
- There's a workaround actually used on Commons that can also detect soft redirects on categories and find their targets (it cannot use "redirectTarget", which is also very costly, just like almost all functions in the "mw.title" module in Lua, and does not work in practice due to its severe limitations). This still requires parsing category pages, because there's still no support in MediaWiki for them (by some extension?). I've tried "redirectTarget", and yes, your suggestion does not work and is not a correct reply to the request made above, so my reply was correct (absolutely not "blatantly false" as you said in your abusive reply).
- And if you (Dinoguy1000) don't want to moderate your terms in direct reply to a thread where you were not involved or cited at all, then you are clearly abusing the contributor terms, because you don't provide any help to anyone, and you are here just to cause trouble. Verdy p (talk) 15:01, 31 October 2022 (UTC)
- Thanks for the input! I guess that my question is now: is there an available Lua-coded function which compares two text strings alphabetically e.g. is A < B? User:Verdy p mentioned that Lua basically performs a lexicographic comparison of strings with its "<" operator but I'm not sure how I can use this operator, and writing a Lua module entirely for that seems overkill and out of my reach.
- The suggestion of putting parameters in the right order in the first place is not feasible, as the template does other things too. Obviously Aircraft of Brazil in France (populated by
{{Aircraft of in category|Brazil|France}}
is not the same as Aircraft of France in Brazil (populated by{{Aircraft of in category|France|Brazil}}
; however, both should be in the same parent Category:Relations of Brazil and France. - I do not really intend to test for category redirects. Category redirects are always soft redirects (either on Commons or the English Wikipedia), so they're hard to track.
- The article before the country name is managed by {{CountryPrefixThe}} and that works well.
- Re: other suggestions, there is in fact an implicit assumption that this one template will only be used for country names found in commons:Category:Bilateral relations and its subcategories. There may therefore be no need to clean HTML formatting, disambiguators and the like. In case the bilateral relations category does not exist, the template's code catches this in a maintenance category and the category can be created manually. Of course, there are some cases that cannot be entirely foreseen, such as the inconsistent use of China vs. People's Republic of China in the bilateral relations category tree, but they can, or have to, be managed manually.
- My main concern really is the management of these category redirects related to alphabetical order. Place Clichy (talk) 19:04, 31 October 2022 (UTC)
- Lexicographic comparison means that it only compares the texts byte by byte (they are UTF-8 encoded). Lua strings themselves do not directly handle Unicode and the related UCA collation.
- MediaWiki provides an API in the module "mw.ustring", which adds some support for Unicode, but no comparison operator or UCA collation for now (what it supports is the concept of Unicode "code points", so that a single code point may be encoded as several UTF-8 bytes, and positions for substrings are counted in code points, aware of their variable encoding length; it also provides support for normalization, as well as the case conversion needed by MediaWiki for its builtin basic parser functions "lc:", "uc:", "lcfirst:", "ucfirst:"; note that it does not perform any MediaWiki parsing, so it's up to the caller to manage trimming).
- So for now there's no collation in mw.ustring, and no function you can call from it to compare strings. Some modules have defined a "weak" collation algorithm for sorting. But still, this won't be sufficient for your need on Commons, because there's actually NO standard for now fixing the order in category names between "Relations of A and B" and "Relations of B and A". So you'll end up having categories redirecting from one to the other (using
{{Category redirect|Target name}}
: you need to parse the target page in Lua to detect these templates and find the target you'd like to link to (and that will solve the ordering problem without needing any collation, and also take into account the problem of variable disambiguation suffixes that may be needed in category names). - As you see, there's no "simple" instantaneous solution. This requires code and tests, navigating all the categories you'll want to link to and finding how they are effectively named.
- There's a module on Commons for that, "Module:Redirect", and other modules can also help you manage category redirections.
- One example is "Module:Countries", which performs such detection of redirects (plus handles known "aliases" for category names that don't always need a disambiguation suffix or a leading article), and also provides a basic collation for its listed items (note that they are ordered by their *translated* names, found in Wikidata; that order is "crude", but so far it has been sufficient, even though technically it's still not fully UCA-compliant and cannot handle collation orders depending on the language used: the order is locale-neutral, similar to the UCA DUCET, except that it sometimes needs tweaks, notably for Chinese, where the order of items can be tuned, or for Germanic languages that sort letters with diacritics as primary letters at the end of their alphabet rather than as secondary variants: such tweaking of the order is done in data modules). Verdy p (talk) 19:19, 31 October 2022 (UTC)
What is the ParserFunctions programming language?
Which programming language do the functions in ParserFunctions use? Sokote zaman (talk) 10:06, 1 November 2022 (UTC)
- The #time function uses PHP's datetime format codes, except that it also defines extra functionality through "x"-prefixed codes.
- The #expr function uses some custom language. Its operators are similar to the ones used in SQL (hence a single equals sign for equality). C.Ezra.M (talk) 10:56, 1 November 2022 (UTC)
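To make the distinction concrete, here are two illustrative calls (an editor's examples, not part of the original replies; the stated results assume the default English locale):

```wikitext
{{#time: Y-m-d | 1 November 2022 }} <!-- PHP-style format codes: produces 2022-11-01 -->
{{#expr: 2 * 3 = 6 }}               <!-- #expr's own syntax: a single "=" tests equality, producing 1 -->
```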
- Thank you for your reply
- What language do other functions use?
- Thanks Sokote zaman (talk) 16:07, 1 November 2022 (UTC)
- Also:
- Thank you for your reply.
- Thank you Sokote zaman (talk) 05:16, 2 November 2022 (UTC)
Cannot subst string functions?
RESOLVED | |
StringFunctions is not enabled on Commons. |
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
{{Countries of Asia|prefix={{subst:#sub:{{subst:FULLPAGENAME}}|0|{{subst:#pos:{{subst:FULLPAGENAME}}| in }}}}}}
for example, I used this string in c:special:diff/702219712. Why is the substitution of the string functions not done? RoyZuo (talk) 15:26, 3 November 2022 (UTC)
- You're just trying to use an uninstalled extension: "#sub:" and "#pos:" are part of the "string functions" extension, which is not available on Commons (and not even needed in the page you tried to edit). Commons uses the basic parser functions, and some other extensions, but not this one.
- Before using "subst:", first check that your edit works without it; add "subst:" only after tests. Verdy p (talk) 15:51, 3 November 2022 (UTC)
- Obviously I'm just using that page to test, not intending to use the string on that page. Can't you even understand that? RoyZuo (talk) 15:58, 3 November 2022 (UTC)
- Just try in a local sandbox or in a preview:
{{#sub:ABC|1}}
on Commons, and you'll see that it is not recognized (and left unexpanded, so you cannot use "subst:" on such an invocation if the extension is not installed and supported on the wiki). - The "string functions" (i.e. "#sub:", "#pos:", "#len:") are not part of the core parser functions (this page is about the core ParserFunctions, which does not include these functions). Commons however allows you to use another extension, Scribunto, which supports string functions in a more advanced module.
- The situation may differ between wikis (e.g. the "string functions" work on Translatewiki.net, which, by contrast, does not support the "Scribunto" extension for calling Lua modules; the "string functions" are documented on another subject page, not this one).
- First identify the extension defining the function, and then check if it is deployed on the target wiki (there's a summary matrix in MediaWiki about deployment of extensions, but you can also look at the "Special:Version" page on the target wiki). Verdy p (talk) 16:06, 3 November 2022 (UTC)
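For completeness (an editor's sketch, not part of the original exchange): on wikis that have Scribunto plus a copy of the widely reused "Module:String", rough equivalents of "#sub:" and "#pos:" are the module's sub and find functions:

```wikitext
{{#invoke:String|sub|Category:History of Asia|1|8}}  <!-- returns "Category" -->
{{#invoke:String|find|Category:History of Asia|of}}  <!-- returns the 1-based position of the first "of" -->
```

The exact parameter handling (whitespace trimming, pattern vs. plain matching) differs from the StringFunctions extension, so test in a sandbox before relying on it, and especially before substituting.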
How to configure the list of full month names?
There is a small problem. {{#time: F | 2022-1-1 }}
will produce the result Tháng 1
on Vietnamese Wikipedia, but the community wants to localize the result to tháng 1
(first character is lowercase). How can we configure the list of full month names for the Vietnamese language? Thanks! Plantaest (talk) 11:10, 11 December 2022 (UTC)
- "#time" and "#timel" are part of the core ParserFunctions extension, prebundled with MediaWiki.
- Its localisation is part of that extension itself.
- See Extension:ParserFunctions.
- Note that, per that page, its translation is made on translatewiki.net (in the ext-parserfunctions message group). Once translated there, messages are later synchronized to the MediaWiki repository, and a bit later deployed to the wikis (this usually takes about one week, but may be more).
- However, that message group does not contain month names (or weekday names), only a few usage error messages: instead, these month/weekday names (and other related date/time items and formats, with support for many calendars) seem to be imported into the MediaWiki project from CLDR.
- But you should look at how they are translated in CLDR. See https://cldr.unicode.org/translation/date-time/date-time-names for the description, and then look at data charts (on https://unicode-org.github.io/cldr-staging/charts/latest/by_type/date_&_time.fields.html) and search for
·vi·
in the web page with your browser (you'll see that they were vetted in CLDR with leading capitals). You may attempt to contact the CLDR project (or participate in it, including in its own local discussions) if Vietnamese people think that such forced capitalization should not be used in Vietnamese (this capitalization is not a requirement; it is not forced in many other languages). - If CLDR rejects the request, but the Wikimedia community wants the change anyway, MediaWiki may override these values inside its data sources in PHP: ask on Phabricator if you need such overrides (if you make such a Phabricator request, they will probably first check the opinion of the CLDR technical committee and its vetters, but getting a reply from them can take long: new CLDR data releases occur only about yearly, the vetting process has always been very slow in reaching a consensus for changes, and sometimes a single review cycle is not sufficient, so it can take several years to reach the minimum vetting quorum needed for such changes; but the CLDR TC may sometimes decide faster than the vetters). Note also that CLDR data is not used just by MediaWiki: it is used for the localisation of a lot of software (including, for example, standard C libraries, and i18n components inside operating system APIs). However, a request changing *only* the capitalization should be easier to obtain (there's likely no major technical issue). Verdy p (talk) 11:47, 11 December 2022 (UTC)
- Thanks Verdy p. I saw the Vietnamese translation of the ParserFunctions extension, and I think the list of full month names is not in the translation. In
ParserFunctions.php
, line 511 shows that #time
depends on the sprintfDate
function, and I don't know how this function works. Plantaest (talk) 12:07, 11 December 2022 (UTC) - sprintfDate is actually part of MediaWiki core (a method of its Language class), not of PHP itself; it uses localisation data derived from CLDR (directly inside its sources, or indirectly via some OS API, I don't know). Also I don't know if the developers have given a way to provide overrides. Generally, date elements and formats are so common across software that i18n libraries have stopped maintaining such data themselves; this is done in CLDR, which supports the vastest set of languages and scripts, with maximal interoperability across systems (and with fewer risks of serious errors or ambiguities that could cause real problems, e.g. when processing data containing dates/times).
- Today, CLDR data is the de facto standard used in a lot of systems for such very common things, because it is technically "clean" (this does not mean that it is "complete" or the only possible choice for localization: CLDR is not an absolute requirement, and many software projects can still provide overrides, or ignore some parts of the data completely, but then they have to maintain these overrides themselves and bear the risks).
- For some i18n libraries, however, such CLDR updates can take many years... or will never happen if the software is not updated, notably applications statically linked with C standard libraries on systems that are not themselves updated (there are, for example, security-critical components implementing well-known protocols like HTTP and MIME, where any change is blocked and will not happen just because CLDR vetters preferred something else; this generally concerns English date formats, documented and stabilized in IETF RFCs or standards from ISO, W3C, ITU, and other international standard bodies: the CLDR TC acknowledges that, and when needed provides a few specific "technical locale codes", such as "root" or "POSIX"). Verdy p (talk) 12:09, 11 December 2022 (UTC)
- Thanks again, I will follow your instructions. If changing data of CLDR is difficult, I think I will add
lowercase()
function in some wiki templates or Lua modules. Plantaest (talk) 12:20, 11 December 2022 (UTC) - But you should still create a ticket for that on Phabricator (which will also allow tracking an identical request made to CLDR, and then following the decisions and updates: if an override used in MediaWiki becomes unnecessary later because CLDR accepts the change, the tracker will help with cleaning up data that we likely don't want to maintain alone). Note that a well-documented community decision in Wikimedia is not ignored by the CLDR TC: they honor requests backed by evidence; the CLDR data is just the current best "state of the art" they know about (that's why it includes a "vetting" process open to everyone in the world: that data CAN change later).
- Note: be careful in templates or modules to NOT force a
lowercase()
call or a similar {{lc:}}
ParserFunction call without knowing precisely the language to which it applies (it must not apply to English or German, for example, nor to "default" options covering all the languages you're not currently supporting in these templates/modules: do it specifically for the Vietnamese languages and varieties you support). And be prepared to support users of your wiki who may complain that some other templates or pages no longer work as expected, showing a lowercase initial instead of the expected leading capital: announce the change in a public area, and document in an appropriate place how to simply solve that "problem" (e.g. with a simple {{Ucfirst:}}
in wiki pages and templates). - You don't want to see others reverting your change, flaming you (or, worse, entering an edit war and blocking you, independently of the efforts you took to check and test as many affected pages as you could find). One way to avoid that risk is to first add a tracking category in the templates/modules before applying the change, and look at its existing usage on your wiki (see the "Special:What links here" tool). There may be some limited temporary quirks when applying the change, but explain that affected pages are being updated to take the change into account, and fix the pages that can't work instantly with some compatibility code you've prepared: this takes some time (maybe several days to do it correctly, with enough care and not in haste, which risks introducing new errors), so ask people to be patient and even to help you in the process (quirks that can happen include, for example, pages autocategorized with some date element: the change may affect how a target category is named; other affected names could be the names of subpages; in some cases it may help to add some redirection pages to make a smoother transition, with fewer people complaining or hostile to your change). See for example how the "c:" interwiki prefix was added a few years ago for Wikimedia Commons, instead of just "commons:", which unfortunately was also one of its local namespaces: it took some time of preparation to detect affected pages, to document the changes and how to fix some remaining easy-to-fix bugs that might persist undetected, and to hear the discussions about how to perform that change with less friction. Verdy p (talk) 12:25, 11 December 2022 (UTC)
- Thanks for the advice. I will be careful with changes! The Vietnamese Wikipedia community is very kind; just discuss first, everyone will follow. (I'm not good at English, so I can only write basic sentences) Plantaest (talk) 22:00, 11 December 2022 (UTC)
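A minimal interim workaround (an editor's sketch, not discussed in the thread above) is to wrap the call in the core {{lcfirst:}} magic word, which lowercases only the first character of its argument:

```wikitext
{{lcfirst:{{#time: F | 2022-1-1 }}}}  <!-- "Tháng 1" becomes "tháng 1" on Vietnamese-language wikis -->
```

Unlike a CLDR or MediaWiki data change, this only affects the places where it is explicitly applied, so no other pages can break.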
Helpful magic words needed
The Commons template "Information" and all its derivatives need parameter values for <code>date</code> and <code>author</code>.
Parser functions or magic words to generate these values would be very helpful.
Currently some users take "<code><nowiki>~~~~~</nowiki></code>" as a workaround for the date; but this needs correction, because a date consists of year-month-day and is not a timestamp, which also includes the time of day.
For the author a valid user name is needed, not a signature as created by <code><nowiki>~~~</nowiki></code>; nevertheless some users take it as a workaround, because no better possibility exists. But it causes more trouble and needs cleanup, because an author should be a Wikipedia user, not their talk page.
Possible names for these functions may be e.g. ~date and ~user, or ~~~d and ~~~u. -- sarang♥사랑 14:04, 30 December 2022 (UTC)
- Date can be inserted via some variant of
{{ subst:#time: F j, Y | now }}
, which can be included in whatever default skeletons are provided for copy-pasting to quickly fill out on pages. Username is a slightly more interesting case, assuming the syntax automatically substitutes when the page is saved, as ~~~~
et al do, but this is the wrong place to request such an addition; you'll need to file a feature request on Phabricator. 「ディノ奴千?!」☎ Dinoguy1000 21:42, 30 December 2022 (UTC) - For the username, one can use
{{subst:REVISIONUSER}}
(although I don’t think ~~~
is wrong; different users write their names differently). For the date, it should be {{subst:#time:Y-m-d}}
(ISO 8601 format) so that it can be formatted in the user’s interface language. (The |now
part is unnecessary, as “now” is the default anyway if nothing else is specified.) Tacsipacsi (talk) 01:54, 31 December 2022 (UTC)
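Putting the two suggestions above together (an editor's hedged sketch; the parameter names follow the Commons "Information" template), a copy-paste skeleton could look like:

```wikitext
|date={{subst:#time:Y-m-d}}                                    <!-- substitutes the current date in ISO 8601 -->
|author=[[User:{{subst:REVISIONUSER}}|{{subst:REVISIONUSER}}]] <!-- substitutes the saving user's name -->
```

Note that REVISIONUSER expands to the last editor of the page, so this is only reliable when inserted at page creation, not when added to an existing page.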