User talk:GraphBot

Comparison to graphDataImport

Maybe there could be a See also wikilink to User:Tomastvivlaren/graphDataImport and possibly some explanation (could also be here on the talk page) about the differences between these two? Prototyperspective (talk) 12:59, 5 June 2025 (UTC)

I added the see-also link. This bot aims to be completely automated and can only support a subset of the graph template (i.e. anything the chart extension doesn't support, the bot cannot support either). Using this bot simply involves invoking en:Template:PortGraph (on enwiki) with the same parameters as en:Template:Graph:Chart plus a name attribute and optionally a title attribute; the bot then takes care of the rest. The user script doesn't require waiting for the bot, but doesn't automatically use en:Template:ChartDisplay. GalStar (talk) 17:35, 5 June 2025 (UTC)
Thanks for explaining and adding the link! Would you say it would make sense to let this bot convert most graphs automatically and then use the graphDataImport script to semi-manually import all the graphs that couldn't be imported that way? For example, will this bot actually import all the graphs in practice? By the way, you may be interested in Wikipedia:WikiProject Data Visualization which is looking for more participants and where possibly some link could be added. Prototyperspective (talk) 17:52, 5 June 2025 (UTC)
Yes, there are likely edge cases that I have not covered (as well as non-English language support) that graphDataImport would be better at. At the moment I would say I have basically full coverage of en:Template:Graph:Chart, but of course, there are others (the pie chart template, etc.). I ignore a few attributes, like the angle of the labels on the x-axis; the main issue is the lack of control over the height of the output graph. However, I've done a lot of research into the limitations of the new chart extension and think that I cover some cases better than graphDataImport can. Specifically, I use en:Template:ChartDisplay, which allows for width customization. I'll definitely bring this bot up on that talk page, thanks for the link. GalStar (talk) 18:02, 5 June 2025 (UTC)

@GalStar and Prototyperspective: Good summary! Key advantages of GraphBot are that it can generate replacement wikicode for enwiki and maybe soon dewiki, and that it reduces the need for manual work substantially. At least in theory. What other use cases did you have in mind that GraphBot currently supports? There are a few differences in the output that might be worth discussing. Should they be harmonized? Here's a comparison example: https://commons.wikimedia.org/w/index.php?title=Data:Crude_Oil_Production.tab&diff=1044378605&oldid=1042988832 (Tab page differences) and https://commons.wikimedia.org/w/index.php?title=Data:Crude_Oil_Production.chart&diff=1044377875&oldid=1042988852 (Chart page differences). This includes things like:

  • Creative Commons license
  • Description
  • Column keys/names
  • Categories
  • Edit summaries
  • Whether legend should be hidden when there is only one curve (see a workaround)
  • Whether x-axis years should be numbers or strings
  • Probably also how Graph:Chart or Graph:Port types are mapped to .chart types. (See mapping)

Neither of the tools can currently import graph:lines. A working solution has been developed for GraphDataImport, but I'm holding off on releasing it until the filter transform, which can represent graph:lines series, works. One rather important thing I miss in GraphBot-generated data pages is a reference to the source Wikipedia article version, included in the Commons edit summary and/or in the .tab file's "sources" field. Tomastvivlaren (talk) 23:13, 15 June 2025 (UTC)
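To illustrate the point about the "sources" field, here is a minimal sketch of how a .tab page could record the source article version. It is not how either tool does it. The helper names (permalink, build_tab_page), the description and sources wording, and the revision ID in the usage note are all hypothetical, and the page layout reflects my understanding of the Commons tabular data format rather than anything confirmed in this thread.

```python
import json
from urllib.parse import urlencode


def permalink(article_title, rev_id, host="en.wikipedia.org"):
    """Build a permanent link to one specific revision of an article."""
    query = urlencode({"title": article_title.replace(" ", "_"), "oldid": rev_id})
    return f"https://{host}/w/index.php?{query}"


def build_tab_page(article_title, rev_id, columns, rows, license_id="CC-BY-SA-4.0"):
    """Assemble a .tab page as a dict, citing the source revision in "sources".

    columns is a list of (name, type) pairs; rows is a list of data rows.
    The wording of the description and sources strings is only an example.
    """
    url = permalink(article_title, rev_id)
    return {
        "license": license_id,
        "description": {"en": f"Data imported from {article_title}"},
        # Assumption: "sources" holds wikitext, so an external link works here.
        "sources": f"Copied from [{url} {article_title}] (revision {rev_id})",
        "schema": {
            "fields": [
                {"name": name, "type": kind, "title": {"en": name}}
                for name, kind in columns
            ]
        },
        "data": rows,
    }
```

Calling build_tab_page("Crude oil", 123, [("year", "number"), ("production", "number")], [[2000, 1.0]]) with a placeholder revision ID of 123 and passing the result to json.dumps gives text that could be saved as the data page; the same permalink could also go into the edit summary.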

Thanks so much for taking a close look at this; it's good to be aware of these differences, and we should definitely reduce them.
  • I see that graphDataImport uses CC-BY-SA-4.0 on the tab file and CC0-1.0 on the chart file. I'm curious as to why, since the chart file could include snippets of text from Wikipedia, which would be CC-BY-SA-4.0 licensed.
  • I'm working on a fix for the description/edit summaries; this was helpfully pointed out by a fellow Wikipedian (who also caught a licensing issue).
  • The sources field was something I was not aware of, and I am fixing it at the moment. Great catch. Ditto with categories.
  • I have mixed feelings about hiding the legend, as it often clarifies naming (sometimes the graph name is "TFR", which means "Total Fertility Rate"; the legend uses the longhand while the title uses the shorthand).
  • I'm curious about the x-axis years question. I intentionally went for numbers, due to how I interpret templates.
As for the graph:lines series, we should coordinate on the transforms we're using. GalStar (talk) 01:44, 17 June 2025 (UTC)
I've fixed many of these issues; these diffs should be the most up-to-date:
Data:India Natural Growth.tab and Data:India Natural Growth.chart GalStar (talk) 17:52, 17 June 2025 (UTC)
And for the description issue, I actually do generate one if there is a "description" parameter passed in; it seems that hasn't been done. GalStar (talk) 01:50, 17 June 2025 (UTC)
Sorry for all the pings, but I think that leaving the field names the same as they were in the template is preferable to equating them to the title, mainly for readability and to keep the translation literal. If you'd like, though, I could rather easily change them to simply be equivalent to the title. GalStar (talk) 01:57, 17 June 2025 (UTC)
Thank you for the helpful answers!
  • Representing years as strings was a misunderstanding on my side. It does not appear to change anything.
  • I am not sure about the CC licensing issue. See this discussion. But if we use CC-BY-SA-4.0 we are required to state the source Wikipedia page version.
  • A couple of people have complained about legends when there is only one curve. As for whether it should be hidden, we could perhaps look at the graph legend parameter? Or include the yTitle in the .tab description and .chart title? Or hide it if the title is y, y1 or empty? Or if there is only one curve?
  • I have no strong opinions regarding how to label the columns. Tomastvivlaren (talk) 18:25, 18 June 2025 (UTC)
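One possible way to reconcile the legend options above, sketched in Python purely for discussion. It combines two of the suggestions (only one curve, and a series title of y, y1 or empty) so that a meaningful single-series legend such as "Total Fertility Rate" is kept. The function name and its parameters are hypothetical, and neither tool is confirmed to work this way.

```python
def should_hide_legend(series_titles, placeholder_titles=("", "y", "y1")):
    """Return True when the legend would add no information.

    Assumed rule: hide only if there is exactly one curve AND its title is
    a placeholder (empty, "y" or "y1"). Any other case keeps the legend.
    """
    if len(series_titles) != 1:
        return False
    # Treat a missing title (None) the same as an empty one.
    only_title = (series_titles[0] or "").strip().lower()
    return only_title in placeholder_titles
```

With this rule, should_hide_legend(["y1"]) is True, while should_hide_legend(["Total Fertility Rate"]) and should_hide_legend(["y1", "y2"]) are both False. An explicit legend parameter on the original graph, if present, could still take precedence over the heuristic.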