Commons:Bots/Work requests
This is a page for requesting work to be done by a bot. It is also an appropriate place to simply put ideas for bots. However, be aware that various tools available to all users can be used to accomplish the work without the need for a bot.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.
# | Bot request | Status | 💬 | 👥 | 🙋 Last editor | 🕒 (UTC) | 🤖 Last botop editor | 🕒 (UTC) |
---|---|---|---|---|---|---|---|---|
1 | US GOV accounts on Flickr | | 8 | 6 | Ooligan | 2025-05-02 03:05 | TheSandDoctor | 2025-03-13 19:18 |
2 | Commons:IA books | | 3 | 2 | ShakespeareFan00 | 2025-07-13 07:32 | TheSandDoctor | 2025-07-13 05:52 |
3 | Brenham Banner-Press newspaper files | | 2 | 2 | TheSandDoctor | 2025-07-13 05:57 | TheSandDoctor | 2025-07-13 05:57 |
4 | Donating map images | | 3 | 2 | Wikiwerner | 2025-06-22 11:13 | | |
5 | Newly available MOHAI high-resolution versions of UW Library photos (already uploaded in low-res to Commons) | | 1 | 1 | PK-WIKI | 2025-05-19 23:27 | | |
6 | Remediation of unintended copyright violations | | 2 | 2 | Antti T. Leppänen | 2025-06-01 08:02 | | |
7 | Tidy up Quality images of manhole covers | | 1 | 1 | Lvova | 2025-06-06 22:57 | | |
8 | Changing Structured Data claims en-masee? | | 2 | 2 | 999real | 2025-07-13 18:43 | | |
9 | Malformed dates | | 3 | 2 | Pigsonthewing | 2025-07-15 07:43 | | |
US GOV accounts on Flickr
I am requesting an upload of all USGOV Flickr accounts.
Unfortunately, many of them are locked behind the copyright tag. The "C" copyright tag is (unfortunately) the default on Flickr, and most likely it was never changed to the proper tag for public-domain US Government works. Switching to that tag has to be done manually, which apparently nobody ever did.
I say this because there was a recent change in administration that seems aimed at gutting the government, including shuttering USAID. I'm just concerned the Flickr images will get deleted. Thank you. SeichanGant (talk) 17:57, 17 February 2025 (UTC)
- I would suggest asking for a bot to review the account and if there is a copyright tag, make a list somewhere on site and let a human examine it. Otherwise download the details. Leave the bot operator to handle the bot tasks and let us humans do what we can. Ricky81682 (talk) 20:33, 17 February 2025 (UTC)
- Flagging to Don-vip's attention as Don-vip runs OptimusPrimeBot and rather than someone like myself re-inventing the wheel, this might be a task suited to its skillset. TheSandDoctor (talk) 19:18, 13 March 2025 (UTC)
- It's tricky. Sometimes the works are public domain because they were made by federal employees and are wrongly licensed as copyrighted on Flickr. But often they really are copyrighted: even when published on an official Flickr account, content may have been created by someone else. So the license must be checked individually for each file, which is tedious. For now my bot imports a lot of US Gov pictures released under a free license, and I would like to categorize all these pictures before starting to look into the "copyrighted" ones. Help appreciated :) vip (talk) 23:09, 13 March 2025 (UTC)
- @vip I fully understand that you would like to fix what is already on Commons before more stuff is uploaded. But we have sadly seen Trump remove lots of images, so if any accounts are at risk of removal then perhaps it would be good to upload those and worry about categories later. I would of course recommend some sort of human evaluation before the files are uploaded, so that files are only uploaded if most of the files from the stream are likely to be usable and to have a valid license. --MGA73 (talk) 15:12, 1 May 2025 (UTC)
Commons:IA books
There is a need for a bot to identify archive.org hosted items in the public domain which are linked to but not uploaded as local copies (licensing permitting).
Would it be possible for a bot to scan through Commons and generate a report of linked but not hosted IA resources, which could then be used as the input to a second bot that automates batch IA-upload requests on Commons?
Thanks.
It would also be nice to have replacements for the tools Fae was running before they left, in order to automate the curation of the 1 million (and growing) files which were mirrored. (Some of these may need to be rebuilt due to quality issues or failed local uploads.) ShakespeareFan00 (talk) 19:13, 22 April 2025 (UTC)
- @ShakespeareFan00: I'm wondering if you could give an example to help better illustrate the request here. Do you mean to scan all of archive.org for stuff in the public domain to then upload to Commons or am I misunderstanding?
- For the second point: forgive me as I'm not familiar offhand but who was Fae? Do you remember their full username? I'm curious to look into what their tools were. TheSandDoctor (talk) 05:52, 13 July 2025 (UTC)
- On the original user: User:Fæ, who did most of the original mass uploading.
- The bot(s) would identify works on archive.org that are potentially in the public domain, then automate an IA-upload-style process to Commons, and identify (C) strings in works already on Commons (PDF and DJVU).
- Bot 1 - Scans archive.org using an appropriate API for scanned works (ignoring those that are loan-only or post-1978, unless US govt etc.) and uploads high-quality scanned versions to Commons. Fae's original tools also used predefined categorisation based on tags IA had assigned, as it was possible to obtain an RSS-type feed of the category at IA when the files were being uploaded. (See the subsection on the IA Books page I linked.)
- Bot 2 - Identifies existing PDF or DJVU scans on Commons uploaded from IA, and attempts to make a DJVU from high-quality scans if only a PDF or low-quality DJVU exists. Some of the PDF copies are NOT of readable quality for Wikisource purposes. (The issue of PDF display quality has been raised on Phabricator at least twice, without resolution, hence the need to find a format that HAS been reliable in the past, like DJVU.)
- Bot 3 - (Commons:IA_books#Automatic_detection_of_possible_copyright_issues) Checks existing Commons files by searching for (C) strings within the scan text or metadata, and flags those it finds for review. (Report generation on a weekly basis until empty.)
- Bot 4 - (Commons:IA_books#Automatic_detection_and_deletion_of_cover_pages) Checks for the first page being a pure white blank page (or a generic Google/JSTOR/USDA archived-document header) and flags such pages for manual removal. (The original attempted to remove them.)
- Bot 5 - (Commons:IA_books#Harmonization_with_2015_Flickr_uploads) Based on the IA identifier, attempts to place images and the scans in an 'appropriate' category, based either on the work title or on the IA identifier.
- Bot 6 - (Commons:IA_books/residuals) Scans the list of identifiers there and attempts to upload a high-quality scan for files where the previous upload failed. The bot would also need to update the list as files get reliably added, to indicate each file's presence on Commons.
Those are the tools provided previously that I can think of. I'm not sure if Fae left any source code on the relevant dev repositories or servers.
To the above I would add one additional bot: a bot that attempts to upload a 'legible' scan (most likely DJVU) of every single Catalog of Copyright Entries volume held at IA. These are very important documents that see regular use in determining the status (and eligibility for inclusion) of works on Commons and Wikisource. Fae did manage to get PDF versions, but in places these are degraded to the point of not being legible to a reader, hence the need for 'legible' copies. The long-term aim was to try to get Wikimedians to 'Mars-shot' an effort to get these transcribed into Wikisource and Wikidata. ShakespeareFan00 (talk) 07:32, 13 July 2025 (UTC)
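As a rough illustration of the copyright-string check described in the list above, here is a minimal sketch. It is not Fæ's original code: the function names, the regular expression, and the `cutoff_year` parameter are all assumptions made for the example, and a real bot would first need to extract the text layer from the PDF or DJVU file.

```python
import re

# Hypothetical pattern: "(C)", "©" or the word "copyright", followed within
# 20 non-digit characters by a four-digit year (1500-2099).
COPYRIGHT_RE = re.compile(
    r"(?:\(c\)|©|\bcopyright\b)[^\n\d]{0,20}((?:1[5-9]|20)\d{2})",
    re.IGNORECASE,
)


def find_copyright_years(text: str) -> list[int]:
    """Return every year that directly follows a copyright marker."""
    return [int(match.group(1)) for match in COPYRIGHT_RE.finditer(text)]


def needs_review(text: str, cutoff_year: int) -> bool:
    """Flag the scan for human review if any marked year is >= cutoff_year.

    cutoff_year is an assumption of this sketch; the right value depends on
    the copyright rules being applied and is left to the operator.
    """
    return any(year >= cutoff_year for year in find_copyright_years(text))
```

A check like this only produces candidates for the weekly report; it cannot decide copyright status on its own, so every hit would still need a human look.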
Brenham Banner-Press newspaper files
There are perhaps 24k files located within Category:Media contributed by Abilene Library Consortium without any categories. The vast majority are pages that belong under Category:Brenham Banner-Press. Can a bot first add that category? If the operator wants to add dates or other more complicated things, that's possible, but a category should be a good start. I believe I have sorted all the Category:American Flag, Cameron County and Matamoros Advertiser ones, but if any were missed, they can go into the main category. -- Ricky81682 (talk) 04:41, 4 May 2025 (UTC)
- @Ricky81682: That seems possible for a bot task. I gather that you mean that they don't have any categories other than Category:Media contributed by Abilene Library Consortium, right? Assuming that's the case, the proposal is to add the ones with only that one category to Brenham Banner-Press? The catch here is that a script can't make the judgement call of whether something is related or not; it's all or nothing, based on the technical conditions (e.g. category membership) we define. I apologize for being potentially pedantic here, I just want to make sure we're on the same page to proceed and that I have the right picture of the request. TheSandDoctor (talk) 05:57, 13 July 2025 (UTC)
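The "all or nothing" condition discussed above could be sketched like this. This is only an illustration of the selection rule, not an approved bot task: the function names are hypothetical, and it assumes the caller has already fetched each file's non-hidden categories and wikitext by whatever means the operator prefers.

```python
PARENT = "Category:Media contributed by Abilene Library Consortium"
TARGET = "Category:Brenham Banner-Press"


def needs_target(categories: set[str]) -> bool:
    """True only when the parent category is the file's sole category."""
    return categories == {PARENT}


def add_category(wikitext: str, category: str = TARGET) -> str:
    """Append a category link to the page text unless it is already there."""
    link = f"[[{category}]]"
    if link in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + link + "\n"
```

Files that already carry any other category (for example the American Flag ones mentioned above) fail `needs_target` and would be left untouched.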
Donating map images
I wrote at the Help Desk:
I have a website https://maproom.org which presents images of maps from atlases, mostly published in the 19th century. I used to sell higher-resolution versions of these images, but my sales have dropped to zero. I'm considering donating all the higher-resolution versions to Wikipedia. That's 2484 jpgs, totalling around 70 GB.
My main concern is to minimise the amount of bureaucracy for me. I would not want to have to specify a filename for each image, let alone add it to categories. I can provide access to a database with information (subject, date, source, etc.) for each image.
I anticipate that this will involve more work than any Commons volunteer would want to take on. But if there is a way of managing things, please let me know.
and was encouraged to post here. The database fields include
- the name of the uploaded file
- the title of the plate, as in the work from which it was scanned
- the title of the plate in English
- a list of search terms (which might be useful in assigning categories)
Maproom (talk) 16:26, 12 May 2025 (UTC)
- See Commons:Batch uploading? Wikiwerner (talk) 18:50, 14 May 2025 (UTC)
- I have started a batch upload request: see Commons:Batch uploading/Maproom.org. Wikiwerner (talk) 11:13, 22 June 2025 (UTC)
Newly available MOHAI high-resolution versions of UW Library photos (already uploaded in low-res to Commons)
Commons has bot-uploaded files such as:
Which were imported from University of Washington Libraries:
The Museum of History and Industry in Seattle has now added high-resolution versions of many/most of these same images to their own collection:
also mentioned here: Commons talk:Batch uploading/University of Washington Digital Collections#Higher-resolution images available from MOHAI
Is anyone able to use a bot to import all of these high-res images to replace the versions on Commons? Is this the best place on Commons to discuss such a project?
Remediation of unintended copyright violations
Hello, {{Mindef}} defaults to a CC-Zero license. But since 2022, it must be CC-By-SA 4.0, per the template documentation and the underlying permission. I asked on Commons:Village pump/Copyright#Template show for unintended copyright violations on recent uploads, where Antti T. Leppänen set up a PetScan query. It shows around 1200 files that were uploaded after 2022-01-01 with {{Mindef}} and placed into the CC-Zero category. Not all of them are erroneously licensed, of course: File:C7 assault rifle in Dutch use.jpg, for instance, was taken pre-2022. It would be nice to have a bot switch the licensing statement of Mindef-tagged files over to {{Mindef|BY-SA}}, either for all files uploaded in 2022 and after or for files created in 2022 and after. Furthermore, I think that the currently optional statement of licensing should be made mandatory. Regards, Grand-Duc (talk) 14:13, 31 May 2025 (UTC)
- Hi, what is actually the correct criterion? Is it really creation or rather upload on Mindef website or even upload on Commons after the cutoff date? And what is the correct cutoff date? According to the talk page, the CC0 license was last noted in March 2021. Antti T. Leppänen (talk) 08:02, 1 June 2025 (UTC)
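Whichever criterion and cutoff are agreed on, the edit itself could look roughly like the sketch below. The function name and regular expression are assumptions for illustration; the cutoff is deliberately a parameter because the correct date is still an open question in this thread, and `file_date` stands for whichever date (creation, publication, or upload) ends up being the criterion.

```python
import re
from datetime import date

# Matches a bare {{Mindef}} call with no parameters; a call that already
# states a license, such as {{Mindef|BY-SA}}, is not matched.
BARE_MINDEF_RE = re.compile(r"\{\{\s*[Mm]indef\s*\}\}")


def relicense(wikitext: str, file_date: date, cutoff: date) -> str:
    """Switch bare {{Mindef}} to {{Mindef|BY-SA}} for files on/after cutoff."""
    if file_date < cutoff:
        return wikitext  # older files keep the template's default license
    return BARE_MINDEF_RE.sub("{{Mindef|BY-SA}}", wikitext)
```

Because only the parameterless form is rewritten, re-running the function over already-fixed pages changes nothing.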
Tidy up Quality images of manhole covers
For now we have 432 QIs of manhole covers. After categorisation, they can be found in Architecture/Close-ups, Architecture/Other, Objects/Other and probably elsewhere. Екатерина Борисова has had the idea of putting all of them into Objects/Industrial. With the available tools I can remove them from everywhere and collect them in one place, but not chronologically (all these categories are currently organised by time). Theoretically it's possible to get the dates via linksto (from pages like Commons:Quality images candidates/Archives July 30 2024). Can anybody help with this idea and reorganise these QIs? Анастасия Львоваru/en 22:57, 6 June 2025 (UTC)
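The date-from-archive-page idea could be sketched as follows. This is only an assumption-laden illustration: the function names are made up, the mapping from file name to archive page title is taken as given (it would come from a "what links here" lookup), and month names are parsed with Python's default English locale.

```python
from datetime import date, datetime

PREFIX = "Commons:Quality images candidates/Archives "


def archive_date(title: str) -> date:
    """Parse the date out of an archive page title such as
    'Commons:Quality images candidates/Archives July 30 2024'."""
    if not title.startswith(PREFIX):
        raise ValueError(f"not a QIC archive page: {title!r}")
    return datetime.strptime(title[len(PREFIX):], "%B %d %Y").date()


def sort_by_archive_date(files: dict[str, str]) -> list[str]:
    """Return file names ordered oldest first.

    files maps a file name to the title of the archive page linking to it.
    """
    return sorted(files, key=lambda name: archive_date(files[name]))
```

The archive date is only a proxy for the promotion date, so the resulting order would be approximate.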
Changing Structured Data claims en-masee?
Referral here from Commons:Village pump/Technical#Fixing structured data en-masse which explains the problem to be solved. ShakespeareFan00 (talk) 14:51, 11 June 2025 (UTC)
- I removed them with QuickStatements. I had to remove the entire copyright status (P6216): public domain (Q19652) statement, because QuickStatements doesn't allow removing only a qualifier; we can add it back with the correct qualifier later. REAL 💬 ⬆ 18:43, 13 July 2025 (UTC)
Malformed dates
This SPARQL query shows >8K items with SDC "Inception" dates of between 1 and 1000 AD.
Many are like this one, where the date was entered in the format "1-4-09"; I found the correct date, "2009-01-04", in EXIF.
Can a bot check all the results in the query and if they are photographs (and not paintings etc), update using the date on EXIF? And if no EXIF date is found, add them to a maintenance category. It may be worth repeating this on a scheduled basis. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:56, 15 June 2025 (UTC)
- @Pigsonthewing: How would you know that the 1-4-09 date shouldn't be 2009-04-01? In some places, it would mean January 4. In other places, it would mean April 1. -- Auntof6 (talk) 00:50, 15 July 2025 (UTC)
- I'm not suggesting that we make that assumption. I'm suggesting we use the date from EXIF, which is in ISO standard format. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 07:43, 15 July 2025 (UTC)
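A minimal sketch of the EXIF-based decision described in this request is shown below. The function names and the returned action labels are assumptions for illustration. It assumes the raw EXIF `DateTimeOriginal` form `YYYY:MM:DD HH:MM:SS`, which is always year-month-day and so avoids the day/month ambiguity raised above; reading the EXIF tag from the file and writing the SDC claim are left out.

```python
from datetime import datetime
from typing import Optional


def exif_to_iso_date(exif_value: str) -> Optional[str]:
    """Convert a raw EXIF timestamp to YYYY-MM-DD, or None if unparseable."""
    try:
        parsed = datetime.strptime(exif_value.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None  # e.g. "1-4-09" or an all-zero placeholder
    return parsed.date().isoformat()


def choose_action(exif_value: Optional[str]) -> tuple[str, Optional[str]]:
    """Decide what to do with a file whose SDC inception date looks wrong.

    Returns ("update", iso_date) when a usable EXIF date exists, otherwise
    ("maintenance", None) so the file goes to a maintenance category.
    """
    iso = exif_to_iso_date(exif_value) if exif_value else None
    return ("update", iso) if iso else ("maintenance", None)
```

Nothing here guesses at the meaning of strings like "1-4-09": anything that is not a well-formed EXIF timestamp falls through to the maintenance branch.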