Commons:Batch uploading/Fonds Ancely

Fonds Ancely

This upload is part of a partnership between Wikimédia France and the Library of Toulouse. It consists of 2085 public domain files. You may see general notes and work in progress on User:Jean-Frédéric/Ancely.

The metadata is held in an OAI-PMH repository. The code explores it and retrieves records; then, where applicable, the various fields are matched against a manual, community-curated alignment of Commons categories and tags. The result is then fed to a data ingestion template which translates the metadata to {{Artwork}}. The actual upload is made with Pywikipedia-rewrite by User:AncelyBot.
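To make the pipeline above more concrete, here is a minimal sketch of the two first steps: reading Dublin Core fields out of an OAI-PMH `ListRecords` response, then looking the values up in an alignment table. It is not the code actually used for this upload. The function names, the sample record, the French subject terms and the shape of the alignment (a dictionary keyed by field name and value) are all assumptions made for illustration; only the XML namespaces are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH responses and unqualified Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample response, standing in for what the repository would return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Vue de Pau</dc:title>
          <dc:subject>Montagne</dc:subject>
          <dc:subject>Rivière</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical alignment: (field name, value) -> Commons categories.
ALIGNMENT = {
    ("subject", "Montagne"): ["Mountains in art"],
    ("subject", "Rivière"): ["Rivers in art"],
}


def parse_records(xml_text):
    """Return one dict per record, mapping each DC field name to its values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        fields = {}
        for element in record.iter():
            # Keep only Dublin Core elements, stripping the namespace.
            if element.tag.startswith(DC):
                name = element.tag[len(DC):]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return records


def align_categories(fields, alignment):
    """Collect the categories aligned with a record's values, without repeats."""
    categories = []
    for name, values in fields.items():
        for value in values:
            for category in alignment.get((name, value), []):
                if category not in categories:
                    categories.append(category)
    return categories


records = parse_records(SAMPLE)
print(align_categories(records[0], ALIGNMENT))
```

Values with no entry in the alignment are silently skipped here, which is one way to read "if applicable"; a real run would probably log them so the alignment can be completed.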

In its current state, the categorisation system with the alignment outputs 31,801 categories (1,694 distinct) − the drawback is that many are high-level categories (“Shawls”, “men”, etc.).

Looking forward to your thoughts, Jean-Fred (talk) 22:49, 6 March 2013 (UTC)

Opinions

  • Uploaded five more − see Special:ListFiles/AncelyBot Jean-Fred (talk) 01:14, 16 March 2013 (UTC)
  • Uploaded fifteen more − and I will continue uploading files until my demands are met! Jean-Fred (talk) 00:23, 19 March 2013 (UTC)
  •  Support everything looks fine for me. (may be a bit overcat) --PierreSelim (talk) 14:24, 20 March 2013 (UTC)
  • Ok, uploading 100 right now. Jean-Fred (talk) 21:06, 11 April 2013 (UTC)
  • Looks very good. The only thing that worries me a bit is the number of categories per image. That might become a problem. Please upload more! Multichill (talk) 10:39, 13 April 2013 (UTC)
  •  Oppose now, we have forgotten to finish the Creator mapping User:Jean-Frédéric/Ancely/Creator --PierreSelim (talk) 12:02, 25 April 2013 (UTC)
  • Uploaded the first 350. Jean-Fred (talk) 23:08, 7 May 2013 (UTC)
  • Uploaded the first 500. Jean-Fred (talk) 13:04, 8 May 2013 (UTC)
  • Uploaded the first 800. Jean-Fred (talk) 14:05, 10 May 2013 (UTC)
  • Made it 1,000. Jean-Fred (talk) 23:15, 12 May 2013 (UTC)
  • ✓ Done. 2041 files uploaded + 33 dupes + 11 errors = 2085 files, the size of the corpus. Jean-Fred (talk) 14:49, 24 May 2013 (UTC)

Conclusion

Dupes

The following files were already on Commons − we might want to update their file descriptions (current: 33)

Errors

The following files failed to upload (current: 11)

Categorisation statistics
Per category

30266 categories, 1760 distinct. Mean: 17.1965909091 Median: 2.0 Max 1045 // Min 1

Top 10: [(u'Mountains in art', 1045), (u'Men in art', 992), (u'Women in art', 878), (u'Trees in art', 780), (u'Houses in art', 736), (u'Pyrénées-Atlantiques', 693), (u'Hautes-Pyrénées', 617), (u'Pyrenees', 470), (u'National costumes in art', 468), (u'Rivers in art', 440)]

Lose 10: [(u'Estrades', 1), (u'Pierre Bayle', 1), (u'Morlaàs', 1), (u'Louis-François Couché', 1), (u'Jean Racine', 1), (u'Faience in France', 1), (u'Marmite', 1), (u'Corsica', 1), (u'Dordogne River', 1), (u'Esera River', 1)]

Per file

Mean: 14.5160671463 Median: 13.0 Max 47 // Min 0

Top N: [('B315556101_A_LEVASSEUR_066', 47), ('B315556101_A_LEVASSEUR_068', 46), ('B315556101_A_LEVASSEUR_018', 44), ('B315556101_A_LEVASSEUR_056', 42), ('B315556101_A_LEVASSEUR_057', 42)]

Lose N: [('B315556101_A_BERTHIER_010', 1), ('B315556101_A_BERTHIER_024', 0), ('B315556101_A_BERTHIER_021', 0), ('B315556101_A_BERTHIER_018', 0), ('B315556101_A_BERTHIER_013', 0)]
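For reference, figures of this kind can be derived from a simple mapping of file names to category lists. The sketch below is an assumption about how such a report could be produced, not the script that generated the numbers above; the function names and the three sample files are invented. Note that both means share the same numerator, the total number of category assignments: 30266 / 1760 distinct categories gives the per-category mean, and 30266 / 2085 files gives the per-file mean.

```python
from collections import Counter
from statistics import mean, median


def category_statistics(categories_per_file):
    """Count uses of each category, and categories carried by each file."""
    per_category = Counter(
        category
        for categories in categories_per_file.values()
        for category in categories
    )
    per_file = Counter(
        {name: len(categories) for name, categories in categories_per_file.items()}
    )
    return per_category, per_file


def summarise(counts):
    """Mean, median, max and min of a mapping's values."""
    values = list(counts.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }


# Invented sample data: three files with three, one and zero categories.
files = {
    "file_a": ["Mountains in art", "Men in art", "Pyrenees"],
    "file_b": ["Mountains in art"],
    "file_c": [],
}

per_category, per_file = category_statistics(files)
print(sum(per_category.values()), "categories,", len(per_category), "distinct")
print("Per category:", summarise(per_category))
print("Top:", per_category.most_common(2))
print("Per file:", summarise(per_file))
print("Top:", per_file.most_common(2))
```

Files without any category (such as the BERTHIER ones listed above) only show up in the per-file figures, which is why the minimum is 0 there and 1 per category.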

Assigned to | Job | Progress
Jean-Frédéric | Metadata pre-processing | Status: Done
Jean-Frédéric, Symac, Léna, PierreSelim | Metadata alignment | Status: Done
User:Jean-Frédéric | Upload | Status: Done
(unassigned) | Dupes and errors processing | Status: todo
Category:Batch uploading supported by Wikimédia France Category:Commons batch uploading