User:Inductiveload/Requests/Batch uploads

Requests

I can upload batches of files from the IA or HathiTrust. However, I will require the metadata to do so. I will not do uploads if you don't give me the data (unless I really, really want to anyway).

I can also create files from batches of images. In this case, you will need to provide details of where I can get the images from. I can help you with batch downloading images if you need. If you already have the images, probably the easiest way to share them with me is to upload to the Internet Archive as an "image ZIP" following these instructions.

Data file format

I will need a spreadsheet (XLSX, CSV or ODS) with the following columns. The column names are important: don't change them.

Column | Required? | Purpose | Example
title | Required | The title of the work. For a batch, this is often the same for every row. | The Atlantic
subtitle | Optional | The work's subtitle (give it if there is one). | A magazine of Literature, Science, Art and Politics
author | Optional | Author(s), slash-separated | "Oscar Wilde" or "Q30875"
editor | Optional | Editor(s), slash-separated |
illustrator | Optional | Illustrator(s), slash-separated |
translator | Optional | Translator(s), slash-separated |
year | Required | The publication year | 1868
volume | Optional | The volume number | 22
subpage | Optional | The volume subpage at Wikisource (if it's not just "Volume XX"). Not required if the work doesn't have a subpage (e.g. a simple single-volume book), or if it does and it's "Volume XX" (in that case, it is inferred from the presence of volume). |
vol_detail | Optional | Detail string for the volume, used in the book template and on the index page | July–December 1868
vol_disp | Optional | The volume display string for the Commons book template. Will not be used in a page title. If not given, it defaults to "Volume XX" followed by the vol_detail, if any, in brackets. | Volume 22 (July–December 1868)
filename | Optional | The target filename (no extension). If not given, a default will be attempted with a format like Brave New World - Huxley - 1932 or The Atlantic Monthly - Volume 22. | The Atlantic Monthly - Volume 22
id | Required | The external source ID. The URL for "url" sources, blank if you provide a file archive to me somehow. Required otherwise. | atlantic22bostuoft
source | Required | The source: either "ia", "ht" or "url" | ia
file | Optional | If uploading files from some file archive you give me (rather than directly from the IA, or a URL etc.), the filename in that collection | File 1.pdf
oclc | Optional | The OCLC number | 297234877
lccn | Optional | The LCCN |
city | Optional | The city of publication | Boston
publisher | Optional | The publisher | Fields, Osgood, & Co.
printer | Optional | The printer |
license | Required | The license (so it can be inserted into {{pd-scan}}) | PD-US-expired
pagelist | Optional | Manual pagelist tag. If you don't provide this, one will be generated from the IA or HT metadata, if possible. The generated list is usually incomplete, but it's generally a good start. |
img_pg | Optional | The image page (used as the title page). Usually the source provides this information via the page list metadata. |
language | Required | The work's language code | en
commonscats | Required | Categories for the work at Commons, slash-separated | The Atlantic Monthly, 1868
vollist | Optional (required for multi-vol works) | The volume list template (or wikitext) | {{Atlantic Monthly volumes}}
only_pages | Optional | If only some pages should be included from the source, then one or more numbers or ranges, comma-separated. | 1-100,103,105-199
rm_pages | Optional | If some pages should be excluded, then one or more numbers or ranges, comma-separated. Note: applies after the included pages. | 1,5-8,1234
to_ws | Optional | If the file should be uploaded to Wikisource rather than Commons, then y |
ws_lang | Optional | The target Wikisource: where the index pages will be made, and where the files will be uploaded if to_ws is set. Default is en. Use mul for Multilingual Wikisource. |
access | Optional | Set to us if the work is not accessible outside the US (usually for Hathi) |
no_commons_until | Optional | The date after which the file can be moved to Commons (used for the until parameter of {{Do not move to commons}}). Mandatory if to_ws is set. | 2035
no_commons_reason | Optional | The reason the file shouldn't be moved to Commons (used for the why parameter of {{Do not move to commons}}). Mandatory if to_ws is set. | Multi-author work published in UK
user | Optional | Requesting user name. If given, it will be used in the index page creation summary (which will both ping that user and make it clear who found that file). | Inductiveload
  • All data, like printer, that is available should be provided. It's a lot easier to put it in now than patch it in later.
  • You can add as many other columns as you like for your own purposes, such as building up strings. They will be ignored.
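
To illustrate how only_pages and rm_pages combine, here is a minimal Python sketch. It is not the actual upload script: the function names parse_ranges and select_pages are hypothetical, and it assumes that page numbers are 1-based, that ranges are inclusive, and that the numbers in rm_pages refer to page numbers in the source file.

```python
def parse_ranges(spec):
    """Parse a string like "1-100,103,105-199" into a set of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Assumption: ranges are inclusive at both ends.
            start, end = part.split("-", 1)
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return pages


def select_pages(total_pages, only_pages="", rm_pages=""):
    """Return the sorted page numbers kept after applying both columns."""
    # Start from only_pages if given, otherwise from every page in the source.
    if only_pages.strip():
        kept = parse_ranges(only_pages)
    else:
        kept = set(range(1, total_pages + 1))
    # rm_pages is applied after the included pages.
    kept -= parse_ranges(rm_pages)
    return sorted(p for p in kept if 1 <= p <= total_pages)


print(select_pages(10, only_pages="1-5,8", rm_pages="2,4-5"))  # [1, 3, 8]
```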

There are some examples here: https://drive.google.com/drive/folders/1fW5ozskDJiyVoQycUoGEB7d-L_Uh6N7b
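
If you export to CSV, you may want to check the required columns yourself before sending the file. The following is a rough sketch using only the Python standard library. The function name check_rows is hypothetical, and the rule that id may be blank when the file column is filled is my reading of the table above, not a statement of how the upload script behaves.

```python
import csv
import io

# Columns marked "Required" in the table above.
REQUIRED = ["title", "year", "id", "source", "license", "language", "commonscats"]


def check_rows(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for col in REQUIRED:
            if col not in row or (row[col] or "").strip():
                continue
            # Assumption: id may be blank when a file archive is supplied.
            if col == "id" and (row.get("file") or "").strip():
                continue
            problems.append(f"row {row_number}: {col} is empty")
        source = (row.get("source") or "").strip()
        if source and source not in ("ia", "ht", "url"):
            problems.append(f"row {row_number}: unknown source {source!r}")
    return problems


# Row 2 uses the example values from the table; row 3 is the same row with
# the id blanked, purely to trigger a message.
sample = """title,year,id,source,license,language,commonscats,volume
The Atlantic,1868,atlantic22bostuoft,ia,PD-US-expired,en,"The Atlantic Monthly, 1868",22
The Atlantic,1868,,ia,PD-US-expired,en,"The Atlantic Monthly, 1868",22
"""
print(check_rows(sample))  # ['row 3: id is empty']
```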

Authors, etc

If you provide strings like Oscar Wilde, they will be used as-is. If you provide a Wikidata ID like Q30875, then it will be used in the creator template at Commons, and the linked Wikisource author page (in this case, Author:Oscar Wilde) will be used for the index page.

Separate multiple authors with slashes, e.g. Oscar Wilde/Albert Einstein.
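
As a rough illustration of how such a cell can be read, the sketch below splits on slashes and treats anything that looks like a Wikidata ID (Q followed by digits) as an ID rather than a name. The function name parse_people and the (kind, value) output shape are hypothetical; the real script may handle this differently.

```python
import re

# A Wikidata item ID: "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9][0-9]*")


def parse_people(cell):
    """Split a slash-separated cell into (kind, value) pairs."""
    people = []
    for name in cell.split("/"):
        name = name.strip()
        if not name:
            continue
        kind = "wikidata" if QID.fullmatch(name) else "text"
        people.append((kind, name))
    return people


print(parse_people("Q30875/Albert Einstein"))
# [('wikidata', 'Q30875'), ('text', 'Albert Einstein')]
```

One consequence of slash separation is that a name containing a slash cannot be entered as plain text.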

Sources

I can download from the following sources using the relevant ID of the work at that source:

  • ia: The Internet Archive
  • ht: HathiTrust

I can also use direct URLs to any other online resource at a publicly accessible location. Set source to url in this case, and put the URL in the id column.

I may be able to add other sources if generally useful - just ask. This can include something like a Dropbox or other (decent: no dodgy hosts, please) web drive, as long as the images are in a unique folder per work and are in order.

I can upload files locally to Wikisources if needed, if they are not suitable for Commons for copyright reasons.

If the file is not a US work (e.g. a non-US author), you must not specify PD-US as the copyright if the file is going to go to Commons. You should specify a suitable template. Usually, this is PD-old-auto-expired: in that case you must also give deathyear to show why the work is PD in the country of origin.

If the file is coming to Wikisource (usually because it's in copyright in the country of origin, but not in the US), you should set to_ws to yes, set ws_lang if it is not en, and provide no_commons_until and no_commons_reason (both are mandatory in this case).
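
The two rules above can be expressed as a simple per-row check. This is only a sketch: check_licensing is a hypothetical name, the row is assumed to be a dictionary keyed by column name, and any non-empty to_ws value is treated as "set".

```python
def check_licensing(row):
    """Return the problems with the licensing-related columns of one row."""
    problems = []

    def filled(col):
        return bool((row.get(col) or "").strip())

    # PD-old-auto-expired needs a death year to justify the PD claim.
    if (row.get("license") or "").strip() == "PD-old-auto-expired" and not filled("deathyear"):
        problems.append("deathyear is required with PD-old-auto-expired")

    # Local uploads need both "do not move to Commons" fields.
    if filled("to_ws"):
        for col in ("no_commons_until", "no_commons_reason"):
            if not filled(col):
                problems.append(f"{col} is required when to_ws is set")
    return problems


row = {"license": "PD-US-expired", "to_ws": "y", "no_commons_until": "2035"}
print(check_licensing(row))
# ['no_commons_reason is required when to_ws is set']
```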

Spreadsheet automation

Note, you can often use the volume number to build the other cells with spreadsheet equations. For example, if the volume number is col G and the title is col C, then the filename for row 2 might be =C2 & " - Volume " & G2.

Likewise, you can increment numbers. If row 2's volume is 1, then you can make row 3's 2 using =G2 + 1.

You can zero-pad numbers with, e.g., =TEXT(G2, "00").

In this way, you can save a lot of tedious typing. However, do make sure that the data stays accurate. Very often things like publisher, printer or even the date ranges of volumes can change halfway through a series.

If you use formulae, I'd prefer to receive an XLSX file than a CSV file, since I can adjust the formulae if needed.
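
If you would rather script the repetitive cells than use spreadsheet formulae, the same idea can be sketched in Python. The function names build_rows and to_csv are hypothetical, only three columns are shown, and the zero-padded filename mirrors the TEXT(G2, "00") example above. The output is a plain CSV with no formulae, so anything that changes partway through a series still has to be checked by hand.

```python
import csv
import io


def build_rows(title, first_volume, count):
    """Build one row per volume, deriving the filename from title and volume."""
    rows = []
    for offset in range(count):
        volume = first_volume + offset  # same idea as =G2 + 1
        rows.append({
            "title": title,
            "volume": volume,
            # Zero-padded to two digits, like TEXT(G2, "00").
            "filename": f"{title} - Volume {volume:02d}",
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["title", "volume", "filename"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(build_rows("The Atlantic Monthly", 1, 3)))
```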

Authority control

The OCLC number is optional, but highly recommended, because it is a very good way to link the files and indexes with structured data: it is (or should be) a unique key.

Sending the file

You can send me the file by creating a task on my Workboard at Phabricator and attaching your spreadsheet, or commenting on my talk page and providing a link to some other file host (e.g. Google Drive, Dropbox, etc).

If you use formulae in your spreadsheet, I'd rather have the original spreadsheet (XLSX/ODS) than an exported CSV file, because if I need to make changes to anything, it's easier if the formulae still work.

Known issues

  • Pagelists are generated from the source's upstream data. The quality of this ranges from near-perfect to complete junk. It will be your responsibility to deal with this. All indexes are created with "to be checked" statuses for this reason.
    • If you provide a pagelist field, the index will be set to "to be proofread" instead.

Your tasks

You have some work to do even once the batch upload is complete:

  • If the works are part of a series, any index volume list templates (e.g. {{American Printer volumes}}) named in the vollist column should also be created
  • All the Commons categories you specify should exist and be categorised
  • Finishing the pagelists on the index pages (the upload will include an automatically-generated pagelist from the IA or HathiTrust metadata, but this is usually incomplete)
  • Adding {{small scan link}} templates to Author and Portal pages as appropriate
  • Generally tidying up if there are other rough edges.

By making a batch upload request, you agree to undertake these tasks.