Wikimore

Summary

Description

English: The article outlines a method for creating offline, high-quality video transcriptions and subtitles using OpenAI's Whisper model with Python, emphasizing privacy, accuracy, and accessibility without needing cloud-based speech-to-text services.

https://github.com/KBNLresearch/videotools

The author explores the Whisper model for automatic speech recognition (ASR) to address limitations in existing cloud-based services, such as low transcription quality, privacy concerns, file size restrictions, and costs.

Key advantages of using Whisper include:

Offline Capabilities and Privacy: Whisper's large model (around 3GB) can run locally on a laptop, enabling privacy-compliant transcription without internet dependency.
Language and Accuracy: The model performs exceptionally well with multiple languages, especially Dutch and English, and effectively transcribes complex terms and named entities.
Real-time Processing: The large model provides near real-time transcription speed (a 15-minute video processes in about 15-20 minutes). Smaller, faster models are also available with reduced accuracy.
Subtitle Generation: Whisper can automatically generate accurate subtitles, enhancing accessibility for viewers with hearing impairments.
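The subtitle step described above can be illustrated with a minimal sketch. It is not the code from the article or the linked repository. It assumes the open-source openai-whisper package, whose transcribe() result contains a "segments" list with "start", "end" and "text" fields. The helper names (format_timestamp, segments_to_srt, transcribe_to_srt) and the file paths are hypothetical.

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    millis = max(0, round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, srt_path: str, language: str = "nl") -> None:
    """Transcribe a local file offline and write the result as an SRT file.

    Assumption: the openai-whisper package and FFmpeg are installed.
    The import is kept local so the helpers above work without them.
    """
    import whisper  # assumed dependency, not part of the standard library

    model = whisper.load_model("large")  # smaller models trade accuracy for speed
    result = model.transcribe(media_path, language=language)
    Path(srt_path).write_text(segments_to_srt(result["segments"]), encoding="utf-8")
```

Calling transcribe_to_srt("lecture.mp4", "lecture.srt") would, under these assumptions, write a subtitle file next to a hypothetical input video without sending any data to a cloud service.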

The article includes Python code examples and repository links to help users implement the Whisper-based transcription workflow. Tools like FFmpeg are needed to handle video and audio formats, and optional modules allow transcript refinement using ChatGPT, albeit at a cost to offline privacy.
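As an illustration of the FFmpeg step, the sketch below extracts a mono 16 kHz WAV track from a video before transcription. It is an assumption about how such a step could look, not the workflow from the article's repository. The function names and file names are hypothetical, and the ffmpeg executable is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_path):
    """Build the FFmpeg argument list for extracting speech-friendly audio."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(video_path), # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit PCM WAV
        str(audio_path),
    ]


def extract_audio(video_path):
    """Run FFmpeg and return the path of the extracted WAV file."""
    audio_path = Path(video_path).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```

Keeping the command construction separate from the subprocess call makes the arguments easy to inspect or adapt, for example when a different sample rate or container format is needed.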

Date

6 November 2024

Source

Original article by the author

Author

Olaf Janssen, Wikimedia coordinator at the KB, national library of the Netherlands

Olaf Janssen

	Alternative names	Olaf D. Janssen; Olaf Daniel Janssen; O. D. Janssen
	Description	Dutch librarian
	Date of birth	20th century
	Location of birth	Dongen
	Work location	The Hague (2001–)
	Authority file	Wikidata: Q66439268; ORCID: 0000-0002-9058-9941

Other versions

https://doi.org/10.5281/zenodo.14047913 and https://zenodo.org/records/14047913
GitHub: in the KBNLresearch and ookgezellig repositories

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.

You are free:

to share – to copy, distribute and transmit the work
to remix – to adapt the work

Under the following conditions:

attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Category:CC-BY-4.0
Category:GLAM at Koninklijke Bibliotheek - Articles
Category:Olaf Janssen
Category:GLAM at Koninklijke Bibliotheek - Tutorials
Category:Speech recognition
Category:Python (programming language)
Category:OpenAI

File:How to create high-quality offline video transcriptions and subtitles using Whisper and Python - 6 November 2024.pdf