diff --git a/content/post/20230303_transcribe/index.md b/content/post/20230303_transcribe/index.md
index 64b47ed..8c2d2ca 100644
--- a/content/post/20230303_transcribe/index.md
+++ b/content/post/20230303_transcribe/index.md
@@ -140,4 +140,21 @@ ffmpeg -y -ss 01:23:45 -i input.webm -frames:v 1 -q:v 2 output.jpg
 A markdown document is generated by inserting each paragraph in turn.
 A screenshot is inserted as well, *unless* it is too similar to the last inserted screenshot.
 This happens when the speaker lingers on a slide for a while, generating a lot of text without changing the video much.
-Finally, I use [Pandoc](https://github.com/jgm/pandoc) to convert that markdown file into a PDF.
\ No newline at end of file
+Finally, I use [Pandoc](https://github.com/jgm/pandoc) to convert that markdown file into a PDF.
+
+## Image Similarity
+
+How do I decide whether a frame is "too similar" to a previous frame?
+I experimented with a few options and settled on the `dhash` function in the `imagehash` library.
+A description of dhash is provided [here](https://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html).
+
+In short, the difference hash works like this (sketched in code below):
+* The image is reduced to 9x8 (72 pixels) and converted to grayscale.
+* Pixel differences are computed within each row, yielding an 8x8 grid of differences.
+* Each of the 64 bits in the hash is set if the left pixel is brighter than the right pixel.
+* The distance between two hashes is the Hamming distance: the number of bits that differ.
+
+For my purposes, I want to treat *some* small variation between frames as "the same", since many videos of talks have a small overlay of the presenter speaking.
+However, I don't want to be too liberal, since it's also common for slides to change only incrementally as a concept is explained.
+I settled on a maximum difference of 1 bit as a reasonable threshold.
+If the overlay of the speaker is too large, this doesn't work quite as well, but I'd rather include too many images in the output than too few.
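+
+To make the list above concrete, here is a rough sketch of the difference hash in Python.
+It is for illustration only (the pipeline itself calls the real `dhash` function from `imagehash`), and it assumes Pillow and NumPy are available.
+
+```python
+from PIL import Image
+import numpy as np
+
+
+def dhash_sketch(path):
+    # Reduce the image to 9x8 pixels and convert it to grayscale.
+    img = Image.open(path).convert("L").resize((9, 8))
+    pixels = np.asarray(img)  # shape (8, 9): 8 rows of 9 pixels
+
+    # Compare each pixel to its right-hand neighbour within the same row,
+    # yielding an 8x8 grid of booleans (True where the left pixel is brighter).
+    diff = pixels[:, :-1] > pixels[:, 1:]
+
+    # Pack the 64 booleans into a 64-bit integer hash.
+    bits = 0
+    for bit in diff.flatten():
+        bits = (bits << 1) | int(bit)
+    return bits
+
+
+def hamming_distance(a, b):
+    # The distance between two hashes is the number of bits that differ.
+    return bin(a ^ b).count("1")
+```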
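+
+And here, roughly, is what the "too similar" test looks like with the library itself.
+This is a simplified sketch rather than the pipeline's actual code; `frame_paths` and `insert_screenshot` are hypothetical stand-ins.
+
+```python
+from PIL import Image
+import imagehash
+
+MAX_DIFFERENCE = 1  # hashes differing by at most 1 bit count as "the same" slide
+
+last_hash = None
+for frame_path in frame_paths:  # hypothetical list of extracted screenshots
+    current = imagehash.dhash(Image.open(frame_path))
+    # Subtracting two ImageHash objects gives the Hamming distance in bits.
+    if last_hash is None or (current - last_hash) > MAX_DIFFERENCE:
+        insert_screenshot(frame_path)  # hypothetical: add the image to the markdown
+        last_hash = current
+```
+
+Only frames that differ from the last *inserted* screenshot by more than one bit are added, which is why `last_hash` is updated only inside the `if` branch.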