The problem
If you want to preserve your privacy, cloud-based speech-to-text services are probably not a good idea. But how can you still enjoy the convenience of quickly recording a (blog post) idea on your (Android) smartphone and having it transcribed into a (Markdown) file?
The solution
- Android’s Sound Recording app (in high quality mode, to create `.wav` files).
- Syncthing, to get the recordings from the smartphone directly into the `~/blog/content/posts/` folder.
- Georgi Gerganov’s `whisper.cpp` repo.
- A bit of Bash scripting, see below.
Without any previous experience with AI/LLMs, but having read Google’s “We Have No Moat” memo, I was pleasantly surprised by how easy it was to implement my workflow idea.
The first result was this previous blog post (in German). I did not “go meta”, though: this post was not drafted with the described workflow.
Script setup
Admittedly, the following is not awesome, but it was a nice afternoon project on a rainy weekend day. The whole thing is executed in the
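```bash
#!/bin/bash
file="$1"            # audio recording synced from the phone
slug="$2"            # slug for the blog post's file name
# https://github.com/ggerganov/whisper.cpp/
tool="$HOME/GitHub.com/whisper.cpp"  # local whisper.cpp checkout
size="${3:-small}"   # Whisper model size, "small" unless given as 3rd argument
```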
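The setup above trusts its positional parameters blindly. A minimal sketch of a guard, assuming a hypothetical `validate_args` function and an assumed slug rule (lowercase letters, digits, and single dashes, so the slug is safe inside file names), could look like this:

```bash
#!/bin/bash
# Hypothetical guard for the positional parameters (not part of the original script):
#   $1 = audio recording, $2 = slug, $3 = optional model size
validate_args() {
  if (( $# < 2 || $# > 3 )); then
    echo "usage: ${0##*/} RECORDING SLUG [MODEL_SIZE]" >&2
    return 2
  fi
  if [[ ! -f "$1" ]]; then
    echo "no such recording: $1" >&2
    return 1
  fi
  # Assumed slug rule: lowercase words of letters/digits joined by single dashes
  if [[ ! "$2" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]]; then
    echo "invalid slug: $2" >&2
    return 1
  fi
}
```

Called at the top of the script as `validate_args "$@" || exit`, it would stop the run before `ffmpeg` touches anything.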
Audio preparation
Next, we convert the input `file` to the 16 kHz, mono, 16-bit PCM WAV format that whisper.cpp requires,
- overwriting any existing file with `ffmpeg -y`, and
- suppressing any non-essential output with `-v error`:
```bash
temp="$slug.wav"
ffmpeg -y \
       -v error \
       -i "$file" \
       -ar 16000 -ac 1 -c:a pcm_s16le \
       "$temp"
# Yes, I like to align things ☺️
```
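To double-check that the conversion produced what whisper.cpp expects, one could read the WAV header directly. This is a minimal sketch with hypothetical `wav_field` and `check_wav_format` helpers; it assumes the `fmt ` chunk directly follows the RIFF/WAVE header (as in ffmpeg's `pcm_s16le` output) and a little-endian machine, since `od` decodes integers in host byte order:

```bash
#!/bin/bash
# Hypothetical sanity check for the converted file (not part of the original script).
# Canonical WAV offsets: channels at byte 22, sample rate at 24, bits per sample at 34.

# Print the unsigned integer of SIZE bytes ($3: 2 or 4) at OFFSET ($2) in FILE ($1).
wav_field() {
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d '[:space:]'
}

check_wav_format() {
  local file="$1" channels rate bits
  if [[ "$(head -c 4 "$file")" != RIFF || "$(head -c 12 "$file" | tail -c 4)" != WAVE ]]; then
    echo "not a RIFF/WAVE file: $file" >&2
    return 1
  fi
  channels="$(wav_field "$file" 22 2)"
  rate="$(wav_field "$file" 24 4)"
  bits="$(wav_field "$file" 34 2)"
  if [[ "$channels" == 1 && "$rate" == 16000 && "$bits" == 16 ]]; then
    echo "ok: ${rate} Hz, ${channels} channel, ${bits}-bit"
  else
    echo "unexpected format: ${rate} Hz, ${channels} channel(s), ${bits}-bit" >&2
    return 1
  fi
}
```

Right after the `ffmpeg` call, `check_wav_format "$temp" || exit 1` would abort before Whisper is fed a file in the wrong format.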
Transcription with Whisper
This `temp` file is now processed into a `.txt` file, using the model `size` defined above:
```bash
# With --output-txt, whisper.cpp writes the transcript to "$temp.txt"
"$tool/main" \
    --model "$tool/models/ggml-$size.bin" \
    --threads 8 \
    --output-txt \
    --print-colors \
    --no-timestamps \
    --language auto \
    "$temp"
```
The transcription progress and quality can be observed via the confidence-colored preview (`--print-colors`). From the few tests I ran, I found the `small` model to be good enough. The `medium` model recognized only a few more words correctly, so its 3x higher memory usage does not seem worth it for drafting a blog post.
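Because the model is picked by name, a mistyped or not-yet-downloaded `$size` only fails once whisper.cpp tries to load `$tool/models/ggml-$size.bin`. A minimal pre-flight sketch, assuming a hypothetical `ensure_model` helper and the `models/download-ggml-model.sh` script that ships with the whisper.cpp repo; the list of accepted sizes is deliberately short and would need extending for `.en` or large models:

```bash
#!/bin/bash
# Hypothetical helper: print the path of the ggml model for SIZE, downloading it
# first with whisper.cpp's models/download-ggml-model.sh if it is missing.
ensure_model() {
  local tool="$1" size="$2"
  # Assumed subset of model names; .en and large variants are left out on purpose
  case "$size" in
    tiny|base|small|medium) ;;
    *) echo "unsupported model size: $size" >&2; return 1 ;;
  esac
  local model="$tool/models/ggml-$size.bin"
  if [[ ! -f "$model" ]]; then
    # Send the downloader's output to stderr so stdout carries only the path
    (cd "$tool" && bash ./models/download-ggml-model.sh "$size") >&2 || return 1
    [[ -f "$model" ]] || { echo "model download failed: $model" >&2; return 1; }
  fi
  printf '%s\n' "$model"
}
```

In the script, `model="$(ensure_model "$tool" "$size")" || exit` followed by `--model "$model"` would replace the hard-coded path.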
Converting the transcript into a Hugo blog post draft
For convenience and Hugo compatibility, the script also prepends front matter metadata to the blog post’s `.md` file:
```bash
date="$(date -u +%Y-%m-%d)"
blog="$date-$slug.md"
cat >"$blog" <<HEREDOC
---
title: $(head -1 "$temp".txt)
date: "$date"
draft: true
---
$(cat "$temp".txt)
HEREDOC
```
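The `title:` line inserts the first transcript line verbatim, leading space included. If that line happens to contain `: ` or starts with a quote, the YAML front matter may fail to parse. A minimal hardening sketch, assuming a hypothetical `yaml_title` helper that trims the line and emits a double-quoted YAML string:

```bash
#!/bin/bash
# Hypothetical helper: turn a raw transcript line into a safe YAML title value.
yaml_title() {
  local title="$1"
  # Trim leading and trailing whitespace (transcript lines start with a space)
  title="${title#"${title%%[![:space:]]*}"}"
  title="${title%"${title##*[![:space:]]}"}"
  # Escape backslashes first, then double quotes, as YAML double-quoted strings require
  title="${title//\\/\\\\}"
  title="${title//\"/\\\"}"
  printf '"%s"\n' "$title"
}
```

Inside the heredoc, the title line would then read `title: $(yaml_title "$(head -1 "$temp".txt)")`.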
Cleanup
For some reason, every transcribed line is prefixed with a space, so we just remove it with `sd`. We also remove the `temp` and input files, so that my Android Sound Recorder doesn’t fill up with old cruft.
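```bash
# $blog already ends in .md, so pass it unchanged
sd '^ ' '' "$blog"
# Remove the converted WAV and its .txt transcript ("$temp"*) plus the original recording
rm "$temp"* "$file"
```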
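If `sd` is not installed, POSIX `sed` can do the same per-line substitution. A minimal sketch, assuming a hypothetical `strip_leading_space` helper that avoids `sed -i` (whose syntax differs between GNU and BSD sed) by going through a temporary file:

```bash
#!/bin/bash
# Hypothetical sd-free variant: remove one leading space from each line of FILE.
strip_leading_space() {
  local file="$1" tmp status
  tmp="$(mktemp)" || return 1
  # Writing back with cat keeps the original file's permissions
  sed 's/^ //' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Chained as `strip_leading_space "$blog" && rm "$temp"* "$file"`, the recording would only be deleted once the draft was written and cleaned up.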
Bonus: Sync with benefits
Because Syncthing copies the blog post files back to my Android phone, I can edit them whenever inspiration strikes. The blog’s `.gitignore` just needs a `content/posts/.st*` rule, and Syncthing needs an `img/` ignore rule to avoid cluttering Android’s Sound Recorder folder with blog post images.
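Both ignore rules are one-liners. A minimal sketch for adding them idempotently, assuming a hypothetical `add_rule` helper, that it runs from the blog’s root directory, and that Syncthing’s `.stignore` sits in the synced `content/posts/` folder:

```bash
#!/bin/bash
# Hypothetical helper: append PATTERN to FILE unless the exact line is already there.
add_rule() {
  local file="$1" pattern="$2"
  touch "$file" || return 1
  # Make sure an existing last line is newline-terminated before appending
  if [[ -s "$file" && -n "$(tail -c 1 "$file")" ]]; then
    printf '\n' >> "$file"
  fi
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Assumed usage from the blog's root directory:
# add_rule .gitignore 'content/posts/.st*'   # Syncthing's .stfolder/.stignore
# add_rule content/posts/.stignore 'img/'    # keep blog images off the phone
```

Running it twice leaves each file with a single copy of the rule.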