From hours of manual editing to zero.
An automated pipeline that records coaching calls, transcribes them, segments conversations by speaker, generates titles and show notes, and publishes episodes — all without a human touching the audio.
Three calls a week, hours of manual work each time
Our coaching program runs live group calls every Monday, Wednesday, and Friday. Each call contains two to three separate coaching sessions with different clients, different topics, and different conversations. After every call, someone had to listen through the full recording, identify where each session started and ended, cut the audio into individual segments, write a title and description for each one, and publish them to the right feed so clients could listen back. Three times a week, every week. It was slow and expensive, and the bottleneck meant clients sometimes waited days for content from a call they had joined that morning.
The process got more complicated when we moved from Zoom to Discord. Our coaching community lives in Discord, and forcing non-technical clients to juggle two platforms was causing support issues. Discord solved the community problem but created a recording problem — there's no built-in call recording. We brought in Craig Bot with custom triggers tied to the coaching room so recording would start automatically when the coach joined, without anyone having to think about it. That got us the raw recordings, but everything downstream was still manual.
What happens when a coaching call ends
The full cycle — from the coach hanging up to episodes being live in the app — takes under an hour with zero human intervention.
Months of iteration to get it right
Finding the segment boundaries
The hardest part wasn't the audio processing. It was teaching the AI to reliably identify where one coaching session ends and another begins. Coaching conversations don't have clean markers: people trail off, there's small talk between sessions, and sometimes the transition is a single sentence. I spent months iterating on the prompts, refining which language patterns to look for, adjusting how the AI handles ambiguous boundaries, and building in validation so that a segment is only created when the AI is confident in both its start and end points.
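The validation step can be pictured roughly like this. It is a minimal sketch, not the pipeline's actual code: the Boundary and Segment types, the confidence scale, and both thresholds are hypothetical, and it assumes the AI returns a confidence score alongside each proposed start and end point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Boundary:
    time_s: float      # offset into the recording, in seconds
    confidence: float  # assumed model-reported confidence in [0, 1]


@dataclass
class Segment:
    start: Boundary
    end: Boundary


# Hypothetical thresholds, chosen only for illustration.
MIN_CONFIDENCE = 0.8
MIN_DURATION_S = 120.0


def validate_segments(candidates: List[Segment]) -> List[Segment]:
    """Keep only segments whose start AND end are both confident.

    Also drops segments that are too short or that overlap a segment
    already accepted, so a weak boundary never produces an episode.
    """
    accepted: List[Segment] = []
    last_end = 0.0
    for seg in sorted(candidates, key=lambda s: s.start.time_s):
        confident = min(seg.start.confidence, seg.end.confidence) >= MIN_CONFIDENCE
        long_enough = seg.end.time_s - seg.start.time_s >= MIN_DURATION_S
        no_overlap = seg.start.time_s >= last_end
        if confident and long_enough and no_overlap:
            accepted.append(seg)
            last_end = seg.end.time_s
    return accepted
```

Rejecting a segment outright, rather than guessing a boundary, keeps a bad cut from ever reaching the feed; the full-call recording still covers anything that was skipped.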
Audio quality — knowing when to simplify
Raw Discord recordings don't sound like studio audio. I initially built a full audio processing chain: a high-pass filter to cut low-frequency rumble, noise reduction, dynamic range compression, EQ for voice clarity, and a limiter to prevent clipping. But after testing on real coaching calls, the heavily processed audio sounded worse, over-compressed and artificial. I stripped it back to gentle loudness normalization to -16 LUFS (a widely used podcast loudness target) using FFmpeg, and the results were significantly better. The final pipeline applies just enough processing that clients never think about audio quality, without destroying the natural sound of the conversation. Knowing when to simplify is part of the iteration.
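For reference, a normalization step like this can be expressed with FFmpeg's loudnorm filter. The sketch below is an assumption about how such a step might look, not the pipeline's actual command: only the -16 LUFS target comes from the text above, while the true-peak ceiling, loudness range, sample rate, and function names are illustrative choices.

```python
import subprocess
from typing import List

TARGET_LUFS = -16.0    # integrated loudness target, from the text above
TRUE_PEAK_DB = -1.5    # assumed true-peak ceiling
LOUDNESS_RANGE = 11.0  # assumed loudness range target


def build_normalize_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that applies single-pass loudnorm."""
    loudnorm = f"loudnorm=I={TARGET_LUFS}:TP={TRUE_PEAK_DB}:LRA={LOUDNESS_RANGE}"
    return [
        "ffmpeg", "-y",   # overwrite the output file if it exists
        "-i", src,
        "-af", loudnorm,  # the only audio filter in the chain
        "-ar", "48000",   # loudnorm upsamples internally, so pin the output rate
        dst,
    ]


def normalize(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_normalize_command(src, dst), check=True)
```

Single-pass loudnorm adjusts gain dynamically as it goes; a two-pass run that measures the file first and feeds the measurements back in is generally more accurate, at the cost of reading the file twice.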
Title and summary quality
Early versions of the auto-generated titles were too generic or too literal. "Coaching Session About Anxiety" isn't useful when you have hundreds of episodes. I iterated on the prompt engineering to produce titles that capture the specific angle or insight from each conversation — the kind of title that makes someone want to listen. Same with show notes: they needed to be descriptive enough to be useful but concise enough to scan. This was months of "generate, review, adjust prompt, repeat."
Built with
Runs autonomously on a Raspberry Pi — deployed via Docker, monitored remotely, no cloud compute costs.
What it actually moved
Live and running
The pipeline is live and processes every coaching call automatically. The full-call recordings and segmented Q&A episodes are published directly into the ESC App, where 500+ members can access them. This project is one piece of the larger ESC ecosystem — for the full platform story, see the ESC App case study.
Want to talk about building something like this?
This pipeline is one piece of a larger ecosystem. Get in touch to discuss automation, audio processing, or building tools that eliminate manual work.