Production Case Study

From hours of manual editing to zero.

An automated pipeline that records coaching calls, transcribes them, splits each call into its individual coaching sessions, generates titles and show notes, and publishes episodes, all without a human touching the audio.

Project: Automated Coaching Call Processor

Role: Sole Engineer

Timeline: Late 2024 to Late 2025

Status: Live · Runs autonomously on Raspberry Pi

The Problem

Three calls a week, hours of manual work each time

Our coaching program runs live group calls every Monday, Wednesday, and Friday. Each call contains two to three separate coaching sessions — different clients, different topics, different conversations. After every call, someone had to manually listen through the full recording, identify where each session started and ended, cut the audio into individual segments, write a title and description for each one, and publish them to the right feed so clients could listen back. Three times a week, every week. It was slow, expensive, and the bottleneck meant clients sometimes waited days to get content from a call that happened that morning.

The process got more complicated when we moved from Zoom to Discord. Our coaching community lives in Discord, and forcing non-technical clients to juggle two platforms was causing support issues. Discord solved the community problem but created a recording problem — there's no built-in call recording. We brought in Craig Bot with custom triggers tied to the coaching room so recording would start automatically when the coach joined, without anyone having to think about it. That got us the raw recordings, but everything downstream was still manual.

The Pipeline

What happens when a coaching call ends

The full cycle — from the coach hanging up to episodes being live in the app — takes under an hour with zero human intervention.

Step 01

Recording

The coach joins the Discord coaching room. Craig Bot detects the join event and begins recording automatically. When the call ends, Craig Bot saves the audio file into a designated Google Drive input folder.

Step 02

Detection & Validation

The pipeline monitors the Google Drive folder for new files. When one appears, it checks the metadata, verifying that the recording came from the correct room and falls within the expected coaching call window. Recordings whose length falls outside the expected 45–120 minute range are automatically rejected, so test calls and corrupted files are never processed.
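
The validation rule above can be sketched as a small pure function. This is a minimal illustration rather than the pipeline's actual code: the `RecordingMeta` shape, the `EXPECTED_ROOM` value, and the function name are assumptions made for the example. Only the 45–120 minute bounds come from the text.

```typescript
// Hypothetical shape of the metadata read for each new file (an assumption).
interface RecordingMeta {
  fileName: string;
  sourceRoom: string;      // voice channel the recording came from
  durationMinutes: number; // total length of the recording
}

type Validation = { ok: true } | { ok: false; reason: string };

// Duration bounds are taken from the text; the room name is illustrative only.
const MIN_MINUTES = 45;
const MAX_MINUTES = 120;
const EXPECTED_ROOM = "coaching-room";

function validateRecording(meta: RecordingMeta): Validation {
  if (meta.sourceRoom !== EXPECTED_ROOM) {
    return { ok: false, reason: `unexpected room: ${meta.sourceRoom}` };
  }
  // A corrupted file may have no readable duration at all.
  if (!Number.isFinite(meta.durationMinutes)) {
    return { ok: false, reason: "missing or unreadable duration" };
  }
  if (meta.durationMinutes < MIN_MINUTES || meta.durationMinutes > MAX_MINUTES) {
    return {
      ok: false,
      reason: `duration ${meta.durationMinutes} min is outside ${MIN_MINUTES}-${MAX_MINUTES} min`,
    };
  }
  return { ok: true };
}
```

Returning a reason alongside the rejection keeps the logs readable when a short test call is skipped.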

Step 03

Transcription

The validated audio file is transcribed using OpenAI Whisper, producing a full text transcript of the entire call. Large recordings that exceed the API's 25 MB limit are automatically split and transcribed in chunks, with retry logic handling any transient failures.
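
One way the chunking and retry logic could look is sketched below. It assumes a roughly constant bitrate, so that splitting by time also bounds chunk size, and treats 25 MB as 25 × 1024 × 1024 bytes. The names `planChunks` and `withRetry`, the 10% safety margin, and the backoff values are all hypothetical. The actual transcription call is left to the caller.

```typescript
interface TimeRange {
  startSec: number;
  endSec: number;
}

// Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Plan equal-length time ranges so each chunk stays under the size limit.
// Assumes a roughly constant bitrate; the safety margin absorbs small variation.
function planChunks(totalBytes: number, durationSec: number, safety = 0.9): TimeRange[] {
  if (totalBytes <= MAX_UPLOAD_BYTES) {
    return [{ startSec: 0, endSec: durationSec }];
  }
  const count = Math.ceil(totalBytes / (MAX_UPLOAD_BYTES * safety));
  const chunkLen = durationSec / count;
  const ranges: TimeRange[] = [];
  for (let i = 0; i < count; i++) {
    ranges.push({ startSec: i * chunkLen, endSec: Math.min(durationSec, (i + 1) * chunkLen) });
  }
  return ranges;
}

// Retry a task with exponential backoff. This sketch retries on any error;
// a real pipeline would likely retry only errors it knows to be transient.
// `sleep` is injected so the helper can be tested without real delays.
async function withRetry<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1)); // 1s, 2s, 4s, ...
      }
    }
  }
  throw lastError;
}
```

Each planned range would be cut from the source file, transcribed inside `withRetry`, and its timestamps shifted by `startSec` before the partial transcripts are joined.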

Step 04

Segment Identification

The transcript is analyzed by Claude AI, which identifies the natural language patterns that signal the start and end of each coaching session, such as "does that feel complete" or "hey [name], are you ready," and marks the start and end timestamps for each session. It also skips the first 5–10 minutes of housekeeping and announcements at the top of each call, so published segments open with the actual coaching content rather than "let me go over a few updates." Each segment's timestamps are then buffered: the start is moved 5 seconds later and the end 3 seconds earlier, to avoid capturing transitions between conversations.
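
The timestamp buffer can be read as a simple inward adjustment of each boundary. The sketch below shows that reading under stated assumptions: the `RawSession` type and `applyBuffer` name are hypothetical, and dropping sessions that become empty after trimming is an added safeguard, not something the text specifies.

```typescript
interface RawSession {
  startSec: number; // start timestamp reported by the model
  endSec: number;   // end timestamp reported by the model
}

// Buffer sizes from the text: 5 seconds off the start, 3 seconds off the end.
const START_TRIM_SEC = 5;
const END_TRIM_SEC = 3;

// Move the start later and the end earlier so transitions between
// conversations are not captured. Returns null if nothing is left.
function applyBuffer(session: RawSession): RawSession | null {
  const startSec = session.startSec + START_TRIM_SEC;
  const endSec = session.endSec - END_TRIM_SEC;
  return endSec > startSec ? { startSec, endSec } : null;
}
```

For example, a session marked from 600 s to 1800 s would be cut from 605 s to 1797 s.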

Step 05

Audio Segmentation

Using the timestamps from the previous step, the pipeline splits the original audio file into individual segments — one per coaching session. Loudness normalization is applied to ensure consistent quality across segments.
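
As a rough sketch of this step, the function below builds an FFmpeg argument list that cuts one segment and normalizes its loudness in a single pass. The use of the `loudnorm` filter is an assumption about how the normalization is done. The -16 LUFS target comes from later in this case study, while the true-peak and loudness-range values (`TP=-1.5`, `LRA=11`) and the function name are illustrative choices.

```typescript
// Build FFmpeg arguments for one segment. `-ss` and `-t` are placed after
// `-i`, so they are output options: seek to the start, then keep `-t` seconds.
function buildSegmentArgs(
  inputPath: string,
  outputPath: string,
  startSec: number,
  endSec: number,
): string[] {
  if (endSec <= startSec) {
    throw new Error("segment end must be after its start");
  }
  const durationSec = endSec - startSec;
  return [
    "-i", inputPath,
    "-ss", startSec.toFixed(3),
    "-t", durationSec.toFixed(3),
    // Single-pass loudness normalization. I=-16 is the target from the text;
    // TP and LRA are assumed values, not taken from the source.
    "-af", "loudnorm=I=-16:TP=-1.5:LRA=11",
    outputPath,
  ];
}
```

The resulting array can be passed to a process runner such as Node's `child_process.spawn("ffmpeg", args)`. Using an explicit duration with `-t` avoids the ambiguity of `-to`, whose meaning depends on where `-ss` is placed.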

Step 06

Title & Show Notes

For each segment, Claude generates multiple title options that reflect the actual content of that conversation, plus a summary paragraph to use as the episode description.

Step 07

Content Repurposing

The coaching calls contain real client sessions that can't be published publicly, but the concepts discussed are valuable for attracting new clients. The pipeline uses Claude to extract core ideas from each call and generate standalone solo podcast scripts — reframed for public consumption with no client names, no session details, just the teaching. One private coaching call becomes public content without exposing anyone.

Step 08

Publishing

The processed files are uploaded back to Google Drive for archival and team access, and simultaneously published through the ESC App's admin system — the original full coaching call goes to the "Coaching Calls" feed, and each segmented conversation goes to the "Q&A" feed. Clients see the new content in the app within an hour of the call ending.

What Made It Hard

Months of iteration to get it right

Finding the segment boundaries

The hardest part wasn't the audio processing; it was teaching the AI to reliably identify where one coaching session ends and another begins. Coaching conversations don't have clean markers. People trail off, there's small talk between sessions, and sometimes the transition is a single sentence. I spent months iterating on the prompts: refining which language patterns to look for, adjusting how the AI handles ambiguous boundaries, and building in validation so that a segment is created only when the AI is confident in both its start and end points.
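
The validation gate might look something like the following. The text does not say how confidence is represented, so this sketch assumes the model is asked to return a 0 to 1 score for each boundary. The `CandidateSegment` shape, the 0.8 threshold, and the extra ordering checks are all hypothetical.

```typescript
// Hypothetical structure of one segment proposed by the model.
interface CandidateSegment {
  label: string;
  startSec: number;
  endSec: number;
  startConfidence: number; // 0..1, assumed to be reported by the model
  endConfidence: number;   // 0..1, assumed to be reported by the model
}

// Illustrative threshold, not a value from the source.
const MIN_CONFIDENCE = 0.8;

// Keep a segment only if both boundaries are confident and the segment is
// well formed: inside the call and not overlapping an accepted segment.
function acceptConfidentSegments(
  candidates: CandidateSegment[],
  callDurationSec: number,
): CandidateSegment[] {
  const accepted: CandidateSegment[] = [];
  let previousEnd = 0;
  const ordered = candidates.slice().sort((a, b) => a.startSec - b.startSec);
  for (const c of ordered) {
    const confident = c.startConfidence >= MIN_CONFIDENCE && c.endConfidence >= MIN_CONFIDENCE;
    const wellFormed =
      c.startSec >= previousEnd && c.endSec > c.startSec && c.endSec <= callDurationSec;
    if (confident && wellFormed) {
      accepted.push(c);
      previousEnd = c.endSec;
    }
  }
  return accepted;
}
```

Skipping an uncertain segment loses one episode but never publishes a badly cut one, and the full call recording is still available as a fallback.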

Audio quality — knowing when to simplify

Raw Discord recordings don't sound like studio audio. I initially built a full audio processing chain: a high-pass filter to cut low-frequency rumble, noise reduction, dynamic range compression, EQ for voice clarity, and a limiter to prevent clipping. But after testing on real coaching calls, the heavily processed audio sounded worse: over-compressed and artificial. I stripped it back to gentle loudness normalization to -16 LUFS (a widely used loudness target for podcasts) using FFmpeg, and the results were significantly better. The final pipeline applies just enough processing that clients never think about audio quality, without destroying the natural sound of the conversation. Knowing when to simplify is part of the iteration.

Title and summary quality

Early versions of the auto-generated titles were too generic or too literal. "Coaching Session About Anxiety" isn't useful when you have hundreds of episodes. I iterated on the prompt engineering to produce titles that capture the specific angle or insight from each conversation — the kind of title that makes someone want to listen. Same with show notes: they needed to be descriptive enough to be useful but concise enough to scan. This was months of "generate, review, adjust prompt, repeat."

Tech Stack

Built with

Runs autonomously on a Raspberry Pi — deployed via Docker, monitored remotely, no cloud compute costs.

Node.js · FFmpeg · Google Drive API · OpenAI Whisper · Claude AI · Docker · Raspberry Pi · Craig Bot (Discord)

Impact

What it actually moved

5–10 hrs → 0

Manual Processing Eliminated

< 1 hour

Call-to-Published Turnaround

3 per week

Calls Processed Automatically

$0 ongoing

Cloud Compute Cost (Runs on Raspberry Pi)

3 outputs

Full Recording, Q&A Episodes, Solo Podcast Script

Currently

Live and running

The pipeline is live and processes every coaching call automatically. The full-call recordings and segmented Q&A episodes are published directly into the ESC App, where 500+ members can access them. This project is one piece of the larger ESC ecosystem — for the full platform story, see the ESC App case study.

See the full platform: ESC App Case Study · View the source: GitHub

Want to talk about building something like this?

This pipeline is one piece of a larger ecosystem. Get in touch to discuss automation, audio processing, or building tools that eliminate manual work.
