AI Podcast Generator - Opinionated MVP

2025-08-14 Python, FastAPI, Vue 3, TypeScript, Vite, Pinia, TailwindCSS, Axios, OpenAI, Google TTS


AI Podcast Generator is an end‑to‑end podcast generation pipeline that ingests curated learning materials (PDFs/MDX/JSON outlines), builds a two‑host conversational script, and produces an MP3 via TTS. This is an opinionated MVP: minimal surface area, clear defaults, no yak‑shaving.

What this is

  • A clean, modular FastAPI backend that demonstrates the full flow: ingest → script builder → TTS → storage/API.
  • Pluggable providers: OpenAI for script generation and Google TTS for audio by default; both can be swapped.
  • A Vue 3 frontend (Vite, TypeScript, Pinia, Tailwind) to create podcasts, list/play episodes, filter, and delete.
  • Sensible errors & debug routes to inspect state while developing.

What this isn’t

  • Not a no‑code product. It expects prepared inputs and uses simple heuristics in model prompts.
  • Not a general‑purpose web crawler. It never fetches content from the web; it only uses local materials you provide.

Quickstart (backend)

Requirements: Python 3.11+, Poetry, and optionally an OpenAI API key.

  1. Install
poetry install
  2. Configure .env (example)
PORT=3000
BASE_URL=http://localhost:3000
OUTPUT_DIR=./storage/episodes
DATABASE_URL=postgresql+psycopg://user:pass@localhost:5432/podcast
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=   # optional
MATERIALS_DIRS=./materials/*,./materials/??
  3. Run
poetry run uvicorn app.api.v1.podcast:app --reload --port 3000
  4. Smoke test (create a podcast from local materials)
curl -X POST "http://localhost:3000/api/v1/podcasts" \
  -H 'content-type: application/json' \
  -d '{
    "title": "Sample Episode",
    "description": "Short intro",
    "materials_glob": "materials/**/*.json",
    "script_style": "two-host",
    "voice": "en-US-Standard-A"
  }'

Quickstart (frontend)

Frontend lives in frontend/ and is built with Vue 3, TypeScript, Pinia, and TailwindCSS.

  1. Install deps
pnpm install
  2. Configure .env
VITE_API_BASE=http://localhost:3000/api
  3. Run dev
pnpm dev
  4. Build
pnpm build

The app provides a UI for creating podcasts, previewing the list of episodes, playing audio, filtering, and deleting.

API (selected)

POST /v1/podcasts → create a background job (paths in this section are relative to the API base, e.g. http://localhost:3000/api)

{
  "title": "Episode title",
  "description": "optional",
  "materials_glob": "materials/**/*.json",
  "voice": "en-US-Standard-A",
  "script_style": "two-host"
}

GET /v1/podcasts/{id} → returns status, progress, and result (MP3 URL) once ready

GET /v1/podcasts?status=done → list finished items

DELETE /v1/podcasts/{id} → cancel or remove

Debug routes (optional): /api/debug/raw returns a raw state dump for troubleshooting.

How materials ingestion works

  • The service scans the selected directory for content pillar JSON or raw items (e.g. content_outline.json).
  • Materials can point to PDFs/MDX/TXT/JSON; the service builds a combined source text.
  • The script builder produces a two‑host script (hosts A and B). It uses OpenAI when an API key is present and a deterministic local builder otherwise.
  • Output is stored under OUTPUT_DIR and served at /media/<id>.mp3 (or via your storage adapter).
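The ingestion steps above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the service's real code: the function names are hypothetical, PDF extraction is left out, JSON files are flattened to their string values, and the deterministic fallback simply alternates paragraphs between hosts A and B.

```python
import glob
import json
from pathlib import Path


def _strings(node) -> list[str]:
    """Collect every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [s for v in node.values() for s in _strings(v)]
    if isinstance(node, list):
        return [s for v in node for s in _strings(v)]
    return []


def read_material(path: Path) -> str:
    """Return plain text for one material file; unsupported types yield ''."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        return "\n".join(_strings(json.loads(path.read_text(encoding="utf-8"))))
    if suffix in {".txt", ".md", ".mdx"}:
        return path.read_text(encoding="utf-8")
    return ""  # e.g. .pdf, which would need a PDF library


def build_source_text(materials_glob: str) -> str:
    """Combine all matching files into one source text, in sorted path order."""
    paths = sorted(Path(p) for p in glob.glob(materials_glob, recursive=True))
    parts = [read_material(p) for p in paths if p.is_file()]
    return "\n\n".join(t.strip() for t in parts if t.strip())


def deterministic_script(source_text: str) -> list[tuple[str, str]]:
    """Fallback without an API key: alternate paragraphs between hosts A and B."""
    paragraphs = [p.strip() for p in source_text.split("\n\n") if p.strip()]
    return [("A" if i % 2 == 0 else "B", p) for i, p in enumerate(paragraphs)]
```

Sorting the paths keeps the combined text, and therefore the fallback script, reproducible across runs.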

TTS

The default implementation uses Google TTS via gTTS (simple, with no API key to configure, though it does need network access). You can swap it for another engine in app/services/tts_service.py.

  • Output files are stored in OUTPUT_DIR and served at /media/<id>.mp3.
  • Swap voices by changing a label/URL; add SSML if your engine supports it.
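One way to keep the engine swappable is a small provider interface. The sketch below is an assumption about how such a seam could look, not the contents of app/services/tts_service.py: TTSProvider, GTTSProvider, and render_episode are hypothetical names. The gTTS calls shown, gTTS(text=..., lang=...) and .save(path), are the package's basic API.

```python
from pathlib import Path
from typing import Protocol


class TTSProvider(Protocol):
    """Anything that can turn text into an audio file at out_path."""

    def synthesize(self, text: str, out_path: Path) -> Path: ...


class GTTSProvider:
    """Provider backed by the gTTS package (requires network access)."""

    def __init__(self, lang: str = "en") -> None:
        self.lang = lang

    def synthesize(self, text: str, out_path: Path) -> Path:
        from gtts import gTTS  # imported lazily so other providers work without it

        out_path.parent.mkdir(parents=True, exist_ok=True)
        gTTS(text=text, lang=self.lang).save(str(out_path))
        return out_path


def render_episode(
    script: list[tuple[str, str]],
    episode_id: str,
    output_dir: Path,
    tts: TTSProvider,
) -> Path:
    """Join the host lines and write <output_dir>/<id>.mp3 via the given provider."""
    text = "\n".join(line for _host, line in script)
    return tts.synthesize(text, output_dir / f"{episode_id}.mp3")
```

Swapping engines then means passing a different object with a synthesize method; render_episode itself does not change.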

Frontend features

  • Create Podcast form.
  • List View with progress indicator and integrated audio player per episode.
  • Filtering by status (e.g. active/complete).
  • Delete episodes.
  • Automatic polling while a job is running.

Architecture (high level)

Materials (JSON/MDX/PDF) → Script builder (OpenAI/local) → TTS (Google TTS by default) → MP3 → Storage/API

                                     Vue 3 Frontend

Roadmap ideas

  • Streaming TTS + advanced playback controls.
  • Better error reporting and user notifications.
  • Progress bars and richer status updates.
  • Mobile‑friendly UI improvements.