scribe/README.md

# ScrAIbe – LocalAI-Backed Transcription and Summarization

ScrAIbe is a transcription and summarization service that:

- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a structured summary.
- Provides:
  - A web GUI for uploading audio and receiving transcripts via email.
  - A CLI and Python API for direct integration.
  - An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
  - A watch-folder mode for automatic transcription, summarization, and email delivery.

No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.

For more information: https://apstrom.ca

## Features

- Transcription with speaker diarization via LocalAI:
  - Uses the /v1/audio/diarization endpoint.
  - Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
  - Task: transcript_and_summarize
  - Highlights:
    - Main topics and discussion points
    - Key decisions and outcomes
    - Action items and responsibilities
    - Open issues and risks
  - Improved, configurable summary prompts (via environment or file).
- Async web GUI (always enabled):
  - Upload audio via browser.
  - Jobs are queued and processed in the background (Celery + Redis).
  - Emails:
    - Immediate confirmation with queue position.
    - Final transcript (MD + DOCX + JSON) when ready.
    - Summary as MD + DOCX (if requested).
    - Error notification if processing fails.
- MCP-style HTTP API (optional):
  - Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
  - Allows:
    - Audio upload for transcription.
    - Job status checks.
    - Retrieval of transcript JSON (no summary).
  - Enabled via MCP_SERVER_ENABLED=true.
- Watch-folder mode (optional):
  - Monitors a directory for audio files.
  - For each file:
    - Transcribes and summarizes.
    - Emails transcript + summary + JSON to a configured address.
    - Deletes the source file after successful processing (configurable).
  - Enabled via WATCH_ENABLED=true.
- File formats:
  - Transcript:
    - .md
    - .docx (line-numbered, 30 lines per page, optional cover page)
  - Summary (if requested):
    - .md
    - .docx (markdown-aware WYSIWYG styling, optional cover page)
  - Full structured output: .json
- Customizable branding:
  - Web GUI title, logo, and accent color via environment variables.
  - Email logo, accent color, and subject lines via environment variables.
  - Optional cover pages for transcript and summary DOCX.
- CLI and Python API:
  - Simple command-line interface.
  - Drop-in Scraibe class for integration into other tools.
- Docker-ready:
  - Lightweight container, configured via environment variables.

## Architecture

- LocalAI (vibevoice.cpp):
  - Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
  - Handles transcript → structured summary.
- ScrAIbe:
  - Orchestrates:
    - File upload to LocalAI
    - Transcript assembly
    - Chunked summarization
    - Output formatting (e.g., .md with transcript + summary)
  - Runs:
    - Web GUI (Gradio) – always enabled
    - MCP-style HTTP API (FastAPI) – optional
    - Watch-folder mode – optional
    - Celery worker (async processing)
    - Redis (in-container by default)

## Quick Start (Web GUI in Docker)

Run the container with your LocalAI and summarizer endpoints:

- docker run -d \
    -p 7860:7860 \
    -e LOCALAI_API_URL=http://localai:8080 \
    -e SUMMARIZER_API_URL=http://llm:8080 \
    -e EMAIL_SMTP_HOST=smtp.your-domain.com \
    -e EMAIL_SMTP_PORT=587 \
    -e EMAIL_SMTP_USER=transcribe@your-domain.com \
    -e EMAIL_SMTP_PASSWORD=your_password \
    -e EMAIL_FROM_ADDRESS="ScrAIbe <transcribe@your-domain.com>" \
    -e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
    -e WEBUI_TITLE="Your Transcription Service" \
    -e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
    -e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
    -e EMAIL_ACCENT_COLOR="#7C6DA0" \
    scraibe:latest

Then open: http://<host>:7860

## Quick Start (CLI)

Basic usage:

- Transcribe:

  - python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

- Transcribe and summarize:

  - python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Environment variables must be set to point to your LocalAI and summarizer LLM.

## Python API

Example: transcribe only

- from scraibe import Scraibe

  - client = Scraibe()
  - text = client.transcribe("audio.wav")
  - print(text)

Example: transcribe and summarize

- from scraibe import Scraibe

  - client = Scraibe()
  - result = client.transcript_and_summarize("audio.wav")
  - transcript = result["transcript"]
  - summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

## Command-Line Options

Run:

- python3 -m scraibe.cli -h

Key options:

- -f / --audio-files:
  - One or more audio files to process.
- --task:
  - transcribe (default)
  - transcript_and_summarize
- -o / --output-directory:
  - Output folder for generated files.
- -of / --output-format:
  - txt, json, md, html
  - For transcript_and_summarize, output is always saved as .md with:
    - # Transcript
    - # Summary

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

## Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

### Basic run (transcribe via CLI)

- docker run -it \
    -e LOCALAI_API_URL=http://localai:8080 \
    -v /path/to/audio:/audio \
    scraibe:latest \
    -f /audio/meeting.wav -o /audio/output -of txt

### Basic run (transcribe + summarize via CLI)

- docker run -it \
    -e LOCALAI_API_URL=http://localai:8080 \
    -e SUMMARIZER_API_URL=http://llm:8080 \
    -v /path/to/audio:/audio \
    scraibe:latest \
    -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

### Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

- LOCALAI_API_URL:
  - Required.
  - Base URL of the LocalAI server.
  - Example: http://localai:8080
- LOCALAI_API_KEY:
  - Optional.
  - API key for LocalAI, if configured.
- LOCALAI_MODEL:
  - Optional (default: vibevoice-diarize).
  - Model name used for transcription/diarization.

Summarization LLM:

- SUMMARIZER_API_URL:
  - Required when using --task transcript_and_summarize.
  - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
  - Example: http://llm:8080
- SUMMARIZER_API_KEY:
  - Optional.
  - API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
  - Optional (default: llama-3.1-8b-instruct).
  - Model name used for summarization.

Web GUI and branding:

- WEBUI_TITLE:
  - Title shown in the web GUI (default: A.P.Strom Transcription).
- WEBUI_LOGO_URL:
  - URL of the logo displayed in the web GUI header.
  - Example: https://your-domain.com/logo.png

Accent color (UI and emails):

- EMAIL_ACCENT_COLOR:
  - Accent color used in:
    - Web GUI buttons and accents
    - Email headings, links, and email addresses
  - Default: #7C6DA0

MCP-style HTTP API:

- MCP_SERVER_ENABLED:
  - Enable MCP-style HTTP API (default: false).
  - Values: true/false.
- MCP_SERVER_HOST:
  - Bind address (default: 0.0.0.0).
- MCP_SERVER_PORT:
  - Port (default: 8000).
- MCP_USE_CELERY:
  - Use Celery for async transcription (default: true).
  - If false, transcription runs in-process.

Watch-folder mode:

- WATCH_ENABLED:
  - Enable watch-folder mode (default: false).
  - Values: true/false.
- WATCH_DIR:
  - Directory to monitor for audio files (required if WATCH_ENABLED=true).
- WATCH_EMAIL_TO:
  - Email address to send transcript and summary (required if WATCH_ENABLED=true).
- WATCH_POLL_INTERVAL:
  - Seconds between scans (default: 10).
- WATCH_DELETE_ON_SUCCESS:
  - Delete source file after successful processing (default: true).

Async processing (Celery + Redis):

- CELERY_BROKER_URL:
  - Redis broker URL (default: redis://localhost:6379/0).
- CELERY_RESULT_BACKEND:
  - Redis backend URL (default: redis://localhost:6379/0).
- SCRAIBE_UPLOAD_DIR:
  - Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).

Email configuration:

- EMAIL_SMTP_HOST:
  - SMTP server host.
- EMAIL_SMTP_PORT:
  - SMTP server port (e.g., 587).
- EMAIL_SMTP_USER:
  - SMTP username.
- EMAIL_SMTP_PASSWORD:
  - SMTP password.
- EMAIL_SMTP_USE_TLS:
  - Use TLS (true/false; default: true).
- EMAIL_FROM_ADDRESS:
  - Sender address (e.g., "ScrAIbe <transcribe@your-domain.com>").
- EMAIL_CONTACT_ADDRESS:
  - Support contact address shown in email templates.
- EMAIL_LOGO_URL:
  - URL of the logo used in emails (preferred).
- EMAIL_LOGO_PATH:
  - Fallback local path for email logo (default: /app/src/misc/logo1.png).
- EMAIL_CSS_PATH:
  - Path to the CSS used in emails (default: /app/src/misc/mail_style.css).

Email subject lines (customizable):

- EMAIL_SUBJECT_UPLOAD:
  - Subject for upload confirmation email.
  - Default: "ScrAIbe: Your transcription request has been received"
- EMAIL_SUBJECT_SUCCESS:
  - Subject for transcript-ready email.
  - Default: "ScrAIbe: Your transcript is ready"
- EMAIL_SUBJECT_ERROR:
  - Subject for error notification email.
  - Default: "ScrAIbe: Error with your transcription request"

Summary prompt customization:

- SUMMARY_PROMPT_CHUNK:
  - Override prompt used for each transcript chunk.
- SUMMARY_PROMPT_COMBINED:
  - Override prompt used for the final combined summary.
- SUMMARY_PROMPT_FILE:
  - Path to a file with prompts in sections:
    - [chunk]
    - [combined]

DOCX and cover pages:

- COVER_PAGE_ENABLED:
  - Add a cover page to transcript and summary DOCX files (default: false).
- COVER_PAGE_ORGANIZATION:
  - Organization name shown on the cover page.
- COVER_PAGE_TITLE_PREFIX:
  - Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
- COVER_PAGE_LOGO_URL:
  - Logo URL to include on the cover page.
- COVER_PAGE_LOGO_PATH:
  - Local logo path to include on the cover page.

Output files (async web GUI and watch-folder mode):

When a job completes, the user receives:

- Transcript:
  - .md file
  - .docx file (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
  - .md file
  - .docx file (markdown-aware styling, optional cover page)
- JSON:
  - Structured transcript with diarization and metadata

All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

## Dependencies

Core runtime dependencies:

- Python 3.9+
- httpx
- numpy
- tqdm
- gradio
- celery[redis]
- redis
- python-docx
- fastapi
- uvicorn
- ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

## Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

## License

This project is licensed under GPL-3.0. See LICENSE for details.