scribe/README.md

# ScrAIbe – LocalAI-Backed Transcription and Summarization

ScrAIbe is a lightweight transcription and summarization client that:

- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a detailed, structured summary of the conversation.

No local speech models or heavy dependencies are required. ScrAIbe is designed to be run as a thin client in front of your own AI services.

## Features

- Transcription with speaker diarization via LocalAI:
  - Uses the `/v1/audio/diarization` endpoint.
  - Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
  - Task: `transcript_and_summarize`
  - Chunks long transcripts, summarizes each chunk, then generates a final comprehensive summary.
  - Summary highlights:
    - Main topics and discussion points
    - Key decisions and outcomes
    - Action items and responsibilities
    - Open issues and risks
- CLI and Python API:
  - Simple command-line interface.
  - Drop-in `Scraibe` class for integration into other tools.
- Docker-ready:
  - Lightweight container, configured via environment variables.

## Architecture

- LocalAI (vibevoice.cpp):
  - Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
  - Handles transcript → structured summary.
- ScrAIbe:
  - Orchestrates:
    - File upload to LocalAI
    - Transcript assembly
    - Chunked summarization
    - Output formatting (e.g., .md with transcript + summary)

## Quick Start (CLI)

Basic usage:

- Transcribe:

  - python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

- Transcribe and summarize:

  - python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Environment variables must be set to point to your LocalAI and summarizer LLM.

## Python API

Example: transcribe only

- from scraibe import Scraibe

  - client = Scraibe()
  - text = client.transcribe("audio.wav")
  - print(text)

Example: transcribe and summarize

- from scraibe import Scraibe

  - client = Scraibe()
  - result = client.transcript_and_summarize("audio.wav")
  - transcript = result["transcript"]
  - summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

## Command-Line Options

Run:

- python3 -m scraibe.cli -h

Key options:

- -f / --audio-files:
  - One or more audio files to process.
- --task:
  - transcribe (default)
  - transcript_and_summarize
- -o / --output-directory:
  - Output folder for generated files.
- -of / --output-format:
  - txt, json, md, html
  - For transcript_and_summarize, output is always saved as .md with:
    - # Transcript
    - # Summary

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

## Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

### Basic run (transcribe)

- docker run -it \
    -e LOCALAI_API_URL=http://localai:8080 \
    -v /path/to/audio:/audio \
    scraibe:latest \
    -f /audio/meeting.wav -o /audio/output -of txt

### Basic run (transcribe + summarize)

- docker run -it \
    -e LOCALAI_API_URL=http://localai:8080 \
    -e SUMMARIZER_API_URL=http://llm:8080 \
    -v /path/to/audio:/audio \
    scraibe:latest \
    -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

### Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

- LOCALAI_API_URL:
  - Required.
  - Base URL of the LocalAI server.
  - Example: http://localai:8080
- LOCALAI_API_KEY:
  - Optional.
  - API key for LocalAI, if configured.
- LOCALAI_MODEL:
  - Optional (default: vibevoice-diarize).
  - Model name used for transcription/diarization.

Summarization LLM:

- SUMMARIZER_API_URL:
  - Required when using --task transcript_and_summarize.
  - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
  - Example: http://llm:8080
- SUMMARIZER_API_KEY:
  - Optional.
  - API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
  - Optional (default: llama-3.1-8b-instruct).
  - Model name used for summarization.

All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

## Dependencies

Core runtime dependencies:

- Python 3.9+
- httpx
- numpy
- tqdm
- ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

## Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

## License

This project is licensed under GPL-3.0. See LICENSE for details.