Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
Optionally uses a second LLM to generate a detailed, structured summary of the conversation.

No local speech models or heavy dependencies are required. ScrAIbe is designed to be run as a thin client in front of your own AI services.

Features

Transcription with speaker diarization via LocalAI:
- Uses the /v1/audio/diarization endpoint.
- Compatible with vibevoice.cpp and other diarization-capable backends.
Optional AI-powered summarization:
- Task: transcript_and_summarize
- Chunks long transcripts, summarizes each chunk, then generates a final comprehensive summary.
- Summary highlights:
  - Main topics and discussion points
  - Key decisions and outcomes
  - Action items and responsibilities
  - Open issues and risks
CLI and Python API:
- Simple command-line interface.
- Drop-in Scraibe class for integration into other tools.
Docker-ready:
- Lightweight container, configured via environment variables.

Architecture

LocalAI (vibevoice.cpp):
- Handles audio → transcript + speaker segments.
Summarizer LLM (OpenAI-compatible chat endpoint):
- Handles transcript → structured summary.
ScrAIbe:
- Orchestrates:
  - File upload to LocalAI
  - Transcript assembly
  - Chunked summarization
  - Output formatting (e.g., .md with transcript + summary)

Quick Start (CLI)

Basic usage:

Transcribe:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
Transcribe and summarize:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Environment variables must be set to point to your LocalAI and summarizer LLM.

Python API

Example: transcribe only

from scraibe import Scraibe
- client = Scraibe()
- text = client.transcribe("audio.wav")
- print(text)

Example: transcribe and summarize

from scraibe import Scraibe
- client = Scraibe()
- result = client.transcript_and_summarize("audio.wav")
- transcript = result["transcript"]
- summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

Command-Line Options

Run:

python3 -m scraibe.cli -h

Key options:

-f / --audio-files:
- One or more audio files to process.
--task:
- transcribe (default)
- transcript_and_summarize
-o / --output-directory:
- Output folder for generated files.
-of / --output-format:
- txt, json, md, html
- For transcript_and_summarize, output is always saved as .md with:
  - Transcript
  - Summary

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

Basic run (transcribe)

docker run -it
-e LOCALAI_API_URL=http://localai:8080
-v /path/to/audio:/audio
scraibe:latest
-f /audio/meeting.wav -o /audio/output -of txt

Basic run (transcribe + summarize)

docker run -it
-e LOCALAI_API_URL=http://localai:8080
-e SUMMARIZER_API_URL=http://llm:8080
-v /path/to/audio:/audio
scraibe:latest
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

LOCALAI_API_URL:
- Required.
- Base URL of the LocalAI server.
- Example: http://localai:8080
LOCALAI_API_KEY:
- Optional.
- API key for LocalAI, if configured.
LOCALAI_MODEL:
- Optional (default: vibevoice-diarize).
- Model name used for transcription/diarization.

Summarization LLM:

SUMMARIZER_API_URL:
- Required when using --task transcript_and_summarize.
- Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
- Example: http://llm:8080
SUMMARIZER_API_KEY:
- Optional.
- API key for the summarization LLM, if required.
SUMMARIZER_MODEL:
- Optional (default: llama-3.1-8b-instruct).
- Model name used for summarization.

All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

Dependencies

Core runtime dependencies:

Python 3.9+
httpx
numpy
tqdm
ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

License

This project is licensed under GPL-3.0. See LICENSE for details.

Languages

Python 90.4%

HTML 6%

CSS 2.3%

Dockerfile 1%

Makefile 0.3%

README.md Unescape Escape

ScrAIbe – LocalAI-Backed Transcription and Summarization