# ScrAIbe – LocalAI-Backed Transcription and Summarization

ScrAIbe is a lightweight transcription and summarization client that:

- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a detailed, structured summary of the conversation.

No local speech models or heavy dependencies are required. ScrAIbe is designed to run as a thin client in front of your own AI services.

## Features

- Transcription with speaker diarization via LocalAI:
  - Uses the `/v1/audio/diarization` endpoint.
  - Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
  - Task: `transcript_and_summarize`
  - Chunks long transcripts, summarizes each chunk, then generates a final comprehensive summary.
  - Summary highlights:
    - Main topics and discussion points
    - Key decisions and outcomes
    - Action items and responsibilities
    - Open issues and risks
- CLI and Python API:
  - Simple command-line interface.
  - Drop-in `Scraibe` class for integration into other tools.
- Docker-ready:
  - Lightweight container, configured via environment variables.

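The chunk-then-summarize flow listed under Features can be sketched roughly as follows. This is an illustration only, not ScrAIbe's actual implementation: the names `chunk_text` and `summarize_transcript`, the character-based chunk limit, and the pluggable `summarize` callable (which in practice would call the summarizer LLM) are all assumptions made for this example.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines():
        # Hard-split any single line that is longer than the limit on its own.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)] or [""]
        for piece in pieces:
            added = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and size + added > max_chars:
                chunks.append("\n".join(current))
                current, size = [], 0
                added = len(piece)
            current.append(piece)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_transcript(
    transcript: str,
    summarize: Callable[[str], str],  # hypothetical hook: text in, summary out
    max_chars: int = 4000,
) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = chunk_text(transcript, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partials))
```

Passing the summarizer in as a callable keeps the chunking logic testable without a running LLM; a real caller would supply a function that posts each chunk to the OpenAI-compatible chat endpoint.
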
## Architecture

- LocalAI (vibevoice.cpp):
  - Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
  - Handles transcript → structured summary.
- ScrAIbe:
  - Orchestrates:
    - File upload to LocalAI
    - Transcript assembly
    - Chunked summarization
    - Output formatting (e.g., `.md` with transcript + summary)

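As a rough illustration of the transcript assembly step, the sketch below merges consecutive segments from the same speaker and renders one line per speaker turn. The segment shape (dictionaries with `speaker`, `start`, `end`, and `text` keys) and the function names are assumptions for this example; the actual LocalAI response schema and ScrAIbe's formatting may differ.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def assemble_transcript(segments: List[Dict[str, Any]]) -> str:
    """Merge adjacent same-speaker segments and format one line per turn.

    Assumed segment shape (hypothetical):
    {"speaker": str, "start": float, "end": float, "text": str}
    """
    merged: List[Dict[str, Any]] = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        text = str(seg["text"]).strip()
        if not text:
            continue  # skip empty segments
        if merged and merged[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            merged[-1]["text"] += " " + text
            merged[-1]["end"] = seg["end"]
        else:
            merged.append(
                {"speaker": seg["speaker"], "start": seg["start"],
                 "end": seg["end"], "text": text}
            )
    return "\n".join(
        f"[{format_timestamp(m['start'])}] {m['speaker']}: {m['text']}"
        for m in merged
    )
```
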
## Quick Start (CLI)

Basic usage:

- Transcribe:

      python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

- Transcribe and summarize:

      python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Environment variables must be set to point to your LocalAI server and summarizer LLM.

## Python API

Example: transcribe only

    from scraibe import Scraibe

    client = Scraibe()
    text = client.transcribe("audio.wav")
    print(text)

Example: transcribe and summarize

    from scraibe import Scraibe

    client = Scraibe()
    result = client.transcript_and_summarize("audio.wav")
    transcript = result["transcript"]
    summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

## Command-Line Options

Run:

    python3 -m scraibe.cli -h

Key options:

- `-f` / `--audio-files`:
  - One or more audio files to process.
- `--task`:
  - `transcribe` (default)
  - `transcript_and_summarize`
- `-o` / `--output-directory`:
  - Output folder for generated files.
- `-of` / `--output-format`:
  - `txt`, `json`, `md`, `html`
  - For `transcript_and_summarize`, output is always saved as `.md` with:
    - `# Transcript`
    - `# Summary`

Other options (e.g., `--language`, `--num-speakers`) are accepted and forwarded where applicable. Many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

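To make the `.md` layout for `transcript_and_summarize` concrete, here is a minimal sketch of how such a file could be rendered. The function names and the choice to name the output after the audio file's stem are assumptions for illustration; ScrAIbe's actual writer and file naming may differ.

```python
from pathlib import Path


def render_markdown(transcript: str, summary: str) -> str:
    """Build a Markdown document with a Transcript and a Summary section."""
    return (
        "# Transcript\n\n"
        f"{transcript.strip()}\n\n"
        "# Summary\n\n"
        f"{summary.strip()}\n"
    )


def output_path(audio_file: str, output_dir: str) -> Path:
    """Assumed naming scheme: <output_dir>/<audio file stem>.md"""
    return Path(output_dir) / (Path(audio_file).stem + ".md")


def write_result(audio_file: str, output_dir: str, transcript: str, summary: str) -> Path:
    """Write the rendered Markdown to disk and return the file path."""
    path = output_path(audio_file, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_markdown(transcript, summary), encoding="utf-8")
    return path
```
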
## Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

### Basic run (transcribe)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output -of txt

### Basic run (transcribe + summarize)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -e SUMMARIZER_API_URL=http://llm:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

### Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

- `LOCALAI_API_URL`:
  - Required.
  - Base URL of the LocalAI server.
  - Example: `http://localai:8080`
- `LOCALAI_API_KEY`:
  - Optional.
  - API key for LocalAI, if configured.
- `LOCALAI_MODEL`:
  - Optional (default: `vibevoice-diarize`).
  - Model name used for transcription/diarization.

Summarization LLM:

- `SUMMARIZER_API_URL`:
  - Required when using `--task transcript_and_summarize`.
  - Base URL of the summarization LLM (OpenAI-compatible `/v1/chat/completions`).
  - Example: `http://llm:8080`
- `SUMMARIZER_API_KEY`:
  - Optional.
  - API key for the summarization LLM, if required.
- `SUMMARIZER_MODEL`:
  - Optional (default: `llama-3.1-8b-instruct`).
  - Model name used for summarization.

All of these can also be overridden from the CLI when needed (e.g., `--localai-api-url`, `--summarizer-api-url`).

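The required/optional rules and defaults above can be summarized in code. The following is a hedged sketch of how a client might read these variables, not ScrAIbe's actual configuration loader: the `Settings` class and `load_settings` function are hypothetical names, while the variable names and defaults are taken from the lists above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    localai_api_url: str
    localai_api_key: Optional[str]
    localai_model: str
    summarizer_api_url: Optional[str]
    summarizer_api_key: Optional[str]
    summarizer_model: str


def load_settings(
    env: Mapping[str, str] = os.environ,
    task: str = "transcribe",
) -> Settings:
    """Read configuration from env, applying the documented defaults."""
    localai_url = env.get("LOCALAI_API_URL")
    if not localai_url:
        raise ValueError("LOCALAI_API_URL is required")

    summarizer_url = env.get("SUMMARIZER_API_URL")
    if task == "transcript_and_summarize" and not summarizer_url:
        raise ValueError(
            "SUMMARIZER_API_URL is required for --task transcript_and_summarize"
        )

    return Settings(
        localai_api_url=localai_url.rstrip("/"),
        localai_api_key=env.get("LOCALAI_API_KEY") or None,
        localai_model=env.get("LOCALAI_MODEL", "vibevoice-diarize"),
        summarizer_api_url=summarizer_url.rstrip("/") if summarizer_url else None,
        summarizer_api_key=env.get("SUMMARIZER_API_KEY") or None,
        summarizer_model=env.get("SUMMARIZER_MODEL", "llama-3.1-8b-instruct"),
    )
```

Taking the environment as a mapping argument makes the rules easy to check with a plain dictionary, for example `load_settings({"LOCALAI_API_URL": "http://localai:8080"})`.
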
## Dependencies

Core runtime dependencies:

- Python 3.9+
- httpx
- numpy
- tqdm
- ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

## Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

## License

This project is licensed under GPL-3.0. See LICENSE for details.