# ScrAIbe – LocalAI-Backed Transcription and Summarization

ScrAIbe is a lightweight transcription and summarization client that:

- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a detailed, structured summary of the conversation.

No local speech models or heavy dependencies are required. ScrAIbe is designed to be run as a thin client in front of your own AI services.

## Features

- Transcription with speaker diarization via LocalAI:
  - Uses the `/v1/audio/diarization` endpoint.
  - Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
  - Task: `transcript_and_summarize`
  - Chunks long transcripts, summarizes each chunk, then generates a final comprehensive summary.
  - Summary highlights:
    - Main topics and discussion points
    - Key decisions and outcomes
    - Action items and responsibilities
    - Open issues and risks
- CLI and Python API:
  - Simple command-line interface.
  - Drop-in `Scraibe` class for integration into other tools.
- Docker-ready:
  - Lightweight container, configured via environment variables.

## Architecture

- LocalAI (vibevoice.cpp):
  - Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
  - Handles transcript → structured summary.
- ScrAIbe:
  - Orchestrates:
    - File upload to LocalAI
    - Transcript assembly
    - Chunked summarization
    - Output formatting (e.g., .md with transcript + summary)

## Quick Start (CLI)

Basic usage:

- Transcribe:
  - `python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt`
- Transcribe and summarize:
  - `python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize`

Environment variables must be set to point to your LocalAI and summarizer LLM.
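The chunked summarization described under Features (split a long transcript, summarize each chunk, then produce a final summary) might look roughly like the sketch below. The names `chunk_text` and `summarize_transcript`, the character-based chunk limit, the prompt wording, and the injected `complete` callable are all assumptions made for illustration; this is not ScrAIbe's actual implementation.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-split.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def summarize_transcript(
    transcript: str,
    complete: Callable[[str], str],  # hypothetical: prompt in, LLM reply out
    max_chars: int = 4000,
) -> str:
    """Summarize each chunk, then summarize the partial summaries."""
    chunks = chunk_text(transcript, max_chars)
    if not chunks:
        return ""
    partials = [
        complete("Summarize this part of a conversation transcript:\n\n" + chunk)
        for chunk in chunks
    ]
    # Final pass: merge the per-chunk summaries into one structured summary.
    return complete(
        "Combine these partial summaries into one structured summary covering "
        "main topics, key decisions, action items, and open issues:\n\n"
        + "\n\n".join(partials)
    )
```

In a real setup, `complete` would presumably wrap a request to the summarizer's OpenAI-compatible chat endpoint; passing it in as a callable keeps the chunking logic testable without a running LLM.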
## Python API

Example: transcribe only

    from scraibe import Scraibe

    client = Scraibe()
    text = client.transcribe("audio.wav")
    print(text)

Example: transcribe and summarize

    from scraibe import Scraibe

    client = Scraibe()
    result = client.transcript_and_summarize("audio.wav")
    transcript = result["transcript"]
    summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

## Command-Line Options

Run:

- `python3 -m scraibe.cli -h`

Key options:

- -f / --audio-files:
  - One or more audio files to process.
- --task:
  - transcribe (default)
  - transcript_and_summarize
- -o / --output-directory:
  - Output folder for generated files.
- -of / --output-format:
  - txt, json, md, html
  - For transcript_and_summarize, output is always saved as .md with:
    - `# Transcript`
    - `# Summary`

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

## Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

### Basic run (transcribe)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output -of txt

### Basic run (transcribe + summarize)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -e SUMMARIZER_API_URL=http://llm:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

### Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

- LOCALAI_API_URL:
  - Required.
  - Base URL of the LocalAI server.
  - Example: http://localai:8080
- LOCALAI_API_KEY:
  - Optional.
  - API key for LocalAI, if configured.
- LOCALAI_MODEL:
  - Optional (default: vibevoice-diarize).
  - Model name used for transcription/diarization.
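As a rough illustration of how the LocalAI variables above could be resolved, with the required URL enforced and the documented default applied for the model, here is a minimal sketch. The `LocalAISettings` dataclass and the `load_localai_settings` helper are hypothetical names introduced for this example; ScrAIbe's own configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalAISettings:
    """Hypothetical container for the LocalAI-related settings."""
    api_url: str
    api_key: Optional[str]
    model: str


def load_localai_settings(env: Mapping[str, str] = os.environ) -> LocalAISettings:
    """Read LocalAI settings from an environment-like mapping."""
    api_url = env.get("LOCALAI_API_URL", "").strip()
    if not api_url:
        # LOCALAI_API_URL is documented as required.
        raise ValueError("LOCALAI_API_URL is required")
    return LocalAISettings(
        api_url=api_url.rstrip("/"),  # normalize a trailing slash
        api_key=env.get("LOCALAI_API_KEY") or None,  # optional
        model=env.get("LOCALAI_MODEL") or "vibevoice-diarize",  # documented default
    )
```

Taking the mapping as a parameter (defaulting to `os.environ`) makes it easy to exercise the helper with a plain dictionary.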
Summarization LLM:

- SUMMARIZER_API_URL:
  - Required when using --task transcript_and_summarize.
  - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
  - Example: http://llm:8080
- SUMMARIZER_API_KEY:
  - Optional.
  - API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
  - Optional (default: llama-3.1-8b-instruct).
  - Model name used for summarization.

All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

## Dependencies

Core runtime dependencies:

- Python 3.9+
- httpx
- numpy
- tqdm
- ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

## Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

## License

This project is licensed under GPL-3.0. See LICENSE for details.