# ScrAIbe – LocalAI-Backed Transcription and Summarization

ScrAIbe is a transcription and summarization service that:

- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a detailed, structured summary.
- Provides:
  - A web GUI for uploading audio and receiving transcripts via email.
  - A CLI and Python API for direct integration.

No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.

For more information: https://apstrom.ca

## Features

- Transcription with speaker diarization via LocalAI:
  - Uses the /v1/audio/diarization endpoint.
  - Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
  - Task: transcript_and_summarize
  - Highlights:
    - Main topics and discussion points
    - Key decisions and outcomes
    - Action items and responsibilities
    - Open issues and risks
- Async web GUI:
  - Upload audio via browser.
  - Jobs are queued and processed in the background (Celery + Redis).
  - Emails:
    - Immediate confirmation with queue position.
    - Final transcript (TXT + JSON) when ready.
    - Summary as MD file (if requested).
    - Error notification if processing fails.
- Customizable branding:
  - Web GUI title, logo, and email logo via environment variables.
- CLI and Python API:
  - Simple command-line interface.
  - Drop-in Scraibe class for integration into other tools.
- Docker-ready:
  - Lightweight container, configured via environment variables.

## Architecture

- LocalAI (vibevoice.cpp):
  - Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
  - Handles transcript → structured summary.
- ScrAIbe:
  - Orchestrates:
    - File upload to LocalAI
    - Transcript assembly
    - Chunked summarization
    - Output formatting (e.g., .md with transcript + summary)
  - Runs:
    - Web GUI (Gradio)
    - Celery worker (async processing)
    - Redis (in-container by default)

## Quick Start (Web GUI in Docker)

Run the container with your LocalAI and summarizer endpoints:

    docker run -d \
      -p 7860:7860 \
      -e LOCALAI_API_URL=http://localai:8080 \
      -e SUMMARIZER_API_URL=http://llm:8080 \
      -e EMAIL_SMTP_HOST=smtp.your-domain.com \
      -e EMAIL_SMTP_PORT=587 \
      -e EMAIL_SMTP_USER=transcribe@your-domain.com \
      -e EMAIL_SMTP_PASSWORD=your_password \
      -e EMAIL_FROM_ADDRESS="ScrAIbe " \
      -e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
      -e WEBUI_TITLE="Your Transcription Service" \
      -e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
      -e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
      scraibe:latest

Then open port 7860 on the Docker host in a browser (for example, http://localhost:7860 when the container runs on your own machine).

## Quick Start (CLI)

Basic usage:

- Transcribe:

      python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

- Transcribe and summarize:

      python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Set LOCALAI_API_URL, and SUMMARIZER_API_URL when summarizing, so that the CLI can reach your LocalAI server and summarizer LLM.

## Python API

Example: transcribe only

    from scraibe import Scraibe

    client = Scraibe()
    text = client.transcribe("audio.wav")
    print(text)

Example: transcribe and summarize

    from scraibe import Scraibe

    client = Scraibe()
    result = client.transcript_and_summarize("audio.wav")
    transcript = result["transcript"]
    summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.

## Command-Line Options

Run:

    python3 -m scraibe.cli -h

Key options:

- -f / --audio-files:
  - One or more audio files to process.
- --task:
  - transcribe (default)
  - transcript_and_summarize
- -o / --output-directory:
  - Output folder for generated files.
- -of / --output-format:
  - txt, json, md, html
  - For transcript_and_summarize, output is always saved as .md with:
    - `# Transcript`
    - `# Summary`

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

## Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

### Basic run (transcribe via CLI)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output -of txt

### Basic run (transcribe + summarize via CLI)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -e SUMMARIZER_API_URL=http://llm:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

### Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

- LOCALAI_API_URL:
  - Required.
  - Base URL of the LocalAI server.
  - Example: http://localai:8080
- LOCALAI_API_KEY:
  - Optional.
  - API key for LocalAI, if configured.
- LOCALAI_MODEL:
  - Optional (default: vibevoice-diarize).
  - Model name used for transcription/diarization.

Summarization LLM:

- SUMMARIZER_API_URL:
  - Required when using --task transcript_and_summarize.
  - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
  - Example: http://llm:8080
- SUMMARIZER_API_KEY:
  - Optional.
  - API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
  - Optional (default: llama-3.1-8b-instruct).
  - Model name used for summarization.

Web GUI and branding:

- WEBUI_TITLE:
  - Title shown in the web GUI (default: A.P.Strom Transcription).
- WEBUI_LOGO_URL:
  - URL of the logo displayed in the web GUI header.
  - Example: https://your-domain.com/logo.png

Async processing (Celery + Redis):

- CELERY_BROKER_URL:
  - Redis broker URL (default: redis://localhost:6379/0).
- CELERY_RESULT_BACKEND:
  - Redis backend URL (default: redis://localhost:6379/0).
- SCRAIBE_UPLOAD_DIR:
  - Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).

Email configuration:

- EMAIL_SMTP_HOST:
  - SMTP server host.
- EMAIL_SMTP_PORT:
  - SMTP server port (e.g., 587).
- EMAIL_SMTP_USER:
  - SMTP username.
- EMAIL_SMTP_PASSWORD:
  - SMTP password.
- EMAIL_SMTP_USE_TLS:
  - Use TLS (true/false; default: true).
- EMAIL_FROM_ADDRESS:
  - Sender address (e.g., "ScrAIbe ").
- EMAIL_CONTACT_ADDRESS:
  - Support contact address shown in email templates.
- EMAIL_LOGO_URL:
  - URL of the logo used in emails (preferred).
- EMAIL_LOGO_PATH:
  - Fallback local path for email logo (default: /app/src/misc/logo1.png).
- EMAIL_CSS_PATH:
  - Path to the CSS used in emails (default: /app/src/misc/mail_style.css).

All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

## Dependencies

Core runtime dependencies:

- Python 3.9+
- httpx
- numpy
- tqdm
- gradio
- celery[redis]
- redis
- ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

## Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

## License

This project is licensed under GPL-3.0. See LICENSE for details.
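
## Appendix: Checking Environment Variables (Sketch)

The required variables and defaults listed under Docker Environment Variables can be checked before a job is submitted. The following is a minimal sketch under stated assumptions, not ScrAIbe's own code: `load_settings` is a hypothetical helper and the dictionary keys are illustrative. Only the variable names, defaults, and the rule that SUMMARIZER_API_URL is required for transcript_and_summarize come from this README.

```python
import os
from typing import Mapping, Optional


def load_settings(task: str = "transcribe",
                  env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read ScrAIbe-style settings from the environment.

    Not part of the scraibe package; it only mirrors the variables and
    defaults documented in this README.
    """
    env = os.environ if env is None else env

    # LOCALAI_API_URL is documented as required for every task.
    localai_url = env.get("LOCALAI_API_URL")
    if not localai_url:
        raise ValueError("LOCALAI_API_URL is required")

    settings = {
        "localai_api_url": localai_url,
        "localai_api_key": env.get("LOCALAI_API_KEY"),  # optional
        "localai_model": env.get("LOCALAI_MODEL", "vibevoice-diarize"),
        "summarizer_api_url": env.get("SUMMARIZER_API_URL"),
        "summarizer_api_key": env.get("SUMMARIZER_API_KEY"),  # optional
        "summarizer_model": env.get("SUMMARIZER_MODEL", "llama-3.1-8b-instruct"),
        "celery_broker_url": env.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
        "upload_dir": env.get("SCRAIBE_UPLOAD_DIR", "/tmp/scraibe_uploads"),
        # Treat only the literal string "true" (case-insensitive) as enabled.
        "smtp_use_tls": env.get("EMAIL_SMTP_USE_TLS", "true").strip().lower() == "true",
    }

    # The summarizer endpoint is only required when a summary is requested.
    if task == "transcript_and_summarize" and not settings["summarizer_api_url"]:
        raise ValueError(
            "SUMMARIZER_API_URL is required for task transcript_and_summarize"
        )
    return settings


if __name__ == "__main__":
    demo = load_settings(env={"LOCALAI_API_URL": "http://localai:8080"})
    print(demo["localai_model"])
```

Passing a plain dictionary as `env` keeps the check easy to test without touching the real process environment; calling `load_settings()` with no arguments reads `os.environ` instead.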