11 KiB
ScrAIbe – LocalAI-Backed Transcription and Summarization
ScrAIbe is a transcription and summarization service that:
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a structured summary.
- Provides:
- A web GUI for uploading audio and receiving transcripts via email.
- A CLI and Python API for direct integration.
- An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
- A watch-folder mode for automatic transcription, summarization, and email delivery.
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
For more information: https://apstrom.ca
Features
- Transcription with speaker diarization via LocalAI:
- Uses the /v1/audio/diarization endpoint.
- Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
- Task: transcript_and_summarize
- Highlights:
- Main topics and discussion points
- Key decisions and outcomes
- Action items and responsibilities
- Open issues and risks
- Improved, configurable summary prompts (via environment or file).
- Async web GUI (always enabled):
- Upload audio via browser.
- Jobs are queued and processed in the background (Celery + Redis).
- Emails:
- Immediate confirmation with queue position.
- Final transcript (MD + DOCX + JSON) when ready.
- Summary as MD + DOCX (if requested).
- Error notification if processing fails.
- MCP-style HTTP API (optional):
- Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
- Allows:
- Audio upload for transcription.
- Job status checks.
- Retrieval of transcript JSON (no summary).
- Enabled via MCP_SERVER_ENABLED=true.
- Watch-folder mode (optional):
- Monitors a directory for audio files.
- For each file:
- Transcribes and summarizes.
- Emails transcript + summary + JSON to a configured address.
- Deletes the source file after successful processing (configurable).
- Enabled via WATCH_ENABLED=true.
- File formats:
- Transcript:
- .md
- .docx (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
- .md
- .docx (markdown-aware WYSIWYG styling, optional cover page)
- Full structured output: .json
- Transcript:
- Customizable branding:
- Web GUI title, logo, and accent color via environment variables.
- Email logo, accent color, and subject lines via environment variables.
- Optional cover pages for transcript and summary DOCX.
- CLI and Python API:
- Simple command-line interface.
- Drop-in Scraibe class for integration into other tools.
- Docker-ready:
- Lightweight container, configured via environment variables.
Architecture
- LocalAI (vibevoice.cpp):
- Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
- Handles transcript → structured summary.
- ScrAIbe:
- Orchestrates:
- File upload to LocalAI
- Transcript assembly
- Chunked summarization
- Output formatting (e.g., .md with transcript + summary)
- Runs:
- Web GUI (Gradio) – always enabled
- MCP-style HTTP API (FastAPI) – optional
- Watch-folder mode – optional
- Celery worker (async processing)
- Redis (in-container by default)
- Orchestrates:
Quick Start (Web GUI in Docker)
Run the container with your LocalAI and summarizer endpoints:
- docker run -d
-p 7860:7860
-e LOCALAI_API_URL=http://localai:8080
-e SUMMARIZER_API_URL=http://llm:8080
-e EMAIL_SMTP_HOST=smtp.your-domain.com
-e EMAIL_SMTP_PORT=587
-e EMAIL_SMTP_USER=transcribe@your-domain.com
-e EMAIL_SMTP_PASSWORD=your_password
-e EMAIL_FROM_ADDRESS="ScrAIbe transcribe@your-domain.com"
-e EMAIL_CONTACT_ADDRESS=support@your-domain.com
-e WEBUI_TITLE="Your Transcription Service"
-e WEBUI_LOGO_URL="https://your-domain.com/logo.png"
-e EMAIL_LOGO_URL="https://your-domain.com/logo.png"
-e EMAIL_ACCENT_COLOR="#7C6DA0"
scraibe:latest
Then open: http://:7860
Quick Start (CLI)
Basic usage:
-
Transcribe:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
-
Transcribe and summarize:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize
Environment variables must be set to point to your LocalAI and summarizer LLM.
Python API
Example: transcribe only
-
from scraibe import Scraibe
- client = Scraibe()
- text = client.transcribe("audio.wav")
- print(text)
Example: transcribe and summarize
-
from scraibe import Scraibe
- client = Scraibe()
- result = client.transcript_and_summarize("audio.wav")
- transcript = result["transcript"]
- summary = result["summary"]
You can override endpoints and models via environment variables or constructor parameters if needed.
Command-Line Options
Run:
- python3 -m scraibe.cli -h
Key options:
- -f / --audio-files:
- One or more audio files to process.
- --task:
- transcribe (default)
- transcript_and_summarize
- -o / --output-directory:
- Output folder for generated files.
- -of / --output-format:
- txt, json, md, html
- For transcript_and_summarize, output is always saved as .md with:
-
Transcript
-
Summary
-
Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
Docker Usage
ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.
Basic run (transcribe via CLI)
- docker run -it
-e LOCALAI_API_URL=http://localai:8080
-v /path/to/audio:/audio
scraibe:latest
-f /audio/meeting.wav -o /audio/output -of txt
Basic run (transcribe + summarize via CLI)
- docker run -it
-e LOCALAI_API_URL=http://localai:8080
-e SUMMARIZER_API_URL=http://llm:8080
-v /path/to/audio:/audio
scraibe:latest
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize
Docker Environment Variables
The following environment variables configure ScrAIbe in Docker.
Transcription / Diarization (LocalAI):
- LOCALAI_API_URL:
- Required.
- Base URL of the LocalAI server.
- Example: http://localai:8080
- LOCALAI_API_KEY:
- Optional.
- API key for LocalAI, if configured.
- LOCALAI_MODEL:
- Optional (default: vibevoice-diarize).
- Model name used for transcription/diarization.
Summarization LLM:
- SUMMARIZER_API_URL:
- Required when using --task transcript_and_summarize.
- Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
- Example: http://llm:8080
- SUMMARIZER_API_KEY:
- Optional.
- API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
- Optional (default: llama-3.1-8b-instruct).
- Model name used for summarization.
Web GUI and branding:
- WEBUI_TITLE:
- Title shown in the web GUI (default: A.P.Strom Transcription).
- WEBUI_LOGO_URL:
- URL of the logo displayed in the web GUI header.
- Example: https://your-domain.com/logo.png
Accent color (UI and emails):
- EMAIL_ACCENT_COLOR:
- Accent color used in:
- Web GUI buttons and accents
- Email headings, links, and email addresses
- Default: #7C6DA0
- Accent color used in:
MCP-style HTTP API:
- MCP_SERVER_ENABLED:
- Enable MCP-style HTTP API (default: false).
- Values: true/false.
- MCP_SERVER_HOST:
- Bind address (default: 0.0.0.0).
- MCP_SERVER_PORT:
- Port (default: 8000).
- MCP_USE_CELERY:
- Use Celery for async transcription (default: true).
- If false, transcription runs in-process.
Watch-folder mode:
- WATCH_ENABLED:
- Enable watch-folder mode (default: false).
- Values: true/false.
- WATCH_DIR:
- Directory to monitor for audio files (required if WATCH_ENABLED=true).
- WATCH_EMAIL_TO:
- Email address to send transcript and summary (required if WATCH_ENABLED=true).
- WATCH_POLL_INTERVAL:
- Seconds between scans (default: 10).
- WATCH_DELETE_ON_SUCCESS:
- Delete source file after successful processing (default: true).
Async processing (Celery + Redis):
- CELERY_BROKER_URL:
- Redis broker URL (default: redis://localhost:6379/0).
- CELERY_RESULT_BACKEND:
- Redis backend URL (default: redis://localhost:6379/0).
- SCRAIBE_UPLOAD_DIR:
- Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).
Email configuration:
- EMAIL_SMTP_HOST:
- SMTP server host.
- EMAIL_SMTP_PORT:
- SMTP server port (e.g., 587).
- EMAIL_SMTP_USER:
- SMTP username.
- EMAIL_SMTP_PASSWORD:
- SMTP password.
- EMAIL_SMTP_USE_TLS:
- Use TLS (true/false; default: true).
- EMAIL_FROM_ADDRESS:
- Sender address (e.g., "ScrAIbe transcribe@your-domain.com").
- EMAIL_CONTACT_ADDRESS:
- Support contact address shown in email templates.
- EMAIL_LOGO_URL:
- URL of the logo used in emails (preferred).
- EMAIL_LOGO_PATH:
- Fallback local path for email logo (default: /app/src/misc/logo1.png).
- EMAIL_CSS_PATH:
- Path to the CSS used in emails (default: /app/src/misc/mail_style.css).
Email subject lines (customizable):
- EMAIL_SUBJECT_UPLOAD:
- Subject for upload confirmation email.
- Default: "ScrAIbe: Your transcription request has been received"
- EMAIL_SUBJECT_SUCCESS:
- Subject for transcript-ready email.
- Default: "ScrAIbe: Your transcript is ready"
- EMAIL_SUBJECT_ERROR:
- Subject for error notification email.
- Default: "ScrAIbe: Error with your transcription request"
Summary prompt customization:
- SUMMARY_PROMPT_CHUNK:
- Override prompt used for each transcript chunk.
- SUMMARY_PROMPT_COMBINED:
- Override prompt used for the final combined summary.
- SUMMARY_PROMPT_FILE:
- Path to a file with prompts in sections:
- [chunk]
- [combined]
- Path to a file with prompts in sections:
DOCX and cover pages:
- COVER_PAGE_ENABLED:
- Add a cover page to transcript and summary DOCX files (default: false).
- COVER_PAGE_ORGANIZATION:
- Organization name shown on the cover page.
- COVER_PAGE_TITLE_PREFIX:
- Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
- COVER_PAGE_LOGO_URL:
- Logo URL to include on the cover page.
- COVER_PAGE_LOGO_PATH:
- Local logo path to include on the cover page.
Output files (async web GUI and watch-folder mode):
When a job completes, the user receives:
- Transcript:
- .md file
- .docx file (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
- .md file
- .docx file (markdown-aware styling, optional cover page)
- JSON:
- Structured transcript with diarization and metadata
All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).
Dependencies
Core runtime dependencies:
- Python 3.9+
- httpx
- numpy
- tqdm
- gradio
- celery[redis]
- redis
- python-docx
- fastapi
- uvicorn
- ffmpeg (for audio preprocessing)
No local Whisper, PyTorch, or Pyannote models are required.
Contributing
Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.
License
This project is licensed under GPL-3.0. See LICENSE for details.