371 lines
11 KiB
Markdown
371 lines
11 KiB
Markdown
# ScrAIbe – LocalAI-Backed Transcription and Summarization
|
||
|
||
ScrAIbe is a transcription and summarization service that:
|
||
|
||
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
|
||
- Optionally uses a second LLM to generate a structured summary.
|
||
- Provides:
|
||
- A web GUI for uploading audio and receiving transcripts via email.
|
||
- A CLI and Python API for direct integration.
|
||
- An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
|
||
- A watch-folder mode for automatic transcription, summarization, and email delivery.
|
||
|
||
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
|
||
|
||
For more information: https://apstrom.ca
|
||
|
||
## Features
|
||
|
||
- Transcription with speaker diarization via LocalAI:
|
||
- Uses the /v1/audio/diarization endpoint.
|
||
- Compatible with vibevoice.cpp and other diarization-capable backends.
|
||
- Optional AI-powered summarization:
|
||
- Task: transcript_and_summarize
|
||
- Highlights:
|
||
- Main topics and discussion points
|
||
- Key decisions and outcomes
|
||
- Action items and responsibilities
|
||
- Open issues and risks
|
||
- Improved, configurable summary prompts (via environment or file).
|
||
- Async web GUI (always enabled):
|
||
- Upload audio via browser.
|
||
- Jobs are queued and processed in the background (Celery + Redis).
|
||
- Emails:
|
||
- Immediate confirmation with queue position.
|
||
- Final transcript (MD + DOCX + JSON) when ready.
|
||
- Summary as MD + DOCX (if requested).
|
||
- Error notification if processing fails.
|
||
- MCP-style HTTP API (optional):
|
||
- Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
|
||
- Allows:
|
||
- Audio upload for transcription.
|
||
- Job status checks.
|
||
- Retrieval of transcript JSON (no summary).
|
||
- Enabled via MCP_SERVER_ENABLED=true.
|
||
- Watch-folder mode (optional):
|
||
- Monitors a directory for audio files.
|
||
- For each file:
|
||
- Transcribes and summarizes.
|
||
- Emails transcript + summary + JSON to a configured address.
|
||
- Deletes the source file after successful processing (configurable).
|
||
- Enabled via WATCH_ENABLED=true.
|
||
- File formats:
|
||
- Transcript:
|
||
- .md
|
||
- .docx (line-numbered, 30 lines per page, optional cover page)
|
||
- Summary (if requested):
|
||
- .md
|
||
- .docx (markdown-aware WYSIWYG styling, optional cover page)
|
||
- Full structured output: .json
|
||
- Customizable branding:
|
||
- Web GUI title, logo, and accent color via environment variables.
|
||
- Email logo, accent color, and subject lines via environment variables.
|
||
- Optional cover pages for transcript and summary DOCX.
|
||
- CLI and Python API:
|
||
- Simple command-line interface.
|
||
- Drop-in Scraibe class for integration into other tools.
|
||
- Docker-ready:
|
||
- Lightweight container, configured via environment variables.
|
||
|
||
## Architecture
|
||
|
||
- LocalAI (vibevoice.cpp):
|
||
- Handles audio → transcript + speaker segments.
|
||
- Summarizer LLM (OpenAI-compatible chat endpoint):
|
||
- Handles transcript → structured summary.
|
||
- ScrAIbe:
|
||
- Orchestrates:
|
||
- File upload to LocalAI
|
||
- Transcript assembly
|
||
- Chunked summarization
|
||
- Output formatting (e.g., .md with transcript + summary)
|
||
- Runs:
|
||
- Web GUI (Gradio) – always enabled
|
||
- MCP-style HTTP API (FastAPI) – optional
|
||
- Watch-folder mode – optional
|
||
- Celery worker (async processing)
|
||
- Redis (in-container by default)
|
||
|
||
## Quick Start (Web GUI in Docker)
|
||
|
||
Run the container with your LocalAI and summarizer endpoints:
|
||
|
||
- docker run -d \
|
||
-p 7860:7860 \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-e SUMMARIZER_API_URL=http://llm:8080 \
|
||
-e EMAIL_SMTP_HOST=smtp.your-domain.com \
|
||
-e EMAIL_SMTP_PORT=587 \
|
||
-e EMAIL_SMTP_USER=transcribe@your-domain.com \
|
||
-e EMAIL_SMTP_PASSWORD=your_password \
|
||
-e EMAIL_FROM_ADDRESS="ScrAIbe <transcribe@your-domain.com>" \
|
||
-e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
|
||
-e WEBUI_TITLE="Your Transcription Service" \
|
||
-e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
|
||
-e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
|
||
-e EMAIL_ACCENT_COLOR="#7C6DA0" \
|
||
scraibe:latest
|
||
|
||
Then open: http://<host>:7860
|
||
|
||
## Quick Start (CLI)
|
||
|
||
Basic usage:
|
||
|
||
- Transcribe:
|
||
|
||
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
|
||
|
||
- Transcribe and summarize:
|
||
|
||
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize
|
||
|
||
Environment variables must be set to point to your LocalAI and summarizer LLM.
|
||
|
||
## Python API
|
||
|
||
Example: transcribe only
|
||
|
||
- from scraibe import Scraibe
|
||
|
||
- client = Scraibe()
|
||
- text = client.transcribe("audio.wav")
|
||
- print(text)
|
||
|
||
Example: transcribe and summarize
|
||
|
||
- from scraibe import Scraibe
|
||
|
||
- client = Scraibe()
|
||
- result = client.transcript_and_summarize("audio.wav")
|
||
- transcript = result["transcript"]
|
||
- summary = result["summary"]
|
||
|
||
You can override endpoints and models via environment variables or constructor parameters if needed.
|
||
|
||
## Command-Line Options
|
||
|
||
Run:
|
||
|
||
- python3 -m scraibe.cli -h
|
||
|
||
Key options:
|
||
|
||
- -f / --audio-files:
|
||
- One or more audio files to process.
|
||
- --task:
|
||
- transcribe (default)
|
||
- transcript_and_summarize
|
||
- -o / --output-directory:
|
||
- Output folder for generated files.
|
||
- -of / --output-format:
|
||
- txt, json, md, html
|
||
- For transcript_and_summarize, output is always saved as .md with:
|
||
- # Transcript
|
||
- # Summary
|
||
|
||
Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
|
||
|
||
## Docker Usage
|
||
|
||
ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.
|
||
|
||
### Basic run (transcribe via CLI)
|
||
|
||
- docker run -it \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-v /path/to/audio:/audio \
|
||
scraibe:latest \
|
||
-f /audio/meeting.wav -o /audio/output -of txt
|
||
|
||
### Basic run (transcribe + summarize via CLI)
|
||
|
||
- docker run -it \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-e SUMMARIZER_API_URL=http://llm:8080 \
|
||
-v /path/to/audio:/audio \
|
||
scraibe:latest \
|
||
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize
|
||
|
||
### Docker Environment Variables
|
||
|
||
The following environment variables configure ScrAIbe in Docker.
|
||
|
||
Transcription / Diarization (LocalAI):
|
||
|
||
- LOCALAI_API_URL:
|
||
- Required.
|
||
- Base URL of the LocalAI server.
|
||
- Example: http://localai:8080
|
||
- LOCALAI_API_KEY:
|
||
- Optional.
|
||
- API key for LocalAI, if configured.
|
||
- LOCALAI_MODEL:
|
||
- Optional (default: vibevoice-diarize).
|
||
- Model name used for transcription/diarization.
|
||
|
||
Summarization LLM:
|
||
|
||
- SUMMARIZER_API_URL:
|
||
- Required when using --task transcript_and_summarize.
|
||
- Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
|
||
- Example: http://llm:8080
|
||
- SUMMARIZER_API_KEY:
|
||
- Optional.
|
||
- API key for the summarization LLM, if required.
|
||
- SUMMARIZER_MODEL:
|
||
- Optional (default: llama-3.1-8b-instruct).
|
||
- Model name used for summarization.
|
||
|
||
Web GUI and branding:
|
||
|
||
- WEBUI_TITLE:
|
||
- Title shown in the web GUI (default: A.P.Strom Transcription).
|
||
- WEBUI_LOGO_URL:
|
||
- URL of the logo displayed in the web GUI header.
|
||
- Example: https://your-domain.com/logo.png
|
||
|
||
Accent color (UI and emails):
|
||
|
||
- EMAIL_ACCENT_COLOR:
|
||
- Accent color used in:
|
||
- Web GUI buttons and accents
|
||
- Email headings, links, and email addresses
|
||
- Default: #7C6DA0
|
||
|
||
MCP-style HTTP API:
|
||
|
||
- MCP_SERVER_ENABLED:
|
||
- Enable MCP-style HTTP API (default: false).
|
||
- Values: true/false.
|
||
- MCP_SERVER_HOST:
|
||
- Bind address (default: 0.0.0.0).
|
||
- MCP_SERVER_PORT:
|
||
- Port (default: 8000).
|
||
- MCP_USE_CELERY:
|
||
- Use Celery for async transcription (default: true).
|
||
- If false, transcription runs in-process.
|
||
|
||
Watch-folder mode:
|
||
|
||
- WATCH_ENABLED:
|
||
- Enable watch-folder mode (default: false).
|
||
- Values: true/false.
|
||
- WATCH_DIR:
|
||
- Directory to monitor for audio files (required if WATCH_ENABLED=true).
|
||
- WATCH_EMAIL_TO:
|
||
- Email address to send transcript and summary (required if WATCH_ENABLED=true).
|
||
- WATCH_POLL_INTERVAL:
|
||
- Seconds between scans (default: 10).
|
||
- WATCH_DELETE_ON_SUCCESS:
|
||
- Delete source file after successful processing (default: true).
|
||
|
||
Async processing (Celery + Redis):
|
||
|
||
- CELERY_BROKER_URL:
|
||
- Redis broker URL (default: redis://localhost:6379/0).
|
||
- CELERY_RESULT_BACKEND:
|
||
- Redis backend URL (default: redis://localhost:6379/0).
|
||
- SCRAIBE_UPLOAD_DIR:
|
||
- Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).
|
||
|
||
Email configuration:
|
||
|
||
- EMAIL_SMTP_HOST:
|
||
- SMTP server host.
|
||
- EMAIL_SMTP_PORT:
|
||
- SMTP server port (e.g., 587).
|
||
- EMAIL_SMTP_USER:
|
||
- SMTP username.
|
||
- EMAIL_SMTP_PASSWORD:
|
||
- SMTP password.
|
||
- EMAIL_SMTP_USE_TLS:
|
||
- Use TLS (true/false; default: true).
|
||
- EMAIL_FROM_ADDRESS:
|
||
- Sender address (e.g., "ScrAIbe <transcribe@your-domain.com>").
|
||
- EMAIL_CONTACT_ADDRESS:
|
||
- Support contact address shown in email templates.
|
||
- EMAIL_LOGO_URL:
|
||
- URL of the logo used in emails (preferred).
|
||
- EMAIL_LOGO_PATH:
|
||
- Fallback local path for email logo (default: /app/src/misc/logo1.png).
|
||
- EMAIL_CSS_PATH:
|
||
- Path to the CSS used in emails (default: /app/src/misc/mail_style.css).
|
||
|
||
Email subject lines (customizable):
|
||
|
||
- EMAIL_SUBJECT_UPLOAD:
|
||
- Subject for upload confirmation email.
|
||
- Default: "ScrAIbe: Your transcription request has been received"
|
||
- EMAIL_SUBJECT_SUCCESS:
|
||
- Subject for transcript-ready email.
|
||
- Default: "ScrAIbe: Your transcript is ready"
|
||
- EMAIL_SUBJECT_ERROR:
|
||
- Subject for error notification email.
|
||
- Default: "ScrAIbe: Error with your transcription request"
|
||
|
||
Summary prompt customization:
|
||
|
||
- SUMMARY_PROMPT_CHUNK:
|
||
- Override prompt used for each transcript chunk.
|
||
- SUMMARY_PROMPT_COMBINED:
|
||
- Override prompt used for the final combined summary.
|
||
- SUMMARY_PROMPT_FILE:
|
||
- Path to a file with prompts in sections:
|
||
- [chunk]
|
||
- [combined]
|
||
|
||
DOCX and cover pages:
|
||
|
||
- COVER_PAGE_ENABLED:
|
||
- Add a cover page to transcript and summary DOCX files (default: false).
|
||
- COVER_PAGE_ORGANIZATION:
|
||
- Organization name shown on the cover page.
|
||
- COVER_PAGE_TITLE_PREFIX:
|
||
- Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
|
||
- COVER_PAGE_LOGO_URL:
|
||
- Logo URL to include on the cover page.
|
||
- COVER_PAGE_LOGO_PATH:
|
||
- Local logo path to include on the cover page.
|
||
|
||
Output files (async web GUI and watch-folder mode):
|
||
|
||
When a job completes, the user receives:
|
||
|
||
- Transcript:
|
||
- .md file
|
||
- .docx file (line-numbered, 30 lines per page, optional cover page)
|
||
- Summary (if requested):
|
||
- .md file
|
||
- .docx file (markdown-aware styling, optional cover page)
|
||
- JSON:
|
||
- Structured transcript with diarization and metadata
|
||
|
||
All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).
|
||
|
||
## Dependencies
|
||
|
||
Core runtime dependencies:
|
||
|
||
- Python 3.9+
|
||
- httpx
|
||
- numpy
|
||
- tqdm
|
||
- gradio
|
||
- celery[redis]
|
||
- redis
|
||
- python-docx
|
||
- fastapi
|
||
- uvicorn
|
||
- ffmpeg (for audio preprocessing)
|
||
|
||
No local Whisper, PyTorch, or Pyannote models are required.
|
||
|
||
## Contributing
|
||
|
||
Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.
|
||
|
||
## License
|
||
|
||
This project is licensed under GPL-3.0. See LICENSE for details.
|