Files
scribe/README.md
T
2026-06-19 17:46:54 +00:00

371 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ScrAIbe LocalAI-Backed Transcription and Summarization
ScrAIbe is a transcription and summarization service that:
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a structured summary.
- Provides:
- A web GUI for uploading audio and receiving transcripts via email.
- A CLI and Python API for direct integration.
- An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
- A watch-folder mode for automatic transcription, summarization, and email delivery.
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
For more information: https://apstrom.ca
## Features
- Transcription with speaker diarization via LocalAI:
- Uses the /v1/audio/diarization endpoint.
- Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
- Task: transcript_and_summarize
- Highlights:
- Main topics and discussion points
- Key decisions and outcomes
- Action items and responsibilities
- Open issues and risks
- Improved, configurable summary prompts (via environment or file).
- Async web GUI (always enabled):
- Upload audio via browser.
- Jobs are queued and processed in the background (Celery + Redis).
- Emails:
- Immediate confirmation with queue position.
- Final transcript (MD + DOCX + JSON) when ready.
- Summary as MD + DOCX (if requested).
- Error notification if processing fails.
- MCP-style HTTP API (optional):
- Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
- Allows:
- Audio upload for transcription.
- Job status checks.
- Retrieval of transcript JSON (no summary).
- Enabled via MCP_SERVER_ENABLED=true.
- Watch-folder mode (optional):
- Monitors a directory for audio files.
- For each file:
- Transcribes and summarizes.
- Emails transcript + summary + JSON to a configured address.
- Deletes the source file after successful processing (configurable).
- Enabled via WATCH_ENABLED=true.
- File formats:
- Transcript:
- .md
- .docx (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
- .md
- .docx (markdown-aware WYSIWYG styling, optional cover page)
- Full structured output: .json
- Customizable branding:
- Web GUI title, logo, and accent color via environment variables.
- Email logo, accent color, and subject lines via environment variables.
- Optional cover pages for transcript and summary DOCX.
- CLI and Python API:
- Simple command-line interface.
- Drop-in Scraibe class for integration into other tools.
- Docker-ready:
- Lightweight container, configured via environment variables.
## Architecture
- LocalAI (vibevoice.cpp):
- Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
- Handles transcript → structured summary.
- ScrAIbe:
- Orchestrates:
- File upload to LocalAI
- Transcript assembly
- Chunked summarization
- Output formatting (e.g., .md with transcript + summary)
- Runs:
- Web GUI (Gradio) always enabled
- MCP-style HTTP API (FastAPI) optional
- Watch-folder mode optional
- Celery worker (async processing)
- Redis (in-container by default)
## Quick Start (Web GUI in Docker)
Run the container with your LocalAI and summarizer endpoints:
- docker run -d \
-p 7860:7860 \
-e LOCALAI_API_URL=http://localai:8080 \
-e SUMMARIZER_API_URL=http://llm:8080 \
-e EMAIL_SMTP_HOST=smtp.your-domain.com \
-e EMAIL_SMTP_PORT=587 \
-e EMAIL_SMTP_USER=transcribe@your-domain.com \
-e EMAIL_SMTP_PASSWORD=your_password \
-e EMAIL_FROM_ADDRESS="ScrAIbe <transcribe@your-domain.com>" \
-e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
-e WEBUI_TITLE="Your Transcription Service" \
-e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
-e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
-e EMAIL_ACCENT_COLOR="#7C6DA0" \
scraibe:latest
Then open: http://<host>:7860
## Quick Start (CLI)
Basic usage:
- Transcribe:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
- Transcribe and summarize:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize
Environment variables must be set to point to your LocalAI and summarizer LLM.
## Python API
Example: transcribe only
- from scraibe import Scraibe
- client = Scraibe()
- text = client.transcribe("audio.wav")
- print(text)
Example: transcribe and summarize
- from scraibe import Scraibe
- client = Scraibe()
- result = client.transcript_and_summarize("audio.wav")
- transcript = result["transcript"]
- summary = result["summary"]
You can override endpoints and models via environment variables or constructor parameters if needed.
## Command-Line Options
Run:
- python3 -m scraibe.cli -h
Key options:
- -f / --audio-files:
- One or more audio files to process.
- --task:
- transcribe (default)
- transcript_and_summarize
- -o / --output-directory:
- Output folder for generated files.
- -of / --output-format:
- txt, json, md, html
- For transcript_and_summarize, output is always saved as .md with:
- # Transcript
- # Summary
Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
## Docker Usage
ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.
### Basic run (transcribe via CLI)
- docker run -it \
-e LOCALAI_API_URL=http://localai:8080 \
-v /path/to/audio:/audio \
scraibe:latest \
-f /audio/meeting.wav -o /audio/output -of txt
### Basic run (transcribe + summarize via CLI)
- docker run -it \
-e LOCALAI_API_URL=http://localai:8080 \
-e SUMMARIZER_API_URL=http://llm:8080 \
-v /path/to/audio:/audio \
scraibe:latest \
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize
### Docker Environment Variables
The following environment variables configure ScrAIbe in Docker.
Transcription / Diarization (LocalAI):
- LOCALAI_API_URL:
- Required.
- Base URL of the LocalAI server.
- Example: http://localai:8080
- LOCALAI_API_KEY:
- Optional.
- API key for LocalAI, if configured.
- LOCALAI_MODEL:
- Optional (default: vibevoice-diarize).
- Model name used for transcription/diarization.
Summarization LLM:
- SUMMARIZER_API_URL:
- Required when using --task transcript_and_summarize.
- Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
- Example: http://llm:8080
- SUMMARIZER_API_KEY:
- Optional.
- API key for the summarization LLM, if required.
- SUMMARIZER_MODEL:
- Optional (default: llama-3.1-8b-instruct).
- Model name used for summarization.
Web GUI and branding:
- WEBUI_TITLE:
- Title shown in the web GUI (default: A.P.Strom Transcription).
- WEBUI_LOGO_URL:
- URL of the logo displayed in the web GUI header.
- Example: https://your-domain.com/logo.png
Accent color (UI and emails):
- EMAIL_ACCENT_COLOR:
- Accent color used in:
- Web GUI buttons and accents
- Email headings, links, and email addresses
- Default: #7C6DA0
MCP-style HTTP API:
- MCP_SERVER_ENABLED:
- Enable MCP-style HTTP API (default: false).
- Values: true/false.
- MCP_SERVER_HOST:
- Bind address (default: 0.0.0.0).
- MCP_SERVER_PORT:
- Port (default: 8000).
- MCP_USE_CELERY:
- Use Celery for async transcription (default: true).
- If false, transcription runs in-process.
Watch-folder mode:
- WATCH_ENABLED:
- Enable watch-folder mode (default: false).
- Values: true/false.
- WATCH_DIR:
- Directory to monitor for audio files (required if WATCH_ENABLED=true).
- WATCH_EMAIL_TO:
- Email address to send transcript and summary (required if WATCH_ENABLED=true).
- WATCH_POLL_INTERVAL:
- Seconds between scans (default: 10).
- WATCH_DELETE_ON_SUCCESS:
- Delete source file after successful processing (default: true).
Async processing (Celery + Redis):
- CELERY_BROKER_URL:
- Redis broker URL (default: redis://localhost:6379/0).
- CELERY_RESULT_BACKEND:
- Redis backend URL (default: redis://localhost:6379/0).
- SCRAIBE_UPLOAD_DIR:
- Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).
Email configuration:
- EMAIL_SMTP_HOST:
- SMTP server host.
- EMAIL_SMTP_PORT:
- SMTP server port (e.g., 587).
- EMAIL_SMTP_USER:
- SMTP username.
- EMAIL_SMTP_PASSWORD:
- SMTP password.
- EMAIL_SMTP_USE_TLS:
- Use TLS (true/false; default: true).
- EMAIL_FROM_ADDRESS:
- Sender address (e.g., "ScrAIbe <transcribe@your-domain.com>").
- EMAIL_CONTACT_ADDRESS:
- Support contact address shown in email templates.
- EMAIL_LOGO_URL:
- URL of the logo used in emails (preferred).
- EMAIL_LOGO_PATH:
- Fallback local path for email logo (default: /app/src/misc/logo1.png).
- EMAIL_CSS_PATH:
- Path to the CSS used in emails (default: /app/src/misc/mail_style.css).
Email subject lines (customizable):
- EMAIL_SUBJECT_UPLOAD:
- Subject for upload confirmation email.
- Default: "ScrAIbe: Your transcription request has been received"
- EMAIL_SUBJECT_SUCCESS:
- Subject for transcript-ready email.
- Default: "ScrAIbe: Your transcript is ready"
- EMAIL_SUBJECT_ERROR:
- Subject for error notification email.
- Default: "ScrAIbe: Error with your transcription request"
Summary prompt customization:
- SUMMARY_PROMPT_CHUNK:
- Override prompt used for each transcript chunk.
- SUMMARY_PROMPT_COMBINED:
- Override prompt used for the final combined summary.
- SUMMARY_PROMPT_FILE:
- Path to a file with prompts in sections:
- [chunk]
- [combined]
DOCX and cover pages:
- COVER_PAGE_ENABLED:
- Add a cover page to transcript and summary DOCX files (default: false).
- COVER_PAGE_ORGANIZATION:
- Organization name shown on the cover page.
- COVER_PAGE_TITLE_PREFIX:
- Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
- COVER_PAGE_LOGO_URL:
- Logo URL to include on the cover page.
- COVER_PAGE_LOGO_PATH:
- Local logo path to include on the cover page.
Output files (async web GUI and watch-folder mode):
When a job completes, the user receives:
- Transcript:
- .md file
- .docx file (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
- .md file
- .docx file (markdown-aware styling, optional cover page)
- JSON:
- Structured transcript with diarization and metadata
All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).
## Dependencies
Core runtime dependencies:
- Python 3.9+
- httpx
- numpy
- tqdm
- gradio
- celery[redis]
- redis
- python-docx
- fastapi
- uvicorn
- ffmpeg (for audio preprocessing)
No local Whisper, PyTorch, or Pyannote models are required.
## Contributing
Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.
## License
This project is licensed under GPL-3.0. See LICENSE for details.