ea5a0752df
Mirror and run GitLab CI / build (push) Has been cancelled
- Remove PDF-related references - Clarify DOCX format: no cover pages, transcript line-numbered - Align output files and env vars with current implementation
294 lines
8.4 KiB
Markdown
294 lines
8.4 KiB
Markdown
# ScrAIbe – LocalAI-Backed Transcription and Summarization
|
||
|
||
ScrAIbe is a transcription and summarization service that:
|
||
|
||
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
|
||
- Optionally uses a second LLM to generate a structured summary.
|
||
- Provides:
|
||
- A web GUI for uploading audio and receiving transcripts via email.
|
||
- A CLI and Python API for direct integration.
|
||
|
||
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
|
||
|
||
For more information: https://apstrom.ca
|
||
|
||
## Features
|
||
|
||
- Transcription with speaker diarization via LocalAI:
|
||
- Uses the /v1/audio/diarization endpoint.
|
||
- Compatible with vibevoice.cpp and other diarization-capable backends.
|
||
- Optional AI-powered summarization:
|
||
- Task: transcript_and_summarize
|
||
- Highlights:
|
||
- Main topics and discussion points
|
||
- Key decisions and outcomes
|
||
- Action items and responsibilities
|
||
- Open issues and risks
|
||
- Async web GUI:
|
||
- Upload audio via browser.
|
||
- Jobs are queued and processed in the background (Celery + Redis).
|
||
- Emails:
|
||
- Immediate confirmation with queue position.
|
||
- Final transcript (MD + DOCX + JSON) when ready.
|
||
- Summary as MD + DOCX (if requested).
|
||
- Error notification if processing fails.
|
||
- File formats:
|
||
- Transcript: .md and .docx (line-numbered, no cover page)
|
||
- Summary (if requested): .md and .docx (no line numbering, no cover page)
|
||
- Full structured output: .json
|
||
- Customizable branding:
|
||
- Web GUI title, logo, and accent color via environment variables.
|
||
- Email logo, accent color, and subject lines via environment variables.
|
||
- CLI and Python API:
|
||
- Simple command-line interface.
|
||
- Drop-in Scraibe class for integration into other tools.
|
||
- Docker-ready:
|
||
- Lightweight container, configured via environment variables.
|
||
|
||
## Architecture
|
||
|
||
- LocalAI (vibevoice.cpp):
|
||
- Handles audio → transcript + speaker segments.
|
||
- Summarizer LLM (OpenAI-compatible chat endpoint):
|
||
- Handles transcript → structured summary.
|
||
- ScrAIbe:
|
||
- Orchestrates:
|
||
- File upload to LocalAI
|
||
- Transcript assembly
|
||
- Chunked summarization
|
||
- Output formatting (e.g., .md with transcript + summary)
|
||
- Runs:
|
||
- Web GUI (Gradio)
|
||
- Celery worker (async processing)
|
||
- Redis (in-container by default)
|
||
|
||
## Quick Start (Web GUI in Docker)
|
||
|
||
Run the container with your LocalAI and summarizer endpoints:
|
||
|
||
- docker run -d \
|
||
-p 7860:7860 \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-e SUMMARIZER_API_URL=http://llm:8080 \
|
||
-e EMAIL_SMTP_HOST=smtp.your-domain.com \
|
||
-e EMAIL_SMTP_PORT=587 \
|
||
-e EMAIL_SMTP_USER=transcribe@your-domain.com \
|
||
-e EMAIL_SMTP_PASSWORD=your_password \
|
||
-e EMAIL_FROM_ADDRESS="ScrAIbe <transcribe@your-domain.com>" \
|
||
-e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
|
||
-e WEBUI_TITLE="Your Transcription Service" \
|
||
-e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
|
||
-e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
|
||
-e EMAIL_ACCENT_COLOR="#7C6DA0" \
|
||
scraibe:latest
|
||
|
||
Then open: http://<host>:7860
|
||
|
||
## Quick Start (CLI)
|
||
|
||
Basic usage:
|
||
|
||
- Transcribe:
|
||
|
||
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
|
||
|
||
- Transcribe and summarize:
|
||
|
||
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize
|
||
|
||
Environment variables must be set to point to your LocalAI and summarizer LLM.
|
||
|
||
## Python API
|
||
|
||
Example: transcribe only
|
||
|
||
- from scraibe import Scraibe
|
||
|
||
- client = Scraibe()
|
||
- text = client.transcribe("audio.wav")
|
||
- print(text)
|
||
|
||
Example: transcribe and summarize
|
||
|
||
- from scraibe import Scraibe
|
||
|
||
- client = Scraibe()
|
||
- result = client.transcript_and_summarize("audio.wav")
|
||
- transcript = result["transcript"]
|
||
- summary = result["summary"]
|
||
|
||
You can override endpoints and models via environment variables or constructor parameters if needed.
|
||
|
||
## Command-Line Options
|
||
|
||
Run:
|
||
|
||
- python3 -m scraibe.cli -h
|
||
|
||
Key options:
|
||
|
||
- -f / --audio-files:
|
||
- One or more audio files to process.
|
||
- --task:
|
||
- transcribe (default)
|
||
- transcript_and_summarize
|
||
- -o / --output-directory:
|
||
- Output folder for generated files.
|
||
- -of / --output-format:
|
||
- txt, json, md, html
|
||
- For transcript_and_summarize, output is always saved as .md with:
|
||
- # Transcript
|
||
- # Summary
|
||
|
||
Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
|
||
|
||
## Docker Usage
|
||
|
||
ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.
|
||
|
||
### Basic run (transcribe via CLI)
|
||
|
||
- docker run -it \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-v /path/to/audio:/audio \
|
||
scraibe:latest \
|
||
-f /audio/meeting.wav -o /audio/output -of txt
|
||
|
||
### Basic run (transcribe + summarize via CLI)
|
||
|
||
- docker run -it \
|
||
-e LOCALAI_API_URL=http://localai:8080 \
|
||
-e SUMMARIZER_API_URL=http://llm:8080 \
|
||
-v /path/to/audio:/audio \
|
||
scraibe:latest \
|
||
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize
|
||
|
||
### Docker Environment Variables
|
||
|
||
The following environment variables configure ScrAIbe in Docker.
|
||
|
||
Transcription / Diarization (LocalAI):
|
||
|
||
- LOCALAI_API_URL:
|
||
- Required.
|
||
- Base URL of the LocalAI server.
|
||
- Example: http://localai:8080
|
||
- LOCALAI_API_KEY:
|
||
- Optional.
|
||
- API key for LocalAI, if configured.
|
||
- LOCALAI_MODEL:
|
||
- Optional (default: vibevoice-diarize).
|
||
- Model name used for transcription/diarization.
|
||
|
||
Summarization LLM:
|
||
|
||
- SUMMARIZER_API_URL:
|
||
- Required when using --task transcript_and_summarize.
|
||
- Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
|
||
- Example: http://llm:8080
|
||
- SUMMARIZER_API_KEY:
|
||
- Optional.
|
||
- API key for the summarization LLM, if required.
|
||
- SUMMARIZER_MODEL:
|
||
- Optional (default: llama-3.1-8b-instruct).
|
||
- Model name used for summarization.
|
||
|
||
Web GUI and branding:
|
||
|
||
- WEBUI_TITLE:
|
||
- Title shown in the web GUI (default: A.P.Strom Transcription).
|
||
- WEBUI_LOGO_URL:
|
||
- URL of the logo displayed in the web GUI header.
|
||
- Example: https://your-domain.com/logo.png
|
||
|
||
Accent color (UI and emails):
|
||
|
||
- EMAIL_ACCENT_COLOR:
|
||
- Accent color used in:
|
||
- Web GUI buttons and accents
|
||
- Email headings, links, and email addresses
|
||
- Default: #7C6DA0
|
||
|
||
Async processing (Celery + Redis):
|
||
|
||
- CELERY_BROKER_URL:
|
||
- Redis broker URL (default: redis://localhost:6379/0).
|
||
- CELERY_RESULT_BACKEND:
|
||
- Redis backend URL (default: redis://localhost:6379/0).
|
||
- SCRAIBE_UPLOAD_DIR:
|
||
- Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).
|
||
|
||
Email configuration:
|
||
|
||
- EMAIL_SMTP_HOST:
|
||
- SMTP server host.
|
||
- EMAIL_SMTP_PORT:
|
||
- SMTP server port (e.g., 587).
|
||
- EMAIL_SMTP_USER:
|
||
- SMTP username.
|
||
- EMAIL_SMTP_PASSWORD:
|
||
- SMTP password.
|
||
- EMAIL_SMTP_USE_TLS:
|
||
- Use TLS (true/false; default: true).
|
||
- EMAIL_FROM_ADDRESS:
|
||
- Sender address (e.g., "ScrAIbe <transcribe@your-domain.com>").
|
||
- EMAIL_CONTACT_ADDRESS:
|
||
- Support contact address shown in email templates.
|
||
- EMAIL_LOGO_URL:
|
||
- URL of the logo used in emails (preferred).
|
||
- EMAIL_LOGO_PATH:
|
||
- Fallback local path for email logo (default: /app/src/misc/logo1.png).
|
||
- EMAIL_CSS_PATH:
|
||
- Path to the CSS used in emails (default: /app/src/misc/mail_style.css).
|
||
|
||
Email subject lines (customizable):
|
||
|
||
- EMAIL_SUBJECT_UPLOAD:
|
||
- Subject for upload confirmation email.
|
||
- Default: "ScrAIbe: Your transcription request has been received"
|
||
- EMAIL_SUBJECT_SUCCESS:
|
||
- Subject for transcript-ready email.
|
||
- Default: "ScrAIbe: Your transcript is ready"
|
||
- EMAIL_SUBJECT_ERROR:
|
||
- Subject for error notification email.
|
||
- Default: "ScrAIbe: Error with your transcription request"
|
||
|
||
Output files (async web GUI):
|
||
|
||
When a job completes, the user receives:
|
||
|
||
- Transcript:
|
||
- .md file
|
||
- .docx file (line-numbered, no cover page)
|
||
- Summary (if requested):
|
||
- .md file
|
||
- .docx file (no line numbering, no cover page)
|
||
- JSON:
|
||
- Structured transcript with diarization and metadata
|
||
|
||
All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).
|
||
|
||
## Dependencies
|
||
|
||
Core runtime dependencies:
|
||
|
||
- Python 3.9+
|
||
- httpx
|
||
- numpy
|
||
- tqdm
|
||
- gradio
|
||
- celery[redis]
|
||
- redis
|
||
- python-docx
|
||
- ffmpeg (for audio preprocessing)
|
||
|
||
No local Whisper, PyTorch, or Pyannote models are required.
|
||
|
||
## Contributing
|
||
|
||
Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.
|
||
|
||
## License
|
||
|
||
This project is licensed under GPL-3.0. See LICENSE for details.
|