
ScrAIbe: LocalAI-Backed Transcription and Summarization

ScrAIbe is a transcription and summarization service that:

  • Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
  • Optionally uses a second LLM to generate a structured summary.
  • Provides:
    • A web GUI for uploading audio and receiving transcripts via email.
    • A CLI and Python API for direct integration.

No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.

For more information: https://apstrom.ca

Features

  • Transcription with speaker diarization via LocalAI:
    • Uses the /v1/audio/diarization endpoint.
    • Compatible with vibevoice.cpp and other diarization-capable backends.
  • Optional AI-powered summarization:
    • Task: transcript_and_summarize
    • Highlights:
      • Main topics and discussion points
      • Key decisions and outcomes
      • Action items and responsibilities
      • Open issues and risks
  • Async web GUI:
    • Upload audio via browser.
    • Jobs are queued and processed in the background (Celery + Redis).
    • Emails:
      • Immediate confirmation with queue position.
      • Final transcript (MD + DOCX + JSON) when ready.
      • Summary as MD + DOCX (if requested).
      • Error notification if processing fails.
  • File formats:
    • Transcript: .md and .docx (line-numbered, no cover page)
    • Summary (if requested): .md and .docx (no line numbering, no cover page)
    • Full structured output: .json
  • Customizable branding:
    • Web GUI title, logo, and accent color via environment variables.
    • Email logo, accent color, and subject lines via environment variables.
  • CLI and Python API:
    • Simple command-line interface.
    • Drop-in Scraibe class for integration into other tools.
  • Docker-ready:
    • Lightweight container, configured via environment variables.

Architecture

  • LocalAI (vibevoice.cpp):
    • Handles audio → transcript + speaker segments.
  • Summarizer LLM (OpenAI-compatible chat endpoint):
    • Handles transcript → structured summary.
  • ScrAIbe:
    • Orchestrates:
      • File upload to LocalAI
      • Transcript assembly
      • Chunked summarization
      • Output formatting (e.g., .md with transcript + summary)
    • Runs:
      • Web GUI (Gradio)
      • Celery worker (async processing)
      • Redis (in-container by default)
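The transcript assembly and chunking steps listed above can be sketched roughly as follows. This is a minimal illustration, not ScrAIbe's actual code: the segment shape (dicts with `speaker` and `text` keys), the function names `assemble_transcript` and `chunk_transcript`, and the character-based chunk limit are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def assemble_transcript(segments: Iterable[Dict[str, str]]) -> List[str]:
    """Merge consecutive segments from the same speaker into one line each.

    Assumed (hypothetical) segment shape: {"speaker": "...", "text": "..."}.
    """
    lines: List[str] = []
    last_speaker = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if seg["speaker"] == last_speaker:
            # Same speaker keeps talking: extend the previous line.
            lines[-1] = f"{lines[-1]} {text}"
        else:
            lines.append(f"{seg['speaker']}: {text}")
            last_speaker = seg["speaker"]
    return lines


def chunk_transcript(lines: List[str], max_chars: int = 4000) -> List[str]:
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        added = len(line) + (1 if current else 0)  # +1 for the newline joiner
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk could then be sent to the summarizer separately and the partial summaries combined. How ScrAIbe sizes its chunks and merges the results is not specified here.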

Quick Start (Web GUI in Docker)

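Run the container with your LocalAI and summarizer endpoints supplied as environment variables (LOCALAI_API_URL and SUMMARIZER_API_URL; see Docker Environment Variables below).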

Then open http://<host>:7860 in a browser, where <host> is the machine running the container.

Quick Start (CLI)

Basic usage:

  • Transcribe:

        python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

  • Transcribe and summarize:

        python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

Set LOCALAI_API_URL (and SUMMARIZER_API_URL when summarizing) so that ScrAIbe can reach your LocalAI server and summarizer LLM.

Python API

Example: transcribe only

    from scraibe import Scraibe

    client = Scraibe()
    text = client.transcribe("audio.wav")
    print(text)

Example: transcribe and summarize

    from scraibe import Scraibe

    client = Scraibe()
    result = client.transcript_and_summarize("audio.wav")
    transcript = result["transcript"]
    summary = result["summary"]

You can override endpoints and models via environment variables or constructor parameters if needed.
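Since the summarization result exposes `transcript` and `summary` keys, a combined Markdown report could be put together along these lines. The helper name `to_markdown` and the heading text are hypothetical, and the sketch assumes both values are plain strings, which the documentation does not confirm.

```python
from typing import Mapping


def to_markdown(result: Mapping[str, str], title: str = "Meeting") -> str:
    """Render a result mapping as Markdown with Transcript and Summary sections.

    Assumes (not confirmed by the docs) that result["transcript"] and
    result["summary"] are plain strings. The summary section is omitted
    when no summary is present.
    """
    parts = [f"# {title}", "", "## Transcript", "", result["transcript"].strip()]
    summary = (result.get("summary") or "").strip()
    if summary:
        parts += ["", "## Summary", "", summary]
    return "\n".join(parts) + "\n"
```

Under those assumptions, passing the `result` from the example above to `to_markdown` would return text ready to write to an .md file.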

Command-Line Options

Run:

    python3 -m scraibe.cli -h

Key options:

  • -f / --audio-files:
    • One or more audio files to process.
  • --task:
    • transcribe (default)
    • transcript_and_summarize
  • -o / --output-directory:
    • Output folder for generated files.
  • -of / --output-format:
    • txt, json, md, html
    • For transcript_and_summarize, output is always saved as .md, containing:
      • Transcript
      • Summary

Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
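For readers wiring these flags into their own wrapper scripts, the documented options map onto a parser roughly like the one below. This is an illustrative stand-in built with `argparse`, not the parser in `scraibe.cli`: the defaults for the output directory and output format are assumptions, and only the flags listed above are modeled.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (not scraibe.cli itself)."""
    parser = argparse.ArgumentParser(prog="scraibe-sketch")
    parser.add_argument("-f", "--audio-files", nargs="+", required=True,
                        help="One or more audio files to process.")
    parser.add_argument("--task", default="transcribe",
                        choices=["transcribe", "transcript_and_summarize"])
    parser.add_argument("-o", "--output-directory", default=".",  # assumed default
                        help="Output folder for generated files.")
    parser.add_argument("-of", "--output-format", default="txt",  # assumed default
                        choices=["txt", "json", "md", "html"])
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and apply the documented .md rule for summarization."""
    args = build_parser().parse_args(argv)
    # Documented behavior: summarization output is always saved as .md.
    if args.task == "transcript_and_summarize":
        args.output_format = "md"
    return args
```

Calling `parse(["-f", "audio.wav", "--task", "transcript_and_summarize"])` yields a namespace whose `output_format` is `md`, matching the rule described above.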

Docker Usage

ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

Basic run (transcribe via CLI)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output -of txt

Basic run (transcribe + summarize via CLI)

    docker run -it \
      -e LOCALAI_API_URL=http://localai:8080 \
      -e SUMMARIZER_API_URL=http://llm:8080 \
      -v /path/to/audio:/audio \
      scraibe:latest \
      -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

Docker Environment Variables

The following environment variables configure ScrAIbe in Docker.

Transcription / Diarization (LocalAI):

  • LOCALAI_API_URL:
    • Base URL of the LocalAI server.
    • Example: http://localai:8080
  • LOCALAI_API_KEY:
    • Optional.
    • API key for LocalAI, if configured.
  • LOCALAI_MODEL:
    • Optional (default: vibevoice-diarize).
    • Model name used for transcription/diarization.

Summarization LLM:

  • SUMMARIZER_API_URL:
    • Required when using --task transcript_and_summarize.
    • Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
    • Example: http://llm:8080
  • SUMMARIZER_API_KEY:
    • Optional.
    • API key for the summarization LLM, if required.
  • SUMMARIZER_MODEL:
    • Optional (default: llama-3.1-8b-instruct).
    • Model name used for summarization.
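The LocalAI and summarizer settings above can be gathered in one place before any request is made. The sketch below is one hedged way to do that: `BackendConfig` and `load_config` are hypothetical names, not part of ScrAIbe, and only the defaults stated in this section are filled in.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    localai_url: Optional[str]
    localai_key: Optional[str]
    localai_model: str
    summarizer_url: Optional[str]
    summarizer_key: Optional[str]
    summarizer_model: str


def load_config(env: Mapping[str, str] = os.environ,
                task: str = "transcribe") -> BackendConfig:
    """Read the documented variables from env, applying the documented defaults."""
    cfg = BackendConfig(
        localai_url=env.get("LOCALAI_API_URL"),
        localai_key=env.get("LOCALAI_API_KEY"),
        localai_model=env.get("LOCALAI_MODEL", "vibevoice-diarize"),
        summarizer_url=env.get("SUMMARIZER_API_URL"),
        summarizer_key=env.get("SUMMARIZER_API_KEY"),
        summarizer_model=env.get("SUMMARIZER_MODEL", "llama-3.1-8b-instruct"),
    )
    # SUMMARIZER_API_URL is documented as required for summarization.
    if task == "transcript_and_summarize" and not cfg.summarizer_url:
        raise ValueError("SUMMARIZER_API_URL must be set for transcript_and_summarize")
    return cfg
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, and failing early on a missing summarizer URL avoids running a transcription whose summary step cannot succeed.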

Web GUI and branding:

  • WEBUI_TITLE:
    • Title shown in the web GUI (default: A.P.Strom Transcription).
  • WEBUI_LOGO_URL:
    • URL of the logo shown in the web GUI.

Accent color (UI and emails):

  • EMAIL_ACCENT_COLOR:
    • Accent color used in:
      • Web GUI buttons and accents
      • Email headings, links, and email addresses
    • Default: #7C6DA0

Async processing (Celery + Redis):

  • CELERY_BROKER_URL:
    • Redis broker URL (default: redis://localhost:6379/0).
  • CELERY_RESULT_BACKEND:
    • Redis backend URL (default: redis://localhost:6379/0).
  • SCRAIBE_UPLOAD_DIR:
    • Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).

Email configuration:

  • EMAIL_SMTP_HOST:
    • SMTP server host.
  • EMAIL_SMTP_PORT:
    • SMTP server port (e.g., 587).
  • EMAIL_SMTP_USER:
    • SMTP username.
  • EMAIL_SMTP_PASSWORD:
    • SMTP password.
  • EMAIL_SMTP_USE_TLS:
    • Use TLS (true/false; default: true).
  • EMAIL_FROM_ADDRESS:
    • Sender address for outgoing emails.
  • EMAIL_CONTACT_ADDRESS:
    • Support contact address shown in email templates.
  • EMAIL_LOGO_URL:
    • URL of the logo used in emails (preferred).
  • EMAIL_LOGO_PATH:
    • Fallback local path for email logo (default: /app/src/misc/logo1.png).
  • EMAIL_CSS_PATH:
    • Path to the CSS used in emails (default: /app/src/misc/mail_style.css).

Email subject lines (customizable):

  • EMAIL_SUBJECT_UPLOAD:
    • Subject for upload confirmation email.
    • Default: "ScrAIbe: Your transcription request has been received"
  • EMAIL_SUBJECT_SUCCESS:
    • Subject for transcript-ready email.
    • Default: "ScrAIbe: Your transcript is ready"
  • EMAIL_SUBJECT_ERROR:
    • Subject for error notification email.
    • Default: "ScrAIbe: Error with your transcription request"
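To show how the subject-line variables and their defaults might be applied, here is a small sketch that builds a notification message with the standard library and does not send it. The function name `build_email` and the `kind` labels are hypothetical, and the use of EMAIL_FROM_ADDRESS as the sender is inferred from the variable name; ScrAIbe's own mailer may be structured differently.

```python
import os
from email.message import EmailMessage
from typing import Mapping

# Defaults copied from the list above.
DEFAULT_SUBJECTS = {
    "EMAIL_SUBJECT_UPLOAD": "ScrAIbe: Your transcription request has been received",
    "EMAIL_SUBJECT_SUCCESS": "ScrAIbe: Your transcript is ready",
    "EMAIL_SUBJECT_ERROR": "ScrAIbe: Error with your transcription request",
}


def build_email(kind: str, to_addr: str, body: str,
                env: Mapping[str, str] = os.environ) -> EmailMessage:
    """Build (but do not send) a notification email.

    kind is one of "UPLOAD", "SUCCESS", "ERROR" (hypothetical labels chosen
    to match the variable suffixes).
    """
    var = f"EMAIL_SUBJECT_{kind}"
    if var not in DEFAULT_SUBJECTS:
        raise ValueError(f"unknown email kind: {kind}")
    msg = EmailMessage()
    # An environment override wins; otherwise fall back to the documented default.
    msg["Subject"] = env.get(var, DEFAULT_SUBJECTS[var])
    msg["To"] = to_addr
    from_addr = env.get("EMAIL_FROM_ADDRESS")
    if from_addr:
        msg["From"] = from_addr
    msg.set_content(body)
    return msg
```

The resulting message could be handed to `smtplib.SMTP.send_message` once a connection has been opened with the SMTP settings described earlier.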

Output files (async web GUI):

When a job completes, the user receives:

  • Transcript:
    • .md file
    • .docx file (line-numbered, no cover page)
  • Summary (if requested):
    • .md file
    • .docx file (no line numbering, no cover page)
  • JSON:
    • Structured transcript with diarization and metadata

All of the environment variables above can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

Dependencies

Core runtime dependencies:

  • Python 3.9+
  • httpx
  • numpy
  • tqdm
  • gradio
  • celery[redis]
  • redis
  • python-docx
  • ffmpeg (for audio preprocessing)

No local Whisper, PyTorch, or Pyannote models are required.

Contributing

Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

License

This project is licensed under GPL-3.0. See LICENSE for details.
