Compare commits

96 Commits

Author SHA1 Message Date
admin cd0c730abe Ensure WebUI always loads even if MCP/watcher fail
- Wrap MCP server and watcher startup in try/except.
- Log warnings but never block WebUI launch.
2026-06-19 17:50:49 +00:00
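The fail-safe startup described in this commit could be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's code: `start_optional`, `run`, and the callables passed in are hypothetical names, and the real `__main__.py` may be organized differently.

```python
import logging

logger = logging.getLogger("startup")


def start_optional(name, starter):
    """Start an optional component; log a warning and carry on if it fails."""
    try:
        starter()
    except Exception:
        # Hypothetical policy: any startup error is downgraded to a warning.
        logger.warning("%s failed to start; continuing without it", name, exc_info=True)
        return False
    return True


def run(start_webui, optional_starters):
    """Try every optional component, then launch the WebUI unconditionally."""
    started = {
        name: start_optional(name, starter)
        for name, starter in optional_starters.items()
    }
    start_webui()  # reached even if every optional component failed
    return started
```

The point of the pattern is that the WebUI launch sits outside every `try` block that guards an optional component, so a failing MCP server or watcher can only produce a log line.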
admin 2bd6ee1567 Update README with new features (MCP API, watch-folder, improved summaries, DOCX styling, cover pages)
2026-06-19 17:46:54 +00:00
admin 4bc9f82ee7 Test and validate all new modules on dev
- Confirmed MCP server endpoints and /transcribe flow.
- Confirmed watcher audio detection logic.
- Confirmed summarizer prompt loading and env override.
- Confirmed docx_styles markdown-to-DOCX conversion.
- Confirmed docx_cover integration.
- Confirmed email_sender with cover pages and markdown styling.
- Confirmed tasks and __main__ wiring.
2026-06-19 17:37:28 +00:00
admin bdd0a80d8d Add watch-folder mode and wire MCP/watcher into entrypoint
- New watcher.py: polls WATCH_DIR, enqueues transcription+summary via Celery.
- New process_watch_file_task in tasks.py.
- Updated __main__.py: WebUI always runs; MCP and watcher run in parallel when enabled.
2026-06-19 17:18:20 +00:00
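A polling watcher of the kind this commit describes might look like the sketch below. It is an assumption-laden illustration: `scan_once`, `AUDIO_EXTENSIONS`, and the `enqueue` callback are hypothetical, and the callback merely stands in for whatever Celery task call the real `watcher.py` makes.

```python
from pathlib import Path

# Assumed set of audio extensions; the real watcher may accept others.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def scan_once(watch_dir, seen, enqueue):
    """Enqueue each audio file in watch_dir that has not been seen before."""
    new_files = []
    for path in sorted(Path(watch_dir).iterdir()):
        is_audio = path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
        if is_audio and path not in seen:
            seen.add(path)      # remember it so the next poll skips it
            enqueue(path)       # stand-in for submitting a background job
            new_files.append(path)
    return new_files
```

A caller would invoke `scan_once` in a loop with a sleep between polls, keeping `seen` alive across iterations.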
admin 7a31be9de5 Improve summary prompt, add markdown-to-DOCX styling, and add cover pages
- Configurable summary prompts via ENV or file; stronger default prompt.
- New docx_styles.py: converts markdown (headings, bullets, bold/italic) to DOCX.
- Updated create_summary_docx to use markdown-aware styling.
- New docx_cover.py: reusable cover page for transcript and summary.
- Cover pages enabled when COVER_PAGE_ENABLED=true.
2026-06-19 17:16:46 +00:00
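The markdown-aware styling mentioned for `docx_styles.py` implies a parsing step that classifies each line and splits it into styled runs. A minimal sketch of such a step, assuming only headings, bullets, bold, and italic are handled, could be as follows. `parse_line` and `parse_inline` are hypothetical names, and writing the runs into a DOCX is deliberately left out.

```python
import re

# Bold is tried before italic so that "**x**" is not read as italic "*x*".
INLINE = re.compile(r"(\*\*.+?\*\*|\*.+?\*)")


def parse_inline(text):
    """Split text into (content, style) runs: plain, bold, or italic."""
    runs = []
    for piece in INLINE.split(text):
        if not piece:
            continue
        if piece.startswith("**") and piece.endswith("**") and len(piece) > 4:
            runs.append((piece[2:-2], "bold"))
        elif piece.startswith("*") and piece.endswith("*") and len(piece) > 2:
            runs.append((piece[1:-1], "italic"))
        else:
            runs.append((piece, "plain"))
    return runs


def parse_line(line):
    """Classify a markdown line as heading, bullet, or paragraph."""
    heading = re.match(r"(#{1,6})\s+(.*)", line)
    if heading:
        return ("heading", len(heading.group(1)), parse_inline(heading.group(2)))
    bullet = re.match(r"[-*]\s+(.*)", line)
    if bullet:
        return ("bullet", 0, parse_inline(bullet.group(1)))
    return ("paragraph", 0, parse_inline(line))
```

Each returned tuple carries enough information for a DOCX writer to pick a paragraph style and then add one run per `(content, style)` pair.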
admin 54414def26 Add MCP-style API server (OpenAPI) alongside WebUI
- New mcp_server.py: FastAPI app for LLMs to upload audio and get transcript JSON.
- Added process_mcp_transcribe_task Celery task.
- Updated __main__.py: WebUI always runs; MCP server runs in parallel when MCP_SERVER_ENABLED=true.
2026-06-19 17:04:44 +00:00
admin 111d1ea18b Set 30 lines per page
2026-06-19 16:18:15 +00:00
admin cb27ba80a1 Increase line length to 64 chars and lines per page to 32
2026-06-19 16:12:20 +00:00
admin 2112b8c7e2 Rewrite transcript DOCX logic for correctness
- Prepare transcript into pages of 29 lines each before writing.
- Each line max 60 chars total (48 content + number + spaces).
- Words preserved (no clipping); wrap at word boundaries.
- Page break after every 29 lines.
- No distinction between logical/visual lines.
2026-06-19 16:07:20 +00:00
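The "prepare pages first, then write" approach from this commit can be illustrated with the standard `textwrap` module. This is a sketch under assumptions: `paginate` is a hypothetical helper, and the 48-character content width and 29 lines per page are simply the values quoted in the commit message.

```python
import textwrap


def paginate(turns, content_width=48, lines_per_page=29):
    """Wrap each speaker turn at word boundaries, then group lines into pages."""
    lines = []
    for turn in turns:
        # break_long_words=False keeps every word whole, so a single word
        # longer than content_width would overflow rather than be clipped.
        lines.extend(
            textwrap.wrap(
                turn,
                width=content_width,
                break_long_words=False,
                break_on_hyphens=False,
            )
        )
    return [
        lines[i:i + lines_per_page]
        for i in range(0, len(lines), lines_per_page)
    ]
```

Because the pages are computed before any document is written, the writer only has to emit one paragraph per line and a page break after each inner list.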
admin 49f3cdc407 Fix page breaks: insert after every 29 lines; wrap at 58 chars preserving whole words
- Insert page break after every 29 visual lines.
- Wrap content at 58 characters, keeping whole words together.
- Ensure no text is lost; all transcript text is included.
2026-06-19 15:32:31 +00:00
admin 2c0998579c Ensure page break after line 29 and preserve all transcript text
- Insert page break after every 29 visual lines.
- Wrap long lines instead of truncating to preserve all words.
- No text is dropped; content wraps onto new lines as needed.
2026-06-19 15:25:49 +00:00
admin 327c05ea16 Use single-column layout in web UI with status below submit button
2026-06-19 15:20:18 +00:00
admin dabb5970ba Remove language and number of speakers fields from web GUI
- Drop 'Language (optional)' and 'Number of speakers (optional)' inputs.
- Update submit handler to pass None for both fields.
2026-06-19 15:18:03 +00:00
admin 36b0b6f241 chore: set max line length to 58 chars in Ruff config
2026-06-18 18:10:06 +00:00
admin 6640bc050d feat: add chunked ASR for long audio with env-configurable chunk duration
- Integrate chunking into LocalAI client to avoid GPU OOM on long audio.
- Split long files into overlapping chunks; transcribe each chunk; merge segments with corrected timestamps.
- Auto-enable chunking when audio duration > LOCALAI_MAX_SINGLE_REQUEST_DURATION (default 300s).
- Add env variables:
    LOCALAI_CHUNK_DURATION (default 180)
    LOCALAI_CHUNK_OVERLAP (default 2)
    LOCALAI_MAX_SINGLE_REQUEST_DURATION (default 300)
- Add unit and integration tests for chunking logic.
- Confirmed working end-to-end with vibevoice-cpp-asr on 88-minute file.
2026-06-18 17:46:29 +00:00
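The chunking scheme in this commit (overlapping windows, then segments shifted back to absolute time) can be sketched as below. The defaults mirror the documented values of `LOCALAI_CHUNK_DURATION`, `LOCALAI_CHUNK_OVERLAP`, and `LOCALAI_MAX_SINGLE_REQUEST_DURATION`; the function names, the segment dictionary shape, and the rule for dropping segments already covered by the previous chunk are assumptions made for illustration only.

```python
def plan_chunks(duration, chunk=180.0, overlap=2.0, max_single=300.0):
    """Return (start, end) windows in seconds covering the whole audio."""
    if chunk <= overlap:
        raise ValueError("chunk duration must exceed the overlap")
    if duration <= max_single:
        return [(0.0, duration)]  # short audio: one request, no chunking
    windows, start = [], 0.0
    while True:
        end = min(start + chunk, duration)
        windows.append((start, end))
        if end >= duration:
            return windows
        start += chunk - overlap  # step back by the overlap each time


def merge_segments(chunk_results):
    """Shift chunk-relative segment times by each chunk's offset and merge."""
    merged = []
    for offset, segments in chunk_results:
        for seg in segments:
            start, end = seg["start"] + offset, seg["end"] + offset
            # Assumed dedup rule: skip segments that end inside audio
            # the previous chunk has already covered.
            if merged and end <= merged[-1]["end"]:
                continue
            merged.append({**seg, "start": start, "end": end})
    return merged
```

With the defaults, a 400-second file is planned as three windows starting at 0, 178, and 356 seconds.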
admin 59363c5dcd Set content max_chars to 54
2026-06-17 17:08:41 +00:00
admin 0e27537a68 Enforce 60-char max per full line including spaces
- Reduce content max_chars to 48 so that:
  - line_number (up to 2) + spaces (up to 9) + content (48) <= 60.
2026-06-17 17:03:33 +00:00
admin 0947e91f15 Add extra white space between line number and text
- First line of each speaker turn: 2 base + 7 extra spaces.
- Continuation lines: 2 base + 3 extra spaces.
2026-06-17 16:55:31 +00:00
admin 1d447f2836 Center footer page numbers
2026-06-17 02:11:56 +00:00
admin 49e607e1e1 Add page numbers to footer: 'X of Y' (bottom left)
- Use PAGE and NUMPAGES fields for dynamic page numbering.
- Footer aligned left.
2026-06-17 02:10:42 +00:00
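For readers unfamiliar with how `PAGE` and `NUMPAGES` fields are represented, the sketch below builds the WordprocessingML for an "X of Y" footer paragraph using only the standard library. It is an illustration of the underlying XML, not the project's implementation, which presumably goes through python-docx; `wtag`, `field_runs`, `text_run`, and `footer_paragraph` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def wtag(name):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def field_runs(instruction):
    """Build the begin / instruction / end runs of a complex field."""
    runs = []
    for kind in ("begin", "instr", "end"):
        run = ET.Element(wtag("r"))
        if kind == "instr":
            instr = ET.SubElement(run, wtag("instrText"))
            instr.set(XML_SPACE, "preserve")
            instr.text = f" {instruction} "
        else:
            ET.SubElement(run, wtag("fldChar"), {wtag("fldCharType"): kind})
        runs.append(run)
    return runs


def text_run(text):
    """Build a run holding literal text, preserving surrounding spaces."""
    run = ET.Element(wtag("r"))
    t = ET.SubElement(run, wtag("t"))
    t.set(XML_SPACE, "preserve")
    t.text = text
    return run


def footer_paragraph():
    """Paragraph that renders as '<current page> of <total pages>'."""
    paragraph = ET.Element(wtag("p"))
    for run in [*field_runs("PAGE"), text_run(" of "), *field_runs("NUMPAGES")]:
        paragraph.append(run)
    return paragraph
```

The editor computes both numbers when the document is opened or printed, which is why the footer stays correct however many pages the transcript ends up with.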
admin bd4393addc Increase max characters per visual line to 60
2026-06-17 02:01:35 +00:00
admin f5836d83f3 Add two spaces after line number and reduce max chars to 57
- Insert two spaces after the tab between line number and content.
- Reduce max_chars from 58 to 57 to slightly shorten each visual line.
2026-06-17 02:00:08 +00:00
admin b2dce9e048 Set 29 lines per page and fix page break insertion
- Use 29 visual lines per page before inserting a page break.
- Use w:pageBreak element for reliable page breaks across editors.
- Restart line numbering at 1 on each new page.
2026-06-16 19:51:24 +00:00
admin 4d9414fee9 Set line spacing to 1.5; page break every 32 lines
- Increase line spacing to 1.5 (360 twips).
- Insert page break after every 32 visual lines.
- Restart line numbering at 1 on each new page.
2026-06-16 19:49:01 +00:00
admin d4ed84f68d Set line spacing to 1.2 and 32 lines per page
- Increase line spacing to 1.2 (288 twips).
- Restart line numbering at 1 every 32 lines with page break.
2026-06-16 19:42:25 +00:00
admin eb83a37f02 Restart line numbering at 1 every 45 lines with page break
- Insert page break after 45 visual lines.
- Reset line counter so each page starts at 1.
- Uses embedded line numbers for consistent behavior across editors.
2026-06-16 19:40:15 +00:00
admin e7aa5ebf25 Ensure first visual line respects 58-char limit including label
- Trim first line at word boundary if label + content > 58.
- Subsequent lines continue at full width.
2026-06-16 19:26:30 +00:00
admin 1265a664cd Clip visual lines at 58 characters
2026-06-16 19:23:09 +00:00
admin 83f3c09218 Make line numbers reflect visual lines, not speaker turns
- Split long lines into multiple visual lines at word boundaries.
- Each visual line is its own paragraph with its own embedded line number.
- Continuous numbering across speakers and pages.
- Portable across Word, LibreOffice, Google Docs.
2026-06-16 19:21:04 +00:00
admin d828a91bf3 Use embedded line numbers instead of built-in line numbering
- Remove w:lnNumType; line numbers are now plain text in each paragraph.
- Ensures first line is always '1' across Word, LibreOffice, Google Docs.
- Each paragraph: line number + tab + content.
2026-06-16 19:15:47 +00:00
admin 670c6d3e2b Fix first-page line numbering off-by-one in transcript DOCX
- Remove docGrid element to prevent phantom grid-based line offset.
- Ensure exactly one lnNumType element (no duplicates).
- First visible line on page 1 now correctly numbered as 1.
2026-06-16 19:09:26 +00:00
admin f20102d564 Fix transcript DOCX line numbering (spacing and column fixes)
- Ensure single column layout (cols num='1')
- Set explicit single line spacing (before/after=0, line=240 twips)
- Prevents Word from counting extra lines due to spacing/columns
2026-06-16 18:08:46 +00:00
admin 0e6bc53cf8 Fix duplicate pgMar causing line numbering issue
- Update existing pgMar instead of appending a second one
- Prevents Word from miscounting lines on first page
2026-06-16 18:03:39 +00:00
admin c43076efd4 Increase timeouts for large-file transcription
- LocalAI client timeout: 600s -> 3600s
- Summarizer timeout: 600s -> 3600s
- Add task_time_limit=14400s (4h) and soft_time_limit=13500s to transcription task
2026-06-16 17:18:09 +00:00
admin 03d66219d9 Rebuild transcript DOCX generation flow
- Clean, single-pass implementation for transcript and summary DOCX
- Explicit margins, font, line numbering per OOXML spec
- Disable docGrid to prevent off-by-one line numbering
- Ensure first content line is line 1
2026-06-16 16:54:48 +00:00
admin 0c0e52dfb8 Fix syntax error in speaker identification prompt string
2026-06-16 16:05:02 +00:00
admin 604bfa3f41 Ensure identified speaker names/roles are printed in ALL CAPS
2026-06-16 16:02:55 +00:00
admin 8ff473f3e6 Fix transcript DOCX line numbering starting at 2 (docGrid)
- Disable document grid (w:type='none') when enabling line numbering
- Prevents Word from treating an empty grid line as line 1
2026-06-16 16:00:09 +00:00
admin 0b3f737e5b Update speaker identification to use real names or roles instead of random names
2026-06-16 15:49:39 +00:00
admin 598f8630de Fix transcript DOCX line numbering (invalid 'eachPage' value)
- Replace invalid 'eachPage' with valid 'newPage' for w:lnNumType restart attribute
- This ensures Word starts line numbering at 1 on the first page
2026-06-16 15:41:12 +00:00
admin 7fac0e7d9c Fix transcript DOCX line numbering starting at line 2 (robust)
- Fully clear default paragraphs from document body so Word's line numbering starts at the first real line
2026-06-15 16:26:28 +00:00
admin 5dd56a3368 Fix missing subject on emails with attachments
- Ensure Subject header is set on the outermost MIME part when attachments are present
- Restructure send_email to use multipart/mixed as root with headers when attachments exist
2026-06-15 15:03:50 +00:00
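The fix described here, putting the headers on the outermost MIME part, can be shown with the standard `email` package. This is a sketch under assumptions: `build_message` and its parameters are hypothetical and do not reflect the actual signature of `send_email`.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(subject, sender, recipient, html_body, attachments=()):
    """Build a message whose headers always sit on the root MIME part."""
    body = MIMEText(html_body, "html", "utf-8")
    if attachments:
        # With attachments, multipart/mixed becomes the root container.
        root = MIMEMultipart("mixed")
        root.attach(body)
        for filename, payload in attachments:
            part = MIMEApplication(payload)  # payload is raw bytes
            part.add_header("Content-Disposition", "attachment", filename=filename)
            root.attach(part)
    else:
        root = body
    # Headers go on the root only; mail clients ignore a Subject set on
    # a nested part, which is how a subject can silently go missing.
    root["Subject"] = subject
    root["From"] = sender
    root["To"] = recipient
    return root
```

Setting the headers after the root has been chosen keeps the two code paths from drifting apart.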
admin 7364d572d5 Fix transcript DOCX line numbering starting at line 2
- Remove initial empty paragraph so Word's line numbering starts at first real line
2026-06-15 14:54:32 +00:00
admin d51b006a19 Fix Gradio launch error and adjust upload template
- Remove unsupported 'enable_api' argument from app.launch()
- Hide API link via CSS instead
- Remove queue-position paragraph from upload_notification_template.html
2026-06-15 04:06:55 +00:00
admin ea5a0752df Update README to reflect current behavior
- Remove PDF-related references
- Clarify DOCX format: no cover pages, transcript line-numbered
- Align output files and env vars with current implementation
2026-06-15 03:58:56 +00:00
admin b0a1bc059b Simplify email subject handling and remove duplicate functions
- Remove send_success_email/send_error_email from email_sender.py
- Centralize subject logic in tasks.py using _get_subject()
- Use EMAIL_SUBJECT_SUCCESS and EMAIL_SUBJECT_ERROR with clear defaults
- Ensure subject is always set and logged before sending
2026-06-15 03:52:19 +00:00
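Centralized subject handling with safe defaults might look like the sketch below. The environment variable names come from the commit message; everything else is assumed for illustration, including the function name `get_subject` (the real helper is `_get_subject()` in `tasks.py`, whose signature is not shown here) and the default subject strings.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical defaults; the project's actual default wording is not shown.
DEFAULT_SUBJECTS = {
    "success": "Your transcript is ready",
    "error": "Transcription failed",
}


def get_subject(kind, env=None):
    """Return the subject for 'success' or 'error', never blank."""
    env = os.environ if env is None else env
    raw = env.get(f"EMAIL_SUBJECT_{kind.upper()}") or ""
    # A missing or whitespace-only variable falls back to the default.
    subject = raw.strip() or DEFAULT_SUBJECTS[kind]
    logger.info("Using %s email subject: %r", kind, subject)
    return subject
```

Passing `env` explicitly makes the fallback behavior easy to exercise in tests without touching the process environment.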
admin e27e5b8522 Revert PDF generation; simplify to DOCX + MD + JSON only
- Remove PDF helpers, LibreOffice, PyPDF2, reportlab
- Transcript DOCX: standalone, no cover page, with line numbering
- Summary DOCX: standalone, no cover page, no line numbering
- Attachments:
  - Transcribe: JSON, transcript MD, transcript DOCX
  - Transcribe & Summarize: JSON, transcript MD, transcript DOCX, summary MD, summary DOCX
2026-06-15 03:38:12 +00:00
admin 6233a41f61 Remove Gradio API page and 'Use via API' link from web UI
- Set enable_api=False in app.launch()
- Hide API-related links via CSS
2026-06-15 03:26:34 +00:00
admin 237bd4b37c Refactor PDF generation and attachment logic
- Generate PDFs by:
  - Creating individual .docx components (cover, transcript, summary)
  - Converting each .docx to PDF
  - Merging PDFs in correct order
  - Adding page numbers to final PDFs

- Transcribe & Summarize:
  - Attach: JSON, transcript MD, summary MD, TRANSCRIPT.pdf, SUMMARY.pdf, COMBINED.pdf

- Transcribe only:
  - Attach: JSON, transcript MD, TRANSCRIPT.pdf

- Ensure transcript line numbering is isolated to its own .docx before PDF merge
2026-06-15 03:16:53 +00:00
admin 7ece1a50c2 Update Web UI: rename option, increase title font, default identify speakers
- Rename 'Transcript & Summarize' to 'Transcribe & summarize'
- Increase title font size to 60px via CSS
- Set 'Identify speakers' checkbox to default selected
2026-06-15 03:02:19 +00:00
admin 46fbcf80af Ensure success and error emails always have a subject
- Use EMAIL_SUBJECT_SUCCESS env var for success emails
- Use EMAIL_SUBJECT_ERROR env var for error emails
- Provide safe defaults if env vars are missing or blank
- Add final guard in send_email() to prevent blank subjects
2026-06-15 02:57:09 +00:00
admin 42a155aeaa Add PDF-based document generation with LibreOffice; fix line numbering and margins
- Add LibreOffice Writer and DejaVu fonts to Dockerfile for PDF generation
- Add PyPDF2 and reportlab to requirements.txt
- Refactor email_sender.py:
  - Enforce 1-inch margins on all sides
  - Isolate line numbering to transcript section only
  - Add generate_pdf_documents() to build:
    - TRANSCRIPT.pdf (cover + transcript)
    - SUMMARY.pdf (cover + summary)
    - COMBINED.pdf (transcript cover + summary + TRANSCRIPT header + transcript)
  - Add page numbers (bottom-right) to all PDFs via reportlab
- Update tasks.py:
  - Use generate_pdf_documents() after creating DOCX files
  - Attach source JSON, MD files, and compiled PDFs in success email
- Add test_docx_generation.py for transcript/summary/combined DOCX testing
2026-06-15 02:19:17 +00:00
admin b0a23b32e1 Fix page numbering: correct field insertion for PAGE and NUMPAGES
2026-06-14 23:08:51 +00:00
admin 2e2bc3fb29 Fix page numbering: use correct python-docx field insertion for PAGE and NUMPAGES
2026-06-14 23:03:12 +00:00
admin 2f9299389b Fix line numbering: only transcript pages; ensure page numbering fields are set correctly
2026-06-14 22:25:26 +00:00
admin e0d2fd6963 Fix combined .docx: line numbering only for transcript, centered cover pages, correct date format, reliable page numbering
2026-06-14 22:07:36 +00:00
admin 4651c5f8b2 Ensure success email subject is never blank; add final guard in send_email
2026-06-14 21:56:04 +00:00
admin 6c11a8f19a Add 'Page X of Y' footer to all .docx files
2026-06-14 21:51:12 +00:00
admin 2a2a5e024c Update combined .docx order: cover page, page break, summary, page break, transcript
2026-06-14 21:47:36 +00:00
admin 7adca3d921 Add cover pages to transcript/summary .docx with AI-generated descriptions; include combined .docx when both requested
2026-06-14 21:33:15 +00:00
admin efb34dd9ff Translate markdown headings to WYSIWYG styles in summary .docx
2026-06-14 21:18:42 +00:00
admin 11e5309a8e End underline at colon in transcript .docx, not over following space
2026-06-14 21:14:45 +00:00
admin a3ca1f3505 Ensure success email subject is wired to EMAIL_SUBJECT_SUCCESS and never blank
2026-06-14 21:11:53 +00:00
admin 154cac6c7b Ensure success email subject is wired to EMAIL_SUBJECT_SUCCESS and never blank
2026-06-14 21:09:25 +00:00
admin 18f4a4e8de Reduce logo to 75px desktop / 50px mobile; increase title font by 3pt
2026-06-14 21:05:19 +00:00
admin 2f304e3ed1 Fix header.html template escaping so title and logo render correctly
2026-06-14 21:02:59 +00:00
admin fd94e2daa0 Center logo above title in header for desktop and mobile
2026-06-14 21:00:04 +00:00
admin e74bc04cb3 Show timestamp and speaker name on same line as text in transcript .docx
2026-06-14 20:57:32 +00:00
admin c792fa17e8 Fix .logo-container: remove flex, limit to 75px
2026-06-14 20:54:19 +00:00
admin e55f36a131 Improve queue position accuracy and wording in upload email
2026-06-14 20:51:34 +00:00
admin 572587bb85 Fix syntax error in tasks.py speaker identification prompt
2026-06-14 20:43:37 +00:00
admin cfc38b21ed Reduce header logo size to 25% while keeping responsive layout
2026-06-14 20:42:31 +00:00
ScrAIbe Admin 1582b90ddb Fix header template escaping; ensure title and logo render from env vars
2026-06-14 20:11:28 +00:00
ScrAIbe Admin 9ec4c4ccba Restore title and logo in header with responsive layout
2026-06-14 19:48:57 +00:00
ScrAIbe Admin 8ecae8f648 Optimize Web UI for mobile: fix logo overlap and responsive layout
2026-06-14 19:00:40 +00:00
ScrAIbe Admin 49e999f0ee Add Identify speakers option: AI infers names and replaces Speaker IDs in transcript
2026-06-14 18:05:37 +00:00
ScrAIbe Admin eb9b2f9126 Fix 'Section' object has no attribute '_element' in create_transcript_docx
2026-06-14 17:53:01 +00:00
admin 50c7ec90a0 Always send numeric queue position in upload notification email
2026-06-14 17:00:06 +00:00
admin f7c9c70bfc Robustly wire email subjects from env vars with safe fallbacks and logging
2026-06-14 16:31:03 +00:00
admin a8f48b9e58 Use structured filenames and formal DOCX transcript styling
2026-06-14 16:20:10 +00:00
admin 2dce9b43c9 Use email logo as subtle watermark instead of standalone image
2026-06-14 16:02:56 +00:00
admin 1dea51f1f9 Add cleanup of temp and upload files after transcription job
2026-06-14 15:55:44 +00:00
admin 63cd620b79 Add accent color, email subjects, MD+DOCX outputs, update README
2026-06-14 15:49:11 +00:00
admin dc20e9cff0 Use URL-based logos, env-based WebGUI title, apstrom.ca link, update README
2026-06-14 15:35:16 +00:00
admin fb1dc3324d Use logo1.png in emails, inline mail_style.css, attach summary as MD
2026-06-14 15:24:08 +00:00
admin 917a7b8f2f Make email template placeholders configurable via environment variables
2026-06-14 15:11:53 +00:00
admin 85cdd9216a Use firm email templates, logo, and header/footer in async UI
2026-06-14 14:54:39 +00:00
admin 2803c81b44 Implement async processing with Celery, Redis, and queue-based email notifications
2026-06-14 14:38:10 +00:00
admin b9d25a39dd Use verbose_json diarization, add JSON+TXT email feature
2026-06-14 05:36:45 +00:00
admin f6db48b1d0 Fix Gradio 6.0 compatibility: remove show_api, move css to launch()
2026-06-14 05:18:28 +00:00
admin 37d30e0ee2 Ensure Docker container always starts Web GUI (not CLI)
2026-06-14 05:11:38 +00:00
admin d854d498cd Integrate custom Web GUI assets, templates, and config.yaml; adapt for LocalAI
2026-06-13 21:48:35 +00:00
admin 1eb88d27ba Fix verbose double-pass bug in CLI/autotranscript; improve logging
2026-06-13 17:51:55 +00:00
admin 2ea46ada42 Add structured logging for Docker; support LOG_LEVEL env and --log-level
2026-06-13 17:46:25 +00:00
admin 47b3304297 Fix RuntimeWarning: remove cli import from __init__.py
2026-06-13 17:38:32 +00:00
admin 4822ef28e8 Update README for LocalAI-backed architecture and summarization
2026-06-13 17:28:10 +00:00
35 changed files with 4530 additions and 262 deletions
+29 -13
@@ -5,39 +5,55 @@ FROM python:3.11-slim
 LABEL maintainer="Jacob Schmieder"
 LABEL email="Jacob.Schmieder@dbfz.de"
 LABEL version="0.1.1.dev"
-LABEL description="Scraibe: LocalAI-backed transcription and diarization client with summarization. \
+LABEL description="Scraibe: LocalAI-backed transcription and diarization client with summarization and custom Web GUI. \
 Sends audio to a LocalAI server running vibevoice.cpp and uses a second LLM for summarization."
-LABEL url="https://github.com/JSchmie/ScrAIbe"
+LABEL url="https://git.optimex.systems/admin/scribe"
-# Install system dependencies (ffmpeg required)
+# Install system dependencies (ffmpeg, redis)
 RUN apt update -y && \
-apt install -y --no-install-recommends ffmpeg && \
+apt install -y --no-install-recommends ffmpeg redis-server && \
 apt clean && \
 rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 # Working directory
-WORKDIR /app
+WORKDIR /app/src
 # Environment variables for LocalAI (transcription/diarization)
+# Set these via docker run -e or docker-compose
 ENV LOCALAI_API_URL=http://localhost:8080
 ENV LOCALAI_API_KEY=
-ENV LOCALAI_MODEL=vibevoice-diarize
+ENV LOCALAI_MODEL=vibevoice-cpp-asr
 # Environment variables for Summarizer LLM
 ENV SUMMARIZER_API_URL=http://localhost:8080
 ENV SUMMARIZER_API_KEY=
-ENV SUMMARIZER_MODEL=llama-3.1-8b-instruct
+ENV SUMMARIZER_MODEL=qwen3-14b
+# Gradio / Web GUI
+ENV GRADIO_SERVER_NAME=0.0.0.0
+# Async processing (Celery + Redis)
+ENV CELERY_BROKER_URL=redis://localhost:6379/0
+ENV CELERY_RESULT_BACKEND=redis://localhost:6379/0
+ENV SCRAIBE_UPLOAD_DIR=/tmp/scraibe_uploads
+# Email and template configuration
+ENV EMAIL_CONTACT_ADDRESS=support@example.com
+ENV EMAIL_CSS_PATH=
+ENV SCRAIBE_TEMPLATES_DIR=/app/src/misc
+ENV SCRABIE_VERSION=0.1.1.dev
 # Copy and install Python dependencies
-COPY requirements.txt /app/requirements.txt
+COPY requirements.txt /app/src/requirements.txt
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy application code
-COPY scraibe /app/scraibe
+COPY scraibe /app/src/scraibe
-# Expose port (if UI is served)
+# Copy custom Web GUI assets (header, footer, templates, logos, config)
+COPY misc /app/src/misc
+# Expose ports
 EXPOSE 7860
-# Run the application
+# Run the Web GUI and Celery worker (with Redis) by default
-ENTRYPOINT ["python3", "-m", "scraibe.cli"]
+CMD ["/bin/bash", "-c", "redis-server --daemonize yes && celery -A scraibe.celery_app worker -Q transcription -l info & python3 -m scraibe"]
+369 -172
@@ -1,173 +1,370 @@
-# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment` 🎙️🧠
+# ScrAIbe LocalAI-Backed Transcription and Summarization
ScrAIbe is a transcription and summarization service that:
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
- Optionally uses a second LLM to generate a structured summary.
- Provides:
- A web GUI for uploading audio and receiving transcripts via email.
- A CLI and Python API for direct integration.
- An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
- A watch-folder mode for automatic transcription, summarization, and email delivery.
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
For more information: https://apstrom.ca
## Features
- Transcription with speaker diarization via LocalAI:
- Uses the /v1/audio/diarization endpoint.
- Compatible with vibevoice.cpp and other diarization-capable backends.
- Optional AI-powered summarization:
- Task: transcript_and_summarize
- Highlights:
- Main topics and discussion points
- Key decisions and outcomes
- Action items and responsibilities
- Open issues and risks
- Improved, configurable summary prompts (via environment or file).
- Async web GUI (always enabled):
- Upload audio via browser.
- Jobs are queued and processed in the background (Celery + Redis).
- Emails:
- Immediate confirmation with queue position.
- Final transcript (MD + DOCX + JSON) when ready.
- Summary as MD + DOCX (if requested).
- Error notification if processing fails.
- MCP-style HTTP API (optional):
- Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
- Allows:
- Audio upload for transcription.
- Job status checks.
- Retrieval of transcript JSON (no summary).
- Enabled via MCP_SERVER_ENABLED=true.
- Watch-folder mode (optional):
- Monitors a directory for audio files.
- For each file:
- Transcribes and summarizes.
- Emails transcript + summary + JSON to a configured address.
- Deletes the source file after successful processing (configurable).
- Enabled via WATCH_ENABLED=true.
- File formats:
- Transcript:
- .md
- .docx (line-numbered, 30 lines per page, optional cover page)
- Summary (if requested):
- .md
- .docx (markdown-aware WYSIWYG styling, optional cover page)
- Full structured output: .json
- Customizable branding:
- Web GUI title, logo, and accent color via environment variables.
- Email logo, accent color, and subject lines via environment variables.
- Optional cover pages for transcript and summary DOCX.
- CLI and Python API:
- Simple command-line interface.
- Drop-in Scraibe class for integration into other tools.
- Docker-ready:
- Lightweight container, configured via environment variables.
## Architecture
- LocalAI (vibevoice.cpp):
- Handles audio → transcript + speaker segments.
- Summarizer LLM (OpenAI-compatible chat endpoint):
- Handles transcript → structured summary.
- ScrAIbe:
- Orchestrates:
- File upload to LocalAI
- Transcript assembly
- Chunked summarization
- Output formatting (e.g., .md with transcript + summary)
- Runs:
- Web GUI (Gradio) always enabled
- MCP-style HTTP API (FastAPI) optional
- Watch-folder mode optional
- Celery worker (async processing)
- Redis (in-container by default)
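The "chunked summarization" step listed above can be pictured as a map-then-combine pass: summarize each chunk, then summarize the partial summaries. The sketch below is illustrative only. `split_into_chunks`, `summarize_transcript`, the `max_chars` limit, and the `summarize` callable are hypothetical names rather than ScrAIbe's actual implementation; in a real deployment `summarize` would presumably wrap a call to the OpenAI-compatible chat endpoint.

```python
from typing import Callable, List


def split_into_chunks(lines: List[str], max_chars: int = 4000) -> List[str]:
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        # +1 accounts for the newline that joins lines inside a chunk.
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_transcript(
    lines: List[str],
    summarize: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Summarize each chunk, then combine the partial summaries."""
    partials = [summarize(chunk) for chunk in split_into_chunks(lines, max_chars)]
    if len(partials) == 1:
        # A short transcript needs no second (combining) pass.
        return partials[0]
    return summarize("\n\n".join(partials))
```

Passing the summarizer in as a callable keeps the chunking logic testable without a running LLM.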
## Quick Start (Web GUI in Docker)
Run the container with your LocalAI and summarizer endpoints:
- docker run -d \
-p 7860:7860 \
-e LOCALAI_API_URL=http://localai:8080 \
-e SUMMARIZER_API_URL=http://llm:8080 \
-e EMAIL_SMTP_HOST=smtp.your-domain.com \
-e EMAIL_SMTP_PORT=587 \
-e EMAIL_SMTP_USER=transcribe@your-domain.com \
-e EMAIL_SMTP_PASSWORD=your_password \
-e EMAIL_FROM_ADDRESS="ScrAIbe <transcribe@your-domain.com>" \
-e EMAIL_CONTACT_ADDRESS=support@your-domain.com \
-e WEBUI_TITLE="Your Transcription Service" \
-e WEBUI_LOGO_URL="https://your-domain.com/logo.png" \
-e EMAIL_LOGO_URL="https://your-domain.com/logo.png" \
-e EMAIL_ACCENT_COLOR="#7C6DA0" \
scraibe:latest
Then open: http://<host>:7860
## Quick Start (CLI)
Basic usage:
- Transcribe:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt
- Transcribe and summarize:
- python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize
Environment variables must be set to point to your LocalAI and summarizer LLM.
## Python API
Example: transcribe only
- from scraibe import Scraibe
- client = Scraibe()
- text = client.transcribe("audio.wav")
- print(text)
Example: transcribe and summarize
- from scraibe import Scraibe
- client = Scraibe()
- result = client.transcript_and_summarize("audio.wav")
- transcript = result["transcript"]
- summary = result["summary"]
You can override endpoints and models via environment variables or constructor parameters if needed.
## Command-Line Options
Run:
- python3 -m scraibe.cli -h
Key options:
- -f / --audio-files:
- One or more audio files to process.
- --task:
- transcribe (default)
- transcript_and_summarize
- -o / --output-directory:
- Output folder for generated files.
- -of / --output-format:
- txt, json, md, html
- For transcript_and_summarize, output is always saved as .md with:
- # Transcript
- # Summary
Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.
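To make the option table concrete, here is a minimal `argparse` sketch of the four documented flags. It is a reconstruction for illustration, not the real `scraibe.cli` parser: the defaults for `-o` and `-of`, the `required=True` on `-f`, and the helper name `effective_format` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(prog="scraibe")
    parser.add_argument("-f", "--audio-files", nargs="+", required=True,
                        help="One or more audio files to process.")
    parser.add_argument("--task", default="transcribe",
                        choices=["transcribe", "transcript_and_summarize"])
    # Default output directory is an assumption.
    parser.add_argument("-o", "--output-directory", default=".")
    # Default output format is an assumption.
    parser.add_argument("-of", "--output-format", default="txt",
                        choices=["txt", "json", "md", "html"])
    return parser


def effective_format(task: str, output_format: str) -> str:
    """transcript_and_summarize always writes Markdown, per the note above."""
    return "md" if task == "transcript_and_summarize" else output_format


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(effective_format(args.task, args.output_format))
```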
## Docker Usage
ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.
### Basic run (transcribe via CLI)
- docker run -it \
-e LOCALAI_API_URL=http://localai:8080 \
-v /path/to/audio:/audio \
scraibe:latest \
-f /audio/meeting.wav -o /audio/output -of txt
### Basic run (transcribe + summarize via CLI)
- docker run -it \
-e LOCALAI_API_URL=http://localai:8080 \
-e SUMMARIZER_API_URL=http://llm:8080 \
-v /path/to/audio:/audio \
scraibe:latest \
-f /audio/meeting.wav -o /audio/output --task transcript_and_summarize
### Docker Environment Variables
The following environment variables configure ScrAIbe in Docker.
Transcription / Diarization (LocalAI):
- LOCALAI_API_URL:
- Required.
- Base URL of the LocalAI server.
- Example: http://localai:8080
- LOCALAI_API_KEY:
- Optional.
- API key for LocalAI, if configured.
- LOCALAI_MODEL:
- Optional (default: vibevoice-diarize).
- Model name used for transcription/diarization.
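The three LocalAI variables above could be read along these lines. This is a sketch under stated assumptions: `LocalAIConfig` and `load_localai_config` are hypothetical names, and only the variable names and the `vibevoice-diarize` default come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalAIConfig:
    api_url: str
    api_key: Optional[str]
    model: str


def load_localai_config(env: Mapping[str, str] = os.environ) -> LocalAIConfig:
    """Build the LocalAI settings from environment-style variables."""
    url = env.get("LOCALAI_API_URL", "").strip()
    if not url:
        # LOCALAI_API_URL is documented as required.
        raise ValueError("LOCALAI_API_URL is required")
    return LocalAIConfig(
        api_url=url.rstrip("/"),
        # An empty or missing key means "no authentication".
        api_key=env.get("LOCALAI_API_KEY") or None,
        model=env.get("LOCALAI_MODEL") or "vibevoice-diarize",
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests.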
Summarization LLM:
Welcome to `ScrAIbe`, a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework designed to generate fully automated transcriptions. - SUMMARIZER_API_URL:
- Required when using --task transcript_and_summarize.
Beyond transcription, ScrAIbe supports advanced functions such as speaker diarization and speaker recognition. 🚀 - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
- Example: http://llm:8080
Designed as a comprehensive AI toolkit, it uses multiple powerful AI models: - SUMMARIZER_API_KEY:
- Optional.
- **[Whisper](https://github.com/openai/whisper)**: A general-purpose speech recognition model. - API key for the summarization LLM, if required.
- **[WhisperX](https://github.com/m-bain/whisperX)**: A faster, quantized version of Whisper for enhanced performance on CPU. ⚡ - SUMMARIZER_MODEL:
- **[Pyannote-Audio](https://github.com/pyannote/pyannote-audio)**: An open-source toolkit for speaker diarization. 🗣️ - Optional (default: llama-3.1-8b-instruct).
- Model name used for summarization.
The framework utilizes a PyanNet-inspired pipeline, with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
Web GUI and branding:
During post-diarization, each audio segment is processed by the OpenAI `Whisper` model in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language segment, facilitating Whisper's role in both transcription and text translation. 🌍✨
- WEBUI_TITLE:
The following graphic illustrates the whole pipeline: - Title shown in the web GUI (default: A.P.Strom Transcription).
- WEBUI_LOGO_URL:
<div style="text-align:center;"> - URL of the logo displayed in the web GUI header.
<img src="./Pictures/pipeline.png#gh-dark-mode-only" style="width: 60%;" /> - Example: https://your-domain.com/logo.png
<img src="./Pictures/pipeline_light.png#gh-light-mode-only" style="width: 60%;" />
</div> Accent color (UI and emails):
## Getting Started 🚀 - EMAIL_ACCENT_COLOR:
- Accent color used in:
### Prerequisites - Web GUI buttons and accents
- Email headings, links, and email addresses
Before installing ScrAIbe, ensure you have the following prerequisites: - Default: #7C6DA0
- **Python**: Version 3.9 or later. MCP-style HTTP API:
- **PyTorch**: Version 2.0 or later.
- **CUDA**: A compatible version with your PyTorch Version if you want to use GPU acceleration. - MCP_SERVER_ENABLED:
- Enable MCP-style HTTP API (default: false).
**Note:** PyTorch should be automatically installed with the pip installer. However, if you encounter any issues, you should consider installing it manually by following the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/). - Values: true/false.
- MCP_SERVER_HOST:
### Install ScrAIbe - Bind address (default: 0.0.0.0).
- MCP_SERVER_PORT:
Install ScrAIbe on your local machine with ease using PyPI. - Port (default: 8000).
- MCP_USE_CELERY:
```bash - Use Celery for async transcription (default: true).
pip install scraibe - If false, transcription runs in-process.
```
Watch-folder mode:
If you want to install the development version, you can do so by installing it from GitHub:
- WATCH_ENABLED:
```bash - Enable watch-folder mode (default: false).
pip install git+https://github.com/JSchmie/ScrAIbe.git@develop - Values: true/false.
``` - WATCH_DIR:
- Directory to monitor for audio files (required if WATCH_ENABLED=true).
or from PyPI using our latest pre-release: - WATCH_EMAIL_TO:
- Email address to send transcript and summary (required if WATCH_ENABLED=true).
```bash - WATCH_POLL_INTERVAL:
pip install --pre scraibe - Seconds between scans (default: 10).
``` - WATCH_DELETE_ON_SUCCESS:
- Delete source file after successful processing (default: true).
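A polling loop driven by these variables might look like the sketch below. It is not the project's `watcher.py`: the function names, the `AUDIO_EXTENSIONS` set, and the synchronous `process` callback are assumptions made for illustration (the commit log indicates the real watcher enqueues Celery tasks instead).

```python
import time
from pathlib import Path
from typing import Callable, List

# Assumed set of extensions; the real watcher may accept others.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def find_audio_files(watch_dir: Path) -> List[Path]:
    """Return audio files directly inside watch_dir, sorted by path."""
    return sorted(
        p for p in watch_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def scan_once(watch_dir: Path, process: Callable[[Path], None],
              delete_on_success: bool = True) -> List[Path]:
    """Process every audio file once; return the files that succeeded."""
    handled: List[Path] = []
    for path in find_audio_files(watch_dir):
        try:
            process(path)
        except Exception:
            # Leave failed files in place so they can be inspected or retried.
            continue
        handled.append(path)
        if delete_on_success:  # mirrors WATCH_DELETE_ON_SUCCESS
            path.unlink()
    return handled


def watch(watch_dir: Path, process: Callable[[Path], None],
          poll_interval: float = 10.0, delete_on_success: bool = True) -> None:
    """Scan forever, sleeping poll_interval seconds (WATCH_POLL_INTERVAL)."""
    while True:
        scan_once(watch_dir, process, delete_on_success)
        time.sleep(poll_interval)
```

Deleting only after `process` returns without raising matches the "after successful processing" wording above.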
Get started with ScrAIbe today and experience seamless, automated transcription and diarization.
Async processing (Celery + Redis):
## Usage
- CELERY_BROKER_URL:
We've developed ScrAIbe with several access points to cater to diverse user needs. - Redis broker URL (default: redis://localhost:6379/0).
- CELERY_RESULT_BACKEND:
### Python Usage - Redis backend URL (default: redis://localhost:6379/0).
- SCRAIBE_UPLOAD_DIR:
Gain full control over the functionalities as well as process customization. - Directory where uploaded audio is stored (default: /tmp/scraibe_uploads).
```python Email configuration:
from scraibe import Scraibe
- EMAIL_SMTP_HOST:
model = Scraibe() - SMTP server host.
- EMAIL_SMTP_PORT:
text = model.autotranscribe("audio.wav") - SMTP server port (e.g., 587).
- EMAIL_SMTP_USER:
print(f"Transcription: \n{text}") - SMTP username.
``` - EMAIL_SMTP_PASSWORD:
- SMTP password.
The `Scraibe` class ensures the models are properly loaded. You can customize the models with various keywords: - EMAIL_SMTP_USE_TLS:
- Use TLS (true/false; default: true).
- **Whisper Models**: Use the `whisper_model` keyword to specify models like `tiny`, `base`, `small`, `medium`, or `large` (`large-v2`, `large-v3`) depending on your accuracy and speed needs. - EMAIL_FROM_ADDRESS:
- **Pyannote Diarization Model**: Use the `dia_model` keyword to change the diarization model. - Sender address (e.g., "ScrAIbe <transcribe@your-domain.com>").
- **WhisperX**: Set the `whisper_type` to `"whisperX"` for enhanced performance on CPU and use their enhanced models. (Model names are the same) - EMAIL_CONTACT_ADDRESS:
- **Keyword Arguments**: A variety of different `kwargs` are available: - Support contact address shown in email templates.
- `use_auth_token`: Pass a Hugging Face token to the Pyannote backend if you want to use one of the models hosted on their Hugging Face. - EMAIL_LOGO_URL:
- `verbose`: Enable this to add an additional level of verbosity. - URL of the logo used in emails (preferred).
- EMAIL_LOGO_PATH:
In general, you should be able to input any `kwargs` that you can input in the original Whisper (WhisperX) and Pyannote Python APIs. - Fallback local path for email logo (default: /app/src/misc/logo1.png).
- EMAIL_CSS_PATH:
As input, `autotranscribe` accepts every format compatible with [FFmpeg](https://ffmpeg.org/ffmpeg-formats.html). Examples include `.mp4`, `.mp3`, `.wav`, `.ogg`, `.flac`, and many more. - Path to the CSS used in emails (default: /app/src/misc/mail_style.css).
To further control the pipeline of `ScrAIbe`, you can pass almost any keyword argument that is accepted by `Whisper` or `Pyannote`. For more options, refer to the documentation of these frameworks, as their keywords are likely to work here as well. Email subject lines (customizable):
Here are some examples regarding `diarization` (which relies on the `pyannote` pipeline): - EMAIL_SUBJECT_UPLOAD:
- Subject for upload confirmation email.
- `num_speakers`: Number of speakers in the audio file - Default: "ScrAIbe: Your transcription request has been received"
- `min_speakers`: Minimum number of speakers in the audio file - EMAIL_SUBJECT_SUCCESS:
- `max_speakers`: Maximum number of speakers in the audio file - Subject for transcript-ready email.
- Default: "ScrAIbe: Your transcript is ready"
Then there are arguments for the transcription process, which uses the "Whisper" model: - EMAIL_SUBJECT_ERROR:
- Subject for error notification email.
- `language`: Specify the language ([list of supported languages](https://github.com/openai/whisper/blob/main/language-breakdown.svg)) - Default: "ScrAIbe: Error with your transcription request"
- `task`: Can be either `transcribe` or `translate`. If `translate` is selected, the transcribed audio will be translated to English.
Summary prompt customization:
For example:
- SUMMARY_PROMPT_CHUNK:
```python - Override prompt used for each transcript chunk.
text = model.autotranscribe("audio.wav", language="german", num_speakers = 2) - SUMMARY_PROMPT_COMBINED:
``` - Override prompt used for the final combined summary.
- SUMMARY_PROMPT_FILE:
`Scraibe` also contains the option to just do a transcription: - Path to a file with prompts in sections:
- [chunk]
```python - [combined]
transcription = model.transcribe("audio.wav")
``` DOCX and cover pages:
or just do a diarization: - COVER_PAGE_ENABLED:
- Add a cover page to transcript and summary DOCX files (default: false).
```python - COVER_PAGE_ORGANIZATION:
diarization = model.diarization("audio.wav") - Organization name shown on the cover page.
``` - COVER_PAGE_TITLE_PREFIX:
- Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
Start exploring the powerful features of ScrAIbe and customize it to fit your specific transcription and diarization needs! - COVER_PAGE_LOGO_URL:
- Logo URL to include on the cover page.
### Command-line usage - COVER_PAGE_LOGO_PATH:
- Local logo path to include on the cover page.
Besides the Python interface, you can also run ScrAIbe using the command-line interface:
Output files (async web GUI and watch-folder mode):
```bash
scraibe -f "audio.wav" --language "german" --num_speakers 2 When a job completes, the user receives:
```
- Transcript:
For the full list of options, run: - .md file
- .docx file (line-numbered, 30 lines per page, optional cover page)
```bash - Summary (if requested):
scraibe -h - .md file
``` - .docx file (markdown-aware styling, optional cover page)
- JSON:
This will display a comprehensive list of all command-line options, allowing you to tailor ScrAIbe's functionality to your specific needs. - Structured transcript with diarization and metadata
## Gradio App 🌐 All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).
The Gradio App is now part of ScrAIbe-WebUI! This user-friendly interface enables you to run the model without any coding knowledge. You can easily run the app in your browser and upload your audio files, or make the framework available on your network and run it on your local machine. 🚀 ## Dependencies
All functionalities previously available in the Gradio App are now part of the ScrAIbe-WebUI. For more information and detailed instructions, visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI). Core runtime dependencies:
## Docker Container 🐳 - Python 3.9+
- httpx
ScrAIbe's Docker containers have also moved to ScrAIbe-WebUI! This option is especially useful if you want to run the model on a server or if you would like to use the GPU without dealing with CUDA. - numpy
- tqdm
All Docker container functionalities are now part of ScrAIbe-WebUI. For more information and detailed instructions on how to use the Docker containers, please visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI). - gradio
- celery[redis]
--- - redis
- python-docx
With these changes, ScrAIbe focuses on its core functionalities while the enhanced Gradio App and related Docker containers are now part of ScrAIbe-WebUI. Enjoy a more streamlined and powerful transcription experience! 🎉 - fastapi
- uvicorn
## Documentation 📚 - ffmpeg (for audio preprocessing)
For comprehensive guides, detailed instructions, and advanced usage tips, visit our [documentation page](https://jschmie.github.io/ScrAIbe/). Here, you will find everything you need to make the most out of ScrAIbe. No local Whisper, PyTorch, or Pyannote models are required.
### Contributions 🤝 ## Contributing
We warmly welcome contributions from the community! Whether you're fixing bugs, adding new features, or improving documentation, your help is invaluable. Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information on how to get involved and make your mark on ScrAIbe-WebUI. Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.
## License
### License 📜
This project is licensed under GPL-3.0. See LICENSE for details.
ScrAIbe-WebUI is proudly open source and licensed under the GPL-3.0 license. This promotes a collaborative and transparent development process. For more details, see the [LICENSE](./LICENSE) file in this repository.
## Acknowledgments
Special thanks go to the [KIDA](https://www.kida-bmel.de/) project and the [BMEL (Bundesministerium für Ernährung und Landwirtschaft)](https://www.bmel.de/EN/Home/home_node.html), especially to the AI Consultancy Team.
---
Join us in making ScrAIbe even better! 🚀
+101
View File
@@ -0,0 +1,101 @@
## Custom configuration for A.P.Strom Transcription (LocalAI-backed)
## Lines that start with ## are comment lines.
interface_type: async # async or simple (async queues transcriptions in the background and requires email setup)
launch:
## Gradio launch options (if using WebUI)
# server_port: null
# server_name: "A.P.Strom Transcription"
# inline: false
# inbrowser: false
# share: false
# debug: false
max_threads: 18
# quiet: false
# auth: null # tuple of username and password
# auth_message: null
# prevent_thread_lock: false
# show_error: false
# height: 500
# width: 100%
favicon_path: /app/src/misc/logo.png
# ssl_keyfile: null
# ssl_certfile: null
# ssl_keyfile_password: null
# ssl_verify: false
# show_api: false
# allowed_paths:
# blocked_paths: null
# root_path: null
# app_kwargs: null
# state_session_capacity: 10000
# share_server_address: null
# share_server_protocol: null
# max_file_size: null
# enable_monitoring: null
queue:
## Queue configuration
# status_update_rate: 'auto'
# api_open: null
max_size: 10
# default_concurrency_limit:
layout:
show_settings: false
header: /app/src/misc/header.html
header_format_options:
header_logo_url: https://apstrom.ca/
header_logo_src: /app/src/misc/logo.png
footer: /app/src/misc/footer.html
footer_format_options:
# footer_css_path: /app/src/misc/footer_style.css
footer_scraibe_webui_version: "0.1.1-dev"
scraibe_params:
## LocalAI (transcription + diarization)
localai_api_url: http://localhost:8080
localai_api_key: ""
localai_model: vibevoice-cpp-asr
## Summarizer LLM (for transcript_and_summarize)
summarizer_api_url: http://localhost:8080
summarizer_api_key: ""
summarizer_model: qwen3-14b
## Legacy Whisper/Pyannote fields (ignored by LocalAI client; kept for compatibility)
whisper_model: large-v3
whisper_type: whisper
dia_model: null
use_auth_token: null
device: cpu
num_threads: 18
mail:
sender_email: scribe@apstrom.ca
smtp_server: mail.apstrom.ca
smtp_port: 587
sender_password: ""
connection_type: TLS # 'SSL', 'TLS', or 'PLAIN'
context: default
default_subject: "A.P.Strom audio transcription"
error_template: /app/src/misc/error_notification_template.html
error_subject: "error"
error_format_options:
## exception is mandatory for your html; will be set to the related exception in the Code
contact_email: support@apstrom.ca
success_template: /app/src/misc/success_template.html
success_subject: "ready"
success_format_options:
contact_email: info@apstrom.ca
upload_notification_template: /app/src/misc/upload_notification_template.html
upload_subject: "upload successful"
upload_notification_format_options:
queue_position: null
contact_email: info@apstrom.ca
# mail_css_path: /app/src/misc/mail_style.css
advanced:
keep_model_alive: false # for sync interface only; keeps the model alive during a session
concurrent_workers_async: 2 # number of concurrent working threads in the async interface
+31
View File
@@ -0,0 +1,31 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Error Notification</title>
<style>
{email_css}
</style>
</head>
<body>
<div class="container">
<h1 style="color:{accent_color};">Error Notification</h1>
<p>Dear user,</p>
<p>An error occurred while processing your audio file. This means that something went wrong during the processing of your file, and it could not be completed successfully.</p>
<p class="error-message">Error details: {exception}</p>
<p>Please check the file and try again. If the problem persists, our support team is here to help.</p>
<div class="contact">
<p>You can contact our support team at <a href="mailto:{contact_email}" style="color:{accent_color};">{contact_email}</a>. They are available to assist with any questions or issues you may have.</p>
</div>
<div class="disclaimer">
<p>Please note that our support team does not have the ability to fix processing errors directly or access the files you have uploaded. They can provide guidance and help troubleshoot any issues you may encounter.</p>
</div>
<div class="signature">
<p>Thank you for using our transcription service!</p>
<p>A.P.Strom</p>
</div>
{email_logo}
</div>
</body>
</html>
+119
View File
@@ -0,0 +1,119 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Footer</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
<style>
/* Styles from footer_style.css */
/* Resetting margins and paddings */
html, body {{
margin: 0;
padding: 0;
width: 100%;
height: 100%;
}}
body {{
font-family: Arial, sans-serif;
display: flex;
flex-direction: column;
justify-content: space-between; /* Ensures footer stays at the bottom */
}}
.footer {{
display: flex;
justify-content: space-between;
align-items: center;
padding: 20px;
/* Removed background-color to inherit from parent */
font-size: 16px;
color: #333;
width: 100%; /* Ensure footer is full width */
box-sizing: border-box; /* Padding is included in the width */
}}
.footer > div:first-child {{
margin-left: -20px;
padding-left: 0; /* Adjust if necessary */
}}
.footer div, .footer a {{
margin: 0 5px;
}}
.footer div {{
text-align: left;
}}
.footer a {{
color: {accent_color};
transition: color 0.3s ease;
}}
.footer a:hover {{
color: #50AF31;
}}
.foot-text {{
text-align: center;
width: 80%;
margin-bottom: 15px;
font-size: 14px;
color: #333;
}}
.brand-section {{
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
text-align: center;
}}
.brand-icon a {{
display: flex;
align-items: center;
justify-content: center;
width: 50px;
height: 50px;
border-radius: 50%;
background-color: transparent; /* Ensure transparency */
text-decoration: none !important;
color: white;
transition: background-color 0.3s ease, transform 0.3s ease;
}}
.brand-icon i {{
font-size: 24px;
}}
.brand-icon a:hover, .brand-icon a:focus {{
background-color: {accent_color};
transform: scale(1.1);
text-decoration: none;
}}
.brand-icon a, .brand-icon a:hover, .brand-icon a:active, .brand-icon a:visited {{
text-decoration: none;
}}
.build-version {{
margin-top: 8px;
color: white; /* Adjust as needed */
font-size: 12px;
}}
/* Removed dark mode media query to let Gradio handle theming */
</style>
</head>
<body>
<div class="footer">
<div class="foot-text">
<h2 style="font-weight: bold; color: {accent_color};">Disclaimer</h2>
<p>The transcription completed by this application may contain errors.</p>
<p>Users must take care to review transcripts before circulating to ensure that they are error-free and complete.</p>
<p>The transcripts produced by this application do not replace a court reporter's transcription. The transcripts completed by this application are for the user's convenience only.</p>
<h2 style="font-weight: bold; color: {accent_color};">Data retention</h2>
<p>Audio or video files uploaded to this application are only retained for the time that it takes to complete the transcription. All transcripts are deleted after they are transmitted to the user.</p>
</div>
<div class="brand-section">
<div class="brand-icon">
<a href="https://apstrom.ca" aria-label="A.P.Strom">
<i class="fas fa-globe"></i>
</a>
</div>
</div>
<div class="build-version">Build-Version: {footer_scraibe_webui_version}</div>
</div>
</body>
</html>
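The doubled braces in the stylesheet above, next to single-brace placeholders such as `{accent_color}`, suggest the file is rendered with Python's `str.format`, where `{{` and `}}` are escapes for literal braces. The sketch below shows that behavior on a shortened copy of the template. The `render` helper is hypothetical, and whether the project actually uses `str.format` here is an assumption.

```python
# Shortened excerpt of the footer template; not the full file.
TEMPLATE = """<style>
.footer a {{
  color: {accent_color};
}}
</style>
<div class="build-version">Build-Version: {footer_scraibe_webui_version}</div>"""


def render(template: str, **values: str) -> str:
    """Fill named placeholders; '{{' and '}}' collapse to literal braces."""
    return template.format(**values)


html = render(
    TEMPLATE,
    accent_color="#7C6DA0",
    footer_scraibe_webui_version="0.1.1-dev",
)
print(html)
```

With this approach every literal CSS brace in the template must be doubled; a stray single brace would make `str.format` fail.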
+100
View File
@@ -0,0 +1,100 @@
/* footer_style.css */
/* Resetting margins and paddings */
html, body {
margin: 0;
padding: 0;
width: 100%;
height: 100%;
}
body {
font-family: Arial, sans-serif;
display: flex;
flex-direction: column;
justify-content: space-between; /* Ensures footer stays at the bottom */
}
.footer {
display: flex;
justify-content: space-between;
align-items: center;
padding: 20px; /* Adjusted for demonstration */
background-color: #F9FAFB; /* Fixed background color */
font-size: 16px;
color: #333;
width: 100%; /* Ensure footer is full width */
box-sizing: border-box; /* Padding is included in the width */
}
.footer > div:first-child {
margin-left: -20px;
padding-left: 0; /* Reducing or eliminating left padding if it's causing the shift */
}
.footer div, .footer a {
margin: 0 5px;
}
.footer div {
text-align: left;
}
.footer a {
color: #333;
transition: color 0.3s ease;
}
.footer a:hover {
color: #50AF31;
}
.github-section {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
text-align: center;
}
.github-icon a {
display: flex;
align-items: center;
justify-content: center;
width: 50px;
height: 50px;
border-radius: 50%;
background-color: transparent; /* "none" is not a valid background-color value */
text-decoration: none !important; /* Removes underline */
color: white;
transition: background-color 0.3s ease, transform 0.3s ease;
}
.github-icon i {
font-size: 24px;
}
.github-icon a:hover, .github-icon a:focus {
background-color: #50AF31;
transform: scale(1.1);
text-decoration: none; /* Removes underline */
}
.github-icon a, .github-icon a:hover, .github-icon a:active, .github-icon a:visited {
text-decoration: none;
}
.build-version {
margin-top: 8px; /* Adjust spacing between the icon and the text as needed */
color: white; /* Adjust text color as needed */
font-size: 12px; /* Adjust font size as needed */
}
/* Dark mode styles */
@media (prefers-color-scheme: dark) {
body {
background-color: #121212;
color: #FFFFFF;
}
.footer {
background-color: transparent; /* Make footer background transparent */
color: #FFFFFF;
}
.footer a {
color: #FFFFFF;
}
.footer a:hover {
color: #50AF31;
}
.build-version {
color: #CCCCCC; /* Adjust text color for better contrast in dark mode */
}
}
+120
View File
@@ -0,0 +1,120 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{webui_title}</title>
<!-- Importing Cormorant Garamond font from Google Fonts -->
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:wght@400;700&display=swap" rel="stylesheet">
<style>
.header-wrapper {{
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
padding: 20px 20px 0;
box-sizing: border-box;
}}
.logo-container {{
display: flex;
align-items: center;
justify-content: center;
margin-bottom: 10px;
}}
.logo {{
width: 75px;
height: auto;
display: block;
}}
.header-title {{
font-family: 'Cormorant Garamond', serif;
font-size: 45px;
font-weight: bold;
color: {accent_color};
margin: 0;
position: relative;
padding: 0.4em 0;
text-align: center;
max-width: 90%;
}}
.header-title::before,
.header-title::after {{
content: "";
position: absolute;
height: 2px;
width: 80%;
background-color: {accent_color};
left: 10%;
}}
.header-title::before {{
top: 0.4em;
}}
.header-title::after {{
bottom: 0.4em;
}}
.header-description {{
text-align: center;
padding: 10px 40px 20px;
max-width: 800px;
margin: 0 auto;
}}
.header-description p,
.header-description h2 {{
font-size: 15px;
margin: 8px 0;
line-height: 1.5;
}}
.header-description h2 {{
font-weight: bold;
color: {accent_color};
}}
@media (max-width: 768px) {{
.header-title {{
font-size: 31px;
}}
.header-title::before,
.header-title::after {{
width: 80%;
left: 10%;
}}
.logo {{
width: 50px;
}}
.header-description {{
padding: 10px 20px 15px;
}}
}}
</style>
</head>
<body>
<div class="header-wrapper">
<div class="logo-container">
<a href="{header_logo_url}">
<img src="{header_logo_src}" alt="{webui_title}" class="logo">
</a>
</div>
<h1 class="header-title">{webui_title}</h1>
</div>
<div class="header-description">
<p>
Upload, record, or provide a video with audio for transcription. Our toolkit is designed to transcribe content from multiple languages accurately. The integrated speaker diarisation feature identifies different speakers, ensuring a smooth transcription experience. For optimal results, indicate the number of speakers and the original language of the content.
</p>
<h2>Start your transcription below.</h2>
</div>
</body>
</html>
+58
View File
@@ -0,0 +1,58 @@
/* header_style.css */
/* Importing Cormorant Garamond font from Google Fonts */
@import url('https://fonts.googleapis.com/css2?family=Cormorant+Garamond:wght@400;700&display=swap');
.header-container {
display: flex;
align-items: center;
justify-content: center;
position: relative;
padding-top: 30px;
}
.logo-container {
position: absolute;
top: 50%;
right: 20px;
transform: translateY(-50%);
width: 300px;
}
.logo {
width: 100%;
height: auto;
}
h1 {
font-family: 'Cormorant Garamond', serif;
font-size: 50px !important; /* Increased font size */
font-weight: bold;
color: #50AF31;
margin: 0;
position: relative;
padding: 0.5em 0;
}
h1::before, h1::after {
content: "";
position: absolute;
height: 2px;
width: 80%;
background-color: #50AF31;
left: 10%;
}
h1::before {
top: 0.5em;
}
h1::after {
bottom: 0.5em;
}
p, h2 {
font-size: 16px;
margin: 10px 0;
line-height: 1.4;
}
BIN
View File
Binary file not shown. Size: 152 KiB
BIN
View File
Binary file not shown. Size: 111 KiB
BIN
View File
Binary file not shown. Size: 111 KiB
+61
View File
@@ -0,0 +1,61 @@
body {
font-family: Arial, sans-serif;
line-height: 1.5;
background-color: #ffffff;
color: #333;
margin: 0;
padding: 20px;
}
.container {
width: 100%;
max-width: 600px;
margin: 0 auto;
padding: 20px;
border: 1px solid #ddd;
border-radius: 5px;
}
h1, h2, h3 {
font-size: 1.5em;
margin-top: 0;
color: {accent_color};
}
p {
margin: 10px 0;
font-size: 1em;
}
.error-message, .success-message {
padding: 10px 0;
margin-bottom: 15px;
font-size: 1em;
}
.error-message {
color: #721c24;
}
.success-message {
color: #155724;
}
.contact {
margin-top: 15px;
font-size: 0.9em;
color: #555;
}
.contact a {
color: {accent_color};
text-decoration: none;
}
.contact a:hover {
text-decoration: underline;
}
a {
color: {accent_color};
}
.disclaimer {
margin-top: 20px;
font-size: 0.8em;
color: #777;
}
.signature {
margin-top: 20px;
font-size: 0.8em;
color: #555;
}
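Unlike the footer template, this stylesheet keeps single braces around its CSS rules, so passing it through `str.format` directly would raise an error. One way it could be rendered, sketched here as an assumption rather than the project's actual code, is to substitute the `{accent_color}` token with `str.replace` and then inject the result into an email template through its `{email_css}` placeholder. `MAIL_CSS`, `EMAIL_TEMPLATE`, and `render_email` are shortened, hypothetical stand-ins.

```python
# Shortened stand-ins for mail_style.css and an email template.
MAIL_CSS = """body {
  color: #333;
}
a {
  color: {accent_color};
}"""

EMAIL_TEMPLATE = """<style>
{email_css}
</style>
<h1 style="color:{accent_color};">Transcript Ready</h1>
<a href="mailto:{contact_email}">{contact_email}</a>"""


def render_email(css: str, template: str,
                 accent_color: str, contact_email: str) -> str:
    # Plain token replacement leaves the CSS rule braces untouched.
    styled_css = css.replace("{accent_color}", accent_color)
    # Values passed to str.format are inserted verbatim and not re-parsed,
    # so the braces inside styled_css are safe here.
    return template.format(
        email_css=styled_css,
        accent_color=accent_color,
        contact_email=contact_email,
    )
```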
+30
View File
@@ -0,0 +1,30 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Transcript Ready</title>
<style>
{email_css}
</style>
</head>
<body>
<div class="container">
<h1 style="color:{accent_color};">Transcript Ready</h1>
<p>Dear user,</p>
<p>Your file has been successfully processed, and the transcript is now ready. The transcript of your audio or video file is attached to this email.</p>
<p>We hope you find the transcript useful. If you have any questions or need further assistance, please do not hesitate to contact our support team.</p>
<div class="contact">
<p>You can reach our support team at <a href="mailto:{contact_email}" style="color:{accent_color};">{contact_email}</a>. They are available to help with any questions or issues you may have.</p>
</div>
<div class="disclaimer">
<p>Please note that our support team cannot modify the content of the transcript. They can assist with any other questions or concerns you may have.</p>
</div>
<div class="signature">
<p>Thank you for using our transcription service!</p>
<p>A.P.Strom</p>
</div>
{email_logo}
</div>
</body>
</html>
+30
View File
@@ -0,0 +1,30 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Upload Successful</title>
<style>
{email_css}
</style>
</head>
<body>
<div class="container">
<h1 style="color:{accent_color};">Upload Successful</h1>
<p>Dear user,</p>
<p>Your file has been uploaded successfully and is now in our processing queue. We will process it as soon as possible.</p>
<p>We will notify you once processing is complete. If you have urgent needs or further questions, feel free to reach out to our support team.</p>
<div class="contact">
<p>You can contact our support team at <a href="mailto:{contact_email}" style="color:{accent_color};">{contact_email}</a>.</p>
</div>
<div class="disclaimer">
<p>Please note that our support team cannot change your position in the queue or access the files you have uploaded. They can answer any questions you have about the process.</p>
</div>
<div class="signature">
<p>Thank you for using our transcription service!</p>
<p>A.P.Strom</p>
</div>
{email_logo}
</div>
</body>
</html>
+3
@@ -72,6 +72,9 @@ scraibe = "scraibe.cli:cli"
 [tool.poetry.extras]
 app = ["scraibe-webui"]
+[tool.ruff]
+line-length = 58
 [tool.ruff.lint.extend-per-file-ignores]
 "__init__.py" = ["E402", "F403", "F401"]
 "scraibe/misc.py" = ["E722"]
+5
@@ -1,3 +1,8 @@
 tqdm>=4.66.5
 numpy>=1.26.4
 httpx>=0.28.0
+gradio>=5.0.0
+PyYAML>=6.0
+celery[redis]>=5.3.0
+redis>=5.0.0
+python-docx>=1.1.0
-2
@@ -5,6 +5,4 @@ from .audio import AudioProcessor
 from .transcript_exporter import Transcript
 from .misc import set_threads, ParseKwargs
-from .cli import cli
 from ._version import __version__
+59
@@ -0,0 +1,59 @@
"""
Entrypoint for running ScrAIbe as a module:
    python -m scraibe

Always launches the Web GUI (Gradio).
Optionally launches:
- MCP-style API server
- Watch-folder mode
"""
import os
import threading
import logging

from .webui import create_app

logger = logging.getLogger("scraibe.__main__")


def _run_mcp_server():
    """
    Run MCP server in a separate thread.
    """
    # Imports and startup run inside the thread, so failures have to be
    # caught here; a try/except around Thread.start() does not see them.
    try:
        import uvicorn
        from . import mcp_server

        host = os.getenv("MCP_SERVER_HOST", "0.0.0.0")
        port = int(os.getenv("MCP_SERVER_PORT", "8000"))
        uvicorn.run(
            mcp_server.app,
            host=host,
            port=port,
            log_level="info",
        )
    except Exception as e:
        logger.warning("MCP server failed (WebUI will continue): %s", e)


if __name__ == "__main__":
    # Optionally start MCP server in background (non-blocking)
    mcp_enabled = os.getenv("MCP_SERVER_ENABLED", "false").strip().lower() in ("true", "1", "yes")
    if mcp_enabled:
        try:
            t = threading.Thread(target=_run_mcp_server, daemon=True)
            t.start()
            logger.info("MCP server thread started in background.")
        except Exception as e:
            logger.warning("Failed to start MCP server (WebUI will continue): %s", e)

    # Optionally start watch-folder mode (non-blocking)
    try:
        from .watcher import start_watcher

        start_watcher()
        logger.info("Watch-folder mode started.")
    except Exception as e:
        logger.warning("Failed to start watch-folder mode (WebUI will continue): %s", e)

    # Always start WebUI (Gradio)
    create_app()
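Review note: the startup pattern in this entrypoint (truthy environment flag, optional service in a daemon thread, WebUI never blocked) can be sketched in isolation as below. This is a minimal sketch, not the project's code; `env_flag` and `start_background` are hypothetical helper names introduced only for illustration.

```python
import logging
import threading

logger = logging.getLogger("example.entrypoint")

# Same truthy strings the entrypoint accepts for MCP_SERVER_ENABLED.
TRUTHY = ("true", "1", "yes")


def env_flag(environ, name, default="false"):
    """Return True when environ[name] is one of the accepted truthy strings."""
    return environ.get(name, default).strip().lower() in TRUTHY


def start_background(name, target):
    """Run target in a daemon thread and log (not raise) anything it throws."""

    def runner():
        try:
            target()
        except Exception as exc:  # the main thread (the WebUI) must keep running
            logger.warning("%s failed (WebUI will continue): %s", name, exc)

    thread = threading.Thread(target=runner, name=name, daemon=True)
    thread.start()
    return thread
```

Catching the exception inside the thread body matters because an error raised by the target never propagates to the code that called `Thread.start()`.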
+114
@@ -7,13 +7,21 @@ Simplified audio processor for ScrAIbe.
 Previously this used torch and pyannote-style processing. In the LocalAI-backed
 version, we primarily pass files to the API, but we keep a lightweight helper
 for backward compatibility.
+Now also includes utilities for chunking long audio into smaller segments
+to avoid GPU memory limits when using vibevoice-cpp on LocalAI.
 """
+import json
+import os
+import tempfile
 from subprocess import CalledProcessError, run
 import numpy as np
 SAMPLE_RATE = 16000
 NORMALIZATION_FACTOR = 32768.0
+DEFAULT_CHUNK_DURATION = 180.0  # seconds
+DEFAULT_CHUNK_OVERLAP = 2.0  # seconds
 class AudioProcessor:
@@ -106,3 +114,109 @@ class AudioProcessor:
     def __repr__(self) -> str:
         return f"AudioProcessor(waveform_len={len(self.waveform)}, sr={self.sr})"
+
+
+def get_audio_duration(file_path: str) -> float:
+    """
+    Get the duration of an audio file in seconds using ffprobe.
+
+    Args:
+        file_path: Path to the audio file.
+
+    Returns:
+        Duration in seconds as a float.
+
+    Raises:
+        RuntimeError: If ffprobe fails.
+    """
+    cmd = [
+        "ffprobe",
+        "-v", "error",
+        "-show_entries", "format=duration",
+        "-of", "json",
+        file_path,
+    ]
+    try:
+        result = run(cmd, capture_output=True, text=True, check=True)
+        data = json.loads(result.stdout)
+        return float(data["format"]["duration"])
+    # OSError covers a missing ffprobe binary; ValueError covers both
+    # invalid JSON and a non-numeric duration.
+    except (CalledProcessError, OSError, KeyError, ValueError) as e:
+        raise RuntimeError(f"Failed to get audio duration for {file_path}: {e}") from e
+
+
+def split_audio_into_chunks(
+    input_path: str,
+    max_duration: float = DEFAULT_CHUNK_DURATION,
+    overlap: float = DEFAULT_CHUNK_OVERLAP,
+    output_format: str = "wav",
+    sample_rate: int = 24000,
+) -> list:
+    """
+    Split a long audio file into overlapping chunks using ffmpeg.
+
+    Args:
+        input_path: Path to the input audio file.
+        max_duration: Maximum duration of each chunk in seconds.
+        overlap: Overlap duration in seconds between consecutive chunks.
+        output_format: Output format (e.g., 'wav').
+        sample_rate: Sample rate for output chunks.
+
+    Returns:
+        List of dicts:
+            [{"path": "chunk.wav", "start": 0.0, "end": 180.0}, ...]
+        Files must be cleaned up by the caller.
+    """
+    # An overlap as long as the chunk itself would never advance `start`.
+    if overlap < 0 or overlap >= max_duration:
+        raise ValueError("overlap must be >= 0 and smaller than max_duration")
+
+    duration = get_audio_duration(input_path)
+
+    # If file is shorter than max_duration, no need to split
+    if duration <= max_duration:
+        return [{"path": input_path, "start": 0.0, "end": duration}]
+
+    chunks = []
+    start = 0.0
+    chunk_id = 0
+    while start < duration:
+        chunk_end = min(start + max_duration, duration)
+        chunk_duration = chunk_end - start
+        tmp = tempfile.NamedTemporaryFile(
+            delete=False,
+            suffix=f".{output_format}",
+            prefix="scraibe_chunk_",
+        )
+        chunk_path = tmp.name
+        tmp.close()
+        cmd = [
+            "ffmpeg",
+            "-y",
+            "-nostdin",
+            "-ss", str(start),
+            "-i", input_path,
+            "-t", str(chunk_duration),
+            "-ar", str(sample_rate),
+            "-ac", "1",
+            "-c:a", "pcm_s16le",
+            chunk_path,
+        ]
+        try:
+            run(cmd, capture_output=True, check=True)
+        except CalledProcessError as e:
+            # Clean up on error
+            if os.path.exists(chunk_path):
+                os.remove(chunk_path)
+            raise RuntimeError(
+                f"Failed to create audio chunk {chunk_id} for {input_path}: {e.stderr.decode()}"
+            )
+        chunks.append({
+            "path": chunk_path,
+            "start": start,
+            "end": chunk_end,
+        })
+        chunk_id += 1
+        # Stop once a chunk reaches the end of the file; otherwise the loop
+        # would emit one more chunk that covers only the overlap.
+        if chunk_end >= duration:
+            break
+        start += max_duration - overlap
+    return chunks
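Review note: the windowing arithmetic behind the chunking can be checked without ffmpeg. The sketch below computes only the `(start, end)` pairs; `plan_chunks` is a hypothetical helper that is not part of this change, and its defaults simply mirror `DEFAULT_CHUNK_DURATION` and `DEFAULT_CHUNK_OVERLAP`. It stops as soon as a window reaches the end of the file, so no tail window consisting only of the overlap is produced.

```python
from typing import List, Tuple


def plan_chunks(
    duration: float,
    max_duration: float = 180.0,
    overlap: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) windows of at most max_duration seconds.

    Consecutive windows share `overlap` seconds of audio.
    """
    if max_duration <= 0 or not 0 <= overlap < max_duration:
        raise ValueError("need max_duration > 0 and 0 <= overlap < max_duration")
    if duration <= max_duration:
        return [(0.0, duration)]

    windows = []
    start = 0.0
    while True:
        end = min(start + max_duration, duration)
        windows.append((start, end))
        if end >= duration:  # last window reached the end of the file
            break
        start += max_duration - overlap  # step forward, keeping the overlap
    return windows
```

For a 358-second file with the defaults this yields two windows, 0 to 180 and 178 to 358.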
+98 -17
@@ -16,12 +16,15 @@ but ignored when not relevant.
 """
 import os
-from typing import Union, Optional
+import logging
+from typing import Union, Optional, Dict, Any
 from .localai_client import LocalAIClient, LocalAIError
 from .summarizer import SummarizerClient, SummarizerError
 from .transcript_exporter import Transcript
+logger = logging.getLogger("scraibe.autotranscript")
 class Scraibe:
     """
@@ -68,6 +71,8 @@ class Scraibe:
         """
         self.verbose = verbose or kwargs.get("verbose", False)
+        logger.info("Initializing Scraibe.")
         try:
             self.client = LocalAIClient(
                 api_url=api_url,
@@ -75,6 +80,7 @@ class Scraibe:
                 model=model,
             )
         except LocalAIError as e:
+            logger.error("Failed to initialize LocalAI client: %s", e)
             raise LocalAIError(f"Failed to initialize LocalAI client: {e}")
         # Summarizer is lazy-initialized if needed
@@ -95,6 +101,7 @@ class Scraibe:
         if self._summarizer is not None:
             return self._summarizer
+        logger.info("Initializing SummarizerClient (lazy).")
         try:
             self._summarizer = SummarizerClient(
                 api_url=api_url,
@@ -102,6 +109,7 @@ class Scraibe:
                 model=model,
             )
         except SummarizerError as e:
+            logger.error("Failed to initialize Summarizer client: %s", e)
             raise SummarizerError(f"Failed to initialize Summarizer client: {e}")
         return self._summarizer
@@ -112,21 +120,21 @@ class Scraibe:
     def transcribe(
         self,
-        audio_file: Union[str],
+        audio_file: str,
+        *,
+        for_export: bool = False,
         **kwargs,
-    ) -> str:
+    ) -> Union[str, Dict[str, Any]]:
         """
         Transcribe the provided audio file using LocalAI.
-        Uses /v1/audio/diarization with vibevoice.cpp, then concatenates
-        all segment texts.
-        Args:
-            audio_file (str): Path to the audio file.
-            **kwargs: Additional keyword arguments (some forwarded, others ignored).
+        Uses /v1/audio/diarization with vibevoice.cpp (verbose_json).
         Returns:
-            str: The concatenated transcribed text.
+            - If for_export=False: plain transcript text (str).
+            - If for_export=True: dict with:
+                - transcript: plain text
+                - segments: list[segment] with speaker labels
+                - raw_result: full verbose_json from LocalAI (if present)
         """
         if isinstance(audio_file, str):
             if not os.path.exists(audio_file):
@@ -136,35 +144,78 @@ class Scraibe:
                 "In LocalAI mode, audio_file must be a file path (str)."
             )
-        verbose = kwargs.get("verbose", self.verbose)
+        verbose = kwargs.pop("verbose", self.verbose)
+        logger.info("transcribe called for: %s", audio_file)
         try:
             result = self.client.diarize_and_transcribe(
                 audio_path=audio_file,
                 include_text=True,
                 verbose=verbose,
+                return_raw=True,
                 **kwargs,
             )
         except LocalAIError as e:
+            logger.error("Error during LocalAI transcription: %s", e)
             raise LocalAIError(f"Error during LocalAI transcription: {e}")
+        segments = result.get("segments", [])
+        speakers = result.get("speakers", [])
         transcripts = result.get("transcripts", [])
-        return " ".join(t.strip() for t in transcripts if t.strip())
+        # Build simple transcript text
+        if for_export:
+            # Include speaker-labeled transcript
+            lines = []
+            for seg, speaker, text in zip(segments, speakers, transcripts):
+                start, end = seg
+                ts = self._format_timestamp(start)
+                line = f"[{ts}] {speaker}: {text.strip()}"
+                lines.append(line)
+            full_text = "\n\n".join(lines)
+        else:
+            # Legacy: space-joined text
+            full_text = " ".join(t.strip() for t in transcripts if t.strip())
+        logger.info("transcribe completed, length=%d chars", len(full_text))
+        if for_export:
+            # Return richer structure for JSON export
+            raw_result = result.get("raw_result")
+            return {
+                "transcript": full_text,
+                "segments": [
+                    {
+                        "id": i,
+                        "speaker": sp,
+                        "start": seg[0],
+                        "end": seg[1],
+                        "text": txt,
+                    }
+                    for i, (seg, sp, txt) in enumerate(
+                        zip(segments, speakers, transcripts)
+                    )
+                ],
+                "raw_result": raw_result if raw_result is not None else None,
+            }
+        return full_text
     def transcript_and_summarize(
         self,
-        audio_file: Union[str],
+        audio_file: str,
         *,
         summarizer_api_url: Optional[str] = None,
         summarizer_api_key: Optional[str] = None,
         summarizer_model: Optional[str] = None,
+        for_export: bool = False,
         **kwargs,
     ) -> dict:
         """
         Transcribe the audio file and generate a detailed summary.
         Steps:
-        - Transcribe via LocalAI.
+        - Transcribe via LocalAI (verbose_json).
         - Build a plain-text transcript (with speaker labels).
         - Summarize the transcript using the configured LLM.
@@ -172,6 +223,8 @@ class Scraibe:
         dict with:
         - transcript: full transcript text (with speaker labels)
         - summary: final detailed summary (markdown-ready)
+        - segments: (if for_export) list[segment] with speaker labels
+        - raw_result: (if for_export) full verbose_json from LocalAI
         """
         if isinstance(audio_file, str):
             if not os.path.exists(audio_file):
@@ -181,7 +234,8 @@ class Scraibe:
                 "In LocalAI mode, audio_file must be a file path (str)."
             )
-        verbose = kwargs.get("verbose", self.verbose)
+        verbose = kwargs.pop("verbose", self.verbose)
+        logger.info("transcript_and_summarize called for: %s", audio_file)
         # 1) Get diarized + transcribed result
         try:
@@ -189,9 +243,11 @@ class Scraibe:
                 audio_path=audio_file,
                 include_text=True,
                 verbose=verbose,
+                return_raw=True,
                 **kwargs,
             )
         except LocalAIError as e:
+            logger.error("Error during LocalAI transcription: %s", e)
             raise LocalAIError(f"Error during LocalAI transcription: {e}")
         segments = result.get("segments", [])
@@ -199,6 +255,7 @@ class Scraibe:
         transcripts = result.get("transcripts", [])
         if not segments:
+            logger.warning("No segments returned; returning empty transcript/summary.")
             return {
                 "transcript": "",
                 "summary": "No transcript content to summarize.",
@@ -213,6 +270,7 @@ class Scraibe:
             lines.append(line)
         full_transcript = "\n\n".join(lines)
+        logger.info("Built full transcript, length=%d chars", len(full_transcript))
         # 3) Summarize
         try:
@@ -222,18 +280,41 @@ class Scraibe:
                 model=summarizer_model,
             )
         except SummarizerError as e:
+            logger.error("Failed to initialize summarizer: %s", e)
             raise SummarizerError(f"Failed to initialize summarizer: {e}")
         try:
             summary = summarizer.summarize_transcript(full_transcript)
         except SummarizerError as e:
+            logger.error("Error during summarization: %s", e)
             raise SummarizerError(f"Error during summarization: {e}")
-        return {
+        logger.info("transcript_and_summarize completed.")
+        out = {
             "transcript": full_transcript,
             "summary": summary,
         }
+        if for_export:
+            # Add segments and raw_result for JSON export
+            raw_result = result.get("raw_result")
+            out["segments"] = [
+                {
+                    "id": i,
+                    "speaker": sp,
+                    "start": seg[0],
+                    "end": seg[1],
+                    "text": txt,
+                }
+                for i, (seg, sp, txt) in enumerate(
+                    zip(segments, speakers, transcripts)
+                )
+            ]
+            out["raw_result"] = raw_result if raw_result is not None else None
+        return out
     # -----------------
     # Helpers
     # -----------------
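Review note: the `for_export` structure is built from three parallel lists (`segments`, `speakers`, `transcripts`). A minimal standalone sketch of that shape follows. `build_export` and `format_timestamp` are hypothetical names; in particular, the `HH:MM:SS` layout is an assumption, because the body of `_format_timestamp` is not part of this diff.

```python
from typing import Any, Dict, List, Sequence, Tuple


def format_timestamp(seconds: float) -> str:
    """Assumed HH:MM:SS formatting; the project's helper may differ."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_export(
    segments: Sequence[Tuple[float, float]],
    speakers: Sequence[str],
    transcripts: Sequence[str],
) -> Dict[str, Any]:
    """Combine parallel lists into labeled text plus a per-segment list."""
    # zip() stops at the shortest input, so mismatched lengths drop the tail.
    rows = list(zip(segments, speakers, transcripts))
    lines = [
        f"[{format_timestamp(seg[0])}] {speaker}: {text.strip()}"
        for seg, speaker, text in rows
    ]
    segment_dicts: List[Dict[str, Any]] = [
        {"id": i, "speaker": speaker, "start": seg[0], "end": seg[1], "text": text}
        for i, (seg, speaker, text) in enumerate(rows)
    ]
    return {"transcript": "\n\n".join(lines), "segments": segment_dicts}
```

Because `zip()` truncates silently, a caller that needs strict alignment may want to compare the three lengths before building the export.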
+28
@@ -0,0 +1,28 @@
"""
Celery application for async transcription jobs.
"""
import os

from celery import Celery

broker_url = os.getenv("CELERY_BROKER_URL", "redis://redis:6379/0")
result_backend = os.getenv("CELERY_RESULT_BACKEND", "redis://redis:6379/0")

celery_app = Celery(
    "scraibe",
    broker=broker_url,
    backend=result_backend,
)

celery_app.conf.update(
    task_routes={
        "scraibe.tasks.process_transcription_task": {"queue": "transcription"},
    },
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
)

# autodiscover_tasks() takes package names and looks for a "tasks" module
# inside each one, so the package "scraibe" resolves to scraibe.tasks.
celery_app.autodiscover_tasks(["scraibe"])
+47 -11
@@ -9,9 +9,10 @@ This version is adapted for LocalAI-based transcription and diarization.
 import os
 import json
+import logging
 from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
 from .autotranscript import Scraibe
-from .misc import set_threads
+from .misc import set_threads, setup_logging
 def cli():
@@ -20,6 +21,11 @@ def cli():
     and diarize audio files via a LocalAI server.
     """
+    # Initialize logging (can be overridden via --log-level)
+    setup_logging(level=os.getenv("LOG_LEVEL", "INFO"))
+    logger = logging.getLogger("scraibe.cli")
     def str2bool(string):
         str2val = {"True": True, "False": False}
         if string in str2val:
@@ -181,20 +187,41 @@ def cli():
         help="Number of speakers in the audio.",
     )
+    parser.add_argument(
+        "--log-level",
+        type=str,
+        default=None,
+        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
+        help="Override LOG_LEVEL env var for logging verbosity.",
+    )
     args = parser.parse_args()
+    # Apply log-level override if provided
+    log_level = args.log_level or os.getenv("LOG_LEVEL", "INFO")
+    setup_logging(level=log_level)
+    logger.info("CLI starting with log_level=%s", log_level)
     arg_dict = vars(args)
     # configure output
     out_folder = arg_dict.pop("output_directory")
     os.makedirs(out_folder, exist_ok=True)
+    logger.info("Output directory: %s", out_folder)
     out_format = arg_dict.pop("output_format")
     task = arg_dict.pop("task")
+    logger.info("Task: %s", task)
+    logger.info("Output format: %s", out_format)
     set_threads(arg_dict.pop("num_threads"))
+    # Read shared values once
+    verbose = arg_dict.pop("verbose_output")
+    language = arg_dict.pop("language")
+    num_speakers = arg_dict.pop("num_speakers")
     # Build kwargs for Scraibe (LocalAI-backed)
     class_kwargs = {
         "api_url": arg_dict.pop("localai_api_url"),
@@ -205,38 +232,45 @@ def cli():
         "whisper_type": arg_dict.pop("whisper_type"),
         "dia_model": arg_dict.pop("diarization_directory"),
         "use_auth_token": arg_dict.pop("hf_token"),
-        "verbose": arg_dict.pop("verbose_output"),
+        "verbose": verbose,
     }
+    logger.info("LocalAI API URL: %s", class_kwargs["api_url"] or os.getenv("LOCALAI_API_URL", "<not set>"))
+    logger.info("LocalAI Model: %s", class_kwargs["model"] or os.getenv("LOCALAI_MODEL", "<not set>"))
     model = Scraibe(**class_kwargs)
     if arg_dict["audio_files"]:
         audio_files = arg_dict.pop("audio_files")
+        logger.info("Audio files: %s", audio_files)
         if task == "transcribe":
             for audio in audio_files:
+                logger.info("Starting 'transcribe' for: %s", audio)
                 out = model.transcribe(
                     audio,
-                    language=arg_dict.pop("language"),
-                    verbose=arg_dict.pop("verbose_output"),
-                    num_speakers=arg_dict.pop("num_speakers"),
+                    language=language,
+                    verbose=verbose,
+                    num_speakers=num_speakers,
                 )
                 basename = audio.split("/")[-1].split(".")[0]
                 path = os.path.join(out_folder, f"{basename}.{out_format}")
-                print(f"Saving {basename}.{out_format} to {out_folder}")
+                logger.info("Saving transcript to: %s", path)
                 with open(path, "w", encoding="utf-8") as f:
                     f.write(out)
+                logger.info("Transcript saved: %s", path)
         elif task == "transcript_and_summarize":
             for audio in audio_files:
+                logger.info("Starting 'transcript_and_summarize' for: %s", audio)
                 result = model.transcript_and_summarize(
                     audio,
                     summarizer_api_url=arg_dict.pop("summarizer_api_url"),
                     summarizer_api_key=arg_dict.pop("summarizer_api_key"),
                     summarizer_model=arg_dict.pop("summarizer_model"),
-                    language=arg_dict.pop("language"),
-                    verbose=arg_dict.pop("verbose_output"),
-                    num_speakers=arg_dict.pop("num_speakers"),
+                    language=language,
+                    verbose=verbose,
+                    num_speakers=num_speakers,
                 )
                 transcript_text = result.get("transcript", "")
@@ -246,7 +280,7 @@ def cli():
                 # Always use .md for transcript_and_summarize
                 md_path = os.path.join(out_folder, f"{basename}.md")
-                print(f"Saving {basename}.md (transcript + summary) to {out_folder}")
+                logger.info("Saving transcript + summary to: %s", md_path)
                 with open(md_path, "w", encoding="utf-8") as f:
                     f.write("# Transcript\n\n")
@@ -254,5 +288,7 @@ def cli():
                     f.write("\n\n# Summary\n\n")
                     f.write(summary_text)
+                logger.info("Transcript + summary saved: %s", md_path)
 if __name__ == "__main__":
     cli()
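Review note: this change hoists `language`, `verbose` and `num_speakers` out of the per-file loops, but the unchanged context lines still call `arg_dict.pop("summarizer_api_url")` (and the other two summarizer keys) inside the `transcript_and_summarize` loop, so a second audio file would raise `KeyError`. A possible follow-up is sketched below; `summarizer_options` and `output_basename` are hypothetical helper names, not functions in the repository.

```python
import os
from typing import Any, Dict


def summarizer_options(arg_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Pop the summarizer settings once, before iterating over audio files."""
    return {
        "summarizer_api_url": arg_dict.pop("summarizer_api_url", None),
        "summarizer_api_key": arg_dict.pop("summarizer_api_key", None),
        "summarizer_model": arg_dict.pop("summarizer_model", None),
    }


def output_basename(audio_path: str) -> str:
    """File name without directory or final extension.

    Unlike audio.split("/")[-1].split(".")[0], this keeps dots inside the
    name and uses the platform's path separator.
    """
    return os.path.splitext(os.path.basename(audio_path))[0]
```

The options dict can then be passed as `**options` on every iteration without mutating `arg_dict` again.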
+118
@@ -0,0 +1,118 @@
"""
Reusable cover-page generator for transcript and summary DOCX files.
Configuration (env):
- COVER_PAGE_ENABLED: "true"/"false" (default: false)
- COVER_PAGE_ORGANIZATION: e.g., "A.P.Strom"
- COVER_PAGE_TITLE_PREFIX: e.g., "TRANSCRIPT" or "SUMMARY"
- COVER_PAGE_LOGO_URL: optional URL
- COVER_PAGE_LOGO_PATH: optional local path
"""
import os
from typing import Optional

from docx import Document
from docx.shared import Pt, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH


def _add_page_break(doc: Document):
    """Insert a page break paragraph."""
    # "w:pageBreak" is not a WordprocessingML paragraph property, so the
    # break is added through python-docx instead of hand-built XML.
    doc.add_page_break()
def add_cover_page(
    doc: Document,
    title: str,
    subtitle: Optional[str] = None,
    metadata: Optional[dict] = None,
    include_logo: bool = False,
):
    """
    Insert a cover page at the current cursor position.

    - title: e.g., "TRANSCRIPT" or "SUMMARY"
    - subtitle: e.g., "Meeting of 16 June 2026"
    - metadata: optional dict with keys like:
        - "Organization"
        - "Date"
        - "Prepared by"
        - "Reference"
    """
    metadata = metadata or {}
    # The env var takes priority; metadata is the fallback. (Without the
    # explicit default, `a or b if c else None` discards the env value
    # whenever metadata is missing.)
    org = (os.getenv("COVER_PAGE_ORGANIZATION") or "").strip() or metadata.get("Organization") or ""
    date = metadata.get("Date") or ""
    prepared_by = metadata.get("Prepared by") or ""
    reference = metadata.get("Reference") or ""

    # Title
    p = doc.add_paragraph()
    p.alignment = WD_ALIGN_PARAGRAPH.CENTER
    p.paragraph_format.space_after = Pt(6)
    run = p.add_run(title.upper())
    run.bold = True
    run.font.name = "Courier"
    run.font.size = Pt(18)

    # Subtitle
    if subtitle:
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(12)
        run = p.add_run(subtitle)
        run.font.name = "Courier"
        run.font.size = Pt(14)

    # Optional logo placeholder (text-only for now; can be extended)
    if include_logo:
        logo_url = (os.getenv("COVER_PAGE_LOGO_URL") or "").strip()
        logo_path = (os.getenv("COVER_PAGE_LOGO_PATH") or "").strip()
        # For now, just reserve space; image insertion can be added later.
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(12)

    # Metadata lines
    if org or date or prepared_by or reference:
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(4)
        if org:
            r = p.add_run(org)
            r.font.name = "Courier"
            r.font.size = Pt(12)
        if date:
            if org:
                p.add_run("\n")
            r = p.add_run(date)
            r.font.name = "Courier"
            r.font.size = Pt(12)
        if prepared_by or reference:
            p = doc.add_paragraph()
            p.alignment = WD_ALIGN_PARAGRAPH.CENTER
            p.paragraph_format.space_after = Pt(4)
            if prepared_by:
                r = p.add_run(f"Prepared by: {prepared_by}")
                r.font.name = "Courier"
                r.font.size = Pt(11)
            if reference:
                if prepared_by:
                    p.add_run("\n")
                r = p.add_run(f"Reference: {reference}")
                r.font.name = "Courier"
                r.font.size = Pt(11)

    # Page break after cover page
    _add_page_break(doc)
+147
@@ -0,0 +1,147 @@
"""
Utility module for applying styles and converting simple markdown
into styled DOCX paragraphs/runs for summaries.
"""
import re

from docx import Document
from docx.enum.style import WD_STYLE_TYPE
from docx.shared import Pt
from docx.oxml import OxmlElement
from docx.oxml.ns import qn


def _ensure_style(doc, name, based_on="Normal", font_name="Courier", font_size=Pt(12)):
    """
    Ensure a paragraph style exists in the document.
    """
    styles = doc.styles
    if name not in [s.name for s in styles]:
        style = styles.add_style(name, WD_STYLE_TYPE.PARAGRAPH)
        style.font.name = font_name
        style.font.size = font_size
        if based_on:
            style.base_style = styles[based_on]
    return styles[name]
def apply_heading_style(doc, paragraph, level: int):
    """
    Apply heading style to a paragraph based on level (1, 2, 3).
    """
    if level == 1:
        style_name = "SummaryHeading1"
        size = Pt(16)
    elif level == 2:
        style_name = "SummaryHeading2"
        size = Pt(14)
    else:
        style_name = "SummaryHeading3"
        size = Pt(12)
    style = _ensure_style(doc, style_name, font_size=size)
    paragraph.style = style
    paragraph.paragraph_format.space_before = Pt(4)
    paragraph.paragraph_format.space_after = Pt(2)
def apply_bullet_style(doc, paragraph):
    """
    Apply a simple bullet style to a paragraph.
    """
    style_name = "SummaryBullet"
    style = _ensure_style(doc, style_name)
    paragraph.style = style
    pPr = paragraph._p.get_or_add_pPr()
    tabs = OxmlElement("w:tabs")
    tab = OxmlElement("w:tab")
    tab.set(qn("w:val"), "left")
    tab.set(qn("w:pos"), "360")
    tabs.append(tab)
    pPr.append(tabs)
def parse_simple_md_to_paragraphs(doc, text: str):
    """
    Convert simple markdown text into DOCX paragraphs with styles.

    Supported:
    - # / ## / ### for headings
    - - / * for bullet lists
    - **bold** and *italic*

    This is intentionally simple and robust for legal/business summaries.
    """
    lines = text.splitlines()
    current_paragraph = None
    in_list = False
    for line in lines:
        stripped = line.strip()
        if not stripped:
            current_paragraph = None
            in_list = False
            continue

        # Headings
        heading_match = re.match(r"^(#{1,3})\s+(.*)", stripped)
        if heading_match:
            level = len(heading_match.group(1))
            content = heading_match.group(2).strip()
            p = doc.add_paragraph()
            apply_heading_style(doc, p, level)
            _add_run_with_inline_md(p, content)
            current_paragraph = p
            in_list = False
            continue

        # Bullet list: every item gets its own paragraph
        bullet_match = re.match(r"^[-*]\s+(.*)", stripped)
        if bullet_match:
            content = bullet_match.group(1).strip()
            in_list = True
            current_paragraph = doc.add_paragraph()
            apply_bullet_style(doc, current_paragraph)
            _add_run_with_inline_md(current_paragraph, content)
            continue

        # Normal paragraph: every non-empty line gets its own paragraph
        in_list = False
        current_paragraph = doc.add_paragraph()
        _add_run_with_inline_md(current_paragraph, stripped)
def _add_run_with_inline_md(paragraph, text: str):
    """
    Add runs to a paragraph, interpreting **bold** and *italic*.
    """
    # Simple regex for bold and italic
    parts = re.split(r"(\*\*\*.*?\*\*\*|\*\*.*?\*\*|\*.*?\*)", text)
    for part in parts:
        if not part:
            continue
        run = paragraph.add_run(part)
        run.font.name = "Courier"
        run.font.size = Pt(12)
        # Bold
        bold_match = re.fullmatch(r"\*\*(.+?)\*\*", part)
        if bold_match:
            run.bold = True
            part = bold_match.group(1)
        # Italic
        italic_match = re.fullmatch(r"\*(.+?)\*", part)
        if italic_match:
            run.italic = True
            part = italic_match.group(1)
        run.text = part
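Review note: the inline-markdown handling can be exercised without python-docx by separating the tokenising step from run creation. The sketch below is an illustration under that assumption; `split_inline_md` is a hypothetical helper that returns `(text, bold, italic)` tuples, which a caller could then map onto `paragraph.add_run(...)`.

```python
import re
from typing import List, Tuple

# Longest marker first so ***x*** is not consumed as **...** or *...*.
_INLINE = re.compile(r"(\*\*\*.+?\*\*\*|\*\*.+?\*\*|\*.+?\*)")
# A marked span: one to three asterisks, the text, then the same marker again.
_MARKED = re.compile(r"(\*{1,3})(.+?)\1")


def split_inline_md(text: str) -> List[Tuple[str, bool, bool]]:
    """Split text into (content, bold, italic) spans."""
    spans = []
    for part in _INLINE.split(text):
        if not part:
            continue
        match = _MARKED.fullmatch(part)
        if match:
            stars = len(match.group(1))
            # ** and *** are bold; * and *** are italic.
            spans.append((match.group(2), stars >= 2, stars in (1, 3)))
        else:
            spans.append((part, False, False))
    return spans
```

Keeping the parser free of document objects makes edge cases, such as a lone asterisk in `5 * 3`, cheap to unit test.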
+617
@@ -0,0 +1,617 @@
"""
Email sender module for ScrAIbe.
Sends transcription outputs (TXT, JSON, etc.) via SMTP.
All credentials are configured via environment variables.
Supports both plain text and HTML email bodies.
Template placeholders are primarily filled via environment variables.
"""
import base64
import json
import logging
import os
import re
import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Any, Dict, List, Optional
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
logger = logging.getLogger("scraibe.email_sender")
class EmailError(Exception):
    pass


def get_email_config():
    """
    Read email configuration from environment variables.
    Raises EmailError if required fields are missing or invalid.
    """
    smtp_host = os.getenv("EMAIL_SMTP_HOST")
    smtp_port = os.getenv("EMAIL_SMTP_PORT")
    smtp_user = os.getenv("EMAIL_SMTP_USER")
    smtp_password = os.getenv("EMAIL_SMTP_PASSWORD")
    from_address = os.getenv("EMAIL_FROM_ADDRESS")
    use_tls_str = os.getenv("EMAIL_SMTP_USE_TLS", "true").strip().lower()
    use_tls = use_tls_str not in ("false", "0", "no")

    if not all([smtp_host, smtp_port, smtp_user, smtp_password, from_address]):
        raise EmailError(
            "Email configuration incomplete. "
            "Ensure EMAIL_SMTP_HOST, EMAIL_SMTP_PORT, EMAIL_SMTP_USER, "
            "EMAIL_SMTP_PASSWORD, and EMAIL_FROM_ADDRESS are set."
        )

    # Report a non-numeric port as a configuration error, not a bare ValueError.
    try:
        port = int(smtp_port)
    except ValueError as e:
        raise EmailError(f"EMAIL_SMTP_PORT must be an integer, got {smtp_port!r}") from e

    return {
        "smtp_host": smtp_host,
        "smtp_port": port,
        "smtp_user": smtp_user,
        "smtp_password": smtp_password,
        "from_address": from_address,
        "use_tls": use_tls,
    }
def _load_css(path: str) -> str:
    """
    Load CSS file content if it exists.
    """
    if not path or not os.path.exists(path):
        return ""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
def _email_logo_html() -> str:
    """
    Return a subtle watermark-style logo for emails.

    - Priority:
        1) EMAIL_LOGO_URL (direct URL)
        2) EMAIL_LOGO_PATH (local file as base64)
    - Style: small, faint, bottom-right, non-intrusive.
    """
    logo_url = os.getenv("EMAIL_LOGO_URL")
    src = logo_url
    if not logo_url:
        logo_path = os.getenv("EMAIL_LOGO_PATH", "/app/src/misc/logo1.png")
        if os.path.exists(logo_path):
            try:
                with open(logo_path, "rb") as f:
                    b64 = base64.b64encode(f.read()).decode("utf-8")
                src = f"data:image/png;base64,{b64}"
            except Exception:
                src = None
    if not src:
        return ""
    # Watermark: bottom-right, low opacity, compact
    return (
        f'<div style="text-align: right; margin-top: 24px; opacity: 0.15;">'
        f'<img src="{src}" alt="Logo" style="max-width: 90px; height: auto; display: inline-block;" />'
        f'</div>'
    )
def _accent_color() -> str:
    """
    Accent color for UI and emails.
    Default: #7C6DA0
    """
    return os.getenv("EMAIL_ACCENT_COLOR", "#7C6DA0")
def build_template_context(**runtime_kwargs: Any) -> Dict[str, Any]:
"""
Build a context dict for templates from:
- environment variables (base, customizable)
- runtime-provided values (override env if present)
Environment variables:
- EMAIL_CONTACT_ADDRESS: value for {contact_email}
- EMAIL_CSS_PATH: path to mail_style.css (optional; we inline it)
- EMAIL_LOGO_URL: URL for email logo (preferred)
- EMAIL_LOGO_PATH: fallback local path for email logo
- EMAIL_ACCENT_COLOR: accent color (default #7C6DA0)
"""
# Load and inline mail_style.css for consistent email styling
css_path = os.getenv("EMAIL_CSS_PATH", "/app/src/misc/mail_style.css")
css_text = _load_css(css_path)
# Build logo HTML (URL or local fallback)
logo_html = _email_logo_html()
# Accent color
accent = _accent_color()
ctx: Dict[str, Any] = {
"contact_email": os.getenv("EMAIL_CONTACT_ADDRESS", "support@example.com"),
"email_css": css_text,
"email_logo": logo_html,
"accent_color": accent,
}
# Runtime values override env if provided
if runtime_kwargs:
ctx.update(runtime_kwargs)
return ctx
def load_template(template_name: str, **runtime_kwargs: Any) -> str:
"""
Load an HTML email template from misc/ and render placeholders.
Expects files like:
/app/src/misc/upload_notification_template.html
/app/src/misc/success_template.html
/app/src/misc/error_notification_template.html
"""
base = os.getenv("SCRAIBE_TEMPLATES_DIR", "/app/src/misc")
path = os.path.join(base, template_name)
if not os.path.exists(path):
raise EmailError(f"Email template not found: {path}")
with open(path, "r", encoding="utf-8") as f:
template = f.read()
# Build context from env + runtime
ctx = build_template_context(**runtime_kwargs)
# Replace {placeholder} style variables safely
try:
return template.format(**ctx)
except KeyError as e:
raise EmailError(f"Missing template variable: {e}")
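Because `load_template` renders with `str.format`, any literal braces in a template (inline CSS, for example) have to be doubled, and a placeholder missing from the context raises `KeyError`. The sketch below illustrates both behaviors. `TemplateError` and `render` are stand-in names for illustration, and the sample HTML is made up.

```python
from typing import Any, Dict


class TemplateError(Exception):
    """Hypothetical stand-in for EmailError."""


def render(template: str, ctx: Dict[str, Any]) -> str:
    """Fill {placeholder} fields; '{{' and '}}' yield literal braces."""
    try:
        return template.format(**ctx)
    except KeyError as e:
        raise TemplateError(f"Missing template variable: {e}") from e


html = "<style>p {{ color: {accent_color}; }}</style><p>Contact {contact_email}</p>"
out = render(html, {"accent_color": "#7C6DA0", "contact_email": "support@example.com"})
```

Values in the context (such as the inlined CSS text) are inserted as-is, so only braces in the template file itself need escaping.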
def send_email(
to: str,
subject: str,
body: str,
html: Optional[str],
attachments: List[str],
cc: Optional[str] = None,
) -> bool:
"""
Send an email with optional HTML body and file attachments.
Args:
to: Comma-separated list of recipient email addresses.
subject: Email subject.
body: Email body (plain text).
html: Email body (HTML), or None.
attachments: List of file paths to attach.
cc: Comma-separated list of CC email addresses (optional).
Returns:
True if sent successfully.
Raises:
EmailError if sending fails.
"""
try:
cfg = get_email_config()
except EmailError as e:
logger.error("Email configuration error: %s", e)
raise
# Parse recipients
to_list = [addr.strip() for addr in to.split(",") if addr.strip()]
cc_list = [addr.strip() for addr in cc.split(",") if addr.strip()] if cc else []
if not to_list:
raise EmailError("No valid 'To' email addresses provided.")
# Ensure subject is never blank
if not subject or not subject.strip():
logger.warning("Subject was blank or missing; using default subject.")
subject = "ScrAIbe: Your transcript is ready"
subject = subject.strip()
has_attachments = bool(attachments)
# Build the text/HTML part (alternative)
alt = MIMEMultipart("alternative")
alt.attach(MIMEText(body, "plain"))
if html:
alt.attach(MIMEText(html, "html"))
if has_attachments:
# Outer message: multipart/mixed with headers
msg = MIMEMultipart("mixed")
msg["From"] = cfg["from_address"]
msg["To"] = ", ".join(to_list)
if cc_list:
msg["Cc"] = ", ".join(cc_list)
msg["Subject"] = subject
# Attach the alternative (text/HTML) part
msg.attach(alt)
# Attach files
for file_path in attachments:
if not os.path.isfile(file_path):
logger.warning("Attachment file not found, skipping: %s", file_path)
continue
try:
with open(file_path, "rb") as f:
part = MIMEBase("application", "octet-stream")
part.set_payload(f.read())
encoders.encode_base64(part)
part.add_header(
"Content-Disposition",
"attachment",
filename=os.path.basename(file_path),
)
msg.attach(part)
except Exception as e:
logger.warning("Failed to attach file %s: %s", file_path, e)
else:
# No attachments: use the alternative part as the root message
msg = alt
msg["From"] = cfg["from_address"]
msg["To"] = ", ".join(to_list)
if cc_list:
msg["Cc"] = ", ".join(cc_list)
msg["Subject"] = subject
# Connect and send
try:
if cfg["use_tls"]:
server = smtplib.SMTP(cfg["smtp_host"], cfg["smtp_port"], timeout=30)
server.ehlo()
server.starttls()
server.ehlo()
else:
server = smtplib.SMTP(cfg["smtp_host"], cfg["smtp_port"], timeout=30)
server.ehlo()
server.login(cfg["smtp_user"], cfg["smtp_password"])
server.sendmail(
cfg["from_address"],
to_list + cc_list,
msg.as_string(),
)
server.quit()
logger.info(
"Email sent to %s (CC: %s) with subject: %s",
to_list,
cc_list or "None",
subject,
)
return True
except Exception as e:
logger.error("Failed to send email: %s", e)
raise EmailError(f"Failed to send email: {e}")
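The MIME layout that `send_email` builds is `multipart/alternative` (plain text plus optional HTML), wrapped in `multipart/mixed` only when files are attached. A minimal sketch using only the standard library is below. `build_message` is a hypothetical helper, and it takes attachment bytes directly instead of file paths.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(body, html=None, attachment=None):
    """attachment is an optional (filename, bytes) pair (assumption for this sketch)."""
    alt = MIMEMultipart("alternative")
    alt.attach(MIMEText(body, "plain"))
    if html:
        alt.attach(MIMEText(html, "html"))
    if attachment is None:
        return alt  # no files: the alternative part is the root message
    msg = MIMEMultipart("mixed")
    msg.attach(alt)
    name, data = attachment
    part = MIMEBase("application", "octet-stream")
    part.set_payload(data)
    encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=name)
    msg.attach(part)
    return msg


msg = build_message("hi", "<p>hi</p>", ("notes.docx", b"\x00\x01"))
part_types = [p.get_content_type() for p in msg.get_payload()]
```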
# ------------ DOCX helpers ------------
# Namespaces
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
def _set_element_attr(elem, attr, value):
elem.set(f"{{{W_NS}}}{attr}", str(value))
def _create_transcript_section_properties(section):
"""
Configure the section properties for transcript DOCX:
- Margins: 1 inch all sides
- Single column layout
- No built-in line numbering (we embed line numbers as text for portability)
- Remove document grid to avoid off-by-one line numbering
"""
sectPr = section._sectPr
# Margins: 1 inch = 1440 twips
pgMar = sectPr.find(f"{{{W_NS}}}pgMar")
if pgMar is None:
pgMar = OxmlElement("w:pgMar")
sectPr.append(pgMar)
_set_element_attr(pgMar, "top", "1440")
_set_element_attr(pgMar, "right", "1440")
_set_element_attr(pgMar, "bottom", "1440")
_set_element_attr(pgMar, "left", "1440")
_set_element_attr(pgMar, "header", "720")
_set_element_attr(pgMar, "footer", "720")
_set_element_attr(pgMar, "gutter", "0")
# Ensure single column (no multi-column layout)
cols = sectPr.find(f"{{{W_NS}}}cols")
if cols is not None:
_set_element_attr(cols, "num", "1")
_set_element_attr(cols, "space", "720")
# Remove document grid entirely
for docGrid in sectPr.findall(f"{{{W_NS}}}docGrid"):
sectPr.remove(docGrid)
# Remove any built-in line numbering; we will use text-based line numbers
for lnNumType in sectPr.findall(f"{{{W_NS}}}lnNumType"):
sectPr.remove(lnNumType)
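The margin values above are expressed in twips (twentieths of a point, 1440 per inch). A small conversion helper, sketched here under the hypothetical name `inches_to_twips`, makes the constants easier to check.

```python
TWIPS_PER_INCH = 1440  # 20 twips per point x 72 points per inch


def inches_to_twips(inches: float) -> int:
    """Convert inches to twips, the unit used by w:pgMar and w:tab positions."""
    return round(inches * TWIPS_PER_INCH)


page_margin = inches_to_twips(1)       # margins on all four sides
header_offset = inches_to_twips(0.5)   # header/footer distance
tab_stop = inches_to_twips(0.25)       # line-number tab stop
```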
def _add_transcript_paragraph(doc, line_text, line_number):
"""
Add a single transcript line as a paragraph with an embedded line number.
Uses a left tab stop so the line number appears in the left margin area,
independent of built-in line numbering, ensuring consistent behavior
across Word, LibreOffice, Google Docs, etc.
"""
line_text = line_text.strip()
if not line_text:
return
p = doc.add_paragraph()
# Set up paragraph formatting:
# - No left indent; we control spacing via tab stop
# - 1.5 line spacing, no extra space before/after
pPr = p._p.get_or_add_pPr()
# Remove any default indent
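# Compare against None: a childless lxml element is falsy, so a truthiness test would skip the removal
ind = pPr.find(f"{{{W_NS}}}ind")
if ind is not None: pPr.remove(ind)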
# Define a left tab stop for line numbers (e.g. 360 twips ≈ 0.25")
tabs = OxmlElement("w:tabs")
tab = OxmlElement("w:tab")
tab.set("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val", "left")
tab.set("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pos", "360")
tabs.append(tab)
pPr.append(tabs)
spacing = OxmlElement("w:spacing")
_set_element_attr(spacing, "before", "0")
_set_element_attr(spacing, "after", "0")
_set_element_attr(spacing, "line", "360") # 1.5 line spacing (with lineRule="auto", units are 240ths of a line: 360 / 240 = 1.5)
_set_element_attr(spacing, "lineRule", "auto")
pPr.append(spacing)
# Try to match: [00:00] SPEAKER 1: content
m = re.match(r"\[(\d+:\d+(?::\d+)?)\]\s*(.+?):\s*(.*)", line_text)
# Line number run (no underline)
run_ln = p.add_run(str(line_number))
run_ln.font.name = "Courier"
run_ln.font.size = Pt(12)
run_ln.underline = False
# Tab + spaces between line number and content
# - 2 base spaces + 7 more for first line of speaker turn
# - 2 base spaces + 3 more for continuation lines
if m:
extra_spaces = " " * 7 # 7 spaces for speaker lines
else:
extra_spaces = " " * 3 # 3 spaces for continuation lines
run_tab = p.add_run("\t" + " " * 2 + extra_spaces)
run_tab.font.name = "Courier"
run_tab.font.size = Pt(12)
run_tab.underline = False
if m:
ts, speaker, content = m.groups()
label_text = f"[{ts}] {speaker.upper()}:"
# Label run (underline)
run_label = p.add_run(label_text)
run_label.underline = True
run_label.font.name = "Courier"
run_label.font.size = Pt(12)
# Space run (no underline)
run_space = p.add_run(" ")
run_space.underline = False
run_space.font.name = "Courier"
run_space.font.size = Pt(12)
# Content run (no underline)
run_txt = p.add_run(content.strip())
run_txt.underline = False
run_txt.font.name = "Courier"
run_txt.font.size = Pt(12)
else:
# Non-standard line: plain text
run = p.add_run(line_text)
run.underline = False
run.font.name = "Courier"
run.font.size = Pt(12)
# ------------ Public DOCX functions ------------
def create_transcript_docx(text: str, filename: str):
"""
Create a transcript DOCX with:
- 1" margins on all sides
- 12pt Courier font
- Each page has exactly 29 numbered lines of text
- Max 52 characters of content per line (up to 64 including number, tab, and spaces)
- Words preserved (no clipping or omission)
- Blank spacing between number and text preserved
- Page break after every 29 lines
- Centered footer: "X of Y"
"""
# Step 1: Prepare transcript into pages of 29 lines each
# Each line holds <= 52 chars of content, words preserved, no clipping
# Structure: nested list of paragraphs (pages -> lines)
prepared_pages = []
current_page = []
line_count = 0
# 52 chars content + 2 digits + 1 tab + 9 spaces = 64 max
MAX_CONTENT_LEN = 52
for raw_line in text.strip().splitlines():
raw_line = raw_line.strip()
if not raw_line:
continue
# Wrap into segments without clipping words
words = raw_line.split()
segments = []
current = ""
for w in words:
if not current:
current = w
elif len(current) + 1 + len(w) <= MAX_CONTENT_LEN:
current += " " + w
else:
segments.append(current)
current = w
if current:
segments.append(current)
# Add segments to pages, enforcing 29 lines per page
for seg in segments:
if line_count == 29:
prepared_pages.append(current_page)
current_page = []
line_count = 0
current_page.append(seg)
line_count += 1
if current_page:
prepared_pages.append(current_page)
# Step 2: Create DOCX
doc = Document()
style = doc.styles["Normal"]
style.font.name = "Courier"
style.font.size = Pt(12)
body = doc.element.body
for p in list(body.findall(f"{{{W_NS}}}p")):
body.remove(p)
_create_transcript_section_properties(doc.sections[0])
# Step 3: Optionally add cover page
from . import docx_cover
cover_enabled = os.getenv("COVER_PAGE_ENABLED", "false").strip().lower() in ("true", "1", "yes")
if cover_enabled:
docx_cover.add_cover_page(
doc,
title="TRANSCRIPT",
subtitle=None,
metadata=None,
include_logo=True,
)
# Step 4: Write prepared pages into DOCX
for page_idx, page_lines in enumerate(prepared_pages):
# Insert page break between pages
if page_idx > 0:
p_break = doc.add_paragraph()
pPr = p_break._p.get_or_add_pPr()
for child in list(pPr):
tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
if tag in ("tabs", "spacing", "ind"):
pPr.remove(child)
# w:pageBreak is not a WordprocessingML element; request the break via w:pageBreakBefore
p_break.paragraph_format.page_break_before = True
# Write each line with its number (1-29)
for line_num, line_text in enumerate(page_lines, start=1):
_add_transcript_paragraph(doc, line_text, line_number=line_num)
# Step 5: Add footer: "X of Y" centered
section = doc.sections[0]
footer = section.footer
footer.is_linked_to_previous = False
footer_para = footer.paragraphs[0] if footer.paragraphs else footer.add_paragraph()
footer_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
for r in footer_para.runs:
r.text = ""
def add_field(run, code):
fldChar = OxmlElement("w:fldChar")
fldChar.set(qn("w:fldCharType"), "begin")
run._r.append(fldChar)
instrText = OxmlElement("w:instrText")
instrText.set(qn("xml:space"), "preserve")
instrText.text = code
run._r.append(instrText)
fldCharEnd = OxmlElement("w:fldChar")
fldCharEnd.set(qn("w:fldCharType"), "end")
run._r.append(fldCharEnd)
run_page = footer_para.add_run()
add_field(run_page, " PAGE ")
run_of = footer_para.add_run(" of ")
run_total = footer_para.add_run()
add_field(run_total, " NUMPAGES ")
doc.save(filename)
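The wrap-and-paginate step in `create_transcript_docx` can be checked without python-docx. The sketch below separates it into two pure functions. The names `wrap_words` and `paginate` are illustrative, and the defaults (52 characters, 29 lines) are taken from the code and docstring above.

```python
from typing import List


def wrap_words(line: str, max_len: int = 52) -> List[str]:
    """Greedy word wrap; a word longer than max_len is kept whole on its own line."""
    segments: List[str] = []
    current = ""
    for word in line.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_len:
            current += " " + word
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


def paginate(lines: List[str], per_page: int = 29) -> List[List[str]]:
    """Split a flat list of lines into pages of at most per_page lines."""
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]


wrapped = wrap_words("aaa bbb ccc", max_len=7)
pages = paginate([f"line {n}" for n in range(60)])
```

Slicing by a fixed page size avoids a manual counter, which is where an off-by-one is easiest to introduce.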
def create_summary_docx(text: str, filename: str):
"""
Create a summary DOCX with:
- 1" margins on all sides
- 12pt Courier font
- Markdown-aware WYSIWYG styling (headings, bullets, bold/italic)
"""
from . import docx_styles
doc = Document()
# Base font
style = doc.styles["Normal"]
style.font.name = "Courier"
style.font.size = Pt(12)
# Margins: 1 inch all sides
for section in doc.sections:
section.left_margin = Inches(1.0)
section.right_margin = Inches(1.0)
section.top_margin = Inches(1.0)
section.bottom_margin = Inches(1.0)
# Remove default paragraph
body = doc.element.body
for p in list(body.findall(f"{{{W_NS}}}p")):
body.remove(p)
# Optionally add cover page
from . import docx_cover
cover_enabled = os.getenv("COVER_PAGE_ENABLED", "false").strip().lower() in ("true", "1", "yes")
if cover_enabled:
docx_cover.add_cover_page(
doc,
title="SUMMARY",
subtitle=None,
metadata=None,
include_logo=True,
)
# Add summary content using markdown-aware styling
if text.strip():
docx_styles.parse_simple_md_to_paragraphs(doc, text.strip())
doc.save(filename)
+347 -18
@@ -9,20 +9,35 @@ It replaces the previous local Whisper + Pyannote pipeline by sending
audio files to the /v1/audio/diarization endpoint and mapping the
response into the same Transcript format used by the UI.
For long audio files, it can chunk the input to avoid GPU OOM errors.
Environment Variables:
LOCALAI_API_URL: (required) Base URL of the LocalAI server
(e.g., http://localhost:8080)
LOCALAI_API_KEY: (optional) API key, if configured
LOCALAI_MODEL: (optional) Model name to use (default: vibevoice-diarize)
Chunking / long audio (all optional):
LOCALAI_CHUNK_DURATION: Max duration of each chunk in seconds
(default: 180.0)
LOCALAI_CHUNK_OVERLAP: Overlap between consecutive chunks in seconds
(default: 2.0)
LOCALAI_MAX_SINGLE_REQUEST_DURATION: If audio duration exceeds this, chunking
is enabled automatically (default: 300.0)
"""
import os
import io
import json
import logging
from typing import Dict, List, Any, Optional
import httpx
from .audio import get_audio_duration, split_audio_into_chunks
logger = logging.getLogger("scraibe.localai_client")
class LocalAIError(Exception):
"""Raised when the LocalAI API returns an error or unexpected response."""
@@ -36,16 +51,22 @@ class LocalAIClient:
Responsibilities:
- Read configuration from environment.
- Upload audio file as multipart/form-data.
- - Parse diarization + transcription response.
+ - Parse diarization + transcription response (verbose_json).
- Map response into the same structure expected by Scraibe's Transcript.
- For long audio: chunk, transcribe each chunk, merge results.
"""
# Default thresholds for chunking long audio to avoid GPU OOM.
# These can be overridden via environment or at call time.
DEFAULT_CHUNK_DURATION = 180.0 # seconds
DEFAULT_CHUNK_OVERLAP = 2.0 # seconds
def __init__(
self,
api_url: Optional[str] = None,
api_key: Optional[str] = None,
model: Optional[str] = None,
- timeout: float = 600.0,
+ timeout: float = 3600.0,
):
"""
Args:
@@ -67,12 +88,67 @@ class LocalAIClient:
"Provide the LocalAI server URL via environment or constructor."
)
logger.info(
"Initializing LocalAIClient: url=%s model=%s",
self.api_url,
self.model,
)
self._client = httpx.Client(
base_url=self.api_url,
timeout=self.timeout,
follow_redirects=True,
)
@staticmethod
def _env_float(var: str, default: float) -> float:
"""
Read a float from environment with a fallback default.
"""
val = (os.getenv(var) or "").strip()
if val == "":
return default
try:
return float(val)
except ValueError:
logger.warning(
"Invalid value for %s: %s; using default %s", var, val, default
)
return default
def _effective_chunk_duration(self, provided: Optional[float]) -> float:
"""
Resolve chunk_duration using this precedence:
1) provided argument
2) LOCALAI_CHUNK_DURATION env
3) class default
"""
if provided is not None:
return provided
return self._env_float("LOCALAI_CHUNK_DURATION", self.DEFAULT_CHUNK_DURATION)
def _effective_chunk_overlap(self, provided: Optional[float]) -> float:
"""
Resolve chunk_overlap:
1) provided argument
2) LOCALAI_CHUNK_OVERLAP env
3) class default
"""
if provided is not None:
return provided
return self._env_float("LOCALAI_CHUNK_OVERLAP", self.DEFAULT_CHUNK_OVERLAP)
def _effective_max_single_request_duration(self, provided: Optional[float]) -> float:
"""
Resolve max_single_request_duration:
1) provided argument
2) LOCALAI_MAX_SINGLE_REQUEST_DURATION env
3) default 300.0
"""
if provided is not None:
return provided
return self._env_float("LOCALAI_MAX_SINGLE_REQUEST_DURATION", 300.0)
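The three `_effective_*` helpers share one precedence rule: explicit argument, then environment variable, then default. A minimal sketch of that rule as a single function follows. `resolve_float` and its `env` parameter are hypothetical names, not part of `LocalAIClient`.

```python
import os
from typing import Mapping, Optional


def resolve_float(
    provided: Optional[float],
    var: str,
    default: float,
    env: Mapping[str, str] = os.environ,  # injectable for testing (assumption)
) -> float:
    """Precedence: explicit argument > environment variable > default."""
    if provided is not None:
        return provided
    raw = (env.get(var) or "").strip()
    if raw == "":
        return default
    try:
        return float(raw)
    except ValueError:
        # Invalid values fall back to the default, as _env_float does
        return default


chunk_duration = resolve_float(None, "LOCALAI_CHUNK_DURATION", 180.0,
                               env={"LOCALAI_CHUNK_DURATION": "120"})
```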
def close(self):
"""Close the underlying HTTP client."""
self._client.close()
@@ -97,20 +173,19 @@ class LocalAIClient:
response_format: Optional[str] = None,
include_text: Optional[bool] = None,
verbose: bool = False,
return_raw: bool = False,
use_chunking: Optional[bool] = None,
chunk_duration: Optional[float] = None,
chunk_overlap: Optional[float] = None,
max_single_request_duration: Optional[float] = None,
**_ignored,
) -> Dict[str, Any]:
"""
- Send audio to LocalAI /v1/audio/diarization and return a dict
- in the same style as the previous internal diarization output:
- {
- "segments": [ [start, end], ... ],
- "speakers": [ "SPEAKER_00", ... ],
- "transcripts": [ "text for segment", ... ]
- }
- Extra kwargs that the old UI used (e.g., whisper-specific) are
- accepted but ignored.
+ Send audio to LocalAI /v1/audio/diarization and return:
+ - A normalized dict with segments, speakers, transcripts.
+ - Optionally, the raw verbose_json response (for JSON export).
+ For long audio, it can automatically chunk the file to avoid GPU OOM.
Args:
audio_path: Path to the audio file.
@@ -122,15 +197,105 @@ class LocalAIClient:
min_duration_on: Optional min segment duration.
min_duration_off: Optional min gap duration.
response_format: "json", "verbose_json", or "rttm".
- Defaults to "verbose_json" if not set.
+ Defaults to "verbose_json".
include_text: Whether to request per-segment text.
Defaults to True.
verbose: If True, prints progress messages.
return_raw: If True, also return the raw API response in 'raw_result'.
use_chunking: Whether to enable chunking for long audio.
If None, enabled automatically based on duration.
chunk_duration: Max duration per chunk in seconds.
Falls back to LOCALAI_CHUNK_DURATION env, then 180.0.
chunk_overlap: Overlap between chunks in seconds.
Falls back to LOCALAI_CHUNK_OVERLAP env, then 2.0.
max_single_request_duration: If audio duration exceeds this, chunking
is enabled (unless explicitly disabled).
Falls back to LOCALAI_MAX_SINGLE_REQUEST_DURATION
env, then 300.0.
"""
if verbose:
print("Starting diarization and transcription via LocalAI.")
- # Defaults: use verbose_json + include_text to get both diarization and transcription.
+ logger.info("diarize_and_transcribe requested for: %s", audio_path)
# Resolve chunking parameters with environment support
chunk_duration = self._effective_chunk_duration(chunk_duration)
chunk_overlap = self._effective_chunk_overlap(chunk_overlap)
max_single = self._effective_max_single_request_duration(max_single_request_duration)
if use_chunking is None:
try:
duration = get_audio_duration(audio_path)
except RuntimeError:
duration = None
use_chunking = (duration is not None and duration > max_single)
logger.info(
"Auto-chunking decision: duration=%s, threshold=%s, use_chunking=%s",
duration,
max_single,
use_chunking,
)
if use_chunking:
return self._diarize_and_transcribe_chunked(
audio_path=audio_path,
language=language,
num_speakers=num_speakers,
min_speakers=min_speakers,
max_speakers=max_speakers,
clustering_threshold=clustering_threshold,
min_duration_on=min_duration_on,
min_duration_off=min_duration_off,
response_format=response_format,
include_text=include_text,
verbose=verbose,
return_raw=return_raw,
chunk_duration=chunk_duration,
chunk_overlap=chunk_overlap,
)
# Single-request path (existing behavior)
return self._diarize_and_transcribe_single(
audio_path=audio_path,
language=language,
num_speakers=num_speakers,
min_speakers=min_speakers,
max_speakers=max_speakers,
clustering_threshold=clustering_threshold,
min_duration_on=min_duration_on,
min_duration_off=min_duration_off,
response_format=response_format,
include_text=include_text,
verbose=verbose,
return_raw=return_raw,
)
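The diff does not show `split_audio_into_chunks`, so the following is only a sketch of how overlapping chunk boundaries and the auto-chunking decision could be computed. `should_chunk` and `plan_chunks` are hypothetical names, and the defaults mirror the documented 180 s chunk, 2 s overlap, and 300 s threshold.

```python
from typing import List, Optional, Tuple


def should_chunk(duration: Optional[float], threshold: float = 300.0) -> bool:
    """Chunk only when the duration is known and exceeds the threshold."""
    return duration is not None and duration > threshold


def plan_chunks(
    duration: float, max_duration: float = 180.0, overlap: float = 2.0
) -> List[Tuple[float, float]]:
    """Return (start, end) spans; each span starts `overlap` seconds before the previous end."""
    if max_duration <= overlap:
        raise ValueError("max_duration must exceed overlap")
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + max_duration, duration)
        spans.append((start, end))
        if end >= duration:
            break
        start = end - overlap
    return spans


spans = plan_chunks(400.0)
```

The guard on `max_duration <= overlap` matters because the start position would otherwise never advance.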
def _diarize_and_transcribe_single(
self,
audio_path: str,
*,
language: Optional[str] = None,
num_speakers: Optional[int] = None,
min_speakers: Optional[int] = None,
max_speakers: Optional[int] = None,
clustering_threshold: Optional[float] = None,
min_duration_on: Optional[float] = None,
min_duration_off: Optional[float] = None,
response_format: Optional[str] = None,
include_text: Optional[bool] = None,
verbose: bool = False,
return_raw: bool = False,
) -> Dict[str, Any]:
"""
Internal: single-request diarization and transcription.
"""
if verbose:
print("Starting diarization and transcription via LocalAI.")
logger.info("diarize_and_transcribe requested for: %s", audio_path)
# Always use verbose_json for diarization + speaker info
if response_format is None:
response_format = "verbose_json"
if include_text is None:
@@ -158,6 +323,8 @@ class LocalAIClient:
if min_duration_off is not None:
data["min_duration_off"] = str(min_duration_off)
logger.debug("LocalAI request params: %s", data)
# Open file
if not os.path.exists(audio_path):
raise LocalAIError(f"Audio file not found: {audio_path}")
@@ -172,6 +339,7 @@ class LocalAIClient:
headers["Authorization"] = f"Bearer {self.api_key}"
# POST /v1/audio/diarization
logger.info("Sending request to LocalAI: /v1/audio/diarization")
resp = self._client.post(
"/v1/audio/diarization",
data=data,
@@ -179,15 +347,19 @@ class LocalAIClient:
headers=headers,
)
logger.info("LocalAI response status: %d", resp.status_code)
if resp.status_code >= 400:
body = resp.text
logger.error("LocalAI error response: %s", body)
raise LocalAIError(
f"LocalAI request failed with status {resp.status_code}: {body}"
)
try:
- result = resp.json()
+ raw_result = resp.json()
except json.JSONDecodeError:
logger.error("Failed to parse LocalAI response as JSON.")
raise LocalAIError(
"Failed to parse LocalAI response as JSON."
)
@@ -195,11 +367,163 @@ class LocalAIClient:
if verbose:
print("Diarization and transcription finished. Starting post-processing.")
- return self._parse_diarization_response(result)
+ parsed = self._parse_diarization_response(raw_result)
if return_raw:
parsed["raw_result"] = raw_result
return parsed
def _diarize_and_transcribe_chunked(
self,
audio_path: str,
*,
language: Optional[str] = None,
num_speakers: Optional[int] = None,
min_speakers: Optional[int] = None,
max_speakers: Optional[int] = None,
clustering_threshold: Optional[float] = None,
min_duration_on: Optional[float] = None,
min_duration_off: Optional[float] = None,
response_format: Optional[str] = None,
include_text: Optional[bool] = None,
verbose: bool = False,
return_raw: bool = False,
chunk_duration: float = DEFAULT_CHUNK_DURATION,
chunk_overlap: float = DEFAULT_CHUNK_OVERLAP,
) -> Dict[str, Any]:
"""
Internal: chunked diarization and transcription for long audio.
- Splits audio into overlapping chunks.
- Transcribes each chunk via /v1/audio/diarization.
- Merges segments with adjusted timestamps.
"""
if verbose:
print("Audio is long; splitting into chunks to avoid GPU memory issues.")
logger.info(
"Chunked transcription: chunk_duration=%s, overlap=%s",
chunk_duration,
chunk_overlap,
)
chunks = split_audio_into_chunks(
input_path=audio_path,
max_duration=chunk_duration,
overlap=chunk_overlap,
)
if len(chunks) == 1:
# No actual split needed; fall back to single-request path
return self._diarize_and_transcribe_single(
audio_path=chunks[0]["path"],
language=language,
num_speakers=num_speakers,
min_speakers=min_speakers,
max_speakers=max_speakers,
clustering_threshold=clustering_threshold,
min_duration_on=min_duration_on,
min_duration_off=min_duration_off,
response_format=response_format,
include_text=include_text,
verbose=verbose,
return_raw=return_raw,
)
all_segments: List[List[float]] = []
all_speakers: List[str] = []
all_transcripts: List[str] = []
raw_results: List[Dict[str, Any]] = []
temp_files = [c["path"] for c in chunks]
try:
for i, chunk_info in enumerate(chunks):
chunk_path = chunk_info["path"]
chunk_start = chunk_info["start"]
if verbose:
print(
f"Transcribing chunk {i+1}/{len(chunks)} "
f"(start={chunk_start:.1f}s)"
)
logger.info(
"Transcribing chunk %d/%d, start=%.1f", i + 1, len(chunks), chunk_start
)
# Use single-request logic for each chunk
chunk_result = self._diarize_and_transcribe_single(
audio_path=chunk_path,
language=language,
num_speakers=num_speakers,
min_speakers=min_speakers,
max_speakers=max_speakers,
clustering_threshold=clustering_threshold,
min_duration_on=min_duration_on,
min_duration_off=min_duration_off,
response_format=response_format,
include_text=include_text,
verbose=False,
return_raw=return_raw,
)
segs = chunk_result.get("segments", [])
spks = chunk_result.get("speakers", [])
txts = chunk_result.get("transcripts", [])
raw = chunk_result.get("raw_result")
# Adjust timestamps to global timeline
adjusted_segs = []
for seg, sp, txt in zip(segs, spks, txts):
start = float(seg[0]) + chunk_start
end = float(seg[1]) + chunk_start
adjusted_segs.append([start, end])
all_speakers.append(sp)
all_transcripts.append(txt)
all_segments.extend(adjusted_segs)
if return_raw and raw is not None:
raw_results.append(raw)
finally:
# Clean up temporary chunk files
for path in temp_files:
if path and os.path.exists(path) and path != audio_path:
try:
os.remove(path)
except Exception as e:
logger.warning("Failed to remove chunk file %s: %s", path, e)
# Sort segments by start time
combined = list(zip(all_segments, all_speakers, all_transcripts))
combined.sort(key=lambda x: x[0][0])
all_segments = [x[0] for x in combined]
all_speakers = [x[1] for x in combined]
all_transcripts = [x[2] for x in combined]
if verbose:
print(
f"Chunked transcription complete. Total segments: {len(all_segments)}"
)
result = {
"segments": all_segments,
"speakers": all_speakers,
"transcripts": all_transcripts,
}
if return_raw and raw_results:
result["raw_result"] = {
"chunked": True,
"chunks": raw_results,
}
return result
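The merge step above shifts each chunk's local timestamps by the chunk's start offset and then sorts by global start time. A self-contained sketch of that logic is below. `merge_chunks` and the input shape (one dict per chunk with a `start` key) are assumptions for illustration. As in the diff, it does not deduplicate segments in the overlap region or reconcile speaker labels across chunks.

```python
from typing import Any, Dict, List


def merge_chunks(chunks: List[Dict[str, Any]]) -> Dict[str, list]:
    """Each chunk holds 'start' (seconds) plus chunk-local segments, speakers, transcripts."""
    rows = []
    for chunk in chunks:
        offset = chunk["start"]
        for (seg_start, seg_end), speaker, text in zip(
            chunk["segments"], chunk["speakers"], chunk["transcripts"]
        ):
            # Shift chunk-local times onto the global timeline
            rows.append(([seg_start + offset, seg_end + offset], speaker, text))
    rows.sort(key=lambda row: row[0][0])
    return {
        "segments": [row[0] for row in rows],
        "speakers": [row[1] for row in rows],
        "transcripts": [row[2] for row in rows],
    }


merged = merge_chunks([
    {"start": 178.0, "segments": [[1.0, 3.0]], "speakers": ["SPEAKER_01"], "transcripts": ["b"]},
    {"start": 0.0, "segments": [[0.5, 2.0]], "speakers": ["SPEAKER_00"], "transcripts": ["a"]},
])
```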
def _parse_diarization_response(self, result: Dict[str, Any]) -> Dict[str, Any]:
"""
- Convert LocalAI response into the internal format used by Scraibe:
+ Convert LocalAI verbose_json response into the internal format used by Scraibe:
{
"segments": [ [start, end], ... ],
"speakers": [ "SPEAKER_00", ... ],
@@ -209,7 +533,7 @@ class LocalAIClient:
segments = result.get("segments", [])
if not segments:
- # If no segments, return empty but valid structure
+ logger.warning("LocalAI returned no segments.")
return {
"segments": [],
"speakers": [],
@@ -230,6 +554,11 @@ class LocalAIClient:
out_speakers.append(speaker)
out_transcripts.append(text)
logger.info(
"Parsed %d segments from LocalAI.",
len(out_segments),
)
return {
"segments": out_segments,
"speakers": out_speakers,
+205
@@ -0,0 +1,205 @@
"""
MCP-style HTTP server for ScrAIbe.
- Exposes an OpenAPI-compliant endpoint for external LLMs to:
- Upload audio
- Receive transcript JSON (no summary)
- WebUI remains always enabled; this is additive.
Configuration (env):
- MCP_SERVER_ENABLED: "true"/"false" (default: false)
- MCP_SERVER_HOST: bind address (default: 0.0.0.0)
- MCP_SERVER_PORT: port (default: 8000)
- MCP_USE_CELERY: "true"/"false" (default: true)
- If true, uses Celery tasks; if false, runs synchronously.
"""
import os
import time
import uuid
import json
import logging
from typing import Optional
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
from fastapi.responses import JSONResponse
from .autotranscript import Scraibe
logger = logging.getLogger("scraibe.mcp_server")
app = FastAPI(
title="ScrAIbe MCP Transcription API",
version="0.1.0",
description=(
"MCP-style HTTP API for ScrAIbe. "
"Allows external LLMs to upload audio and receive transcript JSON."
),
)
# In-memory job store for MCP (simple; can be replaced with Redis later)
_mcp_jobs: dict = {}
def _job_id() -> str:
return str(uuid.uuid4())
@app.get("/health")
async def health():
return {"status": "ok"}
@app.post("/transcribe")
async def transcribe(
file: UploadFile = File(...),
language: Optional[str] = Form(None),
num_speakers: Optional[int] = Form(None),
):
"""
Upload audio and start transcription.
Returns:
{
"job_id": "<id>",
"status": "queued" | "processing" | "completed" | "error",
"message": "..."
}
Use GET /transcribe/{job_id}/status and /json to retrieve results.
"""
use_celery = os.getenv("MCP_USE_CELERY", "true").strip().lower() in ("true", "1", "yes")
# Save uploaded file temporarily
try:
import tempfile
from pathlib import Path
upload_dir = Path(os.getenv("SCRAIBE_UPLOAD_DIR", "/tmp/scraibe_uploads"))
upload_dir.mkdir(parents=True, exist_ok=True)
ext = Path(file.filename or "file").suffix or ".wav"
ts = time.strftime("%Y%m%d%H%M%S")
tmp_name = f"mcp_upload_{ts}_{uuid.uuid4().hex[:8]}{ext}"
file_path = upload_dir / tmp_name
content = await file.read()
file_path.write_bytes(content)
except Exception as e:
logger.error("Error saving MCP upload: %s", e)
raise HTTPException(status_code=500, detail=f"Error saving file: {e}")
job_id = _job_id()
if use_celery:
try:
from .tasks import process_mcp_transcribe_task
except ImportError:
# Fallback: run synchronously
use_celery = False
if use_celery:
try:
process_mcp_transcribe_task.delay(
audio_path=str(file_path),
job_id=job_id,
language=language or None,
num_speakers=int(num_speakers) if num_speakers else None,
)
except Exception as e:
logger.error("Error enqueuing MCP job: %s", e)
_mcp_jobs[job_id] = {
"status": "error",
"message": f"Error enqueuing job: {e}",
"file_path": str(file_path),
}
return {
"job_id": job_id,
"status": "error",
"message": _mcp_jobs[job_id]["message"],
}
_mcp_jobs[job_id] = {
"status": "queued",
"message": "Job queued for processing.",
"file_path": str(file_path),
}
return {
"job_id": job_id,
"status": "queued",
"message": _mcp_jobs[job_id]["message"],
}
# Synchronous path
_mcp_jobs[job_id] = {
"status": "processing",
"message": "Transcription started (synchronous).",
"file_path": str(file_path),
}
def _run_sync():
try:
scraibe = Scraibe(verbose=False)
result = scraibe.transcribe(
audio_file=str(file_path),
language=language or None,
num_speakers=int(num_speakers) if num_speakers else None,
verbose=False,
for_export=True,
)
transcript_text = result.get("transcript", "")
segments = result.get("segments", [])
_mcp_jobs[job_id]["status"] = "completed"
_mcp_jobs[job_id]["transcript"] = transcript_text
_mcp_jobs[job_id]["segments"] = segments
_mcp_jobs[job_id]["message"] = "Transcription completed."
except Exception as e:
logger.error("MCP sync transcription error: %s", e)
_mcp_jobs[job_id]["status"] = "error"
_mcp_jobs[job_id]["message"] = f"Transcription error: {e}"
import threading
t = threading.Thread(target=_run_sync, daemon=True)
t.start()
return {
"job_id": job_id,
"status": "processing",
"message": _mcp_jobs[job_id]["message"],
}
@app.get("/transcribe/{job_id}/status")
async def get_status(job_id: str):
job = _mcp_jobs.get(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
return {
"job_id": job_id,
"status": job["status"],
"message": job.get("message", ""),
}
@app.get("/transcribe/{job_id}/json")
async def get_json(job_id: str):
job = _mcp_jobs.get(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job["status"] != "completed":
raise HTTPException(
status_code=400,
detail=f"Job not completed. Current status: {job['status']}",
)
transcript_text = job.get("transcript", "")
segments = job.get("segments", [])
return JSONResponse(
content={
"job_id": job_id,
"transcript": transcript_text,
"segments": segments,
}
)
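The job store above is a plain dict, so it only works when the code that writes results runs in the same process as the API. One possible way to share job state between the API process and a separate worker process is a small file-backed store. The sketch below is an assumption-laden illustration, not part of the source: `FileJobStore` is a hypothetical class, it assumes both processes can see the same directory, and its read-modify-write update is not locked against concurrent writers.

```python
import json
import os
import tempfile
from pathlib import Path


class FileJobStore:
    """Hypothetical job store: one JSON file per job in a shared directory."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, job_id: str) -> Path:
        return self.root / f"{job_id}.json"

    def get(self, job_id: str):
        """Return the job dict, or None if the job is unknown."""
        try:
            return json.loads(self._path(job_id).read_text(encoding="utf-8"))
        except FileNotFoundError:
            return None

    def update(self, job_id: str, **fields) -> dict:
        """Merge fields into the stored job and persist it."""
        job = self.get(job_id) or {}
        job.update(fields)
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(job, f)
        os.replace(tmp, self._path(job_id))
        return job
```

In this sketch the API would call `store.update(job_id, status="queued")` on upload and the worker would call `store.update(job_id, status="completed", transcript=...)` when done; the status endpoints would read with `store.get(job_id)`.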
+20
@@ -1,4 +1,5 @@
 import os
+import logging
 from argparse import Action
 from ast import literal_eval
@@ -13,6 +14,25 @@ PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
 PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
+def setup_logging(level: str = "INFO"):
+    """
+    Configure the root logger to write to stderr (the basicConfig default
+    stream) so Docker can capture logs.
+    Args:
+        level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
+            Unknown names fall back to INFO.
+    """
+    numeric_level = getattr(logging, level.upper(), logging.INFO)
+    if not isinstance(numeric_level, int):
+        numeric_level = logging.INFO
+    logging.basicConfig(
+        level=numeric_level,
+        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+        datefmt="%Y-%m-%dT%H:%M:%S%z",
+        force=True,
+    )
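The level lookup in `setup_logging` relies on two guards: `getattr` with a default for unknown names, and an `isinstance` check for names that exist in the `logging` module but are not levels. A minimal sketch isolating that logic is shown here; `resolve_level` is a hypothetical name used only for illustration.

```python
import logging


def resolve_level(level: str) -> int:
    """Map a level name to its numeric value, falling back to INFO."""
    numeric = getattr(logging, level.upper(), logging.INFO)
    # Some attributes of the logging module (e.g. BASIC_FORMAT) are not ints.
    return numeric if isinstance(numeric, int) else logging.INFO
```

Names are case-insensitive here, so `resolve_level("debug")` and `resolve_level("DEBUG")` resolve to the same level.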
 def set_threads(parse_threads=None, yaml_threads=None):
     """
     Configure number of threads.
+111 -29
@@ -6,8 +6,8 @@ Provides a client to summarize long transcripts via an LLM endpoint.
 Behavior:
 - Chunks transcript into 10,240-character segments.
-- Generates a summary for each chunk.
-- Combines all chunk summaries and produces a final, detailed summary.
+- Summarizes each chunk.
+- Summarizes the summaries into a final, detailed summary.
 Environment Variables:
 - SUMMARIZER_API_URL: (required) Base URL of the LLM API (e.g., http://localhost:8080)
@@ -17,10 +17,13 @@ Environment Variables:
 import os
 import json
+import logging
 from typing import Optional
 import httpx
+logger = logging.getLogger("scraibe.summarizer")
 class SummarizerError(Exception):
     """Raised when the summarization API call fails."""
@@ -40,7 +43,7 @@ class SummarizerClient:
         api_url: Optional[str] = None,
         api_key: Optional[str] = None,
         model: Optional[str] = None,
-        timeout: float = 600.0,
+        timeout: float = 3600.0,
     ):
         self.api_url = (api_url or os.getenv("SUMMARIZER_API_URL")).strip().rstrip("/")
         self.api_key = api_key or os.getenv("SUMMARIZER_API_KEY") or None
@@ -53,6 +56,12 @@ class SummarizerClient:
                 "Provide the summarization LLM URL via environment or constructor."
             )
+        logger.info(
+            "Initializing SummarizerClient: url=%s model=%s",
+            self.api_url,
+            self.model,
+        )
         self._client = httpx.Client(
             base_url=self.api_url,
             timeout=self.timeout,
@@ -84,21 +93,40 @@ class SummarizerClient:
         - Next steps / action items
         """
         if not transcript.strip():
+            logger.warning("Empty transcript provided to summarize_transcript.")
             return "No transcript provided to summarize."
+        logger.info(
+            "Starting summarization for transcript length=%d chars",
+            len(transcript),
+        )
         # 1) Chunk the transcript
         chunks = self._chunk_text(transcript)
+        logger.info("Split transcript into %d chunks.", len(chunks))
         # 2) Summarize each chunk
         chunk_summaries = []
         for i, chunk in enumerate(chunks):
+            logger.info(
+                "Summarizing chunk %d/%d (length=%d)",
+                i + 1,
+                len(chunks),
+                len(chunk),
+            )
             summary = self._summarize_chunk(chunk, i, len(chunks))
             chunk_summaries.append(summary)
         # 3) Combine and summarize summaries
         combined = "\n\n".join(chunk_summaries)
+        logger.info(
+            "Combining %d chunk summaries (total length=%d) for final summary.",
+            len(chunk_summaries),
+            len(combined),
+        )
         final_summary = self._summarize_combined(combined)
+        logger.info("Summarization completed.")
         return final_summary
     def _chunk_text(self, text: str) -> list[str]:
@@ -120,19 +148,76 @@ class SummarizerClient:
             start = break_pos
         return chunks
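The body of `_chunk_text` is not shown in this diff; only its tail (`start = break_pos`, `return chunks`) and the documented 10,240-character chunk size are visible. The following is a sketch of one way such chunking could work, assuming the break point is the last space inside each window. The function name `chunk_text`, the `max_chars` parameter, and the whitespace rule are all assumptions rather than the author's implementation.

```python
def chunk_text(text: str, max_chars: int = 10_240) -> list[str]:
    """Split text into chunks of at most max_chars, preferring to break at spaces."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer the last space inside the window so words stay intact.
            break_pos = text.rfind(" ", start, end)
            if break_pos <= start:
                # No usable space: fall back to a hard cut at the window edge.
                break_pos = end
        else:
            break_pos = end
        chunk = text[start:break_pos].strip()
        if chunk:
            chunks.append(chunk)
        start = break_pos
    return chunks
```

Each iteration advances `start` by at least one character, so the loop terminates even when the text contains no spaces at all.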
+    def _load_summary_prompt(self, role: str) -> str:
+        """
+        Load summary prompt for the given role: 'chunk' or 'combined'.
+        Priority:
+        1) SUMMARY_PROMPT_{ROLE} (env)
+        2) SUMMARY_PROMPT_FILE (env) with [chunk] / [combined] sections
+        3) Built-in default prompt
+        """
+        role_upper = role.upper()
+        # 1) Direct env var: SUMMARY_PROMPT_CHUNK / SUMMARY_PROMPT_COMBINED
+        env_key = f"SUMMARY_PROMPT_{role_upper}"
+        env_prompt = (os.getenv(env_key) or "").strip()
+        if env_prompt:
+            return env_prompt
+        # 2) File-based prompt with sections
+        prompt_file = (os.getenv("SUMMARY_PROMPT_FILE") or "").strip()
+        if prompt_file and os.path.exists(prompt_file):
+            try:
+                with open(prompt_file, "r", encoding="utf-8") as f:
+                    content = f.read()
+                # Simple section parser: [chunk], [combined]
+                import re
+                pattern = re.compile(
+                    r"\[" + role + r"\]\s*\n(.*?)(?=\n\[|$)",
+                    re.DOTALL,
+                )
+                m = pattern.search(content)
+                if m:
+                    text = m.group(1).strip()
+                    if text:
+                        return text
+            except Exception as e:
+                logger.warning("Failed to load SUMMARY_PROMPT_FILE for %s: %s", role, e)
+        # 3) Default prompts
+        if role == "chunk":
+            return (
+                "You are an expert legal and business meeting summarizer. "
+                "You will receive a segment of a longer transcript. "
+                "Provide a detailed, structured summary of this segment, focusing on: "
+                "- Topics discussed\n"
+                "- Key points and arguments\n"
+                "- Decisions and agreements\n"
+                "- Action items and responsibilities\n"
+                "- Any risks, conflicts, or open issues\n\n"
+                "Be concise but complete. Use bullet points where helpful. "
+                "Do not add information that is not present in the transcript."
+            )
+        else:
+            return (
+                "You are an expert legal and business meeting summarizer. "
+                "You will receive several intermediate summaries of a longer conversation. "
+                "Produce a single, comprehensive summary that makes it clear: "
+                "- The overall purpose and context of the discussion\n"
+                "- The main issues and topics addressed\n"
+                "- Key arguments and positions (briefly)\n"
+                "- Decisions and outcomes\n"
+                "- Action items, responsibilities, and next steps\n"
+                "- Any unresolved issues or risks\n\n"
+                "The summary should be detailed enough that a reader who was not present "
+                "can understand what happened and what is expected going forward. "
+                "Use clear, concise language and bullet points where appropriate. "
+                "Use markdown formatting (headings, lists, bold) to structure the summary."
+            )
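The prompt loader above reads `[chunk]` and `[combined]` sections from the file named by SUMMARY_PROMPT_FILE, but the diff includes no example of that file. The sketch below shows what such a file might look like and how the same regular expression extracts a section. The file contents and the helper name `read_prompt_section` are illustrative assumptions, and `re.escape` is added here although the source concatenates the role directly.

```python
import re

# Hypothetical contents of a SUMMARY_PROMPT_FILE.
EXAMPLE_PROMPT_FILE = """\
[chunk]
Summarize this segment.
Use bullet points.

[combined]
Merge the summaries into one report.
"""


def read_prompt_section(content: str, role: str):
    """Return the text under [role], or None if the section is missing."""
    pattern = re.compile(
        r"\[" + re.escape(role) + r"\]\s*\n(.*?)(?=\n\[|$)",
        re.DOTALL,
    )
    m = pattern.search(content)
    return m.group(1).strip() if m else None
```

Because a section ends at the next line that starts with `[`, a prompt line beginning with a bracket would cut its own section short under this pattern.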
     def _summarize_chunk(self, chunk: str, index: int, total: int) -> str:
-        system_prompt = (
-            "You are an expert legal and business meeting summarizer. "
-            "You will receive a segment of a longer transcript. "
-            "Provide a detailed, structured summary of this segment, focusing on: "
-            "- Topics discussed\n"
-            "- Key points and arguments\n"
-            "- Decisions and agreements\n"
-            "- Action items and responsibilities\n"
-            "- Any risks, conflicts, or open issues\n\n"
-            "Be concise but complete. Use bullet points when helpful. "
-            "Do not add information that is not present in the transcript."
-        )
+        system_prompt = self._load_summary_prompt("chunk")
         user_prompt = (
             f"This is segment {index + 1} of {total} from a longer conversation.\n\n"
@@ -142,20 +227,7 @@ class SummarizerClient:
         return self._chat_completion(system_prompt, user_prompt)
     def _summarize_combined(self, combined_summaries: str) -> str:
-        system_prompt = (
-            "You are an expert legal and business meeting summarizer. "
-            "You will receive several intermediate summaries of a longer conversation. "
-            "Produce a single, comprehensive summary that makes it clear: "
-            "- The overall purpose and context of the discussion\n"
-            "- The main issues and topics addressed\n"
-            "- Key arguments and positions (briefly)\n"
-            "- Decisions and outcomes\n"
-            "- Action items, responsibilities, and next steps\n"
-            "- Any unresolved issues or risks\n\n"
-            "The summary should be detailed enough that a reader who was not present "
-            "can understand what happened and what is expected going forward. "
-            "Use clear, concise language and bullet points where appropriate."
-        )
+        system_prompt = self._load_summary_prompt("combined")
         user_prompt = (
             "Here are the intermediate summaries from different parts of the same conversation:\n\n"
@@ -183,13 +255,18 @@ class SummarizerClient:
         if self.api_key:
             headers["Authorization"] = f"Bearer {self.api_key}"
+        logger.info("Calling summarizer endpoint: /v1/chat/completions")
         resp = self._client.post(
             "/v1/chat/completions",
             json=payload,
             headers=headers,
         )
+        logger.info("Summarizer response status: %d", resp.status_code)
         if resp.status_code >= 400:
+            logger.error("Summarizer error response: %s", resp.text)
             raise SummarizerError(
                 f"Summarizer API error {resp.status_code}: {resp.text}"
             )
@@ -197,6 +274,7 @@ class SummarizerClient:
         try:
             data = resp.json()
         except json.JSONDecodeError:
+            logger.error("Failed to parse summarizer response as JSON.")
             raise SummarizerError(
                 "Failed to parse summarizer response as JSON."
             )
@@ -206,6 +284,10 @@ class SummarizerClient:
             content = data["choices"][0]["message"]["content"]
             return content.strip()
         except (KeyError, IndexError, TypeError):
+            logger.error(
+                "Unexpected summarizer response format: %s",
+                json.dumps(data, indent=2),
+            )
             raise SummarizerError(
                 "Unexpected summarizer response format: "
                 f"{json.dumps(data, indent=2)}"
+713
@@ -0,0 +1,713 @@
"""
Celery tasks for async transcription, diarization, and email notifications.
"""
import os
import json
import logging
import tempfile
from datetime import datetime
from .celery_app import celery_app
from .autotranscript import Scraibe
from .summarizer import SummarizerClient, SummarizerError
from .misc import setup_logging
from .email_sender import send_email, EmailError, load_template
from .email_sender import create_transcript_docx, create_summary_docx
logger = logging.getLogger("scraibe.tasks")
def _local_part(email: str) -> str:
"""
Extract the part before '@' from an email, sanitized for filenames.
"""
local = (email or "").split("@")[0].strip()
local = "".join(ch if ch.isalnum() or ch in ("-", "_", ".") else "_" for ch in local)
return local or "user"
def _date_tag() -> str:
"""
Date tag in DD-MON-YYYY format (e.g. 01-JAN-2025).
"""
return datetime.utcnow().strftime("%d-%b-%Y").upper()
def _safe_filename(base: str, local: str, date_tag: str, ext: str) -> str:
    """
    Create an empty temp file whose name follows the requested logical pattern.
    Uses mkstemp rather than the deprecated, race-prone mktemp, so the file
    exists (and is reserved) before the caller writes to it.
    """
    name = f"{base}-{local}-{date_tag}{ext}"
    fd, path = tempfile.mkstemp(prefix=name.replace(".", ""), suffix=ext)
    os.close(fd)
    return path
def _remove_file(path: str):
"""
Remove a file if it exists. Best-effort; logs but never raises.
"""
if not path:
return
try:
if os.path.exists(path):
os.remove(path)
except Exception as e:
logger.warning("Failed to remove file %s: %s", path, e)
def _get_subject(env_var: str, default: str) -> str:
"""
Safely read an email subject from an environment variable.
Uses default if unset or blank. Logs the final value.
"""
value = (os.getenv(env_var) or "").strip()
subject = value or default
logger.info("Email subject [%s] = %s", env_var, subject)
return subject
def get_queue_position(task_id: str) -> int:
"""
Estimate the job's position in the queue.
Returns:
- A positive int if we can estimate (1 = first in line).
- 0 if we cannot reliably determine position.
"""
try:
inspect = celery_app.control.inspect()
reserved = inspect.reserved() or {} # queued but not yet running
active = inspect.active() or {} # currently running
# Count tasks ahead of this one in the reserved (waiting) queue
ahead = 0
found = False
for tasks in reserved.values():  # each value is one worker's list of task dicts
for t in tasks:
tid = t.get("id")
if tid == task_id:
found = True
break
ahead += 1
if found:
break
# If not found in reserved, it may already be active or not yet visible.
# In that case, treat it as position 1.
if found:
return max(ahead + 1, 1)
else:
return 1
except Exception:
# If inspection fails, don't guess; caller should use a safe message.
return 0
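The counting logic in `get_queue_position` can be tried without a running broker. This sketch assumes the shape Celery's `inspect().reserved()` returns: a mapping of worker name to a list of task dicts that each carry an `id`. The function name `position_in_reserved` and the sample worker names are hypothetical.

```python
def position_in_reserved(reserved: dict, task_id: str) -> int:
    """Return a 1-based queue position estimate for task_id.

    Tasks are counted in the order workers and their task lists are
    iterated, so across several workers this is only an approximation.
    """
    ahead = 0
    for tasks in reserved.values():
        for t in tasks:
            if t.get("id") == task_id:
                return ahead + 1
            ahead += 1
    # Not found: the task may already be running, so treat it as first.
    return 1


sample = {
    "worker1@host": [{"id": "a"}, {"id": "b"}],
    "worker2@host": [{"id": "c"}],
}
```

With the sample data, task `"b"` has one task ahead of it and task `"c"` has two.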
def send_initial_email(to: str, queue_pos: int):
"""
Send initial confirmation email with queue position.
Subject is customizable via EMAIL_SUBJECT_UPLOAD.
"""
subject = _get_subject(
"EMAIL_SUBJECT_UPLOAD",
"ScrAIbe: Your transcription request has been received",
)
body = (
"Hello,\n\n"
"We have received your audio file for transcription.\n"
)
if queue_pos > 0:
body += f"Your request is currently number {queue_pos} in the queue.\n"
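# _accent_color() is neither defined nor imported in this module, so the
# inline color is omitted here to avoid a NameError.
queue_position_display = (
f'<span style="font-weight:bold;">{queue_pos}</span>'
)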
else:
body += "Your request has been queued for processing.\n"
queue_position_display = "the queue"
body += (
"\n"
"You will receive an email with your transcript (and summary, if requested) "
"once processing is complete.\n\n"
"If you have any questions, contact us at "
f"{os.getenv('EMAIL_CONTACT_ADDRESS', 'support@example.com')}.\n\n"
"This is an automated message from ScrAIbe.\n"
)
html = None
try:
html = load_template(
"upload_notification_template.html",
queue_position_text=queue_position_display,
)
except EmailError as e:
logger.warning("Failed to render upload notification template: %s", e)
try:
send_email(to=to, subject=subject, body=body, html=html, attachments=[])
logger.info("Initial confirmation email sent to %s", to)
except EmailError as e:
logger.error("Failed to send initial email to %s: %s", to, e)
def send_success_email(
to: str,
transcript_text: str,
summary_text: str,
attachments: list,
task_id: str,
):
"""
Send final email with transcript and attachments.
Subject is customizable via EMAIL_SUBJECT_SUCCESS.
Falls back to a safe default if the env var is missing or blank.
"""
subject = _get_subject(
"EMAIL_SUBJECT_SUCCESS",
"ScrAIbe: Your transcript is ready",
)
body = (
"Hello,\n\n"
"Your transcription is ready.\n\n"
"Please find the transcript and JSON files attached.\n"
)
if summary_text:
body += (
"\n"
"SUMMARY\n"
"-------\n"
f"{summary_text}\n"
)
body += (
"\n"
"Job ID: " + str(task_id) + "\n\n"
"If you have any questions, contact us at "
f"{os.getenv('EMAIL_CONTACT_ADDRESS', 'support@example.com')}.\n\n"
"This is an automated message from ScrAIbe.\n"
)
html = None
try:
html = load_template("success_template.html")
except EmailError as e:
logger.warning("Failed to render success template: %s", e)
try:
send_email(
to=to,
subject=subject,
body=body,
html=html,
attachments=attachments,
)
logger.info("Success email sent to %s for job %s with subject: %s", to, task_id, subject)
except EmailError as e:
logger.error("Failed to send success email to %s for job %s: %s", to, task_id, e)
def send_error_email(to: str, error_message: str, task_id: str):
"""
Send error notification email.
Subject is customizable via EMAIL_SUBJECT_ERROR.
"""
subject = _get_subject(
"EMAIL_SUBJECT_ERROR",
"ScrAIbe: Error with your transcription request",
)
body = (
"Hello,\n\n"
"We encountered an error while processing your transcription request.\n\n"
f"Details: {error_message}\n\n"
"Job ID: " + str(task_id) + "\n\n"
"Please contact your administrator if the problem persists.\n\n"
"If you have any questions, contact us at "
f"{os.getenv('EMAIL_CONTACT_ADDRESS', 'support@example.com')}.\n\n"
"This is an automated message from ScrAIbe.\n"
)
html = None
try:
html = load_template(
"error_notification_template.html",
exception=str(error_message),
)
except EmailError as e:
logger.warning("Failed to render error template: %s", e)
try:
send_email(to=to, subject=subject, body=body, html=html, attachments=[])
logger.info("Error email sent to %s for job %s", to, task_id)
except EmailError as e:
logger.error("Failed to send error email to %s for job %s: %s", to, task_id, e)
@celery_app.task(
name="scraibe.tasks.process_transcription_task",
bind=True,
max_retries=1,
time_limit=14400, # hard limit: 4 hours
soft_time_limit=13500, # soft limit: SoftTimeLimitExceeded raised at 3h45m
)
def process_transcription_task(
self,
audio_path: str,
task_type: str,
language: str,
num_speakers: int,
email_to: str,
email_cc: str,
include_summary: bool,
identify_speakers: bool = False,
):
"""
Async task: transcribe audio, optionally summarize, then email results.
Cleans up temporary files after completion.
"""
task_id = self.request.id
log_level = os.getenv("LOG_LEVEL", "INFO")
setup_logging(level=log_level)
temp_files = []
local = _local_part(email_to)
date_tag = _date_tag()
try:
# 1) Queue position and initial email
queue_pos = get_queue_position(task_id)
send_initial_email(to=email_to, queue_pos=queue_pos)
# 2) Initialize Scraibe
try:
scraibe = Scraibe(verbose=True)
except Exception as e:
send_error_email(
to=email_to,
error_message=f"Failed to initialize transcription service: {e}",
task_id=task_id,
)
raise
# 3) Transcription
if task_type == "transcript_and_summarize":
result = scraibe.transcript_and_summarize(
audio_file=audio_path,
language=language or None,
num_speakers=int(num_speakers) if num_speakers else None,
verbose=True,
for_export=True,
)
transcript_text = result.get("transcript", "")
summary_text = result.get("summary", "")
segments = result.get("segments", [])
raw_result = result.get("raw_result")
else:
result = scraibe.transcribe(
audio_file=audio_path,
language=language or None,
num_speakers=int(num_speakers) if num_speakers else None,
verbose=True,
for_export=True,
)
transcript_text = result.get("transcript", "")
summary_text = ""
segments = result.get("segments", [])
raw_result = result.get("raw_result")
# 3b) Optional speaker identification
speaker_map = {} # e.g. {"SPEAKER 1": "John", "SPEAKER 2": "Maria"}
if identify_speakers:
try:
# Use the same summarizer client as transcript_and_summarize
scraibe._ensure_summarizer()
summarizer = scraibe._summarizer
prompt = (
"Below is a transcript with speaker labels like 'SPEAKER 1', 'SPEAKER 2', etc. "
"Based on the context and how each speaker talks, identify each speaker as:\n"
"- Their real name, if it is clearly mentioned or strongly implied, OR\n"
"- A concise role/position (e.g., Judge, Doctor, Manager, Interviewer, Client, Witness), "
"if their identity is not clear.\n"
"Do not invent random personal names. "
"Do not add extra commentary. Output ONLY a mapping in this exact format, one per line:\n"
"SPEAKER 1: Name or Role\n"
"SPEAKER 2: Name or Role\n"
"SPEAKER 3: Name or Role\n"
"\n"
"Transcript:\n"
+ transcript_text
)
# _chat_completion takes (system_prompt, user_prompt) and returns the reply
# text as a str, so the result is used directly.
reply = summarizer._chat_completion(
"You identify the speakers in a transcript.",
prompt,
)
# Parse mapping
import re
for m in re.finditer(
r"SPEAKER\s+(\d+)\s*:\s*(.+)",
reply,
re.IGNORECASE,
):
spk = f"SPEAKER {m.group(1).strip()}"
name = m.group(2).strip().rstrip(".").upper()
if name:
speaker_map[spk] = name
logger.info("Speaker identification mapping: %s", speaker_map)
# Apply mapping to transcript text
if speaker_map:
def replace_speaker(m):
label = m.group(0).strip()
# normalize to "SPEAKER N"
normalized = re.sub(
r"\s+",
" ",
re.sub(r"[^A-Z0-9\s]", "", label.upper()),
).strip()
return speaker_map.get(normalized, label)
# Replace in lines like "[00:12] SPEAKER 1:" but preserve timestamp and colon
def replace_in_line(line: str) -> str:
# match after timestamp bracket and space: "SPEAKER N:"
return re.sub(
# Labels such as "SPEAKER 1" contain digits, so the class must include 0-9.
r"(\[\d+:\d+(?::\d+)?\]\s*)([A-Z0-9\s]+?):\s*",
lambda m: m.group(1) + (speaker_map.get(m.group(2).strip(), m.group(2)) + ": "),
line,
)
transcript_lines = transcript_text.splitlines()
transcript_text = "\n".join(
replace_in_line(line) for line in transcript_lines
)
# Also update segments for JSON export
updated_segments = []
for seg in segments:
sp = (seg.get("speaker") or "").strip()
sp_norm = re.sub(r"[^A-Z0-9\s]", "", sp.upper()).strip()
sp_new = speaker_map.get(sp_norm, sp)
seg = dict(seg)
seg["speaker"] = sp_new
updated_segments.append(seg)
segments = updated_segments
except (SummarizerError, Exception) as e:
logger.warning(
"Speaker identification failed; falling back to Speaker IDs: %s", e
)
speaker_map = {}
# 4) Prepare files
# Transcript .md
md_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".md")
with open(md_transcript_path, "w", encoding="utf-8") as f:
f.write("# Transcript\n\n")
f.write(transcript_text)
temp_files.append(md_transcript_path)
# Transcript .docx (standalone, no cover page)
docx_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".docx")
create_transcript_docx(
transcript_text,
docx_transcript_path,
)
temp_files.append(docx_transcript_path)
# JSON as SOURCE
json_data = {
"task": task_type,
"transcript": transcript_text,
"segments": segments,
"metadata": {
"timestamp": datetime.utcnow().isoformat(),
"job_id": task_id,
},
}
if summary_text:
json_data["summary"] = summary_text
if raw_result is not None:
json_data["raw_result"] = raw_result
json_path = _safe_filename("SOURCE", local, date_tag, ".json")
with open(json_path, "w", encoding="utf-8") as f:
json.dump(json_data, f, indent=2, ensure_ascii=False)
temp_files.append(json_path)
# Summary files (if present)
md_summary_path = None
docx_summary_path = None
if summary_text:
# Summary .md
md_summary_path = _safe_filename("SUMMARY", local, date_tag, ".md")
with open(md_summary_path, "w", encoding="utf-8") as f:
f.write("# Summary\n\n")
f.write(summary_text)
temp_files.append(md_summary_path)
# Summary .docx (standalone, no cover page)
docx_summary_path = _safe_filename("SUMMARY", local, date_tag, ".docx")
create_summary_docx(
summary_text,
docx_summary_path,
)
temp_files.append(docx_summary_path)
# 5) Build attachments list
# Always: JSON, transcript MD, transcript DOCX
attachments = [
md_transcript_path,
docx_transcript_path,
json_path,
]
# If summary is present, add summary MD and DOCX
if summary_text:
attachments += [md_summary_path, docx_summary_path]
# 6) Send success email
send_success_email(
to=email_to,
transcript_text=transcript_text,
summary_text=summary_text if include_summary else "",
attachments=attachments,
task_id=task_id,
)
logger.info("Job %s completed successfully.", task_id)
except Exception as e:
logger.error("Error processing job %s: %s", task_id, e, exc_info=True)
send_error_email(
to=email_to,
error_message=str(e),
task_id=task_id,
)
raise e
finally:
# 7) Cleanup
for path in temp_files:
_remove_file(path)
if audio_path:
_remove_file(audio_path)
logger.info("Cleanup completed for job %s.", task_id)
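The speaker-identification step in the task above has two parts: parsing the model's `SPEAKER N: Name` reply into a mapping, then rewriting the labels in the transcript. The sketch below isolates both so they can be tested without an LLM call. The function names are hypothetical, and the transcript line format `[MM:SS] SPEAKER N: text` is assumed from the comment in the source.

```python
import re


def parse_speaker_map(reply: str) -> dict:
    """Parse lines like 'SPEAKER 1: Judge' into {'SPEAKER 1': 'JUDGE'}."""
    mapping = {}
    for m in re.finditer(r"SPEAKER\s+(\d+)\s*:\s*(.+)", reply, re.IGNORECASE):
        name = m.group(2).strip().rstrip(".").upper()
        if name:
            mapping[f"SPEAKER {m.group(1)}"] = name
    return mapping


def apply_speaker_map(transcript: str, mapping: dict) -> str:
    """Replace 'SPEAKER N' labels that follow a [timestamp] with mapped names."""
    pattern = re.compile(r"(\[\d+:\d+(?::\d+)?\]\s*)(SPEAKER\s+\d+)(\s*:)")

    def _sub(m):
        label = re.sub(r"\s+", " ", m.group(2))
        # Unmapped speakers keep their original label.
        return m.group(1) + mapping.get(label, m.group(2)) + m.group(3)

    return "\n".join(pattern.sub(_sub, line) for line in transcript.splitlines())
```

Matching the label explicitly as `SPEAKER` followed by digits keeps the substitution from touching other capitalized text after a timestamp.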
@celery_app.task(
name="scraibe.tasks.process_mcp_transcribe_task",
bind=True,
max_retries=1,
time_limit=14400,
soft_time_limit=13500,
)
def process_mcp_transcribe_task(
self,
audio_path: str,
job_id: str,
language: str,
num_speakers: int,
):
"""
Async task used by MCP-style API:
- Transcribe audio
- Store transcript + segments in shared MCP job store
- Clean up temporary file
"""
from .mcp_server import _mcp_jobs
log_level = os.getenv("LOG_LEVEL", "INFO")
setup_logging(level=log_level)
# Initialize status
_mcp_jobs.setdefault(
job_id,
{
"status": "processing",
"message": "Transcription started (async).",
"file_path": audio_path,
},
)
try:
scraibe = Scraibe(verbose=True)
result = scraibe.transcribe(
audio_file=audio_path,
language=language or None,
num_speakers=int(num_speakers) if num_speakers else None,
verbose=True,
for_export=True,
)
transcript_text = result.get("transcript", "")
segments = result.get("segments", [])
_mcp_jobs[job_id]["status"] = "completed"
_mcp_jobs[job_id]["transcript"] = transcript_text
_mcp_jobs[job_id]["segments"] = segments
_mcp_jobs[job_id]["message"] = "Transcription completed."
logger.info("MCP job %s completed.", job_id)
except Exception as e:
logger.error("MCP job %s failed: %s", job_id, e, exc_info=True)
_mcp_jobs[job_id]["status"] = "error"
_mcp_jobs[job_id]["message"] = f"Transcription error: {e}"
finally:
_remove_file(audio_path)
logger.info("MCP job %s cleanup completed.", job_id)
@celery_app.task(
name="scraibe.tasks.process_watch_file_task",
bind=True,
max_retries=1,
time_limit=14400,
soft_time_limit=13500,
)
def process_watch_file_task(
self,
file_path: str,
):
"""
Async task for watch-folder mode:
- Transcribe + summarize
- Email results
- Optionally delete source file
"""
task_id = self.request.id
log_level = os.getenv("LOG_LEVEL", "INFO")
setup_logging(level=log_level)
email_to = os.getenv("WATCH_EMAIL_TO") or os.getenv("EMAIL_DEFAULT_TO")
if not email_to:
logger.error("No email address configured for watch-folder mode.")
raise RuntimeError("WATCH_EMAIL_TO or EMAIL_DEFAULT_TO not set.")
delete_on_success = os.getenv("WATCH_DELETE_ON_SUCCESS", "true").strip().lower() in ("true", "1", "yes")
temp_files = []
local = "watch"
date_tag = _date_tag()
try:
scraibe = Scraibe(verbose=True)
result = scraibe.transcript_and_summarize(
audio_file=file_path,
language=None,
num_speakers=None,
verbose=True,
for_export=True,
)
transcript_text = result.get("transcript", "")
summary_text = result.get("summary", "")
segments = result.get("segments", [])
raw_result = result.get("raw_result")
# Transcript .md
md_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".md")
with open(md_transcript_path, "w", encoding="utf-8") as f:
f.write("# Transcript\n\n")
f.write(transcript_text)
temp_files.append(md_transcript_path)
# Transcript .docx
docx_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".docx")
create_transcript_docx(
transcript_text,
docx_transcript_path,
)
temp_files.append(docx_transcript_path)
# Summary .md
md_summary_path = _safe_filename("SUMMARY", local, date_tag, ".md")
with open(md_summary_path, "w", encoding="utf-8") as f:
f.write("# Summary\n\n")
f.write(summary_text)
temp_files.append(md_summary_path)
# Summary .docx
docx_summary_path = _safe_filename("SUMMARY", local, date_tag, ".docx")
create_summary_docx(
summary_text,
docx_summary_path,
)
temp_files.append(docx_summary_path)
# JSON as SOURCE
json_data = {
"task": "watch_transcript_and_summarize",
"transcript": transcript_text,
"summary": summary_text,
"segments": segments,
"metadata": {
"timestamp": datetime.utcnow().isoformat(),
"job_id": task_id,
"source_file": file_path,
},
}
if raw_result is not None:
json_data["raw_result"] = raw_result
json_path = _safe_filename("SOURCE", local, date_tag, ".json")
with open(json_path, "w", encoding="utf-8") as f:
json.dump(json_data, f, indent=2, ensure_ascii=False)
temp_files.append(json_path)
# Attachments
attachments = [
md_transcript_path,
docx_transcript_path,
md_summary_path,
docx_summary_path,
json_path,
]
# Send email
send_success_email(
to=email_to,
transcript_text=transcript_text,
summary_text=summary_text,
attachments=attachments,
task_id=task_id,
)
logger.info("Watch-folder job %s completed for %s.", task_id, file_path)
# Delete source file if configured
if delete_on_success and os.path.exists(file_path):
try:
os.remove(file_path)
logger.info("Deleted source file: %s", file_path)
except Exception as e:
logger.warning("Failed to delete source file %s: %s", file_path, e)
except Exception as e:
logger.error("Error processing watch file %s: %s", file_path, e, exc_info=True)
send_error_email(
to=email_to,
error_message=str(e),
task_id=task_id,
)
raise e
finally:
# Cleanup temp files
for path in temp_files:
_remove_file(path)
logger.info("Watch-folder job %s cleanup completed.", task_id)
+100
@@ -0,0 +1,100 @@
"""
Watch-folder mode for ScrAIbe.
Monitors a folder for audio files. For each file:
- Transcribes + summarizes
- Emails results
- Deletes source file on success (unless WATCH_DELETE_ON_SUCCESS is false)
Configuration (env):
- WATCH_ENABLED: "true"/"false" (default: false)
- WATCH_DIR: directory to watch (required if enabled)
- WATCH_EMAIL_TO: destination email (required if enabled)
- WATCH_POLL_INTERVAL: seconds between scans (default: 10)
- WATCH_DELETE_ON_SUCCESS: "true"/"false" (default: true)
"""
import os
import time
import logging
import threading
from pathlib import Path
logger = logging.getLogger("scraibe.watcher")
AUDIO_EXTENSIONS = {
".wav",
".mp3",
".flac",
".m4a",
".ogg",
".webm",
".mp4",
}
def _is_audio(path: Path) -> bool:
return path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
def _enqueue_file(file_path: Path):
"""
Enqueue a file for transcription + summarization via Celery.
"""
from .tasks import process_watch_file_task
try:
process_watch_file_task.delay(str(file_path))
except Exception as e:
logger.error("Failed to enqueue watch file %s: %s", file_path, e)
def _scan_directory(watch_dir: Path):
"""
Scan directory and enqueue all audio files.
"""
if not watch_dir.is_dir():
logger.warning("WATCH_DIR does not exist or is not a directory: %s", watch_dir)
return
for p in watch_dir.iterdir():
if _is_audio(p):
logger.info("Found audio file in WATCH_DIR: %s", p)
_enqueue_file(p)
def start_watcher():
"""
Start watch-folder loop in a background thread.
"""
enabled = os.getenv("WATCH_ENABLED", "false").strip().lower() in ("true", "1", "yes")
if not enabled:
return
watch_dir = os.getenv("WATCH_DIR")
if not watch_dir:
logger.warning("WATCH_ENABLED is true but WATCH_DIR is not set. Watcher disabled.")
return
email_to = os.getenv("WATCH_EMAIL_TO")
if not email_to:
logger.warning("WATCH_ENABLED is true but WATCH_EMAIL_TO is not set. Watcher disabled.")
return
interval = float(os.getenv("WATCH_POLL_INTERVAL", "10"))
watch_path = Path(watch_dir).expanduser().resolve()
watch_path.mkdir(parents=True, exist_ok=True)
logger.info("Starting watch-folder: dir=%s, email=%s, interval=%s", watch_dir, email_to, interval)
def _loop():
while True:
try:
_scan_directory(watch_path)
except Exception as e:
logger.error("Error scanning WATCH_DIR: %s", e)
time.sleep(interval)
t = threading.Thread(target=_loop, daemon=True)
t.start()
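Review note: as far as this diff shows, `_scan_directory` enqueues every audio file it finds on each poll, so a file that takes longer than `WATCH_POLL_INTERVAL` to process could be enqueued more than once before the task deletes it. One possible guard is to remember which paths have already been handed off. The sketch below is a minimal in-memory version; `OnceOnlyScanner` is a hypothetical helper, not part of the module, and its memory is lost on restart.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm", ".mp4"}


class OnceOnlyScanner:
    """Hypothetical helper: enqueue each audio file at most once per process."""

    def __init__(self, enqueue):
        self._enqueue = enqueue  # callable that accepts a Path
        self._seen = set()       # resolved paths already handed off

    def scan(self, watch_dir: Path) -> int:
        """Enqueue unseen audio files and return how many were enqueued."""
        # Forget files that have disappeared (for example, deleted after success).
        self._seen = {p for p in self._seen if p.exists()}
        count = 0
        for p in sorted(watch_dir.iterdir()):
            if not (p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS):
                continue
            key = p.resolve()
            if key in self._seen:
                continue
            self._seen.add(key)
            self._enqueue(p)
            count += 1
        return count
```

With this shape, the polling loop would call `scanner.scan(watch_path)` instead of `_scan_directory(watch_path)`, passing a callable that wraps `process_watch_file_task.delay`.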
+338
@@ -0,0 +1,338 @@
"""
ScrAIbe Web GUI (Gradio) - Async Mode
-------------------------------------
Runs the Web GUI that:
- Accepts audio uploads
- Enqueues transcription jobs asynchronously via Celery
- Backend worker:
- Transcribes (with diarization)
- Optionally summarizes
- Emails the user:
- Immediately: confirmation + queue position
- On success: transcript + JSON (+ summary if requested)
- On error: error details
This is the default entrypoint when running in Docker.
"""
import os
import logging
import shutil
from datetime import datetime
import gradio as gr
from .misc import setup_logging
logger = logging.getLogger("scraibe.webui")
def load_config():
"""
Load configuration from misc/config.yaml if present.
Primary runtime configuration is via environment variables.
"""
config_path = os.getenv("SCRAIBE_CONFIG", "/app/src/misc/config.yaml")
config = {}
if os.path.exists(config_path):
try:
import yaml
with open(config_path, "r", encoding="utf-8") as f:
config = yaml.safe_load(f) or {}
except Exception as e:
logger.warning("Failed to load config from %s: %s", config_path, e)
return config
def load_html_template(path: str, **kwargs) -> str:
"""
Load an HTML template and fill placeholders.
"""
if not os.path.exists(path):
return ""
with open(path, "r", encoding="utf-8") as f:
template = f.read()
try:
return template.format(**kwargs)
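except (KeyError, IndexError, ValueError):
# Unknown placeholder or literal braces (e.g. inline CSS): fall back to the raw template.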
return template
def create_app():
"""
Create and launch the Gradio Web GUI (async mode).
"""
# Logging
log_level = os.getenv("LOG_LEVEL", "INFO")
setup_logging(level=log_level)
# Load config (branding, layout, etc.)
config = load_config()
layout_cfg = config.get("layout", {})
launch_cfg = config.get("launch", {})
logger.info("Starting ScrAIbe Web GUI (async mode).")
# Ensure upload directory exists
upload_dir = os.getenv("SCRAIBE_UPLOAD_DIR", "/tmp/scraibe_uploads")
os.makedirs(upload_dir, exist_ok=True)
# Paths for assets
header_path = layout_cfg.get("header", "/app/src/misc/header.html")
footer_path = layout_cfg.get("footer", "/app/src/misc/footer.html")
# Configurable title, logo URL, and accent color via environment
webui_title = os.getenv("WEBUI_TITLE", "A.P.Strom Transcription")
logo_url = os.getenv("WEBUI_LOGO_URL", "https://apstrom.ca")
accent_color = os.getenv("EMAIL_ACCENT_COLOR", "#7C6DA0")
# Prepare header HTML with logo URL and accent color
header_html = ""
if os.path.exists(header_path):
header_html = load_html_template(
header_path,
webui_title=webui_title,
header_logo_url=logo_url,
header_logo_src=logo_url,
accent_color=accent_color,
)
# Prepare footer HTML with accent color
footer_html = ""
if os.path.exists(footer_path):
version = os.getenv("SCRAIBE_VERSION", "0.1.1.dev")
footer_html = load_html_template(
footer_path,
footer_scraibe_webui_version=version,
accent_color=accent_color,
)
# Build Gradio interface
with gr.Blocks(
title=webui_title,
css="""
/* Responsive layout: stack columns on smaller screens */
@media (max-width: 850px) {
.gradio-container {
max-width: 100% !important;
}
#main-row .gr-row {
flex-direction: column !important;
}
#main-row .gr-col {
width: 100% !important;
max-width: 100% !important;
flex: none !important;
}
}
""",
) as app:
# Header
if header_html:
gr.HTML(header_html)
# Single-column layout: inputs followed by status/output
with gr.Column():
audio_input = gr.Audio(
label="Upload or record audio",
type="filepath",
)
task_choice = gr.Radio(
choices=[
("Transcribe", "transcribe"),
("Transcribe & summarize", "transcript_and_summarize"),
],
value="transcribe",
label="Task",
container=True,
)
identify_speakers = gr.Checkbox(
label="Identify speakers (best effort using AI)",
value=True,
info="If enabled, AI will attempt to infer real names for speakers and replace Speaker 1/2/etc. in the transcript.",
)
email_to = gr.Textbox(
label="Your email address (required)",
placeholder="e.g. your.name@example.com",
)
email_cc = gr.Textbox(
label="CC (optional, comma-separated)",
placeholder="e.g. manager@example.com",
)
submit_btn = gr.Button("Submit for transcription", variant="primary")
status_text = gr.Textbox(
label="Status",
lines=6,
interactive=False,
)
# Footer
if footer_html:
gr.HTML(footer_html)
# Events
def on_task_change(value):
# No special UI changes needed; both modes handled in backend
return
task_choice.change(
fn=on_task_change,
inputs=[task_choice],
outputs=[],
)
def on_submit(
audio,
task,
email_to_val,
email_cc_val,
identify_speakers_val,
):
if not audio:
return "Please upload or record audio."
email_to_val = (email_to_val or "").strip()
if not email_to_val:
return "Please enter your email address."
# Copy uploaded file to a stable location
try:
ext = os.path.splitext(audio)[1] or ".wav"
ts = datetime.utcnow().strftime("%Y%m%d%H%M%S%f")
new_name = f"upload_{ts}{ext}"
dest_path = os.path.join(upload_dir, new_name)
shutil.copy2(audio, dest_path)
except Exception as e:
logger.error("Error copying uploaded file: %s", e)
return f"Error saving your file: {e}"
# Import Celery task
try:
from .tasks import process_transcription_task
except ImportError:
return (
"Error: async processing is not available (Celery not configured)."
)
# Enqueue transcription job
try:
task_result = process_transcription_task.delay(
audio_path=dest_path,
task_type=task,
language=None,
num_speakers=None,
email_to=email_to_val,
email_cc=email_cc_val or None,
include_summary=(task == "transcript_and_summarize"),
identify_speakers=bool(identify_speakers_val),
)
except Exception as e:
logger.error("Error enqueuing job: %s", e)
return f"Error submitting your file: {e}"
return (
"Your audio file has been received and added to the queue.\n"
"We have sent a confirmation email to you.\n"
"You will receive another email with your transcript (and summary, if requested) "
"once processing is complete.\n"
f"Job ID: {task_result.id}"
)
submit_btn.click(
fn=on_submit,
inputs=[
audio_input,
task_choice,
email_to,
email_cc,
identify_speakers,
],
outputs=[status_text],
)
# Launch options with accent color applied via CSS
server_name = launch_cfg.get("server_name", os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"))
server_port = launch_cfg.get("server_port", 7860)
accent_css = f"""
:root {{
--primary-accent: {accent_color};
}}
button.primary,
.primary,
.gradio-button-primary,
.gradio-container button.primary {{
background-color: var(--primary-accent) !important;
border-color: var(--primary-accent) !important;
}}
button.primary:hover,
.primary:hover,
.gradio-button-primary:hover {{
background-color: var(--primary-accent) !important;
opacity: 0.95;
}}
.radio-item.selected,
.radio-item.selected label {{
color: var(--primary-accent) !important;
}}
a,
.gradio-container a {{
color: var(--primary-accent) !important;
}}
body {{
font-family: Arial, sans-serif;
}}
/* Increase main title font size */
h1,
.webui-title,
.header-title {{
font-size: 60px !important;
}}
/* Hide Gradio's "Use via API" link/button */
#share-btn,
a[href*="/api"],
a[href*="#/api"],
a[href*="#api"],
.gradio-container a[href*="api"] {{
display: none !important;
}}
/* Mobile-friendly adjustments */
@media (max-width: 700px) {{
.gradio-container {{
padding: 0 4px !important;
}}
.gradio-container .gr-row {{
flex-direction: column !important;
gap: 8px !important;
}}
.gradio-container .gr-col {{
width: 100% !important;
max-width: 100% !important;
flex: none !important;
}}
.gradio-container button.primary {{
width: 100% !important;
box-sizing: border-box;
}}
}}
"""
app.launch(
server_name=str(server_name),
server_port=int(server_port),
css=accent_css,
)
if __name__ == "__main__":
create_app()
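Review note: `load_html_template` in this file fills placeholders with `str.format`, which treats every brace in the HTML as a replacement field. Inline CSS or a stray brace can therefore raise `KeyError`, `IndexError`, or `ValueError`, depending on what sits between the braces. The sketch below shows the fallback behaviour on plain strings; `fill_template` is a hypothetical stand-in that skips the file read.

```python
def fill_template(template: str, **values) -> str:
    """Fill {name} placeholders; return the template unchanged if formatting fails."""
    try:
        return template.format(**values)
    except (KeyError, IndexError, ValueError):
        # KeyError: unknown {name}; IndexError: bare {}; ValueError: unbalanced brace.
        return template


if __name__ == "__main__":
    print(fill_template("<h1>{webui_title}</h1>", webui_title="Demo"))
    print(fill_template("<style>a { color: red }</style>"))
```

Note that the fallback is all-or-nothing: if any brace in the template fails to format, no placeholder is filled. Templates that mix CSS with placeholders would need doubled braces (`{{` and `}}`) around the CSS.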
+86
@@ -0,0 +1,86 @@
import os
import pytest
from scraibe.audio import (
get_audio_duration,
split_audio_into_chunks,
)
TEST_AUDIO_1 = "tests/audio_test_1.mp4"
TEST_AUDIO_2 = "tests/audio_test_2.mp4"
@pytest.fixture(params=[TEST_AUDIO_1, TEST_AUDIO_2])
def test_audio_path(request):
return request.param
def test_get_audio_duration(test_audio_path):
dur = get_audio_duration(test_audio_path)
assert isinstance(dur, float)
assert dur > 0
def test_split_audio_into_chunks_no_split_short(test_audio_path):
# For short files, should return the same file with no extra chunks
chunks = split_audio_into_chunks(
input_path=test_audio_path,
max_duration=600.0, # larger than both test files
overlap=2.0,
)
assert len(chunks) == 1
assert chunks[0]["path"] == test_audio_path
assert chunks[0]["start"] == 0.0
dur = get_audio_duration(test_audio_path)
assert abs(chunks[0]["end"] - dur) < 0.05
def test_split_audio_into_chunks_creates_chunks(tmp_path):
# Use a small chunk duration to force splitting
chunks = split_audio_into_chunks(
input_path=TEST_AUDIO_1,
max_duration=2.0,
overlap=0.5,
)
assert len(chunks) > 1
# Check that each chunk file exists and is non-empty
for c in chunks:
assert os.path.exists(c["path"])
assert os.path.getsize(c["path"]) > 0
# Check time ordering and overlap
for i in range(1, len(chunks)):
prev = chunks[i - 1]
curr = chunks[i]
assert curr["start"] >= prev["start"]
assert curr["start"] < prev["end"] # overlap
# Cleanup
for c in chunks:
if os.path.exists(c["path"]):
os.remove(c["path"])
def test_split_audio_into_chunks_total_coverage(test_audio_path):
dur = get_audio_duration(test_audio_path)
# Use small chunks to ensure coverage
chunks = split_audio_into_chunks(
input_path=test_audio_path,
max_duration=2.0,
overlap=0.5,
)
# First chunk starts at 0
assert chunks[0]["start"] == 0.0
# Last chunk end should cover the duration
assert chunks[-1]["end"] >= dur - 0.05
# Cleanup
for c in chunks:
if os.path.exists(c["path"]):
os.remove(c["path"])
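Review note: the tests above pin down three properties of `split_audio_into_chunks`: a short file yields a single chunk, consecutive chunks overlap, and the last chunk reaches the end of the audio. The sketch below computes boundaries with those properties. It is an assumption about how such a plan could be derived, not the implementation in `scraibe.audio`, and `plan_chunks` is a hypothetical name.

```python
def plan_chunks(duration: float, max_duration: float, overlap: float):
    """Return (start, end) spans covering [0, duration] with the given overlap."""
    if max_duration <= 0 or not 0 <= overlap < max_duration:
        raise ValueError("need max_duration > 0 and 0 <= overlap < max_duration")
    if duration <= max_duration:
        # Short input: one span, no splitting.
        return [(0.0, duration)]
    spans = []
    start = 0.0
    while True:
        end = min(start + max_duration, duration)
        spans.append((start, end))
        if end >= duration:
            break
        # Step back by the overlap so the next chunk repeats the tail of this one.
        start = end - overlap
    return spans


if __name__ == "__main__":
    for span in plan_chunks(5.0, 2.0, 0.5):
        print(span)
```

Each iteration advances the start by `max_duration - overlap`, which the guard keeps positive, so the loop always terminates.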
+96
@@ -0,0 +1,96 @@
"""
Local test for transcript/summary/combined .docx generation.
Checks:
- Line numbering only on transcript pages.
- Page numbering (X of Y) in footer.
- Cover pages present and centered.
- Combined document structure.
"""
import sys
import os
import tempfile
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from scraibe.email_sender import (
create_transcript_docx,
create_summary_docx,
create_combined_docx,
)
TRANSCRIPT_TEXT = """[00:00] Speaker 1: Good morning, everyone. Thank you for joining today's meeting.
[00:12] Speaker 2: Good morning. I'm looking forward to discussing the new requirements.
[00:25] Speaker 1: Let's start with the timeline. We need to finalize the scope by Friday.
[00:38] Speaker 2: Agreed. I'll send a summary of the key points after this call.
[00:45] Speaker 1: Perfect. If there are no other items, we can wrap up here."""
SUMMARY_TEXT = """# Meeting Overview
## Key Discussion Points
### Timeline and Scope
#### Next Steps"""
COVER_DATE = "June 14, 2026"
TRANSCRIPT_DESC = "Transcript of a project planning meeting discussing timelines and scope."
SUMMARY_DESC = "Summary of a project planning meeting covering key decisions and next steps."
def main():
with tempfile.TemporaryDirectory() as tmpdir:
print("Using temp directory:", tmpdir)
# 1) Transcript-only
transcript_path = os.path.join(tmpdir, "TRANSCRIPT_TEST.docx")
print("Creating transcript-only docx:", transcript_path)
create_transcript_docx(
text=TRANSCRIPT_TEXT,
filename=transcript_path,
include_cover=True,
cover_date=COVER_DATE,
cover_desc=TRANSCRIPT_DESC,
)
print("OK: transcript-only created.")
# 2) Summary-only
summary_path = os.path.join(tmpdir, "SUMMARY_TEST.docx")
print("Creating summary-only docx:", summary_path)
create_summary_docx(
text=SUMMARY_TEXT,
filename=summary_path,
include_cover=True,
cover_date=COVER_DATE,
cover_desc=SUMMARY_DESC,
)
print("OK: summary-only created.")
# 3) Combined
combined_path = os.path.join(tmpdir, "COMBINED_TEST.docx")
print("Creating combined docx:", combined_path)
create_combined_docx(
transcript_text=TRANSCRIPT_TEXT,
summary_text=SUMMARY_TEXT,
filename=combined_path,
transcript_cover_date=COVER_DATE,
transcript_cover_desc=TRANSCRIPT_DESC,
summary_cover_date=COVER_DATE,
summary_cover_desc=SUMMARY_DESC,
)
print("OK: combined created.")
# Basic size sanity checks
for path in [transcript_path, summary_path, combined_path]:
size = os.path.getsize(path)
print(f"File: {os.path.basename(path)} - size: {size} bytes")
if size < 10000:
print("WARNING: File seems unusually small:", path)
print("\nAll .docx files generated successfully.")
print("Please open them in Word to verify:")
print("- Only transcript pages have line numbers.")
print("- Footer shows 'X of Y' on all pages.")
print("- Cover pages are centered and use the correct date format.")
print("- Combined doc order: cover, page break, summary, page break, transcript.")
if __name__ == "__main__":
main()
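Review note: the script above checks only file size before asking for a manual review in Word. A slightly stronger automated check is possible because a `.docx` file is a ZIP package. The sketch below tests for two parts that Word and python-docx conventionally write. `looks_like_docx` is a hypothetical helper, and the part name `word/document.xml` is a convention rather than a guarantee of the format.

```python
import zipfile


def looks_like_docx(path: str) -> bool:
    """True if path is a ZIP archive containing the usual main document part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    # Both parts are present in packages produced by Word and python-docx.
    return "[Content_Types].xml" in names and "word/document.xml" in names
```

This does not validate line numbering, footers, or cover pages; those still need the manual inspection the script describes.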
+230
@@ -0,0 +1,230 @@
import os
from unittest.mock import patch, MagicMock
import pytest
from scraibe.localai_client import LocalAIClient, LocalAIError
from scraibe.audio import get_audio_duration, split_audio_into_chunks
TEST_AUDIO_1 = "tests/audio_test_1.mp4"
def make_fake_segments(start=0.0, count=3):
segments = []
for i in range(count):
s = start + i * 2.0
e = s + 2.0
segments.append({
"start": s,
"end": e,
"speaker": "SPEAKER_00",
"text": f"Segment text {i}",
})
return segments
def fake_localai_response(segments):
return {
"segments": segments,
"text": " ".join(seg["text"] for seg in segments),
}
@pytest.fixture
def client():
with patch.object(LocalAIClient, "__init__", lambda self, **kw: None):
c = LocalAIClient()
c.api_url = "http://localhost:8080"
c.model = "vibevoice-diarize"
c.api_key = None
c._client = MagicMock()
return c
def test_parse_diarization_response(client):
segs = make_fake_segments()
raw = fake_localai_response(segs)
out = client._parse_diarization_response(raw)
assert "segments" in out
assert "speakers" in out
assert "transcripts" in out
assert len(out["segments"]) == len(segs)
for i, s in enumerate(segs):
assert out["segments"][i][0] == s["start"]
assert out["segments"][i][1] == s["end"]
assert out["speakers"][i] == s["speaker"]
assert out["transcripts"][i] == s["text"]
def test_parse_diarization_empty(client):
out = client._parse_diarization_response({"segments": []})
assert out["segments"] == []
assert out["speakers"] == []
assert out["transcripts"] == []
def test_diarize_and_transcribe_single_happy(client):
with patch.object(client, "_client") as mock_client:
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = fake_localai_response(make_fake_segments())
mock_client.post.return_value = mock_resp
result = client.diarize_and_transcribe(
audio_path=TEST_AUDIO_1,
verbose=False,
return_raw=True,
)
assert "segments" in result
assert "raw_result" in result
assert len(result["segments"]) > 0
def test_chunking_triggered_for_long_audio(client):
# Simulate long audio by patching get_audio_duration
with patch("scraibe.localai_client.get_audio_duration") as mock_dur, \
patch.object(client, "_diarize_and_transcribe_chunked") as mock_chunked:
mock_dur.return_value = 600.0 # 10 minutes
mock_chunked.return_value = {
"segments": [],
"speakers": [],
"transcripts": [],
}
client.diarize_and_transcribe(
audio_path=TEST_AUDIO_1,
verbose=False,
use_chunking=None,
max_single_request_duration=300.0,
)
mock_chunked.assert_called_once()
def test_chunking_not_triggered_for_short_audio(client):
with patch("scraibe.localai_client.get_audio_duration") as mock_dur, \
patch.object(client, "_diarize_and_transcribe_chunked") as mock_chunked, \
patch.object(client, "_diarize_and_transcribe_single") as mock_single:
mock_dur.return_value = 120.0
mock_single.return_value = {
"segments": [],
"speakers": [],
"transcripts": [],
}
client.diarize_and_transcribe(
audio_path=TEST_AUDIO_1,
verbose=False,
use_chunking=None,
max_single_request_duration=300.0,
)
mock_chunked.assert_not_called()
mock_single.assert_called_once()
def test_chunked_transcription_adjusts_timestamps(client):
# Mock split_audio_into_chunks to return two chunks
chunk1_path = TEST_AUDIO_1
chunk2_path = TEST_AUDIO_1  # same file reused here; real chunks would be separate files
chunks = [
{"path": chunk1_path, "start": 0.0, "end": 10.0},
{"path": chunk2_path, "start": 10.0, "end": 20.0},
]
with patch("scraibe.localai_client.split_audio_into_chunks") as mock_split, \
patch.object(client, "_diarize_and_transcribe_single") as mock_single, \
patch("os.remove"):
mock_split.return_value = chunks
# First chunk: segments spanning 0-4 s
# Second chunk: segments spanning 0-4 s (chunk-local times)
def side_effect(audio_path, **kw):
if audio_path == chunk1_path:
segs = make_fake_segments(start=0.0, count=2)
else:
segs = make_fake_segments(start=0.0, count=2)
return client._parse_diarization_response(fake_localai_response(segs))
mock_single.side_effect = side_effect
result = client._diarize_and_transcribe_chunked(
audio_path=TEST_AUDIO_1,
verbose=False,
return_raw=False,
chunk_duration=10.0,
chunk_overlap=2.0,
)
# Check we got 4 segments total
assert len(result["segments"]) == 4
# First two segments should be in [0, 4]
assert result["segments"][0][0] == 0.0
assert result["segments"][1][0] == 2.0
# Next two segments should be shifted by 10
assert result["segments"][2][0] == 10.0
assert result["segments"][3][0] == 12.0
@pytest.mark.integration
def test_integration_chunked_transcription_with_localai():
"""
Integration test: run chunked transcription against a live LocalAI instance.
Only runs if LOCALAI_API_URL is set and an audio file is provided.
This test is skipped by default unless run with:
pytest -m integration
"""
api_url = os.getenv("LOCALAI_API_URL")
if not api_url:
pytest.skip("LOCALAI_API_URL not set; skipping integration test")
# Use one of the bundled test audio files
audio_path = TEST_AUDIO_1
if not os.path.exists(audio_path):
pytest.skip(f"Test audio not found: {audio_path}")
# Force chunking with a very small max_single_request_duration
# Use environment-configured model or a sensible default
model = os.getenv("LOCALAI_MODEL") or "vibevoice-cpp-asr"
client = LocalAIClient(api_url=api_url, model=model)
try:
result = client.diarize_and_transcribe(
audio_path=audio_path,
verbose=True,
return_raw=True,
use_chunking=True,
chunk_duration=3.0,
chunk_overlap=0.5,
max_single_request_duration=1.0,
)
assert "segments" in result
assert len(result["segments"]) > 0
# Basic sanity: segments are time-ordered
for i in range(1, len(result["segments"])):
prev_start = result["segments"][i - 1][0]
curr_start = result["segments"][i][0]
assert curr_start >= prev_start
# If raw_result indicates chunked, ensure structure is sensible
raw = result.get("raw_result")
if raw and raw.get("chunked"):
assert "chunks" in raw
assert len(raw["chunks"]) > 1
finally:
client.close()
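Review note: `test_chunked_transcription_adjusts_timestamps` above expects chunk-local segment times to be shifted by each chunk's start offset. The sketch below shows that shift on the parsed result shape used in these tests (`segments`, `speakers`, `transcripts`). `merge_chunk_results` is a hypothetical function, not the client's `_diarize_and_transcribe_chunked`, and it makes no attempt to de-duplicate text that falls inside the overlap between chunks.

```python
def merge_chunk_results(chunks, results):
    """Shift each chunk's local (start, end) times by the chunk start and concatenate."""
    merged = {"segments": [], "speakers": [], "transcripts": []}
    for chunk, res in zip(chunks, results):
        offset = chunk["start"]
        rows = zip(res["segments"], res["speakers"], res["transcripts"])
        for (start, end), speaker, text in rows:
            merged["segments"].append((start + offset, end + offset))
            merged["speakers"].append(speaker)
            merged["transcripts"].append(text)
    return merged


if __name__ == "__main__":
    chunks = [{"start": 0.0, "end": 10.0}, {"start": 10.0, "end": 20.0}]
    local = {
        "segments": [(0.0, 2.0), (2.0, 4.0)],
        "speakers": ["SPEAKER_00", "SPEAKER_00"],
        "transcripts": ["Segment text 0", "Segment text 1"],
    }
    print(merge_chunk_results(chunks, [local, local])["segments"])
```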