Compare commits
72 Commits
1582b90ddb...dev
| Author | SHA1 | Date | |
|---|---|---|---|
| | cd0c730abe | | |
| | 2bd6ee1567 | | |
| | 4bc9f82ee7 | | |
| | bdd0a80d8d | | |
| | 7a31be9de5 | | |
| | 54414def26 | | |
| | 111d1ea18b | | |
| | cb27ba80a1 | | |
| | 2112b8c7e2 | | |
| | 49f3cdc407 | | |
| | 2c0998579c | | |
| | 327c05ea16 | | |
| | dabb5970ba | | |
| | 36b0b6f241 | | |
| | 6640bc050d | | |
| | 59363c5dcd | | |
| | 0e27537a68 | | |
| | 0947e91f15 | | |
| | 1d447f2836 | | |
| | 49e607e1e1 | | |
| | bd4393addc | | |
| | f5836d83f3 | | |
| | b2dce9e048 | | |
| | 4d9414fee9 | | |
| | d4ed84f68d | | |
| | eb83a37f02 | | |
| | e7aa5ebf25 | | |
| | 1265a664cd | | |
| | 83f3c09218 | | |
| | d828a91bf3 | | |
| | 670c6d3e2b | | |
| | f20102d564 | | |
| | 0e6bc53cf8 | | |
| | c43076efd4 | | |
| | 03d66219d9 | | |
| | 0c0e52dfb8 | | |
| | 604bfa3f41 | | |
| | 8ff473f3e6 | | |
| | 0b3f737e5b | | |
| | 598f8630de | | |
| | 7fac0e7d9c | | |
| | 5dd56a3368 | | |
| | 7364d572d5 | | |
| | d51b006a19 | | |
| | ea5a0752df | | |
| | b0a1bc059b | | |
| | e27e5b8522 | | |
| | 6233a41f61 | | |
| | 237bd4b37c | | |
| | 7ece1a50c2 | | |
| | 46fbcf80af | | |
| | 42a155aeaa | | |
| | b0a23b32e1 | | |
| | 2e2bc3fb29 | | |
| | 2f9299389b | | |
| | e0d2fd6963 | | |
| | 4651c5f8b2 | | |
| | 6c11a8f19a | | |
| | 2a2a5e024c | | |
| | 7adca3d921 | | |
| | efb34dd9ff | | |
| | 11e5309a8e | | |
| | a3ca1f3505 | | |
| | 154cac6c7b | | |
| | 18f4a4e8de | | |
| | 2f304e3ed1 | | |
| | fd94e2daa0 | | |
| | e74bc04cb3 | | |
| | c792fa17e8 | | |
| | e55f36a131 | | |
| | 572587bb85 | | |
| | cfc38b21ed | | |
@@ -3,10 +3,12 @@
|
||||
ScrAIbe is a transcription and summarization service that:
|
||||
|
||||
- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
|
||||
- Optionally uses a second LLM to generate a detailed, structured summary.
|
||||
- Optionally uses a second LLM to generate a structured summary.
|
||||
- Provides:
|
||||
- A web GUI for uploading audio and receiving transcripts via email.
|
||||
- A CLI and Python API for direct integration.
|
||||
- An MCP-style HTTP API (OpenAPI) for LLMs and external systems.
|
||||
- A watch-folder mode for automatic transcription, summarization, and email delivery.
|
||||
|
||||
No local speech models or heavy dependencies are required. ScrAIbe is designed as a thin client in front of your own AI services.
|
||||
|
||||
@@ -24,21 +26,41 @@ For more information: https://apstrom.ca
|
||||
- Key decisions and outcomes
|
||||
- Action items and responsibilities
|
||||
- Open issues and risks
|
||||
- Async web GUI:
|
||||
- Improved, configurable summary prompts (via environment or file).
|
||||
- Async web GUI (always enabled):
|
||||
- Upload audio via browser.
|
||||
- Jobs are queued and processed in the background (Celery + Redis).
|
||||
- Emails:
|
||||
- Immediate confirmation with queue position.
|
||||
- Final transcript (MD + JSON) when ready.
|
||||
- Summary as MD file (if requested).
|
||||
- Final transcript (MD + DOCX + JSON) when ready.
|
||||
- Summary as MD + DOCX (if requested).
|
||||
- Error notification if processing fails.
|
||||
- MCP-style HTTP API (optional):
|
||||
- Exposes an OpenAPI-compliant REST endpoint for external LLMs or services.
|
||||
- Allows:
|
||||
- Audio upload for transcription.
|
||||
- Job status checks.
|
||||
- Retrieval of transcript JSON (no summary).
|
||||
- Enabled via MCP_SERVER_ENABLED=true.
|
||||
- Watch-folder mode (optional):
|
||||
- Monitors a directory for audio files.
|
||||
- For each file:
|
||||
- Transcribes and summarizes.
|
||||
- Emails transcript + summary + JSON to a configured address.
|
||||
- Deletes the source file after successful processing (configurable).
|
||||
- Enabled via WATCH_ENABLED=true.
|
||||
- File formats:
|
||||
- Transcript: .md and .docx
|
||||
- Summary (if requested): .md and .docx
|
||||
- Transcript:
|
||||
- .md
|
||||
- .docx (line-numbered, 30 lines per page, optional cover page)
|
||||
- Summary (if requested):
|
||||
- .md
|
||||
- .docx (markdown-aware WYSIWYG styling, optional cover page)
|
||||
- Full structured output: .json
|
||||
- Customizable branding:
|
||||
- Web GUI title, logo, and accent color via environment variables.
|
||||
- Email logo, accent color, and subject lines via environment variables.
|
||||
- Optional cover pages for transcript and summary DOCX.
|
||||
- CLI and Python API:
|
||||
- Simple command-line interface.
|
||||
- Drop-in Scraibe class for integration into other tools.
|
||||
@@ -58,7 +80,9 @@ For more information: https://apstrom.ca
|
||||
- Chunked summarization
|
||||
- Output formatting (e.g., .md with transcript + summary)
|
||||
- Runs:
|
||||
- Web GUI (Gradio)
|
||||
- Web GUI (Gradio) – always enabled
|
||||
- MCP-style HTTP API (FastAPI) – optional
|
||||
- Watch-folder mode – optional
|
||||
- Celery worker (async processing)
|
||||
- Redis (in-container by default)
|
||||
|
||||
@@ -209,6 +233,33 @@ Accent color (UI and emails):
|
||||
- Email headings, links, and email addresses
|
||||
- Default: #7C6DA0
|
||||
|
||||
MCP-style HTTP API:
|
||||
|
||||
- MCP_SERVER_ENABLED:
|
||||
- Enable MCP-style HTTP API (default: false).
|
||||
- Values: true/false.
|
||||
- MCP_SERVER_HOST:
|
||||
- Bind address (default: 0.0.0.0).
|
||||
- MCP_SERVER_PORT:
|
||||
- Port (default: 8000).
|
||||
- MCP_USE_CELERY:
|
||||
- Use Celery for async transcription (default: true).
|
||||
- If false, transcription runs in-process.
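As a rough illustration of how the MCP settings listed above could be read, the sketch below parses them from an environment-like mapping using the documented defaults. `McpSettings`, `read_mcp_settings`, and `_flag` are hypothetical names and not part of ScrAIbe. The accepted truthy strings (`true`, `1`, `yes`) follow the check the module entrypoint applies to `MCP_SERVER_ENABLED`; applying the same rule to `MCP_USE_CELERY` is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping

# Strings treated as "on"; anything else counts as "off".
TRUTHY = ("true", "1", "yes")


@dataclass(frozen=True)
class McpSettings:
    enabled: bool
    host: str
    port: int
    use_celery: bool


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    """Interpret an environment value as a boolean flag."""
    return env.get(name, default).strip().lower() in TRUTHY


def read_mcp_settings(env: Mapping[str, str]) -> McpSettings:
    """Build settings from an environment-like mapping using the documented defaults."""
    return McpSettings(
        enabled=_flag(env, "MCP_SERVER_ENABLED", "false"),
        host=env.get("MCP_SERVER_HOST", "0.0.0.0"),
        port=int(env.get("MCP_SERVER_PORT", "8000")),
        use_celery=_flag(env, "MCP_USE_CELERY", "true"),
    )
```

Passing `os.environ` reads the live process environment; passing a plain dict keeps the function easy to exercise in isolation.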
|
||||
|
||||
Watch-folder mode:
|
||||
|
||||
- WATCH_ENABLED:
|
||||
- Enable watch-folder mode (default: false).
|
||||
- Values: true/false.
|
||||
- WATCH_DIR:
|
||||
- Directory to monitor for audio files (required if WATCH_ENABLED=true).
|
||||
- WATCH_EMAIL_TO:
|
||||
- Email address to send transcript and summary (required if WATCH_ENABLED=true).
|
||||
- WATCH_POLL_INTERVAL:
|
||||
- Seconds between scans (default: 10).
|
||||
- WATCH_DELETE_ON_SUCCESS:
|
||||
- Delete source file after successful processing (default: true).
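The watcher implementation is not part of this excerpt, so the following is only a sketch of what a polling loop driven by the settings above might look like. The extension list, `scan_once`, `watch`, and the `max_scans` parameter are all assumptions made for illustration; `handle` stands in for the transcribe, summarize, and email step.

```python
import time
from pathlib import Path
from typing import Callable, List, Set

# Assumed set of audio extensions; the real watcher may accept others.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def scan_once(watch_dir: Path, seen: Set[Path]) -> List[Path]:
    """Return audio files in watch_dir that have not been seen yet, sorted by name."""
    new_files = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(
    watch_dir: Path,
    handle: Callable[[Path], bool],
    poll_interval: float = 10.0,
    delete_on_success: bool = True,
    max_scans: int = 0,
) -> None:
    """Poll watch_dir, pass each new file to handle(), and delete it on success if requested.

    max_scans=0 means run forever; a positive value bounds the loop.
    """
    seen: Set[Path] = set()
    scans = 0
    while max_scans == 0 or scans < max_scans:
        for path in scan_once(watch_dir, seen):
            if handle(path) and delete_on_success:
                path.unlink()
        scans += 1
        if max_scans == 0 or scans < max_scans:
            time.sleep(poll_interval)
```

A production watcher would probably also need to skip files that are still being copied into the directory, which this sketch does not attempt.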
|
||||
|
||||
Async processing (Celery + Redis):
|
||||
|
||||
- CELERY_BROKER_URL:
|
||||
@@ -253,15 +304,40 @@ Email subject lines (customizable):
|
||||
- Subject for error notification email.
|
||||
- Default: "ScrAIbe: Error with your transcription request"
|
||||
|
||||
Output files (async web GUI):
|
||||
Summary prompt customization:
|
||||
|
||||
- SUMMARY_PROMPT_CHUNK:
|
||||
- Override prompt used for each transcript chunk.
|
||||
- SUMMARY_PROMPT_COMBINED:
|
||||
- Override prompt used for the final combined summary.
|
||||
- SUMMARY_PROMPT_FILE:
|
||||
- Path to a file with prompts in sections:
|
||||
- [chunk]
|
||||
- [combined]
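The exact layout of the prompt file is not shown here beyond the two section names. Assuming each section header sits on its own line and everything up to the next header is the prompt body, a parser could look like the sketch below; `parse_prompt_file` is a hypothetical helper, not ScrAIbe's loader.

```python
from typing import Dict, List, Optional

SECTION_NAMES = ("chunk", "combined")


def parse_prompt_file(text: str) -> Dict[str, str]:
    """Split text into prompts keyed by section name ('chunk', 'combined')."""
    collected: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = line.strip().lower()
        if header.startswith("[") and header.endswith("]") and header[1:-1] in SECTION_NAMES:
            # A recognized header starts a new section.
            current = header[1:-1]
            collected[current] = []
        elif current is not None:
            # Any other line belongs to the section opened most recently.
            collected[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}
```

A hand-rolled parser is used instead of `configparser` because free-form, multi-line prompt text does not fit the `key = value` shape that module expects.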
|
||||
|
||||
DOCX and cover pages:
|
||||
|
||||
- COVER_PAGE_ENABLED:
|
||||
- Add a cover page to transcript and summary DOCX files (default: false).
|
||||
- COVER_PAGE_ORGANIZATION:
|
||||
- Organization name shown on the cover page.
|
||||
- COVER_PAGE_TITLE_PREFIX:
|
||||
- Title prefix (e.g., "TRANSCRIPT" or "SUMMARY").
|
||||
- COVER_PAGE_LOGO_URL:
|
||||
- Logo URL to include on the cover page.
|
||||
- COVER_PAGE_LOGO_PATH:
|
||||
- Local logo path to include on the cover page.
|
||||
|
||||
Output files (async web GUI and watch-folder mode):
|
||||
|
||||
When a job completes, the user receives:
|
||||
|
||||
- Transcript:
|
||||
- .md file
|
||||
- .docx file
|
||||
- .docx file (line-numbered, 30 lines per page, optional cover page)
|
||||
- Summary (if requested):
|
||||
- .md file
|
||||
- .docx file
|
||||
- .docx file (markdown-aware styling, optional cover page)
|
||||
- JSON:
|
||||
- Structured transcript with diarization and metadata
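To make the "30 lines per page" transcript layout concrete, here is a small sketch that groups transcript lines into pages and numbers them. Restarting the numbering at 1 on each page is an assumption (the README does not say whether numbering is per page or continuous), and `paginate` is a hypothetical helper.

```python
from typing import List, Tuple

LINES_PER_PAGE = 30  # matches the "30 lines per page" layout described above


def paginate(lines: List[str], per_page: int = LINES_PER_PAGE) -> List[List[Tuple[int, str]]]:
    """Group lines into pages of per_page entries, numbering each page from 1."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    pages = []
    for offset in range(0, len(lines), per_page):
        page = lines[offset:offset + per_page]
        pages.append([(number, text) for number, text in enumerate(page, start=1)])
    return pages
```

For continuous numbering, `enumerate(page, start=offset + 1)` would be the only change.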
|
||||
|
||||
@@ -279,6 +355,8 @@ Core runtime dependencies:
|
||||
- celery[redis]
|
||||
- redis
|
||||
- python-docx
|
||||
- fastapi
|
||||
- uvicorn
|
||||
- ffmpeg (for audio preprocessing)
|
||||
|
||||
No local Whisper, PyTorch, or Pyannote models are required.
|
||||
|
||||
+35
-36
@@ -9,46 +9,56 @@
|
||||
<link href="https://fonts.googleapis.com/css2?family=Cormorant+Garamond:wght@400;700&display=swap" rel="stylesheet">
|
||||
|
||||
<style>
|
||||
.header-container {{
|
||||
.header-wrapper {{
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
padding: 20px 20px 0;
|
||||
box-sizing: border-box;
|
||||
}}
|
||||
|
||||
.logo-container {{
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 20px 40px;
|
||||
box-sizing: border-box;
|
||||
justify-content: center;
|
||||
margin-bottom: 10px;
|
||||
}}
|
||||
|
||||
.logo {{
|
||||
width: 75px;
|
||||
height: auto;
|
||||
display: block;
|
||||
}}
|
||||
|
||||
.header-title {{
|
||||
font-family: 'Cormorant Garamond', serif;
|
||||
font-size: 42px;
|
||||
font-size: 45px;
|
||||
font-weight: bold;
|
||||
color: {accent_color};
|
||||
margin: 0;
|
||||
position: relative;
|
||||
padding: 0.4em 0;
|
||||
flex: 1;
|
||||
text-align: center;
|
||||
max-width: 90%;
|
||||
}}
|
||||
|
||||
.header-title::before, .header-title::after {{
|
||||
.header-title::before,
|
||||
.header-title::after {{
|
||||
content: "";
|
||||
position: absolute;
|
||||
height: 2px;
|
||||
width: 100%;
|
||||
width: 80%;
|
||||
background-color: {accent_color};
|
||||
left: 0;
|
||||
left: 10%;
|
||||
}}
|
||||
|
||||
.header-title::before {{ top: 0.4em; }}
|
||||
.header-title::after {{ bottom: 0.4em; }}
|
||||
|
||||
.logo-container {{
|
||||
flex-shrink: 0;
|
||||
margin-left: 20px;
|
||||
.header-title::before {{
|
||||
top: 0.4em;
|
||||
}}
|
||||
|
||||
.logo {{
|
||||
height: 80px;
|
||||
width: auto;
|
||||
display: block;
|
||||
.header-title::after {{
|
||||
bottom: 0.4em;
|
||||
}}
|
||||
|
||||
.header-description {{
|
||||
@@ -71,29 +81,18 @@
|
||||
}}
|
||||
|
||||
@media (max-width: 768px) {{
|
||||
.header-container {{
|
||||
flex-direction: column;
|
||||
align-items: center;
|
||||
padding: 15px;
|
||||
gap: 10px;
|
||||
}}
|
||||
|
||||
.header-title {{
|
||||
font-size: 28px;
|
||||
text-align: center;
|
||||
font-size: 31px;
|
||||
}}
|
||||
|
||||
.header-title::before, .header-title::after {{
|
||||
.header-title::before,
|
||||
.header-title::after {{
|
||||
width: 80%;
|
||||
left: 10%;
|
||||
}}
|
||||
|
||||
.logo-container {{
|
||||
margin-left: 0;
|
||||
}}
|
||||
|
||||
.logo {{
|
||||
height: 60px;
|
||||
width: 50px;
|
||||
}}
|
||||
|
||||
.header-description {{
|
||||
@@ -103,13 +102,13 @@
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="header-container">
|
||||
<h1 class="header-title">{webui_title}</h1>
|
||||
<div class="header-wrapper">
|
||||
<div class="logo-container">
|
||||
<a href="{header_logo_url}">
|
||||
<img src="{header_logo_src}" alt="{webui_title}" class="logo">
|
||||
</a>
|
||||
</div>
|
||||
<h1 class="header-title">{webui_title}</h1>
|
||||
</div>
|
||||
<div class="header-description">
|
||||
<p>
|
||||
|
||||
@@ -13,7 +13,6 @@
|
||||
<h1 style="color:{accent_color};">Upload Successful</h1>
|
||||
<p>Dear user,</p>
|
||||
<p>Your file has been successfully uploaded and is now in our processing queue. This means that our system has received your file, and it is waiting to be processed. We will handle your file as soon as possible.</p>
|
||||
<p class="success-message">Your current position in the queue is: <span style="color:{accent_color}; font-weight:bold;">{queue_position}</span>. This is the order in which your file will be processed. We appreciate your patience as we work through the queue.</p>
|
||||
<p>We will notify you once your file has been processed. If you have any urgent needs or further questions, feel free to reach out to our support team.</p>
|
||||
<div class="contact">
|
||||
<p>You can contact our support team at <a href="mailto:{contact_email}" style="color:{accent_color};">{contact_email}</a>. Please note that our support team is here to help with any questions or issues you might have.</p>
|
||||
|
||||
@@ -72,6 +72,9 @@ scraibe = "scraibe.cli:cli"
|
||||
[tool.poetry.extras]
|
||||
app = ["scraibe-webui"]
|
||||
|
||||
[tool.ruff]
|
||||
line-length = 58
|
||||
|
||||
[tool.ruff.lint.extend-per-file-ignores]
|
||||
"__init__.py" = ["E402", "F403", "F401"]
|
||||
"scraibe/misc.py" = ["E722"]
|
||||
|
||||
+48
-1
@@ -3,10 +3,57 @@ Entrypoint for running ScrAIbe as a module:
|
||||
|
||||
python -m scraibe
|
||||
|
||||
Always launches the Web GUI (Gradio), never the CLI.
|
||||
Always launches the Web GUI (Gradio).
|
||||
Optionally launches:
|
||||
- MCP-style API server
|
||||
- Watch-folder mode
|
||||
"""
|
||||
|
||||
import os
|
||||
import threading
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger("scraibe.__main__")
|
||||
|
||||
from .webui import create_app
|
||||
|
||||
|
||||
def _run_mcp_server():
    """
    Run MCP server in a separate thread.
    """
    import uvicorn
    from . import mcp_server

    host = os.getenv("MCP_SERVER_HOST", "0.0.0.0")
    port = int(os.getenv("MCP_SERVER_PORT", "8000"))

    uvicorn.run(
        mcp_server.app,
        host=host,
        port=port,
        log_level="info",
    )


if __name__ == "__main__":
    # Optionally start MCP server in background (non-blocking)
    mcp_enabled = os.getenv("MCP_SERVER_ENABLED", "false").strip().lower() in ("true", "1", "yes")
    if mcp_enabled:
        try:
            t = threading.Thread(target=_run_mcp_server, daemon=True)
            t.start()
            logger.info("MCP server started in background.")
        except Exception as e:
            logger.warning("Failed to start MCP server (WebUI will continue): %s", e)

    # Optionally start watch-folder mode (non-blocking)
    try:
        from .watcher import start_watcher
        start_watcher()
        logger.info("Watch-folder mode started.")
    except Exception as e:
        logger.warning("Failed to start watch-folder mode (WebUI will continue): %s", e)

    # Always start WebUI (Gradio)
    create_app()
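The entrypoint above starts optional services in daemon threads so the Web GUI keeps running if one of them fails. A generic version of that pattern, written as a sketch with hypothetical names (`start_optional_service`, the `example.startup` logger), might look like this. It differs from the entrypoint in one respect: the guard lives inside the thread.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger("example.startup")


def start_optional_service(
    name: str, target: Callable[[], None], enabled: bool
) -> Optional[threading.Thread]:
    """Start target in a daemon thread when enabled; return the thread, or None if skipped."""
    if not enabled:
        return None

    def _guarded() -> None:
        # Exceptions raised inside the thread do not propagate to the code that
        # called start(), so a try/except around start() cannot see them.
        try:
            target()
        except Exception:
            logger.exception("%s stopped unexpectedly", name)

    thread = threading.Thread(target=_guarded, name=name, daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, it does not keep the process alive once the blocking main call returns.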
|
||||
|
||||
@@ -7,13 +7,21 @@ Simplified audio processor for ScrAIbe.
|
||||
Previously this used torch and pyannote-style processing. In the LocalAI-backed
|
||||
version, we primarily pass files to the API, but we keep a lightweight helper
|
||||
for backward compatibility.
|
||||
|
||||
Now also includes utilities for chunking long audio into smaller segments
|
||||
to avoid GPU memory limits when using vibevoice-cpp on LocalAI.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import tempfile
|
||||
from subprocess import CalledProcessError, run
|
||||
import numpy as np
|
||||
|
||||
SAMPLE_RATE = 16000
|
||||
NORMALIZATION_FACTOR = 32768.0
|
||||
DEFAULT_CHUNK_DURATION = 180.0 # seconds
|
||||
DEFAULT_CHUNK_OVERLAP = 2.0 # seconds
|
||||
|
||||
|
||||
class AudioProcessor:
|
||||
@@ -106,3 +114,109 @@ class AudioProcessor:
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"AudioProcessor(waveform_len={len(self.waveform)}, sr={self.sr})"
|
||||
|
||||
|
||||
def get_audio_duration(file_path: str) -> float:
    """
    Get the duration of an audio file in seconds using ffprobe.

    Args:
        file_path: Path to the audio file.

    Returns:
        Duration in seconds as a float.

    Raises:
        RuntimeError: If ffprobe fails or reports no usable duration.
    """
    cmd = [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "json",
        file_path,
    ]
    try:
        result = run(cmd, capture_output=True, text=True, check=True)
        data = json.loads(result.stdout)
        return float(data["format"]["duration"])
    except (CalledProcessError, json.JSONDecodeError, KeyError, ValueError) as e:
        # ValueError covers a non-numeric duration such as "N/A".
        raise RuntimeError(f"Failed to get audio duration for {file_path}: {e}") from e
|
||||
|
||||
|
||||
def split_audio_into_chunks(
    input_path: str,
    max_duration: float = DEFAULT_CHUNK_DURATION,
    overlap: float = DEFAULT_CHUNK_OVERLAP,
    output_format: str = "wav",
    sample_rate: int = 24000,
) -> list:
    """
    Split a long audio file into overlapping chunks using ffmpeg.

    Args:
        input_path: Path to the input audio file.
        max_duration: Maximum duration of each chunk in seconds.
        overlap: Overlap duration in seconds between consecutive chunks.
        output_format: Output format (e.g., 'wav').
        sample_rate: Sample rate for output chunks.

    Returns:
        List of dicts:
            [{"path": "chunk.wav", "start": 0.0, "end": 180.0}, ...]
        Files must be cleaned up by the caller. When no split is needed,
        the single entry points at input_path itself.

    Raises:
        ValueError: If overlap is negative or not smaller than max_duration.
        RuntimeError: If ffprobe or ffmpeg fails.
    """
    # A step of max_duration - overlap must be positive, or the loop never ends.
    if not 0 <= overlap < max_duration:
        raise ValueError("overlap must be >= 0 and smaller than max_duration")

    duration = get_audio_duration(input_path)

    # If file is shorter than max_duration, no need to split
    if duration <= max_duration:
        return [{"path": input_path, "start": 0.0, "end": duration}]

    chunks = []
    start = 0.0
    chunk_id = 0

    while start < duration:
        chunk_end = min(start + max_duration, duration)
        chunk_duration = chunk_end - start

        tmp = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=f".{output_format}",
            prefix="scraibe_chunk_",
        )
        chunk_path = tmp.name
        tmp.close()

        cmd = [
            "ffmpeg",
            "-y",
            "-nostdin",
            "-ss", str(start),
            "-i", input_path,
            "-t", str(chunk_duration),
            "-ar", str(sample_rate),
            "-ac", "1",
            "-c:a", "pcm_s16le",
            chunk_path,
        ]
        try:
            run(cmd, capture_output=True, check=True)
        except CalledProcessError as e:
            # Clean up on error
            if os.path.exists(chunk_path):
                os.remove(chunk_path)
            raise RuntimeError(
                f"Failed to create audio chunk {chunk_id} for {input_path}: "
                f"{e.stderr.decode(errors='replace')}"
            ) from e

        chunks.append({
            "path": chunk_path,
            "start": start,
            "end": chunk_end,
        })

        # Stop once a chunk reaches the end of the file; otherwise a redundant
        # tail chunk lying entirely inside the overlap could be emitted.
        if chunk_end >= duration:
            break

        start += max_duration - overlap
        chunk_id += 1

    return chunks
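The window arithmetic used for chunking (each start advances by `max_duration - overlap`) can be checked without ffmpeg. The sketch below computes only the `(start, end)` boundaries; `plan_chunks` is a hypothetical helper, and it stops as soon as a window reaches the end of the file.

```python
from typing import List, Tuple


def plan_chunks(
    duration: float, max_duration: float = 180.0, overlap: float = 2.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows covering [0, duration] with the given overlap."""
    if not 0 <= overlap < max_duration:
        raise ValueError("overlap must be >= 0 and smaller than max_duration")
    if duration <= max_duration:
        return [(0.0, duration)]

    windows = []
    start = 0.0
    while start < duration:
        end = min(start + max_duration, duration)
        windows.append((start, end))
        if end >= duration:
            break  # the last window already reaches the end of the file
        start += max_duration - overlap
    return windows
```

With the defaults, a 400-second file yields three windows: 0 to 180, 178 to 358, and 356 to 400. A timestamp reported relative to a chunk maps back to the original file by adding that chunk's `start`.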
|
||||
|
||||
@@ -0,0 +1,118 @@
|
||||
"""
|
||||
Reusable cover-page generator for transcript and summary DOCX files.
|
||||
|
||||
Configuration (env):
|
||||
- COVER_PAGE_ENABLED: "true"/"false" (default: false)
|
||||
- COVER_PAGE_ORGANIZATION: e.g., "A.P.Strom"
|
||||
- COVER_PAGE_TITLE_PREFIX: e.g., "TRANSCRIPT" or "SUMMARY"
|
||||
- COVER_PAGE_LOGO_URL: optional URL
|
||||
- COVER_PAGE_LOGO_PATH: optional local path
|
||||
"""
|
||||
|
||||
import os
|
||||
from typing import Optional
|
||||
from docx import Document
|
||||
from docx.shared import Pt, Inches
|
||||
from docx.enum.text import WD_ALIGN_PARAGRAPH
|
||||
from docx.oxml import OxmlElement
|
||||
from docx.oxml.ns import qn
|
||||
|
||||
|
||||
def _add_page_break(doc: Document):
    """Insert a paragraph containing only a page break."""
    # WordprocessingML has no <w:pageBreak> paragraph property; a page break
    # is a <w:br w:type="page"/> inside a run, which python-docx's
    # Document.add_page_break() emits.
    doc.add_page_break()
|
||||
|
||||
|
||||
def add_cover_page(
    doc: Document,
    title: str,
    subtitle: Optional[str] = None,
    metadata: Optional[dict] = None,
    include_logo: bool = False,
):
    """
    Append a cover page to the end of the document, followed by a page break.
    Call this before adding the body content.

    - title: e.g., "TRANSCRIPT" or "SUMMARY"
    - subtitle: e.g., "Meeting of 16 June 2026"
    - metadata: optional dict with keys like:
        - "Organization"
        - "Date"
        - "Prepared by"
        - "Reference"
    """

    metadata = metadata or {}
    # The environment variable wins; the metadata entry is the fallback.
    # (Without the explicit grouping, a missing metadata dict discarded the
    # environment value as well.)
    org = (os.getenv("COVER_PAGE_ORGANIZATION") or "").strip() or metadata.get("Organization") or ""
    date = metadata.get("Date") or ""
    prepared_by = metadata.get("Prepared by") or ""
    reference = metadata.get("Reference") or ""

    # Title
    p = doc.add_paragraph()
    p.alignment = WD_ALIGN_PARAGRAPH.CENTER
    p.paragraph_format.space_after = Pt(6)
    run = p.add_run(title.upper())
    run.bold = True
    run.font.name = "Courier"
    run.font.size = Pt(18)

    # Subtitle
    if subtitle:
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(12)
        run = p.add_run(subtitle)
        run.font.name = "Courier"
        run.font.size = Pt(14)

    # Optional logo placeholder (text-only for now; can be extended)
    if include_logo:
        logo_url = (os.getenv("COVER_PAGE_LOGO_URL") or "").strip()
        logo_path = (os.getenv("COVER_PAGE_LOGO_PATH") or "").strip()
        # For now, just reserve space; image insertion can be added later.
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(12)

    # Metadata lines
    if org or date or prepared_by or reference:
        p = doc.add_paragraph()
        p.alignment = WD_ALIGN_PARAGRAPH.CENTER
        p.paragraph_format.space_after = Pt(4)
        if org:
            r = p.add_run(org)
            r.font.name = "Courier"
            r.font.size = Pt(12)
        if date:
            if org:
                p.add_run("\n")
            r = p.add_run(date)
            r.font.name = "Courier"
            r.font.size = Pt(12)

        if prepared_by or reference:
            p = doc.add_paragraph()
            p.alignment = WD_ALIGN_PARAGRAPH.CENTER
            p.paragraph_format.space_after = Pt(4)
            if prepared_by:
                r = p.add_run(f"Prepared by: {prepared_by}")
                r.font.name = "Courier"
                r.font.size = Pt(11)
            if reference:
                if prepared_by:
                    p.add_run("\n")
                r = p.add_run(f"Reference: {reference}")
                r.font.name = "Courier"
                r.font.size = Pt(11)

    # Page break after cover page
    _add_page_break(doc)
|
||||
@@ -0,0 +1,147 @@
|
||||
"""
|
||||
Utility module for applying styles and converting simple markdown
|
||||
into styled DOCX paragraphs/runs for summaries.
|
||||
"""
|
||||
|
||||
import re
|
||||
from docx import Document
|
||||
from docx.shared import Pt
|
||||
from docx.oxml import OxmlElement
|
||||
from docx.oxml.ns import qn
|
||||
|
||||
|
||||
def _ensure_style(doc, name, based_on="Normal", font_name="Courier", font_size=Pt(12)):
    """
    Ensure a paragraph style exists in the document.
    """
    styles = doc.styles
    if name not in [s.name for s in styles]:
        style = styles.add_style(name, 1)  # 1 = WD_STYLE_TYPE.PARAGRAPH
        style.font.name = font_name
        style.font.size = font_size
        if based_on:
            style.base_style = styles[based_on]
    return styles[name]


def apply_heading_style(doc, paragraph, level: int):
    """
    Apply heading style to a paragraph based on level (1, 2, 3).
    """
    if level == 1:
        style_name = "SummaryHeading1"
        size = Pt(16)
    elif level == 2:
        style_name = "SummaryHeading2"
        size = Pt(14)
    else:
        style_name = "SummaryHeading3"
        size = Pt(12)

    style = _ensure_style(doc, style_name, font_size=size)
    paragraph.style = style
    paragraph.paragraph_format.space_before = Pt(4)
    paragraph.paragraph_format.space_after = Pt(2)


def apply_bullet_style(doc, paragraph):
    """
    Apply a simple bullet style to a paragraph.
    """
    style_name = "SummaryBullet"
    style = _ensure_style(doc, style_name)
    paragraph.style = style
    pPr = paragraph._p.get_or_add_pPr()
    tabs = OxmlElement("w:tabs")
    tab = OxmlElement("w:tab")
    tab.set(qn("w:val"), "left")
    tab.set(qn("w:pos"), "360")
    tabs.append(tab)
    pPr.append(tabs)


def parse_simple_md_to_paragraphs(doc, text: str):
    """
    Convert simple markdown text into DOCX paragraphs with styles.

    Supported:
    - # / ## / ### for headings
    - - / * for bullet lists
    - **bold** and *italic*

    This is intentionally simple and robust for legal/business summaries.
    """
    lines = text.splitlines()
    current_paragraph = None
    in_list = False

    for line in lines:
        stripped = line.strip()
        if not stripped:
            current_paragraph = None
            in_list = False
            continue

        # Headings
        heading_match = re.match(r"^(#{1,3})\s+(.*)", stripped)
        if heading_match:
            level = len(heading_match.group(1))
            content = heading_match.group(2).strip()
            p = doc.add_paragraph()
            apply_heading_style(doc, p, level)
            _add_run_with_inline_md(p, content)
            current_paragraph = p
            in_list = False
            continue

        # Bullet list: every item gets its own paragraph
        bullet_match = re.match(r"^[-*]\s+(.*)", stripped)
        if bullet_match:
            content = bullet_match.group(1).strip()
            in_list = True
            current_paragraph = doc.add_paragraph()
            apply_bullet_style(doc, current_paragraph)
            _add_run_with_inline_md(current_paragraph, content)
            continue

        # Normal paragraph: every non-empty line gets its own paragraph
        in_list = False
        current_paragraph = doc.add_paragraph()
        _add_run_with_inline_md(current_paragraph, stripped)


def _add_run_with_inline_md(paragraph, text: str):
    """
    Add runs to a paragraph, interpreting **bold** and *italic*.
    """
    # Simple regex for bold and italic
    parts = re.split(r"(\*\*\*.*?\*\*\*|\*\*.*?\*\*|\*.*?\*)", text)
    for part in parts:
        if not part:
            continue

        run = paragraph.add_run(part)
        run.font.name = "Courier"
        run.font.size = Pt(12)

        # Bold
        bold_match = re.fullmatch(r"\*\*(.+?)\*\*", part)
        if bold_match:
            run.bold = True
            part = bold_match.group(1)

        # Italic
        italic_match = re.fullmatch(r"\*(.+?)\*", part)
        if italic_match:
            run.italic = True
            part = italic_match.group(1)

        run.text = part
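The inline-markdown handling above can be exercised without python-docx by separating the tokenizing step from the DOCX run creation. The sketch below reuses the same regular expressions; `tokenize_inline_md` and `INLINE_PATTERN` are hypothetical names introduced for illustration.

```python
import re
from typing import List, Tuple

# Same alternation as the DOCX helper: ***both***, **bold**, then *italic*.
INLINE_PATTERN = re.compile(r"(\*\*\*.*?\*\*\*|\*\*.*?\*\*|\*.*?\*)")


def tokenize_inline_md(text: str) -> List[Tuple[str, bool, bool]]:
    """Split text into (content, bold, italic) tuples for **bold** and *italic* spans."""
    tokens = []
    for part in INLINE_PATTERN.split(text):
        if not part:
            continue
        bold = italic = False
        bold_match = re.fullmatch(r"\*\*(.+?)\*\*", part)
        if bold_match:
            bold = True
            part = bold_match.group(1)
        # For ***x***, the bold step leaves *x*, which the italic step then unwraps.
        italic_match = re.fullmatch(r"\*(.+?)\*", part)
        if italic_match:
            italic = True
            part = italic_match.group(1)
        tokens.append((part, bold, italic))
    return tokens
```

Each tuple maps directly onto one run: the content becomes the run text and the two flags become `run.bold` and `run.italic`.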
|
||||
+326
-80
@@ -8,20 +8,22 @@ Template placeholders are primarily filled via environment variables.
|
||||
"""
|
||||
|
||||
import base64
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import smtplib
|
||||
import logging
|
||||
from email import encoders
|
||||
from email.mime.base import MIMEBase
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
from email.mime.text import MIMEText
|
||||
from typing import List, Optional, Dict, Any
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from docx import Document
|
||||
from docx.shared import Inches, Pt
|
||||
from docx.oxml.ns import qn
|
||||
from docx.oxml import OxmlElement
|
||||
from docx.oxml.ns import qn
|
||||
from docx.shared import Inches, Pt
|
||||
from docx.enum.text import WD_ALIGN_PARAGRAPH
|
||||
|
||||
logger = logging.getLogger("scraibe.email_sender")
|
||||
|
||||
@@ -214,27 +216,34 @@ def send_email(
|
||||
if not to_list:
|
||||
raise EmailError("No valid 'To' email addresses provided.")
|
||||
|
||||
# Build message
|
||||
msg = MIMEMultipart("alternative")
|
||||
# Ensure subject is never blank
|
||||
if not subject or not subject.strip():
|
||||
logger.warning("Subject was blank or missing; using default subject.")
|
||||
subject = "ScrAIbe: Your transcript is ready"
|
||||
|
||||
subject = subject.strip()
|
||||
|
||||
has_attachments = bool(attachments)
|
||||
|
||||
# Build the text/HTML part (alternative)
|
||||
alt = MIMEMultipart("alternative")
|
||||
alt.attach(MIMEText(body, "plain"))
|
||||
if html:
|
||||
alt.attach(MIMEText(html, "html"))
|
||||
|
||||
if has_attachments:
|
||||
# Outer message: multipart/mixed with headers
|
||||
msg = MIMEMultipart("mixed")
|
||||
msg["From"] = cfg["from_address"]
|
||||
msg["To"] = ", ".join(to_list)
|
||||
if cc_list:
|
||||
msg["Cc"] = ", ".join(cc_list)
|
||||
msg["Subject"] = subject
|
||||
|
||||
# Attach plain text
|
||||
msg.attach(MIMEText(body, "plain"))
|
||||
|
||||
# Attach HTML if provided
|
||||
if html:
|
||||
msg.attach(MIMEText(html, "html"))
|
||||
|
||||
# Attach files in a separate multipart/mixed part
|
||||
if attachments:
|
||||
mixed = MIMEMultipart("mixed")
|
||||
mixed.attach(msg)
|
||||
msg = mixed
|
||||
# Attach the alternative (text/HTML) part
|
||||
msg.attach(alt)
|
||||
|
||||
# Attach files
|
||||
for file_path in attachments:
|
||||
if not os.path.isfile(file_path):
|
||||
logger.warning("Attachment file not found, skipping: %s", file_path)
|
||||
@@ -253,6 +262,14 @@ def send_email(
|
||||
msg.attach(part)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to attach file %s: %s", file_path, e)
|
||||
else:
|
||||
# No attachments: use the alternative part as the root message
|
||||
msg = alt
|
||||
msg["From"] = cfg["from_address"]
|
||||
msg["To"] = ", ".join(to_list)
|
||||
if cc_list:
|
||||
msg["Cc"] = ", ".join(cc_list)
|
||||
msg["Subject"] = subject
|
||||
|
||||
# Connect and send
|
||||
try:
|
||||
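The restructured message layout in this hunk (a `multipart/alternative` body, wrapped in `multipart/mixed` only when attachments exist) can be reproduced with the standard library alone. The sketch below is not the project's `send_email`: `build_message` is a hypothetical helper, it takes attachment bytes rather than file paths, and it uses `MIMEApplication` instead of the `MIMEBase` plus `encoders` combination imported by the module.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Dict, Optional


def build_message(
    subject: str,
    body: str,
    html: Optional[str] = None,
    attachments: Optional[Dict[str, bytes]] = None,
) -> MIMEMultipart:
    """Return multipart/alternative, wrapped in multipart/mixed when attachments exist."""
    alt = MIMEMultipart("alternative")
    alt.attach(MIMEText(body, "plain"))
    if html:
        alt.attach(MIMEText(html, "html"))

    if attachments:
        msg = MIMEMultipart("mixed")
        msg.attach(alt)  # the readable body comes first
        for filename, payload in attachments.items():
            part = MIMEApplication(payload)
            part.add_header("Content-Disposition", "attachment", filename=filename)
            msg.attach(part)
    else:
        msg = alt  # no attachments: the alternative part is the root message

    # Headers always go on the outermost part.
    msg["Subject"] = subject
    return msg
```

Keeping the plain-text and HTML bodies together inside one `alternative` part lets mail clients pick a single rendering, while the attachments sit beside that part in the `mixed` container.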
@@ -273,9 +290,10 @@ def send_email(
|
||||
)
|
||||
server.quit()
|
||||
logger.info(
|
||||
"Email sent to %s (CC: %s)",
|
||||
"Email sent to %s (CC: %s) with subject: %s",
|
||||
to_list,
|
||||
cc_list or "None",
|
||||
subject,
|
||||
)
|
||||
return True
|
||||
|
||||
@@ -284,88 +302,316 @@ def send_email(
|
||||
raise EmailError(f"Failed to send email: {e}")
|
||||
|
||||
|
||||
def create_transcript_docx(text: str, filename: str):
|
||||
"""
|
||||
Create a .docx transcript with:
|
||||
- 1.5" left margin, 1" right margin
|
||||
- 12pt Courier
|
||||
- Continuous line numbering on the left
|
||||
- Speaker names capitalized and indented; spoken text further indented
|
||||
- No section headings; use bold/underline only.
|
||||
"""
|
||||
doc = Document()
|
||||
# ------------ DOCX helpers ------------
|
||||
|
||||
# Set margins via section properties
|
||||
section = doc.sections[0]
|
||||
section.left_margin = Inches(1.5)
|
||||
section.right_margin = Inches(1.0)
|
||||
section.top_margin = Inches(1.0)
|
||||
section.bottom_margin = Inches(1.0)
|
||||
# Namespaces
|
||||
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
|
||||
|
||||
# Enable continuous line numbering on the left
|
||||
|
||||
def _set_element_attr(elem, attr, value):
|
||||
elem.set(f"{{{W_NS}}}{attr}", str(value))
|
||||
|
||||
|
||||
def _create_transcript_section_properties(section):
|
||||
"""
|
||||
Configure the section properties for transcript DOCX:
|
||||
- Margins: 1 inch all sides
|
||||
- Single column layout
|
||||
- No built-in line numbering (we embed line numbers as text for portability)
|
||||
- Remove document grid to avoid off-by-one line numbering
|
||||
"""
|
||||
sectPr = section._sectPr
|
||||
lnNumType = sectPr.find(qn("w:lnNumType"))
|
||||
if lnNumType is None:
|
||||
lnNumType = OxmlElement("w:lnNumType")
|
||||
sectPr.append(lnNumType)
|
||||
lnNumType.set(qn("w:start"), "continuous")
|
||||
lnNumType.set(qn("w:countBy"), "1")
|
||||
|
||||
# Default font
|
||||
style = doc.styles["Normal"]
|
||||
font = style.font
|
||||
font.name = "Courier"
|
||||
font.size = Pt(12)
|
||||
# Margins: 1 inch = 1440 twips
|
||||
pgMar = sectPr.find(f"{{{W_NS}}}pgMar")
|
||||
if pgMar is None:
|
||||
pgMar = OxmlElement("w:pgMar")
|
||||
sectPr.append(pgMar)
|
||||
_set_element_attr(pgMar, "top", "1440")
|
||||
_set_element_attr(pgMar, "right", "1440")
|
||||
_set_element_attr(pgMar, "bottom", "1440")
|
||||
_set_element_attr(pgMar, "left", "1440")
|
||||
_set_element_attr(pgMar, "header", "720")
|
||||
_set_element_attr(pgMar, "footer", "720")
|
||||
_set_element_attr(pgMar, "gutter", "0")
|
||||
|
||||
# Parse lines
|
||||
lines = text.strip().split("\n")
|
||||
for line in lines:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
# Ensure single column (no multi-column layout)
|
||||
cols = sectPr.find(f"{{{W_NS}}}cols")
|
||||
if cols is not None:
|
||||
_set_element_attr(cols, "num", "1")
|
||||
_set_element_attr(cols, "space", "720")
|
||||
|
||||
# Remove document grid entirely
|
||||
for docGrid in sectPr.findall(f"{{{W_NS}}}docGrid"):
|
||||
sectPr.remove(docGrid)
|
||||
|
||||
# Remove any built-in line numbering; we will use text-based line numbers
|
||||
for lnNumType in sectPr.findall(f"{{{W_NS}}}lnNumType"):
|
||||
sectPr.remove(lnNumType)
|
||||
|
||||
|
||||
def _add_transcript_paragraph(doc, line_text, line_number):
|
||||
"""
|
||||
Add a single transcript line as a paragraph with an embedded line number.
|
||||
Uses a left tab stop so the line number appears in the left margin area,
|
||||
independent of built-in line numbering, ensuring consistent behavior
|
||||
across Word, LibreOffice, Google Docs, etc.
|
||||
"""
|
||||
line_text = line_text.strip()
|
||||
if not line_text:
|
||||
return
|
||||
|
||||
p = doc.add_paragraph()
|
||||
|
||||
# Set up paragraph formatting:
|
||||
# - No left indent; we control spacing via tab stop
|
||||
# - 1.5 line spacing, no extra space before/after
|
||||
pPr = p._p.get_or_add_pPr()
|
||||
|
||||
# Remove any default indent
|
||||
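# Compare against None explicitly: a childless lxml element is falsy,
# so a bare truthiness test would skip the removal.
ind = pPr.find(f"{{{W_NS}}}ind")
if ind is not None:
    pPr.remove(ind)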
|
||||
|
||||
# Define a left tab stop for line numbers (e.g. 360 twips ≈ 0.25")
|
||||
tabs = OxmlElement("w:tabs")
|
||||
tab = OxmlElement("w:tab")
|
||||
tab.set("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val", "left")
|
||||
tab.set("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pos", "360")
|
||||
tabs.append(tab)
|
||||
pPr.append(tabs)
|
||||
|
||||
spacing = OxmlElement("w:spacing")
|
||||
_set_element_attr(spacing, "before", "0")
|
||||
_set_element_attr(spacing, "after", "0")
|
||||
_set_element_attr(spacing, "line", "360") # 1.5 line spacing: with lineRule="auto", w:line is in 240ths of a line (360 / 240 = 1.5)
|
||||
_set_element_attr(spacing, "lineRule", "auto")
|
||||
pPr.append(spacing)
|
||||
|
||||
# Try to match: [00:00] SPEAKER 1: content
|
||||
m = re.match(r"\[(\d+:\d+(?::\d+)?)\]\s*(.+?):\s*(.*)", line_text)
|
||||
|
||||
# Line number run (no underline)
|
||||
run_ln = p.add_run(str(line_number))
|
||||
run_ln.font.name = "Courier"
|
||||
run_ln.font.size = Pt(12)
|
||||
run_ln.underline = False
|
||||
|
||||
# Tab + spaces between line number and content
|
||||
# - 2 base spaces + 7 more for first line of speaker turn
|
||||
# - 2 base spaces + 3 more for continuation lines
|
||||
if m:
|
||||
extra_spaces = " " # 7 spaces for speaker lines
|
||||
else:
|
||||
extra_spaces = " " # 3 spaces for continuation lines
|
||||
|
||||
run_tab = p.add_run("\t " + extra_spaces)
|
||||
run_tab.font.name = "Courier"
|
||||
run_tab.font.size = Pt(12)
|
||||
run_tab.underline = False
|
||||
|
||||
# Try to parse: [00:00] SPEAKER: text
|
||||
m = re.match(r"\[(\d+:\d+(?::\d+)?)\]\s*(.+?):\s*(.*)", line)
|
||||
if m:
|
||||
ts, speaker, content = m.groups()
|
||||
# Speaker line: bold, underlined, indented
|
||||
p_spk = doc.add_paragraph()
|
||||
p_spk.paragraph_format.left_indent = Inches(0.25)
|
||||
run_spk = p_spk.add_run(f"[{ts}] {speaker.upper()}")
|
||||
run_spk.bold = True
|
||||
run_spk.underline = True
|
||||
run_spk.font.name = "Courier"
|
||||
run_spk.font.size = Pt(12)
|
||||
label_text = f"[{ts}] {speaker.upper()}:"
|
||||
|
||||
# Spoken text line: further indented
|
||||
p_txt = doc.add_paragraph()
|
||||
p_txt.paragraph_format.left_indent = Inches(0.5)
|
||||
run_txt = p_txt.add_run(content.strip())
|
||||
# Label run (underline)
|
||||
run_label = p.add_run(label_text)
|
||||
run_label.underline = True
|
||||
run_label.font.name = "Courier"
|
||||
run_label.font.size = Pt(12)
|
||||
|
||||
# Space run (no underline)
|
||||
run_space = p.add_run(" ")
|
||||
run_space.underline = False
|
||||
run_space.font.name = "Courier"
|
||||
run_space.font.size = Pt(12)
|
||||
|
||||
# Content run (no underline)
|
||||
run_txt = p.add_run(content.strip())
|
||||
run_txt.underline = False
|
||||
run_txt.font.name = "Courier"
|
||||
run_txt.font.size = Pt(12)
|
||||
else:
|
||||
# Fallback for non-standard lines
|
||||
p = doc.add_paragraph()
|
||||
run = p.add_run(line)
|
||||
# Non-standard line: plain text
|
||||
run = p.add_run(line_text)
|
||||
run.underline = False
|
||||
run.font.name = "Courier"
|
||||
run.font.size = Pt(12)
|
||||
|
||||
|
||||
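The speaker-line pattern used by `_add_transcript_paragraph` can be tried on its own. The sketch below reuses the same regular expression; the function name `parse_transcript_line` is hypothetical and not part of the diff. Because `re.match` anchors at the start of the string, wrapped continuation lines (which do not start with a bracketed timestamp) return `None`.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the diff: "[MM:SS]" or "[HH:MM:SS]", a speaker label,
# a colon, then the spoken text.
SPEAKER_LINE = re.compile(r"\[(\d+:\d+(?::\d+)?)\]\s*(.+?):\s*(.*)")


def parse_transcript_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (timestamp, SPEAKER, content) for a speaker line, else None.

    Hypothetical helper for illustration; the diff inlines this logic.
    """
    m = SPEAKER_LINE.match(line.strip())
    if m is None:
        return None
    ts, speaker, content = m.groups()
    # The diff upper-cases the speaker label before writing it.
    return ts, speaker.upper(), content.strip()
```

The lazy `(.+?)` stops at the first colon after the timestamp, so colons inside the spoken text (for example a clock time) stay in the content group.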
# ------------ Public DOCX functions ------------
|
||||
|
||||
def create_transcript_docx(text: str, filename: str):
"""
Create a transcript DOCX with:
- 1" margins on all sides
- 12pt Courier font
- Each page has exactly 29 numbered lines of text
- Max 60 characters per line (including number and spaces)
- Words preserved (no clipping or omission)
- Blank spacing between number and text preserved
- Page break after every 29 lines
- Centered footer: "X of Y"
"""
|
||||
# Step 1: Prepare transcript into pages of 29 lines each
|
||||
# Each line <= 60 chars total, words preserved, no clipping
|
||||
# Structure: nested list of paragraphs (pages -> lines)
|
||||
prepared_pages = []
|
||||
current_page = []
|
||||
line_count = 0
|
||||
|
||||
# 52 chars content + 2 digits + 1 tab + 9 spaces = 64 max
|
||||
MAX_CONTENT_LEN = 52
|
||||
|
||||
for raw_line in text.strip().splitlines():
|
||||
raw_line = raw_line.strip()
|
||||
if not raw_line:
|
||||
continue
|
||||
|
||||
# Wrap into segments without clipping words
|
||||
words = raw_line.split()
|
||||
segments = []
|
||||
current = ""
|
||||
for w in words:
|
||||
if not current:
|
||||
current = w
|
||||
elif len(current) + 1 + len(w) <= MAX_CONTENT_LEN:
|
||||
current += " " + w
|
||||
else:
|
||||
segments.append(current)
|
||||
current = w
|
||||
if current:
|
||||
segments.append(current)
|
||||
|
||||
# Add segments to pages, enforcing 29 lines per page
|
||||
for seg in segments:
|
||||
if line_count == 29:
|
||||
prepared_pages.append(current_page)
|
||||
current_page = []
|
||||
line_count = 0
|
||||
current_page.append(seg)
|
||||
line_count += 1
|
||||
|
||||
if current_page:
|
||||
prepared_pages.append(current_page)
|
||||
|
||||
# Step 2: Create DOCX
|
||||
doc = Document()
|
||||
style = doc.styles["Normal"]
|
||||
style.font.name = "Courier"
|
||||
style.font.size = Pt(12)
|
||||
|
||||
body = doc.element.body
|
||||
for p in list(body.findall(f"{{{W_NS}}}p")):
|
||||
body.remove(p)
|
||||
|
||||
_create_transcript_section_properties(doc.sections[0])
|
||||
|
||||
# Step 3: Optionally add cover page
|
||||
from . import docx_cover
|
||||
cover_enabled = os.getenv("COVER_PAGE_ENABLED", "false").strip().lower() in ("true", "1", "yes")
|
||||
if cover_enabled:
|
||||
docx_cover.add_cover_page(
|
||||
doc,
|
||||
title="TRANSCRIPT",
|
||||
subtitle=None,
|
||||
metadata=None,
|
||||
include_logo=True,
|
||||
)
|
||||
|
||||
# Step 4: Write prepared pages into DOCX
|
||||
for page_idx, page_lines in enumerate(prepared_pages):
|
||||
# Insert page break between pages
|
||||
if page_idx > 0:
|
||||
p_break = doc.add_paragraph()
|
||||
pPr = p_break._p.get_or_add_pPr()
|
||||
for child in list(pPr):
|
||||
tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
|
||||
if tag in ("tabs", "spacing", "ind"):
|
||||
pPr.remove(child)
|
||||
# w:pageBreakBefore is the paragraph-level page-break property in WordprocessingML
page_break = OxmlElement("w:pageBreakBefore")
|
||||
page_break.set("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val", "1")
|
||||
pPr.append(page_break)
|
||||
|
||||
# Write each line with its number (1-29)
|
||||
for line_num, line_text in enumerate(page_lines, start=1):
|
||||
_add_transcript_paragraph(doc, line_text, line_number=line_num)
|
||||
|
||||
# Step 5: Add footer: "X of Y" centered
|
||||
section = doc.sections[0]
|
||||
footer = section.footer
|
||||
footer.is_linked_to_previous = False
|
||||
footer_para = footer.paragraphs[0] if footer.paragraphs else footer.add_paragraph()
|
||||
footer_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
|
||||
for r in footer_para.runs:
|
||||
r.text = ""
|
||||
|
||||
def add_field(run, code):
|
||||
fldChar = OxmlElement("w:fldChar")
|
||||
fldChar.set(qn("w:fldCharType"), "begin")
|
||||
run._r.append(fldChar)
|
||||
|
||||
instrText = OxmlElement("w:instrText")
|
||||
instrText.set(qn("xml:space"), "preserve")
|
||||
instrText.text = code
|
||||
run._r.append(instrText)
|
||||
|
||||
fldCharEnd = OxmlElement("w:fldChar")
|
||||
fldCharEnd.set(qn("w:fldCharType"), "end")
|
||||
run._r.append(fldCharEnd)
|
||||
|
||||
run_page = footer_para.add_run()
|
||||
add_field(run_page, " PAGE ")
|
||||
|
||||
run_of = footer_para.add_run(" of ")
|
||||
|
||||
run_total = footer_para.add_run()
|
||||
add_field(run_total, " NUMPAGES ")
|
||||
|
||||
doc.save(filename)
|
||||
|
||||
|
||||
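The wrap-then-paginate step in `create_transcript_docx` can be isolated from the DOCX writing. This is a minimal sketch, assuming the 52-character content width from the code and the 29 lines per page stated in the docstring; the names `wrap_words` and `paginate` are hypothetical.

```python
from typing import List

MAX_CONTENT_LEN = 52  # content width used in the diff
LINES_PER_PAGE = 29   # page length stated in the docstring


def wrap_words(text: str, max_len: int = MAX_CONTENT_LEN) -> List[str]:
    """Greedy word wrap that never splits a word.

    A single word longer than max_len is kept whole on its own line.
    """
    segments: List[str] = []
    current = ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_len:
            current += " " + word
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


def paginate(lines: List[str], per_page: int = LINES_PER_PAGE) -> List[List[str]]:
    """Split a flat list of lines into pages of at most per_page lines."""
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]
```

Slicing in fixed steps avoids a manual counter, which is where an off-by-one between the documented and the actual page length is easiest to introduce.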
def create_summary_docx(text: str, filename: str):
|
||||
"""
|
||||
Create a .docx summary with consistent font.
|
||||
No section headings; use bold/underline only.
|
||||
Create a summary DOCX with:
|
||||
- 1" margins on all sides
|
||||
- 12pt Courier font
|
||||
- Markdown-aware WYSIWYG styling (headings, bullets, bold/italic)
|
||||
"""
|
||||
doc = Document()
|
||||
style = doc.styles["Normal"]
|
||||
font = style.font
|
||||
font.name = "Courier"
|
||||
font.size = Pt(12)
|
||||
from . import docx_styles
|
||||
|
||||
for line in text.splitlines():
|
||||
p = doc.add_paragraph(line)
|
||||
p.paragraph_format.space_after = Pt(4)
|
||||
doc = Document()
|
||||
|
||||
# Base font
|
||||
style = doc.styles["Normal"]
|
||||
style.font.name = "Courier"
|
||||
style.font.size = Pt(12)
|
||||
|
||||
# Margins: 1 inch all sides
|
||||
for section in doc.sections:
|
||||
section.left_margin = Inches(1.0)
|
||||
section.right_margin = Inches(1.0)
|
||||
section.top_margin = Inches(1.0)
|
||||
section.bottom_margin = Inches(1.0)
|
||||
|
||||
# Remove default paragraph
|
||||
body = doc.element.body
|
||||
for p in list(body.findall(f"{{{W_NS}}}p")):
|
||||
body.remove(p)
|
||||
|
||||
# Optionally add cover page
|
||||
from . import docx_cover
|
||||
cover_enabled = os.getenv("COVER_PAGE_ENABLED", "false").strip().lower() in ("true", "1", "yes")
|
||||
if cover_enabled:
|
||||
docx_cover.add_cover_page(
|
||||
doc,
|
||||
title="SUMMARY",
|
||||
subtitle=None,
|
||||
metadata=None,
|
||||
include_logo=True,
|
||||
)
|
||||
|
||||
# Add summary content using markdown-aware styling
|
||||
if text.strip():
|
||||
docx_styles.parse_simple_md_to_paragraphs(doc, text.strip())
|
||||
|
||||
doc.save(filename)
|
||||
|
||||
+308
-1
@@ -9,11 +9,21 @@ It replaces the previous local Whisper + Pyannote pipeline by sending
audio files to the /v1/audio/diarization endpoint and mapping the
response into the same Transcript format used by the UI.

For long audio files, it can chunk the input to avoid GPU OOM errors.

Environment Variables:
    LOCALAI_API_URL: (required) Base URL of the LocalAI server
                     (e.g., http://localhost:8080)
    LOCALAI_API_KEY: (optional) API key, if configured
    LOCALAI_MODEL: (optional) Model name to use (default: vibevoice-diarize)

Chunking / long audio (all optional):
    LOCALAI_CHUNK_DURATION: Max duration of each chunk in seconds
                            (default: 180.0)
    LOCALAI_CHUNK_OVERLAP: Overlap between consecutive chunks in seconds
                           (default: 2.0)
    LOCALAI_MAX_SINGLE_REQUEST_DURATION: If audio duration exceeds this, chunking
                                         is enabled automatically (default: 300.0)
"""

import os
|
||||
@@ -24,6 +34,8 @@ from typing import Dict, List, Any, Optional
|
||||
|
||||
import httpx
|
||||
|
||||
from .audio import get_audio_duration, split_audio_into_chunks
|
||||
|
||||
logger = logging.getLogger("scraibe.localai_client")
|
||||
|
||||
|
||||
@@ -41,14 +53,20 @@ class LocalAIClient:
|
||||
- Upload audio file as multipart/form-data.
|
||||
- Parse diarization + transcription response (verbose_json).
|
||||
- Map response into the same structure expected by Scraibe's Transcript.
|
||||
- For long audio: chunk, transcribe each chunk, merge results.
|
||||
"""
|
||||
|
||||
# Default thresholds for chunking long audio to avoid GPU OOM.
|
||||
# These can be overridden via environment or at call time.
|
||||
DEFAULT_CHUNK_DURATION = 180.0 # seconds
|
||||
DEFAULT_CHUNK_OVERLAP = 2.0 # seconds
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
api_url: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
model: Optional[str] = None,
|
||||
timeout: float = 600.0,
|
||||
timeout: float = 3600.0,
|
||||
):
|
||||
"""
|
||||
Args:
|
||||
@@ -82,6 +100,55 @@ class LocalAIClient:
|
||||
follow_redirects=True,
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _env_float(var: str, default: float) -> float:
|
||||
"""
|
||||
Read a float from environment with a fallback default.
|
||||
"""
|
||||
val = (os.getenv(var) or "").strip()
|
||||
if val == "":
|
||||
return default
|
||||
try:
|
||||
return float(val)
|
||||
except ValueError:
|
||||
logger.warning(
|
||||
"Invalid value for %s: %s; using default %s", var, val, default
|
||||
)
|
||||
return default
|
||||
|
||||
def _effective_chunk_duration(self, provided: Optional[float]) -> float:
|
||||
"""
|
||||
Resolve chunk_duration using this precedence:
|
||||
1) provided argument
|
||||
2) LOCALAI_CHUNK_DURATION env
|
||||
3) class default
|
||||
"""
|
||||
if provided is not None:
|
||||
return provided
|
||||
return self._env_float("LOCALAI_CHUNK_DURATION", self.DEFAULT_CHUNK_DURATION)
|
||||
|
||||
def _effective_chunk_overlap(self, provided: Optional[float]) -> float:
|
||||
"""
|
||||
Resolve chunk_overlap:
|
||||
1) provided argument
|
||||
2) LOCALAI_CHUNK_OVERLAP env
|
||||
3) class default
|
||||
"""
|
||||
if provided is not None:
|
||||
return provided
|
||||
return self._env_float("LOCALAI_CHUNK_OVERLAP", self.DEFAULT_CHUNK_OVERLAP)
|
||||
|
||||
def _effective_max_single_request_duration(self, provided: Optional[float]) -> float:
|
||||
"""
|
||||
Resolve max_single_request_duration:
|
||||
1) provided argument
|
||||
2) LOCALAI_MAX_SINGLE_REQUEST_DURATION env
|
||||
3) default 300.0
|
||||
"""
|
||||
if provided is not None:
|
||||
return provided
|
||||
return self._env_float("LOCALAI_MAX_SINGLE_REQUEST_DURATION", 300.0)
|
||||
|
||||
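The three `_effective_*` helpers share one precedence rule: explicit argument, then environment variable, then default. A compact sketch of that rule follows; `env_float` mirrors `_env_float` from the diff (without the log call), while `resolve` is a hypothetical name for the shared logic.

```python
import os
from typing import Optional


def env_float(var: str, default: float) -> float:
    """Read a float from the environment; fall back on empty or invalid values."""
    raw = (os.getenv(var) or "").strip()
    if not raw:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


def resolve(provided: Optional[float], var: str, default: float) -> float:
    """Precedence: explicit argument, then environment variable, then default."""
    if provided is not None:
        return provided
    return env_float(var, default)
```

Testing `provided is not None` rather than truthiness matters here: an explicit `0.0` overlap is a valid argument and should not fall through to the environment.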
def close(self):
|
||||
"""Close the underlying HTTP client."""
|
||||
self._client.close()
|
||||
@@ -107,6 +174,10 @@ class LocalAIClient:
|
||||
include_text: Optional[bool] = None,
|
||||
verbose: bool = False,
|
||||
return_raw: bool = False,
|
||||
use_chunking: Optional[bool] = None,
|
||||
chunk_duration: Optional[float] = None,
|
||||
chunk_overlap: Optional[float] = None,
|
||||
max_single_request_duration: Optional[float] = None,
|
||||
**_ignored,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
@@ -114,6 +185,8 @@ class LocalAIClient:
|
||||
- A normalized dict with segments, speakers, transcripts.
|
||||
- Optionally, the raw verbose_json response (for JSON export).
|
||||
|
||||
For long audio, it can automatically chunk the file to avoid GPU OOM.
|
||||
|
||||
Args:
|
||||
audio_path: Path to the audio file.
|
||||
language: Language hint, forwarded if set.
|
||||
@@ -129,6 +202,93 @@ class LocalAIClient:
|
||||
Defaults to True.
|
||||
verbose: If True, prints progress messages.
|
||||
return_raw: If True, also return the raw API response in 'raw_result'.
|
||||
use_chunking: Whether to enable chunking for long audio.
|
||||
If None, enabled automatically based on duration.
|
||||
chunk_duration: Max duration per chunk in seconds.
|
||||
Falls back to LOCALAI_CHUNK_DURATION env, then 180.0.
|
||||
chunk_overlap: Overlap between chunks in seconds.
|
||||
Falls back to LOCALAI_CHUNK_OVERLAP env, then 2.0.
|
||||
max_single_request_duration: If audio duration exceeds this, chunking
|
||||
is enabled (unless explicitly disabled).
|
||||
Falls back to LOCALAI_MAX_SINGLE_REQUEST_DURATION
|
||||
env, then 300.0.
|
||||
"""
|
||||
if verbose:
|
||||
print("Starting diarization and transcription via LocalAI.")
|
||||
|
||||
logger.info("diarize_and_transcribe requested for: %s", audio_path)
|
||||
|
||||
# Resolve chunking parameters with environment support
|
||||
chunk_duration = self._effective_chunk_duration(chunk_duration)
|
||||
chunk_overlap = self._effective_chunk_overlap(chunk_overlap)
|
||||
max_single = self._effective_max_single_request_duration(max_single_request_duration)
|
||||
|
||||
if use_chunking is None:
|
||||
try:
|
||||
duration = get_audio_duration(audio_path)
|
||||
except RuntimeError:
|
||||
duration = None
|
||||
|
||||
use_chunking = (duration is not None and duration > max_single)
|
||||
logger.info(
|
||||
"Auto-chunking decision: duration=%s, threshold=%s, use_chunking=%s",
|
||||
duration,
|
||||
max_single,
|
||||
use_chunking,
|
||||
)
|
||||
|
||||
if use_chunking:
|
||||
return self._diarize_and_transcribe_chunked(
|
||||
audio_path=audio_path,
|
||||
language=language,
|
||||
num_speakers=num_speakers,
|
||||
min_speakers=min_speakers,
|
||||
max_speakers=max_speakers,
|
||||
clustering_threshold=clustering_threshold,
|
||||
min_duration_on=min_duration_on,
|
||||
min_duration_off=min_duration_off,
|
||||
response_format=response_format,
|
||||
include_text=include_text,
|
||||
verbose=verbose,
|
||||
return_raw=return_raw,
|
||||
chunk_duration=chunk_duration,
|
||||
chunk_overlap=chunk_overlap,
|
||||
)
|
||||
|
||||
# Single-request path (existing behavior)
|
||||
return self._diarize_and_transcribe_single(
|
||||
audio_path=audio_path,
|
||||
language=language,
|
||||
num_speakers=num_speakers,
|
||||
min_speakers=min_speakers,
|
||||
max_speakers=max_speakers,
|
||||
clustering_threshold=clustering_threshold,
|
||||
min_duration_on=min_duration_on,
|
||||
min_duration_off=min_duration_off,
|
||||
response_format=response_format,
|
||||
include_text=include_text,
|
||||
verbose=verbose,
|
||||
return_raw=return_raw,
|
||||
)
|
||||
|
||||
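`split_audio_into_chunks` lives in `.audio` and is not shown in this diff, so its exact behavior is unknown. As an illustration only, the sketch below shows one plausible way `chunk_duration` and `chunk_overlap` could translate into time windows; the function name `chunk_windows` and the windowing rule are assumptions, not the project's implementation.

```python
from typing import List, Tuple


def chunk_windows(
    total: float,
    max_duration: float = 180.0,
    overlap: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) windows covering [0, total] with the given overlap.

    Hypothetical sketch: each window starts `overlap` seconds before the
    previous one ends.
    """
    if max_duration <= 0:
        raise ValueError("max_duration must be positive")
    if not 0 <= overlap < max_duration:
        raise ValueError("overlap must be in [0, max_duration)")
    if total <= max_duration:
        return [(0.0, total)]

    windows: List[Tuple[float, float]] = []
    step = max_duration - overlap
    start = 0.0
    while start < total:
        end = min(start + max_duration, total)
        windows.append((start, end))
        if end >= total:
            break
        start += step
    return windows
```

With the defaults from the diff (180 s chunks, 2 s overlap), a 400 s file would yield three windows under this rule, each starting 178 s after the previous one.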
def _diarize_and_transcribe_single(
|
||||
self,
|
||||
audio_path: str,
|
||||
*,
|
||||
language: Optional[str] = None,
|
||||
num_speakers: Optional[int] = None,
|
||||
min_speakers: Optional[int] = None,
|
||||
max_speakers: Optional[int] = None,
|
||||
clustering_threshold: Optional[float] = None,
|
||||
min_duration_on: Optional[float] = None,
|
||||
min_duration_off: Optional[float] = None,
|
||||
response_format: Optional[str] = None,
|
||||
include_text: Optional[bool] = None,
|
||||
verbose: bool = False,
|
||||
return_raw: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Internal: single-request diarization and transcription.
|
||||
"""
|
||||
if verbose:
|
||||
print("Starting diarization and transcription via LocalAI.")
|
||||
@@ -214,6 +374,153 @@ class LocalAIClient:
|
||||
|
||||
return parsed
|
||||
|
||||
def _diarize_and_transcribe_chunked(
|
||||
self,
|
||||
audio_path: str,
|
||||
*,
|
||||
language: Optional[str] = None,
|
||||
num_speakers: Optional[int] = None,
|
||||
min_speakers: Optional[int] = None,
|
||||
max_speakers: Optional[int] = None,
|
||||
clustering_threshold: Optional[float] = None,
|
||||
min_duration_on: Optional[float] = None,
|
||||
min_duration_off: Optional[float] = None,
|
||||
response_format: Optional[str] = None,
|
||||
include_text: Optional[bool] = None,
|
||||
verbose: bool = False,
|
||||
return_raw: bool = False,
|
||||
chunk_duration: float = DEFAULT_CHUNK_DURATION,
|
||||
chunk_overlap: float = DEFAULT_CHUNK_OVERLAP,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Internal: chunked diarization and transcription for long audio.
|
||||
|
||||
- Splits audio into overlapping chunks.
|
||||
- Transcribes each chunk via /v1/audio/diarization.
|
||||
- Merges segments with adjusted timestamps.
|
||||
"""
|
||||
if verbose:
|
||||
print("Audio is long; splitting into chunks to avoid GPU memory issues.")
|
||||
|
||||
logger.info(
|
||||
"Chunked transcription: chunk_duration=%s, overlap=%s",
|
||||
chunk_duration,
|
||||
chunk_overlap,
|
||||
)
|
||||
|
||||
chunks = split_audio_into_chunks(
|
||||
input_path=audio_path,
|
||||
max_duration=chunk_duration,
|
||||
overlap=chunk_overlap,
|
||||
)
|
||||
|
||||
if len(chunks) == 1:
|
||||
# No actual split needed; fall back to single-request path
|
||||
return self._diarize_and_transcribe_single(
|
||||
audio_path=chunks[0]["path"],
|
||||
language=language,
|
||||
num_speakers=num_speakers,
|
||||
min_speakers=min_speakers,
|
||||
max_speakers=max_speakers,
|
||||
clustering_threshold=clustering_threshold,
|
||||
min_duration_on=min_duration_on,
|
||||
min_duration_off=min_duration_off,
|
||||
response_format=response_format,
|
||||
include_text=include_text,
|
||||
verbose=verbose,
|
||||
return_raw=return_raw,
|
||||
)
|
||||
|
||||
all_segments: List[List[float]] = []
|
||||
all_speakers: List[str] = []
|
||||
all_transcripts: List[str] = []
|
||||
raw_results: List[Dict[str, Any]] = []
|
||||
temp_files = [c["path"] for c in chunks]
|
||||
|
||||
try:
|
||||
for i, chunk_info in enumerate(chunks):
|
||||
chunk_path = chunk_info["path"]
|
||||
chunk_start = chunk_info["start"]
|
||||
|
||||
if verbose:
|
||||
print(
|
||||
f"Transcribing chunk {i+1}/{len(chunks)} "
|
||||
f"(start={chunk_start:.1f}s)"
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"Transcribing chunk %d/%d, start=%.1f", i + 1, len(chunks), chunk_start
|
||||
)
|
||||
|
||||
# Use single-request logic for each chunk
|
||||
chunk_result = self._diarize_and_transcribe_single(
|
||||
audio_path=chunk_path,
|
||||
language=language,
|
||||
num_speakers=num_speakers,
|
||||
min_speakers=min_speakers,
|
||||
max_speakers=max_speakers,
|
||||
clustering_threshold=clustering_threshold,
|
||||
min_duration_on=min_duration_on,
|
||||
min_duration_off=min_duration_off,
|
||||
response_format=response_format,
|
||||
include_text=include_text,
|
||||
verbose=False,
|
||||
return_raw=return_raw,
|
||||
)
|
||||
|
||||
segs = chunk_result.get("segments", [])
|
||||
spks = chunk_result.get("speakers", [])
|
||||
txts = chunk_result.get("transcripts", [])
|
||||
raw = chunk_result.get("raw_result")
|
||||
|
||||
# Adjust timestamps to global timeline
|
||||
adjusted_segs = []
|
||||
for seg, sp, txt in zip(segs, spks, txts):
|
||||
start = float(seg[0]) + chunk_start
|
||||
end = float(seg[1]) + chunk_start
|
||||
adjusted_segs.append([start, end])
|
||||
all_speakers.append(sp)
|
||||
all_transcripts.append(txt)
|
||||
all_segments.extend(adjusted_segs)
|
||||
|
||||
if return_raw and raw is not None:
|
||||
raw_results.append(raw)
|
||||
|
||||
finally:
|
||||
# Clean up temporary chunk files
|
||||
for path in temp_files:
|
||||
if path and os.path.exists(path) and path != audio_path:
|
||||
try:
|
||||
os.remove(path)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to remove chunk file %s: %s", path, e)
|
||||
|
||||
# Sort segments by start time
|
||||
combined = list(zip(all_segments, all_speakers, all_transcripts))
|
||||
combined.sort(key=lambda x: x[0][0])
|
||||
all_segments = [x[0] for x in combined]
|
||||
all_speakers = [x[1] for x in combined]
|
||||
all_transcripts = [x[2] for x in combined]
|
||||
|
||||
if verbose:
|
||||
print(
|
||||
f"Chunked transcription complete. Total segments: {len(all_segments)}"
|
||||
)
|
||||
|
||||
result = {
|
||||
"segments": all_segments,
|
||||
"speakers": all_speakers,
|
||||
"transcripts": all_transcripts,
|
||||
}
|
||||
|
||||
if return_raw and raw_results:
|
||||
result["raw_result"] = {
|
||||
"chunked": True,
|
||||
"chunks": raw_results,
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
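The merge step of the chunked path (shift each chunk's timestamps by the chunk start, then sort by start time) can be sketched without any HTTP calls. The input shape below, a list of dicts with `start` and `result` keys, is an assumption made for the example; `merge_chunk_results` is a hypothetical name.

```python
from typing import Any, Dict, List


def merge_chunk_results(chunks: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge per-chunk results onto one global timeline.

    Each item is assumed to look like:
        {"start": <chunk offset in seconds>,
         "result": {"segments": [[s, e], ...], "speakers": [...], "transcripts": [...]}}
    """
    rows = []
    for chunk in chunks:
        offset = float(chunk["start"])
        result = chunk["result"]
        for seg, speaker, text in zip(
            result.get("segments", []),
            result.get("speakers", []),
            result.get("transcripts", []),
        ):
            # Shift chunk-local timestamps to the global timeline.
            rows.append(([float(seg[0]) + offset, float(seg[1]) + offset], speaker, text))

    # Stable sort by global start time keeps segments, speakers and text aligned.
    rows.sort(key=lambda row: row[0][0])
    return {
        "segments": [row[0] for row in rows],
        "speakers": [row[1] for row in rows],
        "transcripts": [row[2] for row in rows],
    }
```

Like the hunk above, this sketch neither deduplicates text that falls inside the overlap between two chunks nor reconciles speaker labels across chunks; since each chunk is diarized independently, the same label in two chunks may refer to different people.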
def _parse_diarization_response(self, result: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Convert LocalAI verbose_json response into the internal format used by Scraibe:
|
||||
|
||||
@@ -0,0 +1,205 @@
"""
MCP-style HTTP server for ScrAIbe.

- Exposes an OpenAPI-compliant endpoint for external LLMs to:
  - Upload audio
  - Receive transcript JSON (no summary)
- WebUI remains always enabled; this is additive.

Configuration (env):
- MCP_SERVER_ENABLED: "true"/"false" (default: false)
- MCP_SERVER_HOST: bind address (default: 0.0.0.0)
- MCP_SERVER_PORT: port (default: 8000)
- MCP_USE_CELERY: "true"/"false" (default: true)
  - If true, uses Celery tasks; if false, runs the job in a background
    thread of the API process.
"""
|
||||
|
||||
import os
|
||||
import time
|
||||
import uuid
|
||||
import json
|
||||
import logging
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
|
||||
from fastapi.responses import JSONResponse
|
||||
|
||||
from .autotranscript import Scraibe
|
||||
|
||||
logger = logging.getLogger("scraibe.mcp_server")
|
||||
|
||||
app = FastAPI(
|
||||
title="ScrAIbe MCP Transcription API",
|
||||
version="0.1.0",
|
||||
description=(
|
||||
"MCP-style HTTP API for ScrAIbe. "
|
||||
"Allows external LLMs to upload audio and receive transcript JSON."
|
||||
),
|
||||
)
|
||||
|
||||
# In-memory job store for MCP (simple; can be replaced with Redis later)
|
||||
_mcp_jobs: dict = {}
|
||||
|
||||
|
||||
def _job_id() -> str:
|
||||
return str(uuid.uuid4())
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health():
|
||||
return {"status": "ok"}
|
||||
|
||||
|
||||
@app.post("/transcribe")
|
||||
async def transcribe(
|
||||
file: UploadFile = File(...),
|
||||
language: Optional[str] = Form(None),
|
||||
num_speakers: Optional[int] = Form(None),
|
||||
):
"""
Upload audio and start transcription.

Returns:
    {
        "job_id": "<id>",
        "status": "queued" | "processing" | "completed" | "error",
        "message": "..."
    }

Poll GET /transcribe/{job_id}/status for progress, then fetch the result
from GET /transcribe/{job_id}/json once the status is "completed".
"""
|
||||
use_celery = os.getenv("MCP_USE_CELERY", "true").strip().lower() in ("true", "1", "yes")
|
||||
|
||||
# Save uploaded file temporarily
|
||||
try:
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
upload_dir = Path(os.getenv("SCRAIBE_UPLOAD_DIR", "/tmp/scraibe_uploads"))
|
||||
upload_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
ext = Path(file.filename or "file").suffix or ".wav"
|
||||
ts = time.strftime("%Y%m%d%H%M%S")
|
||||
tmp_name = f"mcp_upload_{ts}_{uuid.uuid4().hex[:8]}{ext}"
|
||||
file_path = upload_dir / tmp_name
|
||||
|
||||
content = await file.read()
|
||||
file_path.write_bytes(content)
|
||||
except Exception as e:
|
||||
logger.error("Error saving MCP upload: %s", e)
|
||||
raise HTTPException(status_code=500, detail=f"Error saving file: {e}")
|
||||
|
||||
job_id = _job_id()
|
||||
|
||||
if use_celery:
|
||||
try:
|
||||
from .tasks import process_mcp_transcribe_task
|
||||
except ImportError:
|
||||
# Fallback: run synchronously
|
||||
use_celery = False
|
||||
|
||||
if use_celery:
|
||||
try:
|
||||
process_mcp_transcribe_task.delay(
|
||||
audio_path=str(file_path),
|
||||
job_id=job_id,
|
||||
language=language or None,
|
||||
num_speakers=int(num_speakers) if num_speakers else None,
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error("Error enqueuing MCP job: %s", e)
|
||||
_mcp_jobs[job_id] = {
|
||||
"status": "error",
|
||||
"message": f"Error enqueuing job: {e}",
|
||||
"file_path": str(file_path),
|
||||
}
|
||||
return {
|
||||
"job_id": job_id,
|
||||
"status": "error",
|
||||
"message": _mcp_jobs[job_id]["message"],
|
||||
}
|
||||
|
||||
_mcp_jobs[job_id] = {
|
||||
"status": "queued",
|
||||
"message": "Job queued for processing.",
|
||||
"file_path": str(file_path),
|
||||
}
|
||||
return {
|
||||
"job_id": job_id,
|
||||
"status": "queued",
|
||||
"message": _mcp_jobs[job_id]["message"],
|
||||
}
|
||||
|
||||
# Synchronous path
|
||||
_mcp_jobs[job_id] = {
|
||||
"status": "processing",
|
||||
"message": "Transcription started (synchronous).",
|
||||
"file_path": str(file_path),
|
||||
}
|
||||
|
||||
def _run_sync():
|
||||
try:
|
||||
scraibe = Scraibe(verbose=False)
|
||||
result = scraibe.transcribe(
|
||||
audio_file=str(file_path),
|
||||
language=language or None,
|
||||
num_speakers=int(num_speakers) if num_speakers else None,
|
||||
verbose=False,
|
||||
for_export=True,
|
||||
)
|
||||
transcript_text = result.get("transcript", "")
|
||||
segments = result.get("segments", [])
|
||||
_mcp_jobs[job_id]["status"] = "completed"
|
||||
_mcp_jobs[job_id]["transcript"] = transcript_text
|
||||
_mcp_jobs[job_id]["segments"] = segments
|
||||
_mcp_jobs[job_id]["message"] = "Transcription completed."
|
||||
except Exception as e:
|
||||
logger.error("MCP sync transcription error: %s", e)
|
||||
_mcp_jobs[job_id]["status"] = "error"
|
||||
_mcp_jobs[job_id]["message"] = f"Transcription error: {e}"
|
||||
|
||||
import threading
|
||||
t = threading.Thread(target=_run_sync, daemon=True)
|
||||
t.start()
|
||||
|
||||
return {
|
||||
"job_id": job_id,
|
||||
"status": "processing",
|
||||
"message": _mcp_jobs[job_id]["message"],
|
||||
}
|
||||
|
||||
|
||||
@app.get("/transcribe/{job_id}/status")
|
||||
async def get_status(job_id: str):
|
||||
job = _mcp_jobs.get(job_id)
|
||||
if not job:
|
||||
raise HTTPException(status_code=404, detail="Job not found")
|
||||
return {
|
||||
"job_id": job_id,
|
||||
"status": job["status"],
|
||||
"message": job.get("message", ""),
|
||||
}
|
||||
|
||||
|
||||
@app.get("/transcribe/{job_id}/json")
|
||||
async def get_json(job_id: str):
|
||||
job = _mcp_jobs.get(job_id)
|
||||
if not job:
|
||||
raise HTTPException(status_code=404, detail="Job not found")
|
||||
|
||||
if job["status"] != "completed":
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Job not completed. Current status: {job['status']}",
|
||||
)
|
||||
|
||||
transcript_text = job.get("transcript", "")
|
||||
segments = job.get("segments", [])
|
||||
|
||||
return JSONResponse(
|
||||
content={
|
||||
"job_id": job_id,
|
||||
"transcript": transcript_text,
|
||||
"segments": segments,
|
||||
}
|
||||
)
|
||||
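A caller of this API has to poll `GET /transcribe/{job_id}/status` until the job reaches `completed` or `error`. The sketch below shows that loop with the HTTP call injected as a callable, so it makes no assumption about the client library; `wait_for_job`, its parameters, and the default interval and timeout are all hypothetical.

```python
import time
from typing import Callable, Dict

# Statuses after which the server will not change the job any further.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_job(
    get_status: Callable[[str], Dict[str, str]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the job is completed or failed, or raise TimeoutError.

    `get_status` is expected to return the JSON body of
    GET /transcribe/{job_id}/status as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Once the returned status is `completed`, the transcript and segments can be fetched from `GET /transcribe/{job_id}/json`; requesting that endpoint earlier returns an HTTP 400 in the code above.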
+59
-15
@@ -43,7 +43,7 @@ class SummarizerClient:
|
||||
api_url: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
model: Optional[str] = None,
|
||||
timeout: float = 600.0,
|
||||
timeout: float = 3600.0,
|
||||
):
|
||||
self.api_url = (api_url or os.getenv("SUMMARIZER_API_URL")).strip().rstrip("/")
|
||||
self.api_key = api_key or os.getenv("SUMMARIZER_API_KEY") or None
|
||||
@@ -148,8 +148,46 @@ class SummarizerClient:
|
||||
start = break_pos
|
||||
return chunks
|
||||
|
||||
def _summarize_chunk(self, chunk: str, index: int, total: int) -> str:
|
||||
system_prompt = (
|
||||
def _load_summary_prompt(self, role: str) -> str:
|
||||
"""
|
||||
Load summary prompt for the given role: 'chunk' or 'combined'.
|
||||
|
||||
Priority:
|
||||
1) SUMMARY_PROMPT_{ROLE} (env)
|
||||
2) SUMMARY_PROMPT_FILE (env) with [chunk] / [combined] sections
|
||||
3) Built-in default prompt
|
||||
"""
|
||||
role_upper = role.upper()
|
||||
|
||||
# 1) Direct env var: SUMMARY_PROMPT_CHUNK / SUMMARY_PROMPT_COMBINED
|
||||
env_key = f"SUMMARY_PROMPT_{role_upper}"
|
||||
env_prompt = (os.getenv(env_key) or "").strip()
|
||||
if env_prompt:
|
||||
return env_prompt
|
||||
|
||||
# 2) File-based prompt with sections
|
||||
prompt_file = (os.getenv("SUMMARY_PROMPT_FILE") or "").strip()
|
||||
if prompt_file and os.path.exists(prompt_file):
|
||||
try:
|
||||
with open(prompt_file, "r", encoding="utf-8") as f:
|
||||
content = f.read()
|
||||
# Simple section parser: [chunk], [combined]
|
||||
import re
|
||||
pattern = re.compile(
|
||||
r"\[" + re.escape(role) + r"\]\s*\n(.*?)(?=\n\[|$)",
|
||||
re.DOTALL,
|
||||
)
|
||||
m = pattern.search(content)
|
||||
if m:
|
||||
text = m.group(1).strip()
|
||||
if text:
|
||||
return text
|
||||
except Exception as e:
|
||||
logger.warning("Failed to load SUMMARY_PROMPT_FILE for %s: %s", role, e)
|
||||
|
||||
# 3) Default prompts
|
||||
if role == "chunk":
|
||||
return (
|
||||
"You are an expert legal and business meeting summarizer. "
|
||||
"You will receive a segment of a longer transcript. "
|
||||
"Provide a detailed, structured summary of this segment, focusing on: "
|
||||
@@ -158,19 +196,11 @@ class SummarizerClient:
|
||||
"- Decisions and agreements\n"
|
||||
"- Action items and responsibilities\n"
|
||||
"- Any risks, conflicts, or open issues\n\n"
|
||||
"Be concise but complete. Use bullet points when helpful. "
|
||||
"Be concise but complete. Use bullet points where helpful. "
|
||||
"Do not add information that is not present in the transcript."
|
||||
)
|
||||
|
||||
user_prompt = (
|
||||
f"This is segment {index + 1} of {total} from a longer conversation.\n\n"
|
||||
f"{chunk}"
|
||||
)
|
||||
|
||||
return self._chat_completion(system_prompt, user_prompt)
|
||||
|
||||
def _summarize_combined(self, combined_summaries: str) -> str:
|
||||
system_prompt = (
|
||||
else:
|
||||
return (
|
||||
"You are an expert legal and business meeting summarizer. "
|
||||
"You will receive several intermediate summaries of a longer conversation. "
|
||||
"Produce a single, comprehensive summary that makes it clear: "
|
||||
@@ -182,9 +212,23 @@ class SummarizerClient:
|
||||
"- Any unresolved issues or risks\n\n"
|
||||
"The summary should be detailed enough that a reader who was not present "
|
||||
"can understand what happened and what is expected going forward. "
|
||||
"Use clear, concise language and bullet points where appropriate."
|
||||
"Use clear, concise language and bullet points where appropriate. "
|
||||
"Use markdown formatting (headings, lists, bold) to structure the summary."
|
||||
)
|
||||
|
||||
def _summarize_chunk(self, chunk: str, index: int, total: int) -> str:
|
||||
system_prompt = self._load_summary_prompt("chunk")
|
||||
|
||||
user_prompt = (
|
||||
f"This is segment {index + 1} of {total} from a longer conversation.\n\n"
|
||||
f"{chunk}"
|
||||
)
|
||||
|
||||
return self._chat_completion(system_prompt, user_prompt)
|
||||
|
||||
def _summarize_combined(self, combined_summaries: str) -> str:
|
||||
system_prompt = self._load_summary_prompt("combined")
|
||||
|
||||
user_prompt = (
|
||||
"Here are the intermediate summaries from different parts of the same conversation:\n\n"
|
||||
f"{combined_summaries}"
|
||||
|
||||
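Review note: the lookup order documented in `_load_summary_prompt` (direct env var, then a `[chunk]` / `[combined]` section of the prompt file, then the built-in default) can be exercised in isolation. The sketch below is not the code in this diff. `extract_section` and `resolve_prompt` are hypothetical helper names, and the pattern is a slightly stricter variant that only treats a `[` at the start of a line as a section header.

```python
import os
import re
from typing import Mapping, Optional


def extract_section(content: str, role: str) -> Optional[str]:
    """Return the text under a ``[role]`` header, or None if missing or empty."""
    pattern = re.compile(
        # Header on its own line, body up to the next line that starts
        # with "[" or the end of the content.
        r"^\[" + re.escape(role) + r"\][ \t]*\n(.*?)(?=^\[|\Z)",
        re.DOTALL | re.MULTILINE,
    )
    m = pattern.search(content)
    if not m:
        return None
    text = m.group(1).strip()
    return text or None


def resolve_prompt(
    role: str,
    default: str,
    env: Optional[Mapping[str, str]] = None,
    file_content: Optional[str] = None,
) -> str:
    """Apply the priority: env var, then file section, then default."""
    env = os.environ if env is None else env
    direct = (env.get(f"SUMMARY_PROMPT_{role.upper()}") or "").strip()
    if direct:
        return direct
    if file_content:
        section = extract_section(file_content, role)
        if section:
            return section
    return default
```

One limitation of this sketch: a prompt line that itself begins with `[` ends the section early, so such lines would need to be indented or reworded in the prompt file.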
+275
-24
@@ -70,20 +70,37 @@ def _get_subject(env_var: str, default: str) -> str:
|
||||
def get_queue_position(task_id: str) -> int:
|
||||
"""
|
||||
Estimate the job's position in the queue.
|
||||
Returns:
|
||||
- A positive int if we can estimate (1 = first in line).
|
||||
- 0 if we cannot reliably determine position.
|
||||
"""
|
||||
try:
|
||||
inspect = celery_app.control.inspect()
|
||||
ready = inspect.active() or {}
|
||||
reserved = inspect.reserved() or {}
|
||||
count = 0
|
||||
for _, tasks in list(ready.values()) + list(reserved.values()):
|
||||
reserved = inspect.reserved() or {} # queued but not yet running
|
||||
active = inspect.active() or {} # currently running
|
||||
|
||||
# Count tasks ahead of this one in the reserved (waiting) queue
|
||||
ahead = 0
|
||||
found = False
|
||||
for tasks in reserved.values():
|
||||
for t in tasks:
|
||||
if t.get("id") == task_id:
|
||||
tid = t.get("id")
|
||||
if tid == task_id:
|
||||
found = True
|
||||
break
|
||||
count += 1
|
||||
return max(count + 1, 1)
|
||||
ahead += 1
|
||||
if found:
|
||||
break
|
||||
|
||||
# If not found in reserved, it may already be active or not yet visible.
|
||||
# In that case, treat it as position 1.
|
||||
if found:
|
||||
return max(ahead + 1, 1)
|
||||
else:
|
||||
return 1
|
||||
except Exception:
|
||||
return -1
|
||||
# If inspection fails, don't guess; caller should use a safe message.
|
||||
return 0
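Review note: the counting logic in `get_queue_position` can be checked without a running broker by passing in the mapping that `inspect.reserved()` returns (worker name mapped to a list of task dicts). The sketch below is a minimal version under that assumption. `position_in_reserved` is a hypothetical name, and the "not found means position 1" rule mirrors the comment in the diff.

```python
from typing import Dict, List, Optional


def position_in_reserved(
    reserved: Optional[Dict[str, List[dict]]], task_id: str
) -> int:
    """Return the 1-based position of task_id among reserved tasks.

    Tasks are counted in worker order, then in list order per worker.
    If the task is not found it is treated as position 1, because it
    may already be running or not yet visible to inspection.
    """
    ahead = 0
    for tasks in (reserved or {}).values():
        for t in tasks:
            if t.get("id") == task_id:
                return ahead + 1
            ahead += 1
    return 1
```

Iterating over `.values()` yields one task list per worker. Unpacking each value into two names would only work if every worker happened to hold exactly two tasks.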
|
||||
|
||||
|
||||
def send_initial_email(to: str, queue_pos: int):
|
||||
@@ -103,8 +120,12 @@ def send_initial_email(to: str, queue_pos: int):
|
||||
|
||||
if queue_pos > 0:
|
||||
body += f"Your request is currently number {queue_pos} in the queue.\n"
|
||||
queue_position_display = (
|
||||
f'<span style="color:{_accent_color()}; font-weight:bold;">{queue_pos}</span>'
|
||||
)
|
||||
else:
|
||||
body += "Your request has been queued for processing.\n"
|
||||
queue_position_display = "the queue"
|
||||
|
||||
body += (
|
||||
"\n"
|
||||
@@ -119,7 +140,7 @@ def send_initial_email(to: str, queue_pos: int):
|
||||
try:
|
||||
html = load_template(
|
||||
"upload_notification_template.html",
|
||||
queue_position=str(max(queue_pos, 1)),
|
||||
queue_position_text=queue_position_display,
|
||||
)
|
||||
except EmailError as e:
|
||||
logger.warning("Failed to render upload notification template: %s", e)
|
||||
@@ -141,6 +162,7 @@ def send_success_email(
|
||||
"""
|
||||
Send final email with transcript and attachments.
|
||||
Subject is customizable via EMAIL_SUBJECT_SUCCESS.
|
||||
Falls back to a safe default if the env var is missing or blank.
|
||||
"""
|
||||
subject = _get_subject(
|
||||
"EMAIL_SUBJECT_SUCCESS",
|
||||
@@ -183,7 +205,7 @@ def send_success_email(
|
||||
html=html,
|
||||
attachments=attachments,
|
||||
)
|
||||
logger.info("Success email sent to %s for job %s", to, task_id)
|
||||
logger.info("Success email sent to %s for job %s with subject: %s", to, task_id, subject)
|
||||
except EmailError as e:
|
||||
logger.error("Failed to send success email to %s for job %s: %s", to, task_id, e)
|
||||
|
||||
@@ -229,6 +251,8 @@ def send_error_email(to: str, error_message: str, task_id: str):
|
||||
name="scraibe.tasks.process_transcription_task",
|
||||
bind=True,
|
||||
max_retries=1,
|
||||
time_limit=14400, # hard limit: 4 hours
|
||||
soft_time_limit=13500, # soft limit (SoftTimeLimitExceeded) at 3h45m
|
||||
)
|
||||
def process_transcription_task(
|
||||
self,
|
||||
@@ -306,14 +330,18 @@ def process_transcription_task(
|
||||
|
||||
prompt = (
|
||||
"Below is a transcript with speaker labels like 'SPEAKER 1', 'SPEAKER 2', etc. "
|
||||
"Based on how they speak and the context, suggest realistic names for each speaker. "
|
||||
"Do not add extra commentary. Output ONLY a mapping in this exact format, one per line:
|
||||
SPEAKER 1: Suggested Name
|
||||
SPEAKER 2: Suggested Name
|
||||
SPEAKER 3: Suggested Name
|
||||
|
||||
Transcript:
|
||||
" + transcript_text
|
||||
"Based on the context and how each speaker talks, identify each speaker as:\n"
|
||||
"- Their real name, if it is clearly mentioned or strongly implied, OR\n"
|
||||
"- A concise role/position (e.g., Judge, Doctor, Manager, Interviewer, Client, Witness), "
|
||||
"if their identity is not clear.\n"
|
||||
"Do not invent random personal names. "
|
||||
"Do not add extra commentary. Output ONLY a mapping in this exact format, one per line:\n"
|
||||
"SPEAKER 1: Name or Role\n"
|
||||
"SPEAKER 2: Name or Role\n"
|
||||
"SPEAKER 3: Name or Role\n"
|
||||
"\n"
|
||||
"Transcript:\n"
|
||||
+ transcript_text
|
||||
)
|
||||
|
||||
response = summarizer._chat_completion(
|
||||
@@ -331,7 +359,7 @@ Transcript:
|
||||
re.IGNORECASE,
|
||||
):
|
||||
spk = f"SPEAKER {m.group(1).strip()}"
|
||||
name = m.group(2).strip().rstrip(".")
|
||||
name = m.group(2).strip().rstrip(".").upper()
|
||||
if name:
|
||||
speaker_map[spk] = name
|
||||
|
||||
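Review note: the prompt in this hunk asks the model for one `SPEAKER n: Name or Role` line per speaker, and the parsing code upper-cases the result. The matching pattern itself is outside the visible hunk, so the regex below is an assumption, and `parse_speaker_map` is a hypothetical helper name. It is a sketch of how such a response could be parsed while ignoring stray lines.

```python
import re
from typing import Dict

# Assumed line format: "SPEAKER <number>: <name or role>", case-insensitive.
# Only spaces and tabs are allowed around the colon so that an empty
# mapping line cannot swallow the following line.
MAPPING_LINE = re.compile(
    r"^[ \t]*SPEAKER[ \t]+(\d+)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_speaker_map(response: str) -> Dict[str, str]:
    """Build {"SPEAKER n": "NAME OR ROLE"} from a model response."""
    speaker_map: Dict[str, str] = {}
    for m in MAPPING_LINE.finditer(response):
        # Same normalisation as the diff: trim, drop a trailing period, upper-case.
        name = m.group(2).strip().rstrip(".").upper()
        if name:
            speaker_map[f"SPEAKER {m.group(1)}"] = name
    return speaker_map
```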
@@ -389,9 +417,12 @@ Transcript:
|
||||
f.write(transcript_text)
|
||||
temp_files.append(md_transcript_path)
|
||||
|
||||
# Transcript .docx
|
||||
# Transcript .docx (standalone, no cover page)
|
||||
docx_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".docx")
|
||||
create_transcript_docx(transcript_text, docx_transcript_path)
|
||||
create_transcript_docx(
|
||||
transcript_text,
|
||||
docx_transcript_path,
|
||||
)
|
||||
temp_files.append(docx_transcript_path)
|
||||
|
||||
# JSON as SOURCE
|
||||
@@ -415,26 +446,39 @@ Transcript:
|
||||
temp_files.append(json_path)
|
||||
|
||||
# Summary files (if present)
|
||||
md_summary_path = None
|
||||
docx_summary_path = None
|
||||
|
||||
if summary_text:
|
||||
# Summary .md
|
||||
md_summary_path = _safe_filename("SUMMARY", local, date_tag, ".md")
|
||||
with open(md_summary_path, "w", encoding="utf-8") as f:
|
||||
f.write("# Summary\n\n")
|
||||
f.write(summary_text)
|
||||
temp_files.append(md_summary_path)
|
||||
|
||||
# Summary .docx (standalone, no cover page)
|
||||
docx_summary_path = _safe_filename("SUMMARY", local, date_tag, ".docx")
|
||||
create_summary_docx(summary_text, docx_summary_path)
|
||||
create_summary_docx(
|
||||
summary_text,
|
||||
docx_summary_path,
|
||||
)
|
||||
temp_files.append(docx_summary_path)
|
||||
|
||||
# 5) Build attachments list
|
||||
|
||||
# Always: JSON, transcript MD, transcript DOCX
|
||||
attachments = [
|
||||
md_transcript_path,
|
||||
docx_transcript_path,
|
||||
json_path,
|
||||
]
|
||||
|
||||
# If summary is present, add summary MD and DOCX
|
||||
if summary_text:
|
||||
attachments += [md_summary_path, docx_summary_path]
|
||||
|
||||
# 5) Send success email
|
||||
# 6) Send success email
|
||||
send_success_email(
|
||||
to=email_to,
|
||||
transcript_text=transcript_text,
|
||||
@@ -454,9 +498,216 @@ Transcript:
|
||||
)
|
||||
raise e
|
||||
finally:
|
||||
# 6) Cleanup
|
||||
# 7) Cleanup
|
||||
for path in temp_files:
|
||||
_remove_file(path)
|
||||
if audio_path:
|
||||
_remove_file(audio_path)
|
||||
logger.info("Cleanup completed for job %s.", task_id)
|
||||
|
||||
|
||||
@celery_app.task(
|
||||
name="scraibe.tasks.process_mcp_transcribe_task",
|
||||
bind=True,
|
||||
max_retries=1,
|
||||
time_limit=14400,
|
||||
soft_time_limit=13500,
|
||||
)
|
||||
def process_mcp_transcribe_task(
|
||||
self,
|
||||
audio_path: str,
|
||||
job_id: str,
|
||||
language: str,
|
||||
num_speakers: int,
|
||||
):
|
||||
"""
|
||||
Async task used by MCP-style API:
|
||||
- Transcribe audio
|
||||
- Store transcript + segments in shared MCP job store
|
||||
- Clean up temporary file
|
||||
"""
|
||||
from .mcp_server import _mcp_jobs
|
||||
|
||||
log_level = os.getenv("LOG_LEVEL", "INFO")
|
||||
setup_logging(level=log_level)
|
||||
|
||||
# Initialize status
|
||||
_mcp_jobs.setdefault(
|
||||
job_id,
|
||||
{
|
||||
"status": "processing",
|
||||
"message": "Transcription started (async).",
|
||||
"file_path": audio_path,
|
||||
},
|
||||
)
|
||||
|
||||
try:
|
||||
scraibe = Scraibe(verbose=True)
|
||||
result = scraibe.transcribe(
|
||||
audio_file=audio_path,
|
||||
language=language or None,
|
||||
num_speakers=int(num_speakers) if num_speakers else None,
|
||||
verbose=True,
|
||||
for_export=True,
|
||||
)
|
||||
|
||||
transcript_text = result.get("transcript", "")
|
||||
segments = result.get("segments", [])
|
||||
|
||||
_mcp_jobs[job_id]["status"] = "completed"
|
||||
_mcp_jobs[job_id]["transcript"] = transcript_text
|
||||
_mcp_jobs[job_id]["segments"] = segments
|
||||
_mcp_jobs[job_id]["message"] = "Transcription completed."
|
||||
|
||||
logger.info("MCP job %s completed.", job_id)
|
||||
|
||||
except Exception as e:
|
||||
logger.error("MCP job %s failed: %s", job_id, e, exc_info=True)
|
||||
_mcp_jobs[job_id]["status"] = "error"
|
||||
_mcp_jobs[job_id]["message"] = f"Transcription error: {e}"
|
||||
|
||||
finally:
|
||||
_remove_file(audio_path)
|
||||
logger.info("MCP job %s cleanup completed.", job_id)
|
||||
|
||||
|
||||
@celery_app.task(
|
||||
name="scraibe.tasks.process_watch_file_task",
|
||||
bind=True,
|
||||
max_retries=1,
|
||||
time_limit=14400,
|
||||
soft_time_limit=13500,
|
||||
)
|
||||
def process_watch_file_task(
|
||||
self,
|
||||
file_path: str,
|
||||
):
|
||||
"""
|
||||
Async task for watch-folder mode:
|
||||
- Transcribe + summarize
|
||||
- Email results
|
||||
- Optionally delete source file
|
||||
"""
|
||||
task_id = self.request.id
|
||||
|
||||
log_level = os.getenv("LOG_LEVEL", "INFO")
|
||||
setup_logging(level=log_level)
|
||||
|
||||
email_to = os.getenv("WATCH_EMAIL_TO") or os.getenv("EMAIL_DEFAULT_TO")
|
||||
if not email_to:
|
||||
logger.error("No email address configured for watch-folder mode.")
|
||||
raise RuntimeError("WATCH_EMAIL_TO or EMAIL_DEFAULT_TO not set.")
|
||||
|
||||
delete_on_success = os.getenv("WATCH_DELETE_ON_SUCCESS", "true").strip().lower() in ("true", "1", "yes")
|
||||
|
||||
temp_files = []
|
||||
local = "watch"
|
||||
date_tag = _date_tag()
|
||||
|
||||
try:
|
||||
scraibe = Scraibe(verbose=True)
|
||||
|
||||
result = scraibe.transcript_and_summarize(
|
||||
audio_file=file_path,
|
||||
language=None,
|
||||
num_speakers=None,
|
||||
verbose=True,
|
||||
for_export=True,
|
||||
)
|
||||
|
||||
transcript_text = result.get("transcript", "")
|
||||
summary_text = result.get("summary", "")
|
||||
segments = result.get("segments", [])
|
||||
raw_result = result.get("raw_result")
|
||||
|
||||
# Transcript .md
|
||||
md_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".md")
|
||||
with open(md_transcript_path, "w", encoding="utf-8") as f:
|
||||
f.write("# Transcript\n\n")
|
||||
f.write(transcript_text)
|
||||
temp_files.append(md_transcript_path)
|
||||
|
||||
# Transcript .docx
|
||||
docx_transcript_path = _safe_filename("TRANSCRIPT", local, date_tag, ".docx")
|
||||
create_transcript_docx(
|
||||
transcript_text,
|
||||
docx_transcript_path,
|
||||
)
|
||||
temp_files.append(docx_transcript_path)
|
||||
|
||||
# Summary .md
|
||||
md_summary_path = _safe_filename("SUMMARY", local, date_tag, ".md")
|
||||
with open(md_summary_path, "w", encoding="utf-8") as f:
|
||||
f.write("# Summary\n\n")
|
||||
f.write(summary_text)
|
||||
temp_files.append(md_summary_path)
|
||||
|
||||
# Summary .docx
|
||||
docx_summary_path = _safe_filename("SUMMARY", local, date_tag, ".docx")
|
||||
create_summary_docx(
|
||||
summary_text,
|
||||
docx_summary_path,
|
||||
)
|
||||
temp_files.append(docx_summary_path)
|
||||
|
||||
# JSON as SOURCE
|
||||
json_data = {
|
||||
"task": "watch_transcript_and_summarize",
|
||||
"transcript": transcript_text,
|
||||
"summary": summary_text,
|
||||
"segments": segments,
|
||||
"metadata": {
|
||||
"timestamp": datetime.utcnow().isoformat(),
|
||||
"job_id": task_id,
|
||||
"source_file": file_path,
|
||||
},
|
||||
}
|
||||
if raw_result is not None:
|
||||
json_data["raw_result"] = raw_result
|
||||
|
||||
json_path = _safe_filename("SOURCE", local, date_tag, ".json")
|
||||
with open(json_path, "w", encoding="utf-8") as f:
|
||||
json.dump(json_data, f, indent=2, ensure_ascii=False)
|
||||
temp_files.append(json_path)
|
||||
|
||||
# Attachments
|
||||
attachments = [
|
||||
md_transcript_path,
|
||||
docx_transcript_path,
|
||||
md_summary_path,
|
||||
docx_summary_path,
|
||||
json_path,
|
||||
]
|
||||
|
||||
# Send email
|
||||
send_success_email(
|
||||
to=email_to,
|
||||
transcript_text=transcript_text,
|
||||
summary_text=summary_text,
|
||||
attachments=attachments,
|
||||
task_id=task_id,
|
||||
)
|
||||
|
||||
logger.info("Watch-folder job %s completed for %s.", task_id, file_path)
|
||||
|
||||
# Delete source file if configured
|
||||
if delete_on_success and os.path.exists(file_path):
|
||||
try:
|
||||
os.remove(file_path)
|
||||
logger.info("Deleted source file: %s", file_path)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to delete source file %s: %s", file_path, e)
|
||||
|
||||
except Exception as e:
|
||||
logger.error("Error processing watch file %s: %s", file_path, e, exc_info=True)
|
||||
send_error_email(
|
||||
to=email_to,
|
||||
error_message=str(e),
|
||||
task_id=task_id,
|
||||
)
|
||||
raise e
|
||||
finally:
|
||||
# Cleanup temp files
|
||||
for path in temp_files:
|
||||
_remove_file(path)
|
||||
logger.info("Watch-folder job %s cleanup completed.", task_id)
|
||||
|
||||
@@ -0,0 +1,100 @@
|
||||
"""
|
||||
Watch-folder mode for ScrAIbe.
|
||||
|
||||
Monitors a folder for audio files. For each file:
|
||||
- Transcribes + summarizes
|
||||
- Emails results
|
||||
- Deletes the source file on success (unless WATCH_DELETE_ON_SUCCESS is "false")
|
||||
|
||||
Configuration (env):
|
||||
- WATCH_ENABLED: "true"/"false" (default: false)
|
||||
- WATCH_DIR: directory to watch (required if enabled)
|
||||
- WATCH_EMAIL_TO: destination email (required if enabled)
|
||||
- WATCH_POLL_INTERVAL: seconds between scans (default: 10)
|
||||
- WATCH_DELETE_ON_SUCCESS: "true"/"false" (default: true)
|
||||
"""
|
||||
|
||||
import os
|
||||
import time
|
||||
import logging
|
||||
import threading
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger("scraibe.watcher")
|
||||
|
||||
AUDIO_EXTENSIONS = {
|
||||
".wav",
|
||||
".mp3",
|
||||
".flac",
|
||||
".m4a",
|
||||
".ogg",
|
||||
".webm",
|
||||
".mp4",
|
||||
}
|
||||
|
||||
|
||||
def _is_audio(path: Path) -> bool:
|
||||
return path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
|
||||
|
||||
|
||||
def _enqueue_file(file_path: Path):
|
||||
"""
|
||||
Enqueue a file for transcription + summarization via Celery.
|
||||
"""
|
||||
from .tasks import process_watch_file_task
|
||||
|
||||
try:
|
||||
process_watch_file_task.delay(str(file_path))
|
||||
except Exception as e:
|
||||
logger.error("Failed to enqueue watch file %s: %s", file_path, e)
|
||||
|
||||
|
||||
def _scan_directory(watch_dir: Path):
|
||||
"""
|
||||
Scan directory and enqueue all audio files.
|
||||
"""
|
||||
if not watch_dir.is_dir():
|
||||
logger.warning("WATCH_DIR does not exist or is not a directory: %s", watch_dir)
|
||||
return
|
||||
|
||||
for p in watch_dir.iterdir():
|
||||
if _is_audio(p):
|
||||
logger.info("Found audio file in WATCH_DIR: %s", p)
|
||||
_enqueue_file(p)
|
||||
|
||||
|
||||
def start_watcher():
|
||||
"""
|
||||
Start watch-folder loop in a background thread.
|
||||
"""
|
||||
enabled = os.getenv("WATCH_ENABLED", "false").strip().lower() in ("true", "1", "yes")
|
||||
if not enabled:
|
||||
return
|
||||
|
||||
watch_dir = os.getenv("WATCH_DIR")
|
||||
if not watch_dir:
|
||||
logger.warning("WATCH_ENABLED is true but WATCH_DIR is not set. Watcher disabled.")
|
||||
return
|
||||
|
||||
email_to = os.getenv("WATCH_EMAIL_TO")
|
||||
if not email_to:
|
||||
logger.warning("WATCH_ENABLED is true but WATCH_EMAIL_TO is not set. Watcher disabled.")
|
||||
return
|
||||
|
||||
interval = float(os.getenv("WATCH_POLL_INTERVAL", "10"))
|
||||
|
||||
watch_path = Path(watch_dir).expanduser().resolve()
|
||||
watch_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
logger.info("Starting watch-folder: dir=%s, email=%s, interval=%s", watch_dir, email_to, interval)
|
||||
|
||||
def _loop():
|
||||
while True:
|
||||
try:
|
||||
_scan_directory(watch_path)
|
||||
except Exception as e:
|
||||
logger.error("Error scanning WATCH_DIR: %s", e)
|
||||
time.sleep(interval)
|
||||
|
||||
t = threading.Thread(target=_loop, daemon=True)
|
||||
t.start()
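Review note: as written, `_scan_directory` appears to enqueue every audio file on every poll, and the source file is only removed after the task succeeds, so a long-running file could be queued again every `WATCH_POLL_INTERVAL` seconds. One possible guard is to remember which paths have already been queued. The sketch below is an assumption about how that could look, not part of this diff. `scan_once` and the `seen` set are hypothetical, and `enqueue` stands in for `_enqueue_file`.

```python
from pathlib import Path
from typing import Callable, Set

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm", ".mp4"}


def scan_once(
    watch_dir: Path, seen: Set[Path], enqueue: Callable[[Path], None]
) -> int:
    """Enqueue audio files not queued before; return how many were queued."""
    current = {
        p
        for p in watch_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    }
    new_files = sorted(current - seen)
    for p in new_files:
        enqueue(p)
    # Forget files that have disappeared so that a later upload with the
    # same name is picked up again, then remember the newly queued ones.
    seen.intersection_update(current)
    seen.update(new_files)
    return len(new_files)
```

The set lives in the watcher process only, so a restart would queue any remaining files once more. A file that is still being copied into the folder is also not handled here.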
|
||||
+19
-20
@@ -134,7 +134,7 @@ def create_app():
|
||||
if header_html:
|
||||
gr.HTML(header_html)
|
||||
|
||||
with gr.Row(elem_id="main-row"):
|
||||
# Single-column layout: inputs followed by status/output
|
||||
with gr.Column():
|
||||
audio_input = gr.Audio(
|
||||
label="Upload or record audio",
|
||||
@@ -144,7 +144,7 @@ def create_app():
|
||||
task_choice = gr.Radio(
|
||||
choices=[
|
||||
("Transcribe", "transcribe"),
|
||||
("Transcript & Summarize", "transcript_and_summarize"),
|
||||
("Transcribe & summarize", "transcript_and_summarize"),
|
||||
],
|
||||
value="transcribe",
|
||||
label="Task",
|
||||
@@ -153,20 +153,10 @@ def create_app():
|
||||
|
||||
identify_speakers = gr.Checkbox(
|
||||
label="Identify speakers (best effort using AI)",
|
||||
value=False,
|
||||
value=True,
|
||||
info="If enabled, AI will attempt to infer each speaker's real name or role and replace Speaker 1/2/etc. in the transcript.",
|
||||
)
|
||||
|
||||
with gr.Row():
|
||||
language_input = gr.Textbox(
|
||||
label="Language (optional)",
|
||||
placeholder="e.g., english, german",
|
||||
)
|
||||
num_speakers_input = gr.Number(
|
||||
label="Number of speakers (optional)",
|
||||
precision=0,
|
||||
)
|
||||
|
||||
email_to = gr.Textbox(
|
||||
label="Your email address (required)",
|
||||
placeholder="e.g. your.name@example.com",
|
||||
@@ -179,7 +169,6 @@ def create_app():
|
||||
|
||||
submit_btn = gr.Button("Submit for transcription", variant="primary")
|
||||
|
||||
with gr.Column():
|
||||
status_text = gr.Textbox(
|
||||
label="Status",
|
||||
lines=6,
|
||||
@@ -205,8 +194,6 @@ def create_app():
|
||||
def on_submit(
|
||||
audio,
|
||||
task,
|
||||
language,
|
||||
num_speakers,
|
||||
email_to_val,
|
||||
email_cc_val,
|
||||
identify_speakers_val,
|
||||
@@ -242,8 +229,8 @@ def create_app():
|
||||
task_result = process_transcription_task.delay(
|
||||
audio_path=dest_path,
|
||||
task_type=task,
|
||||
language=language or None,
|
||||
num_speakers=int(num_speakers) if num_speakers else None,
|
||||
language=None,
|
||||
num_speakers=None,
|
||||
email_to=email_to_val,
|
||||
email_cc=email_cc_val or None,
|
||||
include_summary=(task == "transcript_and_summarize"),
|
||||
@@ -266,8 +253,6 @@ def create_app():
|
||||
inputs=[
|
||||
audio_input,
|
||||
task_choice,
|
||||
language_input,
|
||||
num_speakers_input,
|
||||
email_to,
|
||||
email_cc,
|
||||
identify_speakers,
|
||||
@@ -307,6 +292,20 @@ def create_app():
|
||||
body {{
|
||||
font-family: Arial, sans-serif;
|
||||
}}
|
||||
/* Increase main title font size */
|
||||
h1,
|
||||
.webui-title,
|
||||
.header-title {{
|
||||
font-size: 60px !important;
|
||||
}}
|
||||
/* Hide Gradio's "Use via API" link/button */
|
||||
#share-btn,
|
||||
a[href*="/api"],
|
||||
a[href*="#/api"],
|
||||
a[href*="#api"],
|
||||
.gradio-container a[href*="api"] {{
|
||||
display: none !important;
|
||||
}}
|
||||
/* Mobile-friendly adjustments */
|
||||
@media (max-width: 700px) {{
|
||||
.gradio-container {{
|
||||
|
||||
@@ -0,0 +1,86 @@
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
import pytest
|
||||
|
||||
from scraibe.audio import (
|
||||
get_audio_duration,
|
||||
split_audio_into_chunks,
|
||||
)
|
||||
|
||||
TEST_AUDIO_1 = "tests/audio_test_1.mp4"
|
||||
TEST_AUDIO_2 = "tests/audio_test_2.mp4"
|
||||
|
||||
|
||||
@pytest.fixture(params=[TEST_AUDIO_1, TEST_AUDIO_2])
|
||||
def test_audio_path(request):
|
||||
return request.param
|
||||
|
||||
|
||||
def test_get_audio_duration(test_audio_path):
|
||||
dur = get_audio_duration(test_audio_path)
|
||||
assert isinstance(dur, float)
|
||||
assert dur > 0
|
||||
|
||||
|
||||
def test_split_audio_into_chunks_no_split_short(test_audio_path):
|
||||
# For short files, should return the same file with no extra chunks
|
||||
chunks = split_audio_into_chunks(
|
||||
input_path=test_audio_path,
|
||||
max_duration=600.0, # larger than both test files
|
||||
overlap=2.0,
|
||||
)
|
||||
assert len(chunks) == 1
|
||||
assert chunks[0]["path"] == test_audio_path
|
||||
assert chunks[0]["start"] == 0.0
|
||||
dur = get_audio_duration(test_audio_path)
|
||||
assert abs(chunks[0]["end"] - dur) < 0.05
|
||||
|
||||
|
||||
def test_split_audio_into_chunks_creates_chunks(tmp_path):
|
||||
# Use a small chunk duration to force splitting
|
||||
chunks = split_audio_into_chunks(
|
||||
input_path=TEST_AUDIO_1,
|
||||
max_duration=2.0,
|
||||
overlap=0.5,
|
||||
)
|
||||
assert len(chunks) > 1
|
||||
|
||||
# Check that each chunk file exists and is non-empty
|
||||
for c in chunks:
|
||||
assert os.path.exists(c["path"])
|
||||
assert os.path.getsize(c["path"]) > 0
|
||||
|
||||
# Check time ordering and overlap
|
||||
for i in range(1, len(chunks)):
|
||||
prev = chunks[i - 1]
|
||||
curr = chunks[i]
|
||||
assert curr["start"] >= prev["start"]
|
||||
assert curr["start"] < prev["end"] # overlap
|
||||
|
||||
# Cleanup
|
||||
for c in chunks:
|
||||
if os.path.exists(c["path"]):
|
||||
os.remove(c["path"])
|
||||
|
||||
|
||||
def test_split_audio_into_chunks_total_coverage(test_audio_path):
|
||||
dur = get_audio_duration(test_audio_path)
|
||||
|
||||
# Use small chunks to ensure coverage
|
||||
chunks = split_audio_into_chunks(
|
||||
input_path=test_audio_path,
|
||||
max_duration=2.0,
|
||||
overlap=0.5,
|
||||
)
|
||||
|
||||
# First chunk starts at 0
|
||||
assert chunks[0]["start"] == 0.0
|
||||
|
||||
# Last chunk end should cover the duration
|
||||
assert chunks[-1]["end"] >= dur - 0.05
|
||||
|
||||
# Cleanup
|
||||
for c in chunks:
|
||||
if os.path.exists(c["path"]):
|
||||
os.remove(c["path"])
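Review note: the properties these tests assert (a single chunk for short input, overlapping consecutive chunks, first chunk at 0, last chunk reaching the end) follow from a simple sliding-window computation. The sketch below only computes the time windows and is an assumption about the approach, not the implementation of `split_audio_into_chunks`. `chunk_windows` is a hypothetical name, and no audio is read or written.

```python
from typing import List, Tuple


def chunk_windows(
    duration: float, max_duration: float, overlap: float
) -> List[Tuple[float, float]]:
    """Return (start, end) windows covering [0, duration] with overlap."""
    if overlap >= max_duration:
        raise ValueError("overlap must be smaller than max_duration")
    if duration <= max_duration:
        # Short input: one window, no splitting.
        return [(0.0, duration)]

    step = max_duration - overlap  # distance between consecutive starts
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while True:
        end = min(start + max_duration, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows
```

For example, a 5-second input with `max_duration=2.0` and `overlap=0.5` gives three windows, each starting 1.5 seconds after the previous one.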
|
||||
@@ -0,0 +1,96 @@
|
||||
"""
|
||||
Local test for transcript/summary/combined .docx generation.
|
||||
Checks:
|
||||
- Line numbering only on transcript pages.
|
||||
- Page numbering (X of Y) in footer.
|
||||
- Cover pages present and centered.
|
||||
- Combined document structure.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import tempfile
|
||||
|
||||
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
from scraibe.email_sender import (
|
||||
create_transcript_docx,
|
||||
create_summary_docx,
|
||||
create_combined_docx,
|
||||
)
|
||||
|
||||
TRANSCRIPT_TEXT = """[00:00] Speaker 1: Good morning, everyone. Thank you for joining today's meeting.
|
||||
[00:12] Speaker 2: Good morning. I'm looking forward to discussing the new requirements.
|
||||
[00:25] Speaker 1: Let's start with the timeline. We need to finalize the scope by Friday.
|
||||
[00:38] Speaker 2: Agreed. I'll send a summary of the key points after this call.
|
||||
[00:45] Speaker 1: Perfect. If there are no other items, we can wrap up here."""
|
||||
|
||||
SUMMARY_TEXT = """# Meeting Overview
|
||||
## Key Discussion Points
|
||||
### Timeline and Scope
|
||||
#### Next Steps"""
|
||||
|
||||
COVER_DATE = "June 14, 2026"
|
||||
TRANSCRIPT_DESC = "Transcript of a project planning meeting discussing timelines and scope."
|
||||
SUMMARY_DESC = "Summary of a project planning meeting covering key decisions and next steps."
|
||||
|
||||
|
||||
def main():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
print("Using temp directory:", tmpdir)
|
||||
|
||||
# 1) Transcript-only
|
||||
transcript_path = os.path.join(tmpdir, "TRANSCRIPT_TEST.docx")
|
||||
print("Creating transcript-only docx:", transcript_path)
|
||||
create_transcript_docx(
|
||||
text=TRANSCRIPT_TEXT,
|
||||
filename=transcript_path,
|
||||
include_cover=True,
|
||||
cover_date=COVER_DATE,
|
||||
cover_desc=TRANSCRIPT_DESC,
|
||||
)
|
||||
print("OK: transcript-only created.")
|
||||
|
||||
# 2) Summary-only
|
||||
summary_path = os.path.join(tmpdir, "SUMMARY_TEST.docx")
|
||||
print("Creating summary-only docx:", summary_path)
|
||||
create_summary_docx(
|
||||
text=SUMMARY_TEXT,
|
||||
filename=summary_path,
|
||||
include_cover=True,
|
||||
cover_date=COVER_DATE,
|
||||
cover_desc=SUMMARY_DESC,
|
||||
)
|
||||
print("OK: summary-only created.")
|
||||
|
||||
# 3) Combined
|
||||
combined_path = os.path.join(tmpdir, "COMBINED_TEST.docx")
|
||||
print("Creating combined docx:", combined_path)
|
||||
create_combined_docx(
|
||||
transcript_text=TRANSCRIPT_TEXT,
|
||||
summary_text=SUMMARY_TEXT,
|
||||
filename=combined_path,
|
||||
transcript_cover_date=COVER_DATE,
|
||||
transcript_cover_desc=TRANSCRIPT_DESC,
|
||||
summary_cover_date=COVER_DATE,
|
||||
summary_cover_desc=SUMMARY_DESC,
|
||||
)
|
||||
print("OK: combined created.")
|
||||
|
||||
# Basic size sanity checks
|
||||
for path in [transcript_path, summary_path, combined_path]:
|
||||
size = os.path.getsize(path)
|
||||
print(f"File: {os.path.basename(path)} - size: {size} bytes")
|
||||
if size < 10000:
|
||||
print("WARNING: File seems unusually small:", path)
|
||||
|
||||
print("\nAll .docx files generated successfully.")
|
||||
print("Please open them in Word to verify:")
|
||||
print("- Only transcript pages have line numbers.")
|
||||
print("- Footer shows 'X of Y' on all pages.")
|
||||
print("- Cover pages are centered and use the correct date format.")
|
||||
print("- Combined doc order: cover, page break, summary, page break, transcript.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,230 @@
|
||||
import os
|
||||
import json
|
||||
import tempfile
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
from scraibe.localai_client import LocalAIClient, LocalAIError
|
||||
from scraibe.audio import get_audio_duration, split_audio_into_chunks
|
||||
|
||||
|
||||
TEST_AUDIO_1 = "tests/audio_test_1.mp4"
|
||||
|
||||
|
||||
def make_fake_segments(start=0.0, count=3):
|
||||
segments = []
|
||||
for i in range(count):
|
||||
s = start + i * 2.0
|
||||
e = s + 2.0
|
||||
segments.append({
|
||||
"start": s,
|
||||
"end": e,
|
||||
"speaker": "SPEAKER_00",
|
||||
"text": f"Segment text {i}",
|
||||
})
|
||||
return segments
|
||||
|
||||
|
||||
def fake_localai_response(segments):
|
||||
return {
|
||||
"segments": segments,
|
||||
"text": " ".join(seg["text"] for seg in segments),
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def client():
|
||||
with patch.object(LocalAIClient, "__init__", lambda self, **kw: None):
|
||||
c = LocalAIClient()
|
||||
c.api_url = "http://localhost:8080"
|
||||
c.model = "vibevoice-diarize"
|
||||
c.api_key = None
|
||||
c._client = MagicMock()
|
||||
return c
|
||||
|
||||
|
||||
def test_parse_diarization_response(client):
|
||||
segs = make_fake_segments()
|
||||
raw = fake_localai_response(segs)
|
||||
|
||||
out = client._parse_diarization_response(raw)
|
||||
|
||||
assert "segments" in out
|
||||
assert "speakers" in out
|
||||
assert "transcripts" in out
|
||||
assert len(out["segments"]) == len(segs)
|
||||
for i, s in enumerate(segs):
|
||||
assert out["segments"][i][0] == s["start"]
|
||||
assert out["segments"][i][1] == s["end"]
|
||||
assert out["speakers"][i] == s["speaker"]
|
||||
assert out["transcripts"][i] == s["text"]
|
||||
|
||||
|
||||
def test_parse_diarization_empty(client):
|
||||
out = client._parse_diarization_response({"segments": []})
|
||||
assert out["segments"] == []
|
||||
assert out["speakers"] == []
|
||||
assert out["transcripts"] == []
|
||||
|
||||
|
||||
def test_diarize_and_transcribe_single_happy(client):
|
||||
with patch.object(client, "_client") as mock_client:
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = fake_localai_response(make_fake_segments())
|
||||
mock_client.post.return_value = mock_resp
|
||||
|
||||
result = client.diarize_and_transcribe(
|
||||
audio_path=TEST_AUDIO_1,
|
||||
verbose=False,
|
||||
return_raw=True,
|
||||
)
|
||||
|
||||
assert "segments" in result
|
||||
assert "raw_result" in result
|
||||
assert len(result["segments"]) > 0
|
||||
|
||||
|
||||
def test_chunking_triggered_for_long_audio(client):
|
||||
# Simulate long audio by patching get_audio_duration
|
||||
with patch("scraibe.localai_client.get_audio_duration") as mock_dur, \
|
||||
patch.object(client, "_diarize_and_transcribe_chunked") as mock_chunked:
|
||||
|
||||
mock_dur.return_value = 600.0 # 10 minutes
|
||||
mock_chunked.return_value = {
|
||||
"segments": [],
|
||||
"speakers": [],
|
||||
"transcripts": [],
|
||||
}
|
||||
|
||||
client.diarize_and_transcribe(
|
||||
audio_path=TEST_AUDIO_1,
|
||||
verbose=False,
|
||||
use_chunking=None,
|
||||
max_single_request_duration=300.0,
|
||||
)
|
||||
|
||||
mock_chunked.assert_called_once()
|
||||
|
||||
|
||||
def test_chunking_not_triggered_for_short_audio(client):
    with patch("scraibe.localai_client.get_audio_duration") as mock_dur, \
         patch.object(client, "_diarize_and_transcribe_chunked") as mock_chunked, \
         patch.object(client, "_diarize_and_transcribe_single") as mock_single:

        mock_dur.return_value = 120.0
        mock_single.return_value = {
            "segments": [],
            "speakers": [],
            "transcripts": [],
        }

        client.diarize_and_transcribe(
            audio_path=TEST_AUDIO_1,
            verbose=False,
            use_chunking=None,
            max_single_request_duration=300.0,
        )

        mock_chunked.assert_not_called()
        mock_single.assert_called_once()


def test_chunked_transcription_adjusts_timestamps(client):
    # Mock split_audio_into_chunks to return two chunks
    chunk1_path = TEST_AUDIO_1
    chunk2_path = TEST_AUDIO_1  # reusing the same file; real chunks would be different files

    chunks = [
        {"path": chunk1_path, "start": 0.0, "end": 10.0},
        {"path": chunk2_path, "start": 10.0, "end": 20.0},
    ]

    with patch("scraibe.localai_client.split_audio_into_chunks") as mock_split, \
         patch.object(client, "_diarize_and_transcribe_single") as mock_single, \
         patch("os.remove"):

        mock_split.return_value = chunks

        # Both chunks share one path, so every call returns the same two
        # segments in chunk-local time (starting at 0.0). The chunked code
        # path is expected to shift them by each chunk's start offset.
        def side_effect(audio_path, **kw):
            segs = make_fake_segments(start=0.0, count=2)
            return client._parse_diarization_response(fake_localai_response(segs))

        mock_single.side_effect = side_effect

        result = client._diarize_and_transcribe_chunked(
            audio_path=TEST_AUDIO_1,
            verbose=False,
            return_raw=False,
            chunk_duration=10.0,
            chunk_overlap=2.0,
        )

        # Two segments per chunk, two chunks: 4 segments total
        assert len(result["segments"]) == 4

        # The first chunk starts at 0.0, so its segments keep their local start times
        assert result["segments"][0][0] == 0.0
        assert result["segments"][1][0] == 2.0

        # The second chunk starts at 10.0, so its segments are shifted by 10
        assert result["segments"][2][0] == 10.0
        assert result["segments"][3][0] == 12.0


@pytest.mark.integration
def test_integration_chunked_transcription_with_localai():
    """
    Integration test: run chunked transcription against a live LocalAI instance.
    Only runs if LOCALAI_API_URL is set and the bundled test audio file exists.
    This test is skipped by default unless run with:
        pytest -m integration
    """
    api_url = os.getenv("LOCALAI_API_URL")
    if not api_url:
        pytest.skip("LOCALAI_API_URL not set; skipping integration test")

    # Use one of the bundled test audio files
    audio_path = TEST_AUDIO_1
    if not os.path.exists(audio_path):
        pytest.skip(f"Test audio not found: {audio_path}")

    # Use environment-configured model or a sensible default
    model = os.getenv("LOCALAI_MODEL") or "vibevoice-cpp-asr"

    client = LocalAIClient(api_url=api_url, model=model)
    try:
        # Force chunking with a very small max_single_request_duration
        result = client.diarize_and_transcribe(
            audio_path=audio_path,
            verbose=True,
            return_raw=True,
            use_chunking=True,
            chunk_duration=3.0,
            chunk_overlap=0.5,
            max_single_request_duration=1.0,
        )

        assert "segments" in result
        assert len(result["segments"]) > 0

        # Basic sanity: segment start times are non-decreasing
        for i in range(1, len(result["segments"])):
            prev_start = result["segments"][i - 1][0]
            curr_start = result["segments"][i][0]
            assert curr_start >= prev_start

        # If raw_result indicates chunked, ensure structure is sensible
        raw = result.get("raw_result")
        if raw and raw.get("chunked"):
            assert "chunks" in raw
            assert len(raw["chunks"]) > 1

    finally:
        client.close()