diff --git a/README.md b/README.md
index 476e9bb..b89799d 100644
--- a/README.md
+++ b/README.md
@@ -1,173 +1,171 @@
-# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment` 🎙️🧠
+# ScrAIbe – LocalAI-Backed Transcription and Summarization

-Welcome to `ScrAIbe`, a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework designed to generate fully automated transcriptions.
+ScrAIbe is a lightweight transcription and summarization client that:

-Beyond transcription, ScrAIbe supports advanced functions such as speaker diarization and speaker recognition. 🚀
+- Sends audio to a LocalAI server running vibevoice.cpp for transcription and speaker diarization.
+- Optionally uses a second LLM to generate a detailed, structured summary of the conversation.

-Designed as a comprehensive AI toolkit, it uses multiple powerful AI models:
+No local speech models or heavy dependencies are required. ScrAIbe is designed to be run as a thin client in front of your own AI services.

-- **[Whisper](https://github.com/openai/whisper)**: A general-purpose speech recognition model.
-- **[WhisperX](https://github.com/m-bain/whisperX)**: A faster, quantized version of Whisper for enhanced performance on CPU. ⚡
-- **[Pyannote-Audio](https://github.com/pyannote/pyannote-audio)**: An open-source toolkit for speaker diarization. 🗣️
+## Features

-The framework utilizes a PyanNet-inspired pipeline, with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
+- Transcription with speaker diarization via LocalAI:
+  - Uses the `/v1/audio/diarization` endpoint.
+  - Compatible with vibevoice.cpp and other diarization-capable backends.
+- Optional AI-powered summarization:
+  - Task: `transcript_and_summarize`
+  - Chunks long transcripts, summarizes each chunk, then generates a final comprehensive summary.
+  - Summary highlights:
+    - Main topics and discussion points
+    - Key decisions and outcomes
+    - Action items and responsibilities
+    - Open issues and risks
+- CLI and Python API:
+  - Simple command-line interface.
+  - Drop-in `Scraibe` class for integration into other tools.
+- Docker-ready:
+  - Lightweight container, configured via environment variables.

-During post-diarization, each audio segment is processed by the OpenAI `Whisper` model in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language segment, facilitating Whisper's role in both transcription and text translation. 🌐✨
+## Architecture

-The following graphic illustrates the whole pipeline:
+- LocalAI (vibevoice.cpp):
+  - Handles audio → transcript + speaker segments.
+- Summarizer LLM (OpenAI-compatible chat endpoint):
+  - Handles transcript → structured summary.
+- ScrAIbe:
+  - Orchestrates:
+    - File upload to LocalAI
+    - Transcript assembly
+    - Chunked summarization
+    - Output formatting (e.g., .md with transcript + summary)

-
- - -
+## Quick Start (CLI)

-## Getting Started 🚀
+Basic usage:

-### Prerequisites
+- Transcribe:

-Before installing ScrAIbe, ensure you have the following prerequisites:
+  - python3 -m scraibe.cli -f "audio.wav" -o "./output" -of txt

-- **Python**: Version 3.9 or later.
-- **PyTorch**: Version 2.0 or later.
-- **CUDA**: A compatible version with your PyTorch Version if you want to use GPU acceleration.
+- Transcribe and summarize:

-**Note:** PyTorch should be automatically installed with the pip installer. However, if you encounter any issues, you should consider installing it manually by following the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/).
+  - python3 -m scraibe.cli -f "audio.wav" -o "./output" --task transcript_and_summarize

-### Install ScrAIbe
+Environment variables must be set to point to your LocalAI and summarizer LLM.

-Install ScrAIbe on your local machine with ease using PyPI.
+## Python API

-```bash
-pip install scraibe
-```
+Example: transcribe only

-If you want to install the development version, you can do so by installing it from GitHub:
+- from scraibe import Scraibe

-```bash
-pip install git+https://github.com/JSchmie/ScrAIbe.git@develop
-```
+  - client = Scraibe()
+  - text = client.transcribe("audio.wav")
+  - print(text)

-or from PyPI using our latest pre-release:
+Example: transcribe and summarize

-```bash
-pip install --pre scraibe
-```
+- from scraibe import Scraibe

-Get started with ScrAIbe today and experience seamless, automated transcription and diarization.
+  - client = Scraibe()
+  - result = client.transcript_and_summarize("audio.wav")
+  - transcript = result["transcript"]
+  - summary = result["summary"]

-## Usage
+You can override endpoints and models via environment variables or constructor parameters if needed.

-We've developed ScrAIbe with several access points to cater to diverse user needs.
+## Command-Line Options

-### Python Usage
+Run:

-Gain full control over the functionalities as well as process customization.
+- python3 -m scraibe.cli -h

-```python
-from scraibe import Scraibe
+Key options:

-model = Scraibe()
+- -f / --audio-files:
+  - One or more audio files to process.
+- --task:
+  - transcribe (default)
+  - transcript_and_summarize
+- -o / --output-directory:
+  - Output folder for generated files.
+- -of / --output-format:
+  - txt, json, md, html
+  - For transcript_and_summarize, output is always saved as .md with:
+    - # Transcript
+    - # Summary

-text = model.autotranscribe("audio.wav")
+Other options (e.g., --language, --num-speakers) are accepted and forwarded where applicable; many legacy Whisper/Pyannote flags are kept for compatibility but ignored.

-print(f"Transcription: \n{text}")
-```
+## Docker Usage

-The `Scraibe` class ensures the models are properly loaded. You can customize the models with various keywords:
+ScrAIbe is designed to run in Docker as a client to your LocalAI and summarizer LLM.

-- **Whisper Models**: Use the `whisper_model` keyword to specify models like `tiny`, `base`, `small`, `medium`, or `large` (`large-v2`, `large-v3`) depending on your accuracy and speed needs.
-- **Pyannote Diarization Model**: Use the `dia_model` keyword to change the diarization model.
-- **WhisperX**: Set the `whisper_type` to `"whisperX"` for enhanced performance on CPU and use their enhanced models. (Model names are the same)
-- **Keyword Arguments**: A variety of different `kwargs` are available:
-  - `use_auth_token`: Pass a Hugging Face token to the Pyannote backend if you want to use one of the models hosted on their Hugging Face.
-  - `verbose`: Enable this to add an additional level of verbosity.
-  - In general, you should be able to input any `kwargs` that you can input in the original Whisper (WhisperX) and Pyannote Python APIs.
+### Basic run (transcribe)

-As input, `autotranscribe` accepts every format compatible with [FFmpeg](https://ffmpeg.org/ffmpeg-formats.html). Examples include `.mp4`, `.mp3`, `.wav`, `.ogg`, `.flac`, and many more.
+- docker run -it \
+  -e LOCALAI_API_URL=http://localai:8080 \
+  -v /path/to/audio:/audio \
+  scraibe:latest \
+  -f /audio/meeting.wav -o /audio/output -of txt

-To further control the pipeline of `ScrAIbe`, you can pass almost any keyword argument that is accepted by `Whisper` or `Pyannote`. For more options, refer to the documentation of these frameworks, as their keywords are likely to work here as well.
+### Basic run (transcribe + summarize)

-Here are some examples regarding `diarization` (which relies on the `pyannote` pipeline):
+- docker run -it \
+  -e LOCALAI_API_URL=http://localai:8080 \
+  -e SUMMARIZER_API_URL=http://llm:8080 \
+  -v /path/to/audio:/audio \
+  scraibe:latest \
+  -f /audio/meeting.wav -o /audio/output --task transcript_and_summarize

-- `num_speakers`: Number of speakers in the audio file
-- `min_speakers`: Minimum number of speakers in the audio file
-- `max_speakers`: Maximum number of speakers in the audio file
+### Docker Environment Variables

-Then there are arguments for the transcription process, which uses the "Whisper" model:
+The following environment variables configure ScrAIbe in Docker.

-- `language`: Specify the language ([list of supported languages](https://github.com/openai/whisper/blob/main/language-breakdown.svg))
-- `task`: Can be either `transcribe` or `translate`. If `translate` is selected, the transcribed audio will be translated to English.
+Transcription / Diarization (LocalAI):

-For example:
+- LOCALAI_API_URL:
+  - Required.
+  - Base URL of the LocalAI server.
+  - Example: http://localai:8080
+- LOCALAI_API_KEY:
+  - Optional.
+  - API key for LocalAI, if configured.
+- LOCALAI_MODEL:
+  - Optional (default: vibevoice-diarize).
+  - Model name used for transcription/diarization.

-```python
-text = model.autotranscribe("audio.wav", language="german", num_speakers = 2)
-```
+Summarization LLM:

-`Scraibe` also contains the option to just do a transcription:
+- SUMMARIZER_API_URL:
+  - Required when using --task transcript_and_summarize.
+  - Base URL of the summarization LLM (OpenAI-compatible /v1/chat/completions).
+  - Example: http://llm:8080
+- SUMMARIZER_API_KEY:
+  - Optional.
+  - API key for the summarization LLM, if required.
+- SUMMARIZER_MODEL:
+  - Optional (default: llama-3.1-8b-instruct).
+  - Model name used for summarization.

-```python
-transcription = model.transcribe("audio.wav")
-```
+All of these can also be overridden from the CLI when needed (e.g., --localai-api-url, --summarizer-api-url).

-or just do a diarization:
+## Dependencies

-```python
-diarization = model.diarization("audio.wav")
-```
+Core runtime dependencies:

-Start exploring the powerful features of ScrAIbe and customize it to fit your specific transcription and diarization needs!
+- Python 3.9+
+- httpx
+- numpy
+- tqdm
+- ffmpeg (for audio preprocessing)

-### Command-line usage
+No local Whisper, PyTorch, or Pyannote models are required.

-Next to the Pyhton interface, you can also run ScrAIbe using the command-line interface:
+## Contributing

-```bash
-scraibe -f "audio.wav" --language "german" --num_speakers 2
-```
+Contributions are welcome. Please refer to CONTRIBUTING.md for guidelines.

-For the full list of options, run:
+## License

-```bash
-scraibe -h
-```
-
-This will display a comprehensive list of all command-line options, allowing you to tailor ScrAIbe’s functionality to your specific needs.
-
-## Gradio App 🌐
-
-The Gradio App is now part of ScrAIbe-WebUI! This user-friendly interface enables you to run the model without any coding knowledge. You can easily run the app in your browser and upload your audio files, or make the framework available on your network and run it on your local machine. 🚀
-
-All functionalities previously available in the Gradio App are now part of the ScrAIbe-WebUI. For more information and detailed instructions, visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI).
-
-## Docker Container 🐳
-
-ScrAIbe's Docker containers have also moved to ScrAIbe-WebUI! This option is especially useful if you want to run the model on a server or if you would like to use the GPU without dealing with CUDA.
-
-All Docker container functionalities are now part of ScrAIbe-WebUI. For more information and detailed instructions on how to use the Docker containers, please visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI).
-
----
-
-With these changes, ScrAIbe focuses on its core functionalities while the enhanced Gradio App and related Docker containers are now part of ScrAIbe-WebUI. Enjoy a more streamlined and powerful transcription experience! 🎉
-
-## Documentation 📚
-
-For comprehensive guides, detailed instructions, and advanced usage tips, visit our [documentation page](https://jschmie.github.io/ScrAIbe/). Here, you will find everything you need to make the most out of ScrAIbe.
-
-### Contributions 🤝
-
-We warmly welcome contributions from the community! Whether you’re fixing bugs, adding new features, or improving documentation, your help is invaluable. Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information on how to get involved and make your mark on ScrAIbe-WebUI.
-
-
-### License 📜
-
-ScrAIbe-WebUI is proudly open source and licensed under the GPL-3.0 license. This promotes a collaborative and transparent development process. For more details, see the [LICENSE](./LICENSE) file in this repository.
-
-## Acknowledgments
-
-Special thanks go to the [KIDA](https://www.kida-bmel.de/) project and the [BMEL (Bundesministerium für Ernährung und Landwirtschaft)](https://www.bmel.de/EN/Home/home_node.html), especially to the AI Consultancy Team.
-
----
-
-Join us in making ScrAIbe even better! 🚀
\ No newline at end of file
+This project is licensed under GPL-3.0. See LICENSE for details.