Merge branch 'develop' into pyproject.toml

This commit is contained in:
Jacob Schmieder
2024-05-31 14:13:57 +02:00
committed by GitHub
11 changed files with 191 additions and 143 deletions
+34
View File
@@ -0,0 +1,34 @@
# Changelog
All notable changes to this project will be documented in this file.
## [0.2.0] - 2024-05-28
### Added
- **Python Usage Section**: Detailed instructions on how to use ScrAIbe with Python, including examples for Whisper models, WhisperX, and keyword arguments.
- **Command-line Usage Section**: Enhanced instructions for using ScrAIbe via the command-line interface, including examples and key options.
- **Documentation Section**: Expanded the documentation section with highlights on installation guides, usage examples, API reference, troubleshooting tips, and advanced configuration.
- **Getting Started Section**: Added detailed prerequisites and installation instructions for both stable and development versions of ScrAIbe.
- **WhisperX Support**: Added support for the WhisperX backend.
### Changed
- **Model Customization**: Clarified the use of various keywords to customize Whisper models, Pyannote diarization models, and WhisperX.
- **Example Enhancements**: Improved examples to illustrate the usage of different features and options in ScrAIbe.
- **Formatting and Clarity**: Improved formatting and clarity across all sections to enhance readability and user experience.
- **Backend Robustness**: Enhanced the backend to be more robust, removing the need for a HuggingFace token for basic usage.
- **CLI**: Now works without Gradio.
### Removed
- **Docker Build**: Removed Docker build support.
- **Gradio App**: Removed the Gradio App integration.
Both the Docker Build and the Gradio App are now available under [ScrAIbe-WebUI](https://github.com/JSchmie/ScrAIbe-WebUI).
### Documentation
- **Documentation Page Link**: Updated the documentation section with a direct link to the [ScrAIbe documentation page](https://jschmie.github.io/ScrAIbe/).
**Note**: This changelog might be incomplete, but we promise to improve it in the future. Thank you for your understanding and support.
+59
View File
@@ -0,0 +1,59 @@
# Contributing to ScrAIbe
Thank you for your interest in contributing to ScrAIbe! We appreciate your efforts to improve the project. Before making any changes, please discuss them with the project maintainers via an issue, email, or any other method.
Please note that we have a code of conduct, and we ask you to adhere to it in all your interactions with the project.
## Pull Request Process
1. **Dependency Management**: Ensure any install or build dependencies are removed before the end of the layer when doing a build.
2. **Documentation Updates**: Update the `README.md` with details of changes to the interface, including new environment variables, exposed ports, useful file locations, and container parameters.
3. **Versioning**: Increase the version numbers in any example files and the `README.md` to the new version that this Pull Request would represent. We use the [SemVer](http://semver.org/) versioning scheme.
4. **Review and Merge**: You may merge the Pull Request once you have the sign-off of two other developers. If you do not have permission to merge, request a second reviewer to merge it for you.
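The versioning step can be illustrated with a small sketch. The helper `bump_version` below is hypothetical and not part of ScrAIbe; it assumes a plain `MAJOR.MINOR.PATCH` string without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the requested SemVer part incremented.

    Hypothetical helper for illustration; assumes a plain
    MAJOR.MINOR.PATCH string without pre-release or build metadata.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.1.1", "minor"))  # 0.2.0
```

A Pull Request that only fixes a bug would use `"patch"`, while one that adds a backward-compatible feature would use `"minor"`.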
## Code of Conduct
### Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
### Our Standards
Examples of behavior that contributes to creating a positive environment include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
### Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
### Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples include using an official project email address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
### Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at [INSERT EMAIL ADDRESS]. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
### Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version].
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
BIN: binary image files not shown (listed sizes: 7.2 KiB, 17 KiB, 14 KiB, 15 KiB, 8.7 KiB, 131 KiB, 16 KiB, 17 KiB).

+96 -141
View File
@@ -1,218 +1,173 @@
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment` 🎙️🧠
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment` Welcome to `ScrAIbe`, a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework designed to generate fully automated transcriptions.
`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework to generate fully automated transcriptions. Beyond transcription, ScrAIbe supports advanced functions such as speaker diarization and speaker recognition. 🚀
Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition. Designed as a comprehensive AI toolkit, it uses multiple powerful AI models:
Designed as a comprehensive AI toolkit, it uses multiple AI models: - **[Whisper](https://github.com/openai/whisper)**: A general-purpose speech recognition model.
- **[WhisperX](https://github.com/m-bain/whisperX)**: A faster, quantized version of Whisper for enhanced performance on CPU. ⚡
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model. - **[Pyannote-Audio](https://github.com/pyannote/pyannote-audio)**: An open-source toolkit for speaker diarization. 🗣️
- [payannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
The framework utilizes a PyanNet-inspired pipeline, with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding. The framework utilizes a PyanNet-inspired pipeline, with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
During post-diarization, each audio segment is processed by the OpenAI `Whisper` model, in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language segment, facilitating Whisper's role in both transcription and text translation. During post-diarization, each audio segment is processed by the OpenAI `Whisper` model in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language segment, facilitating Whisper's role in both transcription and text translation. 🌍✨
The following graphic illustrates the whole pipeline: The following graphic illustrates the whole pipeline:
![Pipeline](./Pictures/pipeline.png#gh-dark-mode-only) <div style="text-align:center;">
![Pipeline](./Pictures/pipeline_light.png#gh-light-mode-only) <img src="./Pictures/pipeline.png#gh-dark-mode-only" style="width: 60%;" />
<img src="./Pictures/pipeline_light.png#gh-light-mode-only" style="width: 60%;" />
</div>
## Install `ScrAIbe` : ## Getting Started 🚀
The following command will pull and install the latest commit from this repository, along with its Python dependencies. ### Prerequisites
pip install scraibe Before installing ScrAIbe, ensure you have the following prerequisites:
- **Python version**: Python 3.8 - **Python**: Version 3.9 or later.
- **PyTorch version**: Python 1.11.0 - **PyTorch**: Version 2.0 or later.
- **CUDA version**: Cuda-toolkit 11.3.1 - **CUDA**: A version compatible with your PyTorch version, if you want to use GPU acceleration.
- **OS**: Linux
In order to run `scraibe` properly, it is recommended to install `pytoch` using: **Note:** PyTorch should be automatically installed with the pip installer. However, if you encounter any issues, you should consider installing it manually by following the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/).
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 ### Install ScrAIbe
This ensures that the right torchaudio version is installed. Install ScrAIbe on your local machine with ease using PyPI.
We recommend using the CPU Version of Pytorch for a smooth ScrAIbe installation across both Windows and MacOS platforms. Should you face any issues, please contact us. ```bash
pip install scraibe
```
pip install torch==1.11.0+cpu torchvision==0.12.0+cpu torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cpu If you want to install the development version, you can do so by installing it from GitHub:
Important: For the `Pyannote` model, you need to be granted access to Hugging Face. ```bash
Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model. pip install git+https://github.com/JSchmie/ScrAIbe.git@develop
```
Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens). or from PyPI using our latest pre-release:
```bash
pip install --pre scraibe
```
Get started with ScrAIbe today and experience seamless, automated transcription and diarization.
## Usage ## Usage
We've developed ScrAIbe with several access points to cater to diverse user needs. We've developed ScrAIbe with several access points to cater to diverse user needs.
### Python usage ### Python Usage
It enables full control over the functionalities as well as process customization. Gain full control over the functionalities as well as process customization.
```python ```python
from scraibe import Scraibe from scraibe import Scraibe
model = Scraibe(use_auth_token = "hf_yourhftoken") model = Scraibe()
text = model.autotranscribe("audio.wav") text = model.autotranscribe("audio.wav")
print(f"Transcription: \n{text}") print(f"Transcription: \n{text}")
``` ```
The `Scraibe` Class is taking care of the models being properly loaded. Therefore, you can choose the other [whisper](https://github.com/openai/whisper/blob/main/model-card.md) models using the `whisper_model` keyword.
You can also change the `pyannote` diarization model using the `dia_model` keyword.
The `Scraibe` class ensures the models are properly loaded. You can customize the models with various keywords:
As input, `autoranscribe` accepts every format which is compatible with [ffmgeg](https://ffmpeg.org/ffmpeg-formats.html). Examples therefore are `.mp4 .mp3 .wav .ogg .flac` and many more. - **Whisper Models**: Use the `whisper_model` keyword to specify models like `tiny`, `base`, `small`, `medium`, or `large` (`large-v2`, `large-v3`) depending on your accuracy and speed needs.
- **Pyannote Diarization Model**: Use the `dia_model` keyword to change the diarization model.
- **WhisperX**: Set `whisper_type` to `"whisperX"` for better performance on CPU using the WhisperX models (the model names are the same as for Whisper).
- **Keyword Arguments**: A variety of different `kwargs` are available:
- `use_auth_token`: Pass a Hugging Face token to the Pyannote backend if you want to use one of the models hosted on their Hugging Face.
- `verbose`: Enable this to add an additional level of verbosity.
To further control the pipeline of `ScrAIbe` you can parse almost any keyword you also cloud parsed towards `whisper` or `pyannote` if you need more option, try to check out the documentations tows two Frameworks, you might have a good chance that these keywords will work here as well. In general, you should be able to input any `kwargs` that you can input in the original Whisper (WhisperX) and Pyannote Python APIs.
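The constructor keywords listed above could be combined roughly as follows. This is a minimal sketch: `scraibe_options` and `load_model` are hypothetical helpers, and it assumes that `whisper_model`, `whisper_type`, `use_auth_token`, and `verbose` are all accepted by `Scraibe(...)` as the README describes.

```python
from typing import Any, Dict, Optional


def scraibe_options(
    whisper_model: str = "medium",
    use_whisperx: bool = False,
    hf_token: Optional[str] = None,
    verbose: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for `Scraibe(...)`."""
    options: Dict[str, Any] = {"whisper_model": whisper_model, "verbose": verbose}
    if use_whisperx:
        # Switch the transcription backend to WhisperX.
        options["whisper_type"] = "whisperX"
    if hf_token is not None:
        # Only needed for Pyannote models hosted on Hugging Face.
        options["use_auth_token"] = hf_token
    return options


def load_model(**options: Any):
    """Create a Scraibe instance (assumes `pip install scraibe`).

    The import is deferred so that `scraibe_options` can be used
    without the package installed.
    """
    from scraibe import Scraibe

    return Scraibe(**options)
```

For example, `load_model(**scraibe_options("small", use_whisperx=True))` would request the `small` model on the WhisperX backend without passing a token.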
Here's are some examples regarding the `diarization` (which relies on the `pyannote` pipeline):
- `num_speakers` Number of speakers in the audio file As input, `autotranscribe` accepts every format compatible with [FFmpeg](https://ffmpeg.org/ffmpeg-formats.html). Examples include `.mp4`, `.mp3`, `.wav`, `.ogg`, `.flac`, and many more.
- `min_speakers` Minimal Number of speakers in the audio file
- `max_speakers` maximal Number of speakers in the audio file
Then there are arguments about the transcription process, which uses the "whisper" model. To further control the pipeline of `ScrAIbe`, you can pass almost any keyword argument that is accepted by `Whisper` or `Pyannote`. For more options, refer to the documentation of these frameworks, as their keywords are likely to work here as well.
- `language` Specify the language ([list to supported languages](https://github.com/openai/whisper/blob/main/language-breakdown.svg)) Here are some examples regarding `diarization` (which relies on the `pyannote` pipeline):
- `task` can be just `transcribe` or `translate`. If `translate` is selected, the transcribed audio will be translated to English.
- `num_speakers`: Number of speakers in the audio file
- `min_speakers`: Minimum number of speakers in the audio file
- `max_speakers`: Maximum number of speakers in the audio file
Then there are arguments for the transcription process, which uses the "Whisper" model:
- `language`: Specify the language ([list of supported languages](https://github.com/openai/whisper/blob/main/language-breakdown.svg))
- `task`: Can be either `transcribe` or `translate`. If `translate` is selected, the transcribed audio will be translated to English.
For example: For example:
``` ```python
text = model.autotranscribe("audio.wav", language="german", num_speakers = 2) text = model.autotranscribe("audio.wav", language="german", num_speakers = 2)
``` ```
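The diarization and transcription keywords can also be assembled and sanity-checked before the call. The sketch below is an illustration only: `autotranscribe_kwargs` is a hypothetical helper, and its checks (a known `task`, `min_speakers` not above `max_speakers`) are assumptions about sensible input rather than validation that ScrAIbe itself performs.

```python
from typing import Any, Dict, Optional

VALID_TASKS = ("transcribe", "translate")


def autotranscribe_kwargs(
    language: Optional[str] = None,
    task: str = "transcribe",
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for `autotranscribe`."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")

    candidates = {
        "language": language,
        "task": task,
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    # Drop unset options so the library defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```

The result can then be unpacked into the call, for instance `model.autotranscribe("audio.wav", **autotranscribe_kwargs(language="german", min_speakers=2, max_speakers=4))`.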
`Scraibe` also contains the option to just do a transcription `Scraibe` also contains the option to just do a transcription:
```python ```python
transcription = model.transcribe("audio.wav") transcription = model.transcribe("audio.wav")
``` ```
or just do a diarization: or just do a diarization:
```python ```python
diarization = model.diarize("audio.wav") diarization = model.diarization("audio.wav")
``` ```
Start exploring the powerful features of ScrAIbe and customize it to fit your specific transcription and diarization needs!
### Command-line usage ### Command-line usage
Next to the Python interface, you can also run ScrAIbe using the command-line interface: Next to the Python interface, you can also run ScrAIbe using the command-line interface:
scraibe -f "audio.wav" --hf-token "hf_yourhftoken" --language "german" --num_speakers 2 ```bash
scraibe -f "audio.wav" --language "german" --num_speakers 2
```
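The same command can be driven from a Python script. This is a sketch under assumptions: `scraibe_command` and `run_scraibe` are hypothetical helpers, only the flags shown in the command above (`-f`, `--language`, `--num_speakers`) are used, and it is assumed, not confirmed by the README, that the CLI writes its result to standard output.

```python
import subprocess
from typing import List, Optional


def scraibe_command(
    audio_file: str,
    language: Optional[str] = None,
    num_speakers: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build the argument list for the `scraibe` CLI."""
    command = ["scraibe", "-f", audio_file]
    if language is not None:
        command += ["--language", language]
    if num_speakers is not None:
        command += ["--num_speakers", str(num_speakers)]
    return command


def run_scraibe(audio_file: str, **options) -> str:
    """Run the CLI and return whatever it printed to stdout.

    Requires the `scraibe` executable on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        scraibe_command(audio_file, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.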
For the full list of options, run: For the full list of options, run:
scraibe -h ```bash
scraibe -h
The HuggingFace token will be saved after its initial run and can be found at `path/to/scraibe/.pyannotetoken`. It does not need to be called each time you execute `scraibe`.
### Gradio App
The Gradio App is a user-friendly interface for ScrAIbe. It enables you to run the model without any coding knowledge. Therefore, you can run the app in your browser and upload your audio file, or you can make the Framework avail on your network and run it on your local machine.
#### Running the Gradio App on your local machine
To run the Gradio App on your local machine, just use the following command:
```
scraibe --start-server --port 7860 --hf-token hf_yourhftoken
``` ```
- `--start-server`: Command to start the Gradio App. This will display a comprehensive list of all command-line options, allowing you to tailor ScrAIbe's functionality to your specific needs.
- `--port`: Flag for connecting the container internal port to the port on your local machine.
- `--hf-token`: Flag for entering your personal HuggingFace token in the container.
When the app is running, it will show you at which address you can access it. ## Gradio App 🌐
The default address is: http://127.0.0.1:7860 or http://0.0.0.0:7860
After the app is running, you can upload your audio file and select the desired options. The Gradio App is now part of ScrAIbe-WebUI! This user-friendly interface enables you to run the model without any coding knowledge. You can easily run the app in your browser and upload your audio files, or make the framework available on your network and run it on your local machine. 🚀
An example is shown below:
![Gradio App](./Pictures//gradio_app.png) All functionalities previously available in the Gradio App are now part of the ScrAIbe-WebUI. For more information and detailed instructions, visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI).
## Docker Container 🐳
ScrAIbe's Docker containers have also moved to ScrAIbe-WebUI! This option is especially useful if you want to run the model on a server or if you would like to use the GPU without dealing with CUDA.
All Docker container functionalities are now part of ScrAIbe-WebUI. For more information and detailed instructions on how to use the Docker containers, please visit the [ScrAIbe-WebUI GitHub repository](https://github.com/JSchmie/ScrAIbe-WebUI).
---
With these changes, ScrAIbe focuses on its core functionalities while the enhanced Gradio App and related Docker containers are now part of ScrAIbe-WebUI. Enjoy a more streamlined and powerful transcription experience! 🎉
## Documentation 📚
For comprehensive guides, detailed instructions, and advanced usage tips, visit our [documentation page](https://jschmie.github.io/ScrAIbe/). Here, you will find everything you need to make the most out of ScrAIbe.
### Contributions 🤝
We warmly welcome contributions from the community! Whether you're fixing bugs, adding new features, or improving documentation, your help is invaluable. Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information on how to get involved and make your mark on ScrAIbe-WebUI.
### Running a Docker container ### License 📜
Another option to run ScrAIbe is to use a Docker container. This option is especially useful if you want to run the model on a server or if you would like to use the GPU without dealing with CUDA. ScrAIbe-WebUI is proudly open source and licensed under the GPL-3.0 license. This promotes a collaborative and transparent development process. For more details, see the [LICENSE](./LICENSE) file in this repository.
To get our Container, you can pull it from Docker Hub:
```
docker pull hadr0n/scraibe:tagname
```
We provide different tags for different versions of ScrAIbe, including different whisper models.
The current tags are:
| Tagname | Description |
| --- | --- |
|`0.1.1.dev-large`|Uses ScrAibe Verison 0.1.1 and the whisper model `large-v2`|
|`0.1.1.dev-medium`|Uses ScrAibe Verison 0.1.1 and the whisper model `medium`|
|`0.1.1.dev-small`|Uses ScrAibe Verison 0.1.1 and the whisper model `small`|
|`0.1.1.dev-base`|Uses ScrAibe Verison 0.1.1 and the whisper model `base`|
|`0.1.1.dev-tiny`|Uses ScrAibe Verison 0.1.1 and the whisper model `tiny`|
By running the container, you get access to the CLI and the Gradio App.
Here an example command for running the container with the Gradio App:
```
docker run -p 7860:7860 --name [container name] hadr0n/scraibe:tagname --start-server --server-name 0.0.0.0
```
- `-p`: Flag for connecting the container internal port to the port on your local machine.
- `--server-name 0.0.0.0` is used to make the Gradio App available on your network.
#### Enabling GPU usage
To use the GPU, ensure your Docker installation supports GPU usage.
For further information, check: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
To enable GPU usage, you need to add the following flag to the `docker run` command:
```
docker run -it -p 7860:7860 --gpus all --name [container name]hadr0n/scraibe:tagname --start-server --server-name 0.0.0.0
```
For further guidance, check: https://blog.roboflow.com/use-the-gpu-in-docker/
## Documentation
For further insights, check the [documentation page](https://jschmie.github.io/ScrAIbe/).
## Contributions
We are happy to have any interest in contributing and about feedback: In order to do that, create an issue with your feedback or feel free to contact us.
## Roadmap
The following milestones are planned for further releases of ScrAIbe:
- Model quantization
Quantization to empower memory and computational efficiency.
- Model fine-tuning
In order to be able to cover a variety of linguistic phenomena.
For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses.
These phenomena can be addressed by fine-tuning with the corresponding data.
- Implementation of LLMs
One example is the implementation of a summarization or extraction model, which enables ScrAIbe to automatically summarize or retrieve the key information out of a generated transcription, which could be the minutes of a meeting.
- Executable for Windows
## Contact
For queries contact [Jacob Schmieder](Jacob.Schmieder@dbfz.de)
## License
ScrAIbe is licensed under [GNU General Public License](LICENSE).
## Acknowledgments ## Acknowledgments
Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team. Special thanks go to the [KIDA](https://www.kida-bmel.de/) project and the [BMEL (Bundesministerium für Ernährung und Landwirtschaft)](https://www.bmel.de/EN/Home/home_node.html), especially to the AI Consultancy Team.
![KIDA](./Pictures/kida_dark.png#gh-dark-mode-only) &nbsp; ![BMEL](./Pictures/BMEL_dark.png#gh-dark-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](./Pictures/DBFZ_dark.png#gh-dark-mode-only) &nbsp; &nbsp;&nbsp;&nbsp; ![MRI](./Pictures/MRI.png#gh-dark-mode-only) ---
![KIDA](./Pictures/kida.png#gh-light-mode-only) &nbsp; ![BMEL](./Pictures/BMEL.jpg#gh-light-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](./Pictures/DBFZ.png#gh-light-mode-only) &nbsp; &nbsp;&nbsp;&nbsp; ![MRI](./Pictures/MRI.png#gh-light-mode-only) Join us in making ScrAIbe even better! 🚀