readme
@@ -1,23 +1,24 @@
|
|||||||
|
|
||||||
# `AutoTranscript`: Fully Automated Transcription using AI
|
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`
|
||||||
|
|
||||||
`AutoTranscript` is a [PyTorch](https://pytorch.org/) based interface speech-to-text tool to generate fully automated transcriptions. AutoTranscript uses AI models containing speaker diarization models:
|
|
||||||
|
`ScrAIbe` is a [PyTorch](https://pytorch.org/)-based speech-to-text tool for generating fully automated transcriptions. It combines the following AI models for speech recognition and speaker diarization:
|
||||||
|
|
||||||
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
|
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
|
||||||
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
|
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
|
||||||
|
|
||||||
`AutoTranscript` can be used as a command-line interface, a webserver, or as a Python API.
|
## Install `ScrAIbe` :
|
||||||
|
|
||||||
## Install `AutoTranscript` :
|
|
||||||
|
|
||||||
The following command will pull and install the latest commit from this repository, along with its Python dependencies.
|
The following command will pull and install the latest commit from this repository, along with its Python dependencies.
|
||||||
|
|
||||||
pip install https://github.com/JSchmie/autotranscript.git
|
pip install git+https://github.com/JSchmie/autotranscript.git
|
||||||
|
|
||||||
- **Python version**: Python 3.9
|
- **Python version**: Python 3.9
|
||||||
- **PyTorch version**: PyTorch 1.11.0
|
- **PyTorch version**: PyTorch 1.11.0
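The versions listed above can be compared against the local environment before installing. The following is a minimal sketch, not part of the project: the helper name `check_environment` is hypothetical, and treating the listed versions as exact requirements (rather than minimums) is an assumption.

```python
import sys
from importlib import metadata


def check_environment(python=(3, 9), torch="1.11.0"):
    """Return a list of mismatches; an empty list means none were found.

    Hypothetical helper: the defaults mirror the versions listed in the README
    and are compared as exact versions, which is an assumption.
    """
    problems = []
    if tuple(sys.version_info[:2]) != tuple(python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, python))
        problems.append(f"Python {found} found, README lists {wanted}")
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        # Drop local build tags such as "+cu113" before comparing.
        if installed.split("+")[0] != torch:
            problems.append(f"PyTorch {installed} found, README lists {torch}")
    return problems
```

Calling `check_environment()` and printing each returned string gives a quick report; an empty list means the interpreter and the installed PyTorch match the listed versions.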
|
||||||
|
|
||||||
## Usage examples
|
## Usage
|
||||||
|
|
||||||
|
`ScrAIbe` can be used as a command-line interface, a web server, or a Python API.
|
||||||
|
|
||||||
### Python usage
|
### Python usage
|
||||||
|
|
||||||
@@ -32,37 +33,67 @@ print(f"Transcription: \n{text}")
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.
|
||||||
|
|
||||||
### Command-line usage
|
### Command-line usage
|
||||||
|
|
||||||
If you do not want to control the optimization using Python, you also can use the command-line:
|
|
||||||
|
You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:
|
||||||
|
|
||||||
autotranscript audio.wav
|
autotranscript audio.wav
|
||||||
|
|
||||||
|
Some of the most important options are:
|
||||||
|
|
||||||
|
- `--task`: Task to be performed, either transcription, diarization or translation into English. Default is transcription.
|
||||||
|
- `--hf-token`: To download the models, a Hugging Face token must be generated. Check [Hugging Face](https://huggingface.co/docs/hub/security-tokens) for further information on how to do that.
|
||||||
|
- `--server-name`: Name of the web server. If empty, 127.0.0.1 or 0.0.0.0 will be used.
|
||||||
|
- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
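When the command-line tool is driven from a script, the options above can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, not part of the project, and it assumes each option is passed as a `flag value` pair. Only flags named above are used, and the exact strings accepted by `--task` are not shown here, so check `autotranscript -h` before setting it.

```python
import shlex


def build_command(audio_path, task=None, hf_token=None, whisper_model_name=None):
    """Assemble an `autotranscript` argument list (hypothetical helper)."""
    command = ["autotranscript", audio_path]
    options = {
        "--task": task,
        "--hf-token": hf_token,
        "--whisper-model-name": whisper_model_name,
    }
    for flag, value in options.items():
        if value is not None:  # omit unset options so the CLI defaults apply
            command += [flag, str(value)]
    return command


command = build_command("audio.wav", whisper_model_name="medium")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Building a list instead of a single string avoids shell-quoting problems with file names that contain spaces.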
|
||||||
|
|
||||||
|
|
||||||
Run the following to view all available options:
|
Run the following to view all available options:
|
||||||
|
|
||||||
autotranscript -h
|
autotranscript -h
|
||||||
|
|
||||||
### Documentation usage
|
## Documentation
|
||||||
|
|
||||||
To access the documentation run the following command from the docs/_build/html directory:
|
For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).
|
||||||
|
|
||||||
python -m http.server
|
## Contributions
|
||||||
|
|
||||||
|
Contributions are welcome. To contribute, fork the repo and open a merge request with your changes.
|
||||||
|
|
||||||
## Roadmap
|
## Roadmap
|
||||||
|
|
||||||
|
The following milestones are planned for the further development of ScrAIbe:
|
||||||
|
|
||||||
- Model quantization
|
- Model quantization
|
||||||
|
Quantization to improve memory and computational efficiency.
|
||||||
|
|
||||||
- Model fine-tuning
|
- Model fine-tuning
|
||||||
|
Fine-tuning to cover a wider variety of linguistic phenomena.
|
||||||
|
|
||||||
|
For example, ScrAIbe currently transcribes word by word but ignores filler words and speech pauses.
|
||||||
|
These phenomena can be addressed by fine-tuning with the corresponding data.
|
||||||
|
|
||||||
- Implementation of LLMs
|
- Implementation of LLMs
|
||||||
|
One example is a summarization or extraction model that lets ScrAIbe automatically summarize a generated transcription or retrieve its key information, for example to produce the minutes of a meeting.
|
||||||
|
|
||||||
- Executable for Windows
|
- Executable for Windows
|
||||||
|
|
||||||
## Contact
|
## Contact
|
||||||
|
|
||||||
For queries contact Jacob Schmieder at Jacob.Schmieder@dbfz.de
|
For queries, contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de).
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
|
<!-- licensing missing? Apache 2.0 -->
|
||||||
|
ScrAIbe is licensed under (tbd).
|
||||||
|
|
||||||
## Acknowledgments
|
## Acknowledgments
|
||||||
|
|
||||||
Special thanks go to the colleagues of the KIDA project - especially the teams in I5 and I2 - and the BMEL (Bundesministerium für Ernährung und Landwirtschaft).
|
<!--add KIDA, MRI, DBFZ, BMEL logos-->
|
||||||
|
|
||||||
|
Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||