removed docs to avoid conflict
Binary file not shown.
|
After Width: | Height: | Size: 131 KiB |
@@ -1,173 +0,0 @@
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`

`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework to generate fully automated transcriptions.

Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition.

Designed as a comprehensive AI toolkit, it uses multiple AI models:

- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.

The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.

After diarization, each audio segment is processed by the OpenAI `Whisper` model, in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of the segment, facilitating Whisper's role in both transcription and text translation.
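
The order of operations described above (diarize the recording first, then transcribe each speaker segment) can be sketched in plain Python. This is only an illustration of the control flow, not ScrAIbe's implementation: `Segment`, `run_pipeline`, `fake_diarize` and `fake_transcribe` are hypothetical names, and the two `fake_*` functions merely stand in for the pyannote and Whisper models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    """One speaker turn: start/end in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def run_pipeline(
    waveform: Sequence[float],
    sample_rate: int,
    diarize: Callable[[Sequence[float], int], List[Segment]],
    transcribe: Callable[[Sequence[float]], str],
) -> List[dict]:
    """Diarize the whole recording, then transcribe each segment on its own."""
    results = []
    for seg in diarize(waveform, sample_rate):
        # Cut out the samples that belong to this speaker turn.
        first = int(seg.start * sample_rate)
        last = int(seg.end * sample_rate)
        text = transcribe(waveform[first:last])
        results.append({"speaker": seg.speaker, "start": seg.start,
                        "end": seg.end, "text": text})
    return results


# Hypothetical stand-ins so the sketch runs without downloading any model.
def fake_diarize(waveform, sample_rate):
    half = len(waveform) / sample_rate / 2
    return [Segment(0.0, half, "SPEAKER_00"),
            Segment(half, 2 * half, "SPEAKER_01")]


def fake_transcribe(chunk):
    return f"<{len(chunk)} samples>"


transcript = run_pipeline([0.0] * 32000, 16000, fake_diarize, fake_transcribe)
for line in transcript:
    print(f'{line["speaker"]} [{line["start"]:.1f}-{line["end"]:.1f}s]: {line["text"]}')
```

Passing the two models in as callables keeps the sketch independent of any particular library; in practice they would be replaced by the real diarization and transcription objects.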

The following graphic illustrates the whole pipeline:

![Pipeline](Pictures/pipeline.png#gh-dark-mode-only)
![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only)

## Install `ScrAIbe`

The following command will pull and install the latest commit from this repository, along with its Python dependencies.

    pip install git+https://github.com/JSchmie/autotranscript.git

- **Python version**: Python 3.8
- **PyTorch version**: PyTorch 1.11.0
- **CUDA version**: Cuda-toolkit 11.3.1

Important: For the `Pyannote` model you need to be granted access in Hugging Face.
Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model.

Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens).
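
One common way to keep such a token out of source files and version control is to read it from an environment variable at start-up. The sketch below assumes a variable named `HF_TOKEN` and a hypothetical helper `get_hf_token`; neither is part of ScrAIbe, and how the token is then passed on to the library is not shown here.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in `env_var`, or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No Hugging Face token found. Set {env_var} before starting the app."
        )
    return token


if __name__ == "__main__":
    try:
        token = get_hf_token()
        # Never print the token itself; only report that it was found.
        print(f"Token loaded ({len(token)} characters).")
    except RuntimeError as err:
        print(err)
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later while the diarization model is being downloaded.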

## Usage

We've developed ScrAIbe with several access points to cater to diverse user needs.

### Python usage

The Python interface gives full control over the functionality and allows the processing steps to be customized.

Some usage examples:

- Usage of `AutoTranscribe`, core of the transcription system, for performing transcription and diarization of audio files.

```python
from scraibe import AutoTranscribe

model = AutoTranscribe()

text = model.transcribe("audio.wav")

print(f"Transcription: \n{text}")
```
- Usage of `Diariser`, responsible for identifying and segmenting individual speakers from a given audio file.

```python
from scraibe import Diariser

model = Diariser.load_model()

diarisation_output = model.diarization("audio.wav")
```
- Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards.

```python
from scraibe import Transcriber

transcriber = Transcriber.load_model()

transcript = transcriber.transcribe("audio.wav")

transcriber.save_transcript(transcript, "path/to/save.txt")
```

Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.

### Command-line usage

You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:

    scraibe audio.wav

Some examples of important options are:

- `--task`: Task to be performed, either transcription, diarization or translation into English. Default is transcription.
- `--hf-token`: Personal `Hugging Face` token.
- `--server-name`: Name of the Web Server. If empty 127.0.0.1 or 0.0.0.0 will be used.
- `--port`: Port on which to run the Gradio app. The default is 7860.

- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
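
The options listed above map naturally onto an `argparse` parser. The following is a sketch of how such a parser could look, built only from the flags and defaults named in this README. The choice strings for `--task`, the positional name `audio_files` and the function `build_parser` are assumptions; the real `scraibe` CLI may define them differently.

```python
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter


def build_parser() -> ArgumentParser:
    """Build a parser mirroring the options described in the README (assumed layout)."""
    parser = ArgumentParser(prog="scraibe",
                            formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument("audio_files", nargs="*",
                        help="Audio file(s) to process.")
    parser.add_argument("--task", default="transcription",
                        choices=["transcription", "diarization", "translation"],
                        help="Task to be performed.")
    parser.add_argument("--hf-token", default=None,
                        help="Personal Hugging Face token.")
    parser.add_argument("--server-name", default=None,
                        help="Name of the web server.")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port for the Gradio app.")
    parser.add_argument("--whisper-model-name", default="medium",
                        help="Name of the whisper model to be used.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["audio.wav", "--task", "diarization", "--port", "8080"]
)
print(args.task, args.port, args.whisper_model_name)
```

`argparse` converts the dashes in option names to underscores, so `--whisper-model-name` is read back as `args.whisper_model_name`.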

Run the following to view all available options:

    scraibe -h

### Running a Docker container

After you have installed Docker, you can execute the following commands in the terminal.

```
sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]

sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
- `-p`: Flag for connecting the container's internal port to the port on your local machine.
- `--hf_token`: Flag for entering your personal HuggingFace token in the container.
- `--start_server`: Command to start the Gradio App.

Then click the following link to run the app:

    http://0.0.0.0:7860

- Enabling GPU usage

```
sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/

## Documentation

For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).

## Contributions

We welcome contributions and feedback. To get involved, create an issue with your feedback or feel free to contact us.

## Roadmap

The following milestones are planned for further releases of ScrAIbe:

- Model quantization
  Quantization to improve memory and computational efficiency.

- Model fine-tuning
  In order to be able to cover a variety of linguistic phenomena.

  For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses.
  These phenomena can be addressed by fine-tuning with the corresponding data.

- Implementation of LLMs
  One example is the integration of a summarization or extraction model, which would enable ScrAIbe to automatically summarize a generated transcription or retrieve its key information, for example to produce the minutes of a meeting.

- Executable for Windows

## Contact

For queries contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de)

## License

ScrAIbe is licensed under GNU General Public License.

## Acknowledgments

Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.

![KIDA](Pictures/kida.png)   ![BMEL](Pictures/BMEL.jpg)   ![DBFZ](Pictures/DBFZ.png)   ![MRI](Pictures/MRI.png)

![KIDA](Pictures/kida.png)   ![BMEL](Pictures/BMEL.jpg)   ![DBFZ](Pictures/DBFZ.png)   ![MRI](Pictures/MRI.png)
@@ -1,101 +0,0 @@
from dash import Dash, dcc, html, dash_table, Input, Output, State, callback

import base64
from autotranscript.app.qtfaststart import process
from autotranscript import AutoTranscribe
import io
import subprocess as sp
import numpy as np
from autotranscript.audio import SAMPLE_RATE

# Setup auto-transcript
autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}

# Setup FFmpeg
PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"


# Setup Dash
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div([
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
])

def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    file = io.BytesIO(decoded).read()

    if filename.endswith(PROBLEMATIC_FILE_TYPES):
        # mp4 and other files need to be processed with qtfaststart
        # since their metadata is at the end of the file
        # and we need it at the beginning
        file = process(file)

    cmd = [
        "ffmpeg",
        "-nostdin",
        "-threads", "0",
        "-i",'pipe:',
        "-f", "s16le",
        '-hide_banner',
        '-loglevel', 'error',
        "-c", "copy",
        "-vn",
        "-ac", "1",
        "-acodec", "pcm_s16le",
        "-ar", str(SAMPLE_RATE),
        "-"
    ]

    proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)

    out = proc.communicate(input=file)[0]
    out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
    out = np.array([out, SAMPLE_RATE])

    transcript = str(autot.transcribe(out))

    return html.Div([
        html.H5(f"File Name: {filename} \n" \
                "Transcript: \n"
                ),
        html.P(transcript)
    ])

@callback(Output('output-data-upload', 'children'),
          Input('upload-data', 'contents'),
          State('upload-data', 'filename'),
          State('upload-data', 'last_modified'))
def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children

if __name__ == '__main__':
    app.run_server()
+1
-3
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
|
|||||||
setuptools~=65.6.3
|
setuptools~=65.6.3
|
||||||
setuptools-rust~=1.5.2
|
setuptools-rust~=1.5.2
|
||||||
|
|
||||||
sphinx~=5.0.2
|
|
||||||
|
|
||||||
tqdm>=4.65.0
|
tqdm>=4.65.0
|
||||||
|
|
||||||
gradio~=3.36.1
|
gradio~=3.36.1
|
||||||
@@ -22,6 +20,6 @@ torch~=1.11.0
|
|||||||
torchvision~=0.12.0
|
torchvision~=0.12.0
|
||||||
torchaudio~=0.11.0
|
torchaudio~=0.11.0
|
||||||
#optional:
|
#optional:
|
||||||
#dash~=2.10.2
|
#sphinx~=5.0.2
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1 @@
|
|||||||
|
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
|
||||||
|
Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 38 KiB |
@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
|
|||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
This module provides an interface to transcribe audio files using the
|
This module provides an interface to transcribe audio files using the
|
||||||
AutoTranscribe model. Users can either upload an audio file or record their speech
|
Scraibe model. Users can either upload an audio file or record their speech
|
||||||
live for transcription. The application supports multiple languages and provides
|
live for transcription. The application supports multiple languages and provides
|
||||||
options to specify the number of speakers and the language of the audio.
|
options to specify the number of speakers and the language of the audio.
|
||||||
|
|
||||||
@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
|
|||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
This module provides an interface to transcribe audio files using the
|
This module provides an interface to transcribe audio files using the
|
||||||
AutoTranscribe model. Users can either upload an audio file or record their speech
|
Scraibe model. Users can either upload an audio file or record their speech
|
||||||
live for transcription. The application supports multiple languages and provides
|
live for transcription. The application supports multiple languages and provides
|
||||||
options to specify the number of speakers and the language of the audio.
|
options to specify the number of speakers and the language of the audio.
|
||||||
|
|
||||||
@@ -33,10 +33,13 @@ Usage:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
import json
|
import json
|
||||||
|
import os
|
||||||
|
from tkinter import CURRENT
|
||||||
|
|
||||||
import gradio as gr
|
import gradio as gr
|
||||||
from autotranscript import AutoTranscribe, Transcript
|
from tqdm import tqdm
|
||||||
|
|
||||||
|
from scraibe import Scraibe, Transcript
|
||||||
|
|
||||||
theme = gr.themes.Soft(
|
theme = gr.themes.Soft(
|
||||||
primary_hue="green",
|
primary_hue="green",
|
||||||
@@ -59,17 +62,19 @@ LANGUAGES = [
|
|||||||
"Vietnamese", "Welsh"
|
"Vietnamese", "Welsh"
|
||||||
]
|
]
|
||||||
|
|
||||||
|
CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
|
||||||
class GradioTranscriptionInterface:
|
class GradioTranscriptionInterface:
|
||||||
"""
|
"""
|
||||||
Interface handling the interaction between Gradio UI and the Audio Transcription system.
|
Interface handling the interaction between Gradio UI and the Audio Transcription system.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, model: AutoTranscribe):
|
def __init__(self, model: Scraibe):
|
||||||
"""
|
"""
|
||||||
Initializes the GradioTranscriptionInterface with a transcription model.
|
Initializes the GradioTranscriptionInterface with a transcription model.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
model (AutoTranscribe): Model responsible for audio transcription tasks.
|
model (Scraibe): Model responsible for audio transcription tasks.
|
||||||
"""
|
"""
|
||||||
self.model = model
|
self.model = model
|
||||||
|
|
||||||
@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
|
|||||||
translation : bool,
|
translation : bool,
|
||||||
language : str):
|
language : str):
|
||||||
"""
|
"""
|
||||||
Shortcut method for the AutoTranscribe task.
|
Shortcut method for the Scraibe task.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tuple: Transcribed text (str), JSON output (dict)
|
tuple: Transcribed text (str), JSON output (dict)
|
||||||
@@ -89,13 +94,43 @@ class GradioTranscriptionInterface:
|
|||||||
"language": language if language != "None" else None,
|
"language": language if language != "None" else None,
|
||||||
"task": 'translate' if translation else None
|
"task": 'translate' if translation else None
|
||||||
}
|
}
|
||||||
|
if isinstance(source, str):
|
||||||
|
try:
|
||||||
|
result = self.model.autotranscribe(source, **kwargs)
|
||||||
|
except ValueError:
|
||||||
|
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
||||||
|
Please try again!")
|
||||||
|
|
||||||
|
return str(result), result.get_json()
|
||||||
|
|
||||||
try:
|
elif isinstance(source, list):
|
||||||
result = self.model.autotranscribe(source, **kwargs)
|
source_names = [s.split("/")[-1] for s in source]
|
||||||
except ValueError:
|
result = []
|
||||||
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
|
||||||
Please try again!")
|
try:
|
||||||
return str(result), result.get_json()
|
res = self.model.autotranscribe(s, **kwargs)
|
||||||
|
except ValueError:
|
||||||
|
_name = s.split("/")[-1]
|
||||||
|
res = f"NO TRANSCRIPT FOUND FOR {_name}"
|
||||||
|
gr.Warning(f"Couldn't detect any speech in {_name} will skip this file.")
|
||||||
|
result.append(res)
|
||||||
|
|
||||||
|
out = ''
|
||||||
|
out_dict = {}
|
||||||
|
for i, r in enumerate(result):
|
||||||
|
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
|
||||||
|
out += str(r)
|
||||||
|
out += "\n\n"
|
||||||
|
|
||||||
|
if isinstance(r, str):
|
||||||
|
out_dict[source_names[i]] = r
|
||||||
|
else:
|
||||||
|
out_dict[source_names[i]] = r.get_dict()
|
||||||
|
|
||||||
|
return out, json.dumps(out_dict, indent=4)
|
||||||
|
|
||||||
|
else:
|
||||||
|
raise gr.Error("Please provide a valid audio file.")
|
||||||
|
|
||||||
|
|
||||||
def transcribe(self, source, translation, language):
|
def transcribe(self, source, translation, language):
|
||||||
@@ -110,8 +145,28 @@ class GradioTranscriptionInterface:
|
|||||||
"task": 'translate' if translation == "Yes" else None
|
"task": 'translate' if translation == "Yes" else None
|
||||||
}
|
}
|
||||||
|
|
||||||
result = self.model.transcribe(source, **kwargs)
|
if isinstance(source, str):
|
||||||
return str(result)
|
result = self.model.transcribe(source, **kwargs)
|
||||||
|
|
||||||
|
return str(result)
|
||||||
|
|
||||||
|
elif isinstance(source, list):
|
||||||
|
source_names = [s.split("/")[-1] for s in source]
|
||||||
|
result = []
|
||||||
|
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
|
||||||
|
res = self.model.transcribe(s, **kwargs)
|
||||||
|
result.append(res)
|
||||||
|
|
||||||
|
out = ''
|
||||||
|
for i, res in enumerate(result):
|
||||||
|
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
|
||||||
|
out += str(res)
|
||||||
|
out += "\n\n"
|
||||||
|
|
||||||
|
return out
|
||||||
|
|
||||||
|
else:
|
||||||
|
raise gr.Error("Please provide a valid audio file.")
|
||||||
|
|
||||||
def perform_diarisation(self, source, num_speakers):
|
def perform_diarisation(self, source, num_speakers):
|
||||||
"""
|
"""
|
||||||
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
|
|||||||
"num_speakers": num_speakers if num_speakers != 0 else None,
|
"num_speakers": num_speakers if num_speakers != 0 else None,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if isinstance(source, str):
|
||||||
|
try:
|
||||||
|
result = self.model.diarization(source, **kwargs)
|
||||||
|
except ValueError:
|
||||||
|
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
||||||
|
Please try again!")
|
||||||
|
|
||||||
try:
|
return json.dumps(result, indent=2)
|
||||||
result = self.model.diarization(source, **kwargs)
|
elif isinstance(source, list):
|
||||||
except ValueError:
|
source_names = [s.split("/")[-1] for s in source]
|
||||||
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
result = []
|
||||||
Please try again!")
|
for s in tqdm(source, total=len(source),desc = "Performing diarisation"):
|
||||||
return json.dumps(result, indent=2)
|
try:
|
||||||
|
res = self.model.diarization(s, **kwargs)
|
||||||
|
except ValueError:
|
||||||
|
res = f"NO DIARISATION FOUND FOR {s}"
|
||||||
|
gr.Warning(f"Couldn't detect any speech in {s} will skip this file.")
|
||||||
|
result.append(res)
|
||||||
|
|
||||||
|
out = {}
|
||||||
|
|
||||||
|
for i, res in enumerate(result):
|
||||||
|
out[source_names[i]] = res
|
||||||
|
|
||||||
|
return json.dumps(out, indent=4)
|
||||||
|
|
||||||
|
else:
|
||||||
|
raise gr.Error("Please provide a valid audio file.")
|
||||||
|
|
||||||
|
|
||||||
####
|
####
|
||||||
# Gradio Interface
|
# Gradio Interface
|
||||||
####
|
####
|
||||||
|
|
||||||
def gradio_Interface(model : AutoTranscribe = None):
|
def gradio_Interface(model : Scraibe = None):
|
||||||
|
|
||||||
if model is None:
|
if model is None:
|
||||||
model = AutoTranscribe()
|
model = Scraibe()
|
||||||
|
|
||||||
pipe = GradioTranscriptionInterface(model)
|
pipe = GradioTranscriptionInterface(model)
|
||||||
|
|
||||||
@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
|||||||
gr.update(visible = True),
|
gr.update(visible = True),
|
||||||
gr.update(visible = False, value = None))
|
gr.update(visible = False, value = None))
|
||||||
|
|
||||||
elif choice == "File":
|
elif choice == "File or Files":
|
||||||
|
|
||||||
return (gr.update(visible = False, value = None),
|
return (gr.update(visible = False, value = None),
|
||||||
gr.update(visible = False, value = None),
|
gr.update(visible = False, value = None),
|
||||||
@@ -205,22 +282,42 @@ def gradio_Interface(model : AutoTranscribe = None):
|
|||||||
gr.update(visible = False, value = None),
|
gr.update(visible = False, value = None),
|
||||||
gr.update(visible = True))
|
gr.update(visible = True))
|
||||||
|
|
||||||
def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
|
def run_scribe(task,
|
||||||
|
num_speakers,
|
||||||
|
translate,
|
||||||
|
language,
|
||||||
|
audio1,
|
||||||
|
audio2,
|
||||||
|
video1,
|
||||||
|
video2,
|
||||||
|
file_in,
|
||||||
|
progress = gr.Progress(track_tqdm= True)):
|
||||||
# get *args which are not None
|
# get *args which are not None
|
||||||
progress(0, desc='Starting task...')
|
progress(0, desc='Starting task...')
|
||||||
source = audio1 or audio2 or video1 or video2 or file_in
|
source = audio1 or audio2 or video1 or video2 or file_in
|
||||||
|
|
||||||
|
if isinstance(source, list):
|
||||||
|
source = [s.name for s in source]
|
||||||
|
if len(source) == 1:
|
||||||
|
source = source[0]
|
||||||
|
|
||||||
if task == 'Auto Transcribe':
|
if task == 'Auto Transcribe':
|
||||||
|
|
||||||
out_str , out_json = pipe.auto_transcribe(source = source,
|
out_str , out_json = pipe.auto_transcribe(source = source,
|
||||||
num_speakers = num_speakers,
|
num_speakers = num_speakers,
|
||||||
translation = translate,
|
translation = translate,
|
||||||
language = language)
|
language = language)
|
||||||
|
|
||||||
return (gr.update(value = out_str, visible = True),
|
if isinstance(source, str):
|
||||||
gr.update(value = out_json, visible = True),
|
return (gr.update(value = out_str, visible = True),
|
||||||
gr.update(visible = True),
|
gr.update(value = out_json, visible = True),
|
||||||
gr.update(visible = True))
|
gr.update(visible = True),
|
||||||
|
gr.update(visible = True))
|
||||||
|
else:
|
||||||
|
return (gr.update(value = out_str, visible = True),
|
||||||
|
gr.update(value = out_json, visible = True),
|
||||||
|
gr.update(visible = False),
|
||||||
|
gr.update(visible = False))
|
||||||
|
|
||||||
elif task == 'Transcribe':
|
elif task == 'Transcribe':
|
||||||
|
|
||||||
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
|
|||||||
with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
|
with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
|
||||||
|
|
||||||
# Define components
|
# Define components
|
||||||
header = open("header.html", "r").read()
|
hname = os.path.join(CURRENT_PATH, "header.html")
|
||||||
|
header = open(hname, "r").read()
|
||||||
gr.HTML(header, visible= True, show_label=False)
|
gr.HTML(header, visible= True, show_label=False)
|
||||||
|
|
||||||
with gr.Row():
|
with gr.Row():
|
||||||
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
|||||||
leave it at None.", visible= True)
|
leave it at None.", visible= True)
|
||||||
|
|
||||||
input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video"
|
input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video"
|
||||||
,"File"], label="Input Type", value="Upload Audio")
|
,"File or Files"], label="Input Type", value="Upload Audio")
|
||||||
|
|
||||||
audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
|
audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
|
||||||
interactive= True, visible= True)
|
interactive= True, visible= True)
|
||||||
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
|||||||
interactive= True, visible= False)
|
interactive= True, visible= False)
|
||||||
video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
|
video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
|
||||||
interactive= True, visible= False)
|
interactive= True, visible= False)
|
||||||
file_in = gr.File(label="Upload File", interactive= True, visible= False)
|
file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
|
||||||
|
|
||||||
submit = gr.Button()
|
submit = gr.Button()
|
||||||
|
|
||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
AutoTranscribe Class
|
Scraibe Class
|
||||||
--------------------
|
--------------------
|
||||||
|
|
||||||
This class serves as the core of the transcription system, responsible for handling
|
This class serves as the core of the transcription system, responsible for handling
|
||||||
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
|
|||||||
integration into various applications, ranging from transcription services to voice assistants.
|
integration into various applications, ranging from transcription services to voice assistants.
|
||||||
|
|
||||||
Available Classes:
|
Available Classes:
|
||||||
- AutoTranscribe: Main class for performing transcription and diarization.
|
- Scraibe: Main class for performing transcription and diarization.
|
||||||
Includes methods for loading models, processing audio files,
|
Includes methods for loading models, processing audio files,
|
||||||
and formatting the transcription output.
|
and formatting the transcription output.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
from .autotranscribe import AutoTranscribe
|
from scraibe import Scraibe
|
||||||
|
|
||||||
model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
|
model = Scraibe()
|
||||||
transcript = model.transcribe("path/to/audiofile.wav")
|
transcript = model.autotranscribe("path/to/audiofile.wav")
|
||||||
"""
|
"""
|
||||||
|
|
||||||
# Standard Library Imports
|
# Standard Library Imports
|
||||||
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
|
|||||||
DiarisationType = TypeVar('DiarisationType')
|
DiarisationType = TypeVar('DiarisationType')
|
||||||
|
|
||||||
|
|
||||||
class AutoTranscribe:
|
class Scraibe:
|
||||||
"""
|
"""
|
||||||
AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
|
Scraibe is a class responsible for managing the transcription and diarization of audio files.
|
||||||
It serves as the core of the transcription system, incorporating pretrained models
|
It serves as the core of the transcription system, incorporating pretrained models
|
||||||
for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
|
for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
|
||||||
allowing for comprehensive audio processing.
|
allowing for comprehensive audio processing.
|
||||||
@@ -57,7 +57,7 @@ class AutoTranscribe:
|
|||||||
diariser (Diariser): The diariser object to handle diarization.
|
diariser (Diariser): The diariser object to handle diarization.
|
||||||
|
|
||||||
Methods:
|
Methods:
|
||||||
__init__: Initializes the AutoTranscribe class with appropriate models.
|
__init__: Initializes the Scraibe class with appropriate models.
|
||||||
transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
|
transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
|
||||||
remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
|
remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
|
||||||
get_audio_file: Gets an audio file as an AudioProcessor object.
|
get_audio_file: Gets an audio file as an AudioProcessor object.
|
||||||
@@ -66,7 +66,7 @@ class AutoTranscribe:
|
|||||||
whisper_model: Union[bool, str, whisper] = None,
|
whisper_model: Union[bool, str, whisper] = None,
|
||||||
dia_model : Union[bool, str, DiarisationType] = None,
|
dia_model : Union[bool, str, DiarisationType] = None,
|
||||||
**kwargs) -> None:
|
**kwargs) -> None:
|
||||||
"""Initializes the AutoTranscribe class.
|
"""Initializes the Scraibe class.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
whisper_model (Union[bool, str, whisper], optional):
|
whisper_model (Union[bool, str, whisper], optional):
|
||||||
@@ -92,7 +92,11 @@ class AutoTranscribe:
|
|||||||
else:
|
else:
|
||||||
self.diariser = dia_model
|
self.diariser = dia_model
|
||||||
|
|
||||||
print("AutoTranscribe initialized all models successfully loaded.")
|
if kwargs.get("verbose"):
|
||||||
|
print("Scraibe initialized all models successfully loaded.")
|
||||||
|
self.verbose = True
|
||||||
|
else:
|
||||||
|
self.verbose = False
|
||||||
|
|
||||||
def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
|
def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
|
||||||
remove_original : bool = False,
|
remove_original : bool = False,
|
||||||
@@ -112,7 +116,8 @@ class AutoTranscribe:
|
|||||||
Transcript: A Transcript object containing the transcription,
|
Transcript: A Transcript object containing the transcription,
|
||||||
which can be exported to different formats.
|
which can be exported to different formats.
|
||||||
"""
|
"""
|
||||||
|
if kwargs.get("verbose"):
|
||||||
|
self.verbose = kwargs.get("verbose")
|
||||||
# Get audio file as an AudioProcessor object
|
# Get audio file as an AudioProcessor object
|
||||||
audio_file = self.get_audio_file(audio_file)
|
audio_file = self.get_audio_file(audio_file)
|
||||||
|
|
||||||
@@ -121,12 +126,12 @@ class AutoTranscribe:
|
|||||||
"waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)),
|
"waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)),
|
||||||
"sample_rate": audio_file.sr
|
"sample_rate": audio_file.sr
|
||||||
}
|
}
|
||||||
|
|
||||||
print("Starting diarisation.")
|
if self.verbose:
|
||||||
|
print("Starting diarisation.")
|
||||||
|
|
||||||
diarisation = self.diariser.diarization(dia_audio, **kwargs)
|
diarisation = self.diariser.diarization(dia_audio, **kwargs)
|
||||||
|
|
||||||
|
|
||||||
if not diarisation["segments"]:
|
if not diarisation["segments"]:
|
||||||
print("No segments found. Try to run transcription without diarisation.")
|
print("No segments found. Try to run transcription without diarisation.")
|
||||||
|
|
||||||
@@ -138,16 +143,15 @@ class AutoTranscribe:
|
|||||||
|
|
||||||
return Transcript(final_transcript)
|
return Transcript(final_transcript)
|
||||||
|
|
||||||
print("Diarisation finished. Starting transcription.")
|
if self.verbose:
|
||||||
|
print("Diarisation finished. Starting transcription.")
|
||||||
|
|
||||||
audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
|
audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
|
||||||
|
|
||||||
# Transcribe each segment and store the results
|
# Transcribe each segment and store the results
|
||||||
final_transcript = dict()
|
final_transcript = dict()
|
||||||
|
|
||||||
|
for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
|
||||||
|
|
||||||
for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
|
|
||||||
|
|
||||||
seg = diarisation["segments"][i]
|
seg = diarisation["segments"][i]
|
||||||
|
|
||||||
@@ -283,4 +287,4 @@ class AutoTranscribe:
|
|||||||
return audio_file
|
return audio_file
|
||||||
|
|
||||||
def __repr__(self):
|
def __repr__(self):
|
||||||
return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
|
return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
|
||||||
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
Command-Line Interface (CLI) for the AutoTranscribe class,
|
Command-Line Interface (CLI) for the Scraibe class,
|
||||||
allowing for user interaction to transcribe and diarize audio files.
|
allowing for user interaction to transcribe and diarize audio files.
|
||||||
The function includes arguments for specifying the audio files, model paths,
|
The function includes arguments for specifying the audio files, model paths,
|
||||||
output formats, and other options necessary for transcription.
|
output formats, and other options necessary for transcription.
|
||||||
@@ -8,9 +8,7 @@ import os
|
|||||||
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
|
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
|
||||||
import json
|
import json
|
||||||
|
|
||||||
from sympy import use
|
from .autotranscript import Scraibe
|
||||||
|
|
||||||
from .autotranscript import AutoTranscribe
|
|
||||||
from .app.gradio_app import gradio_Interface
|
from .app.gradio_app import gradio_Interface
|
||||||
|
|
||||||
from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
|
from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
|
||||||
@@ -20,12 +18,12 @@ from torch import set_num_threads
|
|||||||
|
|
||||||
def cli():
|
def cli():
|
||||||
"""
|
"""
|
||||||
Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe
|
Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe
|
||||||
and diarize audio files. The function includes arguments for specifying the audio files, model paths,
|
and diarize audio files. The function includes arguments for specifying the audio files, model paths,
|
||||||
output formats, and other options necessary for transcription.
|
output formats, and other options necessary for transcription.
|
||||||
|
|
||||||
This function can be executed from the command line to perform transcription tasks, providing a
|
This function can be executed from the command line to perform transcription tasks, providing a
|
||||||
user-friendly way to access the AutoTranscribe class functionalities.
|
user-friendly way to access the Scraibe class functionalities.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def str2bool(string):
|
def str2bool(string):
|
||||||
@@ -115,7 +113,7 @@ def cli():
|
|||||||
if arg_dict["whisper_model_directory"]:
|
if arg_dict["whisper_model_directory"]:
|
||||||
class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
|
class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
|
||||||
|
|
||||||
model = AutoTranscribe(**class_kwargs)
|
model = Scraibe(**class_kwargs)
|
||||||
|
|
||||||
|
|
||||||
if arg_dict["audio_files"]:
|
if arg_dict["audio_files"]:
|
||||||
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
|
|||||||
PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
|
PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
|
||||||
PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
|
PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
|
||||||
|
|
||||||
|
|
||||||
def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
|
def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
|
||||||
"""Configure diarization pipeline from a YAML file.
|
"""Configure diarization pipeline from a YAML file.
|
||||||
|
|
||||||
@@ -90,8 +90,8 @@ class Transcriber:
|
|||||||
|
|
||||||
kwargs = self._get_whisper_kwargs(**kwargs)
|
kwargs = self._get_whisper_kwargs(**kwargs)
|
||||||
|
|
||||||
if "verbose" not in kwargs:
|
if not kwargs.get("verbose"):
|
||||||
kwargs["verbose"] = False
|
kwargs["verbose"] = None
|
||||||
|
|
||||||
result = self.model.transcribe(audio, *args, **kwargs)
|
result = self.model.transcribe(audio, *args, **kwargs)
|
||||||
return result["text"]
|
return result["text"]
|
||||||
@@ -173,6 +173,9 @@ class Transcriber:
|
|||||||
if (task := kwargs.get("task")):
|
if (task := kwargs.get("task")):
|
||||||
whisper_kwargs["task"] = task
|
whisper_kwargs["task"] = task
|
||||||
|
|
||||||
|
if (language := kwargs.get("language")):
|
||||||
|
whisper_kwargs["language"] = language
|
||||||
|
|
||||||
return whisper_kwargs
|
return whisper_kwargs
|
||||||
|
|
||||||
def __repr__(self) -> str:
|
def __repr__(self) -> str:
|
||||||
@@ -1,69 +1,69 @@
|
|||||||
import os
|
import os
|
||||||
import subprocess as sp
|
import subprocess as sp
|
||||||
|
|
||||||
MAJOR = 0
|
MAJOR = 0
|
||||||
MINOR = 1
|
MINOR = 1
|
||||||
MICRO = 0
|
MICRO = 0
|
||||||
MICRO_POST = 0
|
MICRO_POST = 0
|
||||||
ISRELEASED = False
|
ISRELEASED = False
|
||||||
VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
|
VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
|
||||||
|
|
||||||
# Return the git revision as a string
|
# Return the git revision as a string
|
||||||
# taken from numpy/numpy
|
# taken from numpy/numpy
|
||||||
def git_version():
|
def git_version():
|
||||||
def _minimal_ext_cmd(cmd):
|
def _minimal_ext_cmd(cmd):
|
||||||
# construct minimal environment
|
# construct minimal environment
|
||||||
env = {}
|
env = {}
|
||||||
for k in ['SYSTEMROOT', 'PATH', 'HOME']:
|
for k in ['SYSTEMROOT', 'PATH', 'HOME']:
|
||||||
v = os.environ.get(k)
|
v = os.environ.get(k)
|
||||||
if v is not None:
|
if v is not None:
|
||||||
env[k] = v
|
env[k] = v
|
||||||
|
|
||||||
# LANGUAGE is used on win32
|
# LANGUAGE is used on win32
|
||||||
env['LANGUAGE'] = 'C'
|
env['LANGUAGE'] = 'C'
|
||||||
env['LANG'] = 'C'
|
env['LANG'] = 'C'
|
||||||
env['LC_ALL'] = 'C'
|
env['LC_ALL'] = 'C'
|
||||||
|
|
||||||
out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
|
out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
|
||||||
return out
|
return out
|
||||||
|
|
||||||
try:
|
try:
|
||||||
out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
|
out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
|
||||||
GIT_REVISION = out.strip().decode('ascii')
|
GIT_REVISION = out.strip().decode('ascii')
|
||||||
except OSError:
|
except OSError:
|
||||||
GIT_REVISION = "Unknown"
|
GIT_REVISION = "Unknown"
|
||||||
|
|
||||||
return GIT_REVISION
|
return GIT_REVISION
|
||||||
|
|
||||||
def _get_git_version():
|
def _get_git_version():
|
||||||
cwd = os.getcwd()
|
cwd = os.getcwd()
|
||||||
|
|
||||||
# go to the main directory
|
# go to the main directory
|
||||||
fdir = os.path.dirname(os.path.abspath(__file__))
|
fdir = os.path.dirname(os.path.abspath(__file__))
|
||||||
maindir = os.path.abspath(os.path.join(fdir, ".."))
|
maindir = os.path.abspath(os.path.join(fdir, ".."))
|
||||||
# maindir = fdir # os.path.join(fdir, "..")
|
# maindir = fdir # os.path.join(fdir, "..")
|
||||||
os.chdir(maindir)
|
os.chdir(maindir)
|
||||||
|
|
||||||
# get git version
|
# get git version
|
||||||
res = git_version()
|
res = git_version()
|
||||||
|
|
||||||
# restore the cwd
|
# restore the cwd
|
||||||
os.chdir(cwd)
|
os.chdir(cwd)
|
||||||
return res
|
return res
|
||||||
|
|
||||||
def get_version(build_version=False):
|
def get_version(build_version=False):
|
||||||
if ISRELEASED:
|
if ISRELEASED:
|
||||||
return VERSION
|
return VERSION
|
||||||
|
|
||||||
# unreleased version
|
# unreleased version
|
||||||
GIT_REVISION = _get_git_version()
|
GIT_REVISION = _get_git_version()
|
||||||
|
|
||||||
if build_version:
|
if build_version:
|
||||||
import datetime as dt
|
import datetime as dt
|
||||||
date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
|
date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
|
||||||
return VERSION + ".dev" + date
|
return VERSION + ".dev" + date
|
||||||
else:
|
else:
|
||||||
return VERSION + ".dev0+" + GIT_REVISION[:7]
|
return VERSION + ".dev0+" + GIT_REVISION[:7]
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -0,0 +1,31 @@
|
|||||||
|
[metadata]
|
||||||
|
name = scraibe
|
||||||
|
version = attr: scraibe.__version__
|
||||||
|
author = Jacob Schmieder
|
||||||
|
author_email = Jacob.Schmieder@dbfz.de
|
||||||
|
description = My package description
|
||||||
|
long_description = file: README.md, LICENSE
|
||||||
|
platforms = Linux
|
||||||
|
keywords = transcription speech recognition whisper pyannote audio speech-to-text speech-to-text transcription speech-to-text recognition voice-to-speech
|
||||||
|
license = GPL-3.0
|
||||||
|
classifiers =
|
||||||
|
Development Status :: 3 - Alpha
|
||||||
|
Environment :: GPU :: NVIDIA CUDA :: 11.2
|
||||||
|
License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)
|
||||||
|
Topic :: Scientific/Engineering :: Artificial Intelligence
|
||||||
|
Programming Language :: Python :: 3.8
|
||||||
|
Programming Language :: Python :: 3.9
|
||||||
|
Programming Language :: Python :: 3.10
|
||||||
|
|
||||||
|
[options]
|
||||||
|
zip_safe = False
|
||||||
|
include_package_data = True
|
||||||
|
packages = find:
|
||||||
|
python_requires = >=3.7
|
||||||
|
install_requires =
|
||||||
|
requests
|
||||||
|
importlib-metadata; python_version<"3.8"
|
||||||
|
|
||||||
|
[options.entry_points]
|
||||||
|
console_scripts =
|
||||||
|
executable-name = scraibe.cli:cli
|
||||||
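Review note on the new `setup.cfg` above: it declares `license = GPL-3.0` while its classifier names the Open Software License 3.0, and the console script is still called `executable-name`, whereas `setup.py` registers `scraibe`. The following is a minimal sketch of a check that would surface such mismatches, using only the standard-library `configparser`. The function name `find_metadata_issues`, the `SAMPLE` text, and the string heuristics are assumptions made for illustration; none of them exist in the repository.

```python
import configparser


# Hypothetical review helper, not part of scraibe.
def find_metadata_issues(cfg_text):
    """Return a list of human-readable inconsistencies found in setup.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    issues = []

    license_name = parser.get("metadata", "license", fallback="")
    classifiers = parser.get("metadata", "classifiers", fallback="").splitlines()
    license_classifiers = [
        c.strip() for c in classifiers if c.strip().startswith("License ::")
    ]
    # Heuristic (assumption): a GPL license field should come with a GPL classifier.
    if "GPL" in license_name.upper() and not any(
        "GNU General Public License" in c for c in license_classifiers
    ):
        issues.append("license field and License classifier disagree")

    scripts = parser.get("options.entry_points", "console_scripts", fallback="")
    # Heuristic (assumption): "executable-name" is a leftover template value.
    if "executable-name" in scripts:
        issues.append("console script still uses the template name")
    return issues


# Reduced copy of the metadata from the hunk above, re-indented so that
# configparser reads the multi-line values as continuations.
SAMPLE = """
[metadata]
name = scraibe
license = GPL-3.0
classifiers =
    Development Status :: 3 - Alpha
    License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)

[options.entry_points]
console_scripts =
    executable-name = scraibe.cli:cli
"""

print(find_metadata_issues(SAMPLE))
```

Run against the reduced sample, the check reports both the license mismatch and the template script name. The same two license values also appear in the `setup.py` hunk, so a fix would need to touch both files.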
@@ -1,8 +1,9 @@
|
|||||||
|
from calendar import c
|
||||||
import pkg_resources
|
import pkg_resources
|
||||||
import os
|
import os
|
||||||
from setuptools import setup, find_packages
|
from setuptools import setup, find_packages
|
||||||
|
|
||||||
module_name = "autotranscript"
|
module_name = "scraibe"
|
||||||
github_url = "https://github.com/JSchmie/autotranscript"
|
github_url = "https://github.com/JSchmie/autotranscript"
|
||||||
|
|
||||||
file_dir = os.path.dirname(os.path.realpath(__file__))
|
file_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:
|
|||||||
|
|
||||||
############### setup ###############
|
############### setup ###############
|
||||||
|
|
||||||
build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
|
build_version = "SCRAIBE_BUILD" in os.environ
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|
||||||
@@ -36,11 +37,24 @@ if __name__ == "__main__":
|
|||||||
'https://download.pytorch.org/whl/cu113',
|
'https://download.pytorch.org/whl/cu113',
|
||||||
],
|
],
|
||||||
url= github_url,
|
url= github_url,
|
||||||
license='',
|
|
||||||
|
license='GPL-3',
|
||||||
author='Jacob Schmieder',
|
author='Jacob Schmieder',
|
||||||
author_email='Jacob.Schmieder@dbfz.de',
|
author_email='Jacob.Schmieder@dbfz.de',
|
||||||
description='Transcription tool for audio files based on Whisper and Pyannote',
|
description='Transcription tool for audio files based on Whisper and Pyannote',
|
||||||
package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
|
classifiers=[
|
||||||
|
'Development Status :: 3 - Alpha',
|
||||||
|
'Environment :: GPU :: NVIDIA CUDA :: 11.2',
|
||||||
|
'License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)',
|
||||||
|
'Topic :: Scientific/Engineering :: Artificial Intelligence',
|
||||||
|
'Programming Language :: Python :: 3.8',
|
||||||
|
'Programming Language :: Python :: 3.9',
|
||||||
|
'Programming Language :: Python :: 3.10'],
|
||||||
|
keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
|
||||||
|
'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
|
||||||
|
'voice-to-speech'],
|
||||||
|
package_data={'scraibe.app' : ["*.html", "*.svg"]},
|
||||||
entry_points={'console_scripts':
|
entry_points={'console_scripts':
|
||||||
['autotranscript = autotranscript.cli:cli']}
|
['scraibe = scraibe.cli:cli']}
|
||||||
|
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
import pytest
|
import pytest
|
||||||
from autotranscript import Transcriber
|
from scraibe import Transcriber
|
||||||
from unittest.mock import patch, mock_open
|
from unittest.mock import patch, mock_open
|
||||||
import os
|
import os
|
||||||
|
|
||||||
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
|
|||||||
|
|
||||||
# Test Diaraization class
|
# Test Diaraization class
|
||||||
|
|
||||||
from autotranscript import Diariser
|
from scraibe import Diariser
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
def diarisation():
|
def diarisation():
|
||||||
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
|
|||||||
|
|
||||||
# Test AudioProcessor
|
# Test AudioProcessor
|
||||||
|
|
||||||
from autotranscript import AudioProcessor , TorchAudioProcessor
|
from scraibe import AudioProcessor , TorchAudioProcessor
|
||||||
|
|
||||||
|
|
||||||
def test_AudioProcessor_init():
|
def test_AudioProcessor_init():
|
||||||
@@ -1,38 +0,0 @@
|
|||||||
# import os
|
|
||||||
# import sys
|
|
||||||
# import traceback
|
|
||||||
|
|
||||||
# class TracePrints(object):
|
|
||||||
# def __init__(self):
|
|
||||||
# self.stdout = sys.stdout
|
|
||||||
# def write(self, s):
|
|
||||||
# self.stdout.write("Writing %r\n" % s)
|
|
||||||
# traceback.print_stack(file=self.stdout)
|
|
||||||
|
|
||||||
# sys.stdout = TracePrints()
|
|
||||||
|
|
||||||
# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
|
|
||||||
# import os
|
|
||||||
|
|
||||||
# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
|
|
||||||
# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
|
|
||||||
|
|
||||||
|
|
||||||
from autotranscript import AutoTranscribe
|
|
||||||
|
|
||||||
model = AutoTranscribe()
|
|
||||||
|
|
||||||
text = model.transcribe("test.mp4")
|
|
||||||
|
|
||||||
print("Transcription:\n")
|
|
||||||
print(text)
|
|
||||||
|
|
||||||
|
|
||||||
# from autotranscript.misc import *
|
|
||||||
# import os
|
|
||||||
|
|
||||||
# print(os.path.exists(CACHE_DIR))
|
|
||||||
# print(os.path.exists(WHISPER_DEFAULT_PATH))
|
|
||||||
# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
|
|
||||||
|
|
||||||
# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))
|
|
||||||
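Review note on the `verbose` handling in the hunks above: `autotranscribe` only overwrites `self.verbose` when a truthy `verbose` is passed, so `verbose=False` cannot silence a single call, and the assignment persists on the instance for later calls. `Transcriber` in turn maps both a missing and a `False` value to `None`. The following is a minimal sketch of one way to tell "not passed" apart from an explicit `False` without mutating instance state. The names `resolve_verbose`, `progress_disabled`, and `_UNSET` are hypothetical and do not exist in the repository.

```python
# Hypothetical helpers, not part of scraibe.
_UNSET = object()  # sentinel meaning "the caller did not pass verbose"


def resolve_verbose(instance_default, **kwargs):
    """Return the per-call verbose flag, falling back to the instance default."""
    value = kwargs.get("verbose", _UNSET)
    if value is _UNSET or value is None:
        # Nothing explicit was requested for this call: keep the default.
        return bool(instance_default)
    return bool(value)


def progress_disabled(instance_default, **kwargs):
    """Value for a progress bar's `disable` argument, as in trange(..., disable=...)."""
    return not resolve_verbose(instance_default, **kwargs)
```

With helpers like these, the `print` calls and the `trange(..., disable=...)` argument in `autotranscribe` could all read from one resolved value per call, instead of from `self.verbose` after it has been overwritten.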