removed docs to avoid conflict

Jaikinator
2023-09-22 18:44:24 +02:00
24 changed files with 287 additions and 453 deletions
Binary file not shown.

-173
@@ -1,173 +0,0 @@
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`
`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework to generate fully automated transcriptions.
Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition.
Designed as a comprehensive AI toolkit, it uses multiple AI models:
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
After diarization, each audio segment is processed by the OpenAI `Whisper` model, which uses a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of the segment, facilitating Whisper's role in both transcription and text translation.
The following graphic illustrates the whole pipeline:
![Pipeline](Pictures/pipeline.png#gh-dark-mode-only)
![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only)
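The diarize-then-transcribe flow described above can be sketched as follows. This is a minimal illustration, not ScrAIbe's actual code: `run_pipeline`, `diarize` and `transcribe_segment` are hypothetical names, the two callables stand in for the Pyannote and Whisper models, and the `(start, end, speaker)` segment format is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds, speaker_label).
Segment = Tuple[float, float, str]


def run_pipeline(
    audio: List[float],
    sample_rate: int,
    diarize: Callable[[List[float], int], List[Segment]],
    transcribe_segment: Callable[[List[float]], str],
) -> List[Dict[str, object]]:
    """Diarize first, then transcribe each speaker segment separately."""
    transcript = []
    for start, end, speaker in diarize(audio, sample_rate):
        # Cut the samples belonging to this segment out of the waveform.
        chunk = audio[int(start * sample_rate):int(end * sample_rate)]
        transcript.append(
            {"speaker": speaker, "start": start, "end": end,
             "text": transcribe_segment(chunk)}
        )
    return transcript


# Stand-in models so the sketch runs without Pyannote or Whisper installed.
def fake_diarize(audio, sample_rate):
    return [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]


def fake_transcribe(chunk):
    return f"<{len(chunk)} samples>"


result = run_pipeline([0.0] * 32000, 16000, fake_diarize, fake_transcribe)
```

Passing the models in as callables keeps the control flow visible without tying the sketch to a specific model API.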
## Install `ScrAIbe` :
The following command will pull and install the latest commit from this repository, along with its Python dependencies.
pip install git+https://github.com/JSchmie/autotranscript.git
- **Python version**: Python 3.8
- **PyTorch version**: PyTorch 1.11.0
- **CUDA version**: Cuda-toolkit 11.3.1
Important: For the `Pyannote` model you need to be granted access on Hugging Face.
Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model.
Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens).
## Usage
We've developed ScrAIbe with several access points to cater to diverse user needs.
### Python usage
It enables full control over the functionalities as well as process customization.
Some usage examples:
- Usage of `AutoTranscribe`, the core of the transcription system, for performing transcription and diarization of audio files.
```python
from scraibe import AutoTranscribe
model = AutoTranscribe()
text = model.transcribe("audio.wav")
print(f"Transcription: \n{text}")
```
- Usage of `Diariser`, responsible for identifying
and segmenting individual speakers from a given audio file.
```python
from scraibe import Diariser
model = Diariser.load_model()
diarisation_output = model.diarization("audio.wav")
```
- Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards.
```python
from scraibe import Transcriber
transcriber = Transcriber.load_model()
transcript = transcriber.transcribe("audio.wav")
transcriber.save_transcript(transcript, "path/to/save.txt")
```
Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.
### Command-line usage
You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command-line:
scraibe audio.wav
Some important options are:
- `--task`: Task to be performed, either transcription, diarization or translation into English. Default is transcription.
- `--hf-token`: Personal `Hugging Face` token.
- `--server-name`: Name of the Web Server. If empty 127.0.0.1 or 0.0.0.0 will be used.
- `--port`: Port on which the Gradio app runs. Default is 7860.
- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
Run the following to view all available options:
scraibe -h
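When the CLI is driven from a script, the options listed above can be assembled programmatically. The helper below is a hypothetical sketch (`build_scraibe_command` is not part of ScrAIbe); it only uses the flags named in this section, and the exact values accepted by `--task` are not shown here, so check `scraibe -h` before relying on them.

```python
import shlex
from typing import List, Optional


def build_scraibe_command(
    audio_file: str,
    task: Optional[str] = None,
    hf_token: Optional[str] = None,
    server_name: Optional[str] = None,
    port: Optional[int] = None,
    whisper_model_name: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list using only the flags listed above."""
    cmd = ["scraibe", audio_file]
    options = {
        "--task": task,
        "--hf-token": hf_token,
        "--server-name": server_name,
        "--port": port,
        "--whisper-model-name": whisper_model_name,
    }
    for flag, value in options.items():
        if value is not None:  # omitted options fall back to the CLI defaults
            cmd += [flag, str(value)]
    return cmd


cmd = build_scraibe_command("audio.wav", port=7860, whisper_model_name="medium")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting issues with file names that contain spaces.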
### Running a Docker container
After you have installed Docker, you can execute the following commands in the terminal.
```
sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]
sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
- `-p`: Flag for connecting the container's internal port to the port on your local machine.
- `--hf_token`: Flag for entering your personal HuggingFace token in the container.
- `--start_server`: Command to start the Gradio App.
Then click the following link to run the app:
http://0.0.0.0:7860
- Enabling GPU usage
```
sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/
## Documentation
For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).
## Contributions
We welcome contributions and feedback. To get involved, create an issue with your feedback or feel free to contact us.
## Roadmap
The following milestones are planned for further releases of ScrAIbe:
- Model quantization
Quantization to improve memory and computational efficiency.
- Model fine-tuning
In order to be able to cover a variety of linguistic phenomena.
For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses.
These phenomena can be addressed by fine-tuning with the corresponding data.
- Implementation of LLMs
One example is the implementation of a summarization or extraction model, which enables ScrAIbe to automatically summarize or retrieve the key information out of a generated transcription, which could be the minutes of a meeting.
- Executable for Windows
## Contact
For queries contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de).
## License
ScrAIbe is licensed under GNU General Public License.
## Acknowledgments
Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.
![KIDA](Pictures/kida_dark.png#gh-dark-mode-only)   ![BMEL](Pictures/BMEL_dark.png#gh-dark-mode-only)      ![DBFZ](Pictures/DBFZ_dark.png#gh-dark-mode-only)       ![MRI](Pictures/MRI.png#gh-dark-mode-only)
![KIDA](Pictures/kida.png#gh-light-mode-only)   ![BMEL](Pictures/BMEL.jpg#gh-light-mode-only)      ![DBFZ](Pictures/DBFZ.png#gh-light-mode-only)       ![MRI](Pictures/MRI.png#gh-light-mode-only)
-101
@@ -1,101 +0,0 @@
from dash import Dash, dcc, html, dash_table, Input, Output, State, callback
import base64
from autotranscript.app.qtfaststart import process
from autotranscript import AutoTranscribe
import io
import subprocess as sp
import numpy as np
from autotranscript.audio import SAMPLE_RATE
# Setup auto-transcript
autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}
# Setup FFmpeg
PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"
# Setup Dash
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = Dash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div([
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
])
def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    file = io.BytesIO(decoded).read()
    if filename.endswith(PROBLEMATIC_FILE_TYPES):
        # mp4 and other files need to be processed with qtfaststart
        # since their metadata is at the end of the file
        # and we need it at the beginning
        file = process(file)
    cmd = [
        "ffmpeg",
        "-nostdin",
        "-threads", "0",
        "-i",'pipe:',
        "-f", "s16le",
        '-hide_banner',
        '-loglevel', 'error',
        "-c", "copy",
        "-vn",
        "-ac", "1",
        "-acodec", "pcm_s16le",
        "-ar", str(SAMPLE_RATE),
        "-"
    ]
    proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)
    out = proc.communicate(input=file)[0]
    out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
    out = np.array([out, SAMPLE_RATE])
    transcript = str(autot.transcribe(out))
    return html.Div([
        html.H5(f"File Name: {filename} \n" \
                "Transcript: \n"
                ),
        html.P(transcript)
    ])
@callback(Output('output-data-upload', 'children'),
          Input('upload-data', 'contents'),
          State('upload-data', 'filename'),
          State('upload-data', 'last_modified'))
def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children
if __name__ == '__main__':
    app.run_server()
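Note on the removed Dash app: two of its steps, the file-type check and the conversion of raw 16-bit PCM into floats, can be illustrated without NumPy or FFmpeg. The sketch below is dependency-free and only approximates the removed code; `needs_faststart` and `pcm16_to_float` are hypothetical names, and the lower-casing of the file name is an addition that the original does not perform.

```python
import struct
from typing import List

PROBLEMATIC_FILE_TYPES = ("mov", "mp4", "m4a", "3gp", "3g2", "mj2")


def needs_faststart(filename: str) -> bool:
    """True for containers whose metadata may sit at the end of the file."""
    # str.endswith accepts a tuple of suffixes, as the app code relies on.
    # Lower-casing is an assumption added here, not part of the original.
    return filename.lower().endswith(PROBLEMATIC_FILE_TYPES)


def pcm16_to_float(raw: bytes) -> List[float]:
    """Convert little-endian signed 16-bit PCM to floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]


raw = struct.pack("<3h", 0, 16384, -32768)
floats = pcm16_to_float(raw)
```

Dividing by 32768 maps the int16 range onto floats between -1.0 and just under 1.0, the same scaling the removed code applied with `np.frombuffer(...) / 32768.0`.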
+1 -3
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
setuptools~=65.6.3 setuptools~=65.6.3
setuptools-rust~=1.5.2 setuptools-rust~=1.5.2
sphinx~=5.0.2
tqdm>=4.65.0 tqdm>=4.65.0
gradio~=3.36.1 gradio~=3.36.1
@@ -22,6 +20,6 @@ torch~=1.11.0
torchvision~=0.12.0 torchvision~=0.12.0
torchaudio~=0.11.0 torchaudio~=0.11.0
#optional: #optional:
#dash~=2.10.2 #sphinx~=5.0.2
+1
@@ -0,0 +1 @@
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

@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
 --------------------------------
 This module provides an interface to transcribe audio files using the
-AutoTranscribe model. Users can either upload an audio file or record their speech
+Scraibe model. Users can either upload an audio file or record their speech
 live for transcription. The application supports multiple languages and provides
 options to specify the number of speakers and the language of the audio.
@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
 --------------------------------
 This module provides an interface to transcribe audio files using the
-AutoTranscribe model. Users can either upload an audio file or record their speech
+Scraibe model. Users can either upload an audio file or record their speech
 live for transcription. The application supports multiple languages and provides
 options to specify the number of speakers and the language of the audio.
@@ -33,10 +33,13 @@ Usage:
 """
 import json
+import os
+from tkinter import CURRENT
 import gradio as gr
-from autotranscript import AutoTranscribe, Transcript
+from tqdm import tqdm
+from scraibe import Scraibe, Transcript
 theme = gr.themes.Soft(
     primary_hue="green",
@@ -59,17 +62,19 @@ LANGUAGES = [
     "Vietnamese", "Welsh"
 ]
+CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
 class GradioTranscriptionInterface:
     """
     Interface handling the interaction between Gradio UI and the Audio Transcription system.
     """
-    def __init__(self, model: AutoTranscribe):
+    def __init__(self, model: Scraibe):
         """
         Initializes the GradioTranscriptionInterface with a transcription model.
         Args:
-            model (AutoTranscribe): Model responsible for audio transcription tasks.
+            model (Scraibe): Model responsible for audio transcription tasks.
         """
         self.model = model
@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
                         translation : bool,
                         language : str):
         """
-        Shortcut method for the AutoTranscribe task.
+        Shortcut method for the Scraibe task.
         Returns:
             tuple: Transcribed text (str), JSON output (dict)
@@ -89,14 +94,44 @@ class GradioTranscriptionInterface:
             "language": language if language != "None" else None,
             "task": 'translate' if translation else None
         }
-        try:
-            result = self.model.autotranscribe(source, **kwargs)
-        except ValueError:
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
-                            Please try again!")
-        return str(result), result.get_json()
+        if isinstance(source, str):
+            try:
+                result = self.model.autotranscribe(source, **kwargs)
+            except ValueError:
+                raise gr.Error("Couldn't detect any speech in the provided audio. \
+                            Please try again!")
+            return str(result), result.get_json()
+        elif isinstance(source, list):
+            source_names = [s.split("/")[-1] for s in source]
+            result = []
+            for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
+                try:
+                    res = self.model.autotranscribe(s, **kwargs)
+                except ValueError:
+                    _name = s.split("/")[-1]
+                    res = f"NO TRANSCRIPT FOUND FOR {_name}"
+                    gr.Warning(f"Couldn't detect any speech in {_name} will skip this file.")
+                result.append(res)
+            out = ''
+            out_dict = {}
+            for i, r in enumerate(result):
+                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
+                out += str(r)
+                out += "\n\n"
+                if isinstance(r, str):
+                    out_dict[source_names[i]] = r
+                else:
+                    out_dict[source_names[i]] = r.get_dict()
+            return out, json.dumps(out_dict, indent=4)
+        else:
+            raise gr.Error("Please provide a valid audio file.")
     def transcribe(self, source, translation, language):
         """
@@ -110,9 +145,29 @@ class GradioTranscriptionInterface:
             "task": 'translate' if translation == "Yes" else None
         }
-        result = self.model.transcribe(source, **kwargs)
-        return str(result)
+        if isinstance(source, str):
+            result = self.model.transcribe(source, **kwargs)
+            return str(result)
+        elif isinstance(source, list):
+            source_names = [s.split("/")[-1] for s in source]
+            result = []
+            for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
+                res = self.model.transcribe(s, **kwargs)
+                result.append(res)
+            out = ''
+            for i, res in enumerate(result):
+                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
+                out += str(res)
+                out += "\n\n"
+            return out
+        else:
+            raise gr.Error("Please provide a valid audio file.")
     def perform_diarisation(self, source, num_speakers):
         """
         Shortcut method for the Diarisation task.
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
             "num_speakers": num_speakers if num_speakers != 0 else None,
         }
-        try:
-            result = self.model.diarization(source, **kwargs)
-        except ValueError:
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
-                            Please try again!")
-        return json.dumps(result, indent=2)
+        if isinstance(source, str):
+            try:
+                result = self.model.diarization(source, **kwargs)
+            except ValueError:
+                raise gr.Error("Couldn't detect any speech in the provided audio. \
+                            Please try again!")
+            return json.dumps(result, indent=2)
+        elif isinstance(source, list):
+            source_names = [s.split("/")[-1] for s in source]
+            result = []
+            for s in tqdm(source, total=len(source),desc = "Performing diarisation"):
+                try:
+                    res = self.model.diarization(s, **kwargs)
+                except ValueError:
+                    res = f"NO DIARISATION FOUND FOR {s}"
+                    gr.Warning(f"Couldn't detect any speech in {s} will skip this file.")
+                result.append(res)
+            out = {}
+            for i, res in enumerate(result):
+                out[source_names[i]] = res
+            return json.dumps(out, indent=4)
+        else:
+            gr.Error("Please provide a valid audio file.")
 ####
 # Gradio Interface
 ####
-def gradio_Interface(model : AutoTranscribe = None):
+def gradio_Interface(model : Scraibe = None):
     if model is None:
-        model = AutoTranscribe()
+        model = Scraibe()
     pipe = GradioTranscriptionInterface(model)
@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                     gr.update(visible = True),
                     gr.update(visible = False, value = None))
-        elif choice == "File":
+        elif choice == "File or Files":
             return (gr.update(visible = False, value = None),
                     gr.update(visible = False, value = None),
@@ -205,11 +282,25 @@ def gradio_Interface(model : AutoTranscribe = None):
                     gr.update(visible = False, value = None),
                     gr.update(visible = True))
-    def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
+    def run_scribe(task,
+                   num_speakers,
+                   translate,
+                   language,
+                   audio1,
+                   audio2,
+                   video1,
+                   video2,
+                   file_in,
+                   progress = gr.Progress(track_tqdm= True)):
         # get *args which are not None
         progress(0, desc='Starting task...')
         source = audio1 or audio2 or video1 or video2 or file_in
+        if isinstance(source, list):
+            source = [s.name for s in source]
+            if len(source) == 1:
+                source = source[0]
         if task == 'Auto Transcribe':
             out_str , out_json = pipe.auto_transcribe(source = source,
@@ -217,10 +308,16 @@ def gradio_Interface(model : AutoTranscribe = None):
                                                       translation = translate,
                                                       language = language)
-            return (gr.update(value = out_str, visible = True),
-                    gr.update(value = out_json, visible = True),
-                    gr.update(visible = True),
-                    gr.update(visible = True))
+            if isinstance(source, str):
+                return (gr.update(value = out_str, visible = True),
+                        gr.update(value = out_json, visible = True),
+                        gr.update(visible = True),
+                        gr.update(visible = True))
+            else:
+                return (gr.update(value = out_str, visible = True),
+                        gr.update(value = out_json, visible = True),
+                        gr.update(visible = False),
+                        gr.update(visible = False))
         elif task == 'Transcribe':
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
     with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
         # Define components
-        header = open("header.html", "r").read()
+        hname = os.path.join(CURRENT_PATH, "header.html")
+        header = open(hname, "r").read()
         gr.HTML(header, visible= True, show_label=False)
         with gr.Row():
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                     leave it at None.", visible= True)
                 input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video"
-                                  ,"File"], label="Input Type", value="Upload Audio")
+                                  ,"File or Files"], label="Input Type", value="Upload Audio")
                 audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
                                   interactive= True, visible= True)
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                   interactive= True, visible= False)
                 video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
                                   interactive= True, visible= False)
-                file_in = gr.File(label="Upload File", interactive= True, visible= False)
+                file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
                 submit = gr.Button()
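Note on the batch handling added to the Gradio interface in this commit: the dispatch between a single path and a list of paths can be sketched in isolation. This is a simplified illustration, not the committed code. `transcribe_sources` and `fake_transcribe` are hypothetical names, the model is replaced by a plain callable, `os.path.basename` is swapped in for the `s.split("/")[-1]` used in the diff, and a `TypeError` stands in for `gr.Error` so the sketch runs without Gradio.

```python
import os
from typing import Callable, List, Union


def transcribe_sources(
    source: Union[str, List[str]],
    transcribe_one: Callable[[str], str],
) -> str:
    """Handle one path or a list of paths, skipping files without speech."""
    if isinstance(source, str):
        return transcribe_one(source)
    if not isinstance(source, list):
        raise TypeError("Please provide a valid audio file.")

    parts = []
    for i, path in enumerate(source):
        name = os.path.basename(path)
        try:
            text = transcribe_one(path)
        except ValueError:
            # Mirrors the diff: keep going and record a placeholder instead.
            text = f"NO TRANSCRIPT FOUND FOR {name}"
        parts.append(f"TRANSCRIPT {i} FOR ({name}):\n\n{text}\n\n")
    return "".join(parts)


def fake_transcribe(path: str) -> str:  # hypothetical stand-in for the model
    if path.endswith("silent.wav"):
        raise ValueError("no speech")
    return "hello"


out = transcribe_sources(["/tmp/a.wav", "/tmp/silent.wav"], fake_transcribe)
```

In the list branch a file without speech yields a placeholder rather than aborting the batch, whereas the single-file branch lets the error reach the caller.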
@@ -1,5 +1,5 @@
 """
-AutoTranscribe Class
+Scraibe Class
 --------------------
 This class serves as the core of the transcription system, responsible for handling
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
 integration into various applications, ranging from transcription services to voice assistants.
 Available Classes:
-- AutoTranscribe: Main class for performing transcription and diarization.
+- Scraibe: Main class for performing transcription and diarization.
                   Includes methods for loading models, processing audio files,
                   and formatting the transcription output.
 Usage:
-    from .autotranscribe import AutoTranscribe
-    model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
-    transcript = model.transcribe("path/to/audiofile.wav")
+    from scraibe import Scraibe
+    model = Scraibe()
+    transcript = model.autotranscribe("path/to/audiofile.wav")
 """
 # Standard Library Imports
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
 DiarisationType = TypeVar('DiarisationType')
-class AutoTranscribe:
+class Scraibe:
     """
-    AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
+    Scraibe is a class responsible for managing the transcription and diarization of audio files.
     It serves as the core of the transcription system, incorporating pretrained models
     for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
     allowing for comprehensive audio processing.
@@ -57,7 +57,7 @@ class AutoTranscribe:
         diariser (Diariser): The diariser object to handle diarization.
     Methods:
-        __init__: Initializes the AutoTranscribe class with appropriate models.
+        __init__: Initializes the Scraibe class with appropriate models.
         transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
         remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
         get_audio_file: Gets an audio file as an AudioProcessor object.
@@ -66,7 +66,7 @@ class AutoTranscribe:
                  whisper_model: Union[bool, str, whisper] = None,
                  dia_model : Union[bool, str, DiarisationType] = None,
                  **kwargs) -> None:
-        """Initializes the AutoTranscribe class.
+        """Initializes the Scraibe class.
         Args:
             whisper_model (Union[bool, str, whisper], optional):
@@ -92,7 +92,11 @@ class AutoTranscribe:
         else:
             self.diariser = dia_model
-        print("AutoTranscribe initialized all models successfully loaded.")
+        if kwargs.get("verbose"):
+            print("Scraibe initialized all models successfully loaded.")
+            self.verbose = True
+        else:
+            self.verbose = False
     def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
                        remove_original : bool = False,
@@ -112,7 +116,8 @@ class AutoTranscribe:
             Transcript: A Transcript object containing the transcription,
                         which can be exported to different formats.
         """
+        if kwargs.get("verbose"):
+            self.verbose = kwargs.get("verbose")
         # Get audio file as an AudioProcessor object
         audio_file = self.get_audio_file(audio_file)
@@ -122,11 +127,11 @@ class AutoTranscribe:
             "sample_rate": audio_file.sr
         }
-        print("Starting diarisation.")
+        if self.verbose:
+            print("Starting diarisation.")
         diarisation = self.diariser.diarization(dia_audio, **kwargs)
         if not diarisation["segments"]:
             print("No segments found. Try to run transcription without diarisation.")
@@ -138,6 +143,7 @@ class AutoTranscribe:
             return Transcript(final_transcript)
-        print("Diarisation finished. Starting transcription.")
+        if self.verbose:
+            print("Diarisation finished. Starting transcription.")
         audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
@@ -145,9 +151,7 @@ class AutoTranscribe:
         # Transcribe each segment and store the results
         final_transcript = dict()
-        for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
+        for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
             seg = diarisation["segments"][i]
@@ -283,4 +287,4 @@ class AutoTranscribe:
         return audio_file
     def __repr__(self):
-        return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
+        return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
+5 -7
@@ -1,5 +1,5 @@
 """
-Command-Line Interface (CLI) for the AutoTranscribe class,
+Command-Line Interface (CLI) for the Scraibe class,
 allowing for user interaction to transcribe and diarize audio files.
 The function includes arguments for specifying the audio files, model paths,
 output formats, and other options necessary for transcription.
@@ -8,9 +8,7 @@ import os
 from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
 import json
-from sympy import use
-from .autotranscript import AutoTranscribe
+from .autotranscript import Scraibe
 from .app.gradio_app import gradio_Interface
 from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
@@ -20,12 +18,12 @@ from torch import set_num_threads
 def cli():
     """
-    Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe
+    Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe
     and diarize audio files. The function includes arguments for specifying the audio files, model paths,
     output formats, and other options necessary for transcription.
     This function can be executed from the command line to perform transcription tasks, providing a
-    user-friendly way to access the AutoTranscribe class functionalities.
+    user-friendly way to access the Scraibe class functionalities.
     """
     def str2bool(string):
def str2bool(string): def str2bool(string):
@@ -115,7 +113,7 @@ def cli():
if arg_dict["whisper_model_directory"]: if arg_dict["whisper_model_directory"]:
class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory") class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
model = AutoTranscribe(**class_kwargs) model = Scraibe(**class_kwargs)
if arg_dict["audio_files"]: if arg_dict["audio_files"]:
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
 PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
 PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
 def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
     """Configure diarization pipeline from a YAML file.
@@ -90,8 +90,8 @@ class Transcriber:
         kwargs = self._get_whisper_kwargs(**kwargs)
-        if "verbose" not in kwargs:
-            kwargs["verbose"] = False
+        if not kwargs.get("verbose"):
+            kwargs["verbose"] = None
         result = self.model.transcribe(audio, *args, **kwargs)
         return result["text"]
@@ -173,6 +173,9 @@ class Transcriber:
         if (task := kwargs.get("task")):
             whisper_kwargs["task"] = task
+        if (language := kwargs.get("language")):
+            whisper_kwargs["language"] = language
         return whisper_kwargs
     def __repr__(self) -> str:
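Note on the `verbose` change in `Transcriber`: the old and new defaults differ for explicitly passed falsy values, which the following sketch makes concrete. The two helper functions are hypothetical and only isolate the conditionals from the hunk above. As far as Whisper's documented behaviour goes, `verbose=None` suppresses console output while `verbose=False` still shows minimal progress details; treat that as background to verify against the installed Whisper version.

```python
from typing import Any, Dict


def old_verbose_default(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Before: only fill in a default when the key is missing."""
    kwargs = dict(kwargs)
    if "verbose" not in kwargs:
        kwargs["verbose"] = False
    return kwargs


def new_verbose_default(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """After: any falsy value (missing, None, False) becomes None."""
    kwargs = dict(kwargs)
    if not kwargs.get("verbose"):
        kwargs["verbose"] = None
    return kwargs


before = old_verbose_default({"verbose": False})
after = new_verbose_default({"verbose": False})
```

With the new logic, a caller who passes `verbose=False` gets the same silent behaviour as one who passes nothing, and only a truthy value is forwarded unchanged.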
+31
@@ -0,0 +1,31 @@
[metadata]
name = scraibe
version = attr: scraibe.__version__
author = Jacob Schmieder
author_email = Jacob.Schmieder@dbfz.de
description = My package description
long_description = file: README.md, LICENSE
platforms = Linux
keywords = transcription speech recognition whisper pyannote audio speech-to-text speech-to-text transcription speech-to-text recognition voice-to-speech
license = GPL-3.0
classifiers =
    Development Status :: 3 - Alpha
    Environment :: GPU :: NVIDIA CUDA :: 11.2
    License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)
    Topic :: Scientific/Engineering :: Artificial Intelligence
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10

[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.7
install_requires =
    requests
    importlib-metadata; python_version<"3.8"

[options.entry_points]
console_scripts =
    executable-name = scraibe.cli:cli
+19 -5
@@ -1,8 +1,9 @@
+from calendar import c
 import pkg_resources
 import os
 from setuptools import setup, find_packages
-module_name = "autotranscript"
+module_name = "scraibe"
 github_url = "https://github.com/JSchmie/autotranscript"
 file_dir = os.path.dirname(os.path.realpath(__file__))
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:
 ############### setup ###############
-build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
+build_version = "SCRAIBE_BUILD" in os.environ
 if __name__ == "__main__":
@@ -36,11 +37,24 @@ if __name__ == "__main__":
             'https://download.pytorch.org/whl/cu113',
         ],
         url= github_url,
-        license='',
+        license='GPL-3',
         author='Jacob Schmieder',
         author_email='Jacob.Schmieder@dbfz.de',
         description='Transcription tool for audio files based on Whisper and Pyannote',
-        package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
+        classifiers=[
+            'Development Status :: 3 - Alpha',
+            'Environment :: GPU :: NVIDIA CUDA :: 11.2',
+            'License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)',
+            'Topic :: Scientific/Engineering :: Artificial Intelligence',
+            'Programming Language :: Python :: 3.8',
+            'Programming Language :: Python :: 3.9',
+            'Programming Language :: Python :: 3.10'],
+        keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
+                    'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
+                    'voice-to-speech'],
+        package_data={'scraibe.app' : ["*.html", "*.svg"]},
         entry_points={'console_scripts':
-                      ['autotranscript = autotranscript.cli:cli']}
+                      ['scraibe = scraibe.cli:cli']}
     )
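Note on packaged data files: shipping `*.html` and `*.svg` inside `scraibe.app` pairs naturally with resolving those files relative to the module rather than the current working directory, which is what the `CURRENT_PATH` change in the Gradio app does. The sketch below demonstrates the idea against a temporary directory; `read_packaged_file` is a hypothetical helper and the directory layout is simulated, not taken from the repository.

```python
import os
import tempfile


def read_packaged_file(module_file: str, name: str) -> str:
    """Read a data file that sits next to the given module file."""
    base = os.path.dirname(os.path.realpath(module_file))
    with open(os.path.join(base, name), "r", encoding="utf-8") as fh:
        return fh.read()


# Simulate a package directory containing a module and its header.html.
with tempfile.TemporaryDirectory() as pkg_dir:
    module_path = os.path.join(pkg_dir, "gradio_app.py")
    with open(os.path.join(pkg_dir, "header.html"), "w", encoding="utf-8") as fh:
        fh.write("<h1>ScrAIbe</h1>")
    open(module_path, "w").close()
    header = read_packaged_file(module_path, "header.html")
```

Inside a real module the first argument would be `__file__`, so the lookup keeps working regardless of the directory the CLI is started from.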
@@ -1,5 +1,5 @@
 import pytest
-from autotranscript import Transcriber
+from scraibe import Transcriber
 from unittest.mock import patch, mock_open
 import os
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
 # Test Diaraization class
-from autotranscript import Diariser
+from scraibe import Diariser
 @pytest.fixture
 def diarisation():
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
 # Test AudioProcessor
-from autotranscript import AudioProcessor , TorchAudioProcessor
+from scraibe import AudioProcessor , TorchAudioProcessor
 def test_AudioProcessor_init():
-38
@@ -1,38 +0,0 @@
# import os
# import sys
# import traceback
# class TracePrints(object):
# def __init__(self):
# self.stdout = sys.stdout
# def write(self, s):
# self.stdout.write("Writing %r\n" % s)
# traceback.print_stack(file=self.stdout)
# sys.stdout = TracePrints()
# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
# import os
# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
from autotranscript import AutoTranscribe
model = AutoTranscribe()
text = model.transcribe("test.mp4")
print("Transcription:\n")
print(text)
# from autotranscript.misc import *
# import os
# print(os.path.exists(CACHE_DIR))
# print(os.path.exists(WHISPER_DEFAULT_PATH))
# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))