removed docs to avoid conflict
Binary file not shown.
|
After Width: | Height: | Size: 131 KiB |
@@ -1,173 +0,0 @@
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`

`ScrAIbe` is a [PyTorch](https://pytorch.org/)-based multilingual speech-to-text framework for generating fully automated transcriptions.

Beyond transcription, ScrAIbe supports speaker diarization and speaker recognition.

Designed as a comprehensive AI toolkit, it uses multiple AI models:

- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.

The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.

After diarization, each audio segment is processed by the OpenAI `Whisper` model, which has a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of the segment, which lets Whisper handle both transcription and translation.
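
The flow described above (diarize first, then transcribe each speaker segment) can be sketched as follows. This is a toy illustration with stand-in callables, not ScrAIbe's implementation: `Segment`, `run_pipeline`, `fake_diarize`, and `fake_transcribe` are hypothetical names chosen for the example, and no real model is loaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """One diarized span of audio (hypothetical structure for this sketch)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def run_pipeline(
    audio: List[float],
    sample_rate: int,
    diarize: Callable[[List[float], int], List[Segment]],
    transcribe_segment: Callable[[List[float]], str],
) -> List[Dict[str, object]]:
    """Diarize the audio, then transcribe each speaker segment in order."""
    results = []
    for seg in diarize(audio, sample_rate):
        # Cut out the samples that belong to this segment.
        lo = int(seg.start * sample_rate)
        hi = int(seg.end * sample_rate)
        text = transcribe_segment(audio[lo:hi])
        results.append({"speaker": seg.speaker, "start": seg.start,
                        "end": seg.end, "text": text})
    return results


# Stand-ins so the sketch runs without any model downloads.
def fake_diarize(audio, sample_rate):
    return [Segment("SPEAKER_00", 0.0, 1.0), Segment("SPEAKER_01", 1.0, 2.0)]


def fake_transcribe(samples):
    return f"<{len(samples)} samples>"


demo = run_pipeline([0.0] * 32000, 16000, fake_diarize, fake_transcribe)
for row in demo:
    print(row["speaker"], row["text"])
```

Passing the two stages in as callables keeps the control flow visible and makes it easy to swap in real diarization and transcription models later.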

The following graphic illustrates the whole pipeline:





## Install `ScrAIbe`

The following command will pull and install the latest commit from this repository, along with its Python dependencies.

    pip install git+https://github.com/JSchmie/autotranscript.git

- **Python version**: Python 3.8
- **PyTorch version**: PyTorch 1.11.0
- **CUDA version**: CUDA Toolkit 11.3.1

Important: To use the `Pyannote` model, you need to be granted access on Hugging Face.
Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model.

Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens).
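
Because the token is a credential, it is usually safer to read it at runtime than to store it in the repository. A minimal sketch, assuming the token is supplied through an environment variable; the variable name `HF_TOKEN` and the helper `read_hf_token` are hypothetical and not part of ScrAIbe:

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None,
                  var_name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment or raise."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to your Hugging Face access token first."
        )
    return token


# Example with an explicit mapping instead of the real environment.
print(read_hf_token({"HF_TOKEN": " example-token "}))
```

The returned value can then be passed wherever the token is needed, for example to the `--hf-token` option described under Command-line usage.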

## Usage

We've developed ScrAIbe with several access points to cater to diverse user needs.

### Python usage

The Python interface gives full control over the functionality and allows the process to be customized.

Some usage examples:

- `AutoTranscribe`, the core of the transcription system, performs transcription and diarization of audio files.

```python
from scraibe import AutoTranscribe

model = AutoTranscribe()

text = model.transcribe("audio.wav")

print(f"Transcription: \n{text}")
```

- `Diariser` identifies and segments individual speakers in a given audio file.

```python
from scraibe import Diariser

model = Diariser.load_model()

diarisation_output = model.diarization("audio.wav")
```

- `Transcriber` transcribes audio files and saves the transcription afterwards.

```python
from scraibe import Transcriber

transcriber = Transcriber.load_model()

transcript = transcriber.transcribe("audio.wav")

transcriber.save_transcript(transcript, "path/to/save.txt")
```

Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.

### Command-line usage

You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:

    scraibe audio.wav

Some important options are:

- `--task`: Task to be performed: transcription, diarization, or translation into English. Default is transcription.
- `--hf-token`: Personal `Hugging Face` token.
- `--server-name`: Name of the web server. If empty, 127.0.0.1 or 0.0.0.0 will be used.
- `--port`: Port on which to run the Gradio app. Default is 7860.
- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
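
To make the defaults listed above concrete, here is a toy `argparse` parser that mirrors those options. It is a sketch under assumptions, not the real `scraibe` command: the program name `scraibe-sketch`, the exact spellings accepted by `--task`, and the help strings are invented for the example.

```python
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter


def build_parser() -> ArgumentParser:
    """Toy parser mirroring the options listed above (not the real CLI)."""
    parser = ArgumentParser(prog="scraibe-sketch",
                            formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument("audio_files", nargs="*", help="audio files to process")
    # The accepted task spellings are an assumption made for this example.
    parser.add_argument("--task", default="transcription",
                        choices=["transcription", "diarization", "translation"],
                        help="task to perform")
    parser.add_argument("--hf-token", default=None, help="Hugging Face token")
    parser.add_argument("--server-name", default=None,
                        help="host name for the web server")
    parser.add_argument("--port", type=int, default=7860, help="Gradio port")
    parser.add_argument("--whisper-model-name", default="medium",
                        help="Whisper model to load")
    return parser


args = build_parser().parse_args(["audio.wav", "--task", "diarization"])
print(args.task, args.port, args.whisper_model_name)
```

Note that `argparse` converts dashes in option names to underscores, so `--whisper-model-name` is read back as `args.whisper_model_name`.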

Run the following to view all available options:

    scraibe -h

### Running a Docker container

After you have installed Docker, you can execute the following commands in the terminal.

```
sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]

sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```

- `-p`: Flag for connecting the container's internal port to the port on your local machine.
- `--hf_token`: Flag for entering your personal HuggingFace token in the container.
- `--start_server`: Command to start the Gradio App.

Then open the following link to run the app:

    http://0.0.0.0:7860

- Enabling GPU usage

```
sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```

For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/

## Documentation

For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).

## Contributions

We welcome contributions and feedback. To get involved, create an issue with your feedback or feel free to contact us.

## Roadmap

The following milestones are planned for further releases of ScrAIbe:

- Model quantization
Quantization to improve memory and computational efficiency.

- Model fine-tuning
Fine-tuning to cover a wider variety of linguistic phenomena.

For example, ScrAIbe currently transcribes word by word but ignores filler words and speech pauses.
These phenomena can be addressed by fine-tuning with the corresponding data.

- Implementation of LLMs
One example is a summarization or extraction model, which would enable ScrAIbe to automatically summarize a generated transcription or retrieve its key information, for instance to produce the minutes of a meeting.

- Executable for Windows

## Contact

For queries contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de).

## License

ScrAIbe is licensed under the GNU General Public License.

## Acknowledgments

Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.

   

   
@@ -1,101 +0,0 @@
|
||||
from dash import Dash, dcc, html, dash_table, Input, Output, State, callback
|
||||
|
||||
import base64
|
||||
from autotranscript.app.qtfaststart import process
|
||||
from autotranscript import AutoTranscribe
|
||||
import io
|
||||
import subprocess as sp
|
||||
import numpy as np
|
||||
from autotranscript.audio import SAMPLE_RATE
|
||||
|
||||
# Setup auto-transcript
|
||||
autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}
|
||||
|
||||
# Setup FFmpeg
|
||||
PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"
|
||||
|
||||
|
||||
# Setup Dash
|
||||
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
|
||||
|
||||
app = Dash(__name__, external_stylesheets=external_stylesheets)
|
||||
|
||||
app.layout = html.Div([
|
||||
dcc.Upload(
|
||||
id='upload-data',
|
||||
children=html.Div([
|
||||
'Drag and Drop or ',
|
||||
html.A('Select Files')
|
||||
]),
|
||||
style={
|
||||
'width': '100%',
|
||||
'height': '60px',
|
||||
'lineHeight': '60px',
|
||||
'borderWidth': '1px',
|
||||
'borderStyle': 'dashed',
|
||||
'borderRadius': '5px',
|
||||
'textAlign': 'center',
|
||||
'margin': '10px'
|
||||
},
|
||||
# Allow multiple files to be uploaded
|
||||
multiple=True
|
||||
),
|
||||
html.Div(id='output-data-upload'),
|
||||
])
|
||||
|
||||
def parse_contents(contents, filename, date):
|
||||
content_type, content_string = contents.split(',')
|
||||
|
||||
decoded = base64.b64decode(content_string)
|
||||
file = io.BytesIO(decoded).read()
|
||||
|
||||
if filename.endswith(PROBLEMATIC_FILE_TYPES):
|
||||
# mp4 and other files need to be processed with qtfaststart
|
||||
# since their metadata is at the end of the file
|
||||
# and we need it at the beginning
|
||||
file = process(file)
|
||||
|
||||
cmd = [
|
||||
"ffmpeg",
|
||||
"-nostdin",
|
||||
"-threads", "0",
|
||||
"-i",'pipe:',
|
||||
"-f", "s16le",
|
||||
'-hide_banner',
|
||||
'-loglevel', 'error',
|
||||
"-c", "copy",
|
||||
"-vn",
|
||||
"-ac", "1",
|
||||
"-acodec", "pcm_s16le",
|
||||
"-ar", str(SAMPLE_RATE),
|
||||
"-"
|
||||
]
|
||||
|
||||
proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)
|
||||
|
||||
out = proc.communicate(input=file)[0]
|
||||
out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
|
||||
out = np.array([out, SAMPLE_RATE])
|
||||
|
||||
transcript = str(autot.transcribe(out))
|
||||
|
||||
return html.Div([
|
||||
html.H5(f"File Name: {filename} \n" \
|
||||
"Transcript: \n"
|
||||
),
|
||||
html.P(transcript)
|
||||
])
|
||||
|
||||
@callback(Output('output-data-upload', 'children'),
|
||||
Input('upload-data', 'contents'),
|
||||
State('upload-data', 'filename'),
|
||||
State('upload-data', 'last_modified'))
|
||||
def update_output(list_of_contents, list_of_names, list_of_dates):
|
||||
if list_of_contents is not None:
|
||||
children = [
|
||||
parse_contents(c, n, d) for c, n, d in
|
||||
zip(list_of_contents, list_of_names, list_of_dates)]
|
||||
return children
|
||||
|
||||
if __name__ == '__main__':
|
||||
app.run_server()
|
||||
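The decoding step in `parse_contents` above turns ffmpeg's raw `s16le` output into floats by dividing each 16-bit sample by 32768. A numpy-free sketch of the same arithmetic, assuming little-endian signed 16-bit mono PCM; `pcm16_to_float` is a hypothetical helper name used only here:

```python
import struct
from typing import List


def pcm16_to_float(raw: bytes) -> List[float]:
    """Decode little-endian signed 16-bit PCM into floats in [-1.0, 1.0)."""
    if len(raw) % 2:
        raise ValueError("s16le data must contain an even number of bytes")
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    # int16 spans -32768..32767, so dividing by 32768 maps it into [-1, 1).
    return [s / 32768.0 for s in samples]


raw = struct.pack("<4h", 0, 16384, -32768, 32767)
print(pcm16_to_float(raw))
```

Dividing by 32768 rather than 32767 keeps the most negative sample at exactly -1.0, at the cost of the largest positive sample landing slightly below 1.0.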
+1
-3
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
|
||||
setuptools~=65.6.3
|
||||
setuptools-rust~=1.5.2
|
||||
|
||||
sphinx~=5.0.2
|
||||
|
||||
tqdm>=4.65.0
|
||||
|
||||
gradio~=3.36.1
|
||||
@@ -22,6 +20,6 @@ torch~=1.11.0
|
||||
torchvision~=0.12.0
|
||||
torchaudio~=0.11.0
|
||||
#optional:
|
||||
#dash~=2.10.2
|
||||
#sphinx~=5.0.2
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1 @@
|
||||
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
|
||||
|
Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 38 KiB |
@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
|
||||
--------------------------------
|
||||
|
||||
This module provides an interface to transcribe audio files using the
|
||||
AutoTranscribe model. Users can either upload an audio file or record their speech
|
||||
Scraibe model. Users can either upload an audio file or record their speech
|
||||
live for transcription. The application supports multiple languages and provides
|
||||
options to specify the number of speakers and the language of the audio.
|
||||
|
||||
@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
|
||||
--------------------------------
|
||||
|
||||
This module provides an interface to transcribe audio files using the
|
||||
AutoTranscribe model. Users can either upload an audio file or record their speech
|
||||
Scraibe model. Users can either upload an audio file or record their speech
|
||||
live for transcription. The application supports multiple languages and provides
|
||||
options to specify the number of speakers and the language of the audio.
|
||||
|
||||
@@ -33,10 +33,13 @@ Usage:
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
from tkinter import CURRENT
|
||||
|
||||
import gradio as gr
|
||||
from autotranscript import AutoTranscribe, Transcript
|
||||
from tqdm import tqdm
|
||||
|
||||
from scraibe import Scraibe, Transcript
|
||||
|
||||
theme = gr.themes.Soft(
|
||||
primary_hue="green",
|
||||
@@ -59,17 +62,19 @@ LANGUAGES = [
|
||||
"Vietnamese", "Welsh"
|
||||
]
|
||||
|
||||
CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
|
||||
|
||||
class GradioTranscriptionInterface:
|
||||
"""
|
||||
Interface handling the interaction between Gradio UI and the Audio Transcription system.
|
||||
"""
|
||||
|
||||
def __init__(self, model: AutoTranscribe):
|
||||
def __init__(self, model: Scraibe):
|
||||
"""
|
||||
Initializes the GradioTranscriptionInterface with a transcription model.
|
||||
|
||||
Args:
|
||||
model (AutoTranscribe): Model responsible for audio transcription tasks.
|
||||
model (Scraibe): Model responsible for audio transcription tasks.
|
||||
"""
|
||||
self.model = model
|
||||
|
||||
@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
|
||||
translation : bool,
|
||||
language : str):
|
||||
"""
|
||||
Shortcut method for the AutoTranscribe task.
|
||||
Shortcut method for the Scraibe task.
|
||||
|
||||
Returns:
|
||||
tuple: Transcribed text (str), JSON output (dict)
|
||||
@@ -89,14 +94,44 @@ class GradioTranscriptionInterface:
|
||||
"language": language if language != "None" else None,
|
||||
"task": 'translate' if translation else None
|
||||
}
|
||||
|
||||
if isinstance(source, str):
|
||||
try:
|
||||
result = self.model.autotranscribe(source, **kwargs)
|
||||
except ValueError:
|
||||
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
||||
Please try again!")
|
||||
|
||||
return str(result), result.get_json()
|
||||
|
||||
elif isinstance(source, list):
|
||||
source_names = [s.split("/")[-1] for s in source]
|
||||
result = []
|
||||
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
|
||||
try:
|
||||
res = self.model.autotranscribe(s, **kwargs)
|
||||
except ValueError:
|
||||
_name = s.split("/")[-1]
|
||||
res = f"NO TRANSCRIPT FOUND FOR {_name}"
|
||||
gr.Warning(f"Couldn't detect any speech in {_name}; skipping this file.")
|
||||
result.append(res)
|
||||
|
||||
out = ''
|
||||
out_dict = {}
|
||||
for i, r in enumerate(result):
|
||||
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
|
||||
out += str(r)
|
||||
out += "\n\n"
|
||||
|
||||
if isinstance(r, str):
|
||||
out_dict[source_names[i]] = r
|
||||
else:
|
||||
out_dict[source_names[i]] = r.get_dict()
|
||||
|
||||
return out, json.dumps(out_dict, indent=4)
|
||||
|
||||
else:
|
||||
raise gr.Error("Please provide a valid audio file.")
|
||||
|
||||
|
||||
def transcribe(self, source, translation, language):
|
||||
"""
|
||||
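The list branch added in the hunk above follows a common batch pattern: process every file, replace a failed one with a placeholder instead of aborting, and return both a readable text and a JSON mapping. A self-contained sketch of that pattern, with a stand-in for the model; `transcribe_many` and `fake_transcribe` are hypothetical names, and `os.path.basename` is swapped in for the `split("/")[-1]` used in the diff:

```python
import json
import os
from typing import Callable, Dict, List, Tuple


def transcribe_many(paths: List[str],
                    transcribe: Callable[[str], str]) -> Tuple[str, str]:
    """Transcribe each path, keeping going when one file has no speech."""
    text_parts = []
    by_name: Dict[str, str] = {}
    for index, path in enumerate(paths):
        name = os.path.basename(path)
        try:
            result = transcribe(path)
        except ValueError:
            # Mirror the fallback above: record a placeholder and continue.
            result = f"NO TRANSCRIPT FOUND FOR {name}"
        text_parts.append(f"TRANSCRIPT {index} FOR ({name}):\n\n{result}\n\n")
        by_name[name] = result
    return "".join(text_parts), json.dumps(by_name, indent=4)


def fake_transcribe(path: str) -> str:
    """Stand-in model: fails on 'silent' files, echoes the name otherwise."""
    if "silent" in path:
        raise ValueError("no speech detected")
    return f"hello from {os.path.basename(path)}"


text, as_json = transcribe_many(["a/talk.wav", "a/silent.wav"], fake_transcribe)
print(text)
```

One limitation worth keeping in mind: because results are keyed by file name, two uploads with the same base name would overwrite each other in the JSON output.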
@@ -110,9 +145,29 @@ class GradioTranscriptionInterface:
|
||||
"task": 'translate' if translation == "Yes" else None
|
||||
}
|
||||
|
||||
if isinstance(source, str):
|
||||
result = self.model.transcribe(source, **kwargs)
|
||||
|
||||
return str(result)
|
||||
|
||||
elif isinstance(source, list):
|
||||
source_names = [s.split("/")[-1] for s in source]
|
||||
result = []
|
||||
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
|
||||
res = self.model.transcribe(s, **kwargs)
|
||||
result.append(res)
|
||||
|
||||
out = ''
|
||||
for i, res in enumerate(result):
|
||||
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
|
||||
out += str(res)
|
||||
out += "\n\n"
|
||||
|
||||
return out
|
||||
|
||||
else:
|
||||
raise gr.Error("Please provide a valid audio file.")
|
||||
|
||||
def perform_diarisation(self, source, num_speakers):
|
||||
"""
|
||||
Shortcut method for the Diarisation task.
|
||||
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
|
||||
"num_speakers": num_speakers if num_speakers != 0 else None,
|
||||
}
|
||||
|
||||
|
||||
if isinstance(source, str):
|
||||
try:
|
||||
result = self.model.diarization(source, **kwargs)
|
||||
except ValueError:
|
||||
raise gr.Error("Couldn't detect any speech in the provided audio. \
|
||||
Please try again!")
|
||||
|
||||
return json.dumps(result, indent=2)
|
||||
elif isinstance(source, list):
|
||||
source_names = [s.split("/")[-1] for s in source]
|
||||
result = []
|
||||
for s in tqdm(source, total=len(source),desc = "Performing diarisation"):
|
||||
try:
|
||||
res = self.model.diarization(s, **kwargs)
|
||||
except ValueError:
|
||||
res = f"NO DIARISATION FOUND FOR {s}"
|
||||
gr.Warning(f"Couldn't detect any speech in {s}; skipping this file.")
|
||||
result.append(res)
|
||||
|
||||
out = {}
|
||||
|
||||
for i, res in enumerate(result):
|
||||
out[source_names[i]] = res
|
||||
|
||||
return json.dumps(out, indent=4)
|
||||
|
||||
else:
|
||||
raise gr.Error("Please provide a valid audio file.")
|
||||
|
||||
|
||||
####
|
||||
# Gradio Interface
|
||||
####
|
||||
|
||||
def gradio_Interface(model : AutoTranscribe = None):
|
||||
def gradio_Interface(model : Scraibe = None):
|
||||
|
||||
if model is None:
|
||||
model = AutoTranscribe()
|
||||
model = Scraibe()
|
||||
|
||||
pipe = GradioTranscriptionInterface(model)
|
||||
|
||||
@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
gr.update(visible = True),
|
||||
gr.update(visible = False, value = None))
|
||||
|
||||
elif choice == "File":
|
||||
elif choice == "File or Files":
|
||||
|
||||
return (gr.update(visible = False, value = None),
|
||||
gr.update(visible = False, value = None),
|
||||
@@ -205,11 +282,25 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
gr.update(visible = False, value = None),
|
||||
gr.update(visible = True))
|
||||
|
||||
def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
|
||||
def run_scribe(task,
|
||||
num_speakers,
|
||||
translate,
|
||||
language,
|
||||
audio1,
|
||||
audio2,
|
||||
video1,
|
||||
video2,
|
||||
file_in,
|
||||
progress = gr.Progress(track_tqdm= True)):
|
||||
# get *args which are not None
|
||||
progress(0, desc='Starting task...')
|
||||
source = audio1 or audio2 or video1 or video2 or file_in
|
||||
|
||||
if isinstance(source, list):
|
||||
source = [s.name for s in source]
|
||||
if len(source) == 1:
|
||||
source = source[0]
|
||||
|
||||
if task == 'Auto Transcribe':
|
||||
|
||||
out_str , out_json = pipe.auto_transcribe(source = source,
|
||||
@@ -217,10 +308,16 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
translation = translate,
|
||||
language = language)
|
||||
|
||||
if isinstance(source, str):
|
||||
return (gr.update(value = out_str, visible = True),
|
||||
gr.update(value = out_json, visible = True),
|
||||
gr.update(visible = True),
|
||||
gr.update(visible = True))
|
||||
else:
|
||||
return (gr.update(value = out_str, visible = True),
|
||||
gr.update(value = out_json, visible = True),
|
||||
gr.update(visible = False),
|
||||
gr.update(visible = False))
|
||||
|
||||
elif task == 'Transcribe':
|
||||
|
||||
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
|
||||
|
||||
# Define components
|
||||
header = open("header.html", "r").read()
|
||||
hname = os.path.join(CURRENT_PATH, "header.html")
|
||||
header = open(hname, "r").read()
|
||||
gr.HTML(header, visible= True, show_label=False)
|
||||
|
||||
with gr.Row():
|
||||
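The hunk above replaces a bare `open("header.html")` with a path built from `CURRENT_PATH`, so the header is found regardless of the directory the app is started from. A small sketch of the same idea, runnable without the package; `read_sibling` is a hypothetical helper, and the temporary directory stands in for the real package folder:

```python
import os
import tempfile


def read_sibling(module_file: str, name: str) -> str:
    """Read a file that sits next to the given module file."""
    base = os.path.dirname(os.path.realpath(module_file))
    with open(os.path.join(base, name), "r", encoding="utf-8") as fh:
        return fh.read()


# Demonstrate with a temporary "package" directory.
with tempfile.TemporaryDirectory() as pkg:
    module_path = os.path.join(pkg, "gradio_app.py")
    with open(module_path, "w", encoding="utf-8") as fh:
        fh.write("# placeholder module\n")
    with open(os.path.join(pkg, "header.html"), "w", encoding="utf-8") as fh:
        fh.write("<h1>ScrAIbe</h1>")

    # Change directory to show the lookup does not depend on the cwd.
    previous = os.getcwd()
    os.chdir(os.path.dirname(pkg) or ".")
    try:
        header = read_sibling(module_path, "header.html")
    finally:
        os.chdir(previous)

print(header)
```

Inside a real module, `module_file` would simply be `__file__`, which is what `CURRENT_PATH` is derived from in the diff.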
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
leave it at None.", visible= True)
|
||||
|
||||
input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video"
|
||||
,"File"], label="Input Type", value="Upload Audio")
|
||||
,"File or Files"], label="Input Type", value="Upload Audio")
|
||||
|
||||
audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
|
||||
interactive= True, visible= True)
|
||||
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
|
||||
interactive= True, visible= False)
|
||||
video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
|
||||
interactive= True, visible= False)
|
||||
file_in = gr.File(label="Upload File", interactive= True, visible= False)
|
||||
file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
|
||||
|
||||
submit = gr.Button()
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
"""
|
||||
AutoTranscribe Class
|
||||
Scraibe Class
|
||||
--------------------
|
||||
|
||||
This class serves as the core of the transcription system, responsible for handling
|
||||
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
|
||||
integration into various applications, ranging from transcription services to voice assistants.
|
||||
|
||||
Available Classes:
|
||||
- AutoTranscribe: Main class for performing transcription and diarization.
|
||||
- Scraibe: Main class for performing transcription and diarization.
|
||||
Includes methods for loading models, processing audio files,
|
||||
and formatting the transcription output.
|
||||
|
||||
Usage:
|
||||
from .autotranscribe import AutoTranscribe
|
||||
from scraibe import Scraibe
|
||||
|
||||
model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
|
||||
transcript = model.transcribe("path/to/audiofile.wav")
|
||||
model = Scraibe()
|
||||
transcript = model.autotranscribe("path/to/audiofile.wav")
|
||||
"""
|
||||
|
||||
# Standard Library Imports
|
||||
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
|
||||
DiarisationType = TypeVar('DiarisationType')
|
||||
|
||||
|
||||
class AutoTranscribe:
|
||||
class Scraibe:
|
||||
"""
|
||||
AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
|
||||
Scraibe is a class responsible for managing the transcription and diarization of audio files.
|
||||
It serves as the core of the transcription system, incorporating pretrained models
|
||||
for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
|
||||
allowing for comprehensive audio processing.
|
||||
@@ -57,7 +57,7 @@ class AutoTranscribe:
|
||||
diariser (Diariser): The diariser object to handle diarization.
|
||||
|
||||
Methods:
|
||||
__init__: Initializes the AutoTranscribe class with appropriate models.
|
||||
__init__: Initializes the Scraibe class with appropriate models.
|
||||
transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
|
||||
remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
|
||||
get_audio_file: Gets an audio file as an AudioProcessor object.
|
||||
@@ -66,7 +66,7 @@ class AutoTranscribe:
|
||||
whisper_model: Union[bool, str, whisper] = None,
|
||||
dia_model : Union[bool, str, DiarisationType] = None,
|
||||
**kwargs) -> None:
|
||||
"""Initializes the AutoTranscribe class.
|
||||
"""Initializes the Scraibe class.
|
||||
|
||||
Args:
|
||||
whisper_model (Union[bool, str, whisper], optional):
|
||||
@@ -92,7 +92,11 @@ class AutoTranscribe:
|
||||
else:
|
||||
self.diariser = dia_model
|
||||
|
||||
print("AutoTranscribe initialized all models successfully loaded.")
|
||||
if kwargs.get("verbose"):
|
||||
print("Scraibe initialized all models successfully loaded.")
|
||||
self.verbose = True
|
||||
else:
|
||||
self.verbose = False
|
||||
|
||||
def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
|
||||
remove_original : bool = False,
|
||||
@@ -112,7 +116,8 @@ class AutoTranscribe:
|
||||
Transcript: A Transcript object containing the transcription,
|
||||
which can be exported to different formats.
|
||||
"""
|
||||
|
||||
if kwargs.get("verbose"):
|
||||
self.verbose = kwargs.get("verbose")
|
||||
# Get audio file as an AudioProcessor object
|
||||
audio_file = self.get_audio_file(audio_file)
|
||||
|
||||
@@ -122,11 +127,11 @@ class AutoTranscribe:
|
||||
"sample_rate": audio_file.sr
|
||||
}
|
||||
|
||||
if self.verbose:
|
||||
print("Starting diarisation.")
|
||||
|
||||
diarisation = self.diariser.diarization(dia_audio, **kwargs)
|
||||
|
||||
|
||||
if not diarisation["segments"]:
|
||||
print("No segments found. Try to run transcription without diarisation.")
|
||||
|
||||
@@ -138,6 +143,7 @@ class AutoTranscribe:
|
||||
|
||||
return Transcript(final_transcript)
|
||||
|
||||
if self.verbose:
|
||||
print("Diarisation finished. Starting transcription.")
|
||||
|
||||
audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
|
||||
@@ -145,9 +151,7 @@ class AutoTranscribe:
|
||||
# Transcribe each segment and store the results
|
||||
final_transcript = dict()
|
||||
|
||||
|
||||
|
||||
for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
|
||||
for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
|
||||
|
||||
seg = diarisation["segments"][i]
|
||||
|
||||
@@ -283,4 +287,4 @@ class AutoTranscribe:
|
||||
return audio_file
|
||||
|
||||
def __repr__(self):
|
||||
return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
|
||||
return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
|
||||
@@ -1,5 +1,5 @@
|
||||
"""
|
||||
Command-Line Interface (CLI) for the AutoTranscribe class,
|
||||
Command-Line Interface (CLI) for the Scraibe class,
|
||||
allowing for user interaction to transcribe and diarize audio files.
|
||||
The function includes arguments for specifying the audio files, model paths,
|
||||
output formats, and other options necessary for transcription.
|
||||
@@ -8,9 +8,7 @@ import os
|
||||
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
|
||||
import json
|
||||
|
||||
from sympy import use
|
||||
|
||||
from .autotranscript import AutoTranscribe
|
||||
from .autotranscript import Scraibe
|
||||
from .app.gradio_app import gradio_Interface
|
||||
|
||||
from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
|
||||
@@ -20,12 +18,12 @@ from torch import set_num_threads
|
||||
|
||||
def cli():
|
||||
"""
|
||||
Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe
|
||||
Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe
|
||||
and diarize audio files. The function includes arguments for specifying the audio files, model paths,
|
||||
output formats, and other options necessary for transcription.
|
||||
|
||||
This function can be executed from the command line to perform transcription tasks, providing a
|
||||
user-friendly way to access the AutoTranscribe class functionalities.
|
||||
user-friendly way to access the Scraibe class functionalities.
|
||||
"""
|
||||
|
||||
def str2bool(string):
|
||||
@@ -115,7 +113,7 @@ def cli():
|
||||
if arg_dict["whisper_model_directory"]:
|
||||
class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
|
||||
|
||||
model = AutoTranscribe(**class_kwargs)
|
||||
model = Scraibe(**class_kwargs)
|
||||
|
||||
|
||||
if arg_dict["audio_files"]:
|
||||
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
|
||||
PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
|
||||
PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
|
||||
|
||||
|
||||
def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
|
||||
"""Configure diarization pipeline from a YAML file.
|
||||
|
||||
@@ -90,8 +90,8 @@ class Transcriber:
|
||||
|
||||
kwargs = self._get_whisper_kwargs(**kwargs)
|
||||
|
||||
if "verbose" not in kwargs:
|
||||
kwargs["verbose"] = False
|
||||
if not kwargs.get("verbose"):
|
||||
kwargs["verbose"] = None
|
||||
|
||||
result = self.model.transcribe(audio, *args, **kwargs)
|
||||
return result["text"]
|
||||
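The hunk above changes the default handed to Whisper from `False` to `None`. According to the docstring of openai-whisper's `transcribe`, `verbose=True` displays all details, `False` displays minimal details, and `None` displays nothing, so the new default is the quiet one. A standalone sketch of that normalisation; `normalize_verbose` is a hypothetical helper, not a function in this repository:

```python
from typing import Any, Dict


def normalize_verbose(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy where any falsy or missing 'verbose' becomes None."""
    out = dict(kwargs)
    if not out.get("verbose"):
        out["verbose"] = None
    return out


print(normalize_verbose({"language": "en"}))
print(normalize_verbose({"verbose": False}))
print(normalize_verbose({"verbose": True}))
```

A side effect visible in the second call: an explicit `verbose=False` is also mapped to `None`, so callers can no longer request Whisper's reduced output through this path.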
@@ -173,6 +173,9 @@ class Transcriber:
|
||||
if (task := kwargs.get("task")):
|
||||
whisper_kwargs["task"] = task
|
||||
|
||||
if (language := kwargs.get("language")):
|
||||
whisper_kwargs["language"] = language
|
||||
|
||||
return whisper_kwargs
|
||||
|
||||
def __repr__(self) -> str:
|
||||
@@ -0,0 +1,31 @@
[metadata]
name = scraibe
version = attr: scraibe.__version__
author = Jacob Schmieder
author_email = Jacob.Schmieder@dbfz.de
description = My package description
long_description = file: README.md, LICENSE
platforms = Linux
keywords = transcription speech recognition whisper pyannote audio speech-to-text speech-to-text transcription speech-to-text recognition voice-to-speech
license = GPL-3.0
classifiers =
    Development Status :: 3 - Alpha
    Environment :: GPU :: NVIDIA CUDA :: 11.2
    License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)
    Topic :: Scientific/Engineering :: Artificial Intelligence
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10

[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.7
install_requires =
    requests
    importlib-metadata; python_version<"3.8"

[options.entry_points]
console_scripts =
    executable-name = scraibe.cli:cli
|
||||
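Multi-line values in a `setup.cfg`, such as `classifiers` and `console_scripts` in this hunk, only parse as intended when the continuation lines are indented. The sketch below checks that with the standard-library `configparser` on a trimmed copy of the configuration; it illustrates the INI syntax only and does not reproduce how setuptools processes the file.

```python
import configparser

# Trimmed copy of the configuration in this hunk; continuation lines
# must be indented to belong to the preceding key.
SETUP_CFG = """\
[metadata]
name = scraibe
classifiers =
    Development Status :: 3 - Alpha
    Programming Language :: Python :: 3.8

[options.entry_points]
console_scripts =
    executable-name = scraibe.cli:cli
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

classifiers = parser["metadata"]["classifiers"].strip().splitlines()
scripts = parser["options.entry_points"]["console_scripts"].strip().splitlines()

print(classifiers)
print(scripts)
```

The parsed entry point also shows that this file would register a command named `executable-name`, whereas the `setup.py` hunk in the same commit registers `scraibe = scraibe.cli:cli`.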
@@ -1,8 +1,9 @@
|
||||
from calendar import c
|
||||
import pkg_resources
|
||||
import os
|
||||
from setuptools import setup, find_packages
|
||||
|
||||
module_name = "autotranscript"
|
||||
module_name = "scraibe"
|
||||
github_url = "https://github.com/JSchmie/autotranscript"
|
||||
|
||||
file_dir = os.path.dirname(os.path.realpath(__file__))
|
||||
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:
|
||||
|
||||
############### setup ###############
|
||||
|
||||
build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
|
||||
build_version = "SCRAIBE_BUILD" in os.environ
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -36,11 +37,24 @@ if __name__ == "__main__":
|
||||
'https://download.pytorch.org/whl/cu113',
|
||||
],
|
||||
url= github_url,
|
||||
license='',
|
||||
|
||||
license='GPL-3',
|
||||
author='Jacob Schmieder',
|
||||
author_email='Jacob.Schmieder@dbfz.de',
|
||||
description='Transcription tool for audio files based on Whisper and Pyannote',
|
||||
package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
|
||||
classifiers=[
|
||||
'Development Status :: 3 - Alpha',
|
||||
'Environment :: GPU :: NVIDIA CUDA :: 11.2',
|
||||
'License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)',
|
||||
'Topic :: Scientific/Engineering :: Artificial Intelligence',
|
||||
'Programming Language :: Python :: 3.8',
|
||||
'Programming Language :: Python :: 3.9',
|
||||
'Programming Language :: Python :: 3.10'],
|
||||
keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
|
||||
'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
|
||||
'voice-to-speech'],
|
||||
package_data={'scraibe.app' : ["*.html", "*.svg"]},
|
||||
entry_points={'console_scripts':
|
||||
['autotranscript = autotranscript.cli:cli']}
|
||||
['scraibe = scraibe.cli:cli']}
|
||||
|
||||
)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
import pytest
|
||||
from autotranscript import Transcriber
|
||||
from scraibe import Transcriber
|
||||
from unittest.mock import patch, mock_open
|
||||
import os
|
||||
|
||||
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
|
||||
|
||||
# Test Diarization class
|
||||
|
||||
from autotranscript import Diariser
|
||||
from scraibe import Diariser
|
||||
|
||||
@pytest.fixture
|
||||
def diarisation():
|
||||
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
|
||||
|
||||
# Test AudioProcessor
|
||||
|
||||
from autotranscript import AudioProcessor , TorchAudioProcessor
|
||||
from scraibe import AudioProcessor , TorchAudioProcessor
|
||||
|
||||
|
||||
def test_AudioProcessor_init():
|
||||
@@ -1,38 +0,0 @@
|
||||
# import os
|
||||
# import sys
|
||||
# import traceback
|
||||
|
||||
# class TracePrints(object):
|
||||
# def __init__(self):
|
||||
# self.stdout = sys.stdout
|
||||
# def write(self, s):
|
||||
# self.stdout.write("Writing %r\n" % s)
|
||||
# traceback.print_stack(file=self.stdout)
|
||||
|
||||
# sys.stdout = TracePrints()
|
||||
|
||||
# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
|
||||
# import os
|
||||
|
||||
# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
|
||||
# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
|
||||
|
||||
|
||||
from autotranscript import AutoTranscribe
|
||||
|
||||
model = AutoTranscribe()
|
||||
|
||||
text = model.transcribe("test.mp4")
|
||||
|
||||
print("Transcription:\n")
|
||||
print(text)
|
||||
|
||||
|
||||
# from autotranscript.misc import *
|
||||
# import os
|
||||
|
||||
# print(os.path.exists(CACHE_DIR))
|
||||
# print(os.path.exists(WHISPER_DEFAULT_PATH))
|
||||
# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
|
||||
|
||||
# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))
|
||||