removed docs to avoid conflict

This commit is contained in:
Jaikinator
2023-09-22 18:44:24 +02:00
24 changed files with 287 additions and 453 deletions
Binary file not shown.


-173
@@ -1,173 +0,0 @@
# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`
`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework to generate fully automated transcriptions.
Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition.
Designed as a comprehensive AI toolkit, it uses multiple AI models:
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
After diarization, each audio segment is processed by the OpenAI `Whisper` model, which uses a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of the segment, which lets Whisper handle both transcription and text translation.
The following graphic illustrates the whole pipeline:
![Pipeline](Pictures/pipeline.png#gh-dark-mode-only)
![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only)
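The flow described above (diarize first, then transcribe each speaker segment) can be sketched roughly as follows. This is a minimal illustration, not ScrAIbe's actual implementation: `Segment`, `run_pipeline`, and the fake transcriber are hypothetical names introduced only for this example, and the real diarization and Whisper calls are replaced by stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One speaker turn found by diarization (times in seconds). Hypothetical type."""
    speaker: str
    start: float
    end: float


def run_pipeline(
    segments: List[Segment],
    transcribe_segment: Callable[[Segment], str],
) -> List[dict]:
    """Transcribe each diarized segment and attach its speaker label."""
    transcript = []
    for seg in segments:
        transcript.append({
            "speaker": seg.speaker,
            "start": seg.start,
            "end": seg.end,
            "text": transcribe_segment(seg),
        })
    return transcript


# Stand-in data and a fake transcriber, for illustration only.
demo_segments = [
    Segment("SPEAKER_00", 0.0, 2.5),
    Segment("SPEAKER_01", 2.5, 5.0),
]
demo_result = run_pipeline(
    demo_segments,
    lambda seg: f"<text {seg.start}-{seg.end}>",
)
```

Passing the transcriber in as a callable keeps the sketch independent of any particular speech model.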
## Install `ScrAIbe`
The following command will pull and install the latest commit from this repository, along with its Python dependencies.
pip install git+https://github.com/JSchmie/autotranscript.git
- **Python version**: Python 3.8
- **PyTorch version**: PyTorch 1.11.0
- **CUDA version**: CUDA Toolkit 11.3.1
Important: To use the `Pyannote` model, you must be granted access on Hugging Face.
Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model.
Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens).
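One common way to avoid hard-coding the token is to read it from an environment variable. The sketch below assumes the variable is called `HF_TOKEN`; that name and the helper `read_hf_token` are assumptions made for this example, not part of ScrAIbe.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ,
                  var_name: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `var_name`, or None if it is unset or blank.

    `var_name` defaults to an assumed variable name; adjust it to your setup.
    """
    token = env.get(var_name, "").strip()
    return token or None
```

Keeping the token out of source files also keeps it out of version control.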
## Usage
We've developed ScrAIbe with several access points to cater to diverse user needs.
### Python usage
It enables full control over the functionalities as well as process customization.
Some usage examples:
- Usage of `AutoTranscribe`, the core of the transcription system, for performing transcription and diarization of audio files.
```python
from scraibe import AutoTranscribe
model = AutoTranscribe()
text = model.transcribe("audio.wav")
print(f"Transcription: \n{text}")
```
- Usage of `Diariser`, responsible for identifying
and segmenting individual speakers from a given audio file.
```python
from scraibe import Diariser
model = Diariser.load_model()
diarisation_output = model.diarization("audio.wav")
```
- Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards.
```python
from scraibe import Transcriber
transcriber = Transcriber.load_model()
transcript = transcriber.transcribe("audio.wav")
transcriber.save_transcript(transcript, "path/to/save.txt")
```
Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.
### Command-line usage
You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:
scraibe audio.wav
Some important options are:
- `--task`: Task to be performed, either transcription, diarization or translation into English. Default is transcription.
- `--hf-token`: Personal `Hugging Face` token.
- `--server-name`: Name of the Web Server. If empty 127.0.0.1 or 0.0.0.0 will be used.
- `--port`: Port on which the Gradio app runs. The default is 7860.
- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
Run the following to view all available options:
scraibe -h
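When calling the command line from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch (`build_scraibe_command` is not part of ScrAIbe); it uses only option names listed above and assumes they can be combined freely with an audio file argument.

```python
from typing import Dict, List


def build_scraibe_command(audio_file: str, options: Dict[str, object]) -> List[str]:
    """Build an argument list such as ['scraibe', 'audio.wav', '--port', '7860']."""
    cmd = ["scraibe", audio_file]
    for name, value in options.items():
        if value is None:
            continue  # skip options that were left unset
        cmd += [f"--{name}", str(value)]
    return cmd


command = build_scraibe_command(
    "audio.wav",
    {"port": 7860, "whisper-model-name": "medium", "hf-token": None},
)
# The list can be passed to subprocess.run(command, check=True)
# once ScrAIbe is installed.
```

Building a list instead of a single string avoids shell-quoting problems with file names that contain spaces.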
### Running a Docker container
After you have installed Docker, you can execute the following commands in the terminal.
```
sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]
sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
- `-p`: Flag for connecting the container's internal port to the port on your local machine.
- `--hf_token`: Flag for entering your personal HuggingFace token in the container.
- `--start_server`: Command to start the Gradio App.
Then click the following link to run the app:
http://0.0.0.0:7860
- Enabling GPU usage
```
sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
```
For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/
## Documentation
For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).
## Contributions
We welcome contributions and feedback. To get involved, create an issue with your feedback or feel free to contact us.
## Roadmap
The following milestones are planned for further releases of ScrAIbe:
- Model quantization
Quantization to improve memory and computational efficiency.
- Model fine-tuning
Fine-tuning to cover a wider variety of linguistic phenomena.
For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses.
These phenomena can be addressed by fine-tuning with the corresponding data.
- Implementation of LLMs
One example is the implementation of a summarization or extraction model, which enables ScrAIbe to automatically summarize or retrieve the key information out of a generated transcription, which could be the minutes of a meeting.
- Executable for Windows
## Contact
For queries, contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de).
## License
ScrAIbe is licensed under the GNU General Public License.
## Acknowledgments
Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.
![KIDA](Pictures/kida_dark.png#gh-dark-mode-only)   ![BMEL](Pictures/BMEL_dark.png#gh-dark-mode-only)      ![DBFZ](Pictures/DBFZ_dark.png#gh-dark-mode-only)       ![MRI](Pictures/MRI.png#gh-dark-mode-only)
![KIDA](Pictures/kida.png#gh-light-mode-only)   ![BMEL](Pictures/BMEL.jpg#gh-light-mode-only)      ![DBFZ](Pictures/DBFZ.png#gh-light-mode-only)       ![MRI](Pictures/MRI.png#gh-light-mode-only)
-101
@@ -1,101 +0,0 @@
from dash import Dash, dcc, html, dash_table, Input, Output, State, callback
import base64
from autotranscript.app.qtfaststart import process
from autotranscript import AutoTranscribe
import io
import subprocess as sp
import numpy as np
from autotranscript.audio import SAMPLE_RATE
# Setup auto-transcript
autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}
# Setup FFmpeg
PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"
# Setup Dash
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = Dash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div([
dcc.Upload(
id='upload-data',
children=html.Div([
'Drag and Drop or ',
html.A('Select Files')
]),
style={
'width': '100%',
'height': '60px',
'lineHeight': '60px',
'borderWidth': '1px',
'borderStyle': 'dashed',
'borderRadius': '5px',
'textAlign': 'center',
'margin': '10px'
},
# Allow multiple files to be uploaded
multiple=True
),
html.Div(id='output-data-upload'),
])
def parse_contents(contents, filename, date):
content_type, content_string = contents.split(',')
decoded = base64.b64decode(content_string)
file = io.BytesIO(decoded).read()
if filename.endswith(PROBLEMATIC_FILE_TYPES):
# mp4 and other files need to be processed with qtfaststart
# since their metadata is at the end of the file
# and we need it at the beginning
file = process(file)
cmd = [
"ffmpeg",
"-nostdin",
"-threads", "0",
"-i",'pipe:',
"-f", "s16le",
'-hide_banner',
'-loglevel', 'error',
"-c", "copy",
"-vn",
"-ac", "1",
"-acodec", "pcm_s16le",
"-ar", str(SAMPLE_RATE),
"-"
]
proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)
out = proc.communicate(input=file)[0]
out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
out = np.array([out, SAMPLE_RATE])
transcript = str(autot.transcribe(out))
return html.Div([
html.H5(f"File Name: {filename} \n" \
"Transcript: \n"
),
html.P(transcript)
])
@callback(Output('output-data-upload', 'children'),
Input('upload-data', 'contents'),
State('upload-data', 'filename'),
State('upload-data', 'last_modified'))
def update_output(list_of_contents, list_of_names, list_of_dates):
if list_of_contents is not None:
children = [
parse_contents(c, n, d) for c, n, d in
zip(list_of_contents, list_of_names, list_of_dates)]
return children
if __name__ == '__main__':
app.run_server()
+1 -3
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
setuptools~=65.6.3
setuptools-rust~=1.5.2
sphinx~=5.0.2
tqdm>=4.65.0
gradio~=3.36.1
@@ -22,6 +20,6 @@ torch~=1.11.0
torchvision~=0.12.0
torchaudio~=0.11.0
#optional:
#dash~=2.10.2
#sphinx~=5.0.2
+1
@@ -0,0 +1 @@
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


@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
--------------------------------
This module provides an interface to transcribe audio files using the
AutoTranscribe model. Users can either upload an audio file or record their speech
Scraibe model. Users can either upload an audio file or record their speech
live for transcription. The application supports multiple languages and provides
options to specify the number of speakers and the language of the audio.
@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
--------------------------------
This module provides an interface to transcribe audio files using the
AutoTranscribe model. Users can either upload an audio file or record their speech
Scraibe model. Users can either upload an audio file or record their speech
live for transcription. The application supports multiple languages and provides
options to specify the number of speakers and the language of the audio.
@@ -33,10 +33,13 @@ Usage:
"""
import json
import os
from tkinter import CURRENT
import gradio as gr
from autotranscript import AutoTranscribe, Transcript
from tqdm import tqdm
from scraibe import Scraibe, Transcript
theme = gr.themes.Soft(
primary_hue="green",
@@ -59,17 +62,19 @@ LANGUAGES = [
"Vietnamese", "Welsh"
]
CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
class GradioTranscriptionInterface:
"""
Interface handling the interaction between Gradio UI and the Audio Transcription system.
"""
def __init__(self, model: AutoTranscribe):
def __init__(self, model: Scraibe):
"""
Initializes the GradioTranscriptionInterface with a transcription model.
Args:
model (AutoTranscribe): Model responsible for audio transcription tasks.
model (Scraibe): Model responsible for audio transcription tasks.
"""
self.model = model
@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
translation : bool,
language : str):
"""
Shortcut method for the AutoTranscribe task.
Shortcut method for the Scraibe task.
Returns:
tuple: Transcribed text (str), JSON output (dict)
@@ -89,13 +94,43 @@ class GradioTranscriptionInterface:
"language": language if language != "None" else None,
"task": 'translate' if translation else None
}
if isinstance(source, str):
try:
result = self.model.autotranscribe(source, **kwargs)
except ValueError:
raise gr.Error("Couldn't detect any speech in the provided audio. \
Please try again!")
return str(result), result.get_json()
try:
result = self.model.autotranscribe(source, **kwargs)
except ValueError:
raise gr.Error("Couldn't detect any speech in the provided audio. \
Please try again!")
return str(result), result.get_json()
elif isinstance(source, list):
source_names = [s.split("/")[-1] for s in source]
result = []
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
try:
res = self.model.autotranscribe(s, **kwargs)
except ValueError:
_name = s.split("/")[-1]
res = f"NO TRANSCRIPT FOUND FOR {_name}"
gr.Warning(f"Couldn't detect any speech in {_name} will skip this file.")
result.append(res)
out = ''
out_dict = {}
for i, r in enumerate(result):
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
out += str(r)
out += "\n\n"
if isinstance(r, str):
out_dict[source_names[i]] = r
else:
out_dict[source_names[i]] = r.get_dict()
return out, json.dumps(out_dict, indent=4)
else:
raise gr.Error("Please provide a valid audio file.")
def transcribe(self, source, translation, language):
@@ -110,8 +145,28 @@ class GradioTranscriptionInterface:
"task": 'translate' if translation == "Yes" else None
}
result = self.model.transcribe(source, **kwargs)
return str(result)
if isinstance(source, str):
result = self.model.transcribe(source, **kwargs)
return str(result)
elif isinstance(source, list):
source_names = [s.split("/")[-1] for s in source]
result = []
for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
res = self.model.transcribe(s, **kwargs)
result.append(res)
out = ''
for i, res in enumerate(result):
out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
out += str(res)
out += "\n\n"
return out
else:
raise gr.Error("Please provide a valid audio file.")
def perform_diarisation(self, source, num_speakers):
"""
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
"num_speakers": num_speakers if num_speakers != 0 else None,
}
if isinstance(source, str):
try:
result = self.model.diarization(source, **kwargs)
except ValueError:
raise gr.Error("Couldn't detect any speech in the provided audio. \
Please try again!")
try:
result = self.model.diarization(source, **kwargs)
except ValueError:
raise gr.Error("Couldn't detect any speech in the provided audio. \
Please try again!")
return json.dumps(result, indent=2)
return json.dumps(result, indent=2)
elif isinstance(source, list):
source_names = [s.split("/")[-1] for s in source]
result = []
for s in tqdm(source, total=len(source),desc = "Performing diarisation"):
try:
res = self.model.diarization(s, **kwargs)
except ValueError:
res = f"NO DIARISATION FOUND FOR {s}"
gr.Warning(f"Couldn't detect any speech in {s} will skip this file.")
result.append(res)
out = {}
for i, res in enumerate(result):
out[source_names[i]] = res
return json.dumps(out, indent=4)
else:
raise gr.Error("Please provide a valid audio file.")
####
# Gradio Interface
####
def gradio_Interface(model : AutoTranscribe = None):
def gradio_Interface(model : Scraibe = None):
if model is None:
model = AutoTranscribe()
model = Scraibe()
pipe = GradioTranscriptionInterface(model)
@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
gr.update(visible = True),
gr.update(visible = False, value = None))
elif choice == "File":
elif choice == "File or Files":
return (gr.update(visible = False, value = None),
gr.update(visible = False, value = None),
@@ -205,22 +282,42 @@ def gradio_Interface(model : AutoTranscribe = None):
gr.update(visible = False, value = None),
gr.update(visible = True))
def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
def run_scribe(task,
num_speakers,
translate,
language,
audio1,
audio2,
video1,
video2,
file_in,
progress = gr.Progress(track_tqdm= True)):
# get *args which are not None
progress(0, desc='Starting task...')
source = audio1 or audio2 or video1 or video2 or file_in
if isinstance(source, list):
source = [s.name for s in source]
if len(source) == 1:
source = source[0]
if task == 'Auto Transcribe':
out_str , out_json = pipe.auto_transcribe(source = source,
num_speakers = num_speakers,
translation = translate,
language = language)
return (gr.update(value = out_str, visible = True),
gr.update(value = out_json, visible = True),
gr.update(visible = True),
gr.update(visible = True))
if isinstance(source, str):
return (gr.update(value = out_str, visible = True),
gr.update(value = out_json, visible = True),
gr.update(visible = True),
gr.update(visible = True))
else:
return (gr.update(value = out_str, visible = True),
gr.update(value = out_json, visible = True),
gr.update(visible = False),
gr.update(visible = False))
elif task == 'Transcribe':
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
# Define components
header = open("header.html", "r").read()
hname = os.path.join(CURRENT_PATH, "header.html")
header = open(hname, "r").read()
gr.HTML(header, visible= True, show_label=False)
with gr.Row():
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
leave it at None.", visible= True)
input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video"
,"File"], label="Input Type", value="Upload Audio")
,"File or Files"], label="Input Type", value="Upload Audio")
audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
interactive= True, visible= True)
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
interactive= True, visible= False)
video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
interactive= True, visible= False)
file_in = gr.File(label="Upload File", interactive= True, visible= False)
file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
submit = gr.Button()
@@ -1,5 +1,5 @@
"""
AutoTranscribe Class
Scraibe Class
--------------------
This class serves as the core of the transcription system, responsible for handling
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
integration into various applications, ranging from transcription services to voice assistants.
Available Classes:
- AutoTranscribe: Main class for performing transcription and diarization.
- Scraibe: Main class for performing transcription and diarization.
Includes methods for loading models, processing audio files,
and formatting the transcription output.
Usage:
from .autotranscribe import AutoTranscribe
from scraibe import Scraibe
model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
transcript = model.transcribe("path/to/audiofile.wav")
model = Scraibe()
transcript = model.autotranscribe("path/to/audiofile.wav")
"""
# Standard Library Imports
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
DiarisationType = TypeVar('DiarisationType')
class AutoTranscribe:
class Scraibe:
"""
AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
Scraibe is a class responsible for managing the transcription and diarization of audio files.
It serves as the core of the transcription system, incorporating pretrained models
for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
allowing for comprehensive audio processing.
@@ -57,7 +57,7 @@ class AutoTranscribe:
diariser (Diariser): The diariser object to handle diarization.
Methods:
__init__: Initializes the AutoTranscribe class with appropriate models.
__init__: Initializes the Scraibe class with appropriate models.
transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
get_audio_file: Gets an audio file as an AudioProcessor object.
@@ -66,7 +66,7 @@ class AutoTranscribe:
whisper_model: Union[bool, str, whisper] = None,
dia_model : Union[bool, str, DiarisationType] = None,
**kwargs) -> None:
"""Initializes the AutoTranscribe class.
"""Initializes the Scraibe class.
Args:
whisper_model (Union[bool, str, whisper], optional):
@@ -92,7 +92,11 @@ class AutoTranscribe:
else:
self.diariser = dia_model
print("AutoTranscribe initialized all models successfully loaded.")
if kwargs.get("verbose"):
print("Scraibe initialized all models successfully loaded.")
self.verbose = True
else:
self.verbose = False
def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
remove_original : bool = False,
@@ -112,7 +116,8 @@ class AutoTranscribe:
Transcript: A Transcript object containing the transcription,
which can be exported to different formats.
"""
if kwargs.get("verbose"):
self.verbose = kwargs.get("verbose")
# Get audio file as an AudioProcessor object
audio_file = self.get_audio_file(audio_file)
@@ -121,12 +126,12 @@ class AutoTranscribe:
"waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)),
"sample_rate": audio_file.sr
}
print("Starting diarisation.")
if self.verbose:
print("Starting diarisation.")
diarisation = self.diariser.diarization(dia_audio, **kwargs)
if not diarisation["segments"]:
print("No segments found. Try to run transcription without diarisation.")
@@ -138,16 +143,15 @@ class AutoTranscribe:
return Transcript(final_transcript)
print("Diarisation finished. Starting transcription.")
if self.verbose:
print("Diarisation finished. Starting transcription.")
audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
# Transcribe each segment and store the results
final_transcript = dict()
for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
seg = diarisation["segments"][i]
@@ -283,4 +287,4 @@ class AutoTranscribe:
return audio_file
def __repr__(self):
return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
+5 -7
@@ -1,5 +1,5 @@
"""
Command-Line Interface (CLI) for the AutoTranscribe class,
Command-Line Interface (CLI) for the Scraibe class,
allowing for user interaction to transcribe and diarize audio files.
The function includes arguments for specifying the audio files, model paths,
output formats, and other options necessary for transcription.
@@ -8,9 +8,7 @@ import os
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
import json
from sympy import use
from .autotranscript import AutoTranscribe
from .autotranscript import Scraibe
from .app.gradio_app import gradio_Interface
from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
@@ -20,12 +18,12 @@ from torch import set_num_threads
def cli():
"""
Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe
Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe
and diarize audio files. The function includes arguments for specifying the audio files, model paths,
output formats, and other options necessary for transcription.
This function can be executed from the command line to perform transcription tasks, providing a
user-friendly way to access the AutoTranscribe class functionalities.
user-friendly way to access the Scraibe class functionalities.
"""
def str2bool(string):
@@ -115,7 +113,7 @@ def cli():
if arg_dict["whisper_model_directory"]:
class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
model = AutoTranscribe(**class_kwargs)
model = Scraibe(**class_kwargs)
if arg_dict["audio_files"]:
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
"""Configure diarization pipeline from a YAML file.
@@ -90,8 +90,8 @@ class Transcriber:
kwargs = self._get_whisper_kwargs(**kwargs)
if "verbose" not in kwargs:
kwargs["verbose"] = False
if not kwargs.get("verbose"):
kwargs["verbose"] = None
result = self.model.transcribe(audio, *args, **kwargs)
return result["text"]
@@ -173,6 +173,9 @@ class Transcriber:
if (task := kwargs.get("task")):
whisper_kwargs["task"] = task
if (language := kwargs.get("language")):
whisper_kwargs["language"] = language
return whisper_kwargs
def __repr__(self) -> str:
@@ -1,69 +1,69 @@
import os
import subprocess as sp
MAJOR = 0
MINOR = 1
MICRO = 0
MICRO_POST = 0
ISRELEASED = False
VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
# Return the git revision as a string
# taken from numpy/numpy
def git_version():
def _minimal_ext_cmd(cmd):
# construct minimal environment
env = {}
for k in ['SYSTEMROOT', 'PATH', 'HOME']:
v = os.environ.get(k)
if v is not None:
env[k] = v
# LANGUAGE is used on win32
env['LANGUAGE'] = 'C'
env['LANG'] = 'C'
env['LC_ALL'] = 'C'
out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
return out
try:
out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
GIT_REVISION = out.strip().decode('ascii')
except OSError:
GIT_REVISION = "Unknown"
return GIT_REVISION
def _get_git_version():
cwd = os.getcwd()
# go to the main directory
fdir = os.path.dirname(os.path.abspath(__file__))
maindir = os.path.abspath(os.path.join(fdir, ".."))
# maindir = fdir # os.path.join(fdir, "..")
os.chdir(maindir)
# get git version
res = git_version()
# restore the cwd
os.chdir(cwd)
return res
def get_version(build_version=False):
if ISRELEASED:
return VERSION
# unreleased version
GIT_REVISION = _get_git_version()
if build_version:
import datetime as dt
date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
return VERSION + ".dev" + date
else:
return VERSION + ".dev0+" + GIT_REVISION[:7]
import os
import subprocess as sp
MAJOR = 0
MINOR = 1
MICRO = 0
MICRO_POST = 0
ISRELEASED = False
VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
# Return the git revision as a string
# taken from numpy/numpy
def git_version():
def _minimal_ext_cmd(cmd):
# construct minimal environment
env = {}
for k in ['SYSTEMROOT', 'PATH', 'HOME']:
v = os.environ.get(k)
if v is not None:
env[k] = v
# LANGUAGE is used on win32
env['LANGUAGE'] = 'C'
env['LANG'] = 'C'
env['LC_ALL'] = 'C'
out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
return out
try:
out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
GIT_REVISION = out.strip().decode('ascii')
except OSError:
GIT_REVISION = "Unknown"
return GIT_REVISION
def _get_git_version():
cwd = os.getcwd()
# go to the main directory
fdir = os.path.dirname(os.path.abspath(__file__))
maindir = os.path.abspath(os.path.join(fdir, ".."))
# maindir = fdir # os.path.join(fdir, "..")
os.chdir(maindir)
# get git version
res = git_version()
# restore the cwd
os.chdir(cwd)
return res
def get_version(build_version=False):
if ISRELEASED:
return VERSION
# unreleased version
GIT_REVISION = _get_git_version()
if build_version:
import datetime as dt
date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
return VERSION + ".dev" + date
else:
return VERSION + ".dev0+" + GIT_REVISION[:7]
+31
@@ -0,0 +1,31 @@
[metadata]
name = scraibe
version = attr: scraibe.__version__
author = Jacob Schmieder
author_email = Jacob.Schmieder@dbfz.de
description = My package description
long_description = file: README.md, LICENSE
platforms = Linux
keywords = transcription speech recognition whisper pyannote audio speech-to-text speech-to-text transcription speech-to-text recognition voice-to-speech
license = GPL-3.0
classifiers =
Development Status :: 3 - Alpha
Environment :: GPU :: NVIDIA CUDA :: 11.2
License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)
Topic :: Scientific/Engineering :: Artificial Intelligence
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.7
install_requires =
requests
importlib-metadata; python_version<"3.8"
[options.entry_points]
console_scripts =
executable-name = scraibe.cli:cli
+19 -5
@@ -1,8 +1,9 @@
from calendar import c
import pkg_resources
import os
from setuptools import setup, find_packages
module_name = "autotranscript"
module_name = "scraibe"
github_url = "https://github.com/JSchmie/autotranscript"
file_dir = os.path.dirname(os.path.realpath(__file__))
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:
############### setup ###############
build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
build_version = "SCRAIBE_BUILD" in os.environ
if __name__ == "__main__":
@@ -36,11 +37,24 @@ if __name__ == "__main__":
'https://download.pytorch.org/whl/cu113',
],
url= github_url,
license='',
license='GPL-3',
author='Jacob Schmieder',
author_email='Jacob.Schmieder@dbfz.de',
description='Transcription tool for audio files based on Whisper and Pyannote',
package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
classifiers=[
'Development Status :: 3 - Alpha',
'Environment :: GPU :: NVIDIA CUDA :: 11.2',
'License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)',
'Topic :: Scientific/Engineering :: Artificial Intelligence',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10'],
keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
'voice-to-speech'],
package_data={'scraibe.app' : ["*.html", "*.svg"]},
entry_points={'console_scripts':
['autotranscript = autotranscript.cli:cli']}
['scraibe = scraibe.cli:cli']}
)
@@ -1,5 +1,5 @@
import pytest
from autotranscript import Transcriber
from scraibe import Transcriber
from unittest.mock import patch, mock_open
import os
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
# Test Diaraization class
from autotranscript import Diariser
from scraibe import Diariser
@pytest.fixture
def diarisation():
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
# Test AudioProcessor
from autotranscript import AudioProcessor , TorchAudioProcessor
from scraibe import AudioProcessor , TorchAudioProcessor
def test_AudioProcessor_init():
-38
@@ -1,38 +0,0 @@
# import os
# import sys
# import traceback
# class TracePrints(object):
# def __init__(self):
# self.stdout = sys.stdout
# def write(self, s):
# self.stdout.write("Writing %r\n" % s)
# traceback.print_stack(file=self.stdout)
# sys.stdout = TracePrints()
# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
# import os
# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
from autotranscript import AutoTranscribe
model = AutoTranscribe()
text = model.transcribe("test.mp4")
print("Transcription:\n")
print(text)
# from autotranscript.misc import *
# import os
# print(os.path.exists(CACHE_DIR))
# print(os.path.exists(WHISPER_DEFAULT_PATH))
# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))