diff --git a/Pictures/gradio_app.png b/Pictures/gradio_app.png new file mode 100644 index 0000000..7060598 Binary files /dev/null and b/Pictures/gradio_app.png differ diff --git a/README.md b/README.md deleted file mode 100644 index d33e580..0000000 --- a/README.md +++ /dev/null @@ -1,173 +0,0 @@ - -# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment` - -`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/) based multilingual speech-to-text framework to generate fully automated transcriptions. - -Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition. - -Designed as a comprehensive AI toolkit, it uses multiple AI models: - -- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model. -- [payannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization. - -The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding. - -During post-diarization, each audio segment is processed by the OpenAI `Whisper` model, in a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` dentifies the language segment, facilitating Whisper's role in both transcription and text translation. - -The following graphic illustates the whole pipeline: - -![Pipeline](Pictures/pipeline.png#gh-dark-mode-only) -![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only) - -## Install `ScrAIbe` : - -The following command will pull and install the latest commit from this repository, along with its Python dependencies. 
- - pip install git+https://github.com/JSchmie/autotranscript.git - -- **Python version**: Python 3.8 -- **PyTorch version**: Python 1.11.0 -- **CUDA version**: Cuda-toolkit 11.3.1 - - -Important: For the `Pyannote` model you need to be granted access in Hugging Face. -Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model. - -Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens). - -## Usage - -We've developed ScrAIbe with several access points to cater to diverse user needs. - -### Python usage - -It enables full control over the functionalities as well as process customization. - -Some usage examples: - -- Usage of `AutoTranscribe`, core of the transcription system, for performing trancription and diarization of audio files. - -```python -from scraibe import AutoTranscribe - -model = AutoTranscribe() - -text = model.transcribe("audio.wav") - -print(f"Transcription: \n{text}") - -``` -- Usage of `Diariser`, responsible for identifying -and segmenting individual speakers from a given audio file. - -```python - from scraibe import Diariser - -model = Diariser.load_model() - -diarisation_output = model.diarization("audio.wav") - -``` -- Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards. - -```python - from scraibe import Transcriber - -transcriber = Transcriber.load_model() - -transcript = transcriber.transcribe("audio.wav") - -transcriber.save_transcript(transcript, "path/to/save.txt") - -``` - - -Refer to [whisper](https://github.com/openai/whisper) and [payannote-audio](https://github.com/pyannote/pyannote-audio) for further options. 
- -### Command-line usage - -You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command-line: - - scraibe audio.wav - -Some example of important functionalities are: - -- `--task`: Task to be performed, either transcription, diarization or translation into English. Default is transcription. -- `--hf-token`: Personal `Hugging Face` token. -- `--server-name`: Name of the Web Server. If empty 127.0.0.1 or 0.0.0.0 will be used. -- `--port`: To run the Gradio app. The default is 7860. - -- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`. - - -Run the following to view all available options: - - scraibe -h - -### Running a Docker container - -After you have installed Docker, you can execute the following commands in the terminal. - -``` -sudo docker build . --build-arg="hf_token=[enter your HuggingFace token] " -t [image name] - -sudo docker run -it -p 7860:7860 --name [container name][image name] --hf_token [enter your HuggingFace token] --start_server - -``` -- `-p`: Flag for connecting the container interal port to the port on your local machine. -- `--hf_token`: Flag for entering your personal HuggingFace token in the container. -- `--start_server`: Command to start the Gradio App. - -Then click the following link to run the app: - -http://0.0.0.0:7860 - -- Enabling GPU usage - -``` -sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name][image name] --hf_token [enter your HuggingFace token] --start_server -``` -For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/ - - -## Documentation - -For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/). - -## Contributions - -We are happy for any interest in contributing and about feedback: In order to do that, create an issue with your feedback or feel free to contact us. 
- -## Roadmap - -The following milestones are planned for further releases of ScrAIbe: - -- Model quantization -Quantization to empower memory and computational efficiency. - -- Model fine-tuning -In order to be able to cover a variety of linguistic phenomena. - -For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses. -These phenomena can be addressed by fine-tuning with the corresponding data. - -- Implementation of LLMs -One example is the implementation of a summarization or extraction model, which enables ScrAIbe to automatically summarize or retrieve the key information out of a generated transcription, which could be the minutes of a meeting. - -- Executable for Windows - -## Contact - -For queries contact [Jacob Schmieder](Jacob.Schmieder@dbfz.de) - -## License - -ScrAIbe is licensed under GNU General Public License. - -## Acknowledgments - -Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team. 
- -![KIDA](Pictures/kida_dark.png#gh-dark-mode-only)   ![BMEL](Pictures/BMEL_dark.png#gh-dark-mode-only)      ![DBFZ](Pictures/DBFZ_dark.png#gh-dark-mode-only)       ![MRI](Pictures/MRI.png#gh-dark-mode-only) - -![KIDA](Pictures/kida.png#gh-light-mode-only)   ![BMEL](Pictures/BMEL.jpg#gh-light-mode-only)      ![DBFZ](Pictures/DBFZ.png#gh-light-mode-only)       ![MRI](Pictures/MRI.png#gh-light-mode-only) diff --git a/app.py b/app.py deleted file mode 100644 index 3645d79..0000000 --- a/app.py +++ /dev/null @@ -1,101 +0,0 @@ -from dash import Dash, dcc, html, dash_table, Input, Output, State, callback - -import base64 -from autotranscript.app.qtfaststart import process -from autotranscript import AutoTranscribe -import io -import subprocess as sp -import numpy as np -from autotranscript.audio import SAMPLE_RATE - -# Setup auto-transcript -autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False} - -# Setup FFmpeg -PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2" - - -# Setup Dash -external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css'] - -app = Dash(__name__, external_stylesheets=external_stylesheets) - -app.layout = html.Div([ - dcc.Upload( - id='upload-data', - children=html.Div([ - 'Drag and Drop or ', - html.A('Select Files') - ]), - style={ - 'width': '100%', - 'height': '60px', - 'lineHeight': '60px', - 'borderWidth': '1px', - 'borderStyle': 'dashed', - 'borderRadius': '5px', - 'textAlign': 'center', - 'margin': '10px' - }, - # Allow multiple files to be uploaded - multiple=True - ), - html.Div(id='output-data-upload'), -]) - -def parse_contents(contents, filename, date): - content_type, content_string = contents.split(',') - - decoded = base64.b64decode(content_string) - file = io.BytesIO(decoded).read() - - if filename.endswith(PROBLEMATIC_FILE_TYPES): - # mp4 and other files need to be processed with qtfaststart - # since theire metadata is at the end of the file - # and we need it at the beginning - 
file = process(file) - - cmd = [ - "ffmpeg", - "-nostdin", - "-threads", "0", - "-i",'pipe:', - "-f", "s16le", - '-hide_banner', - '-loglevel', 'error', - "-c", "copy", - "-vn", - "-ac", "1", - "-acodec", "pcm_s16le", - "-ar", str(SAMPLE_RATE), - "-" - ] - - proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE) - - out = proc.communicate(input=file)[0] - out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0 - out = np.array([out, SAMPLE_RATE]) - - transcript = str(autot.transcribe(out)) - - return html.Div([ - html.H5(f"File Name: {filename} \n" \ - "Transcript: \n" - ), - html.P(transcript) - ]) - -@callback(Output('output-data-upload', 'children'), - Input('upload-data', 'contents'), - State('upload-data', 'filename'), - State('upload-data', 'last_modified')) -def update_output(list_of_contents, list_of_names, list_of_dates): - if list_of_contents is not None: - children = [ - parse_contents(c, n, d) for c, n, d in - zip(list_of_contents, list_of_names, list_of_dates)] - return children - -if __name__ == '__main__': - app.run_server() diff --git a/autotranscript/.pyannotetoken b/autotranscript/.pyannotetoken deleted file mode 100644 index e69de29..0000000 diff --git a/requirements.txt b/requirements.txt index dbd1bc3..ef74c29 100644 --- a/requirements.txt +++ b/requirements.txt @@ -9,8 +9,6 @@ pyannote.pipeline~=2.3 setuptools~=65.6.3 setuptools-rust~=1.5.2 -sphinx~=5.0.2 - tqdm>=4.65.0 gradio~=3.36.1 @@ -22,6 +20,6 @@ torch~=1.11.0 torchvision~=0.12.0 torchaudio~=0.11.0 #optional: -#dash~=2.10.2 +#sphinx~=5.0.2 diff --git a/scraibe/.pyannotetoken b/scraibe/.pyannotetoken new file mode 100644 index 0000000..42ba269 --- /dev/null +++ b/scraibe/.pyannotetoken @@ -0,0 +1 @@ +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 \ No newline at end of file diff --git a/autotranscript/__init__.py b/scraibe/__init__.py similarity index 100% rename from autotranscript/__init__.py rename to scraibe/__init__.py diff --git a/autotranscript/app/Logo_KIDA_bmel_green.svg 
b/scraibe/app/Logo_KIDA_bmel_green.svg similarity index 100% rename from autotranscript/app/Logo_KIDA_bmel_green.svg rename to scraibe/app/Logo_KIDA_bmel_green.svg diff --git a/autotranscript/app/__init__.py b/scraibe/app/__init__.py similarity index 100% rename from autotranscript/app/__init__.py rename to scraibe/app/__init__.py diff --git a/autotranscript/app/gradio_app.py b/scraibe/app/gradio_app.py similarity index 68% rename from autotranscript/app/gradio_app.py rename to scraibe/app/gradio_app.py index 13a6ee1..fa3e8fb 100644 --- a/autotranscript/app/gradio_app.py +++ b/scraibe/app/gradio_app.py @@ -3,7 +3,7 @@ Gradio Audio Transcription App. -------------------------------- This module provides an interface to transcribe audio files using the -AutoTranscribe model. Users can either upload an audio file or record their speech +Scraibe model. Users can either upload an audio file or record their speech live for transcription. The application supports multiple languages and provides options to specify the number of speakers and the language of the audio. @@ -20,7 +20,7 @@ Gradio Audio Transcription App. -------------------------------- This module provides an interface to transcribe audio files using the -AutoTranscribe model. Users can either upload an audio file or record their speech +Scraibe model. Users can either upload an audio file or record their speech live for transcription. The application supports multiple languages and provides options to specify the number of speakers and the language of the audio. 
@@ -33,10 +33,13 @@ Usage: """ import json +import os +from tkinter import CURRENT import gradio as gr -from autotranscript import AutoTranscribe, Transcript +from tqdm import tqdm +from scraibe import Scraibe, Transcript theme = gr.themes.Soft( primary_hue="green", @@ -59,17 +62,19 @@ LANGUAGES = [ "Vietnamese", "Welsh" ] +CURRENT_PATH = os.path.dirname(os.path.realpath(__file__)) + class GradioTranscriptionInterface: """ Interface handling the interaction between Gradio UI and the Audio Transcription system. """ - def __init__(self, model: AutoTranscribe): + def __init__(self, model: Scraibe): """ Initializes the GradioTranscriptionInterface with a transcription model. Args: - model (AutoTranscribe): Model responsible for audio transcription tasks. + model (Scraibe): Model responsible for audio transcription tasks. """ self.model = model @@ -78,7 +83,7 @@ class GradioTranscriptionInterface: translation : bool, language : str): """ - Shortcut method for the AutoTranscribe task. + Shortcut method for the Scraibe task. Returns: tuple: Transcribed text (str), JSON output (dict) @@ -89,13 +94,43 @@ class GradioTranscriptionInterface: "language": language if language != "None" else None, "task": 'translate' if translation else None } + if isinstance(source, str): + try: + result = self.model.autotranscribe(source, **kwargs) + except ValueError: + raise gr.Error("Couldn't detect any speech in the provided audio. \ + Please try again!") + + return str(result), result.get_json() - try: - result = self.model.autotranscribe(source, **kwargs) - except ValueError: - raise gr.Error("Couldn't detect any speech in the provided audio. 
\ - Please try again!") - return str(result), result.get_json() + elif isinstance(source, list): + source_names = [s.split("/")[-1] for s in source] + result = [] + for s in tqdm(source, total=len(source),desc = "Transcribing audio files"): + try: + res = self.model.autotranscribe(s, **kwargs) + except ValueError: + _name = s.split("/")[-1] + res = f"NO TRANSCRIPT FOUND FOR {_name}" + gr.Warning(f"Couldn't detect any speech in {_name} will skip this file.") + result.append(res) + + out = '' + out_dict = {} + for i, r in enumerate(result): + out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n" + out += str(r) + out += "\n\n" + + if isinstance(r, str): + out_dict[source_names[i]] = r + else: + out_dict[source_names[i]] = r.get_dict() + + return out, json.dumps(out_dict, indent=4) + + else: + raise gr.Error("Please provide a valid audio file.") def transcribe(self, source, translation, language): @@ -110,8 +145,28 @@ class GradioTranscriptionInterface: "task": 'translate' if translation == "Yes" else None } - result = self.model.transcribe(source, **kwargs) - return str(result) + if isinstance(source, str): + result = self.model.transcribe(source, **kwargs) + + return str(result) + + elif isinstance(source, list): + source_names = [s.split("/")[-1] for s in source] + result = [] + for s in tqdm(source, total=len(source),desc = "Transcribing audio files"): + res = self.model.transcribe(s, **kwargs) + result.append(res) + + out = '' + for i, res in enumerate(result): + out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n" + out += str(res) + out += "\n\n" + + return out + + else: + raise gr.Error("Please provide a valid audio file.") def perform_diarisation(self, source, num_speakers): """ @@ -124,22 +179,44 @@ class GradioTranscriptionInterface: "num_speakers": num_speakers if num_speakers != 0 else None, } + if isinstance(source, str): + try: + result = self.model.diarization(source, **kwargs) + except ValueError: + raise gr.Error("Couldn't detect any speech in the 
provided audio. \ + Please try again!") - - try: - result = self.model.diarization(source, **kwargs) - except ValueError: - raise gr.Error("Couldn't detect any speech in the provided audio. \ - Please try again!") - return json.dumps(result, indent=2) + return json.dumps(result, indent=2) + elif isinstance(source, list): + source_names = [s.split("/")[-1] for s in source] + result = [] + for s in tqdm(source, total=len(source),desc = "Performing diarisation"): + try: + res = self.model.diarization(s, **kwargs) + except ValueError: + res = f"NO DIARISATION FOUND FOR {s}" + gr.Warning(f"Couldn't detect any speech in {s} will skip this file.") + result.append(res) + + out = {} + + for i, res in enumerate(result): + out[source_names[i]] = res + + return json.dumps(out, indent=4) + + else: + raise gr.Error("Please provide a valid audio file.") + #### # Gradio Interface #### -def gradio_Interface(model : AutoTranscribe = None): +def gradio_Interface(model : Scraibe = None): if model is None: - model = AutoTranscribe() + model = Scraibe() pipe = GradioTranscriptionInterface(model) @@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None): gr.update(visible = True), gr.update(visible = False, value = None)) - elif choice == "File": + elif choice == "File or Files": return (gr.update(visible = False, value = None), gr.update(visible = False, value = None), @@ -205,22 +282,42 @@ def gradio_Interface(model : AutoTranscribe = None): gr.update(visible = False, value = None), gr.update(visible = True)) - def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)): + def run_scribe(task, + num_speakers, + translate, + language, + audio1, + audio2, + video1, + video2, + file_in, + progress = gr.Progress(track_tqdm= True)): # get *args which are not None progress(0, desc='Starting task...') source = audio1 or audio2 or video1 or video2 or file_in + if isinstance(source, list): + source = [s.name
for s in source] + if len(source) == 1: + source = source[0] + if task == 'Auto Transcribe': - + out_str , out_json = pipe.auto_transcribe(source = source, num_speakers = num_speakers, translation = translate, language = language) - return (gr.update(value = out_str, visible = True), - gr.update(value = out_json, visible = True), - gr.update(visible = True), - gr.update(visible = True)) + if isinstance(source, str): + return (gr.update(value = out_str, visible = True), + gr.update(value = out_json, visible = True), + gr.update(visible = True), + gr.update(visible = True)) + else: + return (gr.update(value = out_str, visible = True), + gr.update(value = out_json, visible = True), + gr.update(visible = False), + gr.update(visible = False)) elif task == 'Transcribe': @@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None): with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo: # Define components - header = open("header.html", "r").read() + hname = os.path.join(CURRENT_PATH, "header.html") + header = open(hname, "r").read() gr.HTML(header, visible= True, show_label=False) with gr.Row(): @@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None): leave it at None.", visible= True) input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video" - ,"File"], label="Input Type", value="Upload Audio") + ,"File or Files"], label="Input Type", value="Upload Audio") audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio", interactive= True, visible= True) @@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None): interactive= True, visible= False) video2 = gr.Video(source="webcam", label="Record Video", type="filepath", interactive= True, visible= False) - file_in = gr.File(label="Upload File", interactive= True, visible= False) + file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False) submit = gr.Button() diff --git 
a/autotranscript/app/header.html b/scraibe/app/header.html similarity index 100% rename from autotranscript/app/header.html rename to scraibe/app/header.html diff --git a/autotranscript/app/qtfaststart.py b/scraibe/app/qtfaststart.py similarity index 100% rename from autotranscript/app/qtfaststart.py rename to scraibe/app/qtfaststart.py diff --git a/autotranscript/audio.py b/scraibe/audio.py similarity index 100% rename from autotranscript/audio.py rename to scraibe/audio.py diff --git a/autotranscript/autotranscript.py b/scraibe/autotranscript.py similarity index 91% rename from autotranscript/autotranscript.py rename to scraibe/autotranscript.py index d27dba8..b3545e4 100644 --- a/autotranscript/autotranscript.py +++ b/scraibe/autotranscript.py @@ -1,5 +1,5 @@ """ -AutoTranscribe Class +Scraibe Class -------------------- This class serves as the core of the transcription system, responsible for handling @@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo integration into various applications, ranging from transcription services to voice assistants. Available Classes: -- AutoTranscribe: Main class for performing transcription and diarization. +- Scraibe: Main class for performing transcription and diarization. Includes methods for loading models, processing audio files, and formatting the transcription output. Usage: - from .autotranscribe import AutoTranscribe + from scraibe import Scraibe - model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model") - transcript = model.transcribe("path/to/audiofile.wav") + model = Scraibe() + transcript = model.autotranscribe("path/to/audiofile.wav") """ # Standard Library Imports @@ -45,9 +45,9 @@ from .transcript_exporter import Transcript DiarisationType = TypeVar('DiarisationType') -class AutoTranscribe: +class Scraibe: """ - AutoTranscribe is a class responsible for managing the transcription and diarization of audio files. 
+ Scraibe is a class responsible for managing the transcription and diarization of audio files. It serves as the core of the transcription system, incorporating pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), allowing for comprehensive audio processing. @@ -57,7 +57,7 @@ class AutoTranscribe: diariser (Diariser): The diariser object to handle diarization. Methods: - __init__: Initializes the AutoTranscribe class with appropriate models. + __init__: Initializes the Scraibe class with appropriate models. transcribe: Transcribes an audio file using the whisper model and pyannote diarization model. remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy. get_audio_file: Gets an audio file as an AudioProcessor object. @@ -66,7 +66,7 @@ class AutoTranscribe: whisper_model: Union[bool, str, whisper] = None, dia_model : Union[bool, str, DiarisationType] = None, **kwargs) -> None: - """Initializes the AutoTranscribe class. + """Initializes the Scraibe class. Args: whisper_model (Union[bool, str, whisper], optional): @@ -92,7 +92,11 @@ class AutoTranscribe: else: self.diariser = dia_model - print("AutoTranscribe initialized all models successfully loaded.") + if kwargs.get("verbose"): + print("Scraibe initialized all models successfully loaded.") + self.verbose = True + else: + self.verbose = False def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray], remove_original : bool = False, @@ -112,7 +116,8 @@ class AutoTranscribe: Transcript: A Transcript object containing the transcription, which can be exported to different formats. 
""" - + if kwargs.get("verbose"): + self.verbose = kwargs.get("verbose") # Get audio file as an AudioProcessor object audio_file = self.get_audio_file(audio_file) @@ -121,12 +126,12 @@ class AutoTranscribe: "waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)), "sample_rate": audio_file.sr } - - print("Starting diarisation.") + + if self.verbose: + print("Starting diarisation.") diarisation = self.diariser.diarization(dia_audio, **kwargs) - if not diarisation["segments"]: print("No segments found. Try to run transcription without diarisation.") @@ -138,16 +143,15 @@ class AutoTranscribe: return Transcript(final_transcript) - print("Diarisation finished. Starting transcription.") + if self.verbose: + print("Diarisation finished. Starting transcription.") audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device) # Transcribe each segment and store the results final_transcript = dict() - - - for i in trange(len(diarisation["segments"]), desc= "Transcribing"): + for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose): seg = diarisation["segments"][i] @@ -283,4 +287,4 @@ class AutoTranscribe: return audio_file def __repr__(self): - return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})" + return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})" diff --git a/autotranscript/cli.py b/scraibe/cli.py similarity index 95% rename from autotranscript/cli.py rename to scraibe/cli.py index b9da56d..b05da92 100644 --- a/autotranscript/cli.py +++ b/scraibe/cli.py @@ -1,5 +1,5 @@ """ -Command-Line Interface (CLI) for the AutoTranscribe class, +Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription. 
@@ -8,9 +8,7 @@ import os from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter import json -from sympy import use - -from .autotranscript import AutoTranscribe +from .autotranscript import Scraibe from .app.gradio_app import gradio_Interface from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE @@ -20,12 +18,12 @@ from torch import set_num_threads def cli(): """ - Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe + Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription. This function can be executed from the command line to perform transcription tasks, providing a - user-friendly way to access the AutoTranscribe class functionalities. + user-friendly way to access the Scraibe class functionalities. """ def str2bool(string): @@ -115,7 +113,7 @@ def cli(): if arg_dict["whisper_model_directory"]: class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory") - model = AutoTranscribe(**class_kwargs) + model = Scraibe(**class_kwargs) if arg_dict["audio_files"]: diff --git a/autotranscript/diarisation.py b/scraibe/diarisation.py similarity index 100% rename from autotranscript/diarisation.py rename to scraibe/diarisation.py diff --git a/autotranscript/misc.py b/scraibe/misc.py similarity index 99% rename from autotranscript/misc.py rename to scraibe/misc.py index 399fcbb..b1afeea 100644 --- a/autotranscript/misc.py +++ b/scraibe/misc.py @@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper") PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote") PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml") - def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None: """Configure diarization pipeline from a YAML file. 
diff --git a/autotranscript/transcriber.py b/scraibe/transcriber.py similarity index 97% rename from autotranscript/transcriber.py rename to scraibe/transcriber.py index 63174a4..dbb290e 100644 --- a/autotranscript/transcriber.py +++ b/scraibe/transcriber.py @@ -90,8 +90,8 @@ class Transcriber: kwargs = self._get_whisper_kwargs(**kwargs) - if "verbose" not in kwargs: - kwargs["verbose"] = False + if not kwargs.get("verbose"): + kwargs["verbose"] = None result = self.model.transcribe(audio, *args, **kwargs) return result["text"] @@ -173,6 +173,9 @@ class Transcriber: if (task := kwargs.get("task")): whisper_kwargs["task"] = task + if (language := kwargs.get("language")): + whisper_kwargs["language"] = language + return whisper_kwargs def __repr__(self) -> str: diff --git a/autotranscript/transcript_exporter.py b/scraibe/transcript_exporter.py similarity index 100% rename from autotranscript/transcript_exporter.py rename to scraibe/transcript_exporter.py diff --git a/autotranscript/version.py b/scraibe/version.py similarity index 95% rename from autotranscript/version.py rename to scraibe/version.py index 0a3730e..b3cf626 100644 --- a/autotranscript/version.py +++ b/scraibe/version.py @@ -1,69 +1,69 @@ -import os -import subprocess as sp - -MAJOR = 0 -MINOR = 1 -MICRO = 0 -MICRO_POST = 0 -ISRELEASED = False -VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST) - -# Return the git revision as a string -# taken from numpy/numpy -def git_version(): - def _minimal_ext_cmd(cmd): - # construct minimal environment - env = {} - for k in ['SYSTEMROOT', 'PATH', 'HOME']: - v = os.environ.get(k) - if v is not None: - env[k] = v - - # LANGUAGE is used on win32 - env['LANGUAGE'] = 'C' - env['LANG'] = 'C' - env['LC_ALL'] = 'C' - - out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0] - return out - - try: - out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD']) - GIT_REVISION = out.strip().decode('ascii') - except OSError: - GIT_REVISION = "Unknown" - - 
return GIT_REVISION - -def _get_git_version(): - cwd = os.getcwd() - - # go to the main directory - fdir = os.path.dirname(os.path.abspath(__file__)) - maindir = os.path.abspath(os.path.join(fdir, "..")) - # maindir = fdir # os.path.join(fdir, "..") - os.chdir(maindir) - - # get git version - res = git_version() - - # restore the cwd - os.chdir(cwd) - return res - -def get_version(build_version=False): - if ISRELEASED: - return VERSION - - # unreleased version - GIT_REVISION = _get_git_version() - - if build_version: - import datetime as dt - date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S") - return VERSION + ".dev" + date - else: - return VERSION + ".dev0+" + GIT_REVISION[:7] - - - +import os +import subprocess as sp + +MAJOR = 0 +MINOR = 1 +MICRO = 0 +MICRO_POST = 0 +ISRELEASED = False +VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST) + +# Return the git revision as a string +# taken from numpy/numpy +def git_version(): + def _minimal_ext_cmd(cmd): + # construct minimal environment + env = {} + for k in ['SYSTEMROOT', 'PATH', 'HOME']: + v = os.environ.get(k) + if v is not None: + env[k] = v + + # LANGUAGE is used on win32 + env['LANGUAGE'] = 'C' + env['LANG'] = 'C' + env['LC_ALL'] = 'C' + + out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0] + return out + + try: + out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD']) + GIT_REVISION = out.strip().decode('ascii') + except OSError: + GIT_REVISION = "Unknown" + + return GIT_REVISION + +def _get_git_version(): + cwd = os.getcwd() + + # go to the main directory + fdir = os.path.dirname(os.path.abspath(__file__)) + maindir = os.path.abspath(os.path.join(fdir, "..")) + # maindir = fdir # os.path.join(fdir, "..") + os.chdir(maindir) + + # get git version + res = git_version() + + # restore the cwd + os.chdir(cwd) + return res + +def get_version(build_version=False): + if ISRELEASED: + return VERSION + + # unreleased version + GIT_REVISION = _get_git_version() + + if 
build_version: + import datetime as dt + date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S") + return VERSION + ".dev" + date + else: + return VERSION + ".dev0+" + GIT_REVISION[:7] + + + diff --git a/setup.cfg b/setup.cfg new file mode 100644 index 0000000..3e2cac7 --- /dev/null +++ b/setup.cfg @@ -0,0 +1,31 @@ +[metadata] +name = scraibe +version = attr: scraibe.__version__ +author = Jacob Schmieder +author_email = Jacob.Schmieder@dbfz.de +description = Transcription tool for audio files based on Whisper and Pyannote +long_description = file: README.md, LICENSE +platforms = Linux +keywords = transcription, speech recognition, whisper, pyannote, audio, speech-to-text, speech-to-text transcription, speech-to-text recognition, voice-to-speech +license = GPL-3.0 +classifiers = + Development Status :: 3 - Alpha + Environment :: GPU :: NVIDIA CUDA :: 11.3 + License :: OSI Approved :: GNU General Public License v3 (GPLv3) + Topic :: Scientific/Engineering :: Artificial Intelligence + Programming Language :: Python :: 3.8 + Programming Language :: Python :: 3.9 + Programming Language :: Python :: 3.10 + +[options] +zip_safe = False +include_package_data = True +packages = find: +python_requires = >=3.7 +install_requires = + requests + importlib-metadata; python_version<"3.8" + +[options.entry_points] +console_scripts = + scraibe = scraibe.cli:cli \ No newline at end of file diff --git a/setup.py b/setup.py index 05a7f77..98d178e 100644 --- a/setup.py +++ b/setup.py @@ -1,8 +1,9 @@ +from calendar import c import pkg_resources import os from setuptools import setup, find_packages -module_name = "autotranscript" +module_name = "scraibe" github_url = "https://github.com/JSchmie/autotranscript" file_dir = os.path.dirname(os.path.realpath(__file__)) @@ -18,7 +19,7 @@ with open(verfile, "r") as fp: ############### setup ############### -build_version = "AUTOTRANSCRIPT_BUILD" in os.environ +build_version = "SCRAIBE_BUILD" in os.environ if __name__ == "__main__": @@ -36,11 +37,24 @@ if __name__ == "__main__":
             'https://download.pytorch.org/whl/cu113',
         ],
         url= github_url,
-        license='',
+
+        license='GPL-3',
         author='Jacob Schmieder',
         author_email='Jacob.Schmieder@dbfz.de',
         description='Transcription tool for audio files based on Whisper and Pyannote',
-        package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
+        classifiers=[
+            'Development Status :: 3 - Alpha',
+            'Environment :: GPU :: NVIDIA CUDA :: 11.2',
+            'License :: OSI Approved :: Open Software License 3.0 (OSL-3.0)',
+            'Topic :: Scientific/Engineering :: Artificial Intelligence',
+            'Programming Language :: Python :: 3.8',
+            'Programming Language :: Python :: 3.9',
+            'Programming Language :: Python :: 3.10'],
+        keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
+                    'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
+                    'voice-to-speech'],
+        package_data={'scraibe.app' : ["*.html", "*.svg"]},
         entry_points={'console_scripts':
-            ['autotranscript = autotranscript.cli:cli']}
+            ['scraibe = scraibe.cli:cli']}
+    )
diff --git a/test_autotranscript.py b/tests/test_autotranscript.py
similarity index 95%
rename from test_autotranscript.py
rename to tests/test_autotranscript.py
index 8f745a0..475f4de 100644
--- a/test_autotranscript.py
+++ b/tests/test_autotranscript.py
@@ -1,5 +1,5 @@
 import pytest
-from autotranscript import Transcriber
+from scraibe import Transcriber
 from unittest.mock import patch, mock_open
 import os
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
 # Test Diaraization class
-from autotranscript import Diariser
+from scraibe import Diariser
 @pytest.fixture
 def diarisation():
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
 # Test AudioProcessor
-from autotranscript import AudioProcessor , TorchAudioProcessor
+from scraibe import AudioProcessor , TorchAudioProcessor
 def test_AudioProcessor_init():
diff --git a/transcribe.py b/transcribe.py
deleted file mode 100644
index 73d8838..0000000
---
a/transcribe.py
+++ /dev/null
@@ -1,38 +0,0 @@
-# import os
-# import sys
-# import traceback
-
-# class TracePrints(object):
-#     def __init__(self):
-#         self.stdout = sys.stdout
-#     def write(self, s):
-#         self.stdout.write("Writing %r\n" % s)
-#         traceback.print_stack(file=self.stdout)
-
-# sys.stdout = TracePrints()
-
-# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
-# import os
-
-# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
-# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
-
-
-from autotranscript import AutoTranscribe
-
-model = AutoTranscribe()
-
-text = model.transcribe("test.mp4")
-
-print("Transcription:\n")
-print(text)
-
-
-# from autotranscript.misc import *
-# import os
-
-# print(os.path.exists(CACHE_DIR))
-# print(os.path.exists(WHISPER_DEFAULT_PATH))
-# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
-
-# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))
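Reviewer note: the version-string rules applied by `get_version()` in the version module added earlier in this diff can be restated as a pure function, which makes the three possible outputs easier to check. This is a minimal sketch, not the project's code. `compose_version` and its `now` parameter are hypothetical names introduced here so the logic can be exercised without a git checkout or the system clock.

```python
import datetime as dt


def compose_version(version, git_revision, released=False,
                    build_version=False, now=None):
    """Hypothetical restatement of the branching in get_version().

    version       -- base version string, e.g. "0.1.0.0"
    git_revision  -- full commit hash, or "Unknown" when git is unavailable
    released      -- mirrors the ISRELEASED flag in the diff
    build_version -- mirrors the build_version argument in the diff
    now           -- assumed injection point for the timestamp (not in the diff)
    """
    if released:
        # Released builds use the bare version string.
        return version
    if build_version:
        # Build versions are stamped with the current date and time.
        now = now or dt.datetime.now()
        return version + ".dev" + now.strftime("%Y%m%d%H%M%S")
    # Otherwise the first seven characters of the revision are appended.
    return version + ".dev0+" + git_revision[:7]


print(compose_version("0.1.0.0", "0123456789abcdef"))
```

Passing the timestamp in as an argument is only a convenience for testing; the code in the diff reads the clock directly.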