removed docs to avoid conflict

2023-09-22 18:44:24 +02:00
parent 6a7c2644f2 064aac9c5b
commit 455c2e3276
24 changed files with 287 additions and 453 deletions
@@ -1,173 +0,0 @@
 # `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`
 `ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/)-based multilingual speech-to-text framework for generating fully automated transcriptions.
 Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition.
 Designed as a comprehensive AI toolkit, it uses multiple AI models:
 - [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
 - [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
 The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
 After diarization, each audio segment is processed by the OpenAI `Whisper` model, a transformer encoder-decoder. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of the segment, which lets Whisper handle both transcription and text translation.
 The following graphic illustrates the whole pipeline:
 ![Pipeline](Pictures/pipeline.png#gh-dark-mode-only) 
 ![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only) 
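The diarize-then-transcribe flow described above can be sketched in a few lines. This is only an illustration under loud assumptions: `Segment`, `run_pipeline`, `fake_diarize` and `fake_transcribe` are hypothetical stand-ins, not part of ScrAIbe, Whisper or pyannote, and the "audio" is a plain list of words so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Segment:
    """A speaker turn; start/end are indices into the toy audio sequence."""
    start: int
    end: int
    speaker: str


def run_pipeline(
    audio: Sequence[str],
    diarize: Callable[[Sequence[str]], List[Segment]],
    transcribe: Callable[[Sequence[str]], str],
) -> List[Tuple[str, str]]:
    """Diarize first, then transcribe each speaker segment on its own."""
    results = []
    for seg in diarize(audio):
        chunk = audio[seg.start:seg.end]
        results.append((seg.speaker, transcribe(chunk)))
    return results


# Hypothetical stand-ins for the diarization and speech-to-text models.
def fake_diarize(audio: Sequence[str]) -> List[Segment]:
    return [Segment(0, 2, "SPEAKER_00"), Segment(2, 4, "SPEAKER_01")]


def fake_transcribe(chunk: Sequence[str]) -> str:
    return " ".join(chunk)


audio = ["hello", "there", "good", "morning"]
transcript = run_pipeline(audio, fake_diarize, fake_transcribe)
```

Passing the two models in as callables keeps the ordering of the steps visible and makes each stage replaceable in isolation.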
 ## Install `ScrAIbe` : 
 The following command will pull and install the latest commit from this repository, along with its Python dependencies.
    pip install git+https://github.com/JSchmie/autotranscript.git
 - **Python version**: Python 3.8
 - **PyTorch version**: PyTorch 1.11.0
 - **CUDA version**: Cuda-toolkit 11.3.1
 Important: To use the `Pyannote` model, you need to be granted access on Hugging Face.
 Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to get access to the model.
 Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens). 
 ## Usage 
 We've developed ScrAIbe with several access points to cater to diverse user needs.
 ### Python usage
 The Python API gives full control over the functionality and lets you customize the process.
 Some usage examples:
 - Usage of `AutoTranscribe`, the core of the transcription system, for performing transcription and diarization of audio files.
 ```python
 from scraibe import AutoTranscribe
 model = AutoTranscribe()
 text = model.transcribe("audio.wav")
 print(f"Transcription: \n{text}")
 ```
 - Usage of `Diariser`, responsible for identifying
 and segmenting individual speakers from a given audio file.
 ```python
 from scraibe import Diariser
 model = Diariser.load_model()
 diarisation_output  = model.diarization("audio.wav")
 ```
 - Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards.
 ```python
 from scraibe import Transcriber
 transcriber = Transcriber.load_model()
 transcript  = transcriber.transcribe("audio.wav")
 transcriber.save_transcript(transcript, "path/to/save.txt")
 ```
 Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.
 ### Command-line usage
 You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:
 	scraibe audio.wav
 Some of the most important options are:
 - `--task`: Task to be performed: transcription, diarization, or translation into English. Default is transcription.
 - `--hf-token`: Personal `Hugging Face` token.
 - `--server-name`: Name of the web server. If empty, 127.0.0.1 or 0.0.0.0 is used.
 - `--port`: Port on which the Gradio app runs. Default is 7860.
 - `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to be used. Default is `medium`.
 Run the following to view all available options:
 	scraibe -h
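To make the option list above concrete, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is not the project's actual `cli()` function: the `build_parser` helper is hypothetical, and the literal value `"transcription"` for `--task` is an assumption, since the exact strings the real CLI accepts are not shown here.

```python
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser


def build_parser() -> ArgumentParser:
    """Hypothetical parser mirroring the flags documented above."""
    parser = ArgumentParser(
        prog="scraibe", formatter_class=ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("audio_files", nargs="*", help="Audio files to process.")
    # Assumed literal; the README only says the default task is transcription.
    parser.add_argument("--task", default="transcription", help="Task to perform.")
    parser.add_argument("--hf-token", default=None, help="Hugging Face token.")
    parser.add_argument("--server-name", default=None, help="Web server name.")
    parser.add_argument("--port", type=int, default=7860, help="Gradio port.")
    parser.add_argument("--whisper-model-name", default="medium",
                        help="Name of the whisper model.")
    return parser


args = build_parser().parse_args(["audio.wav", "--port", "8080"])
```

`argparse` turns dashes in flag names into underscores, so `--whisper-model-name` is read back as `args.whisper_model_name`.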
 ### Running a Docker container
 After you have installed Docker, you can execute the following commands in the terminal.
 ```
 sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]
 sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
 ```
 - `-p`: Flag for connecting the container's internal port to the port on your local machine.
 -  `--hf_token`: Flag for entering your personal HuggingFace token in the container.
 - `--start_server`: Command to start the Gradio App.
 Then click the following link to run the app:
 http://0.0.0.0:7860
 - Enabling GPU usage
 ```
 sudo docker run -it -p 7860:7860 --gpus 'all,capabilities=utility' --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
 ```
 For further guidance check: https://blog.roboflow.com/use-the-gpu-in-docker/ 
 ## Documentation 
 For further insights check the [documentation page](https://cristinaortizcruz.github.io/Test/).
 ## Contributions
 We welcome contributions and feedback. To get involved, create an issue with your feedback or feel free to contact us.
 ## Roadmap
 The following milestones are planned for further releases of ScrAIbe:
 - Model quantization   
 Quantization to improve memory and computational efficiency.
 - Model fine-tuning  
 Fine-tuning to cover a wider variety of linguistic phenomena.
 For example, currently ScrAIbe is able to transcribe word by word, but ignores filler words or speech pauses. 
 These phenomena can be addressed by fine-tuning with the corresponding data.
 - Implementation of LLMs   
 One example is the implementation of a summarization or extraction model, which enables ScrAIbe to automatically summarize or retrieve the key information out of a generated transcription, which could be the minutes of a meeting.
 - Executable for Windows
 ## Contact
 For queries, contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de).
 ## License 
 ScrAIbe is licensed under the GNU General Public License.
 ## Acknowledgments
 Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.
 ![KIDA](Pictures/kida_dark.png#gh-dark-mode-only)    &nbsp;    ![BMEL](Pictures/BMEL_dark.png#gh-dark-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](Pictures/DBFZ_dark.png#gh-dark-mode-only)   &nbsp;  &nbsp;&nbsp;&nbsp;    ![MRI](Pictures/MRI.png#gh-dark-mode-only)   
 ![KIDA](Pictures/kida.png#gh-light-mode-only)    &nbsp;    ![BMEL](Pictures/BMEL.jpg#gh-light-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](Pictures/DBFZ.png#gh-light-mode-only)   &nbsp;  &nbsp;&nbsp;&nbsp;    ![MRI](Pictures/MRI.png#gh-light-mode-only)  
@@ -1,101 +0,0 @@
 from dash import Dash, dcc, html, dash_table, Input, Output, State, callback
 import base64
 from autotranscript.app.qtfaststart import process
 from autotranscript import AutoTranscribe
 import io
 import subprocess as sp
 import numpy as np
 from autotranscript.audio import SAMPLE_RATE
 # Setup auto-transcript
 autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}
 # Setup FFmpeg
 PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"
 # Setup Dash
 external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
 app = Dash(__name__, external_stylesheets=external_stylesheets)
 app.layout = html.Div([
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
 ])
 def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    file = io.BytesIO(decoded).read()
    if filename.endswith(PROBLEMATIC_FILE_TYPES):
        # mp4 and other files need to be processed with qtfaststart
        # since their metadata is at the end of the file
        # and we need it at the beginning
        file = process(file) 
    cmd = [
            "ffmpeg",
            "-nostdin",
            "-threads", "0",
            "-i",'pipe:',
            "-f", "s16le",
            '-hide_banner',
            '-loglevel', 'error',
            "-c", "copy",
            "-vn",
            "-ac", "1",
            "-acodec", "pcm_s16le",
            "-ar", str(SAMPLE_RATE),
            "-"
        ]
    proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)
    out = proc.communicate(input=file)[0]
    out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
    out = np.array([out, SAMPLE_RATE])
    transcript = str(autot.transcribe(out))
    return html.Div([
        html.H5(f"File Name: {filename} \n" \
                "Transcript: \n"
                ),
        html.P(transcript)
    ])
@callback(Output('output-data-upload', 'children'),
              Input('upload-data', 'contents'),
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified'))
 def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children
 if __name__ == '__main__':
    app.run_server()
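For reference, the two data-handling steps in the removed `parse_contents` function (splitting the upload's data URL and scaling 16-bit PCM to floats in [-1, 1)) can be sketched with the standard library alone. The helper names `split_data_url` and `pcm16le_to_float` are hypothetical, the ffmpeg step is left out, and the original code used NumPy rather than `array`.

```python
import base64
import struct
import sys
from array import array
from typing import List, Tuple


def split_data_url(contents: str) -> Tuple[str, bytes]:
    """Split 'data:<type>;base64,<payload>' into its header and decoded bytes."""
    header, payload = contents.split(",", 1)
    return header, base64.b64decode(payload)


def pcm16le_to_float(raw: bytes) -> List[float]:
    """Convert little-endian signed 16-bit PCM to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is little-endian regardless of host
    return [s / 32768.0 for s in samples]


# Three toy samples stand in for the PCM stream that ffmpeg would emit.
raw = struct.pack("<3h", 0, 16384, -32768)
contents = "data:audio/wav;base64," + base64.b64encode(raw).decode("ascii")
header, decoded = split_data_url(contents)
samples = pcm16le_to_float(decoded)
```

Dividing by 32768 maps the int16 range onto [-1, 1), which matches the scaling used in the removed code.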
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
 setuptools~=65.6.3
 setuptools-rust~=1.5.2
 sphinx~=5.0.2         
 tqdm>=4.65.0
 gradio~=3.36.1
@@ -22,6 +20,6 @@ torch~=1.11.0
 torchvision~=0.12.0
 torchaudio~=0.11.0
 #optional: 
-#dash~=2.10.2
+#sphinx~=5.0.2 
@@ -0,0 +1 @@
 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
@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
 --------------------------------
 This module provides an interface to transcribe audio files using the 
-AutoTranscribe model. Users can either upload an audio file or record their speech 
+Scraibe model. Users can either upload an audio file or record their speech 
 live for transcription. The application supports multiple languages and provides 
 options to specify the number of speakers and the language of the audio.
@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
 --------------------------------
 This module provides an interface to transcribe audio files using the 
-AutoTranscribe model. Users can either upload an audio file or record their speech 
+Scraibe model. Users can either upload an audio file or record their speech 
 live for transcription. The application supports multiple languages and provides 
 options to specify the number of speakers and the language of the audio.
@@ -33,10 +33,13 @@ Usage:
 """
 import json
 import os
 from tkinter import CURRENT
 import gradio as gr
-from autotranscript import AutoTranscribe, Transcript
+from tqdm import tqdm
 from scraibe import Scraibe, Transcript
 theme = gr.themes.Soft(
    primary_hue="green",
@@ -59,17 +62,19 @@ LANGUAGES = [
    "Vietnamese", "Welsh"
 ]
 CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
 class GradioTranscriptionInterface:
    """
    Interface handling the interaction between Gradio UI and the Audio Transcription system.
    """
-    def __init__(self, model: AutoTranscribe):
+    def __init__(self, model: Scraibe):
        """
        Initializes the GradioTranscriptionInterface with a transcription model.
        Args:
-            model (AutoTranscribe): Model responsible for audio transcription tasks.
+            model (Scraibe): Model responsible for audio transcription tasks.
        """
        self.model = model
@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
                        translation : bool,
                        language : str):
        """
-        Shortcut method for the AutoTranscribe task.
+        Shortcut method for the Scraibe task.
        Returns:
            tuple: Transcribed text (str), JSON output (dict)
@@ -89,13 +94,43 @@ class GradioTranscriptionInterface:
            "language": language if language != "None" else None,
            "task": 'translate' if translation else None
        }
        if isinstance(source, str):
            try:
                result = self.model.autotranscribe(source, **kwargs)
            except ValueError:
                raise gr.Error("Couldn't detect any speech in the provided audio. \
                        Please try again!")
            return str(result), result.get_json()
-        try:
+        elif isinstance(source, list):
-            result = self.model.autotranscribe(source, **kwargs)
+            source_names = [s.split("/")[-1] for s in source]
-        except ValueError:
+            result = []
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
+            for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
-                    Please try again!")
+                try:
-        return str(result), result.get_json()
+                    res = self.model.autotranscribe(s, **kwargs)
                except ValueError:
                    _name = s.split("/")[-1]
                    res = f"NO TRANSCRIPT FOUND FOR {_name}"
                    gr.Warning(f"Couldn't detect any speech in {_name} will skip this file.")
                result.append(res)
            out = ''
            out_dict = {}
            for i, r in enumerate(result):
                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
                out += str(r)
                out += "\n\n"
                if isinstance(r, str):
                    out_dict[source_names[i]] = r
                else:
                    out_dict[source_names[i]] = r.get_dict()
            return out, json.dumps(out_dict, indent=4)
        else:
            raise gr.Error("Please provide a valid audio file.")
    def transcribe(self, source, translation, language):
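The multi-file branch added in this hunk follows a common pattern: process each file, record a placeholder when one fails, and return everything keyed by file name. A model-free sketch of that pattern is shown below. `transcribe_batch` and `fake_transcribe` are hypothetical names, and `os.path.basename` is used here in place of the `split("/")` call in the diff.

```python
import json
import os
from typing import Callable, Iterable


def transcribe_batch(paths: Iterable[str],
                     transcribe_one: Callable[[str], str]) -> str:
    """Transcribe each file; a failure yields a placeholder, not an abort."""
    results = {}
    for path in paths:
        name = os.path.basename(path)
        try:
            results[name] = transcribe_one(path)
        except ValueError:
            # Mirrors the diff: keep going and note the file that had no speech.
            results[name] = f"NO TRANSCRIPT FOUND FOR {name}"
    return json.dumps(results, indent=4)


def fake_transcribe(path: str) -> str:
    """Stand-in model: raises ValueError for files without speech."""
    if "silent" in path:
        raise ValueError("no speech detected")
    return "hello"


batch_json = transcribe_batch(["/tmp/a.wav", "/tmp/silent.wav"], fake_transcribe)
```

Catching the error per file means one silent recording does not discard the transcripts of the others.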
@@ -110,8 +145,28 @@ class GradioTranscriptionInterface:
            "task": 'translate' if translation == "Yes" else None
        }
-        result = self.model.transcribe(source, **kwargs)
+        if isinstance(source, str):
-        return str(result)
+            result = self.model.transcribe(source, **kwargs)
            return str(result)
        elif isinstance(source, list):
            source_names = [s.split("/")[-1] for s in source]
            result = []
            for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
                res = self.model.transcribe(s, **kwargs)
                result.append(res)
            out = ''
            for i, res in enumerate(result):
                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
                out += str(res)
                out += "\n\n"
            return out
        else:
            raise gr.Error("Please provide a valid audio file.")
    def perform_diarisation(self, source, num_speakers):
        """
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
            "num_speakers": num_speakers if num_speakers != 0 else None,
        }
        if isinstance(source, str):
            try:
                result = self.model.diarization(source, **kwargs)
            except ValueError:
                raise gr.Error("Couldn't detect any speech in the provided audio. \
                        Please try again!")
-        try:
+            return json.dumps(result, indent=2)
-            result = self.model.diarization(source, **kwargs)
+        elif isinstance(source, list):
-        except ValueError:
+            source_names = [s.split("/")[-1] for s in source]
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
+            result = []
-                    Please try again!")
+            for s in tqdm(source, total=len(source),desc = "Performing diarisation"):
-        return json.dumps(result, indent=2)
+                try:
                    res = self.model.diarization(s, **kwargs)
                except ValueError:
                    res = f"NO DIARISATION FOUND FOR {s}"
                    gr.Warning(f"Couldn't detect any speech in {s} will skip this file.")
                result.append(res)
            out = {}
            for i, res in enumerate(result):
                out[source_names[i]] = res
            return json.dumps(out, indent=4)
        else:
            raise gr.Error("Please provide a valid audio file.")
 ####
 # Gradio Interface
 ####
-def gradio_Interface(model : AutoTranscribe = None):
+def gradio_Interface(model : Scraibe = None):
    if model is None:
-        model = AutoTranscribe()
+        model = Scraibe()
    pipe = GradioTranscriptionInterface(model)
@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                    gr.update(visible = True),
                    gr.update(visible = False, value = None))
-        elif choice == "File":
+        elif choice == "File or Files":
            return (gr.update(visible = False, value = None),
                    gr.update(visible = False, value = None),
@@ -205,22 +282,42 @@ def gradio_Interface(model : AutoTranscribe = None):
                    gr.update(visible = False, value = None),
                    gr.update(visible = True))
-    def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
+    def run_scribe(task,
                   num_speakers,
                   translate,
                   language,
                   audio1,
                   audio2,
                   video1,
                   video2,
                   file_in,
                   progress = gr.Progress(track_tqdm= True)):
        # get *args which are not None
        progress(0, desc='Starting task...')
        source = audio1 or audio2 or video1 or video2 or file_in
        if isinstance(source, list):
            source = [s.name for s in source]
            if len(source) == 1:
                source = source[0]
        if task == 'Auto Transcribe':
-            
+    
            out_str , out_json = pipe.auto_transcribe(source = source,
                                num_speakers = num_speakers,
                                translation = translate,
                                language = language)
-            return (gr.update(value = out_str, visible = True),
+            if isinstance(source, str):
-                    gr.update(value = out_json, visible = True),
+                return (gr.update(value = out_str, visible = True),
-                    gr.update(visible = True),
+                        gr.update(value = out_json, visible = True),
-                    gr.update(visible = True))        
+                        gr.update(visible = True),
                        gr.update(visible = True))      
            else:
                return (gr.update(value = out_str, visible = True),
                        gr.update(value = out_json, visible = True),
                        gr.update(visible = False),
                        gr.update(visible = False))  
        elif task == 'Transcribe':
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
    with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
        # Define components
-        header = open("header.html", "r").read()
+        hname = os.path.join(CURRENT_PATH, "header.html")
        header = open(hname, "r").read()
        gr.HTML(header, visible= True, show_label=False)
        with gr.Row():
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                    leave it at None.", visible= True)
                input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video" 
-                                    ,"File"], label="Input Type", value="Upload Audio")
+                                    ,"File or Files"], label="Input Type", value="Upload Audio")
                audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
                                    interactive= True, visible= True)
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                    interactive= True, visible= False)
                video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
                                    interactive= True, visible= False)
-                file_in = gr.File(label="Upload File", interactive= True, visible= False)
+                file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
                submit = gr.Button()
@@ -1,5 +1,5 @@
 """
-AutoTranscribe Class
+Scraibe Class
 --------------------
 This class serves as the core of the transcription system, responsible for handling
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
 integration into various applications, ranging from transcription services to voice assistants.
 Available Classes:
- AutoTranscribe: Main class for performing transcription and diarization.
+- Scraibe: Main class for performing transcription and diarization.
                  Includes methods for loading models, processing audio files,
                  and formatting the transcription output.
 Usage:
-    from .autotranscribe import AutoTranscribe
+    from scraibe import Scraibe
-    model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
+    model = Scraibe()
-    transcript = model.transcribe("path/to/audiofile.wav")
+    transcript = model.autotranscribe("path/to/audiofile.wav")
 """
 # Standard Library Imports
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
 DiarisationType = TypeVar('DiarisationType')
-class AutoTranscribe:
+class Scraibe:
    """
-    AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
+    Scraibe is a class responsible for managing the transcription and diarization of audio files.
    It serves as the core of the transcription system, incorporating pretrained models
    for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
    allowing for comprehensive audio processing.
@@ -57,7 +57,7 @@ class AutoTranscribe:
        diariser (Diariser): The diariser object to handle diarization.
    Methods:
-        __init__: Initializes the AutoTranscribe class with appropriate models.
+        __init__: Initializes the Scraibe class with appropriate models.
        transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
        remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
        get_audio_file: Gets an audio file as an AudioProcessor object.
@@ -66,7 +66,7 @@ class AutoTranscribe:
                whisper_model: Union[bool, str, whisper] = None,
                dia_model : Union[bool, str, DiarisationType] = None,
                **kwargs) -> None:
-        """Initializes the AutoTranscribe class.
+        """Initializes the Scraibe class.
        Args:
            whisper_model (Union[bool, str, whisper], optional): 
@@ -92,7 +92,11 @@ class AutoTranscribe:
        else:
            self.diariser = dia_model
-        print("AutoTranscribe initialized all models successfully loaded.")
+        if kwargs.get("verbose"):
            print("Scraibe initialized all models successfully loaded.")
            self.verbose = True
        else:
            self.verbose = False
    def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
                   remove_original : bool = False,
@@ -112,7 +116,8 @@ class AutoTranscribe:
            Transcript: A Transcript object containing the transcription,
                        which can be exported to different formats.
        """
-        
+        if kwargs.get("verbose"):
            self.verbose = kwargs.get("verbose")
        # Get audio file as an AudioProcessor object
        audio_file = self.get_audio_file(audio_file)
@@ -121,12 +126,12 @@ class AutoTranscribe:
            "waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)), 
            "sample_rate": audio_file.sr
            }
-       
+
-        print("Starting diarisation.")
+        if self.verbose:
            print("Starting diarisation.")
        diarisation = self.diariser.diarization(dia_audio, **kwargs)
        if not diarisation["segments"]:
            print("No segments found. Try to run transcription without diarisation.")
@@ -138,16 +143,15 @@ class AutoTranscribe:
            return Transcript(final_transcript)
-        print("Diarisation finished. Starting transcription.")
+        if self.verbose:
            print("Diarisation finished. Starting transcription.")
        audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
        # Transcribe each segment and store the results
        final_transcript = dict()
-        
+        for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
        for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
            seg = diarisation["segments"][i]
@@ -283,4 +287,4 @@ class AutoTranscribe:
        return audio_file
    def __repr__(self):
-        return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
+        return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
@@ -1,5 +1,5 @@
 """
-Command-Line Interface (CLI) for the AutoTranscribe class,
+Command-Line Interface (CLI) for the Scraibe class,
 allowing for user interaction to transcribe and diarize audio files. 
 The function includes arguments for specifying the audio files, model paths,
 output formats, and other options necessary for transcription.
@@ -8,9 +8,7 @@ import os
 from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
 import json
-from sympy import use
+from .autotranscript import Scraibe
 from .autotranscript import AutoTranscribe
 from .app.gradio_app import gradio_Interface
 from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
@@ -20,12 +18,12 @@ from torch import set_num_threads
 def cli():
    """
-    Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe 
+    Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe 
    and diarize audio files. The function includes arguments for specifying the audio files, model paths, 
    output formats, and other options necessary for transcription.
    This function can be executed from the command line to perform transcription tasks, providing a 
-    user-friendly way to access the AutoTranscribe class functionalities.
+    user-friendly way to access the Scraibe class functionalities.
    """
    def str2bool(string):
@@ -115,7 +113,7 @@ def cli():
    if arg_dict["whisper_model_directory"]:
        class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")
-    model = AutoTranscribe(**class_kwargs)
+    model = Scraibe(**class_kwargs)
    if arg_dict["audio_files"]:
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
 PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
 PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")
 def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
    """Configure diarization pipeline from a YAML file.
@@ -90,8 +90,8 @@ class Transcriber:
        kwargs = self._get_whisper_kwargs(**kwargs)
-        if "verbose" not in kwargs:
+        if not kwargs.get("verbose"):
-            kwargs["verbose"] = False    
+            kwargs["verbose"] = None 
        result = self.model.transcribe(audio, *args, **kwargs)
        return result["text"]
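The change from `"verbose" not in kwargs` to `not kwargs.get("verbose")` alters which inputs are overridden. The sketch below isolates the two checks; `old_default` and `new_default` are hypothetical helper names. As far as Whisper's `transcribe` docstring describes it, `verbose=None` prints nothing while `verbose=False` still shows minimal output, which appears to be the motivation for the new default.

```python
def old_default(kwargs: dict) -> dict:
    """Previous behaviour: only fill in a default when the key is absent."""
    kwargs = dict(kwargs)
    if "verbose" not in kwargs:
        kwargs["verbose"] = False
    return kwargs


def new_default(kwargs: dict) -> dict:
    """New behaviour: any falsy value (missing, False, None) becomes None."""
    kwargs = dict(kwargs)
    if not kwargs.get("verbose"):
        kwargs["verbose"] = None
    return kwargs


missing_old = old_default({})["verbose"]
missing_new = new_default({})["verbose"]
explicit_false_new = new_default({"verbose": False})["verbose"]
explicit_true_new = new_default({"verbose": True})["verbose"]
```

One consequence is that an explicit `verbose=False` from the caller is now rewritten to `None` as well.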
@@ -173,6 +173,9 @@ class Transcriber:
        if (task := kwargs.get("task")):
            whisper_kwargs["task"] = task
        if (language := kwargs.get("language")):
            whisper_kwargs["language"] = language 
        return whisper_kwargs
    def __repr__(self) -> str:
@@ -1,69 +1,69 @@
-import os
+import os
-import subprocess as sp
+import subprocess as sp
-
+
-MAJOR = 0
+MAJOR = 0
-MINOR = 1
+MINOR = 1
-MICRO = 0
+MICRO = 0
-MICRO_POST = 0
+MICRO_POST = 0
-ISRELEASED = False
+ISRELEASED = False
-VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
+VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
-
+
-# Return the git revision as a string
+# Return the git revision as a string
-# taken from numpy/numpy
+# taken from numpy/numpy
-def git_version():
+def git_version():
-    def _minimal_ext_cmd(cmd):
+    def _minimal_ext_cmd(cmd):
-        # construct minimal environment
+        # construct minimal environment
-        env = {}
+        env = {}
-        for k in ['SYSTEMROOT', 'PATH', 'HOME']:
+        for k in ['SYSTEMROOT', 'PATH', 'HOME']:
-            v = os.environ.get(k)
+            v = os.environ.get(k)
-            if v is not None:
+            if v is not None:
-                env[k] = v
+                env[k] = v
-
+
-        # LANGUAGE is used on win32
+        # LANGUAGE is used on win32
-        env['LANGUAGE'] = 'C'
+        env['LANGUAGE'] = 'C'
-        env['LANG'] = 'C'
+        env['LANG'] = 'C'
-        env['LC_ALL'] = 'C'
+        env['LC_ALL'] = 'C'
-
+
-        out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
+        out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
-        return out
+        return out
-
+
-    try:
+    try:
-        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
+        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
-        GIT_REVISION = out.strip().decode('ascii')
+        GIT_REVISION = out.strip().decode('ascii')
-    except OSError:
+    except OSError:
-        GIT_REVISION = "Unknown"
+        GIT_REVISION = "Unknown"
-
+
-    return GIT_REVISION
+    return GIT_REVISION
-
+
-def _get_git_version():
+def _get_git_version():
-    cwd = os.getcwd()
+    cwd = os.getcwd()
-
+
-    # go to the main directory
+    # go to the main directory
-    fdir = os.path.dirname(os.path.abspath(__file__))
+    fdir = os.path.dirname(os.path.abspath(__file__))
-    maindir = os.path.abspath(os.path.join(fdir, ".."))
+    maindir = os.path.abspath(os.path.join(fdir, ".."))
-    # maindir = fdir # os.path.join(fdir, "..")
+    # maindir = fdir # os.path.join(fdir, "..")
-    os.chdir(maindir)
+    os.chdir(maindir)
-
+
-    # get git version
+    # get git version
-    res = git_version()
+    res = git_version()
-
+
-    # restore the cwd
+    # restore the cwd
-    os.chdir(cwd)
+    os.chdir(cwd)
-    return res
+    return res
-
+
-def get_version(build_version=False):
+def get_version(build_version=False):
-    if ISRELEASED:
+    if ISRELEASED:
-        return VERSION
+        return VERSION
-
+
-    # unreleased version
+    # unreleased version
-    GIT_REVISION = _get_git_version()
+    GIT_REVISION = _get_git_version()
-
+
-    if build_version:
+    if build_version:
-        import datetime as dt
+        import datetime as dt
-        date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
+        date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
-        return VERSION + ".dev" + date
+        return VERSION + ".dev" + date
-    else:
+    else:
-        return VERSION + ".dev0+" + GIT_REVISION[:7]
+        return VERSION + ".dev0+" + GIT_REVISION[:7]
-
+
-
+
-
+
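The version string assembled by `get_version` above has two development forms. The sketch below reproduces only the string formatting, without calling git. `format_version` is a hypothetical helper, and the revision and timestamp values are made up for illustration.

```python
from typing import Optional


def format_version(version: str, git_revision: str,
                   build_stamp: Optional[str] = None) -> str:
    """Format an unreleased version the way get_version() above does."""
    if build_stamp is not None:
        # Build versions carry a timestamp instead of the git revision.
        return f"{version}.dev{build_stamp}"
    # Otherwise append the first seven characters of the revision.
    return f"{version}.dev0+{git_revision[:7]}"


VERSION = "%d.%d.%d.%d" % (0, 1, 0, 0)
dev_version = format_version(VERSION, "0123456789abcdef")
build_version = format_version(VERSION, "0123456789abcdef", "20230922184424")
fallback_version = format_version(VERSION, "Unknown")
```

Because `"Unknown"` is exactly seven characters long, the fallback revision passes through the `[:7]` slice unchanged.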
@@ -0,0 +1,31 @@
 [metadata]
 name = scraibe
 version = attr: scraibe.__version__
 author = Jacob Schmieder
 author_email = Jacob.Schmieder@dbfz.de
 description = Transcription tool for audio files based on Whisper and Pyannote
 long_description = file: README.md, LICENSE
 platforms = Linux
 keywords = transcription, speech recognition, whisper, pyannote, audio, speech-to-text, speech-to-text transcription, speech-to-text recognition, voice-to-speech
 license = GPL-3.0
 classifiers =
    Development Status :: 3 - Alpha
    Environment :: GPU :: NVIDIA CUDA :: 11.2
    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
    Topic :: Scientific/Engineering :: Artificial Intelligence
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
 [options]
 zip_safe = False
 include_package_data = True
 packages = find:
 python_requires = >=3.8
 install_requires =
    requests
    importlib-metadata; python_version<"3.8"
 [options.entry_points]
 console_scripts =
    scraibe = scraibe.cli:cli
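Review note: the `console_scripts` entry in `setup.cfg` should name the same command as the `entry_points` argument in `setup.py` (`scraibe = scraibe.cli:cli`). The sketch below shows one way to read the entries from a `setup.cfg` with the standard-library `configparser` so the two can be compared. The embedded config text is an illustrative excerpt, and `console_scripts` is a hypothetical helper, not part of the package.

```python
import configparser

# Illustrative excerpt only, not the full setup.cfg from this commit.
SETUP_CFG = """
[metadata]
name = scraibe
license = GPL-3.0

[options.entry_points]
console_scripts =
    scraibe = scraibe.cli:cli
"""


def console_scripts(cfg_text: str) -> dict:
    """Map each console script name to its 'module:function' target."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the empty first line of the multi-line value
        name, _, target = line.partition("=")
        scripts[name.strip()] = target.strip()
    return scripts


print(console_scripts(SETUP_CFG))
```

A check like this can run in CI to catch a placeholder command name before a release.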
@@ -1,8 +1,9 @@
 from calendar import c
 import pkg_resources
 import os
 from setuptools import setup, find_packages
-module_name = "autotranscript"
+module_name = "scraibe"
 github_url = "https://github.com/JSchmie/autotranscript"
 file_dir = os.path.dirname(os.path.realpath(__file__))
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:
 ############### setup ###############
-build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
+build_version = "SCRAIBE_BUILD" in os.environ
 if __name__ == "__main__":
@@ -36,11 +37,24 @@ if __name__ == "__main__":
            'https://download.pytorch.org/whl/cu113',
            ],
        url= github_url,
-        license='',
+        
        license='GPL-3',
        author='Jacob Schmieder',
        author_email='Jacob.Schmieder@dbfz.de',
        description='Transcription tool for audio files based on Whisper and Pyannote',
-        package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
+        classifiers=[
            'Development Status :: 3 - Alpha',
            'Environment :: GPU :: NVIDIA CUDA :: 11.2',
            'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
            'Topic :: Scientific/Engineering :: Artificial Intelligence',
            'Programming Language :: Python :: 3.8',
            'Programming Language :: Python :: 3.9',
            'Programming Language :: Python :: 3.10'],
        keywords = ['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
                    'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
                    'voice-to-speech'],
        package_data={'scraibe.app' : ["*.html", "*.svg"]},
        entry_points={'console_scripts':
-            ['autotranscript = autotranscript.cli:cli']}
+            ['scraibe = scraibe.cli:cli']}
    )
@@ -1,5 +1,5 @@
 import pytest
-from autotranscript import Transcriber
+from scraibe import Transcriber
 from unittest.mock import patch, mock_open
 import os
@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
 # Test Diarisation class
-from autotranscript import Diariser
+from scraibe import Diariser
 @pytest.fixture
 def diarisation():
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):
 # Test AudioProcessor
-from autotranscript import AudioProcessor , TorchAudioProcessor
+from scraibe import AudioProcessor , TorchAudioProcessor
 def test_AudioProcessor_init():
@@ -1,38 +0,0 @@
 # import os
 # import sys
 # import traceback
 # class TracePrints(object):
 #   def __init__(self):    
 #     self.stdout = sys.stdout
 #   def write(self, s):
 #     self.stdout.write("Writing %r\n" % s)
 #     traceback.print_stack(file=self.stdout)
 # sys.stdout = TracePrints()
 # os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
 # import os
 # os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
 # os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
 from autotranscript import AutoTranscribe
 model = AutoTranscribe()
 text = model.transcribe("test.mp4")
 print("Transcription:\n")
 print(text)
 # from autotranscript.misc import *
 # import os
 # print(os.path.exists(CACHE_DIR))
 # print(os.path.exists(WHISPER_DEFAULT_PATH))
 # print(os.path.exists(PYANNOTE_DEFAULT_PATH))
 # print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))