removed docs to avoid conflict

2023-09-22 18:44:24 +02:00
parent 6a7c2644f2 064aac9c5b
commit 455c2e3276
24 changed files with 287 additions and 453 deletions
@@ -1,173 +0,0 @@
-
-# `ScrAIbe: Streamlined Conversation Recording with Automated Intelligence Based Environment`
-
-`ScrAIbe` is a state-of-the-art, [PyTorch](https://pytorch.org/)-based multilingual speech-to-text framework that generates fully automated transcriptions.
-
-Beyond transcription, ScrAIbe supports advanced functions, such as speaker diarization and speaker recognition.
-
-Designed as a comprehensive AI toolkit, it uses multiple AI models:
-
- [whisper](https://github.com/openai/whisper): A general-purpose speech recognition model.
- [pyannote-audio](https://github.com/pyannote/pyannote-audio): An open-source toolkit for speaker diarization.
-
-The framework utilizes a PyanNet-inspired pipeline with the `Pyannote` library for speaker diarization and `VoxCeleb` for speaker embedding.
-
-After diarization, each audio segment is processed by OpenAI's `Whisper` model, which has a transformer encoder-decoder structure. Initially, a CNN mitigates noise and enhances speech. Before transcription, `VoxLingua` identifies the language of each segment, enabling Whisper to handle both transcription and translation.
-
-The following graphic illustrates the whole pipeline:
-
-![Pipeline](Pictures/pipeline.png#gh-dark-mode-only) 
-![Pipeline](Pictures/pipeline_light.png#gh-light-mode-only) 
-
-## Install `ScrAIbe`
-
-The following command will pull and install the latest commit from this repository, along with its Python dependencies.
-
-    pip install git+https://github.com/JSchmie/autotranscript.git
-
- **Python version**: Python 3.8
- **PyTorch version**: PyTorch 1.11.0
- **CUDA version**: CUDA Toolkit 11.3.1
-
-
-Important: To use the `Pyannote` model, you need to be granted access on Hugging Face.
-Check the [Pyannote model page](https://huggingface.co/pyannote/speaker-diarization) to request access to the model.
-
-Additionally, you need to generate a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens). 
-
-## Usage 
-
-ScrAIbe provides several entry points to suit different user needs.
-
-### Python usage
-
-The Python API gives full control over ScrAIbe's functionality and allows the process to be customized.
-
-Some usage examples:
-
- Usage of `AutoTranscribe`, the core of the transcription system, for transcribing and diarizing audio files.
-
-```python
-from scraibe import AutoTranscribe
-
-model = AutoTranscribe()
-
-text = model.transcribe("audio.wav")
-
-print(f"Transcription: \n{text}")
-
-```
- Usage of `Diariser`, responsible for identifying
-and segmenting individual speakers from a given audio file.
- 
-```python
-from scraibe import Diariser
-
-model = Diariser.load_model()
-
-diarisation_output = model.diarization("audio.wav")
-
-```
- Usage of `Transcriber`, for transcribing audio files and saving the transcription afterwards.
-
-```python
-from scraibe import Transcriber
-
-transcriber = Transcriber.load_model()
-
-transcript = transcriber.transcribe("audio.wav")
-
-transcriber.save_transcript(transcript, "path/to/save.txt")
-
-```
-
-
-Refer to [whisper](https://github.com/openai/whisper) and [pyannote-audio](https://github.com/pyannote/pyannote-audio) for further options.
-
-### Command-line usage
-
-You can also run ScrAIbe in a [Gradio App](https://github.com/gradio-app/gradio) interface using the following command:
-
-	scraibe audio.wav
-
-Some important options are:
-
- `--task`: Task to perform: transcription, diarization, or translation into English. Default is transcription.
- `--hf-token`: Personal `Hugging Face` token.
- `--server-name`: Name of the web server. If empty, 127.0.0.1 or 0.0.0.0 is used.
- `--port`: Port on which to run the Gradio app. Default is 7860.
-
- `--whisper-model-name`: Name of the [whisper](https://github.com/openai/whisper) model to use. Default is `medium`.
-
-
-Run the following to view all available options:
-		
-	scraibe -h
-
-### Running a Docker container
-
-After you have installed Docker, you can execute the following commands in the terminal.
-
-```
-sudo docker build . --build-arg="hf_token=[enter your HuggingFace token]" -t [image name]
-
-sudo docker run -it -p 7860:7860 --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
-
-```
- `-p`: Flag that maps the container's internal port to a port on your local machine.
- `--hf_token`: Flag for passing your personal HuggingFace token to the container.
- `--start_server`: Command to start the Gradio app.
-
-Then open the following link in your browser to use the app:
-
-http://localhost:7860
-
- Enabling GPU usage
-
-```
-sudo docker run -it -p 7860:7860 --gpus all --name [container name] [image name] --hf_token [enter your HuggingFace token] --start_server
-```
-For further guidance, see: https://blog.roboflow.com/use-the-gpu-in-docker/
-
-
-## Documentation 
-
-For further details, see the [documentation page](https://cristinaortizcruz.github.io/Test/).
-
-## Contributions
-
-We welcome contributions and feedback: create an issue with your feedback or feel free to contact us.
-
-## Roadmap
-
-The following milestones are planned for further releases of ScrAIbe:
-
- Model quantization   
-Quantization to improve memory and computational efficiency.
-
- Model fine-tuning  
-To cover a wider variety of linguistic phenomena.
-
-For example, ScrAIbe currently transcribes word by word but ignores filler words and speech pauses.
-These phenomena can be addressed by fine-tuning on corresponding data.
-
- Implementation of LLMs   
-One example is a summarization or extraction model, which would let ScrAIbe automatically summarize or retrieve the key information from a generated transcription, such as the minutes of a meeting.
-
- Executable for Windows
-
-## Contact
-
-For queries, contact [Jacob Schmieder](mailto:Jacob.Schmieder@dbfz.de)
-
-## License 
-
-ScrAIbe is licensed under GNU General Public License.
-
-## Acknowledgments
-
-Special thanks go to the KIDA project and the BMEL (Bundesministerium für Ernährung und Landwirtschaft), especially to the AI Consultancy Team and the Infrastructure Team.
-
-![KIDA](Pictures/kida_dark.png#gh-dark-mode-only)    &nbsp;    ![BMEL](Pictures/BMEL_dark.png#gh-dark-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](Pictures/DBFZ_dark.png#gh-dark-mode-only)   &nbsp;  &nbsp;&nbsp;&nbsp;    ![MRI](Pictures/MRI.png#gh-dark-mode-only)   
-
-![KIDA](Pictures/kida.png#gh-light-mode-only)    &nbsp;    ![BMEL](Pictures/BMEL.jpg#gh-light-mode-only) &nbsp;&nbsp;&nbsp;&nbsp; ![DBFZ](Pictures/DBFZ.png#gh-light-mode-only)   &nbsp;  &nbsp;&nbsp;&nbsp;    ![MRI](Pictures/MRI.png#gh-light-mode-only)  
@@ -1,101 +0,0 @@
-from dash import Dash, dcc, html, dash_table, Input, Output, State, callback
-
-import base64
-from autotranscript.app.qtfaststart import process
-from autotranscript import AutoTranscribe
-import io
-import subprocess as sp
-import numpy as np
-from autotranscript.audio import SAMPLE_RATE
-
-# Setup auto-transcript
-autot = AutoTranscribe() # whisper_model="tiny", whisper_kwargs={"local" : False}
-
-# Setup FFmpeg
-PROBLEMATIC_FILE_TYPES : tuple = "mov","mp4","m4a","3gp","3g2","mj2"
-
-
-# Setup Dash
-external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
-
-app = Dash(__name__, external_stylesheets=external_stylesheets)
-
-app.layout = html.Div([
-    dcc.Upload(
-        id='upload-data',
-        children=html.Div([
-            'Drag and Drop or ',
-            html.A('Select Files')
-        ]),
-        style={
-            'width': '100%',
-            'height': '60px',
-            'lineHeight': '60px',
-            'borderWidth': '1px',
-            'borderStyle': 'dashed',
-            'borderRadius': '5px',
-            'textAlign': 'center',
-            'margin': '10px'
-        },
-        # Allow multiple files to be uploaded
-        multiple=True
-    ),
-    html.Div(id='output-data-upload'),
-])
-
-def parse_contents(contents, filename, date):
-    content_type, content_string = contents.split(',')
-
-    decoded = base64.b64decode(content_string)
-    file = io.BytesIO(decoded).read()
-    
-    if filename.endswith(PROBLEMATIC_FILE_TYPES):
-        # mp4 and other files need to be processed with qtfaststart
-        # since their metadata is at the end of the file
-        # and we need it at the beginning
-        file = process(file) 
-
-    cmd = [
-            "ffmpeg",
-            "-nostdin",
-            "-threads", "0",
-            "-i",'pipe:',
-            "-f", "s16le",
-            '-hide_banner',
-            '-loglevel', 'error',
-            "-c", "copy",
-            "-vn",
-            "-ac", "1",
-            "-acodec", "pcm_s16le",
-            "-ar", str(SAMPLE_RATE),
-            "-"
-        ]
-    
-    proc = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE)
-    
-    out = proc.communicate(input=file)[0]
-    out = np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
-    out = np.array([out, SAMPLE_RATE])
-    
-    transcript = str(autot.transcribe(out))
-    
-    return html.Div([
-        html.H5(f"File Name: {filename} \n" \
-                "Transcript: \n"
-                ),
-        html.P(transcript)
-    ])
-
-@callback(Output('output-data-upload', 'children'),
-              Input('upload-data', 'contents'),
-              State('upload-data', 'filename'),
-              State('upload-data', 'last_modified'))
-def update_output(list_of_contents, list_of_names, list_of_dates):
-    if list_of_contents is not None:
-        children = [
-            parse_contents(c, n, d) for c, n, d in
-            zip(list_of_contents, list_of_names, list_of_dates)]
-        return children
-
-if __name__ == '__main__':
-    app.run_server()
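The removed Dash app converts FFmpeg's raw `s16le` output into normalized floats by dividing each 16-bit sample by 32768. A minimal stdlib-only sketch of that decoding step, assuming mono, signed 16-bit, little-endian PCM as requested by `-f s16le -ac 1 -acodec pcm_s16le`; the function name `pcm_s16le_to_float` is hypothetical and not part of ScrAIbe.

```python
import struct
from typing import List


def pcm_s16le_to_float(raw: bytes) -> List[float]:
    """Decode mono, signed 16-bit, little-endian PCM into floats in [-1.0, 1.0).

    Mirrors ``np.frombuffer(raw, np.int16).astype(np.float32) / 32768.0`` from
    the removed app, using only the standard library.
    """
    if len(raw) % 2:
        # An odd trailing byte cannot form a complete 16-bit sample.
        raise ValueError("PCM byte stream length must be even")
    count = len(raw) // 2
    # '<' forces little-endian byte order; 'h' is a signed 16-bit integer.
    samples = struct.unpack(f"<{count}h", raw)
    # 32768 = 2**15, so the most negative sample maps to exactly -1.0.
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    demo = struct.pack("<3h", 0, 16384, -32768)
    print(pcm_s16le_to_float(demo))  # [0.0, 0.5, -1.0]
```

Unlike `np.frombuffer(..., np.int16)`, which uses the machine's native byte order, the explicit `<` format makes the little-endian assumption visible regardless of platform.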
@@ -9,8 +9,6 @@ pyannote.pipeline~=2.3
 setuptools~=65.6.3
 setuptools-rust~=1.5.2

-sphinx~=5.0.2         
-
 tqdm>=4.65.0

 gradio~=3.36.1
@@ -22,6 +20,6 @@ torch~=1.11.0
 torchvision~=0.12.0
 torchaudio~=0.11.0
 #optional: 
-#dash~=2.10.2
+#sphinx~=5.0.2 


@@ -0,0 +1 @@
+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
@@ -3,7 +3,7 @@ Gradio Audio Transcription App.
 --------------------------------

 This module provides an interface to transcribe audio files using the 
-AutoTranscribe model. Users can either upload an audio file or record their speech 
+Scraibe model. Users can either upload an audio file or record their speech 
 live for transcription. The application supports multiple languages and provides 
 options to specify the number of speakers and the language of the audio.

@@ -20,7 +20,7 @@ Gradio Audio Transcription App.
 --------------------------------

 This module provides an interface to transcribe audio files using the 
-AutoTranscribe model. Users can either upload an audio file or record their speech 
+Scraibe model. Users can either upload an audio file or record their speech 
 live for transcription. The application supports multiple languages and provides 
 options to specify the number of speakers and the language of the audio.

@@ -33,10 +33,13 @@ Usage:
 """

 import json
+import os

 import gradio as gr
-from autotranscript import AutoTranscribe, Transcript
+from tqdm import tqdm

+from scraibe import Scraibe, Transcript

 theme = gr.themes.Soft(
    primary_hue="green",
@@ -59,17 +62,19 @@ LANGUAGES = [
    "Vietnamese", "Welsh"
 ]

+CURRENT_PATH = os.path.dirname(os.path.realpath(__file__))
+
 class GradioTranscriptionInterface:
    """
    Interface handling the interaction between Gradio UI and the Audio Transcription system.
    """

-    def __init__(self, model: AutoTranscribe):
+    def __init__(self, model: Scraibe):
        """
        Initializes the GradioTranscriptionInterface with a transcription model.

        Args:
-            model (AutoTranscribe): Model responsible for audio transcription tasks.
+            model (Scraibe): Model responsible for audio transcription tasks.
        """
        self.model = model

@@ -78,7 +83,7 @@ class GradioTranscriptionInterface:
                        translation : bool,
                        language : str):
        """
-        Shortcut method for the AutoTranscribe task.
+        Shortcut method for the Scraibe task.

        Returns:
            tuple: Transcribed text (str), JSON output (dict)
@@ -89,13 +94,43 @@ class GradioTranscriptionInterface:
            "language": language if language != "None" else None,
            "task": 'translate' if translation else None
        }
+        if isinstance(source, str):
+            try:
+                result = self.model.autotranscribe(source, **kwargs)
+            except ValueError:
+                raise gr.Error("Couldn't detect any speech in the provided audio. "
+                               "Please try again!")
+            
+            return str(result), result.get_json()
        
-        try:
-            result = self.model.autotranscribe(source, **kwargs)
-        except ValueError:
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
-                    Please try again!")
-        return str(result), result.get_json()
+        elif isinstance(source, list):
+            source_names = [os.path.basename(s) for s in source]
+            result = []
+            for s in tqdm(source, total=len(source), desc="Transcribing audio files"):
+                try:
+                    res = self.model.autotranscribe(s, **kwargs)
+                except ValueError:
+                    _name = os.path.basename(s)
+                    res = f"NO TRANSCRIPT FOUND FOR {_name}"
+                    gr.Warning(f"Couldn't detect any speech in {_name}; skipping this file.")
+                result.append(res)
+            
+            out = ''
+            out_dict = {}
+            for i, r in enumerate(result):
+                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
+                out += str(r)
+                out += "\n\n"
+                
+                if isinstance(r, str):
+                    out_dict[source_names[i]] = r
+                else:
+                    out_dict[source_names[i]] = r.get_dict()
+              
+            return out, json.dumps(out_dict, indent=4)
+        
+        else:
+            raise gr.Error("Please provide a valid audio file.")


    def transcribe(self, source, translation, language):
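The new list branch of `auto_transcribe` transcribes each file in turn, substitutes a placeholder when no speech is found, and collects both a combined text and a per-file JSON mapping. A minimal sketch of that aggregation pattern decoupled from Gradio, assuming (as in the diff) that a `ValueError` signals "no speech"; `transcribe_batch` and its `fn` callback are hypothetical, and `fn` returns plain strings here, whereas the diff stores `get_dict()` of each successful `Transcript`.

```python
import json
import os
from typing import Callable, Dict, List, Tuple


def transcribe_batch(paths: List[str],
                     fn: Callable[[str], str]) -> Tuple[str, str]:
    """Run ``fn`` on each path and return (combined text, JSON string).

    A ValueError from ``fn`` is treated as "no speech found": the file gets a
    placeholder entry instead of aborting the whole batch.
    """
    text_parts = []
    per_file: Dict[str, str] = {}
    for i, path in enumerate(paths):
        # os.path.basename uses the platform's separator(s), unlike split("/").
        name = os.path.basename(path)
        try:
            result = fn(path)
        except ValueError:
            result = f"NO TRANSCRIPT FOUND FOR {name}"
        text_parts.append(f"TRANSCRIPT {i} FOR ({name}):\n\n{result}\n\n")
        # Keyed by file name as in the diff; duplicate names overwrite.
        per_file[name] = result
    return "".join(text_parts), json.dumps(per_file, indent=4)


if __name__ == "__main__":
    def fake_transcribe(path: str) -> str:
        # Hypothetical stand-in for Scraibe.autotranscribe.
        if path.endswith("silent.wav"):
            raise ValueError("no speech")
        return "hello"

    text, as_json = transcribe_batch(["/tmp/a.wav", "/tmp/silent.wav"],
                                     fake_transcribe)
    print(text)
    print(as_json)
```

Catching the error per file keeps one silent recording from discarding the results of the rest of the batch; the trade-off of keying by base name is that two uploads with the same name collapse into one JSON entry.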
@@ -110,8 +145,28 @@ class GradioTranscriptionInterface:
            "task": 'translate' if translation == "Yes" else None
        }
        
-        result = self.model.transcribe(source, **kwargs)
-        return str(result)
+        if isinstance(source, str):
+            result = self.model.transcribe(source, **kwargs)
+            
+            return str(result)
+        
+        elif isinstance(source, list):
+            source_names = [os.path.basename(s) for s in source]
+            result = []
+            for s in tqdm(source, total=len(source),desc = "Transcribing audio files"):
+                res = self.model.transcribe(s, **kwargs)
+                result.append(res)
+            
+            out = ''
+            for i, res in enumerate(result):
+                out += f"TRANSCRIPT {i} FOR ({source_names[i]}):\n\n"
+                out += str(res)
+                out += "\n\n"
+            
+            return out
+        
+        else:
+            raise gr.Error("Please provide a valid audio file.")

    def perform_diarisation(self, source, num_speakers):
        """
@@ -124,22 +179,44 @@ class GradioTranscriptionInterface:
            "num_speakers": num_speakers if num_speakers != 0 else None,
        }
        
+        if isinstance(source, str):
+            try:
+                result = self.model.diarization(source, **kwargs)
+            except ValueError:
+                raise gr.Error("Couldn't detect any speech in the provided audio. "
+                               "Please try again!")
        
-        try:
-            result = self.model.diarization(source, **kwargs)
-        except ValueError:
-            raise gr.Error("Couldn't detect any speech in the provided audio. \
-                    Please try again!")
-        return json.dumps(result, indent=2)
+            return json.dumps(result, indent=2)
+        elif isinstance(source, list):
+            source_names = [os.path.basename(s) for s in source]
+            result = []
+            for s in tqdm(source, total=len(source), desc="Performing diarisation"):
+                try:
+                    res = self.model.diarization(s, **kwargs)
+                except ValueError:
+                    res = f"NO DIARISATION FOUND FOR {os.path.basename(s)}"
+                    gr.Warning(f"Couldn't detect any speech in {os.path.basename(s)}; skipping this file.")
+                result.append(res)
+            
+            out = {}
+            
+            for i, res in enumerate(result):
+                out[source_names[i]] = res
+                
+            return json.dumps(out, indent=4)
+        
+        else:
+            raise gr.Error("Please provide a valid audio file.")
+            

 ####
 # Gradio Interface
 ####

-def gradio_Interface(model : AutoTranscribe = None):
+def gradio_Interface(model : Scraibe = None):
    
    if model is None:
-        model = AutoTranscribe()
+        model = Scraibe()
        
    pipe = GradioTranscriptionInterface(model)

@@ -197,7 +274,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                    gr.update(visible = True),
                    gr.update(visible = False, value = None))
            
-        elif choice == "File":
+        elif choice == "File or Files":
            
            return (gr.update(visible = False, value = None),
                    gr.update(visible = False, value = None),
@@ -205,22 +282,42 @@ def gradio_Interface(model : AutoTranscribe = None):
                    gr.update(visible = False, value = None),
                    gr.update(visible = True))

-    def run_scribe(task, num_speakers, translate, language, audio1, audio2, video1, video2, file_in, progress = gr.Progress(track_tqdm= True)):
+    def run_scribe(task,
+                   num_speakers,
+                   translate,
+                   language,
+                   audio1,
+                   audio2,
+                   video1,
+                   video2,
+                   file_in,
+                   progress = gr.Progress(track_tqdm= True)):
        # get *args which are not None
        progress(0, desc='Starting task...')
        source = audio1 or audio2 or video1 or video2 or file_in
        
+        if isinstance(source, list):
+            source = [s.name for s in source]
+            if len(source) == 1:
+                source = source[0]
+ 
        if task == 'Auto Transcribe':
-            
+    
            out_str , out_json = pipe.auto_transcribe(source = source,
                                num_speakers = num_speakers,
                                translation = translate,
                                language = language)
            
-            return (gr.update(value = out_str, visible = True),
-                    gr.update(value = out_json, visible = True),
-                    gr.update(visible = True),
-                    gr.update(visible = True))        
+            if isinstance(source, str):
+                return (gr.update(value = out_str, visible = True),
+                        gr.update(value = out_json, visible = True),
+                        gr.update(visible = True),
+                        gr.update(visible = True))      
+            else:
+                return (gr.update(value = out_str, visible = True),
+                        gr.update(value = out_json, visible = True),
+                        gr.update(visible = False),
+                        gr.update(visible = False))  
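`run_scribe` now receives a list of uploaded file objects from `gr.Files`, maps them to their `.name` paths, and unwraps a one-element list so that a single upload takes the same single-file path as a recording. A sketch of that normalization as a pure function; `normalize_source` is hypothetical, and any object with a `.name` attribute stands in for Gradio's uploaded temp-file wrapper (an assumption about the component's return type).

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Union


def normalize_source(*candidates: Any) -> Optional[Union[str, List[str]]]:
    """Return the first truthy input, turning uploaded file objects into paths."""
    # Like `audio1 or audio2 or video1 or video2 or file_in`, but returns None
    # (rather than the last candidate) when every input is empty.
    source = next((c for c in candidates if c), None)
    if isinstance(source, list):
        paths = [f.name for f in source]
        # A single upload is handled exactly like a single recorded file.
        return paths[0] if len(paths) == 1 else paths
    return source


if __name__ == "__main__":
    # SimpleNamespace stands in for Gradio's uploaded temp-file objects.
    one = [SimpleNamespace(name="/tmp/a.wav")]
    many = [SimpleNamespace(name="/tmp/a.wav"), SimpleNamespace(name="/tmp/b.wav")]
    print(normalize_source(None, None, None, None, one))   # /tmp/a.wav
    print(normalize_source(None, None, None, None, many))  # ['/tmp/a.wav', '/tmp/b.wav']
```

Collapsing the one-element list keeps the downstream `isinstance(source, str)` dispatch in `auto_transcribe`, `transcribe`, and `perform_diarisation` as the only branching point.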
            
        elif task == 'Transcribe':
            
@@ -255,7 +352,8 @@ def gradio_Interface(model : AutoTranscribe = None):
    with gr.Blocks(theme=theme,title='ScrAIbe: Automatic Audio Transcription') as demo:
            
        # Define components
-        header = open("header.html", "r").read()
+        hname = os.path.join(CURRENT_PATH, "header.html")
+        header = open(hname, "r").read()
        gr.HTML(header, visible= True, show_label=False)
        
        with gr.Row():
@@ -279,7 +377,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                    leave it at None.", visible= True)
                
                input = gr.Radio(["Upload Audio", "Record Audio", "Upload Video","Record Video" 
-                                    ,"File"], label="Input Type", value="Upload Audio")
+                                    ,"File or Files"], label="Input Type", value="Upload Audio")
                
                audio1 = gr.Audio(source="upload", type="filepath", label="Upload Audio",
                                    interactive= True, visible= True)
@@ -289,7 +387,7 @@ def gradio_Interface(model : AutoTranscribe = None):
                                    interactive= True, visible= False)
                video2 = gr.Video(source="webcam", label="Record Video", type="filepath",
                                    interactive= True, visible= False)
-                file_in = gr.File(label="Upload File", interactive= True, visible= False)
+                file_in = gr.Files(label="Upload File or Files", interactive= True, visible= False)
                
                submit = gr.Button()
            
@@ -1,5 +1,5 @@
 """
-AutoTranscribe Class
+Scraibe Class
 --------------------

 This class serves as the core of the transcription system, responsible for handling
@@ -12,15 +12,15 @@ By encapsulating the complexities of underlying models, it allows for straightfo
 integration into various applications, ranging from transcription services to voice assistants.

 Available Classes:
- AutoTranscribe: Main class for performing transcription and diarization.
+- Scraibe: Main class for performing transcription and diarization.
                  Includes methods for loading models, processing audio files,
                  and formatting the transcription output.

 Usage:
-    from .autotranscribe import AutoTranscribe
+    from scraibe import Scraibe

-    model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
-    transcript = model.transcribe("path/to/audiofile.wav")
+    model = Scraibe()
+    transcript = model.autotranscribe("path/to/audiofile.wav")
 """

 # Standard Library Imports
@@ -45,9 +45,9 @@ from .transcript_exporter import Transcript
 DiarisationType = TypeVar('DiarisationType')


-class AutoTranscribe:
+class Scraibe:
    """
-    AutoTranscribe is a class responsible for managing the transcription and diarization of audio files.
+    Scraibe is a class responsible for managing the transcription and diarization of audio files.
    It serves as the core of the transcription system, incorporating pretrained models
    for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio),
    allowing for comprehensive audio processing.
@@ -57,7 +57,7 @@ class AutoTranscribe:
        diariser (Diariser): The diariser object to handle diarization.
    
    Methods:
-        __init__: Initializes the AutoTranscribe class with appropriate models.
+        __init__: Initializes the Scraibe class with appropriate models.
        transcribe: Transcribes an audio file using the whisper model and pyannote diarization model.
        remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
        get_audio_file: Gets an audio file as an AudioProcessor object.
@@ -66,7 +66,7 @@ class AutoTranscribe:
                whisper_model: Union[bool, str, whisper] = None,
                dia_model : Union[bool, str, DiarisationType] = None,
                **kwargs) -> None:
-        """Initializes the AutoTranscribe class.
+        """Initializes the Scraibe class.

        Args:
            whisper_model (Union[bool, str, whisper], optional): 
@@ -92,7 +92,11 @@ class AutoTranscribe:
        else:
            self.diariser = dia_model

-        print("AutoTranscribe initialized all models successfully loaded.")
+        if kwargs.get("verbose"):
+            print("Scraibe initialized: all models successfully loaded.")
+            self.verbose = True
+        else:
+            self.verbose = False
            
    def autotranscribe(self, audio_file : Union[str, torch.Tensor, ndarray],
                   remove_original : bool = False,
@@ -112,7 +116,8 @@ class AutoTranscribe:
            Transcript: A Transcript object containing the transcription,
                        which can be exported to different formats.
        """
-        
+        if "verbose" in kwargs:
+            self.verbose = kwargs["verbose"]
        # Get audio file as an AudioProcessor object
        audio_file = self.get_audio_file(audio_file)
        
@@ -121,12 +126,12 @@ class AutoTranscribe:
            "waveform" : audio_file.waveform.reshape(1,len(audio_file.waveform)), 
            "sample_rate": audio_file.sr
            }
-       
-        print("Starting diarisation.")
+
+        if self.verbose:
+            print("Starting diarisation.")
        
        diarisation = self.diariser.diarization(dia_audio, **kwargs)
        
-        
        if not diarisation["segments"]:
            print("No segments found. Try to run transcription without diarisation.")
 
@@ -138,16 +143,15 @@ class AutoTranscribe:
            
            return Transcript(final_transcript)
        
-        print("Diarisation finished. Starting transcription.")
+        if self.verbose:
+            print("Diarisation finished. Starting transcription.")
        
        audio_file.sr = torch.Tensor([audio_file.sr]).to(audio_file.waveform.device)
        
        # Transcribe each segment and store the results
        final_transcript = dict()
        
-        
-        
-        for i in trange(len(diarisation["segments"]), desc= "Transcribing"):
+        for i in trange(len(diarisation["segments"]), desc= "Transcribing", disable = not self.verbose):
            
            seg = diarisation["segments"][i]
            
@@ -283,4 +287,4 @@ class AutoTranscribe:
        return audio_file

    def __repr__(self):
-        return f"AutoTranscribe(transcriber={self.transcriber}, diariser={self.diariser})"
+        return f"Scraibe(transcriber={self.transcriber}, diariser={self.diariser})"
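`autotranscribe` reads `verbose` from `**kwargs` and then forwards the same kwargs to `self.diariser.diarization`. A minimal sketch of resolving a per-call override against the instance default so that an explicit `False` also takes effect; `resolve_verbose` is hypothetical, and popping the key before forwarding is a design choice of this sketch, not what the diff does.

```python
from typing import Any, Dict


def resolve_verbose(default: bool, call_kwargs: Dict[str, Any]) -> bool:
    """Return the per-call ``verbose`` flag if present (even False), else ``default``.

    The key is popped so it is not forwarded with the remaining kwargs; whether
    downstream calls such as the diariser accept ``verbose`` is an assumption
    this sketch avoids relying on.
    """
    return bool(call_kwargs.pop("verbose", default))


if __name__ == "__main__":
    kwargs = {"verbose": False, "num_speakers": 2}
    print(resolve_verbose(True, kwargs))  # False
    print(kwargs)                         # {'num_speakers': 2}
```

Using `pop` with a default distinguishes "not passed" from "passed as `False`", which a truthiness check such as `kwargs.get("verbose")` cannot do.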
@@ -1,5 +1,5 @@
 """
-Command-Line Interface (CLI) for the AutoTranscribe class,
+Command-Line Interface (CLI) for the Scraibe class,
 allowing for user interaction to transcribe and diarize audio files. 
 The function includes arguments for specifying the audio files, model paths,
 output formats, and other options necessary for transcription.
@@ -8,9 +8,7 @@ import os
 from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
 import json

-from sympy import use
-
-from .autotranscript import AutoTranscribe
+from .autotranscript import Scraibe
 from .app.gradio_app import gradio_Interface

 from whisper.tokenizer import LANGUAGES , TO_LANGUAGE_CODE
@@ -20,12 +18,12 @@ from torch import set_num_threads

 def cli():
    """
-    Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe 
+    Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe 
    and diarize audio files. The function includes arguments for specifying the audio files, model paths, 
    output formats, and other options necessary for transcription.

    This function can be executed from the command line to perform transcription tasks, providing a 
-    user-friendly way to access the AutoTranscribe class functionalities.
+    user-friendly way to access the Scraibe class functionalities.
    """
 
    def str2bool(string):
@@ -115,7 +113,7 @@ def cli():
    if arg_dict["whisper_model_directory"]:
        class_kwargs["download_root"] = arg_dict.pop("whisper_model_directory")

-    model = AutoTranscribe(**class_kwargs)
+    model = Scraibe(**class_kwargs)

    
    if arg_dict["audio_files"]:
@@ -14,7 +14,6 @@ WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
 PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
 PYANNOTE_DEFAULT_CONFIG = os.path.join(PYANNOTE_DEFAULT_PATH, "config.yaml")

-
 def config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None:
    """Configure diarization pipeline from a YAML file.

@@ -90,8 +90,8 @@ class Transcriber:
        
        kwargs = self._get_whisper_kwargs(**kwargs)
        
-        if "verbose" not in kwargs:
-            kwargs["verbose"] = False    
+        if not kwargs.get("verbose"):
+            kwargs["verbose"] = None 

        result = self.model.transcribe(audio, *args, **kwargs)
        return result["text"]
@@ -173,6 +173,9 @@ class Transcriber:
        if (task := kwargs.get("task")):
            whisper_kwargs["task"] = task
            
+        if (language := kwargs.get("language")):
+            whisper_kwargs["language"] = language 
+        
        return whisper_kwargs
    
    def __repr__(self) -> str:
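`_get_whisper_kwargs` forwards `task` and, after this change, `language` only when they are set, so a `None` language (what the Gradio app passes when "None" is selected) leaves Whisper to detect the language itself. A generic sketch of that filtering; `select_kwargs` and its `allowed` tuple are hypothetical, and the real method may collect further keys not shown in this hunk.

```python
from typing import Any, Dict, Iterable


def select_kwargs(kwargs: Dict[str, Any],
                  allowed: Iterable[str] = ("task", "language")) -> Dict[str, Any]:
    """Copy only the allowed keys whose values are truthy."""
    selected: Dict[str, Any] = {}
    for key in allowed:
        # Same assignment-expression pattern as the diff (requires Python 3.8+).
        if (value := kwargs.get(key)):
            selected[key] = value
    return selected


if __name__ == "__main__":
    print(select_kwargs({"task": "translate", "language": None, "num_speakers": 2}))
    # {'task': 'translate'}
```

Whitelisting keys also keeps unrelated options such as `num_speakers` out of the Whisper call.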
@@ -1,69 +1,69 @@
-import os
-import subprocess as sp
-
-MAJOR = 0
-MINOR = 1
-MICRO = 0
-MICRO_POST = 0
-ISRELEASED = False
-VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
-
-# Return the git revision as a string
-# taken from numpy/numpy
-def git_version():
-    def _minimal_ext_cmd(cmd):
-        # construct minimal environment
-        env = {}
-        for k in ['SYSTEMROOT', 'PATH', 'HOME']:
-            v = os.environ.get(k)
-            if v is not None:
-                env[k] = v
-
-        # LANGUAGE is used on win32
-        env['LANGUAGE'] = 'C'
-        env['LANG'] = 'C'
-        env['LC_ALL'] = 'C'
-
-        out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
-        return out
-
-    try:
-        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
-        GIT_REVISION = out.strip().decode('ascii')
-    except OSError:
-        GIT_REVISION = "Unknown"
-
-    return GIT_REVISION
-
-def _get_git_version():
-    cwd = os.getcwd()
-
-    # go to the main directory
-    fdir = os.path.dirname(os.path.abspath(__file__))
-    maindir = os.path.abspath(os.path.join(fdir, ".."))
-    # maindir = fdir # os.path.join(fdir, "..")
-    os.chdir(maindir)
-
-    # get git version
-    res = git_version()
-
-    # restore the cwd
-    os.chdir(cwd)
-    return res
-
-def get_version(build_version=False):
-    if ISRELEASED:
-        return VERSION
-
-    # unreleased version
-    GIT_REVISION = _get_git_version()
-
-    if build_version:
-        import datetime as dt
-        date = dt.date.strftime(dt.datetime.now(), "%Y%m%d%H%M%S")
-        return VERSION + ".dev" + date
-    else:
-        return VERSION + ".dev0+" + GIT_REVISION[:7]
-
-
-
+import os
+import subprocess as sp
+
+MAJOR = 0
+MINOR = 1
+MICRO = 0
+MICRO_POST = 0
+ISRELEASED = False
+VERSION = '%d.%d.%d.%d' % (MAJOR, MINOR, MICRO, MICRO_POST)
+
+# Return the git revision as a string
+# taken from numpy/numpy
+def git_version():
+    def _minimal_ext_cmd(cmd):
+        # construct minimal environment
+        env = {}
+        for k in ['SYSTEMROOT', 'PATH', 'HOME']:
+            v = os.environ.get(k)
+            if v is not None:
+                env[k] = v
+
+        # LANGUAGE is used on win32
+        env['LANGUAGE'] = 'C'
+        env['LANG'] = 'C'
+        env['LC_ALL'] = 'C'
+
+        out = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE, env=env).communicate()[0]
+        return out
+
+    try:
+        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
+        # empty output (e.g. not run inside a git checkout) falls back to "Unknown"
+        GIT_REVISION = out.strip().decode('ascii') or "Unknown"
+    except OSError:
+        GIT_REVISION = "Unknown"
+
+    return GIT_REVISION
+
+def _get_git_version():
+    cwd = os.getcwd()
+
+    # go to the main directory
+    fdir = os.path.dirname(os.path.abspath(__file__))
+    maindir = os.path.abspath(os.path.join(fdir, ".."))
+    os.chdir(maindir)
+    try:
+        # get git version
+        res = git_version()
+    finally:
+        # restore the cwd even if git_version() raises
+        os.chdir(cwd)
+    return res
+
+def get_version(build_version=False):
+    if ISRELEASED:
+        return VERSION
+
+    # unreleased version
+    GIT_REVISION = _get_git_version()
+
+    if build_version:
+        import datetime as dt
+        date = dt.datetime.now().strftime("%Y%m%d%H%M%S")
+        return VERSION + ".dev" + date
+    else:
+        return VERSION + ".dev0+" + GIT_REVISION[:7]
+
+
+
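The version scheme in `get_version()` can be exercised without a git checkout. The sketch below is a hypothetical pure-function restatement (`format_version` is not part of the package) that receives the base version, the release flag, the commit hash and an optional timestamp as arguments. It follows the same three branches: a released build returns `VERSION` unchanged, `build_version=True` appends a `.dev` timestamp, and otherwise a `.dev0+` local label holding the first seven characters of the commit hash is appended.

```python
import datetime as dt


def format_version(base, released, git_revision, build_version=False, now=None):
    # Hypothetical helper mirroring get_version(), with its inputs passed in
    # explicitly so the scheme can be tested without git.
    if released:
        # Releases use the bare MAJOR.MINOR.MICRO.MICRO_POST string.
        return base
    if build_version:
        # Build mode: timestamped dev release, e.g. 0.1.0.0.dev20230922184424.
        if now is None:
            now = dt.datetime.now()
        return base + ".dev" + now.strftime("%Y%m%d%H%M%S")
    # Default: dev0 plus a PEP 440 local label with the short commit hash.
    return base + ".dev0+" + git_revision[:7]


BASE = "%d.%d.%d.%d" % (0, 1, 0, 0)

print(format_version(BASE, True, "455c2e3276"))   # 0.1.0.0
print(format_version(BASE, False, "455c2e3276"))  # 0.1.0.0.dev0+455c2e3
print(format_version(BASE, False, "", build_version=True,
                     now=dt.datetime(2023, 9, 22, 18, 44, 24)))  # 0.1.0.0.dev20230922184424
```

Passing the timestamp in, rather than reading the clock inside the function, keeps the build branch deterministic for tests.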
@@ -0,0 +1,31 @@
+[metadata]
+name = scraibe
+version = attr: scraibe.__version__
+author = Jacob Schmieder
+author_email = Jacob.Schmieder@dbfz.de
+description = Transcription tool for audio files based on Whisper and Pyannote
+long_description = file: README.md, LICENSE
+platforms = Linux
+keywords = transcription, speech recognition, whisper, pyannote, audio, speech-to-text, speech-to-text transcription, speech-to-text recognition, voice-to-speech
+license = GPL-3.0
+classifiers =
+    Development Status :: 3 - Alpha
+    Environment :: GPU :: NVIDIA CUDA :: 11.2
+    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+    Topic :: Scientific/Engineering :: Artificial Intelligence
+    Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.9
+    Programming Language :: Python :: 3.10
+
+[options]
+zip_safe = False
+include_package_data = True
+packages = find:
+python_requires = >=3.7
+install_requires =
+    requests
+    importlib-metadata; python_version<"3.8"
+
+[options.entry_points]
+console_scripts =
+    scraibe = scraibe.cli:cli
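setuptools turns each `console_scripts` entry of the form `name = module:function` into an executable called `name` that imports `module` and calls `function`; setup.py registers `scraibe = scraibe.cli:cli`. The sketch below uses a hypothetical helper, `console_scripts()`, built on the standard-library `configparser`, to show how such an entry breaks down. It illustrates the entry format only and is not how setuptools itself reads the file.

```python
import configparser

# Excerpt of a setup.cfg entry-points section, using the script name from setup.py.
SETUP_CFG = """
[options.entry_points]
console_scripts =
    scraibe = scraibe.cli:cli
"""


def console_scripts(cfg_text):
    """Return {script_name: (module, function)} from [options.entry_points]."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Indented lines below "console_scripts =" are continuation lines of one value.
    raw = parser.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Each entry has the form "name = module:function".
        name, _, target = line.partition("=")
        module, _, func = target.strip().partition(":")
        scripts[name.strip()] = (module, func)
    return scripts


print(console_scripts(SETUP_CFG))  # {'scraibe': ('scraibe.cli', 'cli')}
```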
@@ -1,8 +1,8 @@
 import pkg_resources
 import os
 from setuptools import setup, find_packages

-module_name = "autotranscript"
+module_name = "scraibe"
 github_url = "https://github.com/JSchmie/autotranscript"

 file_dir = os.path.dirname(os.path.realpath(__file__))
@@ -18,7 +19,7 @@ with open(verfile, "r") as fp:

 ############### setup ###############

-build_version = "AUTOTRANSCRIPT_BUILD" in os.environ
+build_version = "SCRAIBE_BUILD" in os.environ

 if __name__ == "__main__":

@@ -36,11 +37,24 @@ if __name__ == "__main__":
            'https://download.pytorch.org/whl/cu113',
            ],
        url= github_url,
-        license='',
+        
+        license='GPL-3.0',
        author='Jacob Schmieder',
        author_email='Jacob.Schmieder@dbfz.de',
        description='Transcription tool for audio files based on Whisper and Pyannote',
-        package_data={ "header" : ["app/header.html"], "logo" : ["app/Logo_KIDA_bmel_green.svg"]},
+        classifiers=[
+            'Development Status :: 3 - Alpha',
+            'Environment :: GPU :: NVIDIA CUDA :: 11.2',
+            'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
+            'Topic :: Scientific/Engineering :: Artificial Intelligence',
+            'Programming Language :: Python :: 3.8',
+            'Programming Language :: Python :: 3.9',
+            'Programming Language :: Python :: 3.10'],
+        keywords=['transcription', 'speech recognition', 'whisper', 'pyannote', 'audio',
+                  'speech-to-text', 'speech-to-text transcription', 'speech-to-text recognition',
+                  'voice-to-speech'],
+        package_data={'scraibe.app': ["*.html", "*.svg"]},
        entry_points={'console_scripts':
-            ['autotranscript = autotranscript.cli:cli']}
+            ['scraibe = scraibe.cli:cli']}
+        
    )
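The keys of `package_data` are package names and the values are glob patterns resolved relative to that package's directory, so the previous keys `header` and `logo` did not name any package, while `scraibe.app` does. The sketch below approximates that selection with `fnmatch.fnmatchcase`. The helper `matched_data_files` and the listing of `scraibe/app/` are hypothetical: the two data files are the ones the removed entry referenced, and the `.py` files are assumed.

```python
from fnmatch import fnmatchcase

# package_data as declared in setup.py: package name -> glob patterns.
PACKAGE_DATA = {"scraibe.app": ["*.html", "*.svg"]}

# Hypothetical contents of scraibe/app/ (the .py files are assumed).
APP_DIR_FILES = ["__init__.py", "app.py", "header.html", "Logo_KIDA_bmel_green.svg"]


def matched_data_files(package, files, package_data=PACKAGE_DATA):
    """Return the files in `package`'s directory selected by its patterns."""
    patterns = package_data.get(package, [])
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]


print(matched_data_files("scraibe.app", APP_DIR_FILES))
# ['header.html', 'Logo_KIDA_bmel_green.svg']
```

A key that is not a package, such as the old `header`, selects nothing.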
@@ -1,5 +1,5 @@
 import pytest
-from autotranscript import Transcriber
+from scraibe import Transcriber
 from unittest.mock import patch, mock_open
 import os

@@ -55,7 +55,7 @@ def test_save_transcript_to_file(transcriber):
    
 # Test Diaraization class

-from autotranscript import Diariser
+from scraibe import Diariser

@pytest.fixture
 def diarisation():
@@ -83,7 +83,7 @@ def test_diarisation(diarisation):

 # Test AudioProcessor

-from autotranscript import AudioProcessor , TorchAudioProcessor
+from scraibe import AudioProcessor, TorchAudioProcessor


 def test_AudioProcessor_init():
@@ -1,38 +0,0 @@
-# import os
-# import sys
-# import traceback
-
-# class TracePrints(object):
-#   def __init__(self):    
-#     self.stdout = sys.stdout
-#   def write(self, s):
-#     self.stdout.write("Writing %r\n" % s)
-#     traceback.print_stack(file=self.stdout)
-
-# sys.stdout = TracePrints()
-
-# os.environ["PYANNOTE_CACHE"] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models/pyannote")
-# import os
- 
-# os.environ['TRANSFORMERS_CACHE'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
-# os.environ['HF_HOME'] = os.path.expanduser("~/PycharmProjects/autotranscript/autotranscript/models")
-
-
-from autotranscript import AutoTranscribe
-
-model = AutoTranscribe()
-
-text = model.transcribe("test.mp4")
-
-print("Transcription:\n")
-print(text)
-
-
-# from autotranscript.misc import *
-# import os
-
-# print(os.path.exists(CACHE_DIR))
-# print(os.path.exists(WHISPER_DEFAULT_PATH))
-# print(os.path.exists(PYANNOTE_DEFAULT_PATH))
-
-# print(os.path.exists(PYANNOTE_DEFAULT_CONFIG))