autotranscript package
Subpackages
Submodules
autotranscript.audio module
Audio Processor Module
This module provides the AudioProcessor class, which uses torchaudio to handle audio files. It includes functionality to load, cut, and manage audio waveforms.
Available Classes:
- AudioProcessor: Processes audio waveforms and provides methods for loading, cutting, and handling audio.
- Usage:
from .audio import AudioProcessor
processor = AudioProcessor.from_file("path/to/audiofile.wav")
cut_waveform = processor.cut(start=1.0, end=5.0)
Constants:
- SAMPLE_RATE (int): Default sample rate for processing.
- NORMALIZATION_FACTOR (float): Normalization factor for audio waveform.
- class AudioProcessor(waveform: torch.Tensor, sr: int = 16000, *args, **kwargs)
Bases: object
Audio processor class that uses torchaudio to load, cut, and handle audio waveforms.
- Attributes:
- waveform: torch.Tensor
The audio waveform tensor.
- sr: int
The sample rate of the audio.
- __init__(waveform: torch.Tensor, sr: int = 16000, *args, **kwargs) -> None
Initialize the AudioProcessor object.
- Args:
waveform (torch.Tensor): The audio waveform tensor.
sr (int, optional): The sample rate of the audio. Defaults to SAMPLE_RATE.
args: Additional arguments.
kwargs: Additional keyword arguments, e.g., the device to use for processing. If CUDA is available, the device defaults to CUDA.
- Raises:
ValueError: If the provided sample rate is not of type int.
- __repr__() -> str
Return repr(self).
- cut(start: float, end: float) -> torch.Tensor
Cut a segment from the audio waveform between the specified start and end times.
- Args:
start (float): Start time in seconds.
end (float): End time in seconds.
- Returns:
torch.Tensor: The cut waveform segment.
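To make the time-to-sample conversion concrete, here is a minimal sketch of how a cut between start and end (in seconds) could map onto sample indices. It uses a plain Python list instead of a torch.Tensor, and `cut_samples` is a hypothetical helper rather than the package's implementation; truncating with `int` is an assumed rounding rule.

```python
from typing import List, Sequence


def cut_samples(samples: Sequence[float], sr: int, start: float, end: float) -> List[float]:
    """Return the samples between start and end seconds (hypothetical helper)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    first = int(start * sr)  # index of the first sample to keep
    last = int(end * sr)     # index one past the last sample to keep
    return list(samples[first:last])


# Two seconds of fake audio at 8 samples per second, for illustration only.
waveform = [float(i) for i in range(16)]
segment = cut_samples(waveform, sr=8, start=0.5, end=1.5)
print(len(segment))  # 8 samples, i.e. 1.0 s at 8 Hz
```

With a tensor of shape (channels, samples), the same two indices would slice the last dimension.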
- classmethod from_file(file: str, *args, **kwargs) -> AudioProcessor
Create an AudioProcessor instance from an audio file.
- Args:
file (str): The audio file path.
- Returns:
AudioProcessor: An instance of the AudioProcessor class containing the loaded audio.
- static load_audio(file: str, sr: int = 16000)
Open an audio file and read it as a mono waveform, resampling if necessary. This method ensures compatibility with pyannote.audio and requires the ffmpeg CLI to be available on PATH.
- Args:
file (str): The audio file to open.
sr (int, optional): The desired sample rate. Defaults to SAMPLE_RATE.
- Returns:
tuple: A NumPy array containing the audio waveform in float32 dtype, and the sample rate.
- Raises:
RuntimeError: If the audio cannot be loaded.
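As an illustration of ffmpeg-based decoding, the sketch below builds an ffmpeg command that decodes a file to mono 16-bit PCM at the requested rate and converts the raw bytes to floats in the range [-1, 1). It follows the approach used by Whisper's own audio loader; that this package works the same way is an assumption, as are the helper names and the value 32768.0 for the normalization factor. The samples are kept in a Python list to stay dependency-free, whereas the documented method returns a float32 NumPy array.

```python
import struct
import subprocess
from typing import List, Tuple

SAMPLE_RATE = 16000
NORMALIZATION_FACTOR = 32768.0  # assumed value: full scale of signed 16-bit PCM


def build_ffmpeg_command(file: str, sr: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg call that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", file,
        "-f", "s16le",           # raw little-endian 16-bit samples
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sr),          # resample to the requested rate
        "-",                     # write to stdout
    ]


def pcm16_to_float(raw: bytes) -> List[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1, 1)."""
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [sample / NORMALIZATION_FACTOR for sample in samples]


def load_audio_sketch(file: str, sr: int = SAMPLE_RATE) -> Tuple[List[float], int]:
    """Decode a file with the ffmpeg CLI; raise RuntimeError on failure."""
    try:
        result = subprocess.run(build_ffmpeg_command(file, sr), capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        raise RuntimeError(f"Failed to load audio: {exc}") from exc
    return pcm16_to_float(result.stdout), sr


print(pcm16_to_float(struct.pack("<3h", 0, 16384, -32768)))  # [0.0, 0.5, -1.0]
```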
autotranscript.autotranscript module
AutoTranscribe Class
This class serves as the core of the transcription system, responsible for handling transcription and diarization of audio files. It leverages pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), providing an accessible interface for audio processing tasks such as transcription, speaker separation, and timestamping.
By encapsulating the complexities of underlying models, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.
Available Classes:
- AutoTranscribe: Main class for performing transcription and diarization. Includes methods for loading models, processing audio files, and formatting the transcription output.
- Usage:
from .autotranscript import AutoTranscribe
model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
transcript = model.transcribe("path/to/audiofile.wav")
- class AutoTranscribe(whisper_model: Union[bool, str, whisper] = None, dia_model: Union[bool, str, DiarisationType] = None, **kwargs)
Bases: object
AutoTranscribe is a class responsible for managing the transcription and diarization of audio files. It serves as the core of the transcription system, incorporating pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), allowing for comprehensive audio processing.
- Attributes:
transcriber (Transcriber): The transcriber object to handle transcription.
diariser (Diariser): The diariser object to handle diarization.
- Methods:
__init__: Initializes the AutoTranscribe class with appropriate models.
autotranscribe: Transcribes an audio file using the whisper model and pyannote diarization model.
diarization: Performs diarization on an audio file using the pyannote diarization model.
transcribe: Transcribes the provided audio file.
remove_audio_file: Removes the original audio file to avoid disk space issues or ensure data privacy.
get_audio_file: Gets an audio file as an AudioProcessor object.
- __init__(whisper_model: Union[bool, str, whisper] = None, dia_model: Union[bool, str, DiarisationType] = None, **kwargs) -> None
Initializes the AutoTranscribe class.
- Args:
- whisper_model (Union[bool, str, whisper], optional):
Path to whisper model or whisper model itself.
- dia_model (Union[bool, str, DiarisationType], optional):
Path to pyannote diarization model or model itself.
- **kwargs: Additional keyword arguments for whisper and pyannote diarization models.
- __repr__()
Return repr(self).
- autotranscribe(audio_file: Union[str, torch.Tensor, numpy.ndarray], remove_original: bool = False, **kwargs) -> Transcript
Transcribes an audio file using the whisper model and pyannote diarization model.
- Args:
- audio_file (Union[str, torch.Tensor, ndarray]):
Path to audio file or a tensor representing the audio.
- remove_original (bool, optional): If True, the original audio file will be removed after transcription.
- *args: Additional positional arguments for diarization and transcription.
- **kwargs: Additional keyword arguments for diarization and transcription.
- Returns:
- Transcript: A Transcript object containing the transcription,
which can be exported to different formats.
- diarization(audio_file: Union[str, torch.Tensor, numpy.ndarray], **kwargs) -> dict
Perform diarization on an audio file using the pyannote diarization model.
- Args:
- audio_file (Union[str, torch.Tensor, ndarray]):
The audio source which can either be a path to the audio file or a tensor representation.
- **kwargs:
Additional keyword arguments for diarization.
- Returns:
- dict:
A dictionary containing the results of the diarization process.
- static get_audio_file(audio_file: Union[str, torch.Tensor, numpy.ndarray], *args, **kwargs) -> AudioProcessor
Gets an audio file as an AudioProcessor object.
- static remove_audio_file(audio_file: str, shred: bool = False) -> None
Removes the original audio file to avoid disk space issues or ensure data privacy.
- Args:
audio_file (str): Path to the audio file.
shred (bool, optional): If True, the audio file will be shredded, not just removed.
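The difference between removing and shredding can be sketched as follows: a shred overwrites the file's bytes before unlinking it. `remove_file` is a hypothetical stand-in, not the package's implementation, and the single random-overwrite pass is an assumption. Overwriting in place is best effort only, since SSDs and journaling or copy-on-write file systems may keep older copies of the data.

```python
import os
import tempfile


def remove_file(path: str, shred: bool = False, chunk_size: int = 1 << 20) -> None:
    """Delete a file; with shred=True, overwrite it with random bytes first."""
    if shred:
        remaining = os.path.getsize(path)
        with open(path, "r+b") as handle:
            while remaining > 0:
                block = min(chunk_size, remaining)
                handle.write(os.urandom(block))  # replace content in place
                remaining -= block
            handle.flush()
            os.fsync(handle.fileno())  # push the overwrite to disk before unlinking
    os.remove(path)


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"fake audio bytes")
remove_file(tmp.name, shred=True)
print(os.path.exists(tmp.name))  # False
```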
- transcribe(audio_file: Union[str, torch.Tensor, numpy.ndarray], **kwargs)
Transcribe the provided audio file.
- Args:
- audio_file (Union[str, torch.Tensor, ndarray]):
The audio source, which can either be a path or a tensor representation.
- **kwargs:
Additional keyword arguments for transcription.
- Returns:
- str:
The transcribed text from the audio source.
autotranscript.cli module
Command-line interface (CLI) for the AutoTranscribe class, used to transcribe and diarize audio files from the terminal.
- cli()
Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.
This function can be executed from the command line to perform transcription tasks, providing a user-friendly way to access the AutoTranscribe class functionalities.
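The page does not list the actual command-line options, so the following is only a sketch of how such an interface could be wired with argparse. Every flag name here (`--whisper-model`, `--dia-model`, `--output-format`) is hypothetical; the format choices mirror the extensions that Transcript.save is documented to handle, and the default model mirrors the default of Transcriber.load_model.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical options for a transcription CLI."""
    parser = argparse.ArgumentParser(
        prog="autotranscript",
        description="Transcribe and diarize audio files.",
    )
    parser.add_argument("audio_files", nargs="+", help="One or more audio files.")
    parser.add_argument("--whisper-model", default="medium", help="Whisper model name or path.")
    parser.add_argument("--dia-model", default=None, help="Path to a pyannote config.")
    parser.add_argument(
        "--output-format",
        choices=["json", "txt", "md", "html", "tex", "pdf"],
        default="txt",
        help="File format of the saved transcript.",
    )
    return parser


def cli_sketch(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments; a real CLI would then build AutoTranscribe and save results."""
    return build_parser().parse_args(argv)


args = cli_sketch(["interview.wav", "--output-format", "json"])
print(args.audio_files, args.output_format)  # ['interview.wav'] json
```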
autotranscript.diarisation module
Diarisation Class
This class serves as the heart of the speaker diarization system, responsible for identifying and segmenting individual speakers in a given audio file. It leverages a pretrained model from pyannote.audio, providing an accessible interface for audio processing tasks such as speaker separation and timestamping.
By encapsulating the complexities of the underlying model, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.
Available Classes:
- Diariser: Main class for performing speaker diarization. Includes methods for loading models, processing audio files, and formatting the diarization output.
Constants:
- TOKEN_PATH (str): Path to the Pyannote token.
- PYANNOTE_DEFAULT_PATH (str): Default path to Pyannote models.
- PYANNOTE_DEFAULT_CONFIG (str): Default configuration for Pyannote models.
- Usage:
from .diarisation import Diariser
model = Diariser.load_model(model="path/to/model/config.yaml")
diarisation_output = model.diarization("path/to/audiofile.wav")
- class Diariser(model)
Bases: object
Handles the diarization process of an audio file using a pretrained model from pyannote.audio. Diarization is the task of determining “who spoke when.”
- Args:
model: The pretrained model to use for diarization.
- __init__(model) -> None
- __repr__()
Return repr(self).
- diarization(audiofile: Union[str, torch.Tensor, dict], *args, **kwargs) -> Annotation
Perform speaker diarization on the provided audio file, effectively separating different speakers and providing a timestamp for each segment.
- Args:
- audiofile: The path to the audio file or a torch.Tensor
containing the audio data.
args: Additional arguments for the diarization model.
kwargs: Additional keyword arguments for the diarization model.
- Returns:
- dict: A dictionary containing speaker names,
segments, and other information related to the diarization process.
- static format_diarization_output(dia: Annotation) -> dict
Formats the raw diarization output into a more usable structure for this project.
- Args:
dia: Raw diarization output.
- Returns:
- dict: A structured representation of the diarization, with speaker names
as keys and a list of tuples representing segments as values.
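The structure described above (speaker names as keys, lists of segment tuples as values) can be sketched with plain tuples standing in for the pyannote Annotation. `group_turns_by_speaker` and the `(start, end, speaker)` input layout are assumptions made for illustration. With a real Annotation, the turns would typically come from `itertracks(yield_label=True)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def group_turns_by_speaker(turns: Iterable[Turn]) -> Dict[str, List[Tuple[float, float]]]:
    """Group speaker turns into {speaker: [(start, end), ...]}."""
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for start, end, speaker in turns:
        grouped[speaker].append((start, end))
    return dict(grouped)


turns = [
    (0.0, 2.5, "SPEAKER_00"),
    (2.5, 4.0, "SPEAKER_01"),
    (4.0, 6.2, "SPEAKER_00"),
]
formatted = group_turns_by_speaker(turns)
print(formatted)
```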
- classmethod load_model(model: str = '/home/ortizcruzc/.cache/torch/models/pyannote/config.yaml', use_auth_token: str = None, cache_token: bool = True, cache_dir: Union[Path, str] = '/home/ortizcruzc/.cache/torch/models/pyannote', hparams_file: Union[str, Path] = None, *args, **kwargs) -> pyannote.audio.Pipeline
Loads a pretrained model from pyannote.audio, either from a local cache or from an online repository.
- Args:
- model: Path or identifier for the pyannote model.
default: /models/pyannote/speaker_diarization/config.yaml
use_auth_token: Optional HUGGINGFACE_TOKEN for authenticated access.
cache_token: Whether to cache the token locally for future use.
cache_dir: Directory for caching models.
hparams_file: Path to a YAML file containing hyperparameters.
args: Additional arguments, accepted only to avoid errors.
kwargs: Additional keyword arguments, accepted only to avoid errors.
- Returns:
Pipeline: A pyannote.audio Pipeline object, encapsulating the loaded model.
autotranscript.misc module
- config_diarization_yaml(file_path: str, path_to_segmentation: str = None) -> None
Configure diarization pipeline from a YAML file.
This function updates the YAML file to use the given segmentation model offline, which avoids manual file editing.
- Args:
file_path (str): Path to the YAML file.
path_to_segmentation (str, optional): Optional path to the segmentation model.
- Raises:
FileNotFoundError: If the segmentation model file is not found.
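A rough sketch of the update step, working on an already parsed configuration dict so that it needs no YAML library. It assumes the config keeps the segmentation model under pipeline → params → segmentation, which is how pyannote diarization configs are commonly laid out; that layout, the helper name, and the sample values are assumptions, and reading and writing the YAML file itself is left out.

```python
import os
import tempfile
from typing import Any, Dict


def set_segmentation_path(config: Dict[str, Any], path_to_segmentation: str) -> Dict[str, Any]:
    """Point the pipeline config at a local segmentation model file."""
    if not os.path.isfile(path_to_segmentation):
        raise FileNotFoundError(f"Segmentation model not found: {path_to_segmentation}")
    # Assumed layout: {"pipeline": {"params": {"segmentation": ...}}}
    params = config.setdefault("pipeline", {}).setdefault("params", {})
    params["segmentation"] = path_to_segmentation
    return config


# Illustrative config and a stand-in file playing the role of the model weights.
config = {"pipeline": {"params": {"segmentation": "pyannote/segmentation"}}}
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as model_file:
    model_file.write(b"stand-in for model weights")
set_segmentation_path(config, model_file.name)
os.remove(model_file.name)
print(config["pipeline"]["params"]["segmentation"] == model_file.name)  # True
```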
autotranscript.transcriber module
Transcriber Module
This module provides the Transcriber class for working with Whisper models. The class can load different Whisper models, transcribe audio files, and save transcriptions to text files, acting as an interface between the Whisper models and the user.
- Main Features:
Loading different sizes and versions of Whisper models.
Transcribing audio supplied as a file path (str), a torch.Tensor, or a NumPy array.
Saving the transcriptions to the specified paths.
Adaptable to various language specifications.
Options to control the verbosity of the transcription process.
- Constants:
WHISPER_DEFAULT_PATH: Default path for downloading and loading Whisper models.
- Usage:
>>> from autotranscript.transcriber import Transcriber
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")
- class Transcriber(model: whisper)
Bases: object
The Transcriber class serves as a wrapper around Whisper models for efficient audio transcription. By encapsulating the intricacies of loading models, processing audio, and saving transcripts, it offers an easy-to-use interface for users to transcribe audio files.
- Attributes:
model (whisper): The Whisper model used for transcription.
- Methods:
transcribe: Transcribes the given audio file.
save_transcript: Saves the transcript to a file.
load_model: Loads a specific Whisper model.
_get_whisper_kwargs: Private method to get valid keyword arguments for the whisper model.
- Examples:
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")
- Note:
The class supports various sizes and versions of Whisper models. Please refer to the load_model method for available options.
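The private `_get_whisper_kwargs` method is described only as returning the valid keyword arguments for the whisper model. One way such filtering could work is to compare the supplied keywords against the target function's signature, as in this sketch. `filter_kwargs` and `fake_transcribe` are hypothetical stand-ins, and the package's actual method may work differently.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments that func names in its signature."""
    accepted = inspect.signature(func).parameters
    return {name: value for name, value in kwargs.items() if name in accepted}


def fake_transcribe(audio: str, language: str = "en", verbose: bool = False) -> str:
    """Stand-in for a transcription function, used only for this example."""
    return f"{audio}:{language}:{verbose}"


# num_speakers belongs to diarization, so it is dropped before the call.
valid = filter_kwargs(fake_transcribe, language="de", verbose=True, num_speakers=2)
print(valid)  # {'language': 'de', 'verbose': True}
print(fake_transcribe("audio.wav", **valid))  # audio.wav:de:True
```

Filtering like this lets a caller pass one shared set of keyword arguments to both the transcription and the diarization step without triggering unexpected-argument errors.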
- __init__(model: whisper) -> None
Initialize the Transcriber class with a Whisper model.
- Args:
model (whisper): The Whisper model to use for transcription.
- __repr__() -> str
Return repr(self).
- classmethod load_model(model: str = 'medium', download_root: str = '/home/ortizcruzc/.cache/torch/models/whisper', device: Optional[Union[str, torch.device]] = None, in_memory: bool = False, *args, **kwargs) -> Transcriber
Load whisper model.
- Args:
- model (str): Whisper model. Available models include:
‘tiny.en’
‘tiny’
‘base.en’
‘base’
‘small.en’
‘small’
‘medium.en’
‘medium’
‘large-v1’
‘large-v2’
‘large’
- download_root (str, optional): Path to download the model.
Defaults to WHISPER_DEFAULT_PATH.
- device (Optional[Union[str, torch.device]], optional):
Device to load model on. Defaults to None.
- in_memory (bool, optional): Whether to load model in memory.
Defaults to False.
args: Additional arguments, accepted only to avoid errors.
kwargs: Additional keyword arguments, accepted only to avoid errors.
- Returns:
Transcriber: A Transcriber object initialized with the specified model.
- static save_transcript(transcript: str, save_path: str) -> None
Save a transcript to a file.
- Args:
transcript (str): The transcript as a string.
save_path (str): The path to save the transcript.
- Returns:
None
- transcribe(audio: Union[str, torch.Tensor, numpy.ndarray], *args, **kwargs) -> str
Transcribe an audio file.
autotranscript.transcript_exporter module
- class Transcript(transcript: dict)
Bases: object
Class for storing transcript data, including speaker information and text segments, and exporting it to various file formats such as JSON, HTML, and LaTeX.
- __init__(transcript: dict) -> None
Initializes the Transcript object with the given transcript data.
- Args:
- transcript (dict): A dictionary containing the formatted transcript string.
Keys should correspond to segment IDs, and values should contain speaker and segment information.
- __repr__() -> str
Return a string representation of the Transcript object.
- Returns:
str: A string that provides an informative description of the object.
- __str__() -> str
Converts the transcript to a string representation.
- Returns:
- str: String representation of the transcript, including speaker names and
time stamps for each segment.
- annotate(*args, **kwargs) -> dict
Annotates the transcript to associate specific names with speakers.
- Args:
args (list): List of speaker names. These will be mapped sequentially to the speakers.
kwargs (dict): Dictionary with speaker names as keys and list of segments as values.
- Returns:
dict: Dictionary with speaker names as keys and list of segments as values.
- Raises:
- ValueError: If the number of speaker names does not match the number
of speakers, or if an unknown speaker is found.
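The sequential mapping and the two error cases can be sketched as below. This is a simplified illustration, not the package's implementation: `map_speaker_names` is a hypothetical helper that returns a label-to-name mapping rather than the documented dictionary of segments, and treating keyword arguments as `label=name` pairs is an assumption.

```python
from typing import Dict, List


def map_speaker_names(labels: List[str], *names: str, **label_to_name: str) -> Dict[str, str]:
    """Map diarization labels to human-readable names (hypothetical helper)."""
    if names and label_to_name:
        raise ValueError("pass either positional names or keyword mappings, not both")
    if names:
        if len(names) != len(labels):
            raise ValueError(f"expected {len(labels)} names, got {len(names)}")
        return dict(zip(labels, names))  # positional names follow label order
    unknown = set(label_to_name) - set(labels)
    if unknown:
        raise ValueError(f"unknown speaker(s): {sorted(unknown)}")
    return dict(label_to_name)


labels = ["SPEAKER_00", "SPEAKER_01"]
print(map_speaker_names(labels, "Alice", "Bob"))
# {'SPEAKER_00': 'Alice', 'SPEAKER_01': 'Bob'}
print(map_speaker_names(labels, SPEAKER_01="Bob"))
# {'SPEAKER_01': 'Bob'}
```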
- classmethod from_json(json: Union[dict, str]) -> Transcript
Load a transcript from JSON data.
- Args:
json (Union[dict, str]): The transcript as a dict, or the path to a JSON file.
- Returns:
Transcript: Transcript object
- get_dict() -> dict
Get transcript as dict
- Returns:
transcript as dict
- Return type:
dict
- get_html() -> str
Get transcript as html string
- Returns:
transcript as html string
- Return type:
str
- get_json(*args, use_annotation: bool = True, **kwargs) -> str
Get transcript as json string
- Returns:
transcript as json string
- Return type:
str
- get_md() -> str
Get transcript as Markdown string, using HTML formatting.
- Returns:
str: Transcript as a Markdown string.
- get_tex() -> str
Get transcript as LaTeX string. If no annotations are present, the speakers will be annotated with the first letters of the alphabet.
- Returns:
str: Transcript as LaTeX string.
- save(path: str, *args, **kwargs) -> None
Save transcript to file with the given path and file format.
This method can save the transcript in various formats including JSON, TXT, MD, HTML, TEX, and PDF. The file format is determined by the extension of the path.
- Args:
path (str): Path to save the file, including the desired file extension.
*args: Additional positional arguments to be passed to the specific save methods.
**kwargs: Additional keyword arguments to be passed to the specific save methods.
- Raises:
ValueError: If the file format specified in the path is unknown.
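Choosing a writer from the file extension can be sketched with a small dispatch table. `TranscriptSketch` is a hypothetical stand-in that supports only two formats and stores plain text; the real class's internals are not shown on this page, so the structure below is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


class TranscriptSketch:
    """Minimal stand-in that saves text in a format chosen by file extension."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_txt(self, path: str) -> None:
        Path(path).write_text(self.text, encoding="utf-8")

    def to_md(self, path: str) -> None:
        Path(path).write_text(f"# Transcript\n\n{self.text}", encoding="utf-8")

    def save(self, path: str) -> None:
        # Map each supported extension to the method that writes it.
        writers: Dict[str, Callable[[str], None]] = {".txt": self.to_txt, ".md": self.to_md}
        suffix = Path(path).suffix.lower()
        if suffix not in writers:
            raise ValueError(f"Unknown file format: {suffix!r}")
        writers[suffix](path)


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "talk.txt"
    TranscriptSketch("A: Hello").save(str(target))
    saved_text = target.read_text(encoding="utf-8")
print(saved_text)  # A: Hello
```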
- to_html(path: str) -> None
Save transcript as html file
- Args:
path (str): path to save file
- to_json(path, *args, **kwargs) -> None
Save transcript as json file
- Args:
path (str): path to save file
- to_md(path: str) -> None
Save transcript as a Markdown file.
- Args:
path (str): Path to save the Markdown file.
- to_pdf(path: str) -> None
Save transcript as a PDF file (placeholder function, implementation needed).
- Args:
path (str): Path to save the PDF file.
- to_tex(path: str) -> None
Save transcript as a LaTeX file (placeholder function, implementation needed).
- Args:
path (str): Path to save the LaTeX file.
- to_txt(path: str) -> None
Save transcript as a plain-text file (placeholder function, implementation needed).
- Args:
path (str): Path to save the text file.
autotranscript.version module
- get_version(build_version=False)
- git_version()