autotranscript package

Subpackages

Submodules

autotranscript.audio module

Audio Processor Module

This module provides the AudioProcessor class, which uses torchaudio to handle audio files. It includes functionality to load, cut, and manage audio waveforms.

Available Classes:
- AudioProcessor: Processes audio waveforms and provides methods for loading, cutting, and handling audio.

Usage:

>>> from autotranscript.audio import AudioProcessor
>>> processor = AudioProcessor.from_file("path/to/audiofile.wav")
>>> cut_waveform = processor.cut(start=1.0, end=5.0)

Constants:
- SAMPLE_RATE (int): Default sample rate for processing.
- NORMALIZATION_FACTOR (float): Normalization factor for the audio waveform.

class AudioProcessor(waveform: torch.Tensor, sr: int = 16000, *args, **kwargs)

Bases: object

Audio processor class that uses torchaudio to load, cut, and handle audio waveforms.

Attributes:
waveform: torch.Tensor

The audio waveform tensor.

sr: int

The sample rate of the audio.

__init__(waveform: torch.Tensor, sr: int = 16000, *args, **kwargs) → None

Initialize the AudioProcessor object.

Args:

waveform (torch.Tensor): The audio waveform tensor.
sr (int, optional): The sample rate of the audio. Defaults to SAMPLE_RATE.
args: Additional arguments.
kwargs: Additional keyword arguments, e.g., the device to use for processing. Defaults to CUDA if it is available.

Raises:

ValueError: If the provided sample rate is not of type int.

__repr__() → str

Return repr(self).

cut(start: float, end: float) → torch.Tensor

Cut a segment from the audio waveform between the specified start and end times.

Args:

start (float): Start time in seconds.
end (float): End time in seconds.

Returns:

torch.Tensor: The cut waveform segment.
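Because cut takes times in seconds, it has to convert them to sample indices with the sample rate. A minimal sketch of that conversion, assuming the time axis is the last axis (torchaudio's (channels, time) layout) and that times past the end of the audio are clipped; `seconds_to_slice` is a hypothetical helper, not part of autotranscript:

```python
def seconds_to_slice(start: float, end: float, sr: int, num_samples: int) -> slice:
    """Map a [start, end) interval in seconds to a slice over the time axis."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    # seconds * samples-per-second, rounded to the nearest sample index,
    # then clipped to the length of the waveform (assumed behaviour).
    first = min(round(start * sr), num_samples)
    last = min(round(end * sr), num_samples)
    return slice(first, last)


# With a (channels, time) tensor or array:
#     segment = waveform[..., seconds_to_slice(1.0, 5.0, sr, waveform.shape[-1])]
```

Returning a slice keeps the helper independent of the array library: the same object indexes a torch.Tensor, a NumPy array, or a plain list.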

classmethod from_file(file: str, *args, **kwargs) → AudioProcessor

Create an AudioProcessor instance from an audio file.

Args:

file (str): The audio file path.

Returns:

AudioProcessor: An instance of the AudioProcessor class containing the loaded audio.

static load_audio(file: str, sr: int = 16000)

Open an audio file and read it as a mono waveform, resampling if necessary. This method ensures compatibility with pyannote.audio and requires the ffmpeg CLI to be available on the PATH.

Args:

file (str): The audio file to open.
sr (int, optional): The desired sample rate. Defaults to SAMPLE_RATE.

Returns:
tuple: A NumPy array of dtype float32 containing the audio waveform, and the sample rate.

Raises:

RuntimeError: If the audio fails to load.
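A common way to load audio through the ffmpeg CLI (the approach openai-whisper's `whisper.audio.load_audio` takes) is to have ffmpeg decode, downmix to mono, and resample to 16-bit little-endian PCM on stdout, then scale the integers into floats in [-1, 1). The sketch below follows that pattern using only the standard library. It assumes NORMALIZATION_FACTOR is 32768.0 (int16 full scale) and returns a list rather than the NumPy array the real method returns; the helper names are hypothetical.

```python
import subprocess
import sys
from array import array

NORMALIZATION_FACTOR = 32768.0  # assumption: full-scale value of signed 16-bit PCM


def ffmpeg_decode_command(file: str, sr: int = 16000) -> list[str]:
    """Build an ffmpeg call that writes mono, resampled s16le PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0", "-i", file,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr), "-",
    ]


def pcm16le_to_float(raw: bytes) -> list[float]:
    """Convert little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the PCM stream is little-endian
    return [s / NORMALIZATION_FACTOR for s in samples]


def load_audio_sketch(file: str, sr: int = 16000) -> tuple[list[float], int]:
    try:
        result = subprocess.run(
            ffmpeg_decode_command(file, sr), capture_output=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # FileNotFoundError: ffmpeg is not on PATH; CalledProcessError: decoding failed.
        raise RuntimeError(f"Failed to load audio: {file}") from exc
    return pcm16le_to_float(result.stdout), sr
```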

autotranscript.autotranscript module

AutoTranscribe Class

This class serves as the core of the transcription system, responsible for handling transcription and diarization of audio files. It leverages pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), providing an accessible interface for audio processing tasks such as transcription, speaker separation, and timestamping.

By encapsulating the complexities of underlying models, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.

Available Classes:
- AutoTranscribe: Main class for performing transcription and diarization. Includes methods for loading models, processing audio files, and formatting the transcription output.

Usage:

>>> from autotranscript.autotranscript import AutoTranscribe
>>> model = AutoTranscribe(whisper_model="path/to/whisper/model", dia_model="path/to/diarisation/model")
>>> transcript = model.transcribe("path/to/audiofile.wav")

class AutoTranscribe(whisper_model: Union[bool, str, whisper] = None, dia_model: Union[bool, str, DiarisationType] = None, **kwargs)

Bases: object

AutoTranscribe manages the transcription and diarization of audio files. It is the core of the transcription system and combines pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio).

Attributes:

transcriber (Transcriber): The transcriber object that handles transcription.
diariser (Diariser): The diariser object that handles diarization.

Methods:

__init__: Initializes the AutoTranscribe class with the appropriate models.
transcribe: Transcribes an audio file using the Whisper model and the pyannote diarization model.
remove_audio_file: Removes the original audio file to avoid disk space issues or to ensure data privacy.
get_audio_file: Gets an audio file as an AudioProcessor object.

__init__(whisper_model: Union[bool, str, whisper] = None, dia_model: Union[bool, str, DiarisationType] = None, **kwargs) → None

Initializes the AutoTranscribe class.

Args:
whisper_model (Union[bool, str, whisper], optional):

Path to a Whisper model, or the Whisper model itself.

dia_model (Union[bool, str, DiarisationType], optional):

Path to a pyannote diarization model, or the model itself.

**kwargs: Additional keyword arguments for the Whisper and pyannote diarization models.

static get_audio_file(audio_file: Union[str, torch.Tensor, numpy.ndarray], *args, **kwargs) → AudioProcessor

Gets an audio file as an AudioProcessor.

Args:
audio_file (Union[str, torch.Tensor, ndarray]): Path to the audio file, or a tensor or array representing the audio.

*args: Additional positional arguments.
**kwargs: Additional keyword arguments.

Returns:
AudioProcessor: An object containing the waveform (as a torch.Tensor) and the sample rate.

static remove_audio_file(audio_file: str, shred: bool = False) → None

Removes the original audio file to avoid disk space issues or to ensure data privacy.

Args:

audio_file (str): Path to the audio file.
shred (bool, optional): If True, the audio file is shredded rather than just removed.
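Shredding means the file's contents should not be recoverable after deletion. A minimal sketch of one way to do that with the standard library: overwrite the file with random bytes several times, then delete it (the same idea as the GNU `shred -u` command). The helper name and the number of passes are assumptions; the real method may call an external tool instead.

```python
import os


def remove_file(path: str, shred: bool = False, passes: int = 3) -> None:
    """Delete a file, optionally overwriting its contents first (hypothetical helper)."""
    if shred:
        size = os.path.getsize(path)
        with open(path, "r+b") as fh:
            for _ in range(passes):
                fh.seek(0)
                fh.write(os.urandom(size))  # overwrite every byte with random data
                fh.flush()
                os.fsync(fh.fileno())  # push this pass to disk before the next one
    os.remove(path)
```

Overwriting in place is best effort only: on SSDs, copy-on-write or journaling filesystems, and backed-up volumes, old copies of the data can survive.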

transcribe(audio_file: Union[str, torch.Tensor, numpy.ndarray], remove_original: bool = False, **kwargs) → Transcript

Transcribes an audio file using the Whisper model and the pyannote diarization model.

Args:
audio_file (Union[str, torch.Tensor, ndarray]):

Path to the audio file, or a tensor representing the audio.

remove_original (bool, optional): If True, the original audio file is removed after transcription.

**kwargs: Additional keyword arguments for diarization and transcription.

Returns:
Transcript: A Transcript object containing the transcription, which can be exported to different formats.
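transcribe has to combine two timelines: Whisper's text segments and the diariser's speaker turns. One common approach, shown here as an assumption rather than as AutoTranscribe's actual logic, assigns each text segment to the speaker whose turns overlap it the most. The speaker turns use the shape documented for Diariser.format_diarization_output (speaker name mapped to a list of (start, end) tuples); the text segments are assumed to be (start, end, text) tuples.

```python
from typing import Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: list[tuple[float, float, str]],
    turns: dict[str, list[tuple[float, float]]],
) -> list[tuple[Optional[str], float, float, str]]:
    """Label each (start, end, text) segment with its best-overlapping speaker."""
    labelled = []
    for start, end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for speaker, intervals in turns.items():
            total = sum(overlap(start, end, s, e) for s, e in intervals)
            if total > best_overlap:
                best_speaker, best_overlap = speaker, total
        # None means no speaker turn overlaps this text segment at all.
        labelled.append((best_speaker, start, end, text))
    return labelled
```

Maximum overlap tolerates the small boundary disagreements between the two models; a segment that straddles a speaker change goes to whoever talks longest within it.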

cli()

Command-Line Interface (CLI) for the AutoTranscribe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.

This function can be executed from the command line to perform transcription tasks, providing a user-friendly way to access the AutoTranscribe class functionalities.
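The arguments of cli() are not listed in this reference. Purely as a hypothetical sketch, an argparse parser covering the kinds of options described (audio files, model paths, output format) could look like the following; every flag name is an assumption, so check the real command's arguments before relying on any of them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags, chosen to mirror the options described above.
    parser = argparse.ArgumentParser(description="Transcribe and diarize audio files.")
    parser.add_argument("audio_files", nargs="+", help="audio files to process")
    parser.add_argument("--whisper-model", default="medium",
                        help="Whisper model name or path")
    parser.add_argument("--dia-model", default=None,
                        help="path to a pyannote config.yaml")
    parser.add_argument(
        "--format",
        choices=["json", "txt", "md", "html", "tex", "pdf"],
        default="txt",
        help="output file format",
    )
    parser.add_argument("--remove-original", action="store_true",
                        help="delete each audio file after transcription")
    return parser
```

Each parsed file could then be passed to AutoTranscribe.transcribe, and the resulting Transcript written with Transcript.save using the chosen extension.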

autotranscript.diarisation module

Diarisation Class

This class serves as the heart of the speaker diarization system, responsible for identifying and segmenting individual speakers in a given audio file. It leverages a pretrained model from pyannote.audio, providing an accessible interface for audio processing tasks such as speaker separation and timestamping.

By encapsulating the complexities of the underlying model, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.

Available Classes:
- Diariser: Main class for performing speaker diarization. Includes methods for loading models, processing audio files, and formatting the diarization output.

Constants:
- TOKEN_PATH (str): Path to the pyannote token.
- PYANNOTE_DEFAULT_PATH (str): Default path to pyannote models.
- PYANNOTE_DEFAULT_CONFIG (str): Default configuration for pyannote models.

Usage:

>>> from autotranscript.diarisation import Diariser
>>> model = Diariser.load_model(model="path/to/model/config.yaml")
>>> diarisation_output = model.diarization("path/to/audiofile.wav")

class Diariser(model)

Bases: object

Handles the diarization process of an audio file using a pretrained model from pyannote.audio. Diarization is the task of determining “who spoke when.”

Args:

model: The pretrained model to use for diarization.

__init__(model) → None

__repr__()

Return repr(self).

diarization(audiofile: Union[str, torch.Tensor, dict], *args, **kwargs) → Annotation

Perform speaker diarization on the provided audio file, separating the different speakers and providing a timestamp for each segment.

Args:
audiofile: The path to the audio file, a torch.Tensor containing the audio data, or a dict containing the audio data.

args: Additional arguments for the diarization model.
kwargs: Additional keyword arguments for the diarization model.

Returns:
dict: A dictionary containing speaker names, segments, and other information related to the diarization process.

static format_diarization_output(dia: Annotation) → dict

Formats the raw diarization output into a more usable structure for this project.

Args:

dia: Raw diarization output.

Returns:
dict: A structured representation of the diarization, with speaker names as keys and a list of tuples representing segments as values.
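The regrouping format_diarization_output performs can be sketched as follows. To run without pyannote, the sketch takes plain (start, end, speaker) triples; with a real pyannote Annotation those triples come from `itertracks(yield_label=True)`, which yields (segment, track, label) with `segment.start` and `segment.end` in seconds. `group_by_speaker` is a hypothetical name, and sorting by start time is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def group_by_speaker(
    turns: Iterable[tuple[float, float, str]],
) -> dict[str, list[tuple[float, float]]]:
    """Group (start, end, speaker) turns into {speaker: [(start, end), ...]}."""
    grouped: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for start, end, speaker in sorted(turns):  # chronological order by start time
        grouped[speaker].append((start, end))
    return dict(grouped)


# With pyannote.audio:
#     turns = [(seg.start, seg.end, label) for seg, _, label in dia.itertracks(yield_label=True)]
#     formatted = group_by_speaker(turns)
```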

classmethod load_model(model: str = '/home/ortizcruzc/.cache/torch/models/pyannote/config.yaml', token: str = None, cache_token: bool = False, cache_dir: Union[Path, str] = '/home/ortizcruzc/.cache/torch/models/pyannote', hparams_file: Union[str, Path] = None) → pyannote.audio.Pipeline

Loads a pretrained model from pyannote.audio, either from a local cache or from an online repository.

Args:
model: Path or identifier for the pyannote model.

default: /models/pyannote/speaker_diarization/config.yaml

token: Optional HUGGINGFACE_TOKEN for authenticated access.
cache_token: Whether to cache the token locally for future use.
cache_dir: Directory for caching models.
hparams_file: Path to a YAML file containing hyperparameters.

Returns:

Pipeline: A pyannote.audio Pipeline object, encapsulating the loaded model.
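load_model accepts a Hugging Face token and can cache it for later runs (cache_token, with TOKEN_PATH as the documented token location). A hypothetical sketch of that token handling: an explicit token takes precedence and is optionally written to the cache file; otherwise a previously cached token is read back. The resolution order, file format, and `resolve_token` name are assumptions; the resolved token would then be passed on to pyannote.audio when the pipeline is loaded.

```python
import os
from typing import Optional


def resolve_token(
    token: Optional[str], token_path: str, cache_token: bool = False
) -> Optional[str]:
    """Return the token to use, optionally caching an explicitly given one."""
    if token:
        if cache_token:
            os.makedirs(os.path.dirname(token_path) or ".", exist_ok=True)
            with open(token_path, "w", encoding="utf-8") as fh:
                fh.write(token)
            os.chmod(token_path, 0o600)  # keep the secret readable by the owner only
        return token
    if os.path.exists(token_path):
        with open(token_path, encoding="utf-8") as fh:
            return fh.read().strip() or None
    return None  # no token available
```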

autotranscript.misc module

config_diarization_yaml(file_path: str, path_to_segmentation: str = None) → None

Configure the diarization pipeline through its YAML file.

This function updates the YAML file so that the pipeline uses the given segmentation model offline, which avoids editing the file by hand.

Args:

file_path (str): Path to the YAML file.
path_to_segmentation (str, optional): Path to the segmentation model.

Raises:

FileNotFoundError: If the segmentation model file is not found.
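A sketch of the kind of edit config_diarization_yaml makes, assuming the layout of pyannote's speaker-diarization config.yaml, where the segmentation model is referenced under `pipeline` → `params` → `segmentation`. The dictionary update is kept separate from the file round-trip, which uses PyYAML (`yaml.safe_load` / `yaml.safe_dump`). Unlike the real function, this hypothetical version requires `path_to_segmentation`.

```python
import os
from typing import Any


def set_segmentation(config: dict[str, Any], path_to_segmentation: str) -> dict[str, Any]:
    """Point the pipeline's segmentation entry at a local checkpoint (in place)."""
    params = config.setdefault("pipeline", {}).setdefault("params", {})
    params["segmentation"] = path_to_segmentation
    return config


def config_diarization_yaml_sketch(file_path: str, path_to_segmentation: str) -> None:
    if not os.path.isfile(path_to_segmentation):
        raise FileNotFoundError(f"Segmentation model not found: {path_to_segmentation}")
    import yaml  # PyYAML; imported here so set_segmentation works without it

    with open(file_path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    set_segmentation(config, path_to_segmentation)
    with open(file_path, "w", encoding="utf-8") as fh:
        yaml.safe_dump(config, fh, sort_keys=False)
```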

autotranscript.transcriber module

Transcriber Module

This module provides the Transcriber class, a comprehensive tool for working with Whisper models. The Transcriber class offers functionalities such as loading different Whisper models, transcribing audio files, and saving transcriptions to text files. It acts as an interface between various Whisper models and the user, simplifying the process of audio transcription.

Main Features:
  • Loading different sizes and versions of Whisper models.

  • Transcribing audio given as a file path (str), a torch.Tensor, or a numpy.ndarray.

  • Saving the transcriptions to the specified paths.

  • Support for specifying the language of the audio.

  • Options to control the verbosity of the transcription process.

Constants:

WHISPER_DEFAULT_PATH: Default path for downloading and loading Whisper models.

Usage:
>>> from autotranscript.transcriber import Transcriber
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")

class Transcriber(model: whisper)

Bases: object

The Transcriber class serves as a wrapper around Whisper models for efficient audio transcription. By encapsulating the intricacies of loading models, processing audio, and saving transcripts, it offers an easy-to-use interface for users to transcribe audio files.

Attributes:

model (whisper): The Whisper model used for transcription.

Methods:

transcribe: Transcribes the given audio file.
save_transcript: Saves the transcript to a file.
load_model: Loads a specific Whisper model.
_get_whisper_kwargs: Private method to get valid keyword arguments for the Whisper model.
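AutoTranscribe forwards a single **kwargs dictionary to both Whisper and pyannote, so something has to pick the keyword arguments each model accepts; that is the role of the _get_whisper_kwargs helper listed above. How that helper decides validity is not documented. One plausible approach, shown as a hypothetical sketch, keeps only the keyword arguments a target function names explicitly in its signature, plus any extra names you allow. The extra names matter for functions that take options through a **kwargs catch-all, as openai-whisper's `transcribe` does with its decoding options.

```python
import inspect
from typing import Any, Callable, Iterable


def select_kwargs(
    func: Callable[..., Any], kwargs: dict[str, Any], extra: Iterable[str] = ()
) -> dict[str, Any]:
    """Keep only the keyword arguments that `func` declares by name, plus `extra`."""
    named = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    accepted = {
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind in named  # ignore *args / **kwargs catch-alls
    }
    accepted.update(extra)
    return {key: value for key, value in kwargs.items() if key in accepted}
```

Splitting the same dictionary once per consumer prevents an option meant for one model from raising a TypeError in the other.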

Examples:
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")

Note:

The class supports various sizes and versions of Whisper models. Please refer to the load_model method for available options.

__init__(model: whisper) → None

Initialize the Transcriber class with a Whisper model.

Args:

model (whisper): The Whisper model to use for transcription.

__repr__() → str

Return repr(self).

classmethod load_model(model: str = 'medium', download_root: str = '/home/ortizcruzc/.cache/torch/models/whisper', device: Optional[Union[str, torch.device]] = None, in_memory: bool = False) → Transcriber

Load a Whisper model.

Args:
model (str): Whisper model. Available models include:
  • ‘tiny.en’

  • ‘tiny’

  • ‘base.en’

  • ‘base’

  • ‘small.en’

  • ‘small’

  • ‘medium.en’

  • ‘medium’

  • ‘large-v1’

  • ‘large-v2’

  • ‘large’

download_root (str, optional): Path to download the model to. Defaults to WHISPER_DEFAULT_PATH.

device (Optional[Union[str, torch.device]], optional): Device to load the model on. Defaults to None.

in_memory (bool, optional): Whether to load the model in memory. Defaults to False.

Returns:

Transcriber: A Transcriber object initialized with the specified model.

static save_transcript(transcript: str, save_path: str) → None

Save a transcript to a file.

Args:

transcript (str): The transcript as a string.
save_path (str): The path to save the transcript.

Returns:

None

transcribe(audio: Union[str, torch.Tensor, numpy.ndarray], *args, **kwargs) → str

Transcribe an audio file.

Args:

audio (Union[str, Tensor, ndarray]): The audio file to transcribe.
*args: Additional arguments.
**kwargs: Additional keyword arguments, such as the language of the audio file.

Returns:

str: The transcript as a string.

autotranscript.transcript_exporter module

class Transcript(transcript: dict)

Bases: object

Class for storing transcript data, including speaker information and text segments, and exporting it to various file formats such as JSON, HTML, and LaTeX.

__init__(transcript: dict) → None

Initializes the Transcript object with the given transcript data.

Args:
transcript (dict): A dictionary containing the formatted transcript. Keys should correspond to segment IDs, and values should contain speaker and segment information.

__repr__() → str

Return a string representation of the Transcript object.

Returns:

str: A string that provides an informative description of the object.

__str__() → str

Converts the transcript to a string representation.

Returns:
str: String representation of the transcript, including speaker names and time stamps for each segment.
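The string form includes a time stamp for each segment, but the exact format is not specified here. A minimal sketch of an HH:MM:SS.mmm formatter that such a method could use; the format and the `format_timestamp` name are assumptions.

```python
def format_timestamp(seconds: float) -> str:
    """Format a non-negative time in seconds as HH:MM:SS.mmm."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

Rounding once to integer milliseconds and then using divmod avoids the carry errors (such as a "60" seconds field) that formatting each field from the float separately can produce.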

annotate(*args, **kwargs) → dict

Annotates the transcript to associate specific names with speakers.

Args:

args (list): List of speaker names. These will be mapped sequentially to the speakers.
kwargs (dict): Dictionary with speaker names as keys and list of segments as values.

Returns:

dict: Dictionary with speaker names as keys and the corresponding annotation as values.

Raises:
ValueError: If the number of speaker names does not match the number of speakers, or if an unknown speaker is found.
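The name mapping behind annotate, including both ValueError cases, can be sketched as follows. Two assumptions apply: speakers are ordered by first appearance in the transcript, and keyword arguments map existing speaker labels to display names. `speakers_in_order` and `map_speakers` are hypothetical helpers that return the label-to-name mapping rather than the annotated transcript.

```python
def speakers_in_order(labels: list[str]) -> list[str]:
    """Unique speaker labels in order of first appearance."""
    return list(dict.fromkeys(labels))


def map_speakers(speakers: list[str], *names: str, **mapping: str) -> dict[str, str]:
    """Return {speaker_label: display_name} for the given speakers."""
    if names:
        if len(names) != len(speakers):
            raise ValueError(f"got {len(names)} names for {len(speakers)} speakers")
        return dict(zip(speakers, names))  # i-th name goes to the i-th speaker
    unknown = set(mapping) - set(speakers)
    if unknown:
        raise ValueError(f"unknown speaker(s): {sorted(unknown)}")
    # Speakers without an explicit name keep their original label.
    return {speaker: mapping.get(speaker, speaker) for speaker in speakers}
```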

get_dict() → dict

Get the transcript as a dict.

Returns:

transcript as dict

Return type:

dict

get_html() → str

Get the transcript as an HTML string.

Returns:

transcript as HTML string

Return type:

str

get_json(*args, **kwargs) → str

Get the transcript as a JSON string.

Returns:

transcript as JSON string

Return type:

str

get_md() → str

Get transcript as Markdown string, using HTML formatting.

Returns:

str: Transcript as a Markdown string.

get_tex() → str

Get transcript as LaTeX string. If no annotations are present, the speakers will be annotated with the first letters of the alphabet.

Returns:

str: Transcript as LaTeX string.

save(path: str, *args, **kwargs) → None

Save the transcript to a file with the given path and file format.

This method can save the transcript in various formats including JSON, TXT, MD, HTML, TEX, and PDF. The file format is determined by the extension of the path.

Args:

path (str): Path to save the file, including the desired file extension.
*args: Additional positional arguments to be passed to the specific save methods.
**kwargs: Additional keyword arguments to be passed to the specific save methods.

Raises:

ValueError: If the file format specified in the path is unknown.
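The extension-based dispatch in save can be sketched as follows, assuming each supported extension maps to the matching to_<format> method documented below (to_json, to_txt, to_md, to_html, to_tex, to_pdf) and that the extension is matched case-insensitively. `save_by_extension` is a hypothetical stand-in for the real method.

```python
import os
from typing import Any

SUPPORTED_FORMATS = {"json", "txt", "md", "html", "tex", "pdf"}


def save_by_extension(transcript: Any, path: str, *args: Any, **kwargs: Any) -> None:
    """Call transcript.to_<ext>(path, ...) based on the file extension of `path`."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unknown file format: {ext or '(none)'}")
    writer = getattr(transcript, f"to_{ext}")
    writer(path, *args, **kwargs)  # extra arguments go to the format-specific writer
```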

to_html(path: str) → None

Save the transcript as an HTML file.

Parameters:

path (str) – path to save file

to_json(path, *args, **kwargs) → None

Save the transcript as a JSON file.

Args:

path (str): path to save file

to_md(path: str) → None

Save the transcript as a Markdown file, using HTML formatting.

Args:

path (str): Path to save the Markdown file.

to_pdf(path: str) → None

Save transcript as a PDF file (placeholder function, implementation needed).

Args:

path (str): Path to save the PDF file.

to_tex(path: str) → None

Save transcript as a LaTeX file (placeholder function, implementation needed).

Args:

path (str): Path to save the LaTeX file.

to_txt(path: str) → None

Save transcript as a plain-text file (placeholder function, implementation needed).

Args:

path (str): Path to save the text file.

autotranscript.version module

get_version(build_version=False)

git_version()

Module contents