Merge branch 'develop' into faster_whisper

This commit is contained in:
Schmieder, Jacob
2024-09-10 15:17:53 +00:00
20 changed files with 386 additions and 267 deletions
+35
@@ -0,0 +1,35 @@
---
name: Bug report
about: Create a report to help us improve
title: "[BUG]"
labels: bug
assignees: ''
---
## Description 🐛
Please provide a clear and concise description of the bug. What went wrong?
## Steps to Reproduce 🔍
Steps to reproduce the behavior:
1. Go to '...'
2. Run command '....'
3. Provide input '....'
4. See error
## Expected Behavior 🤔
What did you expect to happen instead?
## Screenshots or Logs 📸
If applicable, add screenshots or logs to help explain your problem. This can include terminal output or error messages.
## Environment 🖥️
- **OS**: [e.g., Ubuntu 20.04]
- **Python Version**: [e.g., 3.9]
- **PyTorch Version**: [e.g., 2.0]
- **CUDA Version** (if applicable): [e.g., 11.7]
- **ScrAIbe Version**: [e.g., 1.0.0]
- **Installation Type**: [e.g., pip, GitHub, Docker, etc.]
## Additional Context 📝
Add any other context about the problem here. For example, information about custom models or configurations, related issues, or anything else that might be helpful.
+30
@@ -0,0 +1,30 @@
---
name: Custom issue template
about: Describe this issue template's purpose here.
title: "[CUSTOM] "
labels: ''
assignees: ''
---
## Description 📝
Provide a detailed description of the issue or request. Explain the context, the problem, or the question you have.
## Objective 🎯
What do you hope to achieve with this issue? Are you looking for guidance, proposing a discussion, or something else?
## Relevant Information 📂
Include any relevant details, such as:
- Code snippets
- Links to related documentation or issues
- Configuration files
- Screenshots or diagrams
## Steps to Reproduce or Reference 🔍
If applicable, provide steps to reproduce the issue or reference specific parts of the project that are relevant to your issue.
## Proposed Next Steps 🚀
What do you propose as the next steps for addressing this issue? Do you need help, or are you suggesting a specific course of action?
## Additional Context 📝
Add any other context that might help understand the issue. This could include environmental details, related discussions, or any other relevant information.
+23
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: "[FEATURE]"
labels: feature
assignees: ''
---
## Description 📝
Provide a clear and concise description of the feature or enhancement you are proposing. What problem does it solve, or what capability does it add?
## Use Case 💡
Explain the use case(s) for this feature. How would it benefit you or others? Include any relevant examples or scenarios.
## Proposed Solution 🚀
Describe your proposed solution in detail. How would the feature work? If you have an idea of how to implement it, include that here. Code snippets or references to other projects can be helpful.
## Alternatives Considered 🔄
Have you considered any alternative approaches or solutions? If so, please describe them and explain why they wouldn't be as effective.
## Additional Context 📝
Add any other context, screenshots, or mockups that might help clarify your request. This could include links to relevant discussions, related issues, or other resources.
+21
@@ -0,0 +1,21 @@
{
labelsSynonyms: {
bug: ['error', 'need fix', 'not working', 'failure', 'crash', 'problem', 'issue', 'defect', 'glitch', 'fault', 'anomaly'],
enhancement: ['upgrade', 'update', 'improve', 'feature request', 'new feature', 'enhance', 'extension', 'add-on', 'improvement'],
"help wanted": ['help', 'how can i', 'assistance needed', 'support needed', 'question', 'guidance', 'aid', 'need assistance', 'advice', 'instruction'],
documentation: ['docs', 'Readme', 'documentation', 'guide', 'manual', 'instructions', 'how-to', 'reference', 'tutorial', 'specification'],
docker: ['compose', 'Dockerfile', 'container', 'docker-compose', 'image', 'docker setup', 'kubernetes', 'docker swarm', 'containerization'],
performance: ['slow', 'lag', 'performance', 'speed', 'optimization', 'tuning', 'efficiency', 'latency', 'improve performance', 'boost', 'performance issue'],
security: ['vulnerability', 'exploit', 'attack', 'breach', 'security', 'protection', 'patch', 'secure', 'threat', 'risk', 'malware'],
ui: ['user interface', 'ui', 'ux', 'design', 'layout', 'front-end', 'visual', 'interface', 'experience', 'aesthetic', 'theme', 'style'],
test: ['test', 'testing', 'unit test', 'integration test', 'e2e test', 'automated test', 'test case', 'test suite', 'qa', 'quality assurance'],
compatibility: ['compatible', 'incompatible', 'version', 'compatibility', 'interop', 'support', 'versioning', 'cross-platform', 'integration', 'compatibility issue']
},
labelsNotAllowed: [
'duplicate',
'good first issue',
'invalid'
],
//defaultLabels: ['triage'],
ignoreComments: true
}
+63
@@ -0,0 +1,63 @@
# .github/labeler.yml
# Label for documentation changes
documentation:
- changed-files:
- any-glob-to-any-file:
- 'docs/**'
- 'README.md'
- 'CHANGELOG.md'
- 'CONTRIBUTING.md'
- 'Makefile'
- 'Pictures'
# Label for Docker changes
docker:
- changed-files:
- any-glob-to-any-file:
- '*docker*'
- 'Docker*'
# Label for release-related changes
release:
- changed-files:
- any-glob-to-any-file:
- 'scraibe/**'
- 'pyproject.toml'
- 'LICENCE'
tests:
- changed-files:
- any-glob-to-any-file:
- 'test/**'
workflows:
- changed-files:
- any-glob-to-any-file:
- '.github/workflows/*'
- '.github/*'
github:
- changed-files:
- any-glob-to-any-file:
- '.gitignore'
- '.github/ISSUE_TEMPLATE/*'
dependencies:
- changed-files:
- any-glob-to-any-file:
- 'requirements.txt'
- 'environment.yml'
- 'pyproject.toml'
- head-branch: ['^dependencies', 'dependencies', '^dependency', 'dependency']
feature:
- head-branch: ['^feature', 'feature']
patch:
- head-branch: ['^patch', 'patch', '^bug', 'bug']
ignore-pr-title-for-release:
- head-branch: ['develop']
- base-branch: ['main']
+47
@@ -0,0 +1,47 @@
# Automatically generated release notes from GitHub, used by softprops/action-gh-release@v2 in .github/workflows/release.yaml
changelog:
exclude:
labels:
- ignore-for-release
- ignore-pr-title-for-release
- workflows
- github
- documentation
authors:
- octocat
- github-actions[bot]
categories:
- title: New Features 🎉
labels:
- enhancement
- feature
- Semver-Minor
- title: Bug Fixes 🐛
labels:
- bug
- fix
- patch
- title: Dependency Updates 📦
labels:
- dependency
- dependencies
- dependency-update
- title: Breaking Changes 🛠
labels:
- breaking-change
- Semver-Major
- title: Container and Compose Updates 🐳
labels:
- docker
- compose
- docker-compose
- title: Other Changes 🔧
labels:
- "*"
+18
@@ -0,0 +1,18 @@
name: Labeling new issue
on:
issues:
types: [opened, reopened]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
sparse-checkout: |
.github/auto-label.json5
sparse-checkout-cone-mode: false
- uses: Renato66/auto-label@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-file: .github/auto-label.json5
+22
@@ -0,0 +1,22 @@
name: Auto Label PRs
on:
pull_request:
types: [opened, synchronize, reopened, edited]
jobs:
label:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
outputs:
all-labels: ${{ steps.label-the-PR.outputs.all-labels }}
steps:
- name: Apply Labels
id: label-the-PR
uses: actions/labeler@v5
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/auto_label_pr.yml
sync-labels: true
-90
@@ -1,90 +0,0 @@
name: Check and Add Version in Changelog
on:
pull_request:
branches:
- main
- develop
jobs:
check-and-add-version:
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check if Source Branch is docs
id: check_docs_branch
run: |
pr_head_ref="${{ github.event.pull_request.head.ref }}"
if [[ "$pr_head_ref" == "docs" ]]; then
echo "This is a docs branch merge. Exiting without creating a tag."
echo "is_docs_branch=true" >> $GITHUB_ENV
exit 0
else
echo "is_docs_branch=false" >> $GITHUB_ENV
fi
- name: Extract and Determine Version
if: env.is_docs_branch != 'true'
id: extract_version
run: |
# Fetch the latest tags from the remote
git fetch --tags
# Get the latest tag, or initialize to v0.0.0 if no tags are found
latest_tag=$(git describe --tags `git rev-list --tags --max-count=1` 2>/dev/null || echo "v0.0.0")
# Extract version from PR title or body
pr_body="${{ github.event.pull_request.body }}"
pr_title="${{ github.event.pull_request.title }}"
version_regex="v([0-9]+)\.([0-9]+)\.([0-9]+)"
if [[ $pr_body =~ $version_regex ]]; then
major=${BASH_REMATCH[1]}
minor=${BASH_REMATCH[2]}
patch=${BASH_REMATCH[3]}
new_tag="v$major.$minor.$patch"
elif [[ $pr_title =~ $version_regex ]]; then
major=${BASH_REMATCH[1]}
minor=${BASH_REMATCH[2]}
patch=${BASH_REMATCH[3]}
new_tag="v$major.$minor.$patch"
else
# Split the latest tag into parts
IFS='.' read -r -a parts <<< "${latest_tag#v}"
major=${parts[0]}
minor=${parts[1]}
patch=${parts[2]}
patch=$((patch + 1))
new_tag="v$major.$minor.$patch"
fi
clean_version="${new_tag#v}"
echo "version=$clean_version" >> $GITHUB_ENV
echo "Version determined: $clean_version"
- name: Check if Version Already Exists in Tags
if: env.is_docs_branch != 'true'
run: |
version="${{ env.version }}"
if git tag --list | grep -q "^$version$"; then
echo "Version $version already exists in tags."
exit 1
else
echo "Version $version does not exist in tags."
fi
- name: Check Version in CHANGELOG
if: env.is_docs_branch != 'true'
id: check_version
run: |
version="${{ env.version }}"
if ! grep -q "^## \[$version\]" CHANGELOG.md; then
echo "Version $version not found in CHANGELOG.md."
exit 1
else
echo "Version $version found in CHANGELOG.md."
fi
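The version logic in the removed workflow above can be summarised in a few lines of Python. This is only a sketch of the same decision order (PR body first, then PR title, otherwise bump the patch number of the latest tag); the function name `determine_version` and its parameters are hypothetical and do not exist in the repository.

```python
import re

# Same pattern the removed workflow used: a "v" followed by three dotted numbers.
VERSION_RE = re.compile(r"v([0-9]+)\.([0-9]+)\.([0-9]+)")


def determine_version(pr_body, pr_title, latest_tag="v0.0.0"):
    """Return the version string without the leading 'v'.

    Hypothetical helper mirroring the shell step: an explicit version in
    the PR body wins, then one in the PR title; otherwise the patch part
    of the latest tag is incremented by one.
    """
    for text in (pr_body, pr_title):
        match = VERSION_RE.search(text or "")
        if match:
            return ".".join(match.groups())

    # Fallback: strip a single leading "v" and bump the patch number.
    bare = latest_tag[1:] if latest_tag.startswith("v") else latest_tag
    major, minor, patch = bare.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


if __name__ == "__main__":
    print(determine_version("Release v1.2.3", "some title"))
    print(determine_version("", "", latest_tag="v0.4.9"))
```

As in the shell version, the fallback assumes the latest tag has exactly three dot-separated numeric parts.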
+2 -3
@@ -2,8 +2,8 @@ name: documentation
on: on:
push: push:
branches: tags:
- main - 'v*.*.*'
workflow_dispatch: workflow_dispatch:
permissions: permissions:
@@ -36,7 +36,6 @@ jobs:
make html make html
- name: Deploy to GitHub Pages - name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3 uses: peaceiris/actions-gh-pages@v3
if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/sphinx_action' }}
with: with:
publish_branch: gh-pages publish_branch: gh-pages
github_token: ${{ secrets.TOKEN_GH }} github_token: ${{ secrets.TOKEN_GH }}
-29
@@ -1,29 +0,0 @@
name: Manual Publish to PyPI
on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to check out (main or develop)'
required: true
type: choice
options:
- main
- develop
jobs:
publish-to-pypi:
name: Publish to PyPI
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v4
with:
fetch-depth: '0'
ref: ${{ github.event.inputs.branch }}
- name: Set up Poetry 📦
uses: JRubics/poetry-publish@v1.16
with:
pypi_token: ${{ secrets.PYPI_API_TOKEN }}
plugins: "poetry-dynamic-versioning"
repository_name: "scraibe"
+26 -23
@@ -1,24 +1,29 @@
name: Publish Python 🐍 distribution 📦 to PyPI and TestPyPI name: Publish Python 🐍 distribution 📦 to PyPI and TestPyPI
on: on:
pull_request: push:
types: [closed] tags:
branches: - 'v*.*.*'
- develop branches:
- main - "develop"
paths:
- "scraibe/**"
- "pyproject.toml"
workflow_dispatch: workflow_dispatch:
inputs: inputs:
job: test:
description: "Select job to run" description: "Run tests"
required: true default: true
type: choice type: boolean
options: publish_to_pypi:
- Build-and-publish-to-Test-PyPI description: "Publish to PyPI"
- test-install default: false
- publish-to-pypi type: boolean
jobs: jobs:
Build-and-publish-to-Test-PyPI: Build-and-publish-to-Test-PyPI:
if: github.event_name != 'workflow_dispatch' || github.event.inputs.test == 'true'
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
@@ -32,7 +37,7 @@ jobs:
repository_name: "scraibe" repository_name: "scraibe"
repository_url: "https://test.pypi.org/legacy/" repository_url: "https://test.pypi.org/legacy/"
test-install: Test-PyPi-install:
name: Test Installation from TestPyPI name: Test Installation from TestPyPI
needs: Build-and-publish-to-Test-PyPI needs: Build-and-publish-to-Test-PyPI
runs-on: ubuntu-latest runs-on: ubuntu-latest
@@ -54,21 +59,19 @@ jobs:
publish-to-pypi: publish-to-pypi:
name: Publish to PyPI name: Publish to PyPI
needs: test-install needs: Test-PyPi-install
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: |
always() &&
(( needs.Build-and-publish-to-Test-PyPI.result != 'failure' &&
needs.Test-PyPi-install.result != 'failure' ) ||
((github.event_name == 'workflow_dispatch' &&
github.event.inputs.publish_to_pypi == 'true')))
steps: steps:
- name: Checkout Repository Tags
uses: actions/checkout@v4
if: github.ref == 'refs/heads/main'
with:
fetch-depth: '0'
branch: 'main'
- name: Checkout Repository (Develop) - name: Checkout Repository (Develop)
uses: actions/checkout@v4 uses: actions/checkout@v4
if: github.ref == 'refs/heads/develop'
with: with:
fetch-depth: '0' fetch-depth: '0'
branch: 'develop'
- name: Set up Poetry 📦 - name: Set up Poetry 📦
uses: JRubics/poetry-publish@v1.16 uses: JRubics/poetry-publish@v1.16
with: with:
+8 -1
@@ -2,7 +2,14 @@ name: Run Tests
on: on:
pull_request: pull_request:
branches: ['main', 'develop'] branches:
- main
- develop
paths:
- scraibe/**
- pyproject.toml
- requirements.txt
- test/**
workflow_dispatch: workflow_dispatch:
jobs: jobs:
+72
@@ -0,0 +1,72 @@
name: release
on:
push:
tags:
- 'v*.*.*'
jobs:
build-on-workflow:
runs-on: ubuntu-latest
if: |
github.event_name == 'workflow_run' &&
github.event.workflow_run.conclusion == 'success'
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0 # Ensure all history is fetched
ref: main
- name: Get Latest Tag
id: get-latest-tag
if:
run: |
git fetch --tags
latest_tag=$(git describe --tags `git rev-list --tags --max-count=1`)
echo "latest_tag=$latest_tag" >> $GITHUB_OUTPUT
- name: Release from Workflow Run
if: |
github.event_name == 'workflow_run' &&
github.event.workflow_run.conclusion == 'success'
uses: softprops/action-gh-release@v2
with:
generate_release_notes: true
append_body: true
tag_name: ${{ steps.get-latest-tag.outputs.latest_tag }}
build-on-tag:
runs-on: ubuntu-latest
if: startsWith(github.ref, 'refs/tags/')
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0 # Ensure all history is fetched
ref: main
- name: Release from Tag Push
uses: softprops/action-gh-release@v2
with:
generate_release_notes: true
append_body: true
write_changelog:
runs-on: ubuntu-latest
needs: [build-on-workflow, build-on-tag]
if: |
always() &&
(needs.build-on-workflow.result == 'success' || needs.build-on-tag.result == 'success' )
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0 # Ensure all history is fetched
ref: main
- name: Write CHANGELOG.md
uses: rhysd/changelog-from-release/action@v3
with:
file: CHANGELOG.md
github_token: ${{ secrets.GITHUB_TOKEN }}
+4 -2
@@ -1,9 +1,11 @@
name: Ruff name: Ruff
on: push on:
push:
paths:
- '**.py'
jobs: jobs:
ruff: ruff:
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: ${{ github.event_name == 'pull_request' || (github.event_name == 'push') }}
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
- uses: chartboost/ruff-action@v1 - uses: chartboost/ruff-action@v1
-111
@@ -1,111 +0,0 @@
name: Semantic Versioning for Tags
on:
pull_request:
types: [closed]
branches:
- main
jobs:
bump-version:
if: ${{ github.event.pull_request.merged == true && github.event.pull_request.base.ref == 'main' }}
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check if Source Branch is docs
id: check_docs_branch
run: |
pr_head_ref="${{ github.event.pull_request.head.ref }}"
if [[ "$pr_head_ref" == "docs" ]]; then
echo "is_docs_branch=true" >> $GITHUB_ENV
echo "This is a docs branch merge. Exiting without creating a tag."
exit 0
else
echo "is_docs_branch=false" >> $GITHUB_ENV
fi
- name: Bump Version and Tag
if: env.is_docs_branch != 'true'
id: bump_version
env:
GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
run: |
# Fetch the latest tags from the remote
git fetch --tags
# Get the latest tag, or initialize to v0.0.0 if no tags are found
latest_tag=$(git describe --tags `git rev-list --tags --max-count=1` 2>/dev/null || echo "v0.0.0")
# Extract version from PR title or body
pr_body="${{ github.event.pull_request.body }}"
pr_title="${{ github.event.pull_request.title }}"
version_regex="v([0-9]+)\.([0-9]+)\.([0-9]+)"
if [[ $pr_body =~ $version_regex ]]; then
major=${BASH_REMATCH[1]}
minor=${BASH_REMATCH[2]}
patch=${BASH_REMATCH[3]}
new_tag="v$major.$minor.$patch"
elif [[ $pr_title =~ $version_regex ]]; then
major=${BASH_REMATCH[1]}
minor=${BASH_REMATCH[2]}
patch=${BASH_REMATCH[3]}
new_tag="v$major.$minor.$patch"
else
# Split the latest tag into parts
IFS='.' read -r -a parts <<< "${latest_tag#v}"
major=${parts[0]}
minor=${parts[1]}
patch=${parts[2]}
patch=$((patch + 1))
new_tag="v$major.$minor.$patch"
fi
echo "Bumping version from $latest_tag to $new_tag"
# Set the new tag as an environment variable
echo "new_tag=$new_tag" >> $GITHUB_ENV
# Tag the new version
git tag $new_tag
# Configure GitHub token authentication
git remote set-url origin https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/${{ github.repository }}.git
# Push the new tag to the remote repository
git push origin $new_tag
- name: Extract Release Notes
if: env.is_docs_branch != 'true'
id: extract_notes
run: |
version="${{ env.new_tag }}"
clean_version="${version#v}"
release_notes=$(awk -v version="$clean_version" '
BEGIN { flag=0 }
# Start flagging when the version section is found
/^## \[.*\]/ {
if (flag) exit # Exit when the next section starts
}
/^## \['"$clean_version"'\]/ { flag=1; next } # Start printing after the header
flag { print } # Print lines while flag is 1
' CHANGELOG.md)
echo "RELEASE_NOTES<<EOF" >> $GITHUB_ENV
echo "$release_notes" >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
- name: Create Release
if: env.is_docs_branch != 'true'
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
with:
tag_name: ${{ env.new_tag }}
release_name: Release ${{ env.new_tag }}
body: ${{ env.RELEASE_NOTES }}
draft: false
prerelease: false
+1 -1
@@ -31,7 +31,7 @@ exclude =[
] ]
[tool.poetry.dependencies] [tool.poetry.dependencies]
python = "^3.9" python = "^3.9"
tqdm = "^4.66.4" tqdm = "^4.66.5"
numpy = "^1.26.4" numpy = "^1.26.4"
openai-whisper = "^20231117" openai-whisper = "^20231117"
faster-whisper = "^1.0.3" faster-whisper = "^1.0.3"
+1 -1
@@ -1,4 +1,4 @@
tqdm>=4.65.0 tqdm>=4.66.5
numpy>=1.26.4 numpy>=1.26.4
openai-whisper==20231117 openai-whisper==20231117
+9 -2
@@ -79,6 +79,8 @@ def cli():
choices=sorted( choices=sorted(
LANGUAGES.keys()) + sorted([k.title() for k in TO_LANGUAGE_CODE.keys()]), LANGUAGES.keys()) + sorted([k.title() for k in TO_LANGUAGE_CODE.keys()]),
help="Language spoken in the audio. Specify None to perform language detection.") help="Language spoken in the audio. Specify None to perform language detection.")
parser.add_argument("--num-speakers", type=int, default=2,
help="Number of speakers in the audio.")
args = parser.parse_args() args = parser.parse_args()
@@ -117,8 +119,13 @@ def cli():
else: else:
task = "transcribe" task = "transcribe"
out = model.autotranscribe(audio, task=task, language=arg_dict.pop( out = model.autotranscribe(
"language"), verbose=arg_dict.pop("verbose_output")) audio,
task=task,
language=arg_dict.pop("language"),
verbose=arg_dict.pop("verbose_output"),
num_speakers=arg_dict.pop("num_speakers")
)
basename = audio.split("/")[-1].split(".")[0] basename = audio.split("/")[-1].split(".")[0]
print(f'Saving {basename}.{out_format} to {out_folder}') print(f'Saving {basename}.{out_format} to {out_folder}')
out.save(os.path.join( out.save(os.path.join(
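The hunk above adds a `--num-speakers` option and forwards it to `autotranscribe`. The following self-contained sketch shows how that flow might behave in isolation. `fake_autotranscribe` is a hypothetical stand-in for `model.autotranscribe`, which is not reproduced here; only the `argparse` behaviour is standard.

```python
import argparse


def build_parser():
    # Mirrors the argument added in the diff; argparse stores it as
    # "num_speakers" because dashes in option names become underscores.
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-speakers", type=int, default=2,
                        help="Number of speakers in the audio.")
    return parser


def fake_autotranscribe(audio, **kwargs):
    # Hypothetical stand-in: returns the keyword arguments it received.
    return kwargs


def run(argv):
    arg_dict = vars(build_parser().parse_args(argv))
    # pop() removes the key so it is not passed on a second time later.
    return fake_autotranscribe(
        "audio.wav",
        num_speakers=arg_dict.pop("num_speakers"),
    )


if __name__ == "__main__":
    print(run(["--num-speakers", "3"]))
    print(run([]))
```

Because the option declares `default=2`, the speaker count is always passed as an integer, even when the flag is omitted on the command line.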
+4 -4
@@ -1,6 +1,5 @@
import os import os
import yaml import yaml
from pyannote.audio.core.model import CACHE_DIR as PYANNOTE_CACHE_DIR
from argparse import Action from argparse import Action
from ast import literal_eval from ast import literal_eval
@@ -8,9 +7,10 @@ CACHE_DIR = os.getenv(
"AUTOT_CACHE", "AUTOT_CACHE",
os.path.expanduser("~/.cache/torch/models"), os.path.expanduser("~/.cache/torch/models"),
) )
os.getenv(
if CACHE_DIR != PYANNOTE_CACHE_DIR: "PYANNOTE_CACHE",
os.environ["PYANNOTE_CACHE"] = os.path.join(CACHE_DIR, "pyannote") os.path.join(CACHE_DIR, "pyannote"),
)
WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper") WHISPER_DEFAULT_PATH = os.path.join(CACHE_DIR, "whisper")
PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote") PYANNOTE_DEFAULT_PATH = os.path.join(CACHE_DIR, "pyannote")
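As the new side of this hunk appears here, the result of `os.getenv("PYANNOTE_CACHE", ...)` is neither assigned nor written back to the environment. If the intent was to default `PYANNOTE_CACHE` to a `pyannote` subfolder of the cache directory unless the user has already set it, one possible approach is `setdefault` on the environment mapping. This is a sketch under that assumption about intent, not the project's code; `resolve_cache_dirs` is a hypothetical helper.

```python
import os


def resolve_cache_dirs(environ):
    """Return (cache_dir, pyannote_cache) for the given environment mapping.

    Hypothetical helper: takes a mapping so it can be exercised with a
    plain dict; pass os.environ to act on the real environment.
    """
    cache_dir = environ.get(
        "AUTOT_CACHE",
        os.path.expanduser("~/.cache/torch/models"),
    )
    # setdefault only writes the value when the key is missing, so a
    # user-provided PYANNOTE_CACHE is left untouched.
    pyannote_cache = environ.setdefault(
        "PYANNOTE_CACHE",
        os.path.join(cache_dir, "pyannote"),
    )
    return cache_dir, pyannote_cache


if __name__ == "__main__":
    print(resolve_cache_dirs({"AUTOT_CACHE": "/tmp/models"}))
```

Taking the mapping as a parameter keeps the helper free of import-time side effects, which also removes the need to import pyannote just to compare cache paths.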