Deploy a Hugging Face (PyAnnote) speaker diarization model on Amazon SageMaker as an asynchronous endpoint | Amazon Web Services


Speaker diarization, an essential process in audio analysis, segments an audio file based on speaker identity. This post delves into integrating Hugging Face's PyAnnote for speaker diarization with Amazon SageMaker asynchronous endpoints.

We provide a comprehensive guide on how to deploy speaker segmentation and clustering solutions using SageMaker on the AWS Cloud. You can use this solution for applications that handle multi-speaker (over 100) audio recordings.

Solution overview

Amazon Transcribe is the go-to service for speaker diarization in AWS. However, for non-supported languages, you can use other models (in our case, PyAnnote) that are deployed in SageMaker for inference. For short audio files where the inference takes up to 60 seconds, you can use real-time inference. For anything longer than 60 seconds, asynchronous inference should be used. An added benefit of asynchronous inference is the cost savings from auto scaling the instance count to zero when there are no requests to process.

Hugging Face is a popular open source hub for machine learning (ML) models. AWS and Hugging Face have a partnership that allows a seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK. SageMaker features and capabilities help developers and data scientists get started with natural language processing (NLP) on AWS with ease.

The integration for this solution involves using Hugging Face's pre-trained speaker diarization model from the PyAnnote library. PyAnnote is an open source toolkit written in Python for speaker diarization. This model, trained on sample audio data, enables effective speaker partitioning in audio files. The model is deployed on SageMaker as an asynchronous endpoint, providing efficient and scalable processing of diarization tasks.

The following diagram illustrates the solution architecture.

For this post, we use the following audio file.

Stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels. Audio files sampled at a different rate are resampled to 16kHz automatically on loading.
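This preprocessing can be sketched as follows. `downmix_and_resample` is a hypothetical helper for illustration (pyannote.audio performs the equivalent steps internally when loading a file), and the naive linear-interpolation resampling stands in for the higher-quality resampler a library such as librosa or torchaudio would use in practice:

```python
import numpy as np

def downmix_and_resample(samples: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Average multi-channel audio to mono, then resample to target_sr.

    samples: float array of shape (num_samples,) or (num_samples, num_channels).
    """
    # Downmix: average across channels
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    if sr == target_sr:
        return samples
    # Naive linear-interpolation resampling (illustrative only)
    duration = samples.shape[0] / sr
    n_out = int(round(duration * target_sr))
    src_t = np.arange(samples.shape[0]) / sr
    dst_t = np.arange(n_out) / target_sr
    return np.interp(dst_t, src_t, samples)

# Example: 1 second of stereo audio at 44.1kHz becomes 16,000 mono samples
stereo = np.random.randn(44100, 2)
mono_16k = downmix_and_resample(stereo, sr=44100)
print(mono_16k.shape)  # (16000,)
```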

Prerequisites

Complete the following prerequisites:

  1. Create a SageMaker domain.
  2. Make sure your AWS Identity and Access Management (IAM) user has the necessary access permissions for creating a SageMaker role.
  3. Make sure the AWS account has a service quota for hosting a SageMaker endpoint for an ml.g5.2xlarge instance.

Create a model function for accessing the PyAnnote speaker diarization model from Hugging Face

You can use the Hugging Face Hub to access the desired pre-trained PyAnnote speaker diarization model. You use the same script to download the model file when creating the SageMaker endpoint.


See the following code:

from pyannote.audio import Pipeline

def model_fn(model_dir):
    # Load the model from the specified model directory
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="Replace-with-the-Hugging-face-auth-token")
    return model

Package the model code

Prepare the essential files, such as inference.py, which contains the inference code:

%%writefile model/code/inference.py
from pyannote.audio import Pipeline
import subprocess
import boto3
from urllib.parse import urlparse
import pandas as pd
from io import StringIO
import os
import torch

def model_fn(model_dir):
    # Load the model from the specified model directory
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="hf_oBxxxxxxxxxxxx")
    return model 


def diarization_from_s3(model, s3_file, language=None):
    s3 = boto3.client("s3")
    o = urlparse(s3_file, allow_fragments=False)
    bucket = o.netloc
    key = o.path.lstrip("/")
    s3.download_file(bucket, key, "tmp.wav")
    result = model("tmp.wav")
    data = {} 
    for turn, _, speaker in result.itertracks(yield_label=True):
        data[turn] = (turn.start, turn.end, speaker)
    data_df = pd.DataFrame(data.values(), columns=["start", "end", "speaker"])
    print(data_df.shape)
    result = data_df.to_json(orient="split")
    return result


def predict_fn(data, model):
    s3_file = data.pop("s3_file")
    language = data.pop("language", None)
    result = diarization_from_s3(model, s3_file, language)
    return {
        "diarization_from_s3": result
    }

Prepare a requirements.txt file, which contains the Python libraries required to run the inference:

with open("model/code/requirements.txt", "w") as f:
    f.write("transformers==4.25.1\n")
    f.write("boto3\n")
    f.write("pyannote.audio\n")
    f.write("soundfile\n")
    f.write("librosa\n")
    f.write("onnxruntime\n")
    f.write("wget\n")
    f.write("pandas")

Finally, compress the inference.py and requirements.txt files and save the archive as model.tar.gz:

!tar zcvf model.tar.gz *

Configure a SageMaker model

Define a SageMaker model resource by specifying the image URI, the model data location in Amazon Simple Storage Service (Amazon S3), and the SageMaker role:

import sagemaker
import boto3

sess = sagemaker.Session()

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Upload the model to Amazon S3

Upload the PyAnnote Hugging Face model file to an S3 bucket:

s3_location = f"s3://{sagemaker_session_bucket}/whisper/model/model.tar.gz"
!aws s3 cp model.tar.gz $s3_location

Create a SageMaker asynchronous endpoint

Configure an asynchronous endpoint for deploying the model on SageMaker using the provided asynchronous inference configuration:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join
from sagemaker.utils import name_from_base

async_endpoint_name = name_from_base("custom-asyc")

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,  # path to your model and script
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",  # pytorch version used
    py_version="py38",  # python version used
)

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=s3_path_join(
        "s3://", sagemaker_session_bucket, "async_inference/output"
    ),  # Where our results will be stored
    # Add SNS notification topics if needed
    notification_config={
        # "SuccessTopic": "PUT YOUR SUCCESS SNS TOPIC ARN",
        # "ErrorTopic": "PUT YOUR ERROR SNS TOPIC ARN",
    },  #  Notification configuration
)

env = {"MODEL_SERVER_WORKERS": "2"}

# deploy the endpoint
async_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    async_inference_config=async_config,
    endpoint_name=async_endpoint_name,
    env=env,
)

Test the endpoint

Evaluate the endpoint functionality by sending an audio file for diarization and retrieving the JSON output stored in the specified S3 output path:

# Replace with a path to an audio object in S3
data = {"s3_file": "s3://<BUCKET>/<AUDIO_FILE>.wav"}

from sagemaker.async_inference import WaiterConfig

res = async_predictor.predict_async(data=data)
print(f"Response output path: {res.output_path}")
print("Start Polling to get response:")

config = WaiterConfig(
    max_attempts=10,  # number of attempts
    delay=10,  # time in seconds to wait between attempts
)
res.get_result(config)

To deploy this solution at scale, we suggest using AWS Lambda, Amazon Simple Notification Service (Amazon SNS), or Amazon Simple Queue Service (Amazon SQS). These services are designed for scalability, event-driven architectures, and efficient resource utilization. They can help decouple the asynchronous inference process from the result processing, allowing you to scale each component independently and handle bursts of inference requests more effectively.
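As a sketch of that decoupling, the handler below shows how an AWS Lambda function subscribed to the success SNS topic might extract the result location from an asynchronous inference notification. The event shape here is an assumption based on the SageMaker async inference notification format (`invocationStatus` and `responseParameters.outputLocation` fields); verify it against your own notifications before relying on it:

```python
import json

def handler(event, context=None):
    """Hypothetical Lambda handler for SageMaker async inference SNS notifications."""
    results = []
    for record in event.get("Records", []):
        # SNS delivers the notification body as a JSON string in Sns.Message
        message = json.loads(record["Sns"]["Message"])
        if message.get("invocationStatus") == "Completed":
            # S3 location of the diarization output JSON
            results.append(message["responseParameters"]["outputLocation"])
        else:
            print(f"Inference failed: {message.get('failureReason')}")
    return results

# Example event with the assumed notification shape
sample_event = {
    "Records": [{
        "Sns": {"Message": json.dumps({
            "invocationStatus": "Completed",
            "responseParameters": {
                "outputLocation": "s3://sagemaker-xxxx/async_inference/output/abc123.out"
            },
        })}
    }]
}
print(handler(sample_event))
```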

Results

The model output is stored at s3://sagemaker-xxxx/async_inference/output/. The output shows that the audio recording has been segmented into three columns:

  • Start (start time in seconds)
  • End (end time in seconds)
  • Speaker (speaker label)

The following code shows an example of our results:

[0.9762308998, 8.9049235993, "SPEAKER_01"]

[9.533106961, 12.1646859083, "SPEAKER_01"]

[13.1324278438, 13.9303904924, "SPEAKER_00"]

[14.3548387097, 26.1884550085, "SPEAKER_00"]

[27.2410865874, 28.2258064516, "SPEAKER_01"]

[28.3446519525, 31.298811545, "SPEAKER_01"]
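Because `diarization_from_s3` serializes the DataFrame with `to_json(orient="split")`, the endpoint output can be loaded back into a DataFrame for downstream processing. The sketch below uses rows taken from the sample results above and wraps them in the `"split"` JSON layout that pandas produces:

```python
import json
from io import StringIO
import pandas as pd

# The "diarization_from_s3" field of the endpoint response holds the
# DataFrame serialized with orient="split"
response_body = {
    "diarization_from_s3": json.dumps({
        "columns": ["start", "end", "speaker"],
        "index": [0, 1, 2],
        "data": [
            [0.9762308998, 8.9049235993, "SPEAKER_01"],
            [9.533106961, 12.1646859083, "SPEAKER_01"],
            [13.1324278438, 13.9303904924, "SPEAKER_00"],
        ],
    })
}

# Rebuild the DataFrame and derive each segment's duration
df = pd.read_json(StringIO(response_body["diarization_from_s3"]), orient="split")
df["duration"] = df["end"] - df["start"]
print(df)
```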

Clean up

You can set a scaling policy to zero by setting MinCapacity to 0; asynchronous inference lets you auto scale to zero when there are no requests. You don't need to delete the endpoint, because it scales back up from zero when needed again, reducing costs when it is not in use. See the following code:

# Common class representing application autoscaling for SageMaker 
client = boto3.client('application-autoscaling') 

# This is the format in which application autoscaling references the endpoint
resource_id = 'endpoint/' + async_endpoint_name + '/variant/' + 'variant1' 

# Define and register your endpoint variant
response = client.register_scalable_target(
    ServiceNamespace='sagemaker', 
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount', # The number of EC2 instances for your Amazon SageMaker model endpoint variant.
    MinCapacity=0,
    MaxCapacity=5
)
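Registering the scalable target only sets the capacity bounds; a scaling policy decides when to move between them. The sketch below builds a target-tracking policy on the `ApproximateBacklogSizePerInstance` metric that SageMaker asynchronous endpoints publish to CloudWatch. The target of 5 queued requests per instance and the cooldowns are illustrative assumptions, and the `put_scaling_policy` call is commented out so you can review the configuration before applying it:

```python
# Target-tracking configuration: keep roughly 5 queued requests per
# instance (illustrative value; tune for your workload)
scaling_policy_config = {
    "TargetValue": 5.0,
    "CustomizedMetricSpecification": {
        "MetricName": "ApproximateBacklogSizePerInstance",
        "Namespace": "AWS/SageMaker",
        "Dimensions": [{"Name": "EndpointName", "Value": "<endpoint_name>"}],
        "Statistic": "Average",
    },
    "ScaleInCooldown": 300,  # seconds to wait before removing instances
    "ScaleOutCooldown": 60,  # seconds to wait before adding instances
}

# Uncomment to apply against the scalable target registered above:
# client.put_scaling_policy(
#     PolicyName="async-backlog-scaling",
#     ServiceNamespace="sagemaker",
#     ResourceId=resource_id,
#     ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     PolicyType="TargetTrackingScaling",
#     TargetTrackingScalingPolicyConfiguration=scaling_policy_config,
# )
print(scaling_policy_config["TargetValue"])
```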

If you want to delete the endpoint, use the following code:

async_predictor.delete_endpoint(async_endpoint_name)

Benefits of asynchronous endpoint deployment

This solution offers the following benefits:

  • The solution can handle multiple or large audio files.
  • This example uses a single instance for demonstration. If you want to use this solution for hundreds or thousands of recordings and use an asynchronous endpoint to process them across multiple instances, you can use an auto scaling policy, which is designed for a large number of source documents. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload.
  • The solution optimizes resources and reduces system load by separating long-running tasks from real-time inference.

Conclusion

In this post, we provided a straightforward approach for deploying Hugging Face's speaker diarization model on SageMaker using Python scripts. Using an asynchronous endpoint provides an efficient and scalable means to deliver diarization predictions as a service, accommodating concurrent requests seamlessly.

Get started today with asynchronous speaker diarization for your audio projects. Reach out in the comments if you have any questions about getting your own asynchronous diarization endpoint up and running.


About the Authors

Sanjay Tiwary is a Specialist Solutions Architect, AI/ML, who spends his time working with strategic customers to define business requirements and specific use cases, and to build AI/ML applications and services at scale. He has helped launch and scale the AI/ML powered Amazon SageMaker service and has implemented several proofs of concept using Amazon AI services. He has also developed an advanced analytics platform as part of the digital transformation journey.

Kiran Challapalli is a deep tech business developer with the AWS public sector. He has more than 8 years of experience in AI/ML and 23 years of overall software development and sales experience. Kiran helps public sector businesses across India explore and co-create cloud-based solutions that use AI, ML, and generative AI (including large language models) technologies.
