From the course: Hands-On PyTorch Machine Learning

Torchaudio introduction

- [Instructor] TorchAudio is a library for audio and signal processing with PyTorch. It provides I/O, signal, and data processing functions, datasets, model implementations, and application components. Its API is organized into modules, including backend, functional, transforms, datasets, models, pipelines, sox_effects, compliance.kaldi, kaldi_io, and utils. Similar to TorchVision, TorchAudio also provides a number of popular datasets out of the box. Examples include CMUDict, which is the CMU Pronouncing Dictionary; Common Voice; GTZAN, which is used for music genre classification of audio signals; Speech Commands; and VCTK, which is speech data uttered by 110 English speakers with various accents. Details of these datasets can be found in the PyTorch datasets documentation. The audio I/O package allows you to query audio file metadata, load audio data into a tensor, and save audio to files. Audio resampling: to resample an audio waveform…