Reference 📚
Wav2Rec
core
special
engine
Recommender
Wav2Rec
Waveform recommendation & matching engine.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_path` | `Path` | path to a (training) checkpoint for `Wav2RecNet`. | required |
`distance_metric` | `str` | distance metric to use for nearest neighbours search. | required |
`normalize` | `bool` | if `True`, L2-normalize all projections. | required |
`similarity` | `callable` | a callable which accepts two 1D arrays and returns a float. Must be compiled with `numba.jit(nopython=True)`. | required |
`batch_size` | `int` | number of audio files to send to the Wav2Rec neural network model for projection simultaneously. | required |
`num_workers` | `int` | number of subprocesses to use when loading data from the dataset. See `torch.utils.data.DataLoader`. | required |
`pin_memory` | `bool` | copy tensors to CUDA memory before the data loader returns them. | required |
`prefetch_factor` | `int` | number of samples to load in advance of each worker. See `torch.utils.data.DataLoader`. | required |
`device` | `torch.device` | device to run the model on. If `None`, … | required |
`verbose` | `bool` | if `True`, display progress bars. | required |
`**kwargs` | Keyword Arguments | keyword arguments to pass to … | required |
Warnings
- By default, this class uses `distance_metric='euclidean'` and `normalize=True`. These settings have been purposefully chosen so that the distances computed for nearest neighbours search accord with the default similarity metric used: cosine similarity. (The euclidean distance between L2-normalized vectors is an effective proxy for cosine similarity; see the reference below.)
References
- https://en.wikipedia.org/wiki/Cosine_similarity
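A quick numerical check of the identity behind this choice, using only NumPy (an illustrative aside, not part of the Wav2Rec API): for unit-length vectors the squared euclidean distance equals 2 * (1 - cosine similarity), so ranking neighbours by euclidean distance is equivalent to ranking them by cosine similarity.

import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, L2-normalized to unit length.
a = rng.normal(size=128)
b = rng.normal(size=128)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = float(a @ b)
euclidean_sq = float(np.sum((a - b) ** 2))

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b)).
assert np.isclose(euclidean_sq, 2 * (1 - cosine))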
fit(self, dataset)
Fit the recommender to a dataset.
Fitting is composed of three steps:
1. Iterating over all files in the dataset
2. Computing `Wav2RecNet` projections for each file
3. Fitting the nearest neighbours algorithm against the projections
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`dataset` | `Wav2RecDataset` | a dataset to fit against. | required |

Returns:
Type | Description |
---|---|
`Wav2Rec` | the fitted `Wav2Rec` instance (`self`). |
Source code in wav2rec/core/engine.py
def fit(self, dataset: Wav2RecDataset) -> Wav2Rec:
    """Fit the recommender to a dataset.

    Fitting is composed of three steps:

        1. Iterating over all files in the dataset
        2. Computing ``Wav2RecNet`` projections for each file
        3. Fitting the nearest neighbours algorithm against the projections

    Args:
        dataset (Wav2RecDataset): a dataset to fit against.

    Returns:
        Wav2Rec

    """
    all_paths, all_projections = list(), list()
    with tqdm(desc="Fitting", disable=not self.verbose, total=len(dataset)) as pbar:
        for paths, audio in self._dataset2loader(dataset):
            all_paths.extend(paths)
            all_projections.append(self.get_projection(audio))
            pbar.update(len(audio))
    self.paths = np.asarray(all_paths)
    self._nneighbours.fit(np.concatenate(all_projections))
    self.fitted = True
    return self
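A minimal end-to-end usage sketch. The constructor arguments and paths shown here are hypothetical placeholders, and the import paths simply follow the module locations listed in this reference; consult the parameter tables above for the full set of options.

from pathlib import Path

from wav2rec.core.engine import Wav2Rec
from wav2rec.data.dataset import Wav2RecDataset

# Hypothetical checkpoint and audio directory; substitute your own.
recommender = Wav2Rec(model_path=Path("checkpoints/wav2recnet.ckpt"))
dataset = Wav2RecDataset(audio_path=Path("audio/")).scan()

# fit() iterates the dataset, projects every file and fits the
# nearest neighbours index against those projections.
recommender.fit(dataset)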
get_projection(self, x)
Get the model's projection of a waveform `x`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Union[torch.Tensor, np.ndarray]` | a 1D array or tensor with shape `[FEATURES]`, or a 2D array or tensor with shape `[BATCH, FEATURES]`. | required |

Returns:
Type | Description |
---|---|
`np.ndarray` | proj (`np.ndarray`): a projection of `x`. |
Source code in wav2rec/core/engine.py
def get_projection(self, x: Union[torch.Tensor, np.ndarray]) -> np.ndarray:
    """Get the model's projection of a waveform ``x``.

    Args:
        x (np.ndarray, torch.Tensor): a 1D array or tensor with shape ``[FEATURES]``
            or a 2D array or tensor with shape ``[BATCH, FEATURES]``.

    Returns:
        proj (np.ndarray): a projection of ``x``.

    """
    with torch.inference_mode():
        proj: np.ndarray = (
            self.net(_standardize_input(x).to(self.device)).cpu().numpy()
        )
    return _l2_normalize(proj, axis=-1) if self.normalize else proj
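A small sketch of projecting a batch of waveforms. The sample rate, batch size and the `recommender` instance (constructed as in the fit example above) are assumptions for illustration only.

import numpy as np

# Two one-second mono waveforms at a hypothetical 22,050 Hz sample rate.
waveforms = np.random.uniform(-1, 1, size=(2, 22050)).astype("float32")

# Returns a 2D array of shape [BATCH, PROJECTION_DIM]; rows are L2-normalized
# when the recommender was constructed with normalize=True.
proj = recommender.get_projection(waveforms)
print(proj.shape)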
similarity
Similarity
cosine_similarity(x1, x2)
Compute cosine similarity between two 1D arrays.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x1` | `ndarray` | a 1D array with shape `[FEATURES]`. | required |
`x2` | `ndarray` | a 1D array with shape `[FEATURES]`. | required |

Returns:
Type | Description |
---|---|
`float` | similarity (`float`): a similarity score on [0, 1]. |
Warning
`x1` and `x2` must be normalized.
Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def cosine_similarity(x1: np.ndarray, x2: np.ndarray) -> float:
    """Compute cosine similarity between two 1D arrays.

    Args:
        x1 (np.ndarray): a 1D array with shape ``[FEATURES]``
        x2 (np.ndarray): a 1D array with shape ``[FEATURES]``

    Returns:
        similarity (float): a similarity score on [0, 1].

    Warning:
        * ``x1`` and ``x2`` must be normalized.

    """
    return float(_clip(x1 @ x2, a_min=0, a_max=1))
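Because the function simply clips the dot product to [0, 1], it only behaves as a cosine similarity when both inputs have unit length. A NumPy-only illustration of that requirement (it does not call the jitted function itself):

import numpy as np

x1 = np.array([3.0, 4.0])
x2 = np.array([4.0, 3.0])

# Unnormalized, the dot product is 24.0 -- clipping it to [0, 1] is meaningless.
print(x1 @ x2)

# After L2 normalization the dot product is a true cosine similarity (0.96 here).
x1 /= np.linalg.norm(x1)
x2 /= np.linalg.norm(x2)
print(float(np.clip(x1 @ x2, 0.0, 1.0)))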
similarity_calculator(X_query, X_neighbours, metric=cosine_similarity)
Compute the similarity of `X_query` with all entries in `X_neighbours`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`X_query` | `ndarray` | a query 2D array with shape `[N_QUERIES, FEATURES]`. | required |
`X_neighbours` | `ndarray` | a reference 2D array with shape `[N_QUERIES, N_NEIGHBOURS, FEATURES]`. | required |
`metric` | `Callable[[numpy.ndarray, numpy.ndarray], float]` | a callable which accepts two 1D arrays and returns a float. Must be compiled with `numba.jit(nopython=True)`. | `cosine_similarity` |

Returns:
Type | Description |
---|---|
`ndarray` | sims (`np.ndarray`): a 2D array of similarities with shape `[N_QUERIES, N_NEIGHBOURS]`. |
Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def similarity_calculator(
    X_query: np.ndarray,
    X_neighbours: np.ndarray,
    metric: Callable[[np.ndarray, np.ndarray], float] = cosine_similarity,
) -> np.ndarray:
    """Compute the similarity of ``X_query`` with all entries in ``X_neighbours``.

    Args:
        X_query (np.ndarray): a query 2D array with shape ``[N_QUERIES, FEATURES]``
        X_neighbours (np.ndarray): a reference 2D array with shape
            ``[N_QUERIES, N_NEIGHBOURS, FEATURES]``
        metric (callable): a callable which accepts two 1D arrays
            and returns a float. Must be compiled with ``numba.jit(nopython=True)``.

    Returns:
        sims (np.ndarray): a 2D array of similarities with shape ``[N_QUERIES, N_NEIGHBOURS]``.

    """
    n_queries = X_query.shape[0]
    n_neighbours = X_neighbours.shape[1]
    sims = np.zeros((n_queries, n_neighbours), dtype=X_neighbours.dtype)
    for i in range(n_queries):
        for j in range(n_neighbours):
            sims[i, j] = metric(X_query[i], X_neighbours[i, j])
    return sims
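A usage sketch with random data. The shapes follow the docstring (2 queries, 3 candidate neighbours each, 8 features), and both arrays are normalized first so the default cosine_similarity metric behaves as intended; the import path follows the module location shown above.

import numpy as np

from wav2rec.core.similarity import similarity_calculator

rng = np.random.default_rng(0)

X_query = rng.normal(size=(2, 8))
X_neighbours = rng.normal(size=(2, 3, 8))

# L2-normalize along the feature axis so cosine_similarity's assumption holds.
X_query /= np.linalg.norm(X_query, axis=-1, keepdims=True)
X_neighbours /= np.linalg.norm(X_neighbours, axis=-1, keepdims=True)

sims = similarity_calculator(X_query, X_neighbours)
print(sims.shape)  # (2, 3)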
data
special
Data
dataset
Dataset
Wav2RecDataset
Base Wav2Rec Dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`audio_path` | `Path` | path to a directory of caches of type `ext`. | required |
`sr` | `int` | sample rate to use for each track. | required |
`offset` | `int` | seconds to skip in each track. | required |
`duration` | `int` | the duration of each track to use. | required |
`ext` | `str, tuple` | one or more file extensions in `audio_path` to include. | required |
`res_type` | `str` | resampling algorithm. | required |
`zero_pad` | `bool` | if `True`, zero-pad signals shorter than `n_features` samples. | required |
`verbose` | `bool` | if `True`, display progress bars. | required |
n_features: int
property
readonly
Expected number of elements (samples) in each audio clip.
get_audio_files(self)
Generate an iterable of all eligible files in `audio_path`.
Yields:
path
Source code in wav2rec/data/dataset.py
def get_audio_files(self) -> Iterable[Path]:
    """Generate an iterable of all eligible files in ``audio_path``.

    Yields:
        path

    """
    yield from tqdm(
        self._audio_path_iter(),
        desc="Scanning for Audio",
        disable=not self.verbose,
        total=sum(1 for _ in self._audio_path_iter()),
        unit="file",
    )
load_audio(self, path)
Load an audio file from `path`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `Path` | a file path to a piece of audio. | required |

Returns:
Type | Description |
---|---|
`torch.Tensor` | x (`torch.Tensor`): a mono (single-channel) piece of audio. |
Source code in wav2rec/data/dataset.py
def load_audio(self, path: Path) -> torch.Tensor:
    """Load an audio file from ``path``.

    Args:
        path (Path): a file path to a piece of audio

    Returns:
        x (torch.Tensor): a mono (single-channel) piece of audio.

    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="PySoundFile failed.*")
        x, _ = load(
            path=path,
            sr=self.sr,
            mono=True,
            offset=self.offset,
            duration=self.duration,
            res_type=self.res_type,
        )
    if self.zero_pad:
        x = zero_pad1d(x, target_length=self.n_features)
    return torch.as_tensor(x)
scan(self)
Scan `audio_path` for audio files.
Returns:
Type | Description |
---|---|
`Wav2RecDataset` | the scanned `Wav2RecDataset` instance (`self`). |
Source code in wav2rec/data/dataset.py
def scan(self) -> Wav2RecDataset:
    """Scan ``audio_path`` for audio files.

    Returns:
        Wav2RecDataset

    """
    files = list(self.get_audio_files())
    if files:
        self.files = files
    else:
        raise OSError(f"No files found in '{str(self.audio_path)}'")
    return self
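A small sketch of scanning a directory of audio files. The audio_path value is a hypothetical placeholder and other keyword arguments are omitted; the import path follows the module location shown above.

from pathlib import Path

from wav2rec.data.dataset import Wav2RecDataset

# Hypothetical directory of audio files.
dataset = Wav2RecDataset(audio_path=Path("audio/")).scan()

# scan() stores the discovered files on the instance and returns the dataset itself,
# so it can be chained or passed directly to Wav2Rec.fit().
print(len(dataset.files))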
transforms
Transforms
RandomNoise
Add random noise to a signal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`alpha` | `tuple` | a tuple to characterize a uniform distribution. Values drawn from this distribution will determine the weight given to the random noise. | required |
`**kwargs` | Keyword Args | keyword arguments to pass to the parent class. | required |
op(self, x)
Add random noise to `x`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | a tensor to operate on. | required |

Returns:
Type | Description |
---|---|
`Tensor` | x_fuzzed (`torch.Tensor`): `x` + noise. |
Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Add random noise to ``x``.

    Args:
        x (torch.Tensor): a tensor to operate on

    Returns:
        x_fuzzed (torch.Tensor): x + noise.

    """
    noise_weight = np.random.uniform(*self.alpha)
    return x + torch.rand_like(x) * noise_weight
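A sketch of applying the augmentation directly. The constructor arguments are assumptions (alpha is documented above; p is forwarded to the RandomOp parent class described below), chosen so the transform always fires.

import torch

from wav2rec.data.transforms import RandomNoise

# Hypothetical settings: always apply, with a noise weight drawn from U(0.01, 0.05).
noise = RandomNoise(alpha=(0.01, 0.05), p=1.0)

x = torch.zeros(4, 1024)           # a silent batch of four "waveforms"
x_fuzzed = noise(x)                # forward() applies op() with probability p
print(x_fuzzed.abs().max() > 0)    # noise has been added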
RandomOp
__init__(self, p)
special
Base class for randomly applying an operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`p` | `float` | probability of performing the transformation. | required |
Source code in wav2rec/data/transforms.py
def __init__(self, p: float) -> None:
    """Base class for randomly applying an operation.

    Args:
        p (float): probability of performing the transformation

    """
    super().__init__()
    self.p = p
forward(self, x)
Perform `op()` on `x` with probability `p`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | tensor to operate on. | required |

Returns:
Type | Description |
---|---|
`Tensor` | `torch.Tensor` |
Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Perform ``op()`` on ``x`` with probability ``p``.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    if np.random.uniform(0, 1) <= self.p:
        return self.op(x)
    else:
        return x
op(self, x)
Operation to perform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | tensor to operate on. | required |

Returns:
Type | Description |
---|---|
`Tensor` | `torch.Tensor` |
Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Operation to perform.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    raise NotImplementedError()
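The intended extension pattern, sketched as a minimal custom transform. The RandomGain class below is illustrative and not part of the library; it assumes, as the forward()/super().__init__() pattern above suggests, that RandomOp behaves like a torch.nn.Module.

import torch

from wav2rec.data.transforms import RandomOp


class RandomGain(RandomOp):
    """Illustrative transform: scale the signal by a fixed gain, with probability ``p``."""

    def __init__(self, gain: float = 0.5, p: float = 0.5) -> None:
        super().__init__(p=p)
        self.gain = gain

    def op(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gain


transform = RandomGain(gain=0.5, p=1.0)
print(transform(torch.ones(2, 8)))  # every element scaled to 0.5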
RandomReplaceMean
Randomly replace part of a tensor with its mean.
replacement(self, x, a, b)
Generate replacement (mean of each batch).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | tensor to operate on. Should be of the form `[BATCH, FEATURES]`. | required |
`a` | `int` | start position in the tensor. | required |
`b` | `int` | end position in the tensor. | required |

Returns:
Type | Description |
---|---|
`Union[float, torch.Tensor]` | `torch.Tensor` |
Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (mean of each batch).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return x.mean(dim=-1).repeat_interleave(b - a).view(-1, b - a)
RandomReplaceZero
Randomly replace part of a tensor with zero.
replacement(self, x, a, b)
Generate replacement (zero).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | tensor to operate on. Should be of the form `[BATCH, FEATURES]`. | required |
`a` | `int` | start position in the tensor. | required |
`b` | `int` | end position in the tensor. | required |

Returns:
Type | Description |
---|---|
`Union[float, torch.Tensor]` | `torch.Tensor` |
Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (zero).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return 0.0
Resize
Resize a tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`size` | `int, tuple` | one or more integers. | required |
`mode` | `str` | resizing algorithm to use. | required |
forward(self, x)
Resize `x` to `size`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | a tensor of the form `[BATCH, ...]`. | required |

Returns:
Type | Description |
---|---|
`Tensor` | x_resized (`torch.Tensor`): `x` resized. |
Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Resize ``x`` to ``size``.

    Args:
        x (torch.Tensor): a tensor of the form ``[BATCH, ...]``.

    Returns:
        x_resized (torch.Tensor): ``x`` resized

    """
    return F.interpolate(x, size=self.size, mode=self.mode)
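The resize is a thin wrapper around torch.nn.functional.interpolate. A standalone sketch of the same operation on a spectrogram-shaped tensor (the sizes and mode below are illustrative assumptions):

import torch
import torch.nn.functional as F

# A batch of 2 single-channel "melspectrogram images": [BATCH, CHANNEL, TIME, N_MELS].
x = torch.rand(2, 1, 431, 128)

# Resize to the square 224x224 input an image encoder typically expects.
x_resized = F.interpolate(x, size=(224, 224), mode="bilinear")
print(x_resized.shape)  # torch.Size([2, 1, 224, 224])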
nn
special
NN
audionets
Audio-Image Networks
AudioImageNetwork
Class of networks which handle 1D waveforms by making them image-like.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sr` | `int` | sample rate of the audio files. | required |
`n_mels` | `int` | number of mel bands to construct for raw audio. | required |
`image_size` | `int` | size to reshape the "images" (melspectrograms) to. | required |
`**kwargs` | Keyword Args | keyword arguments to pass to … | required |
hidden_features: int
property
readonly
Number of features emitted by the network.
AudioResnet50
Resnet50-Based Audio network.
This network is designed to generate features against melspectrogram input, using a Resnet50 model as the encoder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sr` | `int` | sample rate of the audio files. | required |
`n_mels` | `int` | number of mel bands to construct for raw audio. | required |
`image_size` | `int` | size to reshape the "images" (melspectrograms) to. | required |
`**kwargs` | Keyword Arguments | keyword arguments to pass to the parent class. | required |
Notes
- Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int
property
readonly
Number of features emitted by the network.
forward(self, x)
Compute the forward pass of the network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | an input tensor. | required |

Returns:
Type | Description |
---|---|
`Tensor` | `torch.Tensor` |
Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))
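A sketch of the accepted input shapes. The constructor arguments are assumptions (see the parameter table above); the key behaviour is that 2D input is treated as raw waveforms and converted to melspectrogram "images" via wav2spec before being encoded.

import torch

from wav2rec.nn.audionets import AudioResnet50

# Hypothetical settings; defaults are assumed for the remaining parameters.
net = AudioResnet50(sr=22050)

waveforms = torch.rand(2, 22050)   # [BATCH, TIME]: routed through wav2spec first
features = net(waveforms)          # melspectrogram -> batch norm -> ResNet50 encoder
print(features.shape)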
AudioVit
ViT-Based Audio network.
This network is designed to generate features against melspectrogram input, using a ViT model as the encoder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sr` | `int` | sample rate of the audio files. | required |
`n_mels` | `int` | number of mel bands to construct for raw audio. | required |
`image_size` | `int` | size to reshape the "images" (melspectrograms) to. | required |
`patch_size` | `int` | size of each patch. Must be square. | required |
`dim` | `int` | dimension of output following … | required |
`depth` | `int` | number of transformer blocks. | required |
`heads` | `int` | number of multi-head attention layers. | required |
`mlp_dim` | `int` | dimensions of the multi-layer perceptron (MLP) in the feed-forward layer of the transformer(s). | required |
`dim_head` | `int` | dimensions in the head of the attention block(s). | required |
`dropout` | `float` | dropout rate to use. Must be on … | required |
`emb_dropout` | `float` | dropout of the embedding layer. Must be on … | required |
`**kwargs` | Keyword Arguments | keyword arguments to pass to the parent class. | required |
Notes
- Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int
property
readonly
Number of features emitted by the network.
forward(self, x)
Compute the forward pass of the network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | an input tensor. | required |

Returns:
Type | Description |
---|---|
`Tensor` | `torch.Tensor` |
Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))
lightening
Lightening Model
Wav2RecNet
Unified (SimSam with Encoder) network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`lr` | `float` | learning rate for the model. | required |
`encoder` | `AudioImageNetwork` | a model which inherits from `AudioImageNetwork`. | required |
`**kwargs` | Keyword Arguments | keyword arguments to pass to … | required |
configure_optimizers(self)
Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you'd need one. But in the case of GANs or similar you might have multiple.
Returns:
Type | Description |
---|---|
`Optimizer` | Any of these 6 options: a single optimizer; a list or tuple of optimizers; two lists (optimizers and LR schedulers); a dictionary with an `'optimizer'` key (and optionally an `'lr_scheduler'` key); a tuple of such dictionaries; or `None`. |
Note
The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.
.. code-block:: python

    lr_dict = {
        'scheduler': lr_scheduler,  # The LR scheduler instance (required)
        # The unit of the scheduler's step size, could also be 'step'
        'interval': 'epoch',
        'frequency': 1,  # The frequency of the scheduler
        'monitor': 'val_loss',  # Metric for `ReduceLROnPlateau` to monitor
        'strict': True,  # Whether to crash the training if `monitor` is not found
        'name': None,  # Custom name for `LearningRateMonitor` to use
    }
Only the "scheduler"
key is required, the rest will be set to the defaults above.
Note
The `frequency` value specified in a dict along with the `optimizer` key is an int corresponding
to the number of sequential batches optimized with the specific optimizer.
It should be given to none or to all of the optimizers.
There is a difference between passing multiple optimizers in a list,
and passing multiple optimizers in dictionaries with a frequency of 1:
In the former case, all optimizers will operate on the given batch in each optimization step.
In the latter, only one optimizer will operate on the given batch at every step.
This is different from the `frequency` value specified in the `lr_dict` mentioned above.
.. code-block:: python

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {'optimizer': optimizer_one, 'frequency': 5},
            {'optimizer': optimizer_two, 'frequency': 10},
        ]
In this example, the first optimizer will be used for the first 5 steps,
the second optimizer for the next 10 steps and that cycle will continue.
If an LR scheduler is specified for an optimizer using the `lr_scheduler` key in the above dict,
the scheduler will only be updated when its optimizer is being used.
Examples::

    # most cases
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealing(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {'scheduler': ExponentialLR(gen_opt, 0.99),
                   'interval': 'step'}  # called after each training step
        dis_sch = CosineAnnealing(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )
Note
Some things to know:
- Lightning calls `.backward()` and `.step()` on each optimizer and learning rate scheduler as needed.
- If you use 16-bit precision (`precision=16`), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, :meth:`training_step` will have an additional `optimizer_idx` parameter.
- If you use :class:`torch.optim.LBFGS`, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default `.step()` schedule, override the :meth:`optimizer_step` hook.
Source code in wav2rec/nn/lightening.py
def configure_optimizers(self) -> torch.optim.Optimizer:
    optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
    return optimizer
forward(self, x)
Same as :meth:`torch.nn.Module.forward()`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`*args` | | Whatever you decide to pass into the forward method. | required |
`**kwargs` | | Keyword arguments are also possible. | required |

Returns:
Type | Description |
---|---|
`Tensor` | Your model's output |
Source code in wav2rec/nn/lightening.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.learner.wrapped_encoder(x)
training_step(self, batch, batch_idx)
Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`batch` | `Tuple[torch.Tensor, torch.Tensor]` | the output of your DataLoader: a tensor, tuple or list. | required |
`batch_idx` | `int` | integer displaying index of this batch. | required |
`optimizer_idx` | `int` | when using multiple optimizers, this argument will also be present. | required |
`hiddens` | | passed in if truncated back-propagation through time is enabled. | required |

Returns:
Type | Description |
---|---|
`Tensor` | Any of: a loss tensor; a `dict` which can include any keys, but must include the key `'loss'`; or `None` (training will skip to the next batch). |
Note
Returning `None` is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.
In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
Example::

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss
If you define multiple optimizers, this step will be called with an additional `optimizer_idx` parameter.
.. code-block:: python

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            ...  # do training_step with encoder
        if optimizer_idx == 1:
            ...  # do training_step with decoder
If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.
.. code-block:: python

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        ...
        out, hiddens = self.lstm(data, hiddens)
        ...
        return {'loss': loss, 'hiddens': hiddens}
Note
The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
Source code in wav2rec/nn/lightening.py
def training_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("loss", loss)
    return loss
validation_step(self, batch, batch_idx)
Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.
.. code-block:: python

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`batch` | `Tuple[torch.Tensor, torch.Tensor]` | the output of your DataLoader: a tensor, tuple or list. | required |
`batch_idx` | `int` | the index of this batch. | required |
`dataloader_idx` | `int` | the index of the dataloader that produced this batch (only if multiple val dataloaders used). | required |

Returns:
Type | Description |
---|---|
`Tensor` | Any object or value, or `None` (validation will skip to the next batch). |
.. code-block:: python

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined('validation_step_end'):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)
.. code-block:: python

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx)

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx)
Examples::

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})
If you pass in multiple val dataloaders, :meth:`validation_step` will have an additional argument.
.. code-block:: python

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
Note
If you don't need to validate you don't need to implement this method.
Note
When :meth:`validation_step` is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
Source code in wav2rec/nn/lightening.py
def validation_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("val_loss", loss, prog_bar=True)
    return loss
simsam
SimSam Model
Notes
- Code adapted from https://github.com/lucidrains/byol-pytorch
References
- https://arxiv.org/abs/2006.07733
- https://arxiv.org/abs/2011.10566
- https://github.com/lucidrains/byol-pytorch
SimSam
Simple Siamese Neural Network for self-supervised representation learning.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`encoder` | `AudioImageNetwork` | a model which inherits from `AudioImageNetwork`. | required |
`projection_size` | `int` | dimensionality of vectors to be compared. | required |
`projection_hidden_size` | `int` | number of units in the Multilayer Perceptron (MLP) networks. | required |
`augment1` | `callable` | first augmentation (yields `x1`). | required |
`augment2` | `callable` | second augmentation (yields `x2`). | required |
References
- https://arxiv.org/abs/2006.07733
- https://arxiv.org/abs/2011.10566
- https://github.com/lucidrains/byol-pytorch
forward(self, x)
Compute the forward pass of the learner and the combined loss.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | a tensor of shape `[BATCH, ...]`. | required |

Returns:
Type | Description |
---|---|
`Tensor` | loss (`torch.Tensor`): combined, average loss of the operation. |
Source code in wav2rec/nn/simsam.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the learner and the
    combined loss.

    Args:
        x (torch.Tensor): a tensor of shape ``[BATCH, ...]``

    Returns:
        loss (torch.Tensor): combined, average loss of the operation

    """
    x1, x2 = self.augment1(x), self.augment2(x)
    online_pred_1 = self.predictor(self.wrapped_encoder(x1))
    online_pred_2 = self.predictor(self.wrapped_encoder(x2))
    with torch.no_grad():
        target_proj_1 = self.wrapped_encoder(x1).detach_()
        target_proj_2 = self.wrapped_encoder(x2).detach_()
    loss_1 = _loss_fn(online_pred_1, target_proj_2)
    loss_2 = _loss_fn(online_pred_2, target_proj_1)
    loss = loss_1 + loss_2
    return loss.mean()
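The _loss_fn used above is not shown in this reference. Per the SimSiam paper cited in the References, it is typically the negative cosine similarity between the online prediction and the stop-gradient target projection; the sketch below shows that standard formulation, not necessarily the exact implementation in wav2rec.

import torch
import torch.nn.functional as F


def negative_cosine_similarity(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """SimSiam-style loss: -cos(pred, target), computed per sample."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target.detach(), dim=-1)  # stop-gradient on the target branch
    return -(pred * target).sum(dim=-1)


pred = torch.rand(4, 128)
target = torch.rand(4, 128)
print(negative_cosine_similarity(pred, target).mean())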
signal
special
dsp
Digital Signal Processing
MelSpectrogram
Layer to compute the melspectrogram of a 1D audio waveform.
This layer leverages the convolution-based torchlibrosa library to compute the melspectrogram
of an audio waveform. The computation can be performed efficiently on a GPU.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sr` | `int` | sample rate of audio. | required |
`n_fft` | `int` | FFT window size. | required |
`win_length` | `int` | length of the FFT window function. | required |
`hop_length` | `int` | number of samples between frames. | required |
`f_min` | `float` | lowest frequency (Hz). | required |
`f_max` | `float` | highest frequency (Hz). | required |
`n_mels` | `int` | number of mel bands to create. | required |
`window` | `str` | window function to use. | required |
`power` | `float` | exponent for the mel spectrogram. | required |
`center` | `bool` | if True, center the input signal. | required |
`pad_mode` | `str` | padding to use at the edges of the signal. (Note: this only applies if `center=True`.) | required |
`as_db` | `bool` | if `True`, convert the melspectrogram to decibels. | required |
`ref` | `float, str` | the reference point to use when converting to decibels. If a … | required |
`amin` | `float` | minimum threshold when converting to decibels. (Note: this only applies if `as_db=True`.) | required |
`top_db` | `float` | the maximum threshold value to use when converting to decibels. (Note: this only applies if `as_db=True`.) | required |
`normalize_db` | `bool` | if `True`, normalize the decibel-scaled melspectrogram. | required |
forward(self, x)
Compute the melspectrogram of `x`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`x` | `Tensor` | 2D tensor with shape `[BATCH, TIME]`. | required |

Returns:
Type | Description |
---|---|
`Tensor` | melspec (`torch.Tensor`): 4D tensor with shape `[BATCH, CHANNEL, TIME, N_MELS]`. |
Source code in wav2rec/signal/dsp.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the melspectrogram of ``x``.

    Args:
        x (torch.Tensor): 2D tensor with shape ``[BATCH, TIME]``

    Returns:
        melspec (torch.Tensor): 4D tensor with shape ``[BATCH, CHANNEL, TIME, N_MELS]``.

    """
    S = self.meltransform(x)
    if self.as_db and self.normalize_db:
        return self._normalize_db(self._power_to_db(S))
    elif self.as_db:
        return self._power_to_db(S)
    else:
        return S
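A usage sketch. The constructor arguments are assumptions (see the parameter table above); the input and output shapes follow the docstring.

import torch

from wav2rec.signal.dsp import MelSpectrogram

# Hypothetical settings: one second of audio at 22,050 Hz, 128 mel bands.
melspec = MelSpectrogram(sr=22050, n_mels=128)

waveforms = torch.rand(2, 22050)   # [BATCH, TIME]
S = melspec(waveforms)             # [BATCH, CHANNEL, TIME, N_MELS]
print(S.shape)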