Reference 📚

Wav2Rec

core special

engine

Recommender

Wav2Rec

Waveform recommendation & matching engine.

Parameters:

  • model_path (Path, required): path to a (training) checkpoint for Wav2RecNet.
  • distance_metric (str, required): distance metric to use for the nearest-neighbours search.
  • normalize (bool, required): if True, perform L2 normalization on all projections.
  • similarity (callable, required): a callable which accepts two 1D arrays and returns a float. Must be compiled with numba.jit(nopython=True). If None, distances will be returned instead (see distance_metric).
  • batch_size (int, required): number of audio files to send to the Wav2RecNet neural network for projection simultaneously.
  • num_workers (int, required): number of subprocesses to use when loading data from the dataset. See torch.utils.data.dataloader.DataLoader.
  • pin_memory (bool, required): copy tensors to CUDA memory before the data loader returns them.
  • prefetch_factor (int, required): number of samples to load in advance by each worker. See torch.utils.data.dataloader.DataLoader.
  • device (torch.device, required): device to run the model on. If None, the device will be selected automatically.
  • verbose (bool, required): if True, display a progress bar while fitting.
  • **kwargs: keyword arguments to pass to NearestNeighbors.

Warnings

  • By default, this class uses distance_metric='euclidean' and normalize=True. These settings have been purposefully chosen so that the distances computed for nearest neighbours search accord with the default similarity metric used: cosine similarity. (The euclidean distance between L2 normalized vectors is an effective proxy of cosine similarity, see reference below.)

References

  • https://en.wikipedia.org/wiki/Cosine_similarity
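
A quick numeric check of the warning above (a sketch using only NumPy): for L2-normalized vectors, the squared Euclidean distance equals 2 - 2*cos, so ranking neighbours by Euclidean distance is equivalent to ranking them by cosine similarity.

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # L2 normalization

cos = float(a @ b)                # cosine similarity of the normalized vectors
d = float(np.linalg.norm(a - b))  # Euclidean distance between them
print(np.isclose(d ** 2, 2 - 2 * cos))  # True: d**2 == 2 - 2*cos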
fit(self, dataset)

Fit the recommender to a dataset.

Fitting is composed of three steps:

1. Iterating over all files in the dataset
2. Computing Wav2RecNet projections for each file
3. Fitting the nearest neighbours algorithm against the projections

Parameters:

  • dataset (Wav2RecDataset, required): a dataset to fit against.

Returns:

  • Wav2Rec: the fitted recommender (fit() returns self).

Source code in wav2rec/core/engine.py
def fit(self, dataset: Wav2RecDataset) -> Wav2Rec:
    """Fit the recommender to a dataset.

    Fitting is composed of three steps:

        1. Iterating over all files in the dataset
        2. Computing `Wav2RecNet` projections for each file
        3. Fitting the nearest neighbours algorithm against the projections

    Args:
        dataset (Wav2RecDataset): a dataset to fit against.

    Returns:
        Wav2Rec

    """
    all_paths, all_projections = list(), list()
    with tqdm(desc="Fitting", disable=not self.verbose, total=len(dataset)) as pbar:
        for paths, audio in self._dataset2loader(dataset):
            all_paths.extend(paths)
            all_projections.append(self.get_projection(audio))
            pbar.update(len(audio))

    self.paths = np.asarray(all_paths)
    self._nneighbours.fit(np.concatenate(all_projections))
    self.fitted = True
    return self
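
A minimal end-to-end sketch of fitting. The import paths follow the source listings on this page; the checkpoint and audio directory are hypothetical, and any parameters not passed are assumed to have defaults:

from pathlib import Path

from wav2rec.core.engine import Wav2Rec          # module path taken from the source listings above
from wav2rec.data.dataset import Wav2RecDataset  # module path taken from the source listings below

# Hypothetical paths: substitute a real Wav2RecNet checkpoint and audio directory.
rec = Wav2Rec(model_path=Path("checkpoints/wav2recnet.ckpt"))
dataset = Wav2RecDataset(audio_path=Path("audio/")).scan()

rec.fit(dataset)  # fit() returns self, so construction and fitting can also be chained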
get_projection(self, x)

Get the model's projection of a waveform x.

Parameters:

  • x (torch.Tensor or np.ndarray, required): a 1D array or tensor with shape [FEATURES], or a 2D array or tensor with shape [BATCH, FEATURES].

Returns:

  • proj (np.ndarray): a projection of x.

Source code in wav2rec/core/engine.py
def get_projection(self, x: Union[torch.Tensor, np.ndarray]) -> np.ndarray:
    """Get the model's projection of a waveform ``x``.

    Args:
        x (np.ndarray, torch.Tensor): a 1D array or tensor with shape ``[FEATURES]``
            or a 2D array or tensor with shape ``[BATCH, FEATURES]``.

    Returns:
        proj (np.ndarray): a projection of ``x``.

    """
    with torch.inference_mode():
        proj: np.ndarray = (
            self.net(_standardize_input(x).to(self.device)).cpu().numpy()
        )
    return _l2_normalize(proj, axis=-1) if self.normalize else proj
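
A sketch of projecting a single waveform, assuming rec is the fitted Wav2Rec instance from the sketch above and that the waveform length matches what the dataset and network expect:

import numpy as np

# A 1D waveform with shape [FEATURES]; a 2D [BATCH, FEATURES] array also works.
waveform = np.random.uniform(-1.0, 1.0, size=22050 * 5).astype(np.float32)

proj = rec.get_projection(waveform)
print(type(proj))  # np.ndarray; rows are L2-normalized when normalize=True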

similarity

Similarity

cosine_similarity(x1, x2)

Compute cosine similarity between two 1D arrays.

Parameters:

  • x1 (np.ndarray, required): a 1D array with shape [FEATURES].
  • x2 (np.ndarray, required): a 1D array with shape [FEATURES].

Returns:

  • similarity (float): a similarity score on [0, 1].

Warning

  • x1 and x2 must be normalized.
Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def cosine_similarity(x1: np.ndarray, x2: np.ndarray) -> float:
    """Compute cosine similarity between two 1D arrays.

    Args:
        x1 (np.ndarray): a 1D array with shape ``[FEATURES]``
        x2 (np.ndarray): a 1D array with shape ``[FEATURES]``

    Returns:
        similarity (float): a similarity score on [0, 1].

    Warning:
        * ``x1`` and ``x2`` must be normalized.

    """
    return float(_clip(x1 @ x2, a_min=0, a_max=1))
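
A small usage sketch. The import path is assumed from the source listing, and the inputs are L2-normalized as the warning requires:

import numpy as np

from wav2rec.core.similarity import cosine_similarity  # import path assumed from the source listing

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # L2-normalized

print(cosine_similarity(x1, x2))  # ~0.7071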

similarity_calculator(X_query, X_neighbours, metric=cosine_similarity)

Compute the similarity of X_query with all entries in X_neighbours.

Parameters:

  • X_query (np.ndarray, required): a query 2D array with shape [N_QUERIES, FEATURES].
  • X_neighbours (np.ndarray, required): a reference 3D array with shape [N_QUERIES, N_NEIGHBOURS, FEATURES].
  • metric (Callable[[np.ndarray, np.ndarray], float]): a callable which accepts two 1D arrays and returns a float. Must be compiled with numba.jit(nopython=True). Defaults to cosine_similarity.

Returns:

  • sims (np.ndarray): a 2D array of similarities with shape [N_QUERIES, N_NEIGHBOURS].

Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def similarity_calculator(
    X_query: np.ndarray,
    X_neighbours: np.ndarray,
    metric: Callable[[np.ndarray, np.ndarray], float] = cosine_similarity,
) -> np.ndarray:
    """Compute the similarity of ``X_query`` with all entries in ``X_neighbours``.

    Args:
        X_query (np.ndarray): a query 2D array with shape ``[N_QUERIES, FEATURES]``
        X_neighbours (np.ndarray): a reference 3D array with shape
            ``[N_QUERIES, N_NEIGHBOURS, FEATURES]``
        metric (callable): a callable which accepts two 1D arrays
            and returns a float. Must be compiled with ``numba.jit(nopython=True)``.

    Returns:
        sims (np.ndarray): a 2D array of similarities with shape ``[N_QUERIES, N_NEIGHBOURS]``.

    """
    n_queries = X_query.shape[0]
    n_neighbours = X_neighbours.shape[1]

    sims = np.zeros((n_queries, n_neighbours), dtype=X_neighbours.dtype)
    for i in range(n_queries):
        for j in range(n_neighbours):
            sims[i, j] = metric(X_query[i], X_neighbours[i, j])
    return sims
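
A usage sketch (import path assumed from the source listing). The rows are L2-normalized so the default cosine_similarity metric behaves as documented:

import numpy as np

from wav2rec.core.similarity import similarity_calculator  # import path assumed from the source listing

rng = np.random.default_rng(0)
X_query = rng.normal(size=(4, 128))
X_neighbours = rng.normal(size=(4, 5, 128))

# L2-normalize along the feature axis.
X_query /= np.linalg.norm(X_query, axis=-1, keepdims=True)
X_neighbours /= np.linalg.norm(X_neighbours, axis=-1, keepdims=True)

sims = similarity_calculator(X_query, X_neighbours)
print(sims.shape)  # (4, 5): one similarity per (query, neighbour) pair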

data special

Data

dataset

Dataset

Wav2RecDataset

Base Wav2Rec Dataset.

Parameters:

  • audio_path (Path, required): path to a directory of caches of type ext.
  • sr (int, required): sample rate to use for each track.
  • offset (int, required): seconds to skip in each track.
  • duration (int, required): the duration of each track to use.
  • ext (str or tuple, required): one or more file extensions in audio_path to filter for.
  • res_type (str, required): resampling algorithm.
  • zero_pad (bool, required): if True, automatically zero-pad waveforms shorter than n_features.
  • verbose (bool, required): if True, display progress bars.
n_features: int property readonly

Expected number of elements (audio samples) in each waveform.

get_audio_files(self)

Generate an iterable of all eligible files in audio_path.

Yields

path

Source code in wav2rec/data/dataset.py
def get_audio_files(self) -> Iterable[Path]:
    """Generate an iterable of all eligible files in ``audio_path``.

    Yields:
        path

    """
    yield from tqdm(
        self._audio_path_iter(),
        desc="Scanning for Audio",
        disable=not self.verbose,
        total=sum(1 for _ in self._audio_path_iter()),
        unit="file",
    )
load_audio(self, path)

Load an audio file from path.

Parameters:

  • path (Path, required): a file path to a piece of audio.

Returns:

  • x (torch.Tensor): a mono audio signal.

Source code in wav2rec/data/dataset.py
def load_audio(self, path: Path) -> torch.Tensor:
    """Load an audio file from ``path``.

    Args:
        path (Path): a file path to a piece of audio

    Returns:
        x (torch.Tensor): a mono audio signal.

    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="PySoundFile failed.*")
        x, _ = load(
            path=path,
            sr=self.sr,
            mono=True,
            offset=self.offset,
            duration=self.duration,
            res_type=self.res_type,
        )
    if self.zero_pad:
        x = zero_pad1d(x, target_length=self.n_features)
    return torch.as_tensor(x)
scan(self)

Scan audio_path for audio files.

Returns:

  • Wav2RecDataset: the scanned dataset (scan() returns self).

Source code in wav2rec/data/dataset.py
def scan(self) -> Wav2RecDataset:
    """Scan ``audio_path`` for audio files.

    Returns:
        Wav2RecDataset

    """
    files = list(self.get_audio_files())
    if files:
        self.files = files
    else:
        raise OSError(f"No files found in '{str(self.audio_path)}'")
    return self
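
A sketch of building and scanning a dataset. The import path is assumed from the source listing, the audio directory is hypothetical, and parameters not shown are assumed to have defaults:

from pathlib import Path

from wav2rec.data.dataset import Wav2RecDataset  # import path assumed from the source listing

dataset = Wav2RecDataset(audio_path=Path("audio/"))  # hypothetical directory of audio files
dataset.scan()  # raises OSError if no eligible files are found

print(len(dataset.files))  # files discovered by scan()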

transforms

Transforms

RandomNoise

Add random noise to a signal.

Parameters:

  • alpha (tuple, required): a tuple to characterize a uniform distribution. Values drawn from this distribution will determine the weight given to the random noise.
  • **kwargs: keyword arguments to pass to the parent class.
op(self, x)

Add random noise to x

Parameters:

  • x (torch.Tensor, required): a tensor to operate on.

Returns:

  • x_fuzzed (torch.Tensor): x + noise.

Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Add random noise to ``x``

    Args:
        x (torch.Tensor): a tensor to operate on

    Returns:
        x_fuzzed (torch.Tensor): x + noise.

    """
    noise_weight = np.random.uniform(*self.alpha)
    return x + torch.rand_like(x) * noise_weight
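
A sketch of applying the transform, assuming the import path from the source listing and that the constructor accepts alpha along with the parent-class probability p:

import torch

from wav2rec.data.transforms import RandomNoise  # import path assumed from the source listing

noise = RandomNoise(alpha=(0.01, 0.05), p=0.5)  # parameter values are illustrative
x = torch.randn(8, 16_000)                      # [BATCH, FEATURES]
x_aug = noise(x)  # noise added with probability p, otherwise x is returned unchanged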

RandomOp

__init__(self, p) special

Base class for randomly applying an operation.

Parameters:

  • p (float, required): probability of performing the transformation.
Source code in wav2rec/data/transforms.py
def __init__(self, p: float) -> None:
    """Base class for randomly applying an operation.

    Args:
        p (float): probability of performing the transformation

    """
    super().__init__()
    self.p = p
forward(self, x)

Perform op() on x with probability p.

Parameters:

  • x (torch.Tensor, required): tensor to operate on.

Returns:

  • torch.Tensor

Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Perform ``op()`` on ``x`` with probability ``p``.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    if np.random.uniform(0, 1) <= self.p:
        return self.op(x)
    else:
        return x
op(self, x)

Operation to perform.

Parameters:

  • x (torch.Tensor, required): tensor to operate on.

Returns:

  • torch.Tensor

Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Operation to perform.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    raise NotImplementedError()
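
Since op() is the only abstract piece, a new random transform only needs to override it. A sketch (RandomGain is a hypothetical subclass, not part of the library):

import torch

from wav2rec.data.transforms import RandomOp  # import path assumed from the source listing


class RandomGain(RandomOp):
    """Hypothetical transform: scale the signal by a random gain."""

    def op(self, x: torch.Tensor) -> torch.Tensor:
        gain = torch.empty(1).uniform_(0.5, 1.5)
        return x * gain


x = torch.randn(4, 16_000)
x_out = RandomGain(p=0.5)(x)  # op() applied with probability 0.5, otherwise x is returned as-is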

RandomReplaceMean

Randomly replace part of a tensor with its mean.

replacement(self, x, a, b)

Generate replacement (mean of each batch).

Parameters:

  • x (torch.Tensor, required): tensor to operate on. Should be of the form [BATCH, FEATURES].
  • a (int, required): start position in the tensor.
  • b (int, required): end position in the tensor.

Returns:

  • Union[float, torch.Tensor]: the per-row mean of x, repeated to fill positions a through b.

Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (mean of each batch).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return x.mean(dim=-1).repeat_interleave(b - a).view(-1, b - a)

RandomReplaceZero

Randomly replace part of a tensor with zero.

replacement(self, x, a, b)

Generate replacement (zero).

Parameters:

  • x (torch.Tensor, required): tensor to operate on. Should be of the form [BATCH, FEATURES].
  • a (int, required): start position in the tensor.
  • b (int, required): end position in the tensor.

Returns:

  • Union[float, torch.Tensor]: 0.0, used as the replacement value.

Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (zero).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return 0.0

Resize

Resize a tensor.

Parameters:

  • size (int or tuple, required): one or more integers.
  • mode (str, required): resizing algorithm to use.
forward(self, x)

Resize x to size.

Parameters:

  • x (torch.Tensor, required): a tensor of the form [BATCH, ...].

Returns:

  • x_resized (torch.Tensor): x resized.

Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Resize ``x`` to ``size``.

    Args:
        x (torch.Tensor): a tensor of the form ``[BATCH, ...]``.

    Returns:
        x_resized (torch.Tensor): ``x`` resized

    """
    return F.interpolate(x, size=self.size, mode=self.mode)
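
A sketch of resizing a batch of melspectrogram "images". Because forward() delegates to F.interpolate, the input is assumed to include a channel dimension:

import torch

from wav2rec.data.transforms import Resize  # import path assumed from the source listing

resize = Resize(size=(224, 224), mode="bilinear")  # parameter values are illustrative
spec = torch.randn(8, 1, 128, 646)                 # [BATCH, CHANNEL, N_MELS, TIME]
print(resize(spec).shape)                          # torch.Size([8, 1, 224, 224])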

nn special

NN

audionets

Audio-Image Networks

AudioImageNetwork

Class of networks which handle 1D waveforms by making them image-like.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • **kwargs: keyword arguments to pass to MelSpectrogram().
hidden_features: int property readonly

Number of features emitted by the network.

AudioResnet50

Resnet50-Based Audio network.

This network is designed to generate features against melspectrogram input, using a Resnet50 model as the encoder.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • **kwargs: keyword arguments to pass to the parent class.

Notes

  • Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int property readonly

Number of features emitted by the network.

forward(self, x)

Compute the forward pass of the network.

Parameters:

  • x (torch.Tensor, required): an input tensor.

Returns:

  • torch.Tensor

Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))
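
A sketch of running the network directly on raw waveforms (2D input triggers the internal melspectrogram step). The import path and constructor values are assumptions:

import torch

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listing

net = AudioResnet50(sr=22050, n_mels=128, image_size=224)  # parameter values are illustrative
waveforms = torch.randn(4, 22050 * 5)                      # [BATCH, TIME] raw audio

with torch.inference_mode():
    features = net(waveforms)  # wav2spec() applied internally, then the Resnet50 encoder

print(features.shape)  # last dimension should match net.hidden_features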

AudioVit

ViT-Based Audio network.

This network is designed to generate features against melspectrogram input, using a ViT model as the encoder.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • patch_size (int, required): size of each patch. Must be square.
  • dim (int, required): dimension of the output following nn.Linear().
  • depth (int, required): number of transformer blocks.
  • heads (int, required): number of multi-head attention layers.
  • mlp_dim (int, required): dimensions of the multi-layer perceptron (MLP) in the feed-forward layer of the transformer(s).
  • dim_head (int, required): dimensions in the head of the attention block(s).
  • dropout (float, required): dropout rate to use. Must be on [0, 1].
  • emb_dropout (float, required): dropout of the embedding layer. Must be on [0, 1].
  • **kwargs: keyword arguments to pass to the parent class.

Notes

  • Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int property readonly

Number of features emitted by the network.

forward(self, x)

Compute the forward pass of the network.

Parameters:

  • x (torch.Tensor, required): an input tensor.

Returns:

  • torch.Tensor

Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))

lightening

Lightning Model

Wav2RecNet

Unified (SimSam with Encoder) network.

Parameters:

  • lr (float, required): learning rate for the model.
  • encoder (AudioImageNetwork, required): a model which inherits from AudioImageNetwork, to be used as the encoder in SimSam. If None, AudioResnet50 will be used.
  • **kwargs: keyword arguments to pass to SimSam.
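
A short training sketch with PyTorch Lightning. The module path wav2rec.nn.lightening comes from the source listings below; the encoder settings, the dummy dataloader (yielding (identifier, waveform) pairs, matching how training_step unpacks its batch), and the Trainer settings are all assumptions:

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listings
from wav2rec.nn.lightening import Wav2RecNet    # module path taken from the source listings below

# Dummy (identifier, waveform) pairs standing in for a real audio dataset.
ids = torch.arange(8)
waveforms = torch.randn(8, 22050)
loader = DataLoader(TensorDataset(ids, waveforms), batch_size=4)

net = Wav2RecNet(lr=1e-3, encoder=AudioResnet50(sr=22050, n_mels=128, image_size=224))
pl.Trainer(max_epochs=1).fit(net, loader)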
configure_optimizers(self)

Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you'd need one. But in the case of GANs or similar you might have multiple.

Returns:

Optimizer. Any of these 6 options:

  • Single optimizer.
  • List or Tuple of optimizers.
  • Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_dict).
  • Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_dict.
  • Tuple of dictionaries as described above, with an optional "frequency" key.
  • None - Fit will run without any optimizer.

Note

The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

.. code-block:: python

lr_dict = {
    'scheduler': lr_scheduler, # The LR scheduler instance (required)
    # The unit of the scheduler's step size, could also be 'step'
    'interval': 'epoch',
    'frequency': 1, # The frequency of the scheduler
    'monitor': 'val_loss', # Metric for `ReduceLROnPlateau` to monitor
    'strict': True, # Whether to crash the training if `monitor` is not found
    'name': None, # Custom name for `LearningRateMonitor` to use
}

Only the "scheduler" key is required, the rest will be set to the defaults above.

Note

The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step. This is different from the frequency value specified in the lr_dict mentioned below.

.. code-block:: python

def configure_optimizers(self):
    optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
    optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
    return [
        {'optimizer': optimizer_one, 'frequency': 5},
        {'optimizer': optimizer_two, 'frequency': 10},
    ]

In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.

Examples::

# most cases
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-3)

# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    return gen_opt, dis_opt

# example with learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    dis_sch = CosineAnnealing(dis_opt, T_max=10)
    return [gen_opt, dis_opt], [dis_sch]

# example with step-based learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    gen_sch = {'scheduler': ExponentialLR(gen_opt, 0.99),
               'interval': 'step'}  # called after each training step
    dis_sch = CosineAnnealing(dis_opt, T_max=10) # called every epoch
    return [gen_opt, dis_opt], [gen_sch, dis_sch]

# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )

Note

Some things to know:

  • Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
  • If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
  • If you use multiple optimizers, training_step will have an additional optimizer_idx parameter.
  • If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
  • If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
  • If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step hook.
Source code in wav2rec/nn/lightening.py
def configure_optimizers(self) -> torch.optim.Optimizer:
    optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
    return optimizer
forward(self, x)

Same as torch.nn.Module.forward().

Parameters:

  • *args (required): whatever you decide to pass into the forward method.
  • **kwargs: keyword arguments are also possible.

Returns:

  • torch.Tensor: your model's output.

Source code in wav2rec/nn/lightening.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.learner.wrapped_encoder(x)
training_step(self, batch, batch_idx)

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:

  • batch (Tuple[torch.Tensor, torch.Tensor], required): the output of your torch.utils.data.DataLoader. A tensor, tuple or list.
  • batch_idx (int, required): integer displaying the index of this batch.
  • optimizer_idx (int): when using multiple optimizers, this argument will also be present.
  • hiddens (torch.Tensor): passed in if truncated_bptt_steps > 0.

Returns:

Any of:

  • torch.Tensor: the loss tensor
  • dict: a dictionary which can include any keys, but must include the key 'loss'
  • None: training will skip to the next batch

Note

Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.

In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.

Example::

def training_step(self, batch, batch_idx):
    x, y, z = batch
    out = self.encoder(x)
    loss = self.loss(out, x)
    return loss

If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

.. code-block:: python

# Multiple optimizers (e.g.: GANs)
def training_step(self, batch, batch_idx, optimizer_idx):
    if optimizer_idx == 0:
        # do training_step with encoder
    if optimizer_idx == 1:
        # do training_step with decoder

If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.

.. code-block:: python

# Truncated back-propagation through time
def training_step(self, batch, batch_idx, hiddens):
    # hiddens are the hidden states from the previous truncated backprop step
    ...
    out, hiddens = self.lstm(data, hiddens)
    ...
    return {'loss': loss, 'hiddens': hiddens}

Note

The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.

Source code in wav2rec/nn/lightening.py
def training_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("loss", loss)
    return loss
validation_step(self, batch, batch_idx)

Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.

.. code-block:: python

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:

  • batch (Tuple[torch.Tensor, torch.Tensor], required): the output of your torch.utils.data.DataLoader. A tensor, tuple or list.
  • batch_idx (int, required): the index of this batch.
  • dataloader_idx (int): the index of the dataloader that produced this batch (only if multiple val dataloaders are used).

Returns:

Any of:

  • Any object or value
  • None: validation will skip to the next batch

.. code-block:: python

# pseudocode of order
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    if defined('validation_step_end'):
        out = validation_step_end(out)
    val_outs.append(out)
val_outs = validation_epoch_end(val_outs)

.. code-block:: python

# if you have one val dataloader:
def validation_step(self, batch, batch_idx)

# if you have multiple val dataloaders:
def validation_step(self, batch, batch_idx, dataloader_idx)

Examples::

# CASE 1: A single validation dataset
def validation_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'val_loss': loss, 'val_acc': val_acc})

If you pass in multiple val dataloaders, validation_step will have an additional argument.

.. code-block:: python

# CASE 2: multiple validation dataloaders
def validation_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.

Note

If you don't need to validate you don't need to implement this method.

Note

When validation_step is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.

Source code in wav2rec/nn/lightening.py
def validation_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("val_loss", loss, prog_bar=True)
    return loss

simsam

SimSam Model

Notes

  • Code adapted from https://github.com/lucidrains/byol-pytorch

References

  • https://arxiv.org/abs/2006.07733
  • https://arxiv.org/abs/2011.10566
  • https://github.com/lucidrains/byol-pytorch

SimSam

Simple Siamese Neural Network for self-supervised representation learning.

Parameters:

  • encoder (AudioImageNetwork, required): a model which inherits from AudioImageNetwork, to be used as the encoder.
  • projection_size (int, required): dimensionality of the vectors to be compared.
  • projection_hidden_size (int, required): number of units in the multilayer perceptron (MLP) networks.
  • augment1 (callable, required): first augmentation (yields x1). If None, the default augmentation will be used.
  • augment2 (callable, required): second augmentation (yields x2). If None, augment1 will be used.

References

  • https://arxiv.org/abs/2006.07733
  • https://arxiv.org/abs/2011.10566
  • https://github.com/lucidrains/byol-pytorch
forward(self, x)

Compute the forward pass of the learner and the combined loss.

Parameters:

  • x (torch.Tensor, required): a tensor of shape [BATCH, ...].

Returns:

  • loss (torch.Tensor): combined, average loss of the operation.

Source code in wav2rec/nn/simsam.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the learner and the
    combined loss.

    Args:
        x (torch.Tensor): a tensor of shape ``[BATCH, ...]``

    Returns:
        loss (torch.Tensor): combined, average loss of the operation

    """
    x1, x2 = self.augment1(x), self.augment2(x)

    online_pred_1 = self.predictor(self.wrapped_encoder(x1))
    online_pred_2 = self.predictor(self.wrapped_encoder(x2))

    with torch.no_grad():
        target_proj_1 = self.wrapped_encoder(x1).detach_()
        target_proj_2 = self.wrapped_encoder(x2).detach_()

    loss_1 = _loss_fn(online_pred_1, target_proj_2)
    loss_2 = _loss_fn(online_pred_2, target_proj_1)

    loss = loss_1 + loss_2
    return loss.mean()
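
A sketch of computing the self-supervised loss for a batch of waveforms. Import paths and parameter values are assumptions based on the listings above:

import torch

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listings
from wav2rec.nn.simsam import SimSam            # import path assumed from the source listing

model = SimSam(
    encoder=AudioResnet50(sr=22050, n_mels=128, image_size=224),
    projection_size=256,
    projection_hidden_size=4096,
)

waveforms = torch.randn(4, 22050)  # [BATCH, TIME]
loss = model(waveforms)            # two augmented views, symmetric loss, averaged
loss.backward()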

signal special

dsp

Digital Signal Processing

MelSpectrogram

Layer to compute the melspectrogram of a 1D audio waveform.

This layer leverages the convolution-based torchlibrosa library to compute the melspectrogram of an audio waveform. The computation can be performed efficiently on a GPU.

Parameters:

  • sr (int, required): sample rate of the audio.
  • n_fft (int, required): FFT window size.
  • win_length (int, required): length of the FFT window function.
  • hop_length (int, required): number of samples between frames.
  • f_min (float, required): lowest frequency (Hz).
  • f_max (float, required): highest frequency (Hz).
  • n_mels (int, required): number of mel bands to create.
  • window (str, required): window function to use.
  • power (float, required): exponent for the mel spectrogram.
  • center (bool, required): if True, center the input signal.
  • pad_mode (str, required): padding to use at the edges of the signal. (Note: only applies if center=True.)
  • as_db (bool, required): if True, convert the output from amplitude to decibels.
  • ref (float or str, required): the reference point to use when converting to decibels. If a float, the reference point is used as-is. If a string, must be 'max' (computed and applied individually for each waveform in the batch). (Note: only applies if as_db=True.)
  • amin (float, required): minimum threshold when converting to decibels. (Note: only applies if as_db=True.)
  • top_db (float, required): maximum threshold value to use when converting to decibels. (Note: only applies if as_db=True.)
  • normalize_db (bool, required): if True, normalize the final output so that it lies on [0, 1]. (Note: requires as_db=True.)
forward(self, x)

Compute the melspectrogram of x.

Parameters:

  • x (torch.Tensor, required): 2D tensor with shape [BATCH, TIME].

Returns:

  • melspec (torch.Tensor): 4D tensor with shape [BATCH, CHANNEL, TIME, N_MELS].

Source code in wav2rec/signal/dsp.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the melspectrogram of ``x``.

    Args:
        x (torch.Tensor): 2D tensor with shape ``[BATCH, TIME]``

    Returns:
        melspec (torch.Tensor): 4D tensor with shape ``[BATCH, CHANNEL, TIME, N_MELS]``.

    """
    S = self.meltransform(x)
    if self.as_db and self.normalize_db:
        return self._normalize_db(self._power_to_db(S))
    elif self.as_db:
        return self._power_to_db(S)
    else:
        return S
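
A sketch of computing melspectrograms for a batch of waveforms. The import path is assumed from the source listing, and only a few illustrative parameters are passed (the rest are assumed to have defaults):

import torch

from wav2rec.signal.dsp import MelSpectrogram  # import path assumed from the source listing

melspec = MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, n_mels=128)  # values are illustrative
x = torch.randn(8, 22050 * 5)  # [BATCH, TIME]

S = melspec(x)
print(S.shape)  # [BATCH, CHANNEL, TIME, N_MELS]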