Reference 📚

Wav2Rec

core special

engine

Recommender

Wav2Rec

Waveform recommendation & matching engine.

Parameters:

  • model_path (Path, required): path to a (training) checkpoint for Wav2RecNet.
  • distance_metric (str, required): distance metric to use for the nearest-neighbours search.
  • normalize (bool, required): if True, perform L2 normalization on all projections.
  • similarity (callable, required): a callable which accepts two 1D arrays and returns a float. Must be compiled with numba.jit(nopython=True). If None, distances will be returned instead (see distance_metric).
  • batch_size (int, required): number of audio files to send to the Wav2RecNet neural network for projection simultaneously.
  • num_workers (int, required): number of subprocesses to use when loading data from the dataset. See torch.utils.data.dataloader.DataLoader.
  • pin_memory (bool, required): copy tensors to CUDA memory before the data loader returns them.
  • prefetch_factor (int, required): number of samples to load in advance by each worker. See torch.utils.data.dataloader.DataLoader.
  • device (torch.device, required): device to run the model on. If None, the device will be selected automatically.
  • verbose (bool, required): if True, display a progress bar while fitting.
  • **kwargs: keyword arguments to pass to NearestNeighbors.

Warnings

  • By default, this class uses distance_metric='euclidean' and normalize=True. These settings have been purposefully chosen so that the distances computed for nearest neighbours search accord with the default similarity metric used: cosine similarity. (The euclidean distance between L2 normalized vectors is an effective proxy of cosine similarity, see reference below.)

References

  • https://en.wikipedia.org/wiki/Cosine_similarity
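
A quick numeric check of the warning above (a sketch using only NumPy): for L2-normalized vectors, the squared Euclidean distance equals 2 - 2*cos, so ranking neighbours by Euclidean distance is equivalent to ranking them by cosine similarity.

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # L2 normalization

cos = float(a @ b)                # cosine similarity of the normalized vectors
d = float(np.linalg.norm(a - b))  # Euclidean distance between them
print(np.isclose(d ** 2, 2 - 2 * cos))  # True: d**2 == 2 - 2*cos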
fit(self, dataset)

Fit the recommender to a dataset.

Fitting is composed of three steps:

1. Iterating over all files in the dataset
2. Computing Wav2RecNet projections for each file
3. Fitting the nearest neighbours algorithm against the projections

Parameters:

  • dataset (Wav2RecDataset, required): a dataset to fit against.

Returns:

  • Wav2Rec: the fitted recommender (fit() returns self).

Source code in wav2rec/core/engine.py
def fit(self, dataset: Wav2RecDataset) -> Wav2Rec:
    """Fit the recommender to a dataset.

    Fitting is composed of three steps:

        1. Iterating over all files in the dataset
        2. Computing `Wav2RecNet` projections for each file
        3. Fitting the nearest neighbours algorithm against the projections

    Args:
        dataset (Wav2RecDataset): a dataset to fit against.

    Returns:
        Wav2Rec

    """
    all_paths, all_projections = list(), list()
    with tqdm(desc="Fitting", disable=not self.verbose, total=len(dataset)) as pbar:
        for paths, audio in self._dataset2loader(dataset):
            all_paths.extend(paths)
            all_projections.append(self.get_projection(audio))
            pbar.update(len(audio))

    self.paths = np.asarray(all_paths)
    self._nneighbours.fit(np.concatenate(all_projections))
    self.fitted = True
    return self
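
A minimal end-to-end sketch of fitting. The import paths follow the source listings on this page; the checkpoint and audio directory are hypothetical, and any parameters not passed are assumed to have defaults:

from pathlib import Path

from wav2rec.core.engine import Wav2Rec          # module path taken from the source listings above
from wav2rec.data.dataset import Wav2RecDataset  # module path taken from the source listings below

# Hypothetical paths: substitute a real Wav2RecNet checkpoint and audio directory.
rec = Wav2Rec(model_path=Path("checkpoints/wav2recnet.ckpt"))
dataset = Wav2RecDataset(audio_path=Path("audio/")).scan()

rec.fit(dataset)  # fit() returns self, so construction and fitting can also be chained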
get_projection(self, x)

Get the model's projection of a waveform x.

Parameters:

  • x (torch.Tensor or np.ndarray, required): a 1D array or tensor with shape [FEATURES], or a 2D array or tensor with shape [BATCH, FEATURES].

Returns:

  • proj (np.ndarray): a projection of x.

Source code in wav2rec/core/engine.py
def get_projection(self, x: Union[torch.Tensor, np.ndarray]) -> np.ndarray:
    """Get the model's projection of a waveform ``x``.

    Args:
        x (np.ndarray, torch.Tensor): a 1D array or tensor with shape ``[FEATURES]``
            or a 2D array or tensor with shape ``[BATCH, FEATURES]``.

    Returns:
        proj (np.ndarray): a projection of ``x``.

    """
    with torch.inference_mode():
        proj: np.ndarray = (
            self.net(_standardize_input(x).to(self.device)).cpu().numpy()
        )
    return _l2_normalize(proj, axis=-1) if self.normalize else proj
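
A sketch of projecting a single waveform, assuming rec is the fitted Wav2Rec instance from the sketch above and that the waveform length matches what the dataset and network expect:

import numpy as np

# A 1D waveform with shape [FEATURES]; a 2D [BATCH, FEATURES] array also works.
waveform = np.random.uniform(-1.0, 1.0, size=22050 * 5).astype(np.float32)

proj = rec.get_projection(waveform)
print(type(proj))  # np.ndarray; rows are L2-normalized when normalize=True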

similarity

Similarity

cosine_similarity(x1, x2)

Compute cosine similarity between two 1D arrays.

Parameters:

  • x1 (np.ndarray, required): a 1D array with shape [FEATURES].
  • x2 (np.ndarray, required): a 1D array with shape [FEATURES].

Returns:

  • similarity (float): a similarity score on [0, 1].

Warning

  • x1 and x2 must be normalized.
Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def cosine_similarity(x1: np.ndarray, x2: np.ndarray) -> float:
    """Compute cosine similarity between two 1D arrays.

    Args:
        x1 (np.ndarray): a 1D array with shape ``[FEATURES]``
        x2 (np.ndarray): a 1D array with shape ``[FEATURES]``

    Returns:
        similarity (float): a similarity score on [0, 1].

    Warning:
        * ``x1`` and ``x2`` must be normalized.

    """
    return float(_clip(x1 @ x2, a_min=0, a_max=1))
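
A small usage sketch. The import path is assumed from the source listing, and the inputs are L2-normalized as the warning requires:

import numpy as np

from wav2rec.core.similarity import cosine_similarity  # import path assumed from the source listing

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # L2-normalized

print(cosine_similarity(x1, x2))  # ~0.7071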

similarity_calculator(X_query, X_neighbours, metric=cosine_similarity)

Compute the similarity of X_query with all entries in X_neighbours.

Parameters:

  • X_query (np.ndarray, required): a query 2D array with shape [N_QUERIES, FEATURES].
  • X_neighbours (np.ndarray, required): a reference 3D array with shape [N_QUERIES, N_NEIGHBOURS, FEATURES].
  • metric (Callable[[np.ndarray, np.ndarray], float]): a callable which accepts two 1D arrays and returns a float. Must be compiled with numba.jit(nopython=True). Defaults to cosine_similarity.

Returns:

  • sims (np.ndarray): a 2D array of similarities with shape [N_QUERIES, N_NEIGHBOURS].

Source code in wav2rec/core/similarity.py
@numba.jit(nopython=True)
def similarity_calculator(
    X_query: np.ndarray,
    X_neighbours: np.ndarray,
    metric: Callable[[np.ndarray, np.ndarray], float] = cosine_similarity,
) -> np.ndarray:
    """Compute the similarity of ``X_query`` with all entries in ``X_neighbours``.

    Args:
        X_query (np.ndarray): a query 2D array with shape ``[N_QUERIES, FEATURES]``
        X_neighbours (np.ndarray): a reference 3D array with shape
            ``[N_QUERIES, N_NEIGHBOURS, FEATURES]``
        metric (callable): a callable which accepts two 1D arrays
            and returns a float. Must be compiled with ``numba.jit(nopython=True)``.

    Returns:
        sims (np.ndarray): a 2D array of similarities with shape ``[N_QUERIES, N_NEIGHBOURS]``.

    """
    n_queries = X_query.shape[0]
    n_neighbours = X_neighbours.shape[1]

    sims = np.zeros((n_queries, n_neighbours), dtype=X_neighbours.dtype)
    for i in range(n_queries):
        for j in range(n_neighbours):
            sims[i, j] = metric(X_query[i], X_neighbours[i, j])
    return sims
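
A usage sketch (import path assumed from the source listing). The rows are L2-normalized so the default cosine_similarity metric behaves as documented:

import numpy as np

from wav2rec.core.similarity import similarity_calculator  # import path assumed from the source listing

rng = np.random.default_rng(0)
X_query = rng.normal(size=(4, 128))
X_neighbours = rng.normal(size=(4, 5, 128))

# L2-normalize along the feature axis.
X_query /= np.linalg.norm(X_query, axis=-1, keepdims=True)
X_neighbours /= np.linalg.norm(X_neighbours, axis=-1, keepdims=True)

sims = similarity_calculator(X_query, X_neighbours)
print(sims.shape)  # (4, 5): one similarity per (query, neighbour) pair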

data special

Data

dataset

Dataset

Wav2RecDataset

Base Wav2Rec Dataset.

Parameters:

  • audio_path (Path, required): path to a directory of caches of type ext.
  • sr (int, required): sample rate to use for each track.
  • offset (int, required): seconds to skip in each track.
  • duration (int, required): the duration of each track to use.
  • ext (str or tuple, required): one or more file extensions in audio_path to filter for.
  • res_type (str, required): resampling algorithm.
  • zero_pad (bool, required): if True, automatically zero-pad waveforms shorter than n_features.
  • verbose (bool, required): if True, display progress bars.
n_features: int property readonly

Expected number of elements (audio samples) in each waveform.

get_audio_files(self)

Generate an iterable of all eligible files in audio_path.

Yields

path

Source code in wav2rec/data/dataset.py
def get_audio_files(self) -> Iterable[Path]:
    """Generate an iterable of all eligible files in ``audio_path``.

    Yields:
        path

    """
    yield from tqdm(
        self._audio_path_iter(),
        desc="Scanning for Audio",
        disable=not self.verbose,
        total=sum(1 for _ in self._audio_path_iter()),
        unit="file",
    )
load_audio(self, path)

Load an audio file from path.

Parameters:

  • path (Path, required): a file path to a piece of audio.

Returns:

  • x (torch.Tensor): a mono audio signal.

Source code in wav2rec/data/dataset.py
def load_audio(self, path: Path) -> torch.Tensor:
    """Load an audio file from ``path``.

    Args:
        path (Path): a file path to a piece of audio

    Returns:
        x (torch.Tensor): a mono audio signal.

    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="PySoundFile failed.*")
        x, _ = load(
            path=path,
            sr=self.sr,
            mono=True,
            offset=self.offset,
            duration=self.duration,
            res_type=self.res_type,
        )
    if self.zero_pad:
        x = zero_pad1d(x, target_length=self.n_features)
    return torch.as_tensor(x)
scan(self)

Scan audio_path for audio files.

Returns:

  • Wav2RecDataset: the scanned dataset (scan() returns self).

Source code in wav2rec/data/dataset.py
def scan(self) -> Wav2RecDataset:
    """Scan ``audio_path`` for audio files.

    Returns:
        Wav2RecDataset

    """
    files = list(self.get_audio_files())
    if files:
        self.files = files
    else:
        raise OSError(f"No files found in '{str(self.audio_path)}'")
    return self
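
A sketch of building and scanning a dataset. The import path is assumed from the source listing, the audio directory is hypothetical, and parameters not shown are assumed to have defaults:

from pathlib import Path

from wav2rec.data.dataset import Wav2RecDataset  # import path assumed from the source listing

dataset = Wav2RecDataset(audio_path=Path("audio/"))  # hypothetical directory of audio files
dataset.scan()  # raises OSError if no eligible files are found

print(len(dataset.files))  # files discovered by scan()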

transforms

Transforms

RandomNoise

Add random noise to a signal.

Parameters:

  • alpha (tuple, required): a tuple to characterize a uniform distribution. Values drawn from this distribution will determine the weight given to the random noise.
  • **kwargs: keyword arguments to pass to the parent class.
op(self, x)

Add random noise to x

Parameters:

  • x (torch.Tensor, required): a tensor to operate on.

Returns:

  • x_fuzzed (torch.Tensor): x + noise.

Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Add random noise to ``x``

    Args:
        x (torch.Tensor): a tensor to operate on

    Returns:
        x_fuzzed (torch.Tensor): x + noise.

    """
    noise_weight = np.random.uniform(*self.alpha)
    return x + torch.rand_like(x) * noise_weight
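
A sketch of applying the transform, assuming the import path from the source listing and that the constructor accepts alpha along with the parent-class probability p:

import torch

from wav2rec.data.transforms import RandomNoise  # import path assumed from the source listing

noise = RandomNoise(alpha=(0.01, 0.05), p=0.5)  # parameter values are illustrative
x = torch.randn(8, 16_000)                      # [BATCH, FEATURES]
x_aug = noise(x)  # noise added with probability p, otherwise x is returned unchanged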

RandomOp

__init__(self, p) special

Base class for randomly applying an operation.

Parameters:

  • p (float, required): probability of performing the transformation.
Source code in wav2rec/data/transforms.py
def __init__(self, p: float) -> None:
    """Base class for randomly applying an operation.

    Args:
        p (float): probability of performing the transformation

    """
    super().__init__()
    self.p = p
forward(self, x)

Perform op() on x with probability p.

Parameters:

  • x (torch.Tensor, required): tensor to operate on.

Returns:

  • torch.Tensor

Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Perform ``op()`` on ``x`` with probability ``p``.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    if np.random.uniform(0, 1) <= self.p:
        return self.op(x)
    else:
        return x
op(self, x)

Operation to perform.

Parameters:

  • x (torch.Tensor, required): tensor to operate on.

Returns:

  • torch.Tensor

Source code in wav2rec/data/transforms.py
def op(self, x: torch.Tensor) -> torch.Tensor:
    """Operation to perform.

    Args:
        x (torch.Tensor): tensor to operate on

    Returns:
        torch.Tensor

    """
    raise NotImplementedError()
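
Since op() is the only abstract piece, a new random transform only needs to override it. A sketch (RandomGain is a hypothetical subclass, not part of the library):

import torch

from wav2rec.data.transforms import RandomOp  # import path assumed from the source listing


class RandomGain(RandomOp):
    """Hypothetical transform: scale the signal by a random gain."""

    def op(self, x: torch.Tensor) -> torch.Tensor:
        gain = torch.empty(1).uniform_(0.5, 1.5)
        return x * gain


x = torch.randn(4, 16_000)
x_out = RandomGain(p=0.5)(x)  # op() applied with probability 0.5, otherwise x is returned as-is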

RandomReplaceMean

Randomly replace part of a tensor with its mean.

replacement(self, x, a, b)

Generate replacement (mean of each batch).

Parameters:

  • x (torch.Tensor, required): tensor to operate on. Should be of the form [BATCH, FEATURES].
  • a (int, required): start position in the tensor.
  • b (int, required): end position in the tensor.

Returns:

  • Union[float, torch.Tensor]: the per-row mean of x, repeated to fill positions a through b.

Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (mean of each batch).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return x.mean(dim=-1).repeat_interleave(b - a).view(-1, b - a)

RandomReplaceZero

Randomly replace part of a tensor with zero.

replacement(self, x, a, b)

Generate replacement (zero).

Parameters:

  • x (torch.Tensor, required): tensor to operate on. Should be of the form [BATCH, FEATURES].
  • a (int, required): start position in the tensor.
  • b (int, required): end position in the tensor.

Returns:

  • Union[float, torch.Tensor]: 0.0, used as the replacement value.

Source code in wav2rec/data/transforms.py
def replacement(
    self,
    x: torch.Tensor,
    a: int,
    b: int,
) -> Union[float, torch.Tensor]:
    """Generate replacement (zero).

    Args:
        x (torch.Tensor): tensor to operate on. Should be of the
            form ``[BATCH, FEATURES]``.
        a (int): start position in the tensor
        b (int): end position in the tensor

    Returns:
        torch.Tensor

    """
    return 0.0

Resize

Resize a tensor.

Parameters:

  • size (int or tuple, required): one or more integers.
  • mode (str, required): resizing algorithm to use.
forward(self, x)

Resize x to size.

Parameters:

  • x (torch.Tensor, required): a tensor of the form [BATCH, ...].

Returns:

  • x_resized (torch.Tensor): x resized.

Source code in wav2rec/data/transforms.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Resize ``x`` to ``size``.

    Args:
        x (torch.Tensor): a tensor of the form ``[BATCH, ...]``.

    Returns:
        x_resized (torch.Tensor): ``x`` resized

    """
    return F.interpolate(x, size=self.size, mode=self.mode)
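
A sketch of resizing a batch of melspectrogram "images". Because forward() delegates to F.interpolate, the input is assumed to include a channel dimension:

import torch

from wav2rec.data.transforms import Resize  # import path assumed from the source listing

resize = Resize(size=(224, 224), mode="bilinear")  # parameter values are illustrative
spec = torch.randn(8, 1, 128, 646)                 # [BATCH, CHANNEL, N_MELS, TIME]
print(resize(spec).shape)                          # torch.Size([8, 1, 224, 224])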

nn special

NN

audionets

Audio-Image Networks

AudioImageNetwork

Class of networks which handle 1D waveforms by making them image-like.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • **kwargs: keyword arguments to pass to MelSpectrogram().
hidden_features: int property readonly

Number of features emitted by the network.

AudioResnet50

Resnet50-Based Audio network.

This network is designed to generate features against melspectrogram input, using a Resnet50 model as the encoder.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • **kwargs: keyword arguments to pass to the parent class.

Notes

  • Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int property readonly

Number of features emitted by the network.

forward(self, x)

Compute the forward pass of the network.

Parameters:

  • x (torch.Tensor, required): an input tensor.

Returns:

  • torch.Tensor

Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))
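
A sketch of running the network directly on raw waveforms (2D input triggers the internal melspectrogram step). The import path and constructor values are assumptions:

import torch

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listing

net = AudioResnet50(sr=22050, n_mels=128, image_size=224)  # parameter values are illustrative
waveforms = torch.randn(4, 22050 * 5)                      # [BATCH, TIME] raw audio

with torch.inference_mode():
    features = net(waveforms)  # wav2spec() applied internally, then the Resnet50 encoder

print(features.shape)  # last dimension should match net.hidden_features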

AudioVit

ViT-Based Audio network.

This network is designed to generate features against melspectrogram input, using a ViT model as the encoder.

Parameters:

  • sr (int, required): sample rate of the audio files.
  • n_mels (int, required): number of mel bands to construct for raw audio.
  • image_size (int, required): size to reshape the "images" (melspectrograms) to.
  • patch_size (int, required): size of each patch. Must be square.
  • dim (int, required): dimension of the output following nn.Linear().
  • depth (int, required): number of transformer blocks.
  • heads (int, required): number of multi-head attention layers.
  • mlp_dim (int, required): dimensions of the multi-layer perceptron (MLP) in the feed-forward layer of the transformer(s).
  • dim_head (int, required): dimensions in the head of the attention block(s).
  • dropout (float, required): dropout rate to use. Must be on [0, 1].
  • emb_dropout (float, required): dropout of the embedding layer. Must be on [0, 1].
  • **kwargs: keyword arguments to pass to the parent class.

Notes

  • Batches are normalized prior to being fed to the network in order to stabilize training.
hidden_features: int property readonly

Number of features emitted by the network.

forward(self, x)

Compute the forward pass of the network.

Parameters:

  • x (torch.Tensor, required): an input tensor.

Returns:

  • torch.Tensor

Source code in wav2rec/nn/audionets.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the network.

    Args:
        x (torch.Tensor): an input tensor

    Returns:
        torch.Tensor

    """
    if x.ndim == 2:  # assume waveforms
        x = self.wav2spec(x)
    return self.net(self.bn(x))

lightening

Lightning Model

Wav2RecNet

Unified (SimSam with Encoder) network.

Parameters:

  • lr (float, required): learning rate for the model.
  • encoder (AudioImageNetwork, required): a model which inherits from AudioImageNetwork, to be used as the encoder in SimSam. If None, AudioResnet50 will be used.
  • **kwargs: keyword arguments to pass to SimSam.
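
A short training sketch with PyTorch Lightning. The module path wav2rec.nn.lightening comes from the source listings below; the encoder settings, the dummy dataloader (yielding (identifier, waveform) pairs, matching how training_step unpacks its batch), and the Trainer settings are all assumptions:

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listings
from wav2rec.nn.lightening import Wav2RecNet    # module path taken from the source listings below

# Dummy (identifier, waveform) pairs standing in for a real audio dataset.
ids = torch.arange(8)
waveforms = torch.randn(8, 22050)
loader = DataLoader(TensorDataset(ids, waveforms), batch_size=4)

net = Wav2RecNet(lr=1e-3, encoder=AudioResnet50(sr=22050, n_mels=128, image_size=224))
pl.Trainer(max_epochs=1).fit(net, loader)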
configure_optimizers(self)

Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you'd need one. But in the case of GANs or similar you might have multiple.

Returns:

Optimizer. Any of these 6 options:

  • Single optimizer.
  • List or Tuple of optimizers.
  • Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_dict).
  • Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_dict.
  • Tuple of dictionaries as described above, with an optional "frequency" key.
  • None - Fit will run without any optimizer.

Note

The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

.. code-block:: python

lr_dict = {
    'scheduler': lr_scheduler, # The LR scheduler instance (required)
    # The unit of the scheduler's step size, could also be 'step'
    'interval': 'epoch',
    'frequency': 1, # The frequency of the scheduler
    'monitor': 'val_loss', # Metric for `ReduceLROnPlateau` to monitor
    'strict': True, # Whether to crash the training if `monitor` is not found
    'name': None, # Custom name for `LearningRateMonitor` to use
}

Only the "scheduler" key is required, the rest will be set to the defaults above.

Note

The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step. This is different from the frequency value specified in the lr_dict mentioned below.

.. code-block:: python

def configure_optimizers(self):
    optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
    optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
    return [
        {'optimizer': optimizer_one, 'frequency': 5},
        {'optimizer': optimizer_two, 'frequency': 10},
    ]

In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.

Examples::

# most cases
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-3)

# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    return gen_opt, dis_opt

# example with learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    dis_sch = CosineAnnealing(dis_opt, T_max=10)
    return [gen_opt, dis_opt], [dis_sch]

# example with step-based learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    gen_sch = {'scheduler': ExponentialLR(gen_opt, 0.99),
               'interval': 'step'}  # called after each training step
    dis_sch = CosineAnnealing(dis_opt, T_max=10) # called every epoch
    return [gen_opt, dis_opt], [gen_sch, dis_sch]

# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )

Note

Some things to know:

  • Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
  • If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
  • If you use multiple optimizers, training_step will have an additional optimizer_idx parameter.
  • If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
  • If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
  • If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step hook.
Source code in wav2rec/nn/lightening.py
def configure_optimizers(self) -> torch.optim.Optimizer:
    optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
    return optimizer
forward(self, x)

Same as torch.nn.Module.forward().

Parameters:

  • *args (required): whatever you decide to pass into the forward method.
  • **kwargs: keyword arguments are also possible.

Returns:

  • torch.Tensor: your model's output.

Source code in wav2rec/nn/lightening.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.learner.wrapped_encoder(x)
training_step(self, batch, batch_idx)

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:

  • batch (Tuple[torch.Tensor, torch.Tensor], required): the output of your torch.utils.data.DataLoader. A tensor, tuple or list.
  • batch_idx (int, required): integer displaying the index of this batch.
  • optimizer_idx (int): when using multiple optimizers, this argument will also be present.
  • hiddens (torch.Tensor): passed in if truncated_bptt_steps > 0.

Returns:

Any of:

  • torch.Tensor: the loss tensor
  • dict: a dictionary which can include any keys, but must include the key 'loss'
  • None: training will skip to the next batch

Note

Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.

In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.

Example::

def training_step(self, batch, batch_idx):
    x, y, z = batch
    out = self.encoder(x)
    loss = self.loss(out, x)
    return loss

If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

.. code-block:: python

# Multiple optimizers (e.g.: GANs)
def training_step(self, batch, batch_idx, optimizer_idx):
    if optimizer_idx == 0:
        # do training_step with encoder
    if optimizer_idx == 1:
        # do training_step with decoder

If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.

.. code-block:: python

# Truncated back-propagation through time
def training_step(self, batch, batch_idx, hiddens):
    # hiddens are the hidden states from the previous truncated backprop step
    ...
    out, hiddens = self.lstm(data, hiddens)
    ...
    return {'loss': loss, 'hiddens': hiddens}

Note

The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.

Source code in wav2rec/nn/lightening.py
def training_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("loss", loss)
    return loss
validation_step(self, batch, batch_idx)

Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.

.. code-block:: python

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:

  • batch (Tuple[torch.Tensor, torch.Tensor], required): the output of your torch.utils.data.DataLoader. A tensor, tuple or list.
  • batch_idx (int, required): the index of this batch.
  • dataloader_idx (int): the index of the dataloader that produced this batch (only if multiple val dataloaders are used).

Returns:

Any of:

  • Any object or value
  • None: validation will skip to the next batch

.. code-block:: python

# pseudocode of order
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    if defined('validation_step_end'):
        out = validation_step_end(out)
    val_outs.append(out)
val_outs = validation_epoch_end(val_outs)

.. code-block:: python

# if you have one val dataloader:
def validation_step(self, batch, batch_idx)

# if you have multiple val dataloaders:
def validation_step(self, batch, batch_idx, dataloader_idx)

Examples::

# CASE 1: A single validation dataset
def validation_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'val_loss': loss, 'val_acc': val_acc})

If you pass in multiple val dataloaders, validation_step will have an additional argument.

.. code-block:: python

# CASE 2: multiple validation dataloaders
def validation_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.

Note

If you don't need to validate you don't need to implement this method.

Note

When validation_step is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.

Source code in wav2rec/nn/lightening.py
def validation_step(
    self,
    batch: Tuple[torch.Tensor, torch.Tensor],
    batch_idx: int,
) -> torch.Tensor:
    _, x = batch
    loss = self.learner(x)
    self.log("val_loss", loss, prog_bar=True)
    return loss

simsam

SimSam Model

Notes

  • Code adapted from https://github.com/lucidrains/byol-pytorch

References

  • https://arxiv.org/abs/2006.07733
  • https://arxiv.org/abs/2011.10566
  • https://github.com/lucidrains/byol-pytorch

SimSam

Simple Siamese Neural Network for self-supervised representation learning.

Parameters:

  • encoder (AudioImageNetwork, required): a model which inherits from AudioImageNetwork, to be used as the encoder.
  • projection_size (int, required): dimensionality of the vectors to be compared.
  • projection_hidden_size (int, required): number of units in the multilayer perceptron (MLP) networks.
  • augment1 (callable, required): first augmentation (yields x1). If None, the default augmentation will be used.
  • augment2 (callable, required): second augmentation (yields x2). If None, augment1 will be used.

References

  • https://arxiv.org/abs/2006.07733
  • https://arxiv.org/abs/2011.10566
  • https://github.com/lucidrains/byol-pytorch
forward(self, x)

Compute the forward pass of the learner and the combined loss.

Parameters:

  • x (torch.Tensor, required): a tensor of shape [BATCH, ...].

Returns:

  • loss (torch.Tensor): combined, average loss of the operation.

Source code in wav2rec/nn/simsam.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the forward pass of the learner and the
    combined loss.

    Args:
        x (torch.Tensor): a tensor of shape ``[BATCH, ...]``

    Returns:
        loss (torch.Tensor): combined, average loss of the operation

    """
    x1, x2 = self.augment1(x), self.augment2(x)

    online_pred_1 = self.predictor(self.wrapped_encoder(x1))
    online_pred_2 = self.predictor(self.wrapped_encoder(x2))

    with torch.no_grad():
        target_proj_1 = self.wrapped_encoder(x1).detach_()
        target_proj_2 = self.wrapped_encoder(x2).detach_()

    loss_1 = _loss_fn(online_pred_1, target_proj_2)
    loss_2 = _loss_fn(online_pred_2, target_proj_1)

    loss = loss_1 + loss_2
    return loss.mean()
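
A sketch of computing the self-supervised loss for a batch of waveforms. Import paths and parameter values are assumptions based on the listings above:

import torch

from wav2rec.nn.audionets import AudioResnet50  # import path assumed from the source listings
from wav2rec.nn.simsam import SimSam            # import path assumed from the source listing

model = SimSam(
    encoder=AudioResnet50(sr=22050, n_mels=128, image_size=224),
    projection_size=256,
    projection_hidden_size=4096,
)

waveforms = torch.randn(4, 22050)  # [BATCH, TIME]
loss = model(waveforms)            # two augmented views, symmetric loss, averaged
loss.backward()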

signal special

dsp

Digital Signal Processing

MelSpectrogram

Layer to compute the melspectrogram of a 1D audio waveform.

This layer leverages the convolution-based torchlibrosa library to compute the melspectrogram of an audio waveform. The computation can be performed efficiently on a GPU.

Parameters:

  • sr (int, required): sample rate of the audio.
  • n_fft (int, required): FFT window size.
  • win_length (int, required): length of the FFT window function.
  • hop_length (int, required): number of samples between frames.
  • f_min (float, required): lowest frequency (Hz).
  • f_max (float, required): highest frequency (Hz).
  • n_mels (int, required): number of mel bands to create.
  • window (str, required): window function to use.
  • power (float, required): exponent for the mel spectrogram.
  • center (bool, required): if True, center the input signal.
  • pad_mode (str, required): padding to use at the edges of the signal. (Note: only applies if center=True.)
  • as_db (bool, required): if True, convert the output from amplitude to decibels.
  • ref (float or str, required): the reference point to use when converting to decibels. If a float, the reference point is used as-is. If a string, must be 'max' (computed and applied individually for each waveform in the batch). (Note: only applies if as_db=True.)
  • amin (float, required): minimum threshold when converting to decibels. (Note: only applies if as_db=True.)
  • top_db (float, required): maximum threshold value to use when converting to decibels. (Note: only applies if as_db=True.)
  • normalize_db (bool, required): if True, normalize the final output so that it lies on [0, 1]. (Note: requires as_db=True.)
forward(self, x)

Compute the melspectrogram of x.

Parameters:

  • x (torch.Tensor, required): 2D tensor with shape [BATCH, TIME].

Returns:

  • melspec (torch.Tensor): 4D tensor with shape [BATCH, CHANNEL, TIME, N_MELS].

Source code in wav2rec/signal/dsp.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Compute the melspectrogram of ``x``.

    Args:
        x (torch.Tensor): 2D tensor with shape ``[BATCH, TIME]``

    Returns:
        melspec (torch.Tensor): 4D tensor with shape ``[BATCH, CHANNEL, TIME, N_MELS]``.

    """
    S = self.meltransform(x)
    if self.as_db and self.normalize_db:
        return self._normalize_db(self._power_to_db(S))
    elif self.as_db:
        return self._power_to_db(S)
    else:
        return S
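
A sketch of computing melspectrograms for a batch of waveforms. The import path is assumed from the source listing, and only a few illustrative parameters are passed (the rest are assumed to have defaults):

import torch

from wav2rec.signal.dsp import MelSpectrogram  # import path assumed from the source listing

melspec = MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, n_mels=128)  # values are illustrative
x = torch.randn(8, 22050 * 5)  # [BATCH, TIME]

S = melspec(x)
print(S.shape)  # [BATCH, CHANNEL, TIME, N_MELS]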