model package

Submodules

`model.deeprm_model`

Module: deeprm.model.deeprm_model

This module defines the DeepRM model architecture, including the ResNet block, Transformer model, positional encoding, and regression head.

class deeprm.model.deeprm_model.ResNetBlock(*args, **kwargs)

Bases: Module

A 1D ResNet block for 1D DeepRM.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
hidden_channels (int) – Number of hidden channels. If None, set to out_channels. (optional)
kernel_size (int) – Kernel size for the middle convolutional layer. Default is 3. (optional)
stride (int) – Stride for the convolutional layers. Default is 1. (optional)
activation (str) – Activation function to use. Default is ‘gelu’. (optional)
dropout (float) – Dropout rate. Default is 0.1. (optional)
groups (int) – Number of groups for grouped convolution. Default is 1. (optional)

bn1

Batch normalization layer for the first convolution.

Type: torch.nn.BatchNorm1d

activation

Activation function.

Type: typing.Callable

conv1

First convolutional layer with kernel size 1.

Type: torch.nn.Conv1d

bn2

Batch normalization layer for the second convolution.

Type: torch.nn.BatchNorm1d

conv2

Second convolutional layer with specified kernel size and groups.

Type: torch.nn.Conv1d

bn3

Batch normalization layer for the third convolution.

Type: torch.nn.BatchNorm1d

dropout

Dropout layer.

Type: torch.nn.Dropout

conv3

Third convolutional layer with kernel size 1.

Type: torch.nn.Conv1d

shortcut

Shortcut connection to match input and output dimensions.

Type: torch.nn.Module

forward(x)

Forward pass through the ResNet block.

Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, sequence_length).
Returns: Output tensor of shape (batch_size, out_channels, sequence_length).
Return type: torch.Tensor
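
The documented output shape keeps sequence_length unchanged, which only holds if every convolution in the block preserves length. The sketch below applies the output-length formula given in the torch.nn.Conv1d documentation to check this. The padding of kernel_size // 2 for the middle convolution is an assumption, since the padding DeepRM actually uses is not stated here, and `conv1d_out_len` is a hypothetical helper, not part of the package.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (formula from the torch.nn.Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Kernel size 1 (as in conv1 and conv3) never changes the length at stride 1.
same_1x1 = conv1d_out_len(200, kernel_size=1)

# Assumed: the middle convolution pads by kernel_size // 2, which keeps the
# length for odd kernel sizes at stride 1.
same_3 = conv1d_out_len(200, kernel_size=3, padding=3 // 2)

# With stride 2 the length is halved, so the documented output shape
# would no longer apply.
halved = conv1d_out_len(200, kernel_size=3, stride=2, padding=1)
```

Under these assumptions the first two cases keep a length of 200, while the strided case shortens the sequence.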

class deeprm.model.deeprm_model.TransformerModel(*args, **kwargs)

Bases: Module

A Transformer model for DeepRM.

Parameters:

d_model (int) – Dimension of the model.
n_heads (int) – Number of attention heads.
d_ff (int) – Dimension of the feed-forward network.
n_layers (int) – Number of encoder layers.
encoder_dropout (float) – Dropout rate for the encoder. Default is 0.1. (optional)
lin_dropout (float) – Dropout rate for the linear layers. Default is 0.1. (optional)
kmer_size (int) – Size of the k-mer. Default is 5. (optional)
signal_size (int) – Size of the signal input. Default is 25. (optional)
block_len (int) – Length of the block. Default is 17. (optional)
seq_len (int) – Length of the sequence. Default is 200. (optional)
t_act (str) – Activation function for the transformer. Default is ‘gelu’. (optional)
lin_act (str) – Activation function for the linear layers. Default is ‘relu’. (optional)
lin_depth (int) – Depth of the linear layers. Default is 1. (optional)
signal_stride (int) – Stride for the signal input. Default is 6. (optional)
**kwargs – Additional keyword arguments.

kmer_embedding

Embedding layer for k-mer sequences.

Type: torch.nn.Embedding

signal_embedding

Linear layer for signal input.

Type: torch.nn.Linear

pos_encoding

Positional encoding layer.

Type: PositionalEncoding

cnn_encoder

Sequential container for CNN encoder blocks.

Type: torch.nn.Sequential

transformer_encoder

Transformer encoder.

Type: torch.nn.TransformerEncoder

regression_head

Regression head for the model output.

Type: RegressionHead

d_model

Dimension of the model.

Type: int

model_type

Type of the model, set to ‘Transformer’.

Type: str

kmer_size

Size of the k-mer.

Type: int

signal_stride

Stride for the signal input.

Type: int

unit_size

Size of the unit for processing sequences.

Type: int

target_start_idx

Start index for the target in the sequence.

Type: int

target_end_idx

End index for the target in the sequence.

Type: int

seq_len

Length of the sequence.

Type: int

block_len

Length of the block.

Type: int

init_weights(initrange=0.1)

Initialize the weights of the model.

Parameters: initrange (float) – Range for uniform initialization of weights. Default is 0.1. (optional)
Returns: None

process_kmer(src_kmer, src_seg_len_flat)

Process the k-mer input to convert nucleotide characters to numerical indices.

Parameters:

src_kmer (torch.Tensor) – Input tensor of shape (batch_size, seq_len) containing nucleotide characters.
src_seg_len_flat (torch.Tensor) – Flattened segment lengths for the input sequences.

Returns:

Processed k-mer tensor of shape (batch_size, seq_len) with numerical indices.

Return type:

torch.Tensor
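
As a rough illustration of the character-to-index step described above, the following plain-Python sketch maps nucleotide characters to integers. The vocabulary, the treatment of U as T, and the catch-all index are all assumptions made for this example; the actual index assignment in DeepRM, and how src_seg_len_flat enters the computation, are not documented here.

```python
# Hypothetical vocabulary; the real index assignment used by DeepRM may differ.
NUCLEOTIDE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}  # assumed: U treated like T
UNKNOWN_INDEX = 4  # assumed catch-all for N or any other symbol


def encode_nucleotides(sequence):
    """Map each nucleotide character to an integer index (case-insensitive)."""
    return [NUCLEOTIDE_TO_INDEX.get(base, UNKNOWN_INDEX) for base in sequence.upper()]


indices = encode_nucleotides("acgun")
```

Integer indices of this kind are what an embedding layer such as kmer_embedding expects as input.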

process_signal(src_signal)

Process the signal input by unfolding it into segments based on the signal stride and k-mer size.

Parameters: src_signal (torch.Tensor) – Input tensor of shape (batch_size, seq_len, signal_size) containing signal data.
Returns: Processed signal tensor of shape (batch_size, new_seq_len, signal_size) after unfolding.
Return type: torch.Tensor
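
Unfolding here means cutting a sequence into overlapping windows. The sketch below shows the idea on a plain list; `sliding_windows` is a hypothetical helper, and the window size and step are illustrative values, because how DeepRM derives them from kmer_size and signal_stride is not stated in this reference.

```python
def sliding_windows(samples, size, step):
    """Return every window of `size` consecutive samples, advancing by `step`."""
    n_windows = (len(samples) - size) // step + 1
    # max(..., 0) covers inputs shorter than one window.
    return [samples[i * step : i * step + size] for i in range(max(n_windows, 0))]


# Ten samples, windows of four, advancing two samples at a time.
windows = sliding_windows(list(range(10)), size=4, step=2)
```

For tensors, `torch.Tensor.unfold(dimension, size, step)` produces the same kind of windows along a chosen dimension.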

flatten_seg_len(src_seg_len)

Flatten the segment lengths to create a single dimension for each sequence.

Parameters: src_seg_len (torch.Tensor) – Input tensor of shape (batch_size, num_segments) containing segment lengths.
Returns: Flattened segment lengths of shape (batch_size, seq_len).
Return type: torch.Tensor

create_src_pad_mask(src_signal, src_seg_len)

Create a padding mask for the source signal to ignore padded values during processing.

Parameters:

src_signal (torch.Tensor) – Input tensor of shape (batch_size, seq_len, signal_size) containing signal data.
src_seg_len (torch.Tensor) – Segment lengths tensor of shape (batch_size, num_segments).

Returns:

Padding mask of shape (batch_size, seq_len) where True indicates padded positions.

Return type:

torch.Tensor
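
The mask convention above (True for padded positions) can be illustrated in plain Python. In this sketch the valid length of each sequence is assumed to be the sum of its segment lengths; the method's actual derivation from src_signal and src_seg_len may differ, and `padding_mask` is a hypothetical helper.

```python
def padding_mask(valid_lengths, seq_len):
    """Boolean mask per sequence: True marks padded positions, False marks real data."""
    return [[pos >= n_valid for pos in range(seq_len)] for n_valid in valid_lengths]


# Assumed: a sequence's valid length is the sum of its segment lengths.
seg_len = [[2, 1], [1, 1]]
mask = padding_mask([sum(row) for row in seg_len], seq_len=4)
```

This matches the boolean `src_key_padding_mask` convention of torch.nn.TransformerEncoder, where positions marked True are ignored by attention.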

create_target_mask(src_seg_len, src_seg_len_flat)

Create a target mask to identify the target positions in the sequence.

Parameters:

src_seg_len (torch.Tensor) – Segment lengths tensor of shape (batch_size, num_segments).
src_seg_len_flat (torch.Tensor) – Flattened segment lengths tensor of shape (batch_size, seq_len).

Returns:

Target mask of shape (batch_size, seq_len) where True indicates target positions.

Return type:

torch.Tensor

process_dwell_bq(src_dwell_bq, src_seg_len_flat)

Process the dwell time and base quality input by flattening and repeating it based on segment lengths.

Parameters:

src_dwell_bq (torch.Tensor) – Input tensor of shape (batch_size, seq_len, channel) containing dwell time and base quality.
src_seg_len_flat (torch.Tensor) – Flattened segment lengths for the input sequences.

Returns:

Processed dwell time and base quality tensor of shape (batch_size, seq_len, channel).

Return type:

torch.Tensor
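
Repeating per-base features according to segment lengths can be sketched as follows. This is a plain-Python illustration of the idea only: `repeat_by_length` is a hypothetical helper, the feature values are made up for the example, and the exact tensor layout used by DeepRM is not documented here.

```python
def repeat_by_length(rows, lengths):
    """Repeat rows[i] lengths[i] times, preserving order (zero-length rows are dropped)."""
    if len(rows) != len(lengths):
        raise ValueError("rows and lengths must have the same number of entries")
    return [list(row) for row, n in zip(rows, lengths) for _ in range(n)]


# Two bases with (dwell time, base quality) features, spanning 2 and 3 positions.
dwell_bq = [[0.5, 30.0], [1.2, 25.0]]
expanded = repeat_by_length(dwell_bq, [2, 3])
```

On tensors, `torch.repeat_interleave` with a tensor of repeat counts performs the same expansion along a dimension.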

forward(src_kmer, src_signal, src_seg_len, src_dwell_bq)

Forward pass through the Transformer model.

Parameters:

src_kmer (torch.Tensor) – Input tensor of shape (batch_size, seq_len) containing k-mer sequences.
src_signal (torch.Tensor) – Input tensor of shape (batch_size, seq_len, signal_size) containing signal data.
src_seg_len (torch.Tensor) – Segment lengths tensor of shape (batch_size, num_segments).
src_dwell_bq (torch.Tensor) – Input tensor of shape (batch_size, seq_len, channel) containing dwell time and base quality.

Returns:

Output tensor of shape (batch_size, seq_len) after processing through the model.

Return type:

torch.Tensor

class deeprm.model.deeprm_model.PositionalEncoding(*args, **kwargs)

Bases: Module

Positional Encoding for Transformer models.

Parameters:

d_model (int) – Dimension of the model.
seq_len (int) – Length of the sequence.

pe

Positional encoding tensor of shape (1, seq_len, d_model).

Type: torch.Tensor

forward(batch_size)

Forward pass to repeat the positional encoding for the given batch size.

Parameters: batch_size (int) – Number of sequences in the batch.
Returns: Positional encoding repeated along the batch dimension. Since pe has shape (1, seq_len, d_model), the result presumably has shape (batch_size, seq_len, d_model).
Return type: torch.Tensor
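
This reference does not say how the encoding values are computed. As one plausible reading, the sketch below builds the standard sinusoidal encoding from the original Transformer formulation and then copies it once per sample. Both function names are hypothetical, and the use of sinusoids (rather than, for example, learned embeddings) is an assumption.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Build a (seq_len, d_model) table of sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)  # even feature index
            if i + 1 < d_model:  # guard for odd d_model
                table[pos][i + 1] = math.cos(angle)  # odd feature index
    return table


def repeat_for_batch(table, batch_size):
    """Copy the table once per sample, giving (batch_size, seq_len, d_model)."""
    return [[row[:] for row in table] for _ in range(batch_size)]


pe = sinusoidal_encoding(seq_len=4, d_model=6)
batch = repeat_for_batch(pe, batch_size=2)
```

Because the table depends only on seq_len and d_model, it can be computed once and reused for every batch, which is consistent with forward taking only batch_size.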

class deeprm.model.deeprm_model.RegressionHead(*args, **kwargs)

Bases: Module

Regression head for the Transformer model.

Parameters:

d_model (int) – Dimension of the model.
lin_act (str) – Activation function for the linear layers.
lin_depth (int) – Depth of the linear layers.
lin_dropout (float) – Dropout rate for the linear layers.
seq_length (int) – Length of the sequence.

lin_layers

Sequential container for the linear layers.

Type: torch.nn.Sequential

forward(x)

Forward pass through the regression head.

Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, seq_length, d_model).
Returns: Output tensor of shape (batch_size, seq_length, 1) after processing through the linear layers.
Return type: torch.Tensor

init_weights(initrange=0.1)

Initialize the weights of the linear layers in the regression head.

Parameters: initrange (float) – Range for uniform initialization of weights. Default is 0.1.
Returns: None
