qc package
Submodules
qc.inspect_alignment
DeepRM QC Module: Inspect Alignment
Inspect alignment quality by extracting CIGAR strings and calculating error rates. This module reads a BAM file, extracts the CIGAR string of each read, and computes per-read error rates.
- deeprm.qc.inspect_alignment.add_arguments(parser)
Adds command-line arguments.
- Parameters:
parser (argparse.ArgumentParser) – Argument parser to which arguments will be added.
- Returns:
None
- deeprm.qc.inspect_alignment.main(args)
Main function to run the alignment inspection pipeline. This function takes the parsed command-line arguments, checks for existing output, and runs the CIGAR extraction and error rate calculation. It also plots the error rates as a KDE plot and a boxplot.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
None
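The `add_arguments(parser)` / `main(args)` pair can be wired together roughly as follows. This is a minimal sketch, not DeepRM's actual command-line interface: the flag names `--input`, `--output`, and `--threads` are hypothetical, as are `build_parser` and the body of `main`.

```python
import argparse


def add_arguments(parser):
    # Hypothetical flags for illustration only; not DeepRM's real options.
    parser.add_argument("--input", required=True, help="Path to a BAM file")
    parser.add_argument("--output", default="qc_out", help="Output directory")
    parser.add_argument("--threads", type=int, default=1, help="Worker count")


def main(args):
    # A real main() would run the pipeline; this one only summarizes the namespace.
    return f"{args.input} -> {args.output} ({args.threads} threads)"


def build_parser():
    # Hypothetical helper that creates a parser and lets the module register its flags.
    parser = argparse.ArgumentParser(prog="inspect-sketch")
    add_arguments(parser)
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--input", "reads.bam", "--threads", "4"])
print(main(args))
```

Keeping argument registration separate from `main` lets a caller attach the same options to a standalone parser or to a subparser of a larger tool.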
- deeprm.qc.inspect_alignment.extract_cigar_worker(pid, args, error_dict)
Worker function to extract CIGAR strings and calculate error rates for a given process ID.
- Parameters:
pid (int) – Process ID for multiprocessing.
args (argparse.Namespace) – Parsed command-line arguments.
error_dict (dict) – Shared dictionary to store error rates.
- Returns:
None
- deeprm.qc.inspect_alignment.extract_cigar_master(args)
Master function to extract CIGAR strings and calculate error rates using multiprocessing.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
DataFrame containing error rates for each read.
- Return type:
pandas.DataFrame
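The master/worker split with a shared dictionary can be sketched as below. This is an illustration under stated assumptions, not the module's implementation: `shard_worker`, `run_master`, the round-robin sharding by `idx % n_workers`, and the `(read_id, n_errors, aligned_len)` record layout are all hypothetical, and the real workers read from a BAM file rather than an in-memory list.

```python
import multiprocessing as mp


def shard_worker(pid, n_workers, records, error_dict):
    """Process the records assigned to worker `pid` and store one rate per read."""
    for idx, (read_id, n_errors, aligned_len) in enumerate(records):
        if idx % n_workers != pid:
            continue  # this record belongs to another worker
        error_dict[read_id] = n_errors / aligned_len if aligned_len else 0.0


def run_master(records, n_workers=2):
    """Start one process per shard and merge the results through a managed dict."""
    with mp.Manager() as manager:
        shared = manager.dict()
        procs = [
            mp.Process(target=shard_worker, args=(pid, n_workers, records, shared))
            for pid in range(n_workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return dict(shared)  # copy out before the manager shuts down
```

On platforms that start processes with `spawn`, `run_master` should be called from under an `if __name__ == "__main__":` guard. The worker can also be called directly with a plain `dict`, which is convenient for testing.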
- deeprm.qc.inspect_alignment.md_to_mismatch_arr(md)
Convert MD tag to mismatch array. 1 = mismatch, 0 = match. Deletions are ignored (filled as matches).
- Parameters:
md (str) – MD tag string from the BAM file.
- Returns:
Array of mismatches (1s) and matches (0s).
- Return type:
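The MD-to-array conversion described above can be sketched in plain Python. This is one possible reading, not the module's code: `md_to_mismatch_list` is a hypothetical name, it returns a list instead of an array, and it interprets "filled as matches" as emitting one 0 per deleted reference base.

```python
import re

# An MD tag alternates match-run lengths with mismatched reference bases
# or deletions, where a deletion is written as ^ followed by the deleted bases.
_MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_to_mismatch_list(md):
    """Return a list with 1 for each mismatch and 0 for each match or deleted base."""
    out = []
    for token in _MD_TOKEN.finditer(md):
        run, deletion, mismatch = token.groups()
        if run is not None:
            out.extend([0] * int(run))  # run of matching bases
        elif deletion is not None:
            out.extend([0] * (len(deletion) - 1))  # skip the leading '^'
        else:
            out.append(1)  # single mismatched base
    return out


print(md_to_mismatch_list("2A1^C2"))
```

For the tag `2A1^C2` the result has seven entries: two matches, one mismatch, one match, one deleted base counted as a match, and two more matches.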
- deeprm.qc.inspect_alignment.get_error_rate_func(cigar, md, use_md=True)
Calculate error rates from CIGAR string and MD tag.
- Parameters:
- Returns:
Array containing mismatch rate, insertion rate, and deletion rate.
- Return type:
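As a rough illustration of how the three rates can be derived from a CIGAR string, consider the sketch below. It rests on assumptions the documentation does not confirm: `error_rates` is a hypothetical helper, the mismatch count is passed in (for example, counted from the MD tag), and every rate is normalized by the number of aligned, inserted, and deleted bases. The real function may use a different denominator.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_rates(cigar, n_mismatch):
    """Return (mismatch_rate, insertion_rate, deletion_rate) for one alignment."""
    counts = {}
    for length, op in _CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    aligned = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)
    inserted = counts.get("I", 0)
    deleted = counts.get("D", 0)
    # Assumed denominator: alignment columns. Soft and hard clips are excluded.
    total = aligned + inserted + deleted
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (n_mismatch / total, inserted / total, deleted / total)


print(error_rates("5S90M4I6D", n_mismatch=10))
```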
- deeprm.qc.inspect_alignment.plot_kde(df_error, args)
Plot the distribution of read alignment accuracy using KDE.
- Parameters:
df_error (pandas.DataFrame) – DataFrame containing error rates for each read.
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
None
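For readers unfamiliar with kernel density estimation (KDE), the sketch below shows what a Gaussian KDE computes for a handful of accuracy values. It is a dependency-free illustration only: `gaussian_kde_sketch` and the fixed bandwidth are assumptions, and the module's plotting code presumably delegates this step to a plotting or statistics library.

```python
import math


def gaussian_kde_sketch(samples, bandwidth):
    """Return a density function built from one Gaussian kernel per sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical per-read accuracies.
density = gaussian_kde_sketch([0.90, 0.95, 0.97], bandwidth=0.02)
print(density(0.95))
```

The resulting curve integrates to 1, so it can be compared across samples with different read counts.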
- deeprm.qc.inspect_alignment.plot_boxplot(df_error, args)
Plot a boxplot of the error rates for each read.
- Parameters:
df_error (pandas.DataFrame) – DataFrame containing error rates for each read.
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
None
qc.inspect_block
DeepRM QC Module: Inspect Block Files
Inspect block files for quality control. Plot distribution of base quality, motif composition, nucleotide composition, and block score distribution.
- deeprm.qc.inspect_block.add_arguments(parser)
Adds command-line arguments.
- Parameters:
parser (argparse.ArgumentParser) – Argument parser to which arguments will be added.
- Returns:
None
- deeprm.qc.inspect_block.main(args)
Main function to inspect block files.
- Parameters:
args (argparse.Namespace) – Command-line arguments.
- Returns:
None
- deeprm.qc.inspect_block.seq_to_onehot(seq)
Converts a nucleotide sequence to a one-hot encoded matrix.
- Parameters:
seq (str) – Nucleotide sequence (A, C, G, T/U).
- Returns:
One-hot encoded matrix of the sequence.
- Return type:
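A one-hot encoding of a nucleotide sequence can be sketched as follows. The channel order (A, C, G, T/U), the all-zero row for unknown characters, and the use of nested lists instead of an array are assumptions made for this example; `seq_to_onehot_sketch` is a hypothetical name.

```python
# T and U share a channel so DNA and RNA sequences encode identically.
_BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def seq_to_onehot_sketch(seq):
    """Return one row per base, with a single 1 in the column of that base."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = _BASE_INDEX.get(base)
        if idx is not None:  # unknown bases such as N stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


print(seq_to_onehot_sketch("ACGU"))
```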
- deeprm.qc.inspect_block.motif_cdf(block_df_dict, color_dict, output)
Calculate and plot the cumulative distribution function (CDF) of 5-mer motifs in the blocks.
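One way to compute such a CDF is to rank motifs by frequency and accumulate their share of the total. The sketch below assumes that interpretation, which the documentation does not spell out; `motif_cdf_sketch` is a hypothetical helper that takes a flat list of 5-mers rather than the `block_df_dict` used by the module.

```python
from collections import Counter
from itertools import accumulate


def motif_cdf_sketch(motifs):
    """Return (motif, cumulative_fraction) pairs, most frequent motif first."""
    ranked = Counter(motifs).most_common()
    total = sum(count for _, count in ranked)
    running = accumulate(count for _, count in ranked)
    return [(motif, cum / total) for (motif, _), cum in zip(ranked, running)]


print(motif_cdf_sketch(["AGACU", "AGACU", "CGACA", "UGAUC"]))
```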
- deeprm.qc.inspect_block.motif_composition(block_df_dict, output)
Plot the ratio of nucleotides at each position. Each nucleotide is represented as a box whose height is the ratio of that nucleotide.
- deeprm.qc.inspect_block.nucleotide_composition(block_df_dict, output)
Plot the ratio of nucleotides in each block as a pie chart.
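The quantities behind the two composition plots above can be sketched without any plotting. The helpers below are hypothetical and assume equal-length RNA sequences over the alphabet A, C, G, U; the module itself works on DataFrames in `block_df_dict`.

```python
from collections import Counter

ALPHABET = "ACGU"  # assumed alphabet; the real module may use T instead of U


def _ratios(counts):
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in ALPHABET}


def overall_composition(seqs):
    """Nucleotide ratios over all bases, as a pie chart would show them."""
    return _ratios(Counter(b for seq in seqs for b in seq if b in ALPHABET))


def positional_composition(seqs):
    """Nucleotide ratios per position, one dict per column of the sequences."""
    return [
        _ratios(Counter(b for b in column if b in ALPHABET))
        for column in zip(*seqs)
    ]


print(overall_composition(["AC", "AG"]))
print(positional_composition(["AC", "AG"]))
```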
- deeprm.qc.inspect_block.bq_plot(block_df_dict, color_dict, output, sample=10000, comment='')
Plot the distribution of base quality, showing the position-wise mean with a 95% confidence interval (CI95).
- Parameters:
block_df_dict (dict) – Dictionary of DataFrames, each containing block data.
color_dict (dict) – Dictionary mapping block names to colors for plotting.
output (str) – Output directory to save the base quality plot and data.
sample (int) – Number of samples to use for plotting. If None, use all data.
comment (str) – Comment to append to the output file name.
- Returns:
None
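The position-wise mean with CI95 can be sketched as below. This example assumes a normal-approximation interval (mean ± 1.96 standard errors); the documentation does not say how the module computes its interval, and `positionwise_mean_ci95` is a hypothetical helper that takes rows of per-position base qualities.

```python
import math
import statistics


def positionwise_mean_ci95(rows):
    """Return (mean, lower, upper) for each position across equal-length rows."""
    result = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        if len(column) > 1:
            # Normal approximation: 1.96 standard errors on each side.
            half = 1.96 * statistics.stdev(column) / math.sqrt(len(column))
        else:
            half = 0.0
        result.append((mean, mean - half, mean + half))
    return result


# Three blocks, two positions each (hypothetical base qualities).
print(positionwise_mean_ci95([[10, 20], [12, 20], [14, 20]]))
```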
- deeprm.qc.inspect_block.block_score_distribution(block_df_dict, color_dict, output)
Plot the distribution of block scores.
- deeprm.qc.inspect_block.plot_violin(block_df_dict, color_dict, cb_len, output)
Plot the distribution of base quality as a violin plot.
- deeprm.qc.inspect_block.plot_motif(perfect_block_df_dict, color_dict, args, motif_list=['AGACU', 'CGACA', 'UGAUC', 'GAAGC', 'UCAAG'])
Plot the distribution of motifs in the perfect blocks.
- Parameters:
perfect_block_df_dict (dict) – Dictionary of DataFrames, each containing perfect block data.
color_dict (dict) – Dictionary mapping block names to colors for plotting.
args – Command-line arguments containing output directory and context block length.
motif_list (list) – List of motifs to plot. Default is a predefined list of motifs.
- Returns:
None
qc.inspect_run
DeepRM QC Module: Inspect Basecalled Run
Open a BAM file, collect per-read statistics, and plot:
1. Read length distribution
2. Quality score distribution
- deeprm.qc.inspect_run.add_arguments(parser)
Adds command-line arguments.
- Parameters:
parser (argparse.ArgumentParser) – Argument parser to which arguments will be added.
- Returns:
None
- deeprm.qc.inspect_run.main(args)
Main function to run the script. It reads a BAM file, collects statistics on read lengths, mean quality scores, and poly(A) lengths, and generates plots for these statistics.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
None
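The per-read statistics collected here can be illustrated with a small sketch. The documentation does not state how the mean quality score is defined, so the hypothetical `read_stats` helper below returns both common definitions: the arithmetic mean of the Phred scores, and the Phred score of the mean error probability. The input is assumed to be a non-empty list of per-base Phred scores.

```python
import math


def read_stats(qualities):
    """Return (read_length, arithmetic_mean_q, probability_based_mean_q)."""
    n = len(qualities)
    arithmetic = sum(qualities) / n
    # Convert each Phred score to an error probability, average, convert back.
    mean_error = sum(10 ** (-q / 10) for q in qualities) / n
    probability_based = -10 * math.log10(mean_error)
    return n, arithmetic, probability_based


print(read_stats([10, 30]))
```

The probability-based mean is never higher than the arithmetic mean, because low-quality bases dominate the average error probability.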
- deeprm.qc.inspect_run.plot_read_len_oligo(read_len_arr, mean_qual_arr, bq_thres, out_path, bb_length)
Plot read length distribution for oligo data.
- Parameters:
read_len_arr (numpy.ndarray) – Array of read lengths.
mean_qual_arr (numpy.ndarray) – Array of mean quality scores.
bq_thres (int) – Base quality threshold.
out_path (str) – Output directory path.
bb_length (int) – Length of the barcode.
- Returns:
None
- deeprm.qc.inspect_run.plot_read_len_mrna(read_len_arr, mean_qual_arr, bq_thres, out_path)
Plot read length distribution for mRNA data.
- Parameters:
read_len_arr (numpy.ndarray) – Array of read lengths.
mean_qual_arr (numpy.ndarray) – Array of mean quality scores.
bq_thres (int) – Base quality threshold.
out_path (str) – Output directory path.
- Returns:
None
- deeprm.qc.inspect_run.plot_polya_len(read_len_arr, mean_qual_arr, bq_thres, out_path)
Plot poly(A) length distribution.
- Parameters:
read_len_arr (numpy.ndarray) – Array of read lengths.
mean_qual_arr (numpy.ndarray) – Array of mean quality scores.
bq_thres (int) – Base quality threshold.
out_path (str) – Output directory path.
- Returns:
None
- deeprm.qc.inspect_run.plot_qual(mean_qual_arr, out_path, bq_thres=7, max_bq=30)
Plot mean quality score distribution.
- Parameters:
mean_qual_arr (numpy.ndarray) – Array of mean quality scores.
out_path (str) – Output directory path.
bq_thres (int) – Base quality threshold.
max_bq (int) – Maximum base quality score for plotting.
- Returns:
None
- deeprm.qc.inspect_run.read_bam_worker(args, pid, collect_dict)
Worker function to read BAM file and collect statistics.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
pid (int) – Process ID.
collect_dict (dict) – Shared dictionary to collect results.
- Returns:
None