🚀 Quickstart

  • For demonstration purposes, you can use the example POD5 and BAM files provided in the examples directory of the repository.

  • You can also use your own POD5 and BAM files.

RNA Modification Detection

  • Estimated time: ~1 hour

1️⃣ Prepare data

deeprm call prep -p inference_example.pod5 -b inference_example.bam -o <prep_dir>
  • (Alternative) To use your own POD5 files, basecall them with Dorado and pipe the output directly into deeprm call prep:

    dorado basecaller --reference <ref_fasta> --min-qscore 0 --emit-moves rna004_130bps_sup@v5.0.0 <pod5_dir> \
    | tee >(samtools sort -@ <threads> -O BAM -o <bam_path> - && samtools index -@ <threads> <bam_path>) \
    | deeprm call prep -p <pod5_dir> -b - -o <prep_dir>
    
    • If Dorado fails with an “illegal memory access” error, try adding the --chunksize <chunk_size> option (e.g., --chunksize 12000).
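
The alternative pipeline above calls dorado, samtools, and deeprm in a single command, so a missing tool only shows up once the pipe is already running. A minimal pre-flight sketch is shown here; the helper name missing_tools is hypothetical and not part of DeepRM.

```shell
# Hypothetical helper: print the name of each command that is not on PATH.
# Prints nothing when every command is found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three tools used by the pipeline before starting it.
missing=$(missing_tools deeprm dorado samtools)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing >&2
fi
```

This only checks that the commands exist; it does not verify their versions or GPU availability.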

2️⃣ Run inference

deeprm call run -b inference_example.bam -i <prep_dir> -o <pred_dir> -s 1000
  • Adjust the -s (batch size) parameter according to your GPU memory capacity (default: 10000).

  • Expected output files:

    • Site-level detection result file (.bed)

    • Molecule-level detection result file (.npz)
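
To confirm that inference produced both result files, something like the following sketch may help. It assumes only the file extensions listed above; the exact file names, the BED column layout, and the PRED_DIR variable are not specified here and are placeholders. The .npz file is a binary NumPy archive, so it is listed but not printed (it can be opened in Python with numpy.load).

```shell
# PRED_DIR is a placeholder; set it to the directory passed to -o.
PRED_DIR="${PRED_DIR:-pred_dir}"

# List result files by extension (exact file names are an unknown here).
list_results() {
    find "$1" -type f \( -name '*.bed' -o -name '*.npz' \) | sort
}

# Show the first few records of each BED file; BED is tab-separated text.
preview_bed() {
    find "$1" -type f -name '*.bed' | sort | while IFS= read -r bed; do
        echo "== $bed"
        head -n 5 "$bed"
    done
}

# Only inspect the directory if it exists.
if [ -d "$PRED_DIR" ]; then
    list_results "$PRED_DIR"
    preview_bed "$PRED_DIR"
fi
```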

Model Training

  • Estimated time: ~1 hour

1️⃣ Prepare unmodified & modified training data

deeprm train prep -p training_a_example.pod5 -b training_a_example.bam -o <prep_dir>/a
deeprm train prep -p training_m6a_example.pod5 -b training_m6a_example.bam -o <prep_dir>/m6a

2️⃣ Compile training data

deeprm train compile -n <prep_dir>/a/data -p <prep_dir>/m6a/data -o <prep_dir>/compiled

3️⃣ Run training

deeprm train run -d <prep_dir>/compiled -o <output_dir> --batch 64
  • Adjust the --batch parameter according to your GPU memory capacity (default: 1024).

  • Expected output file:

    • Trained DeepRM model file (.pt)
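
Finding a batch size that fits in GPU memory is usually trial and error. The sketch below automates that by halving the value until the command succeeds. It rests on two assumptions that are not confirmed by this guide: that the command exits with a non-zero status when it runs out of memory, and that any failure is memory-related (other errors would also trigger a retry). The function name run_with_halving is hypothetical.

```shell
# Hypothetical helper: run "$@" with the batch size appended as the last
# argument, halving the batch size after each failure.
# Usage: run_with_halving <start> <min> <command> [args...]
run_with_halving() {
    batch=$1
    min=$2
    shift 2
    while [ "$batch" -ge "$min" ]; do
        echo "Trying batch size $batch" >&2
        if "$@" "$batch"; then
            # Report the batch size that worked on stdout.
            echo "$batch"
            return 0
        fi
        batch=$((batch / 2))
    done
    # No batch size at or above the minimum succeeded.
    return 1
}

# Example (replace the placeholders before running):
#   run_with_halving 1024 16 deeprm train run -d <prep_dir>/compiled -o <output_dir> --batch
```

Because the batch size is appended last, the flag that takes it (--batch for training, -s for inference) must be the final argument of the command. Any output the command writes to stdout appears before the reported batch size.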