This function performs cancer detection from indel information in a BAM file, evaluated at a predefined list of candidate mutations. It processes the mutation data and uses the supplied models to make the cancer call.

dreams_cc_indels(
  mutations_df,
  bam_file_path,
  reference_path,
  model,
  model_indels,
  alpha = 0.05,
  calculate_confidence_intervals = FALSE,
  use_turboem = TRUE
)

Arguments

mutations_df

A data frame containing the candidate mutations to be analyzed.

bam_file_path

Path to the BAM file containing sequencing data.

reference_path

Path to the reference genome file, typically in FASTA format.

model

The model to be used for cancer detection.

model_indels

The model to be used for calling indel mutations.

alpha

Significance level for the statistical test, default is 0.05.

calculate_confidence_intervals

Logical flag indicating whether to calculate confidence intervals, default is FALSE.

use_turboem

Logical flag indicating whether to use the turboEM algorithm, default is TRUE.
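The alpha argument sets the level at which cancer_detected is decided from the returned test statistic. As a minimal sketch, assuming Q_val is the usual likelihood-ratio statistic 2 * (ll_A - ll_0) referred to a chi-squared distribution with df degrees of freedom (this page does not state the exact formula), the decision step looks like this in base R. The helper lrt_decision() and all input values are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: converts the null and
# alternative log-likelihoods into a test decision at level alpha.
# Assumes Q_val is the standard likelihood-ratio statistic.
lrt_decision <- function(ll_0, ll_A, df, alpha = 0.05) {
  # Likelihood-ratio statistic: twice the log-likelihood gain under H_A
  Q_val <- 2 * (ll_A - ll_0)
  # Upper-tail chi-squared probability with df degrees of freedom
  p_val <- pchisq(Q_val, df = df, lower.tail = FALSE)
  list(Q_val = Q_val, df = df, p_val = p_val, cancer_detected = p_val < alpha)
}

# Made-up log-likelihoods and df, for illustration only
res <- lrt_decision(ll_0 = -1050.25, ll_A = -1042.75, df = 2)
res$Q_val            # 15
res$cancer_detected  # TRUE
```

With df = 2 the chi-squared upper tail is exp(-Q/2), so Q_val = 15 gives p_val = exp(-7.5), roughly 5.5e-4, which is below the default alpha = 0.05.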

Value

A list() with:

  • cancer_info A data.frame() with results for cancer calling across all mutations:

    tf_est

    The estimated tumor fraction (allele fraction).

    tf_min, tf_max

    The confidence interval of tf_est.

    r_est, est_mutations_present

    The estimated fraction/number of candidate mutations present in the sample.

    r_min, r_max

    The confidence interval of r_est.

    mutations_tested

    Number of candidate mutations tested.

    total_coverage, total_count

    Total coverage and count across all mutations (only reads with the reference or alternative allele(s) are included).

    EM_converged

    Whether the EM algorithm converged.

    EM_steps, fpeval, objfeval

    Number of EM steps, fixed-point function evaluations and objective-function evaluations performed by the EM algorithm.

    ll_0, ll_A

    The value of the log-likelihood function under the null (tf=0) and alternative (tf>0) hypotheses.

    Q_val, df, p_val

    The chi-squared test statistic, degrees of freedom, and p-value of the statistical test.

    cancer_detected

    Whether cancer was detected at the supplied alpha level.

  • mutation_info A data.frame() with information about the individual mutations:

    chr, genomic_pos

    The chromosome and genomic position of the mutation.

    ref, alt

    The reference and alternative alleles.

    P_mut_is_present

    The estimated probability that the mutation is present in the sample.

    exp_count

    The expected count of the alternative allele under the error (null) model.

    count

    The count of the alternative allele.

    coverage

    The coverage used by the model (only reads with the reference or alternative allele).

    obs_freq

    The observed frequency of the alternative allele.
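The per-mutation columns can be related with simple arithmetic. As a minimal sketch on a made-up table that mimics a few mutation_info columns (the values are invented and the column subset is chosen only for illustration), obs_freq is the alternative-allele count divided by the coverage, and summing both columns gives a naive pooled allele fraction, assumed here to correspond to total_count / total_coverage in cancer_info. This pooled ratio is only a descriptive summary; it is not tf_est, which is estimated by the EM algorithm under the model.

```r
# Made-up rows mimicking a subset of the mutation_info columns
mutation_info <- data.frame(
  chr = c("chr1", "chr2", "chr7"),
  genomic_pos = c(1000L, 2000L, 3000L),
  count = c(0L, 2L, 1L),
  coverage = c(400L, 500L, 250L)
)

# Observed frequency of the alternative allele at each site
mutation_info$obs_freq <- mutation_info$count / mutation_info$coverage

# Naive pooled allele fraction: total alternative count over total coverage
# (assumed analogue of total_count / total_coverage in cancer_info)
pooled_af <- sum(mutation_info$count) / sum(mutation_info$coverage)

mutation_info$obs_freq  # 0.000 0.004 0.004
pooled_af               # 3 / 1150, about 0.0026
```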