This function evaluates the presence of cancer in a sample by combining the cancerous signal across a catalogue of candidate mutations.

call_cancer_indels(
  mutations_df,
  read_positions_df,
  read_positions_df_indels,
  model,
  model_indels,
  beta,
  beta_indels,
  alpha = 0.05,
  calculate_confidence_intervals = FALSE,
  use_turboem = TRUE
)

Arguments

mutations_df

A data.frame() with candidate mutations (SNVs): chromosome, position, reference allele and alternative allele.

read_positions_df

A data.frame() with read-positions. See get_read_positions_from_BAM().

read_positions_df_indels

A data.frame() with read-positions for indels. See get_read_positions_from_BAM_indels().

model

A DREAMS model. See train_dreams_model().

model_indels

A DREAMS model for indels. See train_dreams_model_indels().

beta

Down-sampling parameter used to correct the error rates from the DREAMS model.

beta_indels

Down-sampling parameter used to correct the error rates from the DREAMS model for indels.

alpha

Alpha-level used for testing and confidence intervals. Default is 0.05.

calculate_confidence_intervals

Logical. Should confidence intervals be calculated? Default is FALSE.

use_turboem

Logical. Should turboEM::turboem() be used for the EM algorithm? Default is TRUE.
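
As an illustration of the arguments above, the following is a minimal sketch of how a call might be assembled. The column names of mutations_df (chr, genomic_pos, ref, alt) are an assumption taken from the columns reported in mutation_info, the positions and alleles are toy values, and run_cancer_call() is a hypothetical wrapper that is only defined here, not executed. It also assumes the function is exported by a package named dreams.

```r
# Toy catalogue of candidate SNVs. Column names are assumed to match the
# chr/genomic_pos/ref/alt columns reported in the mutation_info output.
mutations_df <- data.frame(
  chr = c("chr1", "chr2"),
  genomic_pos = c(1000L, 2000L),
  ref = c("A", "G"),
  alt = c("T", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: passes already prepared inputs to call_cancer_indels()
# using the argument names from the usage section. Defining it does not
# require the package; calling it does.
run_cancer_call <- function(mutations_df,
                            read_positions_df,
                            read_positions_df_indels,
                            model,
                            model_indels,
                            beta,
                            beta_indels) {
  dreams::call_cancer_indels(
    mutations_df = mutations_df,
    read_positions_df = read_positions_df,
    read_positions_df_indels = read_positions_df_indels,
    model = model,
    model_indels = model_indels,
    beta = beta,
    beta_indels = beta_indels,
    alpha = 0.05,
    calculate_confidence_intervals = TRUE,
    use_turboem = TRUE
  )
}
```

Setting calculate_confidence_intervals = TRUE in the sketch is a choice made for illustration; the documented default is FALSE.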

Value

A list() with:

  • cancer_info A data.frame() with results for cancer calling across all mutations:

    tf_est

    The estimated tumor fraction (allele fraction).

    tf_min, tf_max

    The confidence interval of tf_est.

    r_est, est_mutations_present

    The estimated fraction/number of candidate mutations present in the sample.

    r_min, r_max

    The confidence interval of r_est.

    mutations_tested

    Number of candidate mutations tested.

    total_coverage, total_count

    Total coverage and count across all mutations (only reference and alternative allele(s)).

    EM_converged

    Whether the EM algorithm converged.

    EM_steps, fpeval, objfeval

    Number of steps and function evaluations by the EM algorithm.

    ll_0, ll_A

    The value of the log-likelihood function under the null (tf = 0) and alternative (tf > 0) hypotheses.

    Q_val, df, p_val

    The chi-squared test statistic, degrees of freedom and p-value of the statistical test.

    cancer_detected

    Whether cancer was detected at the supplied alpha level.

  • mutation_info A data.frame() with information about the individual mutations:

    chr, genomic_pos

    The genomic position of the mutation.

    ref, alt

    The reference and alternative allele.

    P_mut_is_present

    The estimated probability the mutation is present in the sample.

    exp_count

    The expected count of the alternative allele under the error (null) model.

    count

    The count of the alternative allele.

    coverage

    The coverage used by the model (only reads with the reference or alternative allele).

    obs_freq

    The observed frequency of the alternative allele.
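
To show how some of the returned fields relate to each other, here is a small sketch using made-up numbers. It assumes the test is a textbook likelihood-ratio test, so that Q_val = -2 * (ll_0 - ll_A) with a chi-squared reference distribution, and that obs_freq is count divided by coverage. Neither formula is confirmed by this page, and the value of df is a placeholder; in practice use the df reported in cancer_info.

```r
# Hypothetical values standing in for one row of cancer_info.
ll_0 <- -1250.4   # log-likelihood under the null (tf = 0)
ll_A <- -1243.1   # log-likelihood under the alternative (tf > 0)
df <- 2           # placeholder degrees of freedom (assumption)
alpha <- 0.05     # documented default alpha level

# Textbook likelihood-ratio statistic and its upper-tail chi-squared p-value.
Q_val <- -2 * (ll_0 - ll_A)
p_val <- stats::pchisq(Q_val, df = df, lower.tail = FALSE)

# Cancer is called when the p-value falls below the alpha level (assumed rule).
cancer_detected <- p_val < alpha

# Hypothetical values standing in for two rows of mutation_info.
mutation_info <- data.frame(count = c(3, 0), coverage = c(1500, 1200))

# Observed alternative-allele frequency, assumed to be count / coverage.
mutation_info$obs_freq <- mutation_info$count / mutation_info$coverage
```

Comparing a recomputed p_val against the reported one is a quick sanity check on a result, provided the assumptions above hold for the installed version.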