This function calls indel variants in a given BAM file from a predefined list of candidate mutations. It processes the mutation list, splits the genomic positions into batches, and calls each mutation using the supplied models.

dreams_vc_indels(
  mutations_df,
  bam_file_path,
  reference_path,
  model,
  model_indels,
  alpha = 0.05,
  use_turboem = TRUE,
  calculate_confidence_intervals = FALSE,
  batch_size = NULL
)

Arguments

mutations_df

A dataframe containing the list of mutations to be analyzed.

bam_file_path

Path to the BAM file containing sequencing data.

reference_path

Path to the reference genome file, typically in FASTA format.

model

The model to be used for calling mutations for SNVs.

model_indels

The model to be used for calling mutations for indels.

alpha

Significance level for statistical testing, default is 0.05.

use_turboem

Logical flag indicating whether to use the turboEM algorithm, default is TRUE.

calculate_confidence_intervals

Logical flag indicating whether to calculate confidence intervals, default is FALSE.

batch_size

Number of positions to process in each batch; if NULL, it's determined based on the data.
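The effect of batch_size can be pictured with a small base R sketch. This is only an illustration of splitting a candidate list into fixed-size batches, not the package's internal code; the example data frame, its column names (borrowed from the output columns documented under Value), and the indel encoding in ref and alt are all assumptions.

```r
# Hypothetical candidate list; the columns and indel encoding are assumed,
# not taken from the package.
mutations_df <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  genomic_pos = c(1050L, 20310L, 512L, 7781L, 90L),
  ref = c("A", "AT", "G", "C", "TTA"),
  alt = c("AT", "A", "GCC", "CA", "T")
)

batch_size <- 2L

# Assign consecutive rows to batches of at most `batch_size` positions.
batch_id <- ceiling(seq_len(nrow(mutations_df)) / batch_size)
batches <- split(mutations_df, batch_id)

# Each element of `batches` could then be processed independently.
batch_rows <- vapply(batches, nrow, integer(1))
```

With five positions and a batch size of two, the last batch holds the single remaining position. Smaller batches generally trade speed for lower peak memory use.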

Value

A data.frame() with information about the individual mutation calls, including:

chr, genomic_pos

The genomic position of the mutation.

ref, alt

The reference and alternative alleles.

EM_converged

Whether the EM algorithm converged.

EM_steps, fpeval, objfeval

Number of steps and function evaluations by the EM algorithm.

tf_est

The estimated tumor fraction (allele fraction).

tf_min, tf_max

The confidence interval of tf_est.

exp_count

The expected count of the alternative allele under the error (null) model.

count

The count of the alternative allele.

coverage

The coverage used by the model (only reads with the reference or alternative allele).

full_coverage

The total coverage of the position (for reference).

obs_freq

The observed frequency of the alternative allele.

ll_0, ll_A

The values of the log-likelihood function under the null (tf=0) and alternative (tf>0) hypotheses.

Q_val, df, p_val

The chi-squared test statistic, degrees of freedom, and p-value of the statistical test.

mutation_detected

Whether the mutation was detected at the supplied alpha level.
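The relationship between ll_0, ll_A, Q_val, p_val and mutation_detected can be sketched as a standard likelihood-ratio test. The snippet below is a minimal illustration in base R, assuming the usual form Q = -2 (ll_0 - ll_A) with one degree of freedom (for the single free parameter tf); the numeric values are made up, and the package's exact computation may differ.

```r
# Made-up log-likelihoods for illustration only.
ll_0 <- -1520.4   # log-likelihood under the null hypothesis (tf = 0)
ll_A <- -1515.1   # log-likelihood under the alternative hypothesis (tf > 0)
alpha <- 0.05     # significance level, matching the function's default

# Likelihood-ratio statistic (assumed form).
Q_val <- -2 * (ll_0 - ll_A)

# One degree of freedom is an assumption: the alternative adds one free parameter.
df <- 1

# Upper-tail probability of the chi-squared distribution.
p_val <- pchisq(Q_val, df = df, lower.tail = FALSE)

# The mutation is called when the p-value does not exceed alpha.
mutation_detected <- p_val <= alpha
```

In this made-up example the alternative model fits noticeably better than the null, so the p-value falls well below 0.05 and the mutation would be reported as detected.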