Extract training data from BAM files

get_training_data(
  bam_paths,
  reference_path,
  bed_include_path = NULL,
  factor = 1,
  common_positions_to_exclude_paths = NULL,
  positions_to_exclude_paths = NULL,
  mm_rate_max = 1,
  verbose = FALSE
)

Arguments

bam_paths

Vector of strings. Paths to .bam files to extract training data from.

reference_path

String. Path to reference genome fasta file.

bed_include_path

String. Path to bed-file with regions to include. Default is NULL.

factor

Number between 0 and 1. Ratio between negative and positive data. Default is 1.

common_positions_to_exclude_paths

Vector of strings. Paths to files with positions to exclude from all samples. Default is NULL.

positions_to_exclude_paths

Vector of strings. Paths to files with positions to exclude from training. The vector must have the same length as the number of samples. Default is NULL.

mm_rate_max

Number between 0 and 1. Maximum mismatch rate in position. Default is 1.

verbose

Logical. Whether to print additional output while the function runs. Default is FALSE.
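
As a rough illustration of how the arguments above fit together, the following sketch builds the inputs and calls the function only when everything it needs is present. The file names are hypothetical placeholders, the `dreams::` package prefix is an assumption, and the value chosen for `mm_rate_max` is purely illustrative.

```r
# Hypothetical input paths; replace them with real files.
bam_paths <- c("sample1.bam", "sample2.bam")
reference_path <- "reference.fa"
bed_include_path <- "regions.bed"

# One exclusion file per sample, in the same order as bam_paths.
positions_to_exclude_paths <- c("sample1_exclude.txt", "sample2_exclude.txt")

# The per-sample exclusion vector must match the number of samples.
stopifnot(length(positions_to_exclude_paths) == length(bam_paths))

inputs <- c(
  bam_paths,
  reference_path,
  bed_include_path,
  positions_to_exclude_paths
)

# Only run when the package (name assumed to be "dreams") is installed
# and every input file exists.
if (requireNamespace("dreams", quietly = TRUE) && all(file.exists(inputs))) {
  training_data <- dreams::get_training_data(
    bam_paths = bam_paths,
    reference_path = reference_path,
    bed_include_path = bed_include_path,
    factor = 1,
    positions_to_exclude_paths = positions_to_exclude_paths,
    mm_rate_max = 0.05, # illustrative threshold, not a recommendation
    verbose = TRUE
  )
}
```

Checking the length of `positions_to_exclude_paths` before the call catches a mismatch between samples and exclusion files early.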

Value

A list containing two elements:

  • data: A tbl_df with dimensions 2 x 22.

  • info: A data.frame with dimensions 1 x 4.
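
A minimal sketch of how the returned list might be inspected is shown here. The helper `describe_training_data()` is hypothetical and not part of the package, and the placeholder list only mimics the two element names documented above; it does not reproduce the real columns.

```r
# Hypothetical helper: check that the list has the documented elements
# and report the dimensions of each.
describe_training_data <- function(training_data) {
  stopifnot(
    is.list(training_data),
    all(c("data", "info") %in% names(training_data))
  )
  list(
    data_dim = dim(training_data$data),
    info_dim = dim(training_data$info)
  )
}

# Placeholder standing in for a real return value of get_training_data().
placeholder <- list(data = data.frame(), info = data.frame())
dims <- describe_training_data(placeholder)
```

With a real result, the same helper would report the actual dimensions of `data` and `info`.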

See also

train_dreams_model() Function for training a model.