SYM-17-03

Representing the human genome with synthetic controls

TR Mercer (1,2)

  1. Garvan Institute of Medical Research
  2. St Vincent's Clinical School, Faculty of Medicine, UNSW

DNA has an inherent directionality from 5′ to 3′. By reversing the orientation to the 3′ to 5′ direction, we generate a synthetic mirror (or chiral) sequence that is distinct from the original sequence, yet retains the identical nucleotide composition and alignment attributes. This approach can be used to design synthetic DNA spike-in standards that behave as isomers to, and thereby faithfully emulate, any feature of the human genome. DNA standards are added to a genomic DNA sample prior to sequencing, and the resultant reads are aligned to a combined index comprising both the human reference genome and a mirrored synthetic chromosome, to which reads from the DNA standards exclusively align. Spike-in-derived reads are thereby partitioned from sample reads, enabling their use as internal quantitative and qualitative controls that do not interfere with the accompanying sample. We have demonstrated the use of DNA standards to represent a range of genome features, including common variation, germline and somatic mutations, structural rearrangements, repeat DNA, and immune clonotypes. In addition, we have developed spliced synthetic RNA standards, as well as standards that represent fusion genes that recur in cancer. By titrating these standards at different concentrations into a mixture, we can establish internal and independent reference ladders and emulate quantitative features of genome biology, such as allele frequency, copy number variation, gene expression and alternative splicing. We have validated the design and performance of DNA standards by comparison to examples in the NA12878 reference genome, and demonstrated their use in the detection and quantification of variants in whole genome and targeted sequencing. We similarly show the utility of RNA standards to assess gene discovery, assembly, and expression profiling with RNA-Seq.
Together, these results illustrate how spike-in standards constitute a simple and effective sample-specific method to measure sensitivity and precision, and to assess the diagnostic performance of sequencing tests.
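As an illustration of the mirroring idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that mirroring means reversing a sequence without complementing it, and that aligned reads can be separated by the name of the contig they map to. The contig name `chrMirror`, the example sequence, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical name for the mirrored synthetic chromosome in the combined index.
MIRROR_CONTIG = "chrMirror"


def mirror(seq: str) -> str:
    """Reverse a sequence without complementing it (5'->3' read as 3'->5')."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Standard reverse complement, shown only for contrast with mirror()."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def partition_reads(alignments):
    """Split (read_id, contig) pairs into sample reads and spike-in reads.

    Assumes each read has already been aligned to a combined index and
    that spike-in reads map only to MIRROR_CONTIG.
    """
    sample, spike_in = [], []
    for read_id, contig in alignments:
        (spike_in if contig == MIRROR_CONTIG else sample).append(read_id)
    return sample, spike_in


human = "GATTACAGGCT"          # made-up example sequence
synthetic = mirror(human)

# The mirrored sequence keeps the nucleotide composition of the original...
same_composition = Counter(human) == Counter(synthetic)
# ...but, for this example, differs from both the original and its reverse complement.
is_distinct = synthetic not in (human, reverse_complement(human))

sample_reads, spike_reads = partition_reads(
    [("r1", "chr1"), ("r2", MIRROR_CONTIG), ("r3", "chr7")]
)
```

Because mirroring only reverses the order of bases, properties such as GC content and repeat structure carry over to the synthetic sequence, which is consistent with the abstract's claim that the standards emulate features of the genome. Note that a palindromic input would be identical to its own mirror, so a real design would need to check for such cases.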