Olga Dudchenko is a post-doc in the Department of Molecular and Human Genetics in the Erez Aiden lab at Baylor College of Medicine.
What is the biggest question facing your field today?
To come up with a cost-effective way to generate end-to-end genome assemblies. For the human genome, this would entail constructing 46 strings of genomic sequence representing the 23 pairs of chromosomes, end-to-end. To this day, more than 10 years after the conclusion of the Human Genome Project, we are still relatively far from achieving this objective. While several exemplary human genomes have been brought to a high degree of completeness at great expense, a more typical high-throughput genome project will produce hundreds of thousands of short fragments, known as “contigs,” whose order and orientation relative to one another are unknown, and whose chromosome of origin is a mystery.
Why is it significant?
End-to-end assemblies would allow for much more meaningful and robust comparisons between genomes than fragmented assemblies do. An example of the biological insights that become possible with better assemblies can be found in our recent paper (Dudchenko et al., 2017). In this work we put together several mosquito genome assemblies, including that of Aedes aegypti, a mosquito that can spread dengue fever, chikungunya and the Zika virus. We show that genomic rearrangements during mosquito evolution, unlike those in mammals, occur within, rather than between, chromosomes, a result that could not have been obtained without chromosome-length assemblies. Better genome assemblies would also allow for better recognition of functionally important elements such as genes.
Where will the answer likely come from?
A combination of long-read technologies such as SMRT sequencing and in situ 3D mapping methods such as Hi-C. The former show great potential in resolving complex regions that were impossible to put together using other sequencing methods, but require DNA to be extracted from the cell for analysis. As such, long-read sequencing is limited by DNA fragmentation during the extraction procedure. In situ Hi-C captures information while the DNA is still in the intact nucleus, thus providing the signal needed to link the assembled contigs into whole chromosomes.
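To make the idea of a linking signal concrete, here is a toy sketch in Python of how contact counts between contigs could, in principle, suggest their order. It is not the method used in the paper cited above. It assumes a single chromosome, ignores contig orientation, and relies on the general property that Hi-C contact frequency tends to fall off with distance along the chromosome. The function name `order_contigs`, the contig names and the contact counts are all hypothetical and made up for illustration.

```python
from typing import Dict, FrozenSet, List


def order_contigs(contacts: Dict[FrozenSet[str], int]) -> List[str]:
    """Greedily chain contigs so that strongly contacting contigs sit next to each other.

    `contacts` maps an unordered pair of contig names to a contact count.
    Toy heuristic only: one chromosome, no orientation, no error handling.
    """

    def score(a: str, b: str) -> int:
        # Pairs that were never observed together count as zero contacts.
        return contacts.get(frozenset((a, b)), 0)

    contigs = sorted({name for pair in contacts for name in pair})

    # Seed the chain with the pair of contigs that has the most contacts.
    seed = max(contacts, key=contacts.get)
    chain = sorted(seed)
    remaining = [c for c in contigs if c not in chain]

    while remaining:
        left, right = chain[0], chain[-1]
        # Pick the unplaced contig with the strongest contact to either end of the chain.
        best = max(remaining, key=lambda c: max(score(c, left), score(c, right)))
        if score(best, left) > score(best, right):
            chain.insert(0, best)
        else:
            chain.append(best)
        remaining.remove(best)

    return chain


# Made-up counts for four contigs whose true order is ctg1, ctg2, ctg3, ctg4:
# neighbouring contigs share many contacts, distant ones share few.
toy_contacts = {
    frozenset(("ctg1", "ctg2")): 90,
    frozenset(("ctg2", "ctg3")): 85,
    frozenset(("ctg3", "ctg4")): 80,
    frozenset(("ctg1", "ctg3")): 30,
    frozenset(("ctg2", "ctg4")): 28,
    frozenset(("ctg1", "ctg4")): 5,
}

print(order_contigs(toy_contacts))
```

On this toy input the greedy chain recovers the order ctg1, ctg2, ctg3, ctg4. Real data would also require assigning contigs to different chromosomes, determining their orientation and correcting misassembled contigs, none of which this sketch attempts.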