The $1k Genome

Over 200 scientists collaborated on the Human Genome Project to sequence all of the approximately 3 billion base pairs in our DNA. The project cost $3 billion and took eleven years to complete, and it signaled a new era focused on exploring the relationship between genes and biology. To properly ask and answer questions about how genes affect our health, we need access to the genomes of many thousands or even millions of people. Because of this, the usefulness of sequencing technology has always been tied to its cost. A long-term goal has been to drive the price of sequencing down to $1,000 per genome, on the view that this price point would make whole-genome sequencing routine practice.

The Human Genome Project was accomplished using a method called Sanger sequencing. This technique is labor intensive and slow because it requires making many copies of the DNA, which are then read using fluorescently labeled nucleotides. Almost all of the sequencing instruments used for the Human Genome Project came from a single company, Applied Biosystems in California. As the effectively sole provider, it had little incentive to iterate on or improve its technology.

After the Human Genome Project wrapped up, the US National Human Genome Research Institute (NHGRI) began funding the Advanced Sequencing Technology grants. These awards came to be known as the $1,000 and $100,000 genome programs and focused both on creating new sequencing methods and on making these technologies suitable for routine commercial use. The grants helped fund the development of pyrosequencing, where many sequencing reactions are run in parallel on a solid surface, and nanopore/nanograph sequencing, where bases are read as the DNA is threaded through a pore (eliminating the need to make many copies of the DNA beforehand).

The NHGRI grants were important for several reasons. First, they prevented sequencing research from stagnating after the Human Genome Project ended in 2003. Second, they created competition in the sequencing space by funding many different projects and nurturing a large community of scientists with different ideas. Finally, they made sequencing projects more appealing to private investors. These grants alone could not have paid for all the work done toward the $1,000 genome, but the subsequent funding that supported it would likely not have materialized without them.

Right now, Illumina is the market leader in sequencing (thanks in part to its acquisition of several companies that had been funded by Advanced Sequencing Technology grants). Each HiSeq X instrument in its HiSeq X Ten system can produce 1.8 terabases of data, enough for 16 human genomes at 30x coverage (each base read about 30 times on average), in a three-day run, and the system heralds the dawn of “full coverage human genomes for less than $1000.”*
*This is amazing, but it certainly doesn’t mark the end of the race toward affordable sequencing: the HiSeq X Ten is sold only as a set of at least 10 HiSeq X systems, costing at least $10 million.
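As a back-of-the-envelope check, the throughput and coverage figures above can be related with a few lines of arithmetic. This is a minimal sketch, assuming a genome size of about 3 billion bases (the figure used earlier in this article) and defining coverage simply as total bases sequenced divided by genome size; the function name `bases_needed` is hypothetical and only for illustration.

```python
# Rough check: does one 1.8 Tb run cover 16 human genomes at 30x?
# Assumption: ~3 billion bases per genome; coverage = total bases / genome size.

GENOME_SIZE = 3_000_000_000   # bases, approximate (assumption)
COVERAGE = 30                 # 30x depth, as quoted above
GENOMES_PER_RUN = 16
RUN_OUTPUT_BASES = 1.8e12     # 1.8 terabases per run, as quoted above


def bases_needed(genomes: int, coverage: float, genome_size: int = GENOME_SIZE) -> float:
    """Total bases required to sequence `genomes` genomes at `coverage`x depth."""
    return genomes * coverage * genome_size


needed = bases_needed(GENOMES_PER_RUN, COVERAGE)
print(f"Bases needed: {needed / 1e12:.2f} Tb")          # 1.44 Tb
print(f"Fits in one run: {needed <= RUN_OUTPUT_BASES}")  # True

# Average depth per genome if the whole run output were split evenly
raw_depth = RUN_OUTPUT_BASES / (GENOMES_PER_RUN * GENOME_SIZE)
print(f"Raw depth per genome: {raw_depth:.1f}x")         # 37.5x
```

Under these assumptions, 16 genomes at 30x need about 1.44 Tb, so the quoted 1.8 Tb per run is consistent with the "16 genomes at 30x" claim, with some headroom.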

Now that cost is no longer the limiting bottleneck for genomics research, the question becomes how we deal with all the data we suddenly have at our fingertips. Mapping abstract data to real-life biological processes is an entirely new kind of challenge that cannot be solved simply by generating more genome sequences. The skills cultivated to design and run benchwork experiments don’t carry over to this kind of data analysis, and we will need more biologists who are also programmers and statisticians. In the future, it will be important to tease out the relationships between genes and biology that inform impactful solutions to healthcare problems, and this will only be possible with scientists who understand both the computational tools and the biological problems.