of assembling the genome worked week upon week to put the gene fragments in order, but the complete sequence was still missing.
By the winter of 2000, both projects neared completion—but the communications between the groups, strained at their best moments, had fallen apart. Venter accused the Genome Project of “a vendetta against Celera.” Lander wrote to the editors of Science protesting Celera’s strategy of selling the sequence database to subscribers and restricting public access to parts of it, while trying to publish yet other selected parts of the data in a journal; Celera was trying to “have its genome and sell it too.” “In the history of scientific writing since the 1600s,” Lander complained, “the disclosure of data has been linked to the announcement of a discovery. That’s the basis of modern science. In pre-modern times, you could say: ‘I’ve found an answer, or I’ve made lead into gold, proclaim the discovery, and then refuse to show the results.’ But the whole point of professional scientific journals is disclosure and credit.” Worse, Collins and Lander accused Celera of using the Human Genome Project’s published sequence as a “scaffold” to assemble its own genome—molecular plagiarism (Venter retorted that the idea was ludicrous; Celera had deciphered all the other genomes with no help from such “scaffolds”). Left to its own devices, Lander announced, Celera’s data was nothing more than a “genome tossed salad.”
As Celera edged toward the final draft of its paper, scientists made frantic appeals for the company to deposit its results in the publicly available repository of sequences, named GenBank. Ultimately, Venter agreed to provide free access to academic researchers—but with several important constraints. Dissatisfied with the compromise, Sulston, Lander, and Collins chose to send their paper to a rival journal, Nature.
On February 15 and 16, 2001, the Human Genome Project consortium and Celera published their papers in Nature and Science, respectively. Both were enormous studies, nearly spanning the lengths of the two journals (at sixty-six thousand words, the Human Genome Project paper was the largest study published in Nature’s history). Every great scientific paper is a conversation with its own history—and the opening paragraphs of the Nature paper were written with full cognizance of its moment of reckoning:
“The rediscovery of Mendel’s laws of heredity in the opening weeks of the 20th century sparked a scientific quest to understand the nature and content of genetic information that has propelled biology for the last hundred years. The scientific progress made [since that time] falls naturally into four main phases, corresponding roughly to the four quarters of the century.”
“The first established the cellular basis of heredity: the chromosomes. The second defined the molecular basis of heredity: the DNA double helix. The third unlocked the informational basis of heredity [i.e., the genetic code], with the discovery of the biological mechanism by which cells read the information contained in genes, and with the invention of the recombinant DNA technologies of cloning and sequencing by which scientists can do the same.”
The sequence of the human genome, the project asserted, marked the starting point of the “fourth phase” of genetics. This was the era of “genomics”—the assessment of the entire genomes of organisms, including humans. There is an old conundrum in philosophy that asks if an intelligent machine can ever decipher its own instruction manual. For humans, the manual was now complete. Deciphering it, reading it, and understanding it would be quite another matter.
* * *
I. Stretches of DNA associated with a gene, called promoters, can be likened to “on” switches for that gene. These sequences encode information about when and where to activate a gene (thus hemoglobin is turned on only in red blood cells). In contrast, other stretches of DNA encode information about when and where to turn a gene “off” (thus lactose-digesting genes are turned off in a bacterial cell unless lactose becomes the dominant nutrient). It is remarkable that the system of “on” and “off” gene switches, first discovered in bacteria, is conserved throughout biology.
II. Venter’s strategy of sequencing protein-encoding and RNA-encoding portions of the genome would, in the end, prove to be an invaluable resource for geneticists. Venter’s method revealed parts of the genome that were “active,” thereby allowing geneticists to annotate these active parts against the whole genome.
III. Estimating the number of genes in any organism is complicated and requires some fundamental assumptions about the nature and structure of a gene. Before the advent of whole-genome sequencing, genes were identified by their function. However, whole-genome sequencing does not consider the