Translating The Human Genome Into Protein Function: Structural Genomics Research At Columbia
New York, N.Y. – Columbia researchers are making a major contribution to the rapidly expanding field of structural genomics, an endeavor that relates protein structures to the genetic sequences that encode them. The effort may reveal rules that could be used to predict the shape of a protein from its sequence. Determining the structure of a protein may help explain its function as well as how it may be affected by both normal physiological and pathological processes. Detailed structural information also enables researchers to model a protein’s interactions with a drug candidate, greatly aiding in pharmaceutical design.

While the Human Genome Project has yielded the sequences of genes for a vast number of proteins, the actual structures of those proteins have yet to be determined. Therefore, determining the 3-D structure of isolated proteins constitutes an important part of a new structural genomics program funded by the National Institute of General Medical Sciences (NIGMS). “The idea is to try to find a large number, ultimately representative of all different types of sequence.” Dr. Wayne Hendrickson, professor of biochemistry and molecular biophysics and leader of the Columbia team, says the goal of the initiative is 10,000 structures in 10 years. “The optimism for being able to do this is based on advances in techniques.”

“It’s a cross-disciplinary project. The idea is to carry out analysis of protein structure on a pan-genomic scale, bringing together capability in experimental determination of structure and bioinformatics,” says Dr. Hendrickson. Columbia faculty involved in the research are professors Barry Honig, Eric Gouaux, Arthur Palmer and Burkhard Rost in biochemistry and molecular biophysics; John Hunt and Liang Tong in biological sciences; Andrew Laine in biomedical engineering; Peter Allen in computer science; and Ann E. McDermott in chemistry, biological sciences, and chemical engineering and applied chemistry.

The Northeast Structural Genomics Consortium, which joins Columbia researchers with others from New York, New Jersey, Connecticut, Washington State, and Ontario, Canada, is one of seven pilot research centers in structural genomics awarded five-year research grants by NIGMS. Administered by Gaetano Montelione of Rutgers University, the grant enables researchers to develop techniques in both X-ray crystallography and NMR spectroscopy for determining protein structures on a large scale. In the first round of funding, announced Sept. 26, Columbia received $8.5 million, the largest individual share for the Northeast Consortium.

“One structure method is X-ray crystallography,” Dr. Hendrickson says. The method first requires purifying enough of a protein to form a crystal, an ordered stacking of the protein molecules. A beam of X-rays is then directed toward the protein crystal. The X-rays are bent, or diffracted, by features of each protein molecule in a predictable way. Analyzing the resulting X-ray diffraction patterns yields the protein’s structure.

Because efforts to determine protein structure are limited by researchers’ abilities to obtain and purify enough of the protein for a sizeable crystal, Dr. Hendrickson outlines a number of technical innovations that boost researchers’ ability to determine structures. “With minor adaptations, the main plan is to use selenomethionine with multiple-wavelength anomalous dispersion (MAD) phasing,” says Dr. Hendrickson. “MAD phasing enhances the amount of data that can be obtained from a protein sample. It takes advantage of interactions between X-rays and electron orbitals,” the negatively charged subatomic-particle ‘traffic patterns’ that constitute important physical features of atoms. “In order to do MAD phasing, relatively heavy atoms are required.
Here we use living cells to make proteins that incorporate selenomethionine, which has selenium in place of the sulfur atom in methionine,” one of the amino acids that make up proteins. The switched elements in the amino acid do not appear to affect the protein’s shape or physiological function.

Synthesis and crystallization of proteins can take place at Columbia, but X-ray crystallography depends on outside facilities. Says Dr. Hendrickson: “This all requires synchrotron beams. MAD phasing can only be done at a synchrotron source. We’ve developed a facility with Howard Hughes Medical Institute funding at Brookhaven National Lab on Long Island.” A synchrotron is a special facility that accelerates beams of electrons and bends them with powerful magnets arranged in a large ring. At each bend of the ring, several types of radiation, including X-rays, are produced.

An innovation at another synchrotron site in Illinois maximizes the ability to determine structures from very small crystals. “The Advanced Photon Source at Argonne National Lab incorporates an undulator, which allows production of very intense pulses of X-ray radiation.” The undulator comprises a set of magnets specially configured to focus the X-rays.

“When you set up high-throughput crystallography, you need to automate. One aspect is to develop robotics for automation and rapid throughput at these facilities. Some of the sub-projects are to grow crystals en masse. John Hunt and Liang Tong in biology are working together with the collaborators in engineering on aspects of this process.”

“The goal is to determine as many structures as possible. But even then, we’ll still have many fewer structures than primary sequences,” says Dr. Barry Honig, professor of biochemistry and molecular biophysics. While exhaustive crystallography on every protein encoded by known gene sequences may be impractical, bioinformatic computing approaches by researchers like Dr. Honig and Dr.
Burkhard Rost may greatly extend the benefits of structurally analyzing a set of representative proteins through sophisticated sequence comparisons and modeling of unknown structures. “Our research is designed to develop methods to predict structures based on experimentally determined related structures. Our goal is to be able to do homology modeling: building a hypothetical prediction of 3-D structure.” By recognizing patterns between gene sequences and the structures of proteins they encode, the researchers hope to deduce rules to predict an unknown protein structure based on its sequence.

Says Dr. Hendrickson: “All of this is based on the notion that there is a unifying, underlying basis of structure in proteins in organisms as diverse as humans and bacteria. Knowing that a human protein of interest has homologs in bacteria, flies and worms may allow us to gain appreciable insight into that human molecule from the structure of any one of the family members.”

“Three-dimensional structure is a starting point, but not enough; there are other properties that are important,” says Dr. Honig. “For example, we use theoretical calculations of electric fields to predict how proteins interact with one another.” Electric charges on regions of a protein can be as important as the protein’s shape in determining what molecules it interacts with or its enzymatic activity. “One problem with the design of new pharmaceuticals is specificity. How do you affect one kinase, and not others? This all comes down to specific interactions between proteins, and between proteins and drugs.”

Dr. Honig typically works from structures determined with X-ray crystallography. The positions of electrical charges are inferred from the structure and are used as a basis for calculating electric fields around the protein.
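
As a rough illustration of how charge positions taken from a structure can be turned into an electrostatic quantity, the Python sketch below sums Coulomb's law over point charges to estimate the potential at a chosen location. It is a minimal sketch only, not the method used by Dr. Honig's group, which this release does not specify. The function name `coulomb_potential`, the single uniform dielectric constant of 80 (roughly that of water), and the example coordinates are all assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a conversion factor
# commonly used in molecular modeling.
COULOMB_KCAL = 332.06


def coulomb_potential(point, charges, dielectric=80.0):
    """Electrostatic potential (kcal/mol per unit charge) at `point`.

    `charges` is a list of (x, y, z, q) tuples: coordinates in Angstroms
    and charge in units of the elementary charge. A single uniform
    dielectric constant is an assumption; a real protein in water is
    not a uniform medium.
    """
    total = 0.0
    for x, y, z, q in charges:
        r = math.dist(point, (x, y, z))
        if r == 0.0:
            raise ValueError("point coincides with a charge")
        total += COULOMB_KCAL * q / (dielectric * r)
    return total


# Hypothetical example: one positive and one negative charge,
# 6 Angstroms apart along the x axis.
example_charges = [(0.0, 0.0, 0.0, +1.0), (6.0, 0.0, 0.0, -1.0)]
near_positive = coulomb_potential((1.0, 0.0, 0.0), example_charges)
midpoint = coulomb_potential((3.0, 0.0, 0.0), example_charges)
```

In this toy case the potential is positive close to the positive charge and cancels at the midpoint between the two equal and opposite charges. Evaluating such a function at many points around a structure would give a crude map of where charged partners might be attracted or repelled.
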
Potentially, homology modeling from primary sequence information might allow development of structural and functional models without the need for crystal structure determination, at least in some cases.

Dr. Hendrickson admits, “There are some that are opposed to this high-throughput approach because it’s not hypothesis-driven.” But that’s just what many researchers said about the U.S. Human Genome Project only a few years ago.
###