I forgot to post this: here is what is being done, and has been done, with all the information.
Unfortunately it is in English.
As we welcome the FightAIDS project to the grid, I thought I'd recap the HPF project and give an update.
1. We've finished what we projected we'd do, and we are continuing to add new genomes each day as we share the grid with our new grid project, FightAIDS.
2. We're planning a second phase directed against human secreted proteins, cancer biomarkers and malaria proteins.
3. We're beta testing the database of function and structure annotations and will soon release it to the general public and publish the paper. The first paper to come out of the project is in the late draft stages.
Here is what we've got so far. As new results come in (newly generated on the grid), we expand the set of biologists who can benefit from the HPF function database.
THE RESULTS OF THE FIRST PHASE OF THE PROJECT IN GREATER DETAIL:
The Human Proteome Folding Project has now produced a database that describes the structure of ~120,000 protein domains. The structures of these protein domains are now available for the first time. This work is being carried out in parallel with experimental structural genomics projects and represents an important component of our effort to understand the structure of proteomes. These structures span over 80 genomes, including key model systems in all domains of life. The proteins include key proteins of unknown function in pathogens, cancer biomarkers and proteins essential to cellular processes in model organisms. Here we describe the scientific results of the project from the perspective of proteins, biology and tools for biologists. Further details about the high-performance and grid computing advancements that the project entailed are not discussed here.
The database: unprecedented annotation comprehensiveness and new research on data integration and visualization.
The database of information on the proteins will be available upon publication. Our collaborators at the University of Washington and the Institute for Systems Biology are currently beta-testing the database. First users include Dave Goodlett at the Dept. of Medicinal Chemistry at the UW. He is examining several pathogenic bacteria and using our structure predictions and the integrated database as a means of understanding the function of proteins that he identifies as important for pathogenicity and host specificity (e.g. why does bacterium A infect mice while bacterium B infects humans?). Workers at the UW are also exploring our results for yeast, a key model organism that is the center of a gigantic global research effort.
Researchers investigating cancer at several partner institutions, including the ISB, are already prepared to use this information to better understand proteins that are key to cancer biology. Several proteins are found at dramatically increased levels in cancer tissues. Oncologists or pathologists can use these proteins as so-called cancer biomarkers, and much research is currently underway to use protein and mRNA measurements to stratify cancer. This research has produced a number of proteins that are indicators of cancer but are of unknown function. By exploring the human database of integrated structure predictions, workers will get closer to understanding the function of these important proteins. Figure 2 shows the first page a researcher will see after finding their protein in the database. Many types of information about protein function (in addition to the structure/function integration from the grid) are presented on this interface. We hope to continue developing this portal and make it as intuitive as possible.
The sheer number of datatypes used in this study presents several data-visualization challenges. Because a protein's fold or shape does not always trivially equate to function, we will provide functional association networks and integration with many datatypes for all genomes folded as part of this project. Biologists can explore the function/fold data generated by HPF in the context of these networks (metabolic, operons, phylogenetic pathways, protein-protein interactions, etc.). Regardless of the type of functional associations we integrate with, the take-home message is that we want to give people many different options for using and exploring the results of this project, and a cutting-edge graphical front end is one of them.
To address many of these challenges we have developed a graphical front end that plugs in to the Cytoscape platform (www.cytoscape.org), allowing graphical access to the data generated. This software is being developed here at the ISB by Iliana Avila-Campillo. Iliana is one of the core Cytoscape developers and has been with the project at the ISB for a long time, so we're lucky to have very good support for our diverse data visualization needs.
We will continue this project with a second phase. The second phase will be called Human Proteome Folding 2 (HPF2; we could have thought up a more creative name, I know...). It will take important proteins with interesting novel predictions from HPF1 and refine those predicted structures (with something we call all-atom mode) to a higher level of resolution and accuracy (HPF1 == fold resolution, broad fold-function survey; HPF2 == higher resolution for more detailed conclusions). We will focus on cancer biomarkers, proteins expressed at key times in the infection cycle of malaria, and human secreted proteins. We will use a different mode of the Rosetta program to generate higher resolution structures, refining predictions from the first round with more accurate but also more computationally demanding methods. With this focus we hope to trade comprehensiveness (HPF1) for detail (HPF2). HPF1 can be thought of as a resource and a broad function survey, while HPF2 can be thought of as drilling deeper for an ultra-important subset of proteins.