CombFunc
Protein Function Prediction Server
Help
This page explains how to use CombFunc with details on the submission options and how the results page should be interpreted. If you would like further details about the methods used, see the About page. Multiple examples of output from CombFunc can be viewed on the Examples page.
Submission Options
All that is required to run CombFunc is a protein sequence to use as a query. This is entered in the text area on the submission page. Additionally, to run the features that are not sequence based, the UniProt accession must also be supplied. Without it, the Gene Co-expression and Protein-Protein interaction analyses are not performed. The other options allow the user to specify an email address that results can be forwarded to and to assign a description to their submission. The sequence should be submitted either in FASTA format or as just the amino acid code.
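The two accepted input forms can be illustrated with a minimal Python sketch that normalises either a FASTA record or a bare amino acid string before submission. The function name `parse_query` is hypothetical, and the restriction to the 20 standard residue letters is an assumption; this page does not describe how the server itself validates input.

```python
# Assumption: only the 20 standard amino acid letters are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_query(text):
    """Return (header, sequence) from FASTA or bare amino acid input.

    header is None when the input has no '>' description line.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty query")

    header = None
    if lines[0].startswith(">"):
        # FASTA format: the first line is a description, the rest is sequence.
        header = lines[0][1:].strip()
        lines = lines[1:]

    # Join wrapped sequence lines and normalise to upper case.
    sequence = "".join(lines).replace(" ", "").upper()
    if not sequence:
        raise ValueError("no sequence found")

    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return header, sequence
```

Checking the sequence locally in this way can catch stray characters before a job is submitted.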
Submission Progress
Each submission runs multiple processes, so it may take up to an hour to complete. It is therefore advisable to enter an email address when making a submission, but this is not essential. While the job is running the progress is updated so that the user can see how many of the processes have completed. The progress table is shown to the right. The key at the top explains the colouring, with completed processes coloured blue and running processes coloured purple. If a process is offline then it is coloured red. If a UniProt accession has not been submitted with the sequence then any processes that are not run are coloured grey. Combined prediction refers to the final CombFunc process of making a combined prediction using data from each of the other processes.
Interpreting CombFunc Predictions
Overall Function Predictions
After the multiple processes that form CombFunc have run, their results are combined to give an overall function prediction. Further details of how the data from the different processes are combined can be found in About. In CombFunc the predictions are split into the two Gene Ontology (GO) categories used: Molecular Function (which describes the biochemical function of the protein) and Biological Process (which describes the larger scale processes that the protein is part of). The predictions for these two categories are displayed separately in tables, and also as graphs to allow the user to explore how the predicted functions are related within the Gene Ontology graph. Each of these displays is explained below. The results displayed on this page can be viewed in full here.
Results Table
The results tables display the Gene Ontology functions that have been predicted for the query sequence. The first column displays the GO term that has been predicted. The description of the GO term is displayed in the Description field. Passing the mouse over a row displays the definition of the GO term to the right of the table. Additionally, the blue GO symbol next to the term and the description links to the Gene Ontology page for the function at geneontology.org. The third column displays the number of SVMs (out of 10) that predicted the term to be an annotation of the sequence. The average probability score from the SVMs is displayed in the fourth column and can be used as an indicator of the confidence of the predicted term. The SVM probability is colour coded to indicate the level of confidence: red predictions have the highest confidence and yellow the lowest. This colour coding is used in the other displays for the combined predictions and also for each of the individual sets of data (see below).
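As a rough illustration of how the third and fourth columns relate to the ten SVMs, the sketch below counts how many of ten probability scores support a term and averages them. The function name `summarise_svm_votes`, the 0.5 decision threshold, and averaging over all ten scores are assumptions made for this example; this page does not state how CombFunc derives either column.

```python
def summarise_svm_votes(probabilities, threshold=0.5):
    """Return (votes, mean_probability) for one GO term.

    probabilities: ten per-SVM probability scores for the term.
    threshold: hypothetical cut-off above which an SVM counts as
               predicting the term (an assumption, not from CombFunc).
    """
    if len(probabilities) != 10:
        raise ValueError("expected scores from exactly 10 SVMs")
    # Number of SVMs whose score reaches the assumed threshold.
    votes = sum(1 for p in probabilities if p >= threshold)
    # Assumption: the average is taken over all ten SVMs.
    mean_probability = sum(probabilities) / len(probabilities)
    return votes, mean_probability
```

Under these assumptions, a term supported by most SVMs with a high mean probability would correspond to the higher-confidence end of the colour scale.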
The image view displays the predicted functions as a sub-graph of the Gene Ontology. This enables the user to see how the different predicted terms are related. These images can become very large, so it is possible to zoom in on different areas of the graph either using the mouse or by using the controls in the bottom right corner, which control the zoom level and the movement of the image in the display. Again, all of the predicted terms are coloured according to the confidence of their prediction. For clarity, parent terms are not coloured, and descendant terms of the predicted terms are not displayed.
The list view displays similar information to the graph view but in a more compact way. Each of the predicted terms is displayed (coloured according to confidence of prediction). For each predicted term it is then possible to extend the list and view the parent terms of the prediction. The buttons at the top enable the complete list to be expanded using the "Expand All" button or collapsed using the "Collapse All" button. The image on the right shows the list partially expanded to display the parent terms of GTPase activity and GTP binding. The Gene Ontology pages for each function can be viewed by clicking on the blue GO next to the function description.
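The parent terms shown when a list entry is expanded are the ancestors of the predicted term in the Gene Ontology graph. A minimal sketch of collecting them is given below; the function name `ancestors` and the toy hierarchy are hypothetical placeholders and do not reflect real GO relationships or the server's implementation.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by following parent links.

    parents: dict mapping a term to a list of its direct parent terms.
    A term may have several parents, because GO is a graph rather than a tree.
    """
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Continue upwards from this parent.
            stack.extend(parents.get(current, ()))
    return seen


# Hypothetical toy hierarchy, for illustration only.
toy_parents = {
    "predicted_term": ["parent_1", "parent_2"],
    "parent_1": ["root"],
    "parent_2": ["root"],
}
expanded = ancestors("predicted_term", toy_parents)
```

Here `expanded` holds the terms that would appear beneath `predicted_term` once its entry is expanded.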
Individual Analyses
Data from each of the individual analyses is displayed below the overall predictions. This enables the user to explore the data that was used to make the prediction. Like the overall predictions, the scores associated with the different functions identified are colour coded to give an indication of the strength of the data in support of the function. Some of the analyses generate considerable data, so the data from the analyses is hidden and can be displayed by clicking the link adjacent to each heading. The data displayed for each analysis is explained below.
ConFunc Analysis
Z Score - For each function a Z score is calculated to obtain the significance of the result for that function.
Z score ratio - The Z scores for each function are compared to the maximum Z score obtained for the functions present for the sequence. In this example the "protein binding" function had the highest Z score and so the Z score ratio is calculated by dividing the other Z scores by the "protein binding" Z score.
NumberSequences - The number of sequences homologous to the query that are identified by ConFunc and used for making the prediction.
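The Z score ratio described above can be sketched in a few lines of Python. The function name `z_score_ratios`, the labels "function B" and "function C", and all of the Z score values are invented for illustration; only the division by the maximum Z score comes from the description above.

```python
def z_score_ratios(z_scores):
    """Divide each function's Z score by the largest Z score observed.

    z_scores: dict mapping a function description to its Z score.
    """
    top = max(z_scores.values())
    if top <= 0:
        # A ratio is not meaningful without a positive maximum.
        raise ValueError("maximum Z score must be positive")
    return {term: z / top for term, z in z_scores.items()}


# Illustrative values only, not real ConFunc output.
example = {"protein binding": 8.0, "function B": 4.0, "function C": 2.0}
ratios = z_score_ratios(example)
```

The function with the highest Z score always receives a ratio of 1, and every other function is expressed as a fraction of it.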
BLAST Analysis
The BLAST analysis results display the top 3 GO annotated sequences for both the GO Molecular Function and Biological Process categories. The details of the columns are described below:
Hit Acc - The UniProt accession of the BLAST hit. This is also a link to the UniProt page for this protein.
e-value - The BLAST e-value for the hit. The lower the e-value, the more significant the match between the two sequences.
%seq id - The sequence identity between the query and the hit sequence.
Query coverage - The percentage of the query sequence that is aligned with the hit sequence.
Hit coverage - The percentage of the hit sequence that is aligned with the query sequence.
The final two columns list the Molecular Function and Biological Process annotations of the hit respectively.
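The two coverage columns can be illustrated with a short sketch that expresses the aligned region as a percentage of a sequence's length. The function name `coverage` and the use of 1-based inclusive coordinates from a single aligned region are assumptions; the exact calculation used by the server is not described on this page.

```python
def coverage(aligned_start, aligned_end, sequence_length):
    """Percentage of a sequence spanned by one aligned region.

    aligned_start, aligned_end: 1-based inclusive positions (assumed).
    sequence_length: full length of the query or hit sequence.
    """
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment positions must lie within the sequence")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Query coverage uses the query positions and the query length;
# hit coverage uses the hit positions and the hit length.
query_cov = coverage(11, 110, 200)
hit_cov = coverage(1, 100, 100)
```

A high query coverage with a low hit coverage would suggest that the query matches only part of a longer hit protein.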
InterPro Analysis
Graphical View - Each of the InterPro hits is displayed along the length of the sequence to give an indication of where on the query sequence the domain hits occur. The hits are coloured according to their e-value.
Pfam Domain Combinations Analysis
Phyre2 Fold Library Search
Hits to the fold library are displayed in a tabular format as shown on the right. The columns are explained below. The results table displays up to 100 hits.
Protein-Protein Interaction Analysis
Gene Co-expression Analysis
3DLigandSite Submission
© Structural Bioinformatics Group, Imperial College, London
Mark Wass