CombFunc

Protein Function Prediction Server

About CombFunc

Full details about CombFunc can be found in our publication: Wass MN et al., Nucleic Acids Research, 2012, 40:W466-W470. CombFunc is a Gene Ontology (GO) [1] based function prediction server. Users submit a protein sequence (and optionally its UniProt accession). CombFunc runs multiple analyses on the query sequence (details below) to obtain data that can be associated with protein function. These data are then combined using a machine learning approach, a Support Vector Machine (SVM), to produce a function prediction.

The different analyses performed are listed below.

Method/Data Source	Description
ConFunc	ConFunc [1] is a sequence-based method for the prediction of protein function. It identifies conserved residues present in alignments of proteins with the same GO annotations and uses them to assign function to a query sequence.
BLAST	BLAST [2] identifies sequences in a database that are homologous to a query sequence. Homologous sequences often have similar functions, particularly at high levels of sequence identity [3], but the conservation of function is more variable below 85% sequence identity. Data from the BLAST hits are input into the SVM, including features representing the e-value of the hit and the coverage of both the query and hit sequences.
InterPro	InterPro [4] identifies protein domains/families present in the query protein. The domains are mapped to Gene Ontology functions and used as features in the SVM.
Pfam Domain Combinations	While individual domains can be mapped to GO functions, combinations of domains occurring together can also be used [5].
Phyre2 fold library search	Phyre2 [6] is our in-house protein structure prediction server. In CombFunc the Phyre2 fold library is searched to identify structures that are homologous to the query sequence. Functionally annotated structures are identified and the information forms further features input into the SVM.
Protein-Protein interactions	It is possible to predict protein function by considering the functions of interacting proteins. MINT [7] and IntAct [8] are queried to identify interactions involving the query protein. In the simplest case, the more frequently a function occurs in interacting proteins, the more likely it is to be a function of the query sequence. It is also possible to consider "indirect" interacting proteins [9], i.e. proteins that do not directly interact with the query but share an interaction partner with it. Features used in the SVM include the proportion of direct and indirect interactors with the function.
Gene Co-expression Analysis	Co-expressed genes have also been observed to have similar functions. Co-expression data for the query is extracted from COXPRESdb [10]. The GO annotations of the co-expressed proteins are obtained and, for each function, the average Mutual Rank (a measure of co-expression used in COXPRESdb) of the co-expressed genes annotated with that function is calculated.
3DLigandSite	3DLigandSite [11] is our in-house ligand binding site prediction server. Submissions are also sent to 3DLigandSite so that binding site predictions can be used in conjunction with the GO function predictions made by CombFunc.
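
Two of the features described in the table above, the proportion of direct and indirect interactors carrying a function and the average Mutual Rank per function, can be illustrated with a short Python sketch. This is not the CombFunc implementation: the function names, the dictionary-based inputs, the protein identifiers and the Mutual Rank values are all hypothetical and chosen only to make the calculation concrete.

```python
from collections import defaultdict


def interactor_proportions(query, interactions, annotations):
    """Per GO term, the fraction of direct and of indirect interactors annotated with it.

    interactions: dict mapping a protein to the set of its interaction partners
                  (assumed input shape, not a MINT/IntAct format).
    annotations:  dict mapping a protein to its set of GO terms.
    """
    direct = set(interactions.get(query, set())) - {query}

    # Indirect interactors share a partner with the query but do not
    # interact with it directly.
    indirect = set()
    for partner in direct:
        indirect |= interactions.get(partner, set())
    indirect -= direct | {query}

    def proportions(proteins):
        counts = defaultdict(int)
        for protein in proteins:
            for term in annotations.get(protein, set()):
                counts[term] += 1
        total = len(proteins)
        return {term: n / total for term, n in counts.items()} if total else {}

    return proportions(direct), proportions(indirect)


def mean_mutual_rank(coexpressed, annotations):
    """Per GO term, the mean Mutual Rank of the co-expressed genes annotated with it.

    coexpressed: dict mapping a gene to its Mutual Rank with the query
                 (assumed input shape, not the COXPRESdb download format).
    """
    ranks = defaultdict(list)
    for gene, mutual_rank in coexpressed.items():
        for term in annotations.get(gene, set()):
            ranks[term].append(mutual_rank)
    return {term: sum(values) / len(values) for term, values in ranks.items()}


# Hypothetical toy data: Q is the query, A-D are other proteins.
interactions = {
    "Q": {"A", "B"},
    "A": {"Q", "C"},
    "B": {"Q", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
annotations = {
    "A": {"GO:0005524"},
    "B": {"GO:0005524", "GO:0016301"},
    "C": {"GO:0016301"},
    "D": set(),
}
coexpressed = {"A": 2.0, "B": 4.0, "C": 10.0}

direct_props, indirect_props = interactor_proportions("Q", interactions, annotations)
mean_ranks = mean_mutual_rank(coexpressed, annotations)
```

In this toy network both direct interactors (A and B) carry GO:0005524, so its direct proportion is 1.0, while GO:0016301 is found in one of two direct and one of two indirect interactors (0.5 each). Values like these would be the kind of per-function numbers passed to the SVM alongside the features from the other analyses.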

References

1. Wass MN, Sternberg MJE (2008) ConFunc--functional annotation in the twilight zone. Bioinformatics 24:798–806.
2. Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
3. Devos D, Valencia A (2001) Intrinsic errors in genome annotation. Trends Genet 17:429–431.
4. Hunter S et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 40:D306–D312.
5. Forslund K, Sonnhammer ELL (2008) Predicting protein function from domain content. Bioinformatics 24:1681–1687.
6. Kelley LA, Sternberg MJ (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4:363–371.
7. Ceol A et al. (2010) MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 38:D532–539.
8. Kerrien S et al. (2012) The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40:D841–6.
9. Chua HN, Sung WK, Wong L (2006) Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 22:1623–1630.
10. Obayashi T, Kinoshita K (2011) COXPRESdb: a database to compare gene coexpression in seven model animals. Nucleic Acids Res. 39:D1016–22.
11. Wass MN, Kelley LA, Sternberg MJE (2010) 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res. 38:W469–W473.

Mark Wass