Wass Group - Computational Biology
Research in the Wass group considers two main elements:
1. Development of computational biology methods
Method development in the group has a basis in structural bioinformatics, often combined with machine learning. The advent of high-throughput technologies such as
next-generation sequencing has resulted in large volumes of data that are not characterised. For example, the UniProt protein sequence database currently contains 148 million
protein sequences, but for most of these proteins the structure and function are unknown. Similarly, with the sequencing of many individuals we now have extensive knowledge of
the genetic variants that occur in people, but for most of these variants we do not know whether they have a functional effect.
Modelling small molecule binding sites in proteins
Knowledge of the location of ligand binding sites (such as active sites or cofactor binding sites) is important to aid our understanding of proteins.
We have developed 3DLigandSite to address this problem -
Wass et al., 2010, Nucleic Acids Res, 38, W469–W473
Users can submit either a protein sequence or a structure to 3DLigandSite. Where a sequence is submitted, the first step is to model the protein structure using Phyre2.
3DLigandSite identifies structures in the Protein Data Bank that are homologous to the query protein and have bound ligands. These ligands are superimposed
onto the structural model of the query protein and used to predict the binding site. The method was developed based on our successful predictions in CASP8
(Critical Assessment of protein Structure Prediction) - Wass and Sternberg, M.J. (2009) Proteins, 77 Suppl 9:147-51
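The idea of predicting a binding site from superimposed ligands can be illustrated with a minimal sketch. This is not the 3DLigandSite implementation: the function name, the 4.0 Å distance cutoff, the 50% ligand-fraction threshold and the toy coordinates are all assumptions chosen for illustration. It assumes the ligands have already been superimposed into the coordinate frame of the query model, and it predicts a residue as part of the binding site when a sufficient fraction of ligands have an atom close to it.

```python
from math import dist


def predict_binding_site(residues, ligands, cutoff=4.0, min_fraction=0.5):
    """Return residue ids contacted by at least `min_fraction` of the ligands.

    residues: {residue_id: [(x, y, z), ...]} atom coordinates of the query model.
    ligands:  list of ligands, each a list of (x, y, z) atom coordinates that
              have already been superimposed onto the query model.
    cutoff and min_fraction are illustrative values, not published parameters.
    """
    if not ligands:
        return []
    predicted = []
    for res_id, atoms in residues.items():
        # Count the ligands that have at least one atom within the cutoff.
        contacts = sum(
            1
            for ligand in ligands
            if any(dist(a, b) <= cutoff for a in atoms for b in ligand)
        )
        if contacts / len(ligands) >= min_fraction:
            predicted.append(res_id)
    return sorted(predicted)


# Toy coordinates (made up for illustration only).
toy_residues = {10: [(0, 0, 0)], 11: [(3, 0, 0)], 50: [(30, 0, 0)]}
toy_ligands = [[(1, 0, 0)], [(2, 0, 0)], [(29, 0, 0), (40, 0, 0)]]
print(predict_binding_site(toy_residues, toy_ligands))
```

In the toy data, residues 10 and 11 are contacted by two of the three ligands and are predicted, whereas residue 50 is contacted by only one ligand and is not.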
Inferring protein function
Fewer than 1% of the 148M proteins in UniProt have experimentally characterised functions that are recorded in the Gene Ontology. We have therefore developed computational methods to infer protein function. CombFunc is a machine learning approach that combines features from multiple data sources to infer protein function, and it includes ConFunc, our original conservation-based method for inferring protein function. Both methods have performed well in the international Critical Assessment of Functional Annotation (CAFA), with ConFunc ranked 4th in CAFA1 for prediction of eukaryotic protein function and CombFunc ranked in the top 10 methods - CAFA2 assessment paper
Modelling protein structure
Protein structures have been solved for even fewer proteins than those with annotated functions. Modelling protein structure is therefore an important task and we have been involved in the development of the Phyre2 webserver.
Predicting the effect of single nucleotide variants
The 1000 Genomes project identified that each of us has between 4 and 5 million genetic variants that differ from the reference genome. It is now important to identify those that have a functional effect and result in a phenotype, especially those that are associated with disease. To address this we developed VarMod, a machine learning based method for predicting whether non-synonymous single nucleotide variants (SNVs) are functional. VarMod uses structural modelling and analysis of protein-protein interfaces and protein-ligand binding sites to identify SNVs that have functional effects. This builds upon our research demonstrating that disease-associated SNVs frequently occur at protein-protein interfaces - David et al., 2012, Human Mutation, 33, 359–363
2. Using computational biology to address important biological questions
Identifying molecular determinants of virus pathogenicity
Over the past few years we have been interested in identifying the molecular determinants of Ebola virus pathogenicity. This work was driven by the 2013-16 Ebola virus outbreak in West Africa,
which resulted in more than 28,000 cases and 11,000 deaths. The virus was also widely sequenced during this outbreak, making our research possible. We have focussed on comparison of Reston virus,
the only species of Ebolavirus that does not cause disease in humans, with the four species that are known to cause disease. Our work has identified a small set of amino acid differences between
these species that we propose are responsible for the difference in pathogenicity. Our main hypothesis is that differences in the protein VP24 are critical to determining host-specific
pathogenicity. Our original study (Pappalardo et al., 2016) used the 196 genome sequences that were available at the time.
Our findings were also supported by molecular dynamics simulations of VP24 (Pappalardo et al., 2017).
We have recently updated (Martell, Masterson et al., 2019) our analysis using more than 1,400 genome sequences and found
that our results were reproduced with this much larger dataset, providing confidence that our approach is robust even with the small number of sequences originally used.
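The kind of comparison described above can be sketched in a few lines. This is a simplified illustration and not the pipeline used in the cited studies: the function name and the toy sequences are hypothetical, and it assumes the sequences have already been aligned to equal length. It reports alignment columns where each group is invariant but the two groups carry different residues.

```python
def differing_conserved_positions(group_a, group_b):
    """Find columns conserved within each group but different between groups.

    group_a, group_b: lists of aligned sequences of equal length.
    Returns a list of (1-based column, residue in group_a, residue in group_b).
    Columns containing a gap ('-') in either group are skipped.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        invariant = len(residues_a) == 1 and len(residues_b) == 1
        if invariant and residues_a != residues_b and "-" not in residues_a | residues_b:
            hits.append((i + 1, residues_a.pop(), residues_b.pop()))
    return hits


# Toy aligned sequences (made up; not real Ebolavirus protein sequences).
non_pathogenic = ["MAKTL", "MAKTL"]
pathogenic = ["MSKTV", "MSKTV", "MSRTV"]
print(differing_conserved_positions(non_pathogenic, pathogenic))
```

In the toy alignment, columns 2 and 5 are reported. Column 3 is not, because the second group is not invariant at that position.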
Studying cancer cell evolution to understand acquired drug resistance
Drug resistance is a common problem during cancer treatment. Often a tumour initially responds to treatment with a drug, only for the tumour cells to evolve over time and become resistant to further treatment with the same drug. In this work we collaborate extensively with Prof Martin Michaelis using the Resistance Cancer Cell Line Collection (RCCL), a collection of more than 1500 cancer cell lines that have been adapted to anti-cancer drugs. We use the collection as a model to study the mechanisms of acquired drug resistance and to identify biomarkers of drug sensitivity and resistance. We combine omics data (including exome sequencing and transcriptomics) with screening of the cell lines against a panel of drugs to compare the parental cell lines with their drug-resistant sub-lines.
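At its simplest, comparing a parental cell line with a drug-resistant sub-line at the variant level is a set comparison. The sketch below is an assumption-laden illustration rather than our analysis pipeline: the function name, the (chromosome, position, reference, alternate) tuple representation and the toy variants are all hypothetical.

```python
def compare_variants(parental, resistant):
    """Split variants into those gained, lost and shared in a resistant sub-line.

    parental, resistant: iterables of (chromosome, position, ref, alt) tuples.
    """
    parental_set, resistant_set = set(parental), set(resistant)
    return {
        "gained": sorted(resistant_set - parental_set),  # only in the sub-line
        "lost": sorted(parental_set - resistant_set),    # only in the parental line
        "shared": sorted(parental_set & resistant_set),  # present in both
    }


# Toy variant calls (made up for illustration only).
toy_parental = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")]
toy_resistant = [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")]
print(compare_variants(toy_parental, toy_resistant)["gained"])
```

Variants in the "gained" set are candidates for changes acquired during adaptation to the drug, although in practice they would need filtering for sequencing depth and call quality before interpretation.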
Protein evolution - Adaptation of myosin with increasing body size
Analysis of genetic variation
Martell et al., 2017 - this is the cystinuria paper
Identifying the functions present in the minimal bacterial genome
Identifying molecular determinants
David et al., 2012, Human Mutation, 33, 359–363
Analysis of disease-causing non-synonymous SNPs