Cerulean Team 2018
Graph Clustering Toolkit for Biological Data Analytics
The goal of this project was to research and implement graph clustering algorithms based upon pairwise interactions between entities. Such data sets arise in disciplines as varied as natural language processing, biological systems, or social networks. In particular, we investigated protein-protein interation networks and clustering algorithms designed for this type of data, such as Spectral Clustering based on a PPI-specific metric called the Diffusion State Distance. Researchers generally classify proteins according to their functional categories through repeated experiments. Using unsupervised learning methods for this task helps speed up the process significantly, at the cost of reduced accuracy. The main goal of research in this field is to improve both speed and accuracy simultaneously.
Our project consists of a suite of computationally efficient graph clustering algorithms in Matlab and Python. While these algorithms were primarily intended for protein interaction networks, they are also suited for any general pairwise interaction data set. We designed an easy-to-use, intuitive graphical user interface to introduce bioinformatics researchers and machine learning enthusiasts to this important field of research.
Related Tech Notes
- Graph Embedding for Dimensionality Reduction by Anuththari Gamage
- Protein-Protein Interaction Networks: Predicting Protein Functionality by Brian Rappaport