Identifying Mutation-induced Protein-Protein Interactions in Scientific Literature

Eric Zhe-You Chen, Onkar Singh, Toni Rose Jue, Chen-Kai Wang, Jitendra Jonnagaddala, Hong-Jie Dai

Abstract

Identifying mutation induced protein-protein interaction (PPI) is a prime concern in the advancement of precision medicine (PM). The aim of this study is to develop an automatic tool to identify mutation-induced PPI from biomedical literature. Identification of such interactions is important for understanding the complex network of pathways that make up heterogeneous diseases. In the work, we applied two machine learning algorithms to deal with the problem. The first is the support vector machine. We proposed features including n-gram and article-meta information and train two SVM models with the linear kernel. The second is the neural network. We proposed a new network structure based on the convolutional neural network (CNN) which integrates convolved context features from different paragraphs and handcrafted features for MeSH term information. The performance of the developed models was evaluated on the training set of the BioCreative VI PM document triage task. The SVM-based approaches achieved the best overall Fscore, while CNN models have better precision.In the future, we will release the tool and continue to improve the performance of the developed CNN models by replacing the randomized initialized embedding layer with pre-trained word vectors and including more meta-information with.

comments powered by Disqus