SPRENO: A BioC Module for Recognizing and Normalizing Species and Their Model Organisms

Onkar Singh, Hong-Jie Dai

Abstract

Name entity recognition is a key step in a biomedical text mining task. This becomes more critical and challenging due to the availability of huge amount of biomedical literature. To recognize and identify species become difficult for the domain experts due to the vagueness of the abbreviated term used for model organisms/strains. In this study, we present our species recognition tool─SPRENO (Species Recognition and Normalization) developed for recognizing organism terms mentioned in figure captions. SPRENO is an extension of our previous species recognition tool developed for the BioCreative V BioC task. We developed new algorithms optimized for normalizing organism terms mentioned in figure captions, which consider the contextual information from the corresponding full text. Furthermore, two disambiguation methods are developed to determine the ID of ambiguous organism mentions. One is based on the majority rule to select the ID that has been successfully linked to previously mentioned organism terms. Another is a convolutional neural network model trained by learning the context and the distance information of the target organism mention. We participated the BioCreative VI BioID task and submitted three runs for the assessment of the developed tool. The best micro Fscores achieved by SPRENO on the test set are 0.776 (entity recognition) and 0.755 (entity normalization).

comments powered by Disqus