A Case Study for Protein Localization: Mining PubMed Data Using LocText
Keywords:
LocText, Protein Localization, PubMed, Text Mining.
Abstract
Data sharing and Web 2.0 have changed how we think about data. Generating and consuming data has become an integral part of human life. Data is generated in many formats, of which text is the most widely used. The biomedical literature is growing rapidly, and it is becoming increasingly difficult to keep track of publications in relevant areas of interest. In this work we apply LocText to PubMed data to illustrate relationships between proteins and sub-cellular locations. The aim of the experiments is to match UniProtKB accession numbers and GO Cellular Component identifiers to the NCBI Taxonomy identifiers available in PubMed data. The experiments are performed on a commodity cluster of 16 machines running the Ubuntu 16.04 operating system with Java 8, Hadoop 2.7, Spark 2.1, Elasticsearch, Nalaf, LocText, and Kibana.