studies, mutations in a gene have also been proposed to be studied in a matrix as an input for a kernel test [45,46]. In this study, we proposed a framework that uses a regression model to pre-select deleterious genes, nsNMF to decompose the matrix, an SVM to train a classifier, and penalized regression to derive relevant genes. After careful tuning of parameters and models, we have shown that this is an effective approach to classify cancers, derive relevant genes, and identify associated pathways.
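The decomposition step of this framework can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Euclidean multiplicative-update variant of nsNMF, in which V is approximated by W S H with a smoothing matrix S = (1 - theta) I + (theta / k) 11^T. The function name `nsnmf`, the value of `theta`, the rank, and the random binary mutation matrix are all hypothetical. The regression pre-selection, SVM, and penalized regression steps are not shown.

```python
import numpy as np

def nsnmf(V, rank, theta=0.5, n_iter=200, seed=0, eps=1e-9):
    """Non-smooth NMF sketch: V ~ W @ S @ H with non-negative W and H.

    Assumption: Euclidean multiplicative updates; theta in [0, 1] controls
    the amount of smoothing (theta = 0 reduces to ordinary NMF).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    # Smoothing matrix that pushes sparseness into W and H.
    S = (1.0 - theta) * np.eye(rank) + (theta / rank) * np.ones((rank, rank))
    for _ in range(n_iter):
        WS = W @ S
        H *= (WS.T @ V) / (WS.T @ WS @ H + eps)   # update H with W @ S fixed
        SH = S @ H
        W *= (V @ SH.T) / (W @ SH @ SH.T + eps)   # update W with S @ H fixed
    return W, S, H

# Hypothetical toy input: a 20 x 30 binary matrix standing in for mutation data.
toy_rng = np.random.default_rng(1)
V = (toy_rng.random((20, 30)) < 0.2).astype(float)

W, S, H = nsnmf(V, rank=4)
reconstruction_error = np.linalg.norm(V - W @ S @ H)
```

In a pipeline like the one described, factors of this kind would then feed the classifier; how the features are assembled from W and H is specific to the study and is not reproduced here.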
To fully understand a disease, it is critically important to study mutations across the full range of genes together. Complex traits are modified by multiple genes and multiple mutations acting together. Traditionally, NMF has been applied to study gene expression [18,28]. In this study, we proposed using somatic mutations for cancer classification. Furthermore, we proposed generating the feature matrix by integrating both the basis matrix W and the coefficient matrix H. Moreover, we developed a novel method to derive effect scores from the feature matrix. Using this method, we obtained the association score of each gene with a particular cancer type, enabling relevant pathway discovery. The discovered effect scores have a high potential to help us better understand the genetic pathophysiology behind cancer.
In this study, we proposed a novel strategy to study the genetic landscape of multiple cancers. In the future, we will use tensor factorization to integrate known pathways to guide the grouping of mutational variants and use external cohorts to validate the proposed model. Furthermore, this generic process only requires the input of somatic mutations and a disease type of interest, without much domain-specific knowledge. This strategy has the potential to be easily adapted and applied to other diseases as well.
This study was supported in part by grant R21LM012618-01 from the National Institutes of Health, Breast Cancer Research Foundation, and the Lynn Sage Cancer Research Foundation.
ZZ, SC, SK, and YL originated the study. ZZ and YL performed analyses and wrote the first draft of the manuscript. ZZ, CM, and AV annotated the dataset. SC and SK reviewed and helped analyze the findings. All authors discussed the results and revised the manuscript.
Declaration of Competing Interest
The authors have no competing interests to declare.
Appendix A. Supplementary material
Accepted Manuscript
Cancer driver gene discovery in transcriptional regulatory networks using influence maximization approach
Majid Rahimi, Babak Teimourpour, Sayed-Amir Marashi
To appear in: Computers in Biology and Medicine
Please cite this article as: M. Rahimi, B. Teimourpour, S.-A. Marashi, Cancer driver gene discovery in transcriptional regulatory networks using influence maximization approach, Computers in Biology and Medicine (2019), doi: https://doi.org/10.1016/j.compbiomed.2019.103362.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Cancer driver gene discovery in transcriptional regulatory networks using influence maximization approach
Majid Rahimi a, Babak Teimourpour a,*, Sayed-Amir Marashi b
a Department of Information Technology Engineering, Faculty of Industrial and Systems, Tarbiat Modares University, Tehran, Iran
b Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
*Corresponding author: Room No. 210, 2nd Floor, School of Systems and Industrial Engineering, Tarbiat Modares University
Abstract: Cancer driver genes (CDGs) are genes whose mutations cause tumor growth. Several computational methods have previously been developed for finding CDGs. Most of these methods are sequence-based, that is, they rely on finding key mutations in genomic data to predict CDGs. In the present work, we propose iMaxDriver, a network-based tool that predicts driver genes by applying an influence maximization algorithm to the human transcriptional regulatory network (TRN). In the first step of this approach, the TRN is pruned and weighted using tumor-specific gene expression (GE) data. Then, the influence maximization approach is used to find the influence of each gene. The genes with the highest influence rates are selected as potential driver genes. We compared the performance of our CDG prediction method with fifteen other computational tools, based on a benchmark of three different cancer types. Our results show that iMaxDriver outperforms most of the state-of-the-art algorithms for CDG prediction. Furthermore, iMaxDriver correctly predicts many CDGs that are overlooked by all previously published tools. Due to this relative orthogonality, iMaxDriver can be considered a complementary approach to sequence-based CDG prediction methods.
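The idea of scoring each gene by its influence on a weighted regulatory network can be sketched as follows. This is an illustrative example, not the iMaxDriver implementation: the abstract does not name a diffusion model, so the sketch assumes an independent cascade model in which each edge weight is an activation probability, and it estimates a gene's influence as the average number of downstream genes activated over repeated simulations. The toy network, gene names, weights, and function names are hypothetical, and the pruning and weighting of the TRN from expression data is not shown.

```python
import random
from collections import defaultdict

def simulate_cascade(graph, seed_gene, rng):
    """Run one independent-cascade simulation from a single seed gene.

    Returns the number of genes activated, excluding the seed itself.
    """
    active = {seed_gene}
    frontier = [seed_gene]
    while frontier:
        next_frontier = []
        for gene in frontier:
            for target, prob in graph.get(gene, []):
                # Each newly active gene gets one chance to activate each target.
                if target not in active and rng.random() < prob:
                    active.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return len(active) - 1

def rank_genes_by_influence(edges, n_runs=500, seed=0):
    """Estimate each gene's expected spread and sort genes from high to low."""
    rng = random.Random(seed)
    graph = defaultdict(list)
    genes = set()
    for regulator, target, weight in edges:
        graph[regulator].append((target, weight))
        genes.update((regulator, target))
    influence = {}
    for gene in sorted(genes):
        total = sum(simulate_cascade(graph, gene, rng) for _ in range(n_runs))
        influence[gene] = total / n_runs
    return sorted(influence.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy network: (regulator, target, activation probability).
toy_edges = [
    ("TF1", "G1", 0.9), ("TF1", "G2", 0.8), ("TF1", "TF2", 0.7),
    ("TF2", "G3", 0.6), ("TF2", "G4", 0.5), ("G1", "G5", 0.2),
]

ranking = rank_genes_by_influence(toy_edges)
top_genes = [gene for gene, _ in ranking[:2]]
```

In this toy network the upstream regulator reaches the most genes and therefore ranks first. A real analysis would use the pruned, expression-weighted TRN and a cutoff on the ranked list chosen by the analyst.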