Inference of gene regulatory networks from Microarray time series data by dynamic Bayesian networks

Number of pages: 87
Year: 2012 University Degree: Master's degree Category: Computer Engineering
  • Summary of Inference of gene regulatory networks from Microarray time series data by dynamic Bayesian networks

    Master's Thesis in Computer Engineering-Artificial Intelligence

    Abstract

    Inference of Gene Regulatory Networks from Microarray Time Series Data by Dynamic Bayesian Networks

    Gene regulatory networks are sets of gene-gene interactions that establish cause-and-effect relationships among gene activities. Knowledge of these networks is important for understanding biological processes and can lead to new treatments for complex diseases and to the development of effective drugs. Many methods have been proposed for inferring gene regulatory networks. Among them, dynamic Bayesian networks have particular advantages that have attracted considerable attention. Despite the research done in this field, reverse engineering gene regulatory networks with dynamic Bayesian networks is far from straightforward. The number of samples available for training the model is often much smaller than the number of unknowns in the problem. The high complexity of these models and their low accuracy are among their most important shortcomings.

    One of the main ways to increase the accuracy of inferred networks is to use prior knowledge about gene regulatory networks. A major source of this prior knowledge is what is known about their overall structure. Research shows that the number of edges in these networks is small. There is also considerable evidence that the out-degree distribution of gene regulatory networks follows a power law; in other words, these networks are scale-free in out-degree. Despite this evidence, existing methods for learning dynamic Bayesian networks either treat such networks as having a random structure or only control the complexity of the network. The method proposed in this thesis has polynomial time complexity and can be used to infer networks with a large number of nodes. Experiments comparing the proposed algorithm with previous network learning methods show that, when the network to be inferred is scale-free, the proposed algorithm significantly improves the quality of the inferred network, especially when the training data are insufficient.

    Keywords: dynamic Bayesian networks, gene regulatory networks, scale-free structure

    Chapter One

    Introduction

     

    In every cell of a living organism, thousands of genes interact at every moment to make complex biological processes possible. A gene regulatory network is a set of DNA segments in the cell that interact indirectly (through the RNA or proteins they produce) with each other and with other substances in the cell, thereby controlling the rate at which genes are transcribed into mRNA. Each mRNA molecule is translated into a specific protein with a specific function. Some proteins serve only to turn genes on or off. Such proteins are called transcription factors and play the main role in the gene regulatory network. In other words, a gene regulatory network is a set of gene-gene interactions that establishes cause-and-effect relationships among gene activities. Knowledge of these networks is important for understanding biological processes and can lead to new treatments for complex diseases and to the development of effective drugs. Therefore, the detection and reverse engineering of gene regulatory networks has become one of the most important research fields [1]. Microarray technology makes it possible to measure simultaneously the mRNA expression levels of thousands of genes and can provide information about relationships between genes at the genome level [2]. However, there is no simple way to infer gene regulatory networks from microarray data. In most cases the number of unknowns is very large while only a small amount of data is available. In addition, the measurement error is often high, and measurements are missing for some variables.

    Microarray data can be divided into two types: static and time series. Static data are a snapshot of gene expression at a specific moment and under a specific condition. Time series data measure gene expression over the course of an intracellular process and therefore reflect its dynamics. Most of the early methods used to analyze microarray time series data were actually designed for static data. In the last few years, methods have been proposed specifically for time series data; they address the problems particular to such data and exploit its unique features. However, working with time series data requires more care than working with static data, and reverse engineering gene regulatory networks is more difficult in this setting.
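    To make the time series case concrete, the sketch below shows one common way such data can be arranged: a series of T time points is viewed as T-1 consecutive transitions. The function name `lagged_pairs`, the `lag` parameter and the toy values are assumptions made for illustration only and are not taken from the thesis.

```python
from typing import List, Tuple


def lagged_pairs(series: List[List[float]],
                 lag: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Pair each time point with the one `lag` steps later.

    series[t][g] is the expression level of gene g at time point t.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]


# Toy data: 4 time points, 3 genes (values are made up for illustration).
toy_series = [
    [0.1, 0.9, 0.3],
    [0.2, 0.7, 0.4],
    [0.4, 0.5, 0.6],
    [0.7, 0.2, 0.8],
]

pairs = lagged_pairs(toy_series)
print(len(pairs))  # one transition fewer than the number of time points
```

    A static data set has no such ordering between samples, so no transitions can be formed from it.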

    Many methods have been proposed for inferring gene regulatory networks, the most important of which are Boolean networks [3], probabilistic Boolean networks [4], differential equations [5] and Bayesian networks [6]. Among these, Bayesian networks, which express cause-and-effect relationships between variables through probabilistic relationships, have attracted a lot of attention. Because microarray data are noisy, probabilistic models can greatly improve modeling performance. Despite the relative success of Bayesian networks, they cannot contain cycles, which limits their usefulness in many cases because feedback loops are common in real gene regulatory networks. Therefore, when time series data are available, dynamic Bayesian networks become a suitable option for modeling [7,8,9]. Dynamic Bayesian networks are a more general form of Bayesian networks that can model data with time delays. They have three particular advantages. First, they directly represent cause-and-effect relationships between variables and can use the available information about them. Second, they are stochastic. Gene regulation processes are stochastic, and even if they were inherently deterministic, the large measurement error would make them appear random to the observer. Third, they can track the change of variables over time.
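    The point about feedback loops can be illustrated with a small sketch. A two-gene feedback loop is cyclic as a static graph, but becomes acyclic once every edge is drawn from time t to time t+1. The gene names, the helper names `is_acyclic` and `unroll`, and the first-order (single time step) assumption are illustrative choices, not the thesis's implementation.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def is_acyclic(edges: List[Edge]) -> bool:
    """Return True if the directed graph given by `edges` has no cycle."""
    predecessors: Dict[str, Set[str]] = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    try:
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


def unroll(edges: List[Edge]) -> List[Edge]:
    """Turn each edge A -> B into A[t] -> B[t+1] (first-order, two time slices)."""
    return [(f"{source}[t]", f"{target}[t+1]") for source, target in edges]


# Hypothetical feedback loop: geneA regulates geneB and geneB regulates geneA.
feedback_loop = [("geneA", "geneB"), ("geneB", "geneA")]

print(is_acyclic(feedback_loop))          # False: not a valid static Bayesian network
print(is_acyclic(unroll(feedback_loop)))  # True: valid once unrolled over time
```

    Because every unrolled edge points forward in time, the unrolled graph is acyclic whatever loops the original gene-level graph contains.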

    Despite these features, reverse engineering gene regulatory networks from time series data with dynamic Bayesian networks is far from straightforward. The number of samples available for training the model is often much smaller than the number of unknowns in the problem [10]. The measured values also contain a lot of error, and in some cases measurements are missing for some variables. At present, dynamic Bayesian networks are mostly applied in experiments with a small number of genes or with simulated data. The high complexity of these models and their low accuracy are among their most important shortcomings. More research is needed to obtain models that can handle large data sets and to improve the performance of the resulting models.
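    A rough count gives a feel for the gap between unknowns and samples. The sketch below counts how many candidate regulator sets a single gene can have when the number of regulators is bounded. The bound `max_parents` and the numbers used (100 genes, at most 3 regulators, 20 time points) are assumptions chosen for illustration, not figures from the thesis.

```python
from math import comb


def candidate_parent_sets(n_genes: int, max_parents: int) -> int:
    """Number of regulator sets of size 0..max_parents that one gene can have
    when the regulators are chosen among n_genes genes."""
    return sum(comb(n_genes, size) for size in range(max_parents + 1))


# Illustrative values only.
n_genes, max_parents, n_time_points = 100, 3, 20

per_gene = candidate_parent_sets(n_genes, max_parents)
print("candidate regulator sets per gene:", per_gene)
print("candidate choices over all genes:", per_gene * n_genes)
print("transitions available for training:", n_time_points - 1)
```

    Even with a small bound on the number of regulators, the candidate structures outnumber the available transitions by several orders of magnitude, which is why additional knowledge about the network structure is valuable.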

    One of the main ways to increase the accuracy of inferred networks and to compensate for the lack of training data during network learning is to use prior knowledge about gene regulatory networks [11]. A main source of this prior knowledge is what is known about the general structure of these networks. Research shows that they are sparsely connected; in other words, the number of edges in these networks is small. There is also considerable evidence that the out-degree distribution of gene regulatory networks follows a power law [12,13]; that is, these networks are scale-free in out-degree. Their in-degree, by contrast, follows a Poisson distribution with a low mean [14,15,16].
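    The structural properties described above can be mimicked with a simple generator, sketched here under stated assumptions. Each gene draws its out-degree from a truncated discrete power law, and its targets are then chosen uniformly at random, which yields in-degrees that are approximately Poisson in a sparse network. The exponent `gamma`, the cap `k_max`, the minimum out-degree of 1 and the allowance of self-regulation are simplifications chosen for illustration; this is not the network generation method used in the thesis.

```python
import random
from collections import Counter
from typing import List, Tuple


def sample_out_degree(rng: random.Random, gamma: float, k_max: int) -> int:
    """Draw k in 1..k_max with probability proportional to k ** -gamma."""
    degrees = list(range(1, k_max + 1))
    weights = [k ** -gamma for k in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]


def scale_free_out_network(n_genes: int, gamma: float = 2.5, k_max: int = 10,
                           seed: int = 0) -> List[Tuple[int, int]]:
    """Return directed edges (regulator, target) with power-law out-degrees.

    Targets are picked uniformly at random without replacement, so the
    in-degree of each gene is approximately Poisson when the network is sparse.
    """
    if not 1 <= k_max <= n_genes:
        raise ValueError("k_max must be between 1 and n_genes")
    rng = random.Random(seed)
    edges = []
    for regulator in range(n_genes):
        out_degree = sample_out_degree(rng, gamma, k_max)
        for target in rng.sample(range(n_genes), out_degree):
            edges.append((regulator, target))
    return edges


edges = scale_free_out_network(n_genes=200)
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)
print("edges:", len(edges))
print("max out-degree:", max(out_degree.values()))
print("max in-degree:", max(in_degree.values()))
```

    In a typical run a few regulators have many targets while most have one or two, and no gene collects many regulators.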

  • Contents & References of Inference of gene regulatory networks from Microarray time series data by dynamic Bayesian networks

    List:

    Chapter One: Introduction

    The need to do the work

    Overview of the thesis chapters

    Chapter Two: Research background

    2-1- Introduction

    2-2- Biological basics

    2-2-1- Genes

    2-2-2- Gene expression

    2-2-3- Gene regulatory networks

    2-3- Methods of learning gene regulatory networks

    2-3-1- Methods based on clustering

    2-3-2- Methods based on regression

    2-3-3- Methods based on mutual information

    2-3-4- Method

    2-3-5- Methods based on system theory

    2-3-6- Bayesian methods

    Chapter Three: Proposed method

    3-1- Introduction

    3-2- Dynamic Bayesian networks

    3-3- Learning dynamic Bayesian networks

    3-3-1- Bayesian scoring methods

    3-3-1-1- Scoring by K2 method

    3-3-1-2- Scoring by BDe method

    3-3-2- Scoring methods based on information theory

    3-3-2-1- Scoring by log-likelihood (LL) method

    3-3-2-2- Scoring by BIC method

    3-3-2-3- Scoring by AIC method

    3-3-2-4- Scoring by MIT method

    3-3-3- Time complexity of learning dynamic Bayesian networks

    3-4- Random networks and scale-free networks

    3-5- Proposed method

    Chapter Four: Experimental results

    4-1- Introduction

    4-2- Scale-free network generation methods

    4-3- Accuracy measurement methods for inferred networks

    4-4- The first experiment: Using the full search method

    4-5- The second experiment: A closer look at the performance of the proposed method

    4-6- The third experiment: Using the greedy search

    4-7- The fourth experiment: Recovering a part of the gene regulatory network in yeast

    4-8- The fifth experiment: Performance of the presented method in recovering random networks

    Chapter Five: Summary

    5-1- Conclusion

    5-2- Suggestions for future work

    Research sources

    English

    Source: English

    [1] Sima, Chao, Jianping Hua, and Sungwon Jung. "Inference of gene regulatory networks using time-series data: a survey." Current genomics 10, no. 6 (2009): 416.

    [2] Pham, Tuan D., Christine Wells, and Denis Crane. "Analysis of microarray gene expression data." Current bioinformatics 1, no. 1 (2006): 37-53.

    [3] Akutsu, Tatsuya, Satoru Miyano, and Satoru Kuhara. "Identification of genetic networks from a small number of gene expression patterns under the Boolean network model." In Pacific Symposium on Biocomputing, vol. 4, pp. 17-28. Maui, Hawaii: World Scientific, 1999.

    [4] Shmulevich, Ilya, Edward R. Dougherty, Seungchan Kim, and Wei Zhang. "Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks." Bioinformatics 18, no. 2 (2002): 261-274.

    [5] De Hoon, Michiel, Seiya Imoto, Kazuo Kobayashi, Naotake Ogasawara, and Satoru Miyano. "Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations." In Biocomputing 2003: Proc. Pacific Symposium, vol. 8, pp. 17-28. 2002.

    [6] Friedman, Nir, Michal Linial, Iftach Nachman, and Dana Pe'er. "Using Bayesian networks to analyze expression data." Journal of computational biology 7, no. 3-4 (2000): 601-620.

    [7] Perrin, Bruno-Edouard, Liva Ralaivola, Aurelien Mazurie, Samuele Bottani, Jacques Mallet, and Florence d'Alche-Buc. "Gene networks inference using dynamic Bayesian networks." Bioinformatics 19, no. suppl 2 (2003): ii138-ii148.

    [8] Zou, Min, and Suzanne D. Conzen. "A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data." Bioinformatics 21, no. 1 (2005): 71-79.

    [9] Kim, Sun Yong, Seiya Imoto, and Satoru Miyano. "Inferring gene networks from time series microarray data using dynamic Bayesian networks." Briefings in bioinformatics 4, no. 3 (2003): 228-235.

    [10] Husmeier, Dirk. "Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks." Bioinformatics 19, no. 17 (2003): 2271-2282.

    [11] Hecker, Michael, Sandro Lambeck, Susanne Toepfer, Eugene van Someren, and Reinhard Guthke. "Gene regulatory network inference: Data integration in dynamic models—A." Biosystems 96 (2009): 86-103.

    [12] Shaw, Sandy. "Evidence of scale-free topology and dynamics in gene regulatory networks." In Proceedings of the ISCA 12th International Conference on Intelligent and Adaptive Systems and Software Engineering, pp. 37-40. 2003.

    [13] Featherstone, David E., and Kendal Broadie. "Wrestling with pleiotropy: genomic and topological analysis of the yeast gene expression network." Bioessays 24, no. 3 (2002): 267-274.

    [14] Babu, M. Madan, Nicholas M. Luscombe, L. Aravind, Mark Gerstein, and Sarah A. Teichmann. "Structure and evolution of transcriptional regulatory networks." Current opinion in structural biology 14, no. 3 (2004): 283-291.

    [15] Klemm, Konstantin, and Stefan Bornholdt. "Topology of biological networks and reliability of information processing." Proceedings of the National Academy of Sciences of the United States of America 102, no. 51 (2005): 18414-18419.
