
Providing a suitable data model to discover the transmission of genetic diseases

Number of pages: 95 File Format: word File Code: 31010
Year: 2014 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: Computer - Data model - Database - Graph databases - Medical data - NoSQL databases - Transmission of genetic diseases - Unstructured databases


Summary of Providing a suitable data model to discover the transmission of genetic diseases

Master's Thesis in Computer Engineering

Software Specialization

Abstract

Advances in medical science are rapidly increasing the volume of medical data, and analyzing these data quickly and efficiently requires storing them electronically. Data on genetic diseases belong to this category, so suitable databases must be designed for storing and retrieving them. Because of the nature of genetic data and the problem of genetic disease transmission, the relationships between people and the analysis of those relationships are central. In this thesis we therefore use the graph data model, one of the unstructured (NoSQL) data models, to store and retrieve these data. We first determine the requirements and queries related to the problem and then design the graph data model on that basis. To evaluate the model, a team of genetics experts reviewed it and gave a favorable opinion on its use for genetic diseases. We also stored data on the genetic disease thalassemia in Neo4j and examined the model in terms of the efficiency of storing and retrieving information and the query times. Given the query times, and given that other data models lack support for relationships between people, this data model is considered suitable.

Keywords: genes, genetic diseases, graph databases, Neo4j, data model

Chapter One: Introduction

1-1 Preface

Medical data are being produced and disseminated rapidly. These data take different forms from those of the past, and as the field advances, the need for new approaches to managing them is felt far more than before. Storing them requires databases that support diverse data types and large data volumes and that can manage the data correctly and completely [14].

The data to be stored for genetic diseases are diverse. Because of the nature of these diseases, understanding how they are transmitted also requires recording the health status of patients' ancestors, and each investigation may add a new person to the family tree. The relationships between people in the database are essential for discovering the path of transmission. Structured databases are not a suitable option for supporting these needs and managing relationships between people and disease transmission, because they cannot support such varied types of data.

Unstructured (NoSQL) databases are better suited to supporting different types of data. There are several kinds of NoSQL database, but because relationships between people are so important for this kind of disease and it must be possible to add entities at any time, graph databases are the appropriate choice.

The nucleus of a human cell contains 46 chromosomes, or 23 pairs. Chromosomes consist of tightly coiled strands of DNA, which carry the genes. Each cell of the human body contains 25,000 to 35,000 genes [1]. Genes contain the information that determines human characteristics. They are composed of sequences of building blocks called nucleotides, which come in four types of base: adenine, guanine, cytosine and thymine. Each gene is therefore written with the four letters A, T, C and G, and this string is called its nucleotide sequence. The nucleotide sequences associated with different diseases differ from one another. For example, the insulin gene sequence is 333 characters long. The longest nucleotide sequence known so far is associated with Duchenne's disease; it is about 2.3 million characters long (roughly 2.3 megabytes when stored as text). Examples of genetic diseases include leprosy, skin cancers, intellectual disability, sickle cell anemia, phenylketonuria, thalassemia and the like [2].
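As a small illustration of the four-letter representation described above, the following Python sketch checks that a string is a valid nucleotide sequence and counts its bases. The function names and the 12-letter fragment are hypothetical and chosen only for illustration; they do not come from the thesis.

```python
# Minimal sketch: a nucleotide sequence as a string over the letters A, T, C, G.
# All names and the sample fragment are illustrative assumptions.
VALID_BASES = frozenset("ATCG")


def validate_sequence(seq: str) -> str:
    """Return the sequence upper-cased, or raise ValueError on invalid letters."""
    normalized = seq.strip().upper()
    invalid = set(normalized) - VALID_BASES
    if invalid:
        raise ValueError(f"invalid nucleotide letters: {sorted(invalid)}")
    return normalized


def base_counts(seq: str) -> dict[str, int]:
    """Count how often each of the four bases occurs in the sequence."""
    normalized = validate_sequence(seq)
    return {base: normalized.count(base) for base in "ATCG"}


# A short made-up fragment, not a complete gene.
fragment = validate_sequence("atggccctgtgg")
print(len(fragment), base_counts(fragment))
```

Storing a sequence as plain text in this way means its length in characters equals its length in nucleotides, which is how the lengths quoted above can be read.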

Genetic tests are performed in cases such as the following:

• A couple planning to start a family, where one partner or a close relative of one partner has a hereditary disease.

• A person who has a child with a severe birth defect.

• A child with a physical problem that may be genetic.

To perform these genetic tests, we first need the family tree: that of the couple when they are starting a family, of the parents for tests during pregnancy, and of the patient when a genetic patient is being examined. Once the family tree is known, we need to store information about the patient. Storing data on genetic diseases requires a database that supports all types of data well, and a data model that allows the data to be analyzed as well as stored.

One important consideration in choosing a data model is that, for medical data, we may record characteristics for one patient that are not needed for others. For example, blood test results may need to be stored for one patient while another patient never takes that test, or findings may emerge during an examination that were not foreseen. It is therefore better not to fix a general schema for the database in advance, so that any characteristic that is needed or encountered along the way can be added for the patient. On this basis we conclude that SQL databases are not suitable and that NoSQL databases are a better fit.

A second consideration is that, because the genetics of the patient's family members and earlier generations are needed, it must be possible to add new entities (earlier and later generations) to the database while the research is under way.

A third consideration is the route of transmission. It must be determined whether the disease was inherited from the father or the mother, and from which ancestor before them, and which of the male or female children may inherit it. For this reason we must design databases from which relationships between entities can be extracted. Relationships between entities can also be extracted from relational databases, but doing so is complicated and time-consuming because it requires writing nested procedures. Considering these three issues, we conclude that a graph database is the best choice for this type of disease.
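The three requirements above (per-patient attributes without a fixed schema, adding relatives at any time, and extracting relationships) can be sketched with a small in-memory graph in Python. This is only an illustration under stated assumptions: the class `PedigreeGraph`, its method names, the person identifiers and the blood test value are all hypothetical, and the thesis itself stores its data in Neo4j rather than in Python structures.

```python
from collections import deque


class PedigreeGraph:
    """Hypothetical in-memory family tree with schema-less person records."""

    def __init__(self):
        self.people = {}   # person id -> dict of arbitrary properties
        self.parents = {}  # child id -> set of parent ids

    def add_person(self, person_id, **properties):
        # Any property can be attached to any person; no schema is fixed up front.
        self.people.setdefault(person_id, {}).update(properties)
        self.parents.setdefault(person_id, set())

    def add_parent(self, child_id, parent_id):
        for pid in (child_id, parent_id):
            if pid not in self.people:
                raise KeyError(pid)
        self.parents[child_id].add(parent_id)

    def ancestors(self, person_id):
        """Return {ancestor_id: generations_back} using breadth-first search."""
        found = {}
        queue = deque([(person_id, 0)])
        while queue:
            current, depth = queue.popleft()
            for parent in self.parents[current]:
                if parent not in found:
                    found[parent] = depth + 1
                    queue.append((parent, depth + 1))
        return found


g = PedigreeGraph()
# Only the patient carries a blood test result (made-up value).
g.add_person("patient", status="sick", blood_test={"hemoglobin": 9.1})
g.add_person("mother", status="carrier")
g.add_person("father", status="carrier")
g.add_parent("patient", "mother")
g.add_parent("patient", "father")
# A relative discovered later is added without changing any existing record.
g.add_person("grandmother", status="carrier")
g.add_parent("mother", "grandmother")

carrier_ancestors = sorted(
    pid for pid in g.ancestors("patient")
    if g.people[pid].get("status") == "carrier"
)
print(carrier_ancestors)
```

The ancestor search is a single traversal regardless of how many generations are stored, whereas a relational design would typically need one self-join or nested procedure call per generation.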

In this thesis we design a database, based on the graph data model, that can store data of different types and volumes. The database must support operations on the stored data and must yield the results needed to investigate the transmission of genetic diseases, such as the path of transmission, the possibility of transmission to the next generation or to a particular sex in the next generation, and the percentage likelihood of transmission. In this database the entities, that is, the people, are stored in nodes. In addition to patients' general characteristics, the nodes hold all information about each person's diseases, conditions and symptoms. Further levels of the graph store earlier generations of the patients together with information about the specific disease under study. Edges represent the relationships between people: if a disease is transmitted from one person to another, a directed edge shows the transmission. Annotations can also be placed on the edges, such as the percentage probability that a specific disease is transmitted from one person to another.

1-3 Importance and necessity of conducting research

So far, several data models have been used to store medical information, but each has disadvantages that keep it from being ideal. One of these is the relational data model.
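The graph design outlined in this summary (people as nodes, directed edges for transmission, probabilities annotated on the edges) can be illustrated with a short Python sketch that lists transmission paths between two people. The function names, the person identifiers and the 0.5 edge values are hypothetical, and multiplying the edge values along a path assumes that each transmission step is independent, which is a simplification made here for illustration rather than a method stated in the thesis.

```python
def transmission_paths(edges, source, target, path=None):
    """Yield every simple directed path from source to target."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in edges.get(source, {}):
        if nxt not in path:  # skip nodes already on the path
            yield from transmission_paths(edges, nxt, target, path)


def path_probability(edges, path):
    """Multiply edge probabilities along a path (assumes independent steps)."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= edges[a][b]
    return prob


# Directed edges point from the transmitting person to the receiving person.
# The probabilities are made-up illustrative annotations.
edges = {
    "grandmother": {"mother": 0.5},
    "mother": {"patient": 0.5},
    "father": {"patient": 0.5},
}

paths = list(transmission_paths(edges, "grandmother", "patient"))
for p in paths:
    print(" -> ".join(p), path_probability(edges, p))
```

In a graph database such as Neo4j the same question would be expressed as a path query over relationships; the plain-Python version only shows the shape of the data and of the traversal.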
Contents & References of Providing a suitable data model to discover the transmission of genetic diseases

List:

Summary

Chapter One: Introduction

1-1 Preface

1-2 Statement of the problem

1-3 Importance and necessity of conducting research

1-4 Aspects of novelty and innovation in the research

1-5 Specific research objectives

1-6 Overview of the thesis structure

Chapter Two: Concepts

2-1 Introduction

2-2 What is a data model?

2-2-1 Structured data models

2-2-1-1 Relational data model

2-2-1-2 Object-oriented data model

2-2-1-3 Object-relational data model

2-2-2 Unstructured data models

2-2-2-1 Key/value data model

2-2-2-2 Document-oriented data model

2-2-2-3 Columnar data model

2-2-2-4 Graph databases

2-3 Data management

2-4 Medical data

2-5 Applications of medical data management

2-6 Genetic diseases

2-7 Transmission of genetic diseases

2-8 Genetic tests

Chapter Three: Background of the research

3-1 Introduction

3-2 Relational data model for epidemic diseases

3-3 Object-relational data model for hospitals

3-4 Graph data model for epidemic diseases

Chapter Four: Proposed method

4-1 Introduction

4-2 Entities

4-3 Attributes of each entity

4-3-1 Healthy human

4-3-2 Human carrier

4-3-3 Treated human

4-3-4 Sick human

4-3-5 Doctor

4-3-6 Disease

4-3-7 Symptoms

4-3-8 Treatment methods

4-3-9 Medicine

4-4 Values stored on the edges

4-5 Determining the capabilities of the data model

4-5-1 Create

4-5-2 Add

4-5-3 Update

4-5-4 Delete

4-5-5 Queries

4-5-5-1 Queries involving one node

4-5-5-2 Queries involving two nodes

4-5-5-3 Queries involving more than two nodes

4-6 Data model design

4-6-1 ER design

4-6-2 Graph model design

Chapter Five: Evaluation

5-1 Introduction

5-2 First method: focus group

5-2-1 Introducing the focus group

5-2-3 Focus group methodology

5-2-4 Evaluation by the focus group

5-3 Second method: practical implementation of the database

5-3-1 Neo4j software

5-3-2 Required data

5-3-3 Storing data in the Neo4j database

5-4 Results

Chapter Six: Summary and future work

6-1 Summary and future work

References

Sources:

Persian sources

1- Asad, Mohammad Taghi. Basics of genetics. Dunya, 1380.

2- Jabarpour Fanadi, Hosseinpour Faizi. Human genetics and human genetic diseases. Art and culture, 1366.

3- Haqjo, Mustafa. Scientific-applied information bank. First volume, fundamental concepts, third edition. Iran University of Science and Technology, 1385.

4- Haqjo, Mustafa; Safai, Ali Asghar. Scientific-applied information bank. Volume II, advanced concepts, third edition. Iran University of Science and Technology, 1385.

5- Davrpanah, Ahmed; Mehdi Qolikhan Ramin. Management of medical documents. Ministry of Health Research Deputy, 1372.

6- Sharifi Bidgoli, Mina, and others. "A graph storage system for network epidemic data storage". The 19th Annual National Conference of the Iranian Computer Association, Shahid Beheshti University, Tehran, 2013.

English sources

7- Bryan, T., 2013, Literature Survey of Graph Databases, SYSTAP, LLC.

8- Canada Health Infoway. The emerging benefits of electronic medical record use in community-based care. PwC.

9- Date, C.J., 2003, An Introduction to Database Systems.

10- El-Sappagh, Sh., 2012, Electronic Health Record Data Model Optimized for Knowledge Discovery. IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 5, No 1.

11- http://www.neo4j.org/learn/neo4j

12- Lee, K.K.-Y., Tang, W.-C., Choi, K.-S., 2013, Alternatives to relational database: Comparison of NoSQL and XML approaches for clinical data storage, Computer Methods and Programs in Biomedicine 110, 99-109.

13- Lahr, G., et al., 2007, A Dominant β0-Thalassemia-Like Phenotype In A German Caucasian Family Is Associated With Mild Chronic Hemolytic Anemia But Influenced In Severity By Co-Inherited Genetic Factors, Haematologica, September, 92: 1264-1265; doi:10.3324/haematol.11383

14- Robinson, I., Webber, J., Eifrem, E., 2013, Graph Databases, O'Reilly Media.

15- Tayie, S., 2005, Research Methods and Writing Research Proposals, Cairo University.

16- Vaish, G., 2013, Getting Started with NoSQL, Packt Publishing Ltd.
