
Investigating dynamic data replication algorithms in grid networks and presenting a new algorithm based on parameters of file size, available bandwidth and geographical distance.

Number of pages: 81 File Format: word File Code: 31037
Year: 2014 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: Algorithm - Computer networks - Grid - Distributed data - Dynamic data replication algorithms - Grid networks - Static replication


Summary of Investigating dynamic data replication algorithms in grid networks and presenting a new algorithm based on parameters of file size, available bandwidth and geographical distance.

Master's Thesis in Computer Engineering

Software Orientation

Abstract

The growing need to use distributed data in computer networks is widely recognized. A grid is formed by bringing together a large number of computing and storage resources. In recent years grid technology has grown significantly, to the point that it is now used in much scientific research and experimentation. The major challenges in a data grid are high availability, efficiency, and low bandwidth consumption. Data replication is a method for addressing problems such as efficient data access and high availability. In a replicated environment, increasing the number of file replicas improves data locality and, with it, system efficiency.

In this thesis, different methods of dynamic data replication in data grid networks are investigated, and a dynamic data replication algorithm for the grid is proposed. By exploiting the factors that affect data replication, the algorithm reduces task execution time, bandwidth consumption, and the cost of maintaining replicas. The algorithm has been implemented in the OptorSim simulator, and the simulation results show improvements in parameters such as average execution time, number of replicas, and efficiency.

Keywords: data grid, data replication, replacement, access pattern, geographical distance, access cost
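The abstract does not state the algorithm's cost formula. Purely as an illustration of how the three parameters in the title (file size, available bandwidth, geographical distance) could be combined when choosing which replica to read, the sketch below assumes a cost equal to transfer time (size divided by bandwidth) plus a propagation delay proportional to distance. The `Site` type, the `access_cost` and `best_source` functions, and the 5 µs/km fiber propagation constant are assumptions for this sketch, not the thesis's method.

```python
from dataclasses import dataclass

# Assumed propagation delay in optical fiber: light travels at roughly
# 200,000 km/s in glass, i.e. about 5 microseconds per km.
PROPAGATION_S_PER_KM = 5e-6


@dataclass
class Site:
    """A hypothetical replica holder as seen from the requesting site."""
    name: str
    bandwidth_mbps: float  # available bandwidth to the requester, megabits/s
    distance_km: float     # geographical distance to the requester


def access_cost(file_size_mb: float, site: Site) -> float:
    """Estimated seconds to fetch a file of file_size_mb megabytes from site.

    Illustrative model only: transfer time (size / bandwidth) plus a
    one-way propagation delay proportional to geographical distance.
    """
    transfer_s = file_size_mb * 8 / site.bandwidth_mbps  # megabytes -> megabits
    propagation_s = site.distance_km * PROPAGATION_S_PER_KM
    return transfer_s + propagation_s


def best_source(file_size_mb: float, holders: list[Site]) -> Site:
    """Return the replica holder with the lowest estimated access cost."""
    return min(holders, key=lambda s: access_cost(file_size_mb, s))


if __name__ == "__main__":
    holders = [Site("site-A", 100.0, 3000.0), Site("site-B", 1000.0, 50.0)]
    print(best_source(500.0, holders).name)  # site-B: about 4 s versus about 40 s
```

With large files the bandwidth term dominates; the distance term only decides between holders whose bandwidths are similar, which is one reason a real algorithm might weight the factors differently.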

Chapter One

Introduction

1-1. Introduction

Over time, various types of distributed systems [1] have been designed and implemented; grid systems are one of them [2]. Grid technology is distinguished by its focus on large-scale resource sharing. Data replication is a data grid service designed to make data access easier and faster.

2-1. Statement of the problem

Today, large data sets are becoming an important part of shared resources in many fields. In areas such as high-energy physics, bioinformatics, earth observation, global climate change, image processing, and data mining, data volumes are measured in terabytes and in some cases petabytes. Researchers and scientists access this information using sophisticated computing equipment, and both the researchers and the computing and storage resources are distributed around the world.

The sheer volume of data and computation creates new problems in data access, processing, and distribution: large data sets, many geographical locations, and complex computations are all involved at once, which makes building a management infrastructure difficult. The data grid is a suitable solution to these problems; it is an architecture for the distributed management and analysis of scientific data sets.

A grid is formed by bringing together a large number of computing and storage resources. The main problem that motivated grid technology was coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The goal of this sharing is not merely exchanging files but direct access to computers, software, data, and other resources, and the grid provides easy access to all of them.

3-1. The importance of the data grid

The main motivation for designing the data grid was to meet the needs of users working with large amounts of data, to serve geographically distributed users and resources, and to support computationally intensive analyses [1].

Accessing such large, widely distributed data is slow because of network latency and limited bandwidth, and the complexity of the system increases as the grid grows. The major challenge in the data grid is therefore the need for high availability, efficiency, and savings in network traffic. The data grid is designed to handle large data sets, geographically distributed users and resources, and computational analysis, and the architecture also targets complex operations over large areas and heterogeneous environments.

Managing such a large amount of distributed data centrally is inefficient because it imposes a heavy load on the central server. Besides concentrating all storage on that server, a centralized design introduces a single point of failure and a bottleneck. To avoid these problems, the data must be replicated and distributed across different sites of the distributed system. The grid retrieves data from the nearest site and replicates it to the requesting sites.
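The single-point-of-failure argument can be quantified with a simple model. Assuming (a simplification, not a claim from the thesis) that each site holding a replica is up independently with the same probability p, a file with k replicas is unavailable only when all k sites are down at once, so its availability is 1 - (1 - p)^k. The function name `replica_availability` is hypothetical.

```python
def replica_availability(site_availability: float, replicas: int) -> float:
    """Probability that at least one replica of a file is reachable.

    Assumes every hosting site is up independently with the same
    probability site_availability; real grid failures can be correlated.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be in [0, 1]")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # The file is unavailable only if every hosting site is down at once.
    return 1.0 - (1.0 - site_availability) ** replicas


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(replica_availability(0.9, k), 4))
```

With sites that are up 90% of the time, one, two, and three replicas give availabilities of 0.9, 0.99, and 0.999: each extra replica cuts the unavailability by a factor of ten under the independence assumption.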

With the data grid, large amounts of data can be stored and later retrieved at different points throughout the grid. The grid's efficiency then depends on the available bandwidth and network latency: low bandwidth between where data is stored and where it is processed makes the grid inefficient.
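As a rough worked example (a simplification that ignores protocol overhead and congestion), the time to move a file of size $S$ over a path with latency $L$ and available bandwidth $B$ is approximately

$$T \approx L + \frac{S}{B}$$

For a 10 GB file ($S = 8 \times 10^{4}$ Mbit in decimal units) and $L = 0.1$ s:

$$B = 100\ \text{Mbit/s}: \quad T \approx 0.1 + \frac{8 \times 10^{4}}{100} = 800.1\ \text{s}$$

$$B = 1000\ \text{Mbit/s}: \quad T \approx 0.1 + \frac{8 \times 10^{4}}{1000} = 80.1\ \text{s}$$

For large files the $S/B$ term dominates, so the bandwidth between the storage site and the processing site largely determines access time.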

4-1. Possible solutions

In a data grid, data access time depends on the communication bandwidth, and low latency is the main requirement for fast access. Several approaches are used to reduce access time. One is job scheduling: a good scheduler minimizes data transfer costs by running each job at the right site. Another is the replication mechanism [3], which speeds up access by creating copies (replicas) [4] of data; to improve efficiency, multiple copies of files can be stored across the grid [2].
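The scheduling idea ("running the job in the right place") can be sketched as picking the site where staging the job's input files is cheapest. This is a minimal illustration under stated assumptions: files already replicated at a site cost nothing, every other file is fetched from the holder with the highest bandwidth to that site, and transfer time is size divided by bandwidth. The function names and the data layout are hypothetical.

```python
def staging_time(
    site: str,
    input_files: list[str],
    file_size_mb: dict[str, float],
    replica_sites: dict[str, set[str]],
    bandwidth_mbps: dict[tuple[str, str], float],
) -> float:
    """Seconds needed to bring all of a job's input files to `site`.

    Files already replicated at `site` cost nothing; any other file is
    fetched from whichever holder has the highest bandwidth to `site`.
    """
    total = 0.0
    for f in input_files:
        holders = replica_sites[f]
        if site in holders:
            continue
        best_bw = max(bandwidth_mbps[(h, site)] for h in holders)
        total += file_size_mb[f] * 8 / best_bw  # megabytes -> megabits
    return total


def schedule(candidate_sites, input_files, file_size_mb, replica_sites, bandwidth_mbps):
    """Return the candidate site where staging the job's input data is cheapest."""
    return min(
        candidate_sites,
        key=lambda s: staging_time(s, input_files, file_size_mb, replica_sites, bandwidth_mbps),
    )


if __name__ == "__main__":
    sites = ["A", "B", "C"]
    sizes = {"f1": 800.0, "f2": 200.0}
    replicas = {"f1": {"A"}, "f2": {"B"}}
    # Hypothetical uniform 100 Mbit/s links between every pair of sites.
    bw = {(a, b): 100.0 for a in sites for b in sites if a != b}
    print(schedule(sites, ["f1", "f2"], sizes, replicas, bw))
```

Here site A already holds the large file, so only the 200 MB file moves (16 s), against 64 s at B and 80 s at C; the scheduler sends the job to A.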

5-1. Proposed solution

In fact, the complexity of the structure increases as the grid grows. High data availability is a major challenge in the grid: users' computing applications involve enormous amounts of data, and storing a local copy of all of it at every site is very expensive and impractical. Coping with network delays and limited storage capacity [5] at different sites while providing high availability is a difficult challenge. Data replication is one of the main methods for meeting this access challenge; it improves availability, reduces bandwidth consumption, increases fault tolerance, and improves scalability and response time [3-9]. When data is replicated, copies of data files are placed at different locations in the data grid, which can save a large amount of bandwidth compared with keeping the data at a single site. Data replication is therefore a good trade-off between available storage and available bandwidth for ensuring constant, fast access to data [10]. It is a common way to improve data access efficiency in distributed systems, because creating replicas reduces both bandwidth consumption and access latency. In other words, the main goal of data replication algorithms is to make data reads efficient by serving them from suitable nodes [6].
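The trade-off between storage and bandwidth can be made concrete with a simplified model, assuming a remote site reads a read-only file of size $S$ a total of $n$ times and that a local replica, if made, is created on the first read:

$$\text{WAN traffic without a replica} = nS, \qquad \text{with a replica} = S, \qquad \text{saved} = (n-1)S$$

The price is $S$ of local storage. For $S = 2$ GB and $n = 10$, wide-area traffic drops from 20 GB to 2 GB, saving 18 GB of transfers in exchange for 2 GB of disk. If the file were updated, keeping the replica consistent would add traffic that this model ignores.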

In addition, creating replicas and distributing them among different sites can improve data accessibility, reliability, system scalability, and load balancing [11].

The main benefits of replication are [12]:

1. Better availability: When one node fails, the system can still access the data from another node that holds a replica.

2. Better performance: Because the data is replicated among multiple nodes, the user can obtain it from the nearest node or from the node with the lightest workload.

Data replication techniques can be classified into two main categories: static replication [7] and dynamic replication [8]. In static replication, the number of replicas and their host nodes are chosen at the start, and no further replicas are created afterward; a replica remains on its site until the user deletes it or its lifetime expires. In contrast, a dynamic strategy can create replicas on new nodes according to storage capacity and bandwidth, adapt to changing conditions, and delete replicas that are no longer needed given the current requests. The weakness of static replication shows when the nodes' access patterns change frequently, because static algorithms cannot adapt to the new conditions.
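The difference between static and dynamic replication can be illustrated with a toy per-site policy. This is a hypothetical sketch, not the algorithm proposed in the thesis: a remote file is replicated locally once it has been requested `threshold` times, and when storage runs out the least recently used replica is deleted. A static scheme would correspond to filling `replicas` once at start-up and never running the replication or eviction branches. The class name and parameters are assumptions.

```python
from collections import OrderedDict


class DynamicReplicaStore:
    """Toy dynamic replication policy for a single site (hypothetical).

    A remote file is replicated locally once it has been requested
    `threshold` times; when storage is full, the least recently used
    replica is evicted to make room.
    """

    def __init__(self, capacity_mb: float, threshold: int):
        self.capacity_mb = capacity_mb
        self.threshold = threshold
        self.replicas: OrderedDict[str, float] = OrderedDict()  # file -> size, LRU first
        self.remote_hits: dict[str, int] = {}  # remote read counts per file

    def used_mb(self) -> float:
        return sum(self.replicas.values())

    def access(self, name: str, size_mb: float) -> bool:
        """Record a read; return True if it was served from a local replica."""
        if name in self.replicas:
            self.replicas.move_to_end(name)  # mark as most recently used
            return True
        # Served remotely; count the request and maybe replicate afterwards.
        count = self.remote_hits.get(name, 0) + 1
        self.remote_hits[name] = count
        if count >= self.threshold and size_mb <= self.capacity_mb:
            # Delete least recently used replicas until the new one fits.
            while self.used_mb() + size_mb > self.capacity_mb:
                self.replicas.popitem(last=False)
            self.replicas[name] = size_mb
            del self.remote_hits[name]
        return False


if __name__ == "__main__":
    store = DynamicReplicaStore(capacity_mb=1000.0, threshold=2)
    for f in ["a", "a", "a", "b", "b", "a"]:
        print(f, store.access(f, 600.0), list(store.replicas))
```

In this run, "a" is replicated after its second request and served locally on the third; when "b" becomes popular, the 1000 MB capacity forces "a" out, so the next request for "a" goes remote again. This adaptation to a shifting access pattern is exactly what a static placement cannot do.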
Contents & References of Investigating dynamic data replication algorithms in grid networks and presenting a new algorithm based on parameters of file size, available bandwidth and geographical distance.

Table of Contents:

Chapter 1. Introduction

1-1. Introduction

2-1. Statement of the problem

3-1. Importance of the data grid

4-1. Possible solutions

5-1. Proposed solution

6-1. Thesis questions

6-1. Objectives of the thesis

7-1. Thesis structure

Chapter 2. A review of previous work

2-1. Introduction

2-2. Data replication techniques

2-3. A framework for data replication

Chapter 3. Dynamic replication algorithm in the data grid using prefetching

3-1. Introduction

3-2. PDDRA architecture

3-3. Steps of the PDDRA algorithm

3-3-1. Phase 1: Storing the file access pattern

3-4. Phase 2 of the prefetching algorithm

3-4-1. The manager's responsibility for updating replicas

3-4-2. Structure of the local server and grid sites

3-5. Phase 3: Replacement

3-5-1. PDDRA replacement algorithm

3-6. Conclusion

Chapter 4. The proposed algorithm

4-1. Introduction

4-2. The proposed data replication algorithm

4-3. Algorithm description

4-3-1. First phase: file request and replication

4-3-2. Second phase: replacement

Chapter 5. Algorithm simulation

5-1. Introduction

5-2. Algorithm simulation

5-2-1. Access patterns

5-2-2. Configuration files for OptorSim settings

5-3. Simulation results

5-3-1. Fuzzy system implementation

5-4. Performance evaluation

6-4. Network efficiency

Chapter 6. Conclusions and suggestions

6-1. Introduction

6-2. Proposed solution

6-3. Conclusion

5-2. Future work

References

References:

[1] Ghilavizadeh Z., Mirabedini S. J., Harounabadi A., A New Fuzzy Optimal Data Replication Method for Data Grid, Management Science Letters Journal, (2013) 927–936.

[2] Chang R. S., Chen P. H., Complete and fragmented replica selection and retrieval in data grids, Future Generation Computer Systems, 23 (2007) 536–546.

[3] Foster I., The grid: A new infrastructure for 21st century science, (2002).

[4] Ranganathan K., Foster I., Design and evaluation of dynamic replication strategies for a high performance data grid, in: International Conference on Computing in High Energy and Nuclear Physics, vol. 2001, (2001).

[5] Lamehamedi H., Szymanski B., Shentu Z., Deelman E., Data replication strategies in grid environments, 5th International Conference on Algorithms and Architectures for Parallel Processing, (2002) 378.

[6] Ranganathan K., Iamnitchi A., Foster I., Improving data availability through dynamic model-driven replication in large peer-to-peer communities, Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, (2002) 376.

[7] Rahman R. M., Barker K., Alhajj R., Replica placement in data grid: Considering utility and risk, Information Technology: Coding and Computing, 1 (2005) 354–359.

[8] Vazhkudai S., Tuecke S., Foster I., Replica selection in the Globus data grid, First IEEE International Symposium on Cluster Computing and the Grid, (2001) 106.

[9] Stockinger H., Samar A., Holtman K., Allcock B., Foster I., Tierney B., File and object replication in data grids, Cluster Computing, 5 (3) (2002) 305–314.

[10] Yuan Y., Wu Y., Yang, Yu F., Dynamic data replication based on local optimization principle in data grid, Sixth International Conference on Grid and Cooperative Computing, (2007) 815–822.

[11] Foster I., Ranganathan K., Design and evaluation of dynamic replication strategies for a high performance Data Grid, in: Proceedings of International Conference on Computing in High Energy and Nuclear Physics, (2001) 20.

[12] Meroufel B., Belalem G., Dynamic Replication Based on Availability and Popularity in the Presence of Failures, Journal of Information Processing Systems (JIPS), (2012) 263–278.

[13] Cibej U., Slivnik B., Robic B., The complexity of static data replication in data grids, Parallel Computing, 31 (8) (2005) 900–912.

[14] Dong X., Li J., Wu Z., Zhang D., Xu J., On Dynamic Replication Strategies in Data Service Grids, 11th IEEE International Symposium on Object Oriented Real-Time Distributed Computing (ISORC), Orlando, (2008) 155–161.

[15] Amjad T., Sher M., Daud A., A survey of dynamic replication strategies for improving data availability in data grids, Future Generation Computer Systems, (2012) 337–349.

[16] Bsoul M., A Framework for Replication in Data Grid, International Conference on Networking, Sensing and Control, Delft, (2011) 978–981.

[17] Sashi K., Antony S. T., Dynamic Replica Management for Data Grid, IACSIT International Journal of Engineering and Technology, (2010) 2–4.

[18] Park S. M., Kim J. H., Ko W. B., Yoon W. S., Dynamic Data Replication Strategy Based on Internet Hierarchy BHR, in: Lecture Notes in Computer Science, (2004) 838–846.

[19] Loukopoulos T., Ahmad I., Static and Adaptive Distributed Data Replication Using Genetic Algorithms, Journal of Parallel and Distributed Computing, (2004) 1270–1285.

[20] Zhongping Z., Zhang C., Mengfei Z., Wang Z., Dynamic Data Grid Replication Algorithm Based on Weight and Cost of Replica, Telkomnika Indonesian Journal of Electrical Engineering, (2014) 2860–2867.

[21] Chang R. S., Chang H. P., Wang Y. T., A dynamic weighted data replication strategy in data grids, The Journal of Supercomputing, 45 (3) (2008) 277–295.

[22] Gu Q., Chen B., Zhang Y., Dynamic Replica Placement and Location Strategies for Data Grid, International Conference on Computer Science and Software Engineering, Wuhan, Hubei, (2008) 35–40.

[23] Stoica I., Morris R., Karger D., Kaashoek M. F., Balakrishnan H., Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, Proceedings of ACM SIGCOMM, (2001) 160–177.

[24] Tang M., Lee B. S., Yao C. K., Tang X. Y., Dynamic replication algorithm for the multi-tier data grid, Future Generation Computer Systems, 21 (5) (2005) 775–790.

[25] Shorfuzzaman M., Graham P., Eskicioglu R., Popularity-driven dynamic replica placement in hierarchical data grids, in: Proceedings of the Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, (2008) 524–531.

[26] Slota R., Skital L., Nikolow D., Kitowski J., Algorithms for automatic data replication in grid environment, in: Wyrzykowski R., Dongarra J., Meyer N., Wasniewski J. (Eds.), Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005, Poznan, Poland, September 11–14, 2005, Revised Selected Papers, Lecture Notes in Computer Science, vol. 3911, Springer, (2006) 707–714.

[27] Abdurrab A. R., Xie T., FIRE: A file reunion data replication strategy for data grids, in: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, (2010) 215–223.

[28] Chang R. S., Chang H. P., Wang W. T., A dynamic weighted data replication strategy in data grids, IEEE/ACS International Conference on Computer Systems and Applications, (2008) 414–421.

[29] Ghilavizadeh Z., Mirabedini S. J., Harounabadi A., A New Fuzzy Optimal Data Replication Method for Data Grid, Management Science Letters Journal, (2013) 927–936.

[30] Sashi K., Santhanam T., Replica Replacement Algorithm for Data Grid Environment, ARPN Journal of Engineering and Applied Sciences, (2013) 86–90.

[31] Lei M., Vrbsky S. V., A Data Replication Strategy to Increase Data Availability in Data Grids.

[33] Kroegar T. M., Long Darrell D. E.
