An optimized approach for simultaneous horizontal data fragmentation and allocation in Distributed Database Systems (DDBSs)

Ali A Amer; Adel A Sewisy; Taha MA Elgendy

doi:10.1016/j.heliyon.2017.e00487

Heliyon. 2018 Jan 11;3(12):e00487. doi: 10.1016/j.heliyon.2017.e00487

PMCID: PMC5772458 PMID: 29387818

Abstract

With the continuing advancement of the data and information management field, the Distributed Database System (DDBS) remains one of the most in-demand tools for handling the accompanying, constantly growing volumes of data. However, the efficiency and adequacy of a DDBS are strongly correlated with the reliability and precision of the process by which the DDBS is designed. Several strategies have therefore been developed in the literature to promote DDBS performance. Of these strategies, data fragmentation, data allocation and replication, and site clustering are the most widely used and effective techniques, without which DDBS design and operation would be prohibitively expensive. On one hand, accurate, well-architected data fragmentation and allocation increase data locality and promote overall DDBS throughput. On the other hand, a practical site clustering process contributes remarkably to reducing the overall Transmission Costs (TC). Consequently, consolidating all these strategies into a single work is expected to improve DDBS performance considerably. In this paper, therefore, an optimized heuristic horizontal fragmentation and allocation approach is developed. All the strategies above are combined into a single approach so as to provide an effective solution for promoting DDBS productivity. Most importantly, internal and external evaluations are extensively illustrated. The findings of the conducted experiments are in favor of improved DDBS performance.

Keywords: Computer science

1. Introduction

In this era of rapidly growing data and information feeds, and of ever-closer reliance on data through constantly advancing technology, it has become essential for all DDBS-based organizations and individuals to seek well-designed DDBSs with high throughput. Nevertheless, designing a DDBS well remains a considerably demanding task, since there is continuous pressure to reach a satisfactory level of DDBS performance. The intended level of throughput can, however, be achieved by proposing a proper data fragmentation, presenting a precise data allocation, and designing a practical algorithm for site clustering. In fact, these techniques combined have been shown to be highly effective in boosting DDBS productivity (Sewisy et al., 2017; Hababeh et al., 2015). Therefore, this work aims to introduce a newly designed approach that combines them all into a single work. Of these techniques, the fragmentation technique of (Abdalla, 2014) is selected, optimized, and re-introduced as the fragmentation technique for this work. Additionally, this optimized technique is integrated with the proposed site clustering algorithm and cost model for data allocation. The obtained results confirm that the proposed (optimized) approach outperforms (Abdalla, 2014) to a great extent, substantially lessening TC and significantly promoting DDBS performance. To sum up, the contributions and motivations of this work are as follows:

1. The objective function of (Abdalla, 2014) did not incorporate communication costs into distributed query costs, which makes it ineffective both at mitigating communication costs while distributed queries are processed and at evaluating the technique as a whole. Communication costs, however, are the prime rationale for which both (Abdalla, 2014) and the present work seek a practical solution capable of minimizing these costs to the greatest extent. In other words, reducing communication costs is the major concern of the present work, and these costs are carefully reflected in the amended objective function. Moreover, involving communication costs in the objective function provides: (1) a realistic, or at least near-realistic, reflection of Transmission Costs (TC) (Sewisy et al., 2017); (2) an accurate evaluation of overall DDBS performance as distributed queries are processed, and consequently a verifiable measure of the growth in data locality and DDBS throughput.
2. In contrast to (Abdalla, 2014), and to further minimize communication costs, the present work proposes a clustering algorithm for sites, since site clustering has proven efficient at lessening communication costs (Sewisy et al., 2017; Hababeh et al., 2015; Abdel Raouf and Badr, 2017).
3. Data allocation has widely proven to be a highly effective factor in promoting DDBS productivity, particularly when it is done appropriately. So, unlike (Abdalla, 2014), data allocation (including replication over clusters/sites) is performed using a precise mathematical model that considers communication costs between sites and between clusters of sites. This model is meant to be applicable to both works while fully complying with the present work’s concepts, including site grouping.
4. In full contrast with (Abdalla, 2014), in which only the replication scenario is adopted and data replication is always applied, the present work adopts replication only when it is necessary, and also provides a non-replication scenario for both works, which helps to avoid unnecessary replication and its detrimental effects.
5. In (Abdalla, 2014), no evaluation was provided to measure the effectiveness of the proposed technique while distributed queries are processed. The present work, however, draws an evaluation process for both works while strictly maintaining their circumstances. This evaluation is conducted with the modified objective function in mind.
6. To validate the proposed concepts, many experiments under varied circumstances have been conducted, and internal and external evaluations are presented in a self-explanatory manner. Both works are evaluated using the TC function to measure their quality and grade DDBS performance accordingly.

The rest of this paper is organized as follows. Section 2 covers earlier works that are closely relevant to this work. In Section 3, the technique’s methodology, including its architecture, is presented. The site clustering algorithm is stated in Section 4. In Section 5, the proposed data allocation and replication models are given. In Section 6, the pseudocode of the algorithm is briefly provided. In Section 7, experimental results are presented. Section 8 illustrates the evaluation of both works and gives a comparative theoretical study. Finally, conclusions and future work directions are included in Section 9.

2. Related work

Earlier studies of DDBS design record remarkable progress, in the form of successive steps, toward improving DDBS performance. As one important aspect of this progress, several Horizontal Fragmentation (HF) techniques have been presented. For example, (Ceri et al., 1982; 1986) used min-term predicates as a measure to split relations, so that primary HF was produced under the assumption that the previously specified predicate set satisfied the properties of disjointness and completeness. Along the same lines, (Zhang and Orlowska, 1994) presented a two-phase HF method. In the first phase, relations were fragmented by primary HF using predicate affinity and the bond energy algorithm. In the second phase, relations were further divided using derived HF. For the initial stage of DDBS design, (Surmsuk and Thanawastien, 2007) proposed a Create, Read, Update and Delete (CRUD) matrix in which attributes are used as rows and application locations as columns. Fragmentation and data allocation were considered together as well. (Amer and Abdalla, 2012) presented a cost model to find an optimal HF in which two scenarios for data allocation were considered so that no supplemental complexity was added to data placement. In follow-up work, this model was further extended in (Abdalla, 2014) and shown mathematically to be effective at reducing communication costs. Experimental results, performance analysis and evidence of the model's practicality were not provided, though.

On the other hand, a hybridized fragmentation was proposed in (Harikumar and Ramachandran, 2015) to reduce database access time based on a subspace clustering algorithm. Data fragments were generated with respect to tuple and attribute patterns so that closely correlated data were grouped together. Meanwhile, (Hauglid et al., 2010) developed DYFRAM, a decentralized approach for dynamic table fragmentation and allocation in DDBS, to maximize data locality based on recorded access history. The feasibility of the approach was demonstrated experimentally. Similarly, (Abdel Raouf and Badr, 2017) presented an enhanced system that performs initial-stage fragmentation and data allocation, along with replication at run time, over a cloud environment. Site clustering was addressed as well to enhance DDBS performance by increasing local accesses.

Meanwhile, (Lin et al., 1993) addressed the data allocation problem in DDBS by developing two algorithms with the aim of lessening the overall communication costs. Likewise, a model of query behavior in DDBS was presented in (Huang and Chen, 2001). To reduce communication costs, two heuristic algorithms were then given to find a near-optimal allocation scenario. These algorithms were shown to be close to optimal when compared with (Lin et al., 1993). In (Tâmbulea and Horvat, 2008), a dynamic data allocation method was presented to decrease transmission cost, with the database catalog considered as the only storage place for the required data. In (Amita Goyal Chin, 2002), partial and full data reallocation heuristics, the first of their kind, were proposed to minimize costs and keep complexity under control. Moreover, to find an optimal data allocation technique, (Abdalla et al., 2014) presented a non-replicated dynamic data allocation algorithm (POEA). POEA was originally intended to integrate some previously proposed concepts used in its earlier peers, including (Mukherjee, 2011). (Dejan Chandra Gope, 2012), in turn, developed a dynamic non-replicated data allocation algorithm named NNA. Data reallocation was done with respect to the changing pattern of data access along with time constraints.

Similarly, (Singh, 2016) proposed a data allocation framework for non-replicated dynamic DDBS using the threshold algorithm (Ulus and Uysal, 2007) and the time constraint algorithm (Singh and Kahlon, 2009). Furthermore, this work was shown to be more efficient than threshold algorithms in terms of long-term performance when the access frequency pattern changes rapidly. (Kumar and Gupta, 2016) developed an extended allocation approach capable of dynamically assigning fragments in redundant/non-redundant DDBS. The problem of having more than one site qualified to hold data was also discussed. Lastly, and most importantly, in (Wiese, 2015) the Data Replication Problem (DRP) was formulated for a precise horizontal fragmentation with overlapping fragments. This work aimed at placing an N-copy replication scheme of fragments on M distinct sites while ensuring that overlapping is precluded. Replication was treated as an optimization problem whose goal is to keep the number of fragment copies and sites minimized. In follow-up work, (Wiese, 2015) was extended in (Wiese et al., 2016), and the DRP was re-formalized as an integer linear program. Runtime performance was analyzed, and data insertion and deletion were addressed as well.

3. Methodology

This work is driven by several goals, with the reduction of Transmission Costs as the top priority. In addition, introducing a mathematically based data allocation, integrating a non-replication scenario for data allocation, and proposing a site clustering algorithm are further aspects of the approach developed in this paper. The motivations are presented in the next few lines.

3.1. Motivations

The motivations of this work are classified as either general or particular, as follows:

3.1.1. General motivations

After a comprehensive investigation of related work on horizontal fragmentation of relations in relational distributed databases, and to the best of our knowledge, no single work has sought to integrate several communication-cost-reducing techniques (horizontal fragmentation, data allocation and replication, site clustering, mathematical models, etc.) into one work. Therefore, this work is meant to be a promising and distinctive approach, intended to be valid as a mathematically based general solution for most problems of DDBS performance, in the sense that all these techniques are combined for the purpose of finding a sustainable solution for DDBS performance improvement. Moreover, this work is shown to be promising theoretically, experimentally and mathematically. It is worth noting that (Hababeh et al., 2015; Abdel Raouf and Badr, 2017) also combine all these techniques, but those approaches were designed and implemented under circumstances completely different from those of the present work, and they were meant to solve the particular cases of telecommunication databases and cloud environments. Both references are cited in this paper accordingly.

3.1.2. Particular motivations

After a thorough investigation and careful questioning of all closely relevant earlier studies on DDBS performance enhancement, (Abdalla, 2014) was found to be an interesting, well-designed technique worth examining and significantly extending. The major purpose of (Abdalla, 2014) was to promote DDBS performance by decreasing communication costs. However, despite the importance of its objective function, intrinsic flaws were recorded: the function never considered communication costs, which are largely responsible for either boosting or deteriorating DDBS performance. The data allocation process was also not clearly elaborated, as it was given merely theoretically. Moreover, no evaluation of the technique was provided by which the objective function could be identified as effective. So, based on these flaws and in line with the contributions listed above, the present work is built to close these gaps for the sake of producing a highly effective approach. In other words, those flaws and gaps make (Abdalla, 2014) appear unable to achieve the goal of reducing communication costs and promoting DDBS performance, which it was originally intended to satisfy. These claims are supported by the discussion and evaluation presented in this work. To sum up, using (Abdalla, 2014) as its cornerstone, this work aims to refine and optimize the technique of (Abdalla, 2014).

3.2. Horizontal fragmentation model

In this work, the fragmentation model depends entirely upon a set of predicates, Pr [Pr₁, …, Pr_p]. These predicates are in turn assumed to be assigned to all (NA) attributes under consideration, A [A₁, …, A_n]. While numerical attributes are assumed to have one of three states, (Pr_i > Value1), (Pr_i < Value2) or (Pr_i = Value3), alphabetical attributes have only one state: (Pr_i = Alphabetical Value). For each attribute, retrieval and update frequencies are extracted or given by the Database Administrator (DBA) for all predicates in the form “Pr_i.RF_i or Pr_i.R_i” and “Pr_i.UF_i or Pr_i.U_i”. Meanwhile, these attributes are observed to be required by the most frequently used queries (Qs), which are assumed to be issued from several sites, and each query has its own RF/UF (or R/U) frequency over data in each site. That is, if a query (Q) is launched from several sites (M), it is treated as a different query in each site, with a different RF/UF frequency. These frequencies have to be saved precisely for all queries in the Query Frequency Matrix (QFM). Based on these requirements, this work substantially utilizes the fragmentation procedure given in (Abdalla, 2014), with slight modifications (Fig. 1), as follows:

1. Relations under consideration are defined and their predicates are identified.
2. All of the most frequently used queries that reach each relation are individually kept and considered regardless of their type (retrieval or update). Query frequencies over sites, and the retrieval and update frequencies of queries over data in all sites, are recorded in the QFM, QRM and QUM matrices, respectively. Using these matrices along with the fragmentation cost model, data fragmentation is activated.
3. Based on Eq. (4) along with the matrices mentioned above, the Attribute Frequency Accumulation (AFAC) is computed.
4. As per Eq. (5), the Communication Cost Matrix (CSM) is converted into the Distance Cost Matrix (DCM) by applying the Minimum Algorithm as presented in (Abdalla, 2014). Then, using Eq. (6), DCM is multiplied by AFAC to yield the Total Frequencies of Attributes predicate Matrix (TFAM).
5. After that, TFAM is used for each attribute individually to compute its total pay (total access cost), and all attributes are sorted according to their pays.
6. Finally, the attribute with the highest pay is selected as the Candidate Attribute “CA”, by which the fragmentation process is conducted, Eq. (7).

Fig. 1. The proposed technique architecture.

3.3. Objective function

TC_{1} = \sum_{j=1}^{m} \sum_{i=1}^{m} \sum_{k=1}^{q} (1 - X_{kj}) * QRM_{kj} * F_{size} * CMS_{ij}

(1)

TC_{2} = \sum_{j=1}^{m} \sum_{i=1}^{m} \sum_{k=1}^{q} (1 - X_{kj}) * QUM_{kj} * F_{size} * CMS_{ij}

(2)

TC_{total} = TC_{1} + TC_{2}

(3)

The objective function is taken from (Abdalla, 2014) and modified to fit the present work’s circumstances so that the actual transmission cost (TC) is reflected precisely (Sewisy et al., 2017). Eq. (1) measures the costs incurred as distributed retrieval queries are processed, and Eq. (2) measures the costs incurred by distributed update queries running over the DDBS. The total transmission cost is then calculated using Eq. (3). The effects of this objective function are illustrated in the discussion section. Here, TC₁ and TC₂ (Eqs. (1) and (2)) represent the transmission costs of retrieval operations and of update operations, respectively. CMS stands for either the cost matrix between sites (CSM) or the cost matrix between the (Cn) clusters of sites (CCM). Finally, F_size stands for the size of the fragment under consideration, and X_kj is a binary variable indicating the fragment’s allocation over sites.
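As an illustration only, Eqs. (1) to (3) can be evaluated literally with three nested sums. The sketch below assumes the index ranges printed in the equations (k over q queries, i and j over m sites) and uses the four-site CSM of Table 1; the values of X, QRM, QUM and F_size are hypothetical and are not taken from the paper's experiments.

```python
def transmission_cost(x, freq, f_size, cms):
    """Literal evaluation of the triple sum in Eq. (1) or Eq. (2).

    x[k][j]    -- binary indicator X_kj (a value of 1 cancels the term)
    freq[k][j] -- QRM entry for Eq. (1), QUM entry for Eq. (2)
    f_size     -- size of the fragment under consideration
    cms[i][j]  -- cost between sites (CSM) or between clusters (CCM)
    """
    q = len(freq)   # number of queries
    m = len(cms)    # number of sites (or clusters)
    total = 0
    for j in range(m):
        for i in range(m):
            for k in range(q):
                total += (1 - x[k][j]) * freq[k][j] * f_size * cms[i][j]
    return total


# Communication costs between the four sites, as listed in Table 1.
CSM = [
    [0, 5, 9, 18],
    [5, 0, 16, 4],
    [9, 16, 0, 11],
    [18, 4, 11, 0],
]

# Hypothetical inputs (assumptions for this sketch): two queries, four sites.
X = [[1, 0, 1, 1],
     [0, 1, 1, 1]]
QRM = [[0, 2, 0, 0],
       [3, 0, 0, 0]]
QUM = [[0, 1, 0, 0],
       [1, 0, 0, 0]]
F_SIZE = 10

tc1 = transmission_cost(X, QRM, F_SIZE, CSM)   # Eq. (1), retrieval
tc2 = transmission_cost(X, QUM, F_SIZE, CSM)   # Eq. (2), update
tc_total = tc1 + tc2                           # Eq. (3)
print(tc1, tc2, tc_total)                      # 1460 570 2030
```

Passing a cluster cost matrix (CCM) instead of CSM gives the cluster-level variant, since CMS in the equations stands for either matrix.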

3.4. Fragmentation cost functions

AFAC_{jih} = \sum_{j=1}^{m} \sum_{i=1}^{n} \sum_{h=1}^{A} (RF_{jihp} * QF_{ji} + UF_{jihp} * QF_{ji})

(4)

DCM_{ij} = Min(CSM_{ij}), 1 \leq i, j \leq m

(5)

TFAM_{jh} = \sum_{j=1}^{m} \sum_{h=1}^{A} (AFAC_{jih} * DCM_{ij}), 1 \leq i, j \leq m

(6)

CA_{h} = MAX(\sum_{j=1}^{m} \sum_{h=1}^{A} TFAM_{jh})

(7)

Here, QF, RF and UF denote the elements of the QFM, QRM and QUM matrices, respectively.
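A minimal sketch of how Eqs. (5) to (7) might be chained is given below. It rests on two assumptions that the text does not confirm: the "Minimum Algorithm" of Eq. (5) is read here as an all-pairs least-cost path computation (Floyd-Warshall is swapped in for illustration), and AFAC is taken as an already accumulated site-by-attribute matrix, so Eq. (4) is not reproduced. The AFAC values are hypothetical; the CSM is the one in Table 1.

```python
def least_cost_paths(csm):
    """Assumed reading of Eq. (5): cheapest cost between every pair of sites."""
    m = len(csm)
    dcm = [row[:] for row in csm]
    for via in range(m):
        for i in range(m):
            for j in range(m):
                if dcm[i][via] + dcm[via][j] < dcm[i][j]:
                    dcm[i][j] = dcm[i][via] + dcm[via][j]
    return dcm


def candidate_attribute(afac, dcm):
    """Eqs. (6) and (7): TFAM = DCM x AFAC, then pick the attribute of highest pay.

    afac[j][h] -- accumulated access frequency of attribute h at site j
    Returns (index of the candidate attribute, list of pays per attribute).
    """
    m = len(dcm)
    n_attr = len(afac[0])
    # TFAM[j][h] = sum over sites i of DCM[j][i] * AFAC[i][h]
    tfam = [[sum(dcm[j][i] * afac[i][h] for i in range(m))
             for h in range(n_attr)]
            for j in range(m)]
    # Pay of an attribute = its TFAM column summed over all sites
    pays = [sum(tfam[j][h] for j in range(m)) for h in range(n_attr)]
    best = max(range(n_attr), key=pays.__getitem__)
    return best, pays


# Communication costs between the four sites, as listed in Table 1.
CSM = [
    [0, 5, 9, 18],
    [5, 0, 16, 4],
    [9, 16, 0, 11],
    [18, 4, 11, 0],
]

# Hypothetical accumulated frequencies: 4 sites x 3 attributes.
AFAC = [
    [4, 10, 2],
    [6, 3, 8],
    [1, 7, 5],
    [9, 2, 4],
]

DCM = least_cost_paths(CSM)
ca_index, pays = candidate_attribute(AFAC, DCM)
print(DCM[0][3], DCM[1][2])   # 9 14  (S1-S4 via S2, S2-S3 via S1)
print(pays, ca_index)         # [480, 585, 496] 1
```

Under these assumptions the second attribute has the highest pay and would be chosen as the Candidate Attribute. If the Minimum Algorithm of (Abdalla, 2014) is defined differently, only `least_cost_paths` would need to change.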

3.5. Site clustering algorithm

The site clustering algorithm presented here is based on the concept of Least Difference Value (LDV) proposed in (Sewisy et al., 2017), where it was shown to be efficient and was used to cluster DDBS queries. In this work, however, the clustering algorithm behaves differently from that of (Sewisy et al., 2017), which was based on threshold values. Compared with the algorithm proposed here, the threshold-based algorithm seems, in some cases, either to reduce the number of site clusters too far or to increase it to an excessive, undesirable range; in both cases this comes at the expense of DDBS performance, as shown in the discussion section. The threshold-based algorithm is nevertheless shown to behave slightly better than (Abdalla, 2014). Therefore, instead of using the threshold algorithm of (Sewisy et al., 2017) to cluster sites, the LDV concept is used to initiate the first clusters of sites from communication costs. After that, to cluster the remaining sites, the least average communication cost between sites is used as the metric to pull each site into its closest cluster (among those already initiated). The clustering procedure continues in this pattern until the membership of every site is decided. The number of clusters is determined solely by the behavior of the algorithm, although the algorithm is designed so that the number of clusters is kept to a minimum. Communication costs within and between clusters (CCM) are thus of key importance for data allocation and performance evaluation alike, particularly in the non-replication scenario. Finally, as per (Ceri et al., 1982; Al-Sayyed et al., 2014), the cost matrix is assumed to be symmetric between sites (and between clusters), and the cost between a site and itself is considered to be zero (Tables 1 and 2).
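The two-phase idea described above can be sketched roughly as follows. This is not the paper's algorithm: the LDV rule is not specified in this section, so the seeding phase below simply takes the cheapest disjoint site pairs, and the number of seed clusters (`n_seeds`, expected to be at least 1) is a hypothetical parameter, whereas the text says the number of clusters follows from the algorithm itself. The second phase follows the stated metric of least average communication cost. The cluster cost matrix is assumed here to be the cheapest link between two clusters.

```python
def cluster_sites(csm, n_seeds):
    """Two-phase clustering sketch over a symmetric site cost matrix."""
    m = len(csm)
    pairs = sorted((csm[i][j], i, j) for i in range(m) for j in range(i + 1, m))
    clusters, used = [], set()
    # Phase 1 (assumed stand-in for LDV): seed with the cheapest disjoint pairs.
    for _cost, i, j in pairs:
        if len(clusters) == n_seeds:
            break
        if i not in used and j not in used:
            clusters.append([i, j])
            used.update((i, j))
    # Phase 2: pull each remaining site into the cluster with the least
    # average communication cost to its current members.
    for s in range(m):
        if s in used:
            continue
        best = min(clusters, key=lambda c: sum(csm[s][t] for t in c) / len(c))
        best.append(s)
        used.add(s)
    return sorted(sorted(c) for c in clusters)


def cluster_cost_matrix(csm, clusters):
    """Assumed CCM: cheapest site-to-site link between two clusters."""
    n = len(clusters)
    return [[0 if a == b else
             min(csm[i][j] for i in clusters[a] for j in clusters[b])
             for b in range(n)]
            for a in range(n)]


# Communication costs between the four sites, as listed in Table 1
# (index 0 is S1, index 3 is S4).
CSM = [
    [0, 5, 9, 18],
    [5, 0, 16, 4],
    [9, 16, 0, 11],
    [18, 4, 11, 0],
]

clusters = cluster_sites(CSM, n_seeds=2)
ccm = cluster_cost_matrix(CSM, clusters)
print(clusters)   # [[0, 2], [1, 3]]  i.e. {S1, S3} and {S2, S4}
print(ccm)        # [[0, 5], [5, 0]]
```

On this one example the sketch happens to reproduce the grouping C1(S1S3), C2(S2S4) and the inter-cluster cost of 5 that appear in the paper's four-site cluster cost table; that agreement does not confirm that the seeding rule or the CCM rule matches the authors' definitions.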

Table 1.

Communication Cost Matrix between Sites (CSM); Four Sites.

Site/Site	S1	S2	S3	S4
S1	0	5	9	18
S2	5	0	16	4
S3	9	16	0	11
S4	18	4	11	0

Stud-no	Stud-Name	In-date	Position	Fund	Proj-Place	Proj-Id
1	Anna	11/01/2015	Leader	15000	P1	112
2	Ingrid	01/06/2014	Follower	12000	P2	113
3	Diana	29/03/2016	Follower	11500	P3	112
4	Nadeem	21/11/2015	Follower	10000	P1	111
5	Michel	11/01/2015	Follower	11000	P1	113
6	Amber	05/05/2016	Follower	9500	P2	114
7	Brown	29/03/2015	Leader	9000	P1	112
8	Sid	11/01/2016	Follower	11000	P2	111
9	Danial	01/06/2014	Follower	10000	P3	111

Attribute	Type	Length (Bytes)
Stud-no	Nominal	3
Stud-Name	Categorical	30
In-date	Categorical	36
Position	Categorical	4
Fund	Numerical	6
Proj-Place	Categorical	7
Proj-Id	Nominal	4

Directive	Response
Enter no of queries	5
Enter no of sites	4
Enter no of attributes	6
For attribute 1, enter no of predicates	0
For attribute 2, enter no of predicates	3
For attribute 3, enter no of predicates	0
For attribute 4, enter no of predicates	3
For attribute 5, enter no of predicates	0
For attribute 6, enter no of predicates	3

Fragment/Site	S1	S2	S3	S4
F1	1
F2	1	1	0 (Capacity Violation)	1
F3	1	0 (Fragment Limit Violation)	1	1

Fragment/Site	S1	S3	S4
F1		0 capacity violation so to site of next max	1
F2	1	0 capacity violation so to site of next max
F3			1

Fragment/Cluster	C1		C2
Fragment/Site	S1	S3	S2	S4
F1	1			1
F2	1	0 (capacity violation)	1	1
F3	1	1	0 (Fragment Limit Violation)	1

S#	Q#	Frequency	Activity Mod	In-date			Fund			Proj-place
S#	Q#	Frequency	R/U	P1	P2	P3	P1	P2	P3	P1	P2	P3
S5	Q1	2	R	1	0	0	1	1	2	0	0	1
S5	Q1		U	2	0	1	1	0	0	2	2	0
S5	Q3	3	R	0	2	1	2	5	0	1	3	1
S5	Q3		U	1	0	0	2	1	0	2	0	1
S5	Q5	2	R	1	0	1	2	0	1	2	2	1
S5	Q5		U	1	0	2	3	1	0	0	0	3
S6	Q2	2	R	3	1	0	2	3	1	1	2	0
S6	Q2		U	0	1	1	2	2	0	1	2	0
S6	Q3	1	R	0	2	1	2	5	0	1	3	1
S6	Q3		U	1	0	0	2	1	0	2	0	1

S#/F#	F1	F2	F3
S1	264	524	224
S2	274	434	310
S3	260	442	160
S4	242	398	158
S5	220	384	232
S6	246	412	254

S#/F#	F1	F2	F3
S1	624	824	224
S2	650	754	310
S3	560	676	160
S4	530	570	158
S5	588	752	232
S6	602	714	254

Fragment/Site	S1	S2	S3	S4	S5	S6
F1	0	1	0 capacity violation	0	0	0
F2	1	0 fragment limit violation	0 capacity violation	1	1	1
F3	1	0 fragment limit violation	1	1	1	1

Cluster/Cluster	C1(S1S3)	C2(S2S4)
C1(S1S3)	0	5
C2(S2S4)	5	0

Cluster/Cluster	C1(S2S6)	C2(S1S4)	C3(S3S5)
C1(S2S6)	0	3	5
C2(S1S4)	3	0	3
C3(S3S5)	5	3	0

S #	Capacity (C) in bytes	Fragment Limit (FL)
S₁	1000	6
S₂	900	1
S₃	250	3
S₄	870	4

S #	Capacity (C) in bytes	Fragment Limit (FL)
S₁	1000	6
S₂	900	1
S₃	250	3
S₄	870	4
S₅	950	2
S₆	710	2

Problem#	Experiment#	Dataset Cardinality	Allocation Scenario	Actual Queries#	Original Queries#	Sites#	Clusters#
P₁	1	9	Scenario (1)	16	5	4	2
	2	-	Scenario (2)	-	-	-	-
	3	-	Scenario (3)	-	-	-	-
P₂	4	50	Scenario (1)	16	-	4	2
	5	-	Scenario (2)	-	-	-	-
	6	-	Scenario (3)	-	-	-	-
P₃	7	9	Scenario (1)	26	-	6	3
	8	-	Scenario (2)	-	-	-	-
	9	-	Scenario (3)	-	-	-	-
P₄	10	50	Scenario (1)	26	-	6	3
	11	-	Scenario (2)	-	-	-	-
	12	-	Scenario (3)	-	-	-	-
P₅	13	200	Scenario (1)	40	-	12	5
	14	-	Scenario (2)	-	-	-	-
	15	-	Scenario (3)	-	-	-	-
