The DIVA library provides fully transparent, distributed data management
for large parallel and distributed systems in which the processors and the
memory modules are connected by a relatively sparse network. The current
implementations focus on mesh-connected massively parallel processors
(MPPs).
Efficient data management provides transparent and fast
access to the shared data objects from any processor in the network.
Our implementations are based on the "access tree strategy" introduced
in FOCS'97.
The access tree strategy describes how copies of the data objects
are distributed in the network. In particular, it answers the
following questions:
How many copies of a data object should be created?
On which memory modules should these copies be placed?
How should consistency among the copies be maintained?
The access tree strategy aims to minimize the routing congestion, that is,
it minimizes the communication overhead while simultaneously balancing
the communication load evenly among the network resources.
The access tree strategy is based on a hierarchical decomposition of the
network. This decomposition allows to exploit topological locality in
an efficient way. The following picture illustrates the decomposition.
The hierarchical decomposition has associated with it a decomposition
tree, in which each node corresponds to one of the submeshes,
i.e., the root of the tree corresponds to the mesh itself, and the children
of a node in the tree correspond to the two submeshes into which
the submesh corresponding to that node is divided. The decomposition
tree corresponding to the mesh decomposition above is shown in the following
picture.
The node labels correspond to the labels of the mesh.
In order to compare the congestion in both networks we define bandwidths
for the edges in the decomposition tree.
These bandwidths correspond to the number of edges leaving the respective
submesh.
The edge labels in the picture above indicate the bandwidths of the
edges.
For each global variable, define an access tree to be a
copy of the decomposition tree.
We embed the access trees randomly into the mesh, i.e., for each variable,
each node of the access tree is mapped at random
to one of the processors in the corresponding submesh.
The remaining description of our data management strategy is very simple:
We simulate a simple caching strategy for trees on each of the access trees.
More details, including the caching strategy for trees and theoretical
results on the efficiency of the access tree strategy, can be found in
FOCS'97.
Christof Krick,
Harald Räcke,
Berthold Vöcking,
Matthias Westermann,
October 1998