|
SFB 376 - Project A2
The DIstributed
VAriables Library
|
Some Implementation Details
The DIVA library includes several variants of the access tree strategy
described on the algorithm page and
a standard caching strategy that uses a fixed home for each data object.
The library also provides routines for barrier synchronization and
the locking of global variables. The synchronization and locking routines are
implementations of elegant algorithms that use access trees.
Variations of the access tree strategy
The original access tree strategy uses a 2-ary hierarchical mesh decomposition.
We have implemented variations of the access tree strategy using trees of
different heights and degrees. The idea behind varying the degree of
the access trees is to reduce the overhead due to additional startups.
4-ary and 16-ary mesh decompositions lead to 4-ary and 16-ary access
trees, respectively. The 4-ary decomposition just skips the odd
decomposition levels of the
2-ary decomposition, and the 16-ary decomposition then skips the odd
decomposition levels of the 4-ary one.
Another way to get flatter access trees is to terminate the hierarchical mesh
decomposition with submeshes of size k.
An access tree node that represents a submesh of size k' <= k
gets k' children, one for each of the nodes in the submesh.
The degree of the access trees and the smallest submeshes to be
partitioned can be specified as a program parameter.
The fixed home strategy
For the purpose of comparison we have implemented a further strategy called
the "fixed home strategy".
This strategy follows a standard approach in which each
global variable is assigned a processor keeping track of the variable's copies.
This processor is called the home of the variable.
The home of each variable is chosen uniformly at random from the set
of processors.
In order to manage the copies of the data objects, we use the well known
ownership scheme:
At any time either one of the accessing processors or the home processor
can be viewed as the owner of a global variable.
Initially, the home is the owner of the variable.
A write access issued by a processor that is not the owner of the
data object assigns the ownership to this processor.
A read access issued by another processor moves the ownership back
to the home.
Write accesses of the owner can be served locally, whereas all other
write accesses have to invalidate all existing copies and create a
new copy at the writing processor.
Read accesses by processors that do not hold a copy of the requested
data object move a copy from the owner to the home processor
(if the home is not the current owner itself)
and a further copy to the reading processor.
In this way, subsequent read accesses of that processor can be served locally.
Barrier synchronizations
The implemented barrier synchronizations allow to synchronize
arbitrary groups, i.e.,
subsets of the processors. The groups can be generated dynamically at
run time.
With each group, we associate a randomly embedded access tree.
A call for a barrier synchronization sends a message upwards in the
access tree. Each node of the access tree waits until it receives a
synchronization message from each of its children that represent
a submesh including at least one processor of the group.
After receiving all these messages the access tree node either sends
a synchronization message to its parent node, or, if the node
represents the smallest submesh that includes all members of the group,
it initiates a multicast downwards along the branches of the access tree
notifying all group members about the completion of the barrier.
Locking mechanisms
The implemented locks are attached directly to global variables.
A call for a lock deletes all of the variable's copies and creates a new
exclusive copy on the processor that has requested the lock.
All subsequent read, write, or lock requests of other processors
are blocked as soon as they reach the node holding the exclusive copy.
If the lock is released then the work of the blocked requests is
continued as usual. Note that the processor holding the locked copy
does not become a bottleneck even if all other processors try to
access the locked variable simultanously because the data management
strategy running on each access tree guarantees that at most one request
for a variable is blocked at each endpoint of a tree edge.
Harald Räcke,
Matthias Westermann,
August 2000