SFB 376 

SFB 376 - Project A2

The DIstributed VAriables Library


Some Implementation Details

The DIVA library includes several variants of the access tree strategy described on the algorithm page and a standard caching strategy that uses a fixed home for each data object. The library also provides routines for barrier synchronization and the locking of global variables. The synchronization and locking routines are implementations of elegant algorithms that use access trees.

Variations of the access tree strategy

The original access tree strategy uses a 2-ary hierarchical mesh decomposition. We have implemented variations of the access tree strategy using trees of different heights and degrees. The idea behind varying the degree of the access trees is to reduce the overhead due to additional startups.

4-ary and 16-ary mesh decompositions lead to 4-ary and 16-ary access trees, respectively. The 4-ary decomposition just skips the odd decomposition levels of the 2-ary decomposition, and the 16-ary decomposition then skips the odd decomposition levels of the 4-ary one. Another way to get flatter access trees is to terminate the hierarchical mesh decomposition with submeshes of size k. An access tree node that represents a submesh of size k' <= k gets k' children, one for each of the nodes in the submesh.

The degree of the access trees and the smallest submeshes to be partitioned can be specified as a program parameter.

The fixed home strategy

For the purpose of comparison we have implemented a further strategy called the "fixed home strategy". This strategy follows a standard approach in which each global variable is assigned a processor keeping track of the variable's copies. This processor is called the home of the variable. The home of each variable is chosen uniformly at random from the set of processors. In order to manage the copies of the data objects, we use the well known ownership scheme:

At any time either one of the accessing processors or the home processor can be viewed as the owner of a global variable. Initially, the home is the owner of the variable. A write access issued by a processor that is not the owner of the data object assigns the ownership to this processor. A read access issued by another processor moves the ownership back to the home. Write accesses of the owner can be served locally, whereas all other write accesses have to invalidate all existing copies and create a new copy at the writing processor. Read accesses by processors that do not hold a copy of the requested data object move a copy from the owner to the home processor (if the home is not the current owner itself) and a further copy to the reading processor. In this way, subsequent read accesses of that processor can be served locally.

Barrier synchronizations

The implemented barrier synchronizations allow to synchronize arbitrary groups, i.e., subsets of the processors. The groups can be generated dynamically at run time. With each group, we associate a randomly embedded access tree. A call for a barrier synchronization sends a message upwards in the access tree. Each node of the access tree waits until it receives a synchronization message from each of its children that represent a submesh including at least one processor of the group. After receiving all these messages the access tree node either sends a synchronization message to its parent node, or, if the node represents the smallest submesh that includes all members of the group, it initiates a multicast downwards along the branches of the access tree notifying all group members about the completion of the barrier.

Locking mechanisms

The implemented locks are attached directly to global variables. A call for a lock deletes all of the variable's copies and creates a new exclusive copy on the processor that has requested the lock. All subsequent read, write, or lock requests of other processors are blocked as soon as they reach the node holding the exclusive copy. If the lock is released then the work of the blocked requests is continued as usual. Note that the processor holding the locked copy does not become a bottleneck even if all other processors try to access the locked variable simultanously because the data management strategy running on each access tree guarantees that at most one request for a variable is blocked at each endpoint of a tree edge.

Harald Räcke, Matthias Westermann, August 2000