    CSE Home Page CSE Research CSE Site Map CSE Faculty
DARPA HPCS
Discrete Mathematics Benchmarks
 DARPA HPCS Discrete Math Benchmarks ### Overview

There are six (6) benchmarks in the DARPA HPCS Discrete Mathematics suite for high productivity computing. All have been coded in Fortran 77 with MIL-SPEC Fortran extensions or in C. The benchmarks have been written for 64-bit machines. In addition, some of the benchmarks are available in MPI, Shmem, and UPC. The code provided implements a reasonably efficient algorithm. Program documentation includes the statement of the problem, description of the algorithm, definition of variables, and comments preceding sections of code. DARPA is interested in all six benchmarks and therefore strongly encourages experimentation with all of them.

Some of these benchmarks have specified sizes for particular data structures. Random Access scales to fill the memory of whatever system it runs on. One can expect that the scale of the other HPCS Discrete Math benchmarks will likely change as a function of time.

Some rules for these benchmarks exist.

This information has been transcribed from information provided by Bob Lucas of the Information Sciences Institute of the University of Southern California. We hope that the transcriptions have been accurate; any transcription errors are ours and not theirs.

Further information on HPC benchmarks, including some results on Benchmark 0 (Random Access) can be found at an HPC Challenge website at the University of Tennessee.

### 0. Random Access

Let T be a table of size 2^n and let S be a table of size 2^m filled with random 64-bit integers. Let {A_i} be a stream of 64-bit integers of length 2^{n+2} generated by the primitive polynomial over GF(2), X_{63}+1. For each a_i, set

T[a_i <63, 64-n>] = T[a_i <63,64-n>] + S[a_i < m-1,0 >].

where "+" denotes addition in GF(2), i.e. exclusive "or", and a_i denotes a sequence of bits within a_i such that are the lowest m bits.

Parameters;

• Part a:
• m = 9
• n = 8 , 9, 10, ..., maximum usable integer
• Part b:
• m such that 2^m is half the size of cache
• n such that 2^n is half of memory

Some further information is also available, as well as a description of some sample performance.

### 1. Multiple Precision Arithmetic

Given two N x N matrices with entries in the ring of integers modulo some number M, compute the product of the matrices.

Parameters: N = 225, M = 7^{183} - 6

### 2. Dynamic Program

Given:
• A_1, A_2 , ..., A_K, N x N floating point matrices (Assume all values in the A matrices are nonnegative)
• D, a vector of length T, containing 8-bit integers where 1 <= D(t) <= K for t = 1, 2,...,T
Calculate:
• Y_0, Y_1, ... Y_T, 64-bit floating point vectors of length N
• P_1, P_2, ... P_T, 32-bit integer vectors of length N
• B, a T+1-long 32-bit integer vector where 1 <= B(t) <= N for t = 0, 1, 2, ..., T
according to the following algorithm.
• Y_0 (i) = 1 for i = 1, 2, ..., N
• For t = 1, 2, ..., T
• k = D(t)
• For j = 1, 2, ..., N
• Y_t (j) = max_i { Y_{t-1} (i) * A_k (i, j) }
• P_t (j) = any i which maximized Y_t (j)
• Next j
• Next t
• Let Z = max_i { Y_T } (i) and let I be any i which maximized Z.
• Set B(T) = I
• For t =T - 1, T - 2, ..., 1, 0
• Set B(t) = P_{t+1} (B(t+1))
• Next t
• Output Z and B(t) for t = 0, 1, 2, ..., T.

Parameters:
• Dense case
• N = 10, K = 30, T = 100,000
• A_i ( i= 1,2 ,..., K) is dense.
• Sparse case
• N = 10^6, K = 2, T = 2000
• A_i (i= 1,2, ..., K) has at most 4 non-zero elements per column and at most 4 non-zero elements per row.

### 3. Data Transposition

Let {A_i} be a stream of n bit integers of length L. Consider each successive block of n integers as an n x n matrix of bits. For each such matrix, transpose the bits such that bit b_{ij} is interchanged with bit b_{ji}. The resulting stream {A'_i} will have the property that:
• A'_1 ... A'_{n+1} A'_{2n+1} etc., will hold sequentially the high order bits of {A_i}
• A'_2 ... A'_{n+2} A'_{2n+2}, etc., will hold sequentially the next-to-high order bits of {A_i},
• and so forth.

Output the stream {A'_i}.

Parameters:
• n = 32, L = 10^7, ITER = 400
• n = 64, L = 10^7, ITER = 230
• n = 1024, L = 10^7, ITER = 12

### 4. Integer Sort

Given { A_i }, a stream of unsigned n-bit integers of length N, generate the stream { B_i } which is { A_i } sorted in ascending order. In-place sorting is not required and there may be non-unique integers in the stream Output selected values (as specified in the program code) from the stream { B_i }.

Parameters:
• Part a: In-core sort
• n = 64, N = 10^6, ITER = 300
• n = 128, N = 5*10^7, ITER = 1
• Part b: Secondary memory sort
• n = 64, N = 5*10^7

This is a disk-based sort that uses an in-place radix sort for in-core sorting and a 2-way merge sort for the disk sorting.

BASIC ALGORITHM:
Given a file of random number of length RUNSIZE*NUMRUNS, break it into NUMRUNS runs of length RUNSIZE. Sort each run independently and write it to disk. The runs are merge sorted together by making several passes through the data, creating successively larger runs until the whole file is sorted.

MEMORY REQUIREMENTS:
For maximum efficiency set RUNSIZE as large as possible. It will be approximately 1/4 the system memory. There are two input buffer arrays of size RUNSIZE and one output buffer of size (2*RUNSIZE). There is one additional array used by the radix sort of size (64*65536).

DISK REQUIREMENTS:
The disk requirements are quite large. There is a file of random numbers RUNSIZE*NUMRUNS in size. There are also four files on disk that simulate tapes (T1, T2, T3, T4) and total 2*RUNSIZE*NUMRUNS in size. These five files can be placed in any directory by identifying said directories on the command-line.

### 5. Boolean Equation Solving

Let {s_i} be a stream of binary digits of length L. Search for all occurrences of the pattern P where certain bits in the pattern may be either 1 or 0, and let j be the position in the stream immediately after an occurrence of P.

Example: Assume P = 000???100??110?10?1111, where `?' can be either 1 or 0.

Then in the stream we might have an alignment

```011010001110101000001011001111001001111101...
000???100??110?10?1111
<- - - - - P - - - - >j
```

Starting at position j, form 700 equations in 700 unknowns over GF(2) as follows:

```       s_{j}x_1 +      s_{j+1}x_2 + ... +    s_{j+699}x_{700} = s_{j+700} (1)
s_{j+701}x_1 +    s_{j+702}x_2 + ... +   s_{j+1400}x_{700} = s_{j+1401}(2)
s_{j+1402}x_1 +   s_{j+1403}x_2 + ... +   s_{j+2101}x_{700} = s_{j+2102}(3)
.      .
.      .
.      .
s_{j+489999}x_1 + s_{j+490000}x_2 + ... + s_{j+490698}x_{700} = s_{j+490699} (700)
```

For each occurrence of the pattern P in {s_i}, form the above equations and determine whether the equations are dependent or independent. If the equations are independent, then find the unique solution. Gaussian elimination over GF(2) is the suggested method. Output for all occurrences of P the list (j, flag, solution) where:

• j = starting position of the equations
• flag = { 1, if the equations are independent; 0, otherwise}
• solution = { unique Gaussian elimination solution (if it exists); 0, otherwise}

Parameter: L = 10^7