DARPA HPCS
Discrete Mathematics Benchmarks


DARPA HPCS Discrete Math Benchmarks



Overview
There are six (6) benchmarks in the DARPA HPCS Discrete Mathematics suite
for high productivity computing.
All have been coded in Fortran 77 with MILSPEC Fortran extensions or in C.
The benchmarks have been written for 64bit machines.
In addition, some of the benchmarks are available in MPI, Shmem, and UPC.
The code provided implements a reasonably efficient algorithm.
Program documentation includes the statement of the problem, description of
the algorithm, definition of variables, and comments preceding sections of code.
DARPA is interested in all six benchmarks and therefore strongly encourages
experimentation with all of them.
Some of these benchmarks have specified sizes for particular data structures.
Random Access scales to fill the memory of whatever system it runs on.
One can expect that the scale of the other HPCS Discrete Math benchmarks
will likely change as a function of time.
Some
rules
for these benchmarks exist.
This information has been transcribed from information provided by Bob Lucas
of the Information Sciences Institute of the University of Southern California.
We hope that the transcriptions have been accurate;
any transcription errors are ours and not theirs.
Further information on HPC benchmarks, including some results
on Benchmark 0 (Random Access) can be found at an
HPC Challenge
website at the University of Tennessee.


0. Random Access
Let
T
be a table of size
2^n
and let
S
be a table of size
2^m
filled with random 64bit integers. Let
{A_i}
be a stream of
64bit integers of length
2^{n+2}
generated by the primitive polynomial over
GF(2),
X_{63}+1.
For each
a_i,
set
T[a_i <63, 64n>] = T[a_i <63,64n>] + S[a_i < m1,0 >].
where "+" denotes addition in
GF(2),
i.e. exclusive "or", and
a_i
denotes a sequence of bits within
a_i
such that
are the lowest
m
bits.
Parameters;

Part a:

m = 9

n = 8 , 9, 10, ...,
maximum usable integer

Part b:

m
such that
2^m
is half the size of cache

n
such that
2^n
is half of memory
Some
further information
is also available, as well as a
description
of some sample performance.


1. Multiple Precision Arithmetic
Given two
N x N
matrices with entries in the ring of integers modulo some number
M, compute the product of the matrices.
Parameters:
N = 225,
M = 7^{183}  6


2. Dynamic Program
Given:

A_1, A_2 , ..., A_K, N x N floating point matrices
(Assume all values in the A matrices are nonnegative)

D, a vector of length T, containing 8bit integers where
1 <= D(t) <= K for t = 1, 2,...,T
Calculate:

Y_0, Y_1, ... Y_T,
64bit floating point vectors of length N

P_1, P_2, ... P_T,
32bit integer vectors of length N

B, a T+1long 32bit integer vector where
1 <= B(t) <= N for t = 0, 1, 2, ..., T
according to the following algorithm.

Y_0 (i) = 1 for i = 1, 2, ..., N

For t = 1, 2, ..., T

k = D(t)

For j = 1, 2, ..., N

Y_t (j) = max_i { Y_{t1} (i) * A_k (i, j) }

P_t (j) = any i which maximized Y_t (j)

Next j

Next t

Let Z = max_i { Y_T } (i) and let I be any
i which maximized Z.

Set B(T) = I

For t =T  1, T  2, ..., 1, 0

Set B(t) = P_{t+1} (B(t+1))

Next t

Output Z and B(t) for t = 0, 1, 2, ..., T.
Parameters:

Dense case

N = 10, K = 30, T = 100,000

A_i ( i= 1,2 ,..., K) is dense.

Sparse case

N = 10^6, K = 2, T = 2000

A_i (i= 1,2, ..., K) has at most 4 nonzero elements per
column and at most 4 nonzero elements per row.


3. Data Transposition
Let {A_i} be a stream of n bit integers of length L.
Consider each successive block of n integers as an n x n
matrix of bits.
For each such matrix, transpose the bits such that bit b_{ij} is
interchanged with bit b_{ji}.
The resulting stream {A'_i} will have the property that:

A'_1 ... A'_{n+1} A'_{2n+1}
etc., will hold sequentially the high order bits of
{A_i}

A'_2 ... A'_{n+2} A'_{2n+2},
etc., will hold sequentially the nexttohigh order bits of
{A_i},

and so forth.
Output the stream
{A'_i}.
Parameters:

n = 32, L = 10^7, ITER = 400

n = 64, L = 10^7, ITER = 230

n = 1024, L = 10^7, ITER = 12


4. Integer Sort
Given
{ A_i }, a stream of unsigned
nbit integers of length
N,
generate the stream
{ B_i }
which is
{ A_i }
sorted in ascending order.
Inplace sorting is not required and there may be nonunique
integers in the stream
Output selected values (as specified in the program code) from the stream
{ B_i }.
Parameters:

Part a: Incore sort

n = 64, N = 10^6, ITER = 300

n = 128, N = 5*10^7, ITER = 1

Part b: Secondary memory sort
This is a diskbased sort that uses an inplace radix sort for incore
sorting and a 2way merge sort for the disk sorting.
BASIC ALGORITHM:
Given a file of random number of length
RUNSIZE*NUMRUNS, break it into
NUMRUNS
runs of length
RUNSIZE.
Sort each run independently and write it to disk.
The runs are merge sorted together by making several passes through
the data, creating successively larger runs until the whole file is sorted.
MEMORY REQUIREMENTS:
For maximum efficiency set
RUNSIZE
as large as possible.
It will be approximately
1/4
the system memory.
There are two input buffer arrays of size
RUNSIZE
and one output
buffer of size
(2*RUNSIZE).
There is one additional array used by the radix sort of size
(64*65536).
DISK REQUIREMENTS:
The disk requirements are quite large.
There is a file of random numbers
RUNSIZE*NUMRUNS
in size.
There are also four files on disk that simulate tapes (T1, T2, T3, T4)
and total
2*RUNSIZE*NUMRUNS
in size.
These five files can be placed in any directory by identifying said
directories on the commandline.


5. Boolean Equation Solving
Let {s_i} be a stream of binary digits of length L.
Search for all occurrences of the pattern P where certain bits in the
pattern may be either 1 or 0, and let j be
the position in the stream immediately after an occurrence of P.
Example:
Assume P = 000???100??110?10?1111,
where `?' can be either 1 or 0.
Then in the stream we might have an alignment
011010001110101000001011001111001001111101...
000???100??110?10?1111
<     P     >j
Starting at position j, form 700 equations
in 700 unknowns over
GF(2) as follows:
s_{j}x_1 + s_{j+1}x_2 + ... + s_{j+699}x_{700} = s_{j+700} (1)
s_{j+701}x_1 + s_{j+702}x_2 + ... + s_{j+1400}x_{700} = s_{j+1401}(2)
s_{j+1402}x_1 + s_{j+1403}x_2 + ... + s_{j+2101}x_{700} = s_{j+2102}(3)
. .
. .
. .
s_{j+489999}x_1 + s_{j+490000}x_2 + ... + s_{j+490698}x_{700} = s_{j+490699} (700)
For each occurrence of the pattern P in {s_i},
form the above equations
and determine whether the equations are dependent or independent.
If the equations are independent, then find the unique solution.
Gaussian elimination over GF(2) is the suggested method.
Output for all occurrences of P the list (j, flag, solution)
where:

j = starting position of the equations

flag = { 1, if the equations are independent; 0, otherwise}

solution = { unique Gaussian elimination solution (if it exists);
0, otherwise}
Parameter:
L = 10^7




