             GWU - UPC NPB Benchmark Suite
            ===============================

The George Washington University - High Performance Computing Laboratory
    - http://hpcl.gwu.edu/

Work done under the GWU UPC Project, supervised by Professor Tarek El-Ghazawi
    - tarek@gwu.edu

Project website :
    - http://threads.hpcl.gwu.edu/sites/npb-upc
    - SVN repository of the code, bug reports, ...

I) Presentation and remarks
---------------------------
  The following kernels are available:

    - CG - Conjugate Gradient: This benchmark computes an approximation to the
        smallest eigenvalue of a symmetric positive definite matrix. This
        kernel features unstructured grid computations requiring irregular
        long-range communications.
    - EP - Embarrassingly Parallel: This benchmark can run on any number of
        processors with little communication. It estimates the upper achievable
        limits for floating point performance of a parallel computer. This
        kernel generates pairs of Gaussian random deviates according to a
        specific scheme and tabulates the number of pairs in successive annuli.
    - FT - Fast Fourier Transform: This benchmark solves a 3D partial
        differential equation using an FFT-based spectral method, also
        requiring long-range communication. FT performs three one-dimensional
        (1-D) FFTs, one for each dimension.
    - IS - Integer Sort: This benchmark is a parallel sorting program based on
        bucket sort. It requires a large amount of total exchange
        communication.
    - MG - MultiGrid: This benchmark uses a V-cycle multigrid method to compute
        the solution of the 3-D scalar Poisson equation. It performs both
        short- and long-range communications that are highly structured.
    - BTIO - Test of different parallel I/O techniques

II) Content of the distribution
-------------------------------
  A short description of the structure of the NAS NPB UPC distribution follows:

  README    <- This file
  CG/       <- contains the CG kernel and Makefile files
  EP/       <- contains the EP kernel and Makefile files
  FT/       <- contains the FT kernel and Makefile files
  IS/       <- contains the IS kernel and Makefile files
  MG/       <- contains the MG kernel, global.h and Makefile files
  ??/variants/  <- gathers the different UPC variants of kernel ??, including
        a dynamically allocated O0 version, a statically allocated O3 version
        and, for some kernels, a dynamically allocated O1 version (for details
        on variants O0, O1 and O3, see the notes below)
  bin/      <- executables will be built in this directory
  common/   <- common files (C files)
  config/   <- Configuration files
               (See section III: Building instructions)
  sys/      <- C file to create the npbparams.h for each workload

  Notes:
    - The notations O0, O1 and O3 refer to the paper "UPC Benchmarking
        Issues" (Tarek El-Ghazawi, Sebastien Chauvin, 30th Annual IEEE
        International Conference on Parallel Processing, 2001 (ICPP01),
        pages 365-372).

        O0: No privatization, no prefetching
        O1: Privatization hand-coded (local shared accesses converted to
            private accesses wherever possible)
        O2: Prefetching hand-coded, no privatization (blocks of shared
            references are prefetched) (NOTE: There is no O2 version of any
            NPB workload)
        O3: Privatization and prefetching hand-coded.

    - All these problems have been implemented using dynamically allocated
        shared memory. Several (CG, FT, IS) have a statically allocated shared
        memory variant.
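
  The difference between the O0, O1 and O3 styles described above can be
  sketched as follows. This is an illustration only, not code taken from the
  kernels: the array data, the constant BLOCK, the macro SHARED_BLOCK and the
  three function names are hypothetical. The plain-C fallback (one thread,
  memcpy in place of upc_memget) is an addition so that the sketch also
  compiles without a UPC compiler.

```c
#include <stddef.h>
#include <string.h>

#ifdef __UPC__
#include <upc.h>
#define SHARED_BLOCK(n) shared [n]
#else
/* Plain-C fallback: a single thread and ordinary memory. */
#define SHARED_BLOCK(n)
#define MYTHREAD 0
#define THREADS 1
#define upc_memget(dst, src, n) memcpy((dst), (src), (n))
#endif

#define BLOCK 4   /* hypothetical number of elements owned by each thread */

/* One block of BLOCK elements has affinity to each thread. */
SHARED_BLOCK(BLOCK) double data[BLOCK * THREADS];

/* O0 style: local elements are still read through shared references. */
double sum_local_o0(void) {
    double s = 0.0;
    for (int i = 0; i < BLOCK; i++)
        s += data[MYTHREAD * BLOCK + i];
    return s;
}

/* O1 style (privatization): the local block is accessed through an
 * ordinary C pointer obtained by casting the pointer-to-shared. */
double sum_local_o1(void) {
    double *mine = (double *)&data[MYTHREAD * BLOCK];
    double s = 0.0;
    for (int i = 0; i < BLOCK; i++)
        s += mine[i];
    return s;
}

/* O3 style (privatization and prefetching): a block owned by another
 * thread is copied in one bulk transfer, then read from private memory. */
double sum_neighbor_o3(void) {
    double buf[BLOCK];
    int neighbor = (MYTHREAD + 1) % THREADS;
    upc_memget(buf, &data[neighbor * BLOCK], BLOCK * sizeof(double));
    double s = 0.0;
    for (int i = 0; i < BLOCK; i++)
        s += buf[i];
    return s;
}
```

  All three functions return the same kind of result; only the way the data
  is reached differs. In a real UPC program a upc_barrier would normally
  separate the writes to data from the reads performed by other threads.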


III) Building instructions
--------------------------
 a) The build needs to be configured for your specific UPC compiler.
    Some defaults are set in the config/Makefile.default file.
    Settings in config/Makefile.in override those defaults:
        cp config/Makefile.default config/Makefile.in
        vim config/Makefile.in  (Configure it for your specific compiler)
    More advanced options are located in config/make.def
 b) Go to the workload directory (e.g. cd CG)
 c) Remove any binary files already present in the workload directory (e.g.
    gmake clean)
 d) Build the binary for the chosen CLASS and number of processors. By default
    the most optimized UPC version is used (e.g. gmake CLASS=A NP=4)

 Make options:
 -------------
    * VARIANT: can be specified during compilation to use a different UPC
        variant of a given workload (e.g. gmake CLASS=A NP=4 VARIANT=O1).
        It can be O0, O1, O1static, O3 or O3static, depending on the
        implementations available for a given workload.
    * CLASS: can be S, W, A, B or C (smaller to larger sizes). A larger class,
        D, is also available in the NPB2.4 workloads (except IS).
    * NP: Number of threads (limited by the type of workload and the number of
        CPUs present).
    * USE_MONOTONIC_CLOCK=1: Make use of the system monotonic clock for more
        precise timing results.

 e) Run the UPC Binary file created
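
 For readers unfamiliar with the monotonic clock mentioned under
 USE_MONOTONIC_CLOCK=1, the C sketch below shows one common way to read it.
 How the suite itself implements this option is not described here; the
 sketch assumes a POSIX system providing clock_gettime(), and the function
 name wall_seconds is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime() in strict C modes */
#endif
#include <time.h>

/* Returns the time in seconds since an arbitrary fixed starting point.
 * Only differences between two readings are meaningful. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

 A timed region is then measured as the difference between two readings,
 e.g. t0 = wall_seconds(); ...; elapsed = wall_seconds() - t0. Unlike the
 wall-clock time of day, a monotonic clock is not affected by adjustments to
 the system time made while the benchmark runs.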


V) Revision History
-------------------
 v1.00: Initial Effort - Implemented in a way similar to MPI. The distribution
    is no longer available on the web. 
 v2.00: First Release - May 9th 2003
 v2.01: Minor Changes - Improved the Makefiles so that a 'make clean' is no
    longer needed before each compilation - May 13th 2003
 v2.02: Bug fix       - Do a single useful upc_all_lock_alloc() call instead of
    two (FT workload) - May 16th 2003
 v2.03: New Workload  - CG added to the kernels (O0, O1, O3) - May 19th 2003
 v2.04: New Workload  - IS added to the kernels (O0, O1) - May 29th 2003
 v2.05: Started joint development of NAS 2.4 - CG, EP, FT, IS kernels -
    June 5th 2003
 v2.06: New Workload  - MG added to the kernels (O0, O1, O3) - June 26th 2003
 v2.07: New Makefile accepting VARIANT flag, new template for Berkeley UPC
    Compiler available in config/models/ - July 14th 2003
 v2.08: Portability of the support/ scripts to the HP Tru64 Unix sh shell -
    July 15th 2003
 v2.09: Various bug fixes for CG and MG, implementation of a file_output for
    MG - November 2004

 v2.20 and later releases: Please consult the ChangeLog file

 After version 2.20, the version numbering scheme changed as follows:
    npb-upc-NASVERSION-YY.MM.tar.gz
    So, npb-upc-2.4-11.02.tar.gz stands for NAS Parallel Benchmarks for
        the 2.4 NAS specification, released February 2011.
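
 As a worked example of this naming scheme, the C sketch below splits an
 archive name such as npb-upc-2.4-11.02.tar.gz into its NAS version, release
 year and release month. The function name parse_release_name is
 hypothetical, and mapping YY to 2000 + YY is an assumption based on the
 example above.

```c
#include <stdio.h>

/* Returns 1 and fills the outputs when name has the form
 * npb-upc-NASVERSION-YY.MM.tar.gz, and returns 0 otherwise. */
int parse_release_name(const char *name, char nas_version[16],
                       int *year, int *month) {
    int yy = 0, mm = 0, end = -1;

    /* %n records how many characters were consumed, which shows whether
     * the ".tar.gz" suffix matched and nothing follows it. */
    int fields = sscanf(name, "npb-upc-%15[^-]-%2d.%2d.tar.gz%n",
                        nas_version, &yy, &mm, &end);
    if (fields != 3 || end < 0 || name[end] != '\0')
        return 0;
    if (yy < 0 || mm < 1 || mm > 12)
        return 0;

    *year = 2000 + yy;   /* assumption: YY counts years from 2000 */
    *month = mm;
    return 1;
}
```

 With the name npb-upc-2.4-11.02.tar.gz the function stores "2.4" in
 nas_version, 2011 in *year and 2 in *month.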

VI) Acknowledgements
--------------------
  Please consult the AUTHORS file for the complete list of people who have
contributed to this software.