Fine-Grain Parallel Computing: the Next Frontier in High Performance Computing
The Boulder HPC Facility: Exploring New Computing Technologies for NOAA
Major Activities in 2011 / 2012
- Fortran to CUDA (or C) Compiler: The F2C-ACC compiler was released to the public in June 2009. Although it has limitations, development of this compiler has proven useful for parallelizing the NIM and other weather models.
F2C-ACC Version 4.6 was released in October 2012 with updated documentation. This version provides limited support for modules. The distribution also contains a number of working examples and tests that can be compiled and run as Fortran only, Fortran + C, or Fortran + CUDA (which runs on the GPU).
- Commercial Fortran GPU Compilers: We continue to evaluate the Fortran GPU compilers from CAPS (HMPP) and PGI (PGI Accel) using the NIM model. We are also evaluating a beta version of the Cray GPU compiler. These compiler vendors all plan to support the Intel MIC, and we plan to evaluate them in 2012. We hope to use these compilers to run other models, including the HRRR (a WRF-ARW variant), HYCOM, and FIM.
- GPU Parallelization of NIM model dynamics using F2C-ACC: Parallelization efforts focused on (1) maintaining a single source code for CPU and GPU execution, and (2) running efficiently on the CPU while optimizing performance on the GPU. The performance optimizations we made for the GPU also improved CPU performance. Dynamics currently runs at 30 percent of the peak performance of the Intel Westmere CPU. Performance comparisons between Fermi (GPU) and Intel Westmere (CPU) show that NIM runs 5 times faster on the GPU (socket-to-socket).
- GPU Parallelization of NIM Physics using F2C-ACC: This effort has begun with exploratory work on select routines from WRF Physics. The emphasis in parallelization is on retaining the original community code (written in Fortran). We are using F2C-ACC directives to convert the code to CUDA. Initial performance results show the YSU PBL running 2x faster on the GPU (socket-to-socket). This speedup does not include the time to transfer data between the CPU and GPU.
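The note that the reported speedup excludes CPU-GPU transfer time matters when comparing results. The short Python sketch below shows how a kernel-only speedup relates to an end-to-end speedup once transfer time is counted. The function names and all timings are hypothetical values chosen for illustration; they are not measurements from NIM or WRF Physics.

```python
def kernel_speedup(cpu_seconds, gpu_kernel_seconds):
    """Speedup counting only GPU compute time (transfers excluded)."""
    if cpu_seconds <= 0 or gpu_kernel_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / gpu_kernel_seconds


def end_to_end_speedup(cpu_seconds, gpu_kernel_seconds, transfer_seconds):
    """Speedup when CPU<->GPU transfer time is added to the GPU cost."""
    if transfer_seconds < 0:
        raise ValueError("transfer time cannot be negative")
    if cpu_seconds <= 0 or gpu_kernel_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / (gpu_kernel_seconds + transfer_seconds)


if __name__ == "__main__":
    # Hypothetical timings (seconds), picked so the kernel-only figure is 2x.
    cpu = 1.0
    gpu_kernel = 0.5
    transfer = 0.25

    print(f"kernel-only: {kernel_speedup(cpu, gpu_kernel):.2f}x")
    print(f"end-to-end:  {end_to_end_speedup(cpu, gpu_kernel, transfer):.2f}x")
```

With these assumed numbers the kernel-only speedup is 2.00x and the end-to-end speedup is 1.33x, which is why it helps to state which figure is being reported. Transfer cost shrinks in relative terms when data stays resident on the GPU across many kernel calls.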
Recent Presentations
- December 2011: M. Govett, J. Middlecoff, T. Henderson, J. Rosinski, and C. Tierney, Parallelization of the NIM Dynamical Core for GPUs, Workshop on Dynamical Cores for Climate Models, Infrastructure for the European Network for Earth System Modeling, Lecce, Italy.
- November 2011: M. Govett, J. Middlecoff, T. Henderson, J. Rosinski, C. Tierney, Successes and Challenges Using GPUs for Weather and Climate Models, audio presentation at NVIDIA booth, SuperComputing 2011, Seattle, Washington.
- September 2011: M. Govett, T. Henderson, J. Middlecoff, J. Rosinski, Successes and Challenges Using GPUs for Weather and Climate Models, Workshop on Programming Weather, Climate, and Earth-System Models on Heterogeneous Multi-Core Platforms, National Center for Atmospheric Research (NCAR), Boulder, Colorado.
- September 2011: M. Govett, GPGPUs for High Performance Computing, HPC 2011 - Front Range Computing Symposium, Front Range Computing Consortium (FRCC), Golden, Colorado.
- August 2011: J. Rosinski, NIM: A Unified Model for Forecasting Weather and Climate, Titan Summit, Oak Ridge National Laboratory, U.S. Dept of Energy, Oak Ridge, Tennessee.
- July 2011: T. Henderson, M. Govett, J. Middlecoff, P. Madden, J. Rosinski, C. Tierney, Experience Applying Fortran GPU Compilers to Numerical Weather Prediction, Symposium on Application Accelerators in High-Performance Computing, University of Tennessee, Knoxville.
Prepared by Mark Govett, Mark.W.Govett@noaa.gov
Date of last update: July 18, 2012