Nvidia’s CUDA 6 adds support for unified memory models, solving a major bottleneck with tasks that need to operate on both the CPU and the GPU.
Nvidia has announced the release of its latest parallel programming platform, CUDA 6, bringing support for unified memory, drop-in libraries and scaling across multiple graphics processors.
CUDA, the Compute Unified Device Architecture, is Nvidia’s secret sauce for general-purpose GPU (GPGPU) coding. A rival to the Khronos Group’s OpenCL, CUDA offers the ability to write code that can be executed in parallel on Nvidia graphics processors – greatly accelerating parallelisable tasks, and making Nvidia GPUs a common choice for high-performance computing (HPC) and supercomputing projects. The latest TOP500 list of the world’s most powerful supercomputers sees 38 of the 500 using Nvidia GPU-based accelerators, with just two using AMD’s rival Radeon boards and 13 using Intel’s Many Integrated Core (MIC) x86 Xeon Phi boards.
The biggest change in CUDA 6 is designed to simplify writing code that can run on both the GPU and CPU while boosting performance: unified memory. Under CUDA 6, Nvidia explains, applications can access both CPU and GPU memory without the developer needing to explicitly transfer data from one to the other – addressing a major bottleneck that can cripple the performance of CUDA-based GPGPU applications. Rival AMD has shown a similar progression in its hUMA model, but CUDA 6 marks the first time Nvidia has supported unified memory for GPGPU applications.
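To give a flavour of what unified memory looks like in practice, here is a minimal sketch rather than anything taken from Nvidia's announcement. It allocates two arrays with the CUDA runtime's `cudaMallocManaged` call, fills them on the CPU, updates them on the GPU and reads the result back on the CPU, with no explicit `cudaMemcpy` in sight. The kernel `saxpy` and the helper `run_managed_saxpy` are illustrative names of our own, and the sketch assumes a 64-bit system with a GPU that supports managed memory (Kepler-class or newer).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: y[i] = a * x[i] + y[i], one element per thread.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper (not part of CUDA): runs saxpy on managed memory and
// returns the number of incorrect elements, or -1 if a CUDA call failed.
int run_managed_saxpy(int n) {
    float *x = nullptr;
    float *y = nullptr;

    // Managed allocations are reachable through the same pointer on CPU and GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        return -1;
    }
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1;
    }

    // Initialise on the CPU; no host-to-device copy is written by hand.
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);

    // The CPU must wait for the kernel to finish before touching managed data.
    if (cudaGetLastError() != cudaSuccess ||
        cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        cudaFree(y);
        return -1;
    }

    // Read the result directly on the CPU: 2 * 1 + 2 = 4 for every element.
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (y[i] != 4.0f) {
            ++wrong;
        }
    }

    cudaFree(x);
    cudaFree(y);
    return wrong;
}

int main() {
    int wrong = run_managed_saxpy(1 << 20);
    if (wrong < 0) {
        std::printf("CUDA call failed (is a supported GPU present?)\n");
        return 1;
    }
    std::printf("%d mismatched elements\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Something like `nvcc saxpy_managed.cu -o saxpy_managed` should build it, where the file name is our own choice. The data still has to move between CPU and GPU memory behind the scenes; the difference is that the runtime handles the migration rather than the programmer.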
The new release also introduces drop-in libraries for basic linear algebra subprograms (BLAS) and fast Fourier transform (FFT), allowing developers to get a claimed eight-fold performance boost on these common calculation types simply by replacing their existing CPU-driven libraries with those provided in the SDK. The reason for the eight-fold figure? The new libraries allow for automatic performance scaling over up to eight GPUs in a single node – two for the FFTW library – offering, assuming you pick up Nvidia’s top-end accelerator, up to nine teraflops of double-precision performance and support for workloads of up to 512GB.
The new CUDA toolkit is available to download as a release candidate now from the official website.
Article source: http://feedproxy.google.com/~r/bit-tech/news/~3/nbsRfkzMnu8/1