Mankind is for enlightenment...

I, Nafee Mostafa Sadh, blog about the ever-refreshing earth, my land Bangladesh, my culture Bangaliyana, my lifestyle Islam, and about literature, esthetics, history, heritage, fashion, science, engineering, technology, electronics, information, networking, computing, programming and a lot more. But above all, it is about the insights and ideas that make people human.

Tuesday, April 7, 2009

Sadh's Talk on GPU Computing

“Graphics processing units - powerful, programmable, and highly parallel - are increasingly targeting general-purpose computing applications.”

(J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing”)

What is GPU Computing?

GPU for General-Purpose Computing

The idea of GPU Computing

GPU computing is the use of a GPU to do general-purpose scientific and engineering computing.

· The approach is to use a CPU and a GPU together in a heterogeneous computing model.

· The sequential part of the application runs on the CPU and the computationally-intensive part runs on the GPU.

· From the user’s perspective, the application simply runs faster because it takes advantage of the GPU’s high performance.
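A minimal sketch of this CPU/GPU split in plain Python. It assumes a hypothetical `launch_kernel` helper that stands in for a GPU kernel launch by looping over thread indices on the CPU; a real program would use a GPU API such as CUDA. The kernel and function names are illustrative only.

```python
from typing import Callable, List

def launch_kernel(kernel: Callable[..., None], n_threads: int, *args) -> None:
    """Hypothetical stand-in for a GPU kernel launch: calls `kernel` once per
    thread index, sequentially. On a real GPU these calls run in parallel."""
    for tid in range(n_threads):
        kernel(tid, *args)

def axpy_kernel(tid: int, a: float, x: List[float], y: List[float],
                out: List[float]) -> None:
    """Compute-intensive, data-parallel part: each 'thread' handles one element."""
    out[tid] = a * x[tid] + y[tid]

def run_application(n: int) -> List[float]:
    # Sequential part on the CPU (host): prepare the input data.
    x = [float(i) for i in range(n)]
    y = [1.0] * n
    out = [0.0] * n
    # Data-parallel part offloaded to the "GPU": one thread per element.
    launch_kernel(axpy_kernel, n, 2.0, x, y, out)
    # Back on the host: the result is ready for further sequential work.
    return out

if __name__ == "__main__":
    print(run_application(5))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

The user only sees `run_application` return sooner; which parts run where is hidden inside it.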

Why GPU Computing…

Over the past few years, the GPU has evolved from a fixed-function special-purpose processor into a full-fledged parallel programmable processor with additional fixed-function special-purpose functionality.

GPU for Non-Graphic Apps

The GPU is designed for a particular class of applications with the following characteristics:

· Computational requirements are large

· Parallelism is substantial

· Throughput is more important than latency

A growing community has identified other applications with similar characteristics and successfully mapped them onto the GPU.

The GPU Lends the CPU a Hand for Performance

· Parallelism is the future of computing

· Many programs have to apply the same functions to huge sets of data

· Several stream processors can execute the same set of instructions on different data sets, giving higher throughput

If the GPU takes over part of the computational load from the CPU, many applications can benefit from a speed-up.
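The idea in the bullets above can be sketched in Python, assuming the data is split into nearly equal chunks, one per "stream processor", with threads standing in for the processors. `same_function`, `split` and `stream_process` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def same_function(value: float) -> float:
    """The single operation every stream processor applies (illustrative)."""
    return value * value + 1.0

def process_chunk(chunk: List[float]) -> List[float]:
    # Every processor runs exactly the same instructions, only on its own data.
    return [same_function(v) for v in chunk]

def split(data: List[float], n_parts: int) -> List[List[float]]:
    """Split data into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for p in range(n_parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def stream_process(data: List[float], n_processors: int = 4) -> List[float]:
    chunks = split(data, n_processors)
    # Threads stand in for stream processors. In CPython the GIL means this
    # shows the structure of the computation, not a real speed-up.
    with ThreadPoolExecutor(max_workers=n_processors) as pool:
        results = list(pool.map(process_chunk, chunks))  # order is preserved
    return [v for chunk in results for v in chunk]
```

On a GPU the same pattern is applied with thousands of lightweight threads instead of a handful of chunks, which is where the throughput gain comes from.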

GPU Architecture and Evolution

The GPU has now turned into a programmable engine

GPU Pipeline

  • Input of Geometric Primitives
  • Vertex Operations
  • Primitive Assembly
  • Rasterization
  • Fragment Operations
  • Composition
  • Screen Output
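To make the stage list concrete, here is a toy software rasterizer in Python that pushes triangles through the same stages on the CPU. It is a sketch under simplifying assumptions (a 16 x 8 character "screen", a tiny hard-coded vertex transform, edge-function rasterization, depth-based character shading, a depth test for composition); real GPUs implement most of these stages in dedicated hardware, and every function name here is hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]        # (x, y, depth)
Triangle = Tuple[Vertex, Vertex, Vertex]
WIDTH, HEIGHT = 16, 8                       # a tiny character "screen"

def vertex_op(v: Vertex) -> Vertex:
    """Vertex operation: map coordinates in [-1, 1] to screen pixels (y points down)."""
    x, y, z = v
    return ((x + 1) * 0.5 * WIDTH, (1 - y) * 0.5 * HEIGHT, z)

def assemble(vertices: List[Vertex]) -> List[Triangle]:
    """Primitive assembly: group every three vertices into one triangle."""
    return [(vertices[i], vertices[i + 1], vertices[i + 2])
            for i in range(0, len(vertices) - 2, 3)]

def edge(a: Vertex, b: Vertex, px: float, py: float) -> float:
    """Edge function: signed (doubled) area of the triangle a, b, p."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def rasterize(tri: Triangle) -> List[Tuple[int, int, float]]:
    """Rasterization: one fragment (x, y, depth) per pixel centre inside tri."""
    a, b, c = tri
    area = edge(a, b, c[0], c[1])
    if area == 0:
        return []                           # degenerate triangle covers no pixels
    frags = []
    for py in range(HEIGHT):
        for px in range(WIDTH):
            cx, cy = px + 0.5, py + 0.5
            w0 = edge(b, c, cx, cy) / area  # barycentric weight of a
            w1 = edge(c, a, cx, cy) / area  # barycentric weight of b
            w2 = edge(a, b, cx, cy) / area  # barycentric weight of c
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                frags.append((px, py, w0 * a[2] + w1 * b[2] + w2 * c[2]))
    return frags

def fragment_op(depth: float) -> str:
    """Fragment operation: shade near fragments '#' and far ones '+' (illustrative)."""
    return "#" if depth < 0.5 else "+"

def render(vertices: List[Vertex]) -> List[str]:
    """Input of geometric primitives: a flat list of vertices, three per triangle."""
    framebuffer = [["."] * WIDTH for _ in range(HEIGHT)]
    zbuffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    screen = [vertex_op(v) for v in vertices]        # vertex operations
    for tri in assemble(screen):                     # primitive assembly
        for px, py, z in rasterize(tri):             # rasterization
            color = fragment_op(z)                   # fragment operations
            if z < zbuffer[py][px]:                  # composition: depth test
                zbuffer[py][px] = z
                framebuffer[py][px] = color
    return ["".join(row) for row in framebuffer]     # screen output
```

For example, `print("\n".join(render([(-1.0, -1.0, 0.2), (1.0, -1.0, 0.2), (-1.0, 1.0, 0.2)])))` draws a triangle filling the lower-left half of the screen.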

Evolution…

  • Fixed-Function Pipeline: lack of generality
  • More fully featured instruction set
  • Unified Shader Model
  • Increased Programmability
  • Programmable engine surrounded by supporting fixed-function units

GPU Computing Model

All GPU programs must be structured in this way: many parallel elements, each processed in parallel by a single program.
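A small Python sketch of that structure, assuming a 3-point averaging "program" chosen purely for illustration (`smooth_program` and `run_over_elements` are hypothetical names). Because each element writes only its own output and never reads another element's output, the elements can be visited in any order, which is what allows a GPU to process them all in parallel.

```python
import random
from typing import Callable, List, Sequence

def smooth_program(i: int, data: Sequence[float]) -> float:
    """The single program run for every element: a 3-point average.
    It may read any input, but it writes only its own output element i."""
    left = data[max(i - 1, 0)]
    right = data[min(i + 1, len(data) - 1)]
    return (left + data[i] + right) / 3.0

def run_over_elements(program: Callable[[int, Sequence[float]], float],
                      data: Sequence[float], order: Sequence[int]) -> List[float]:
    """Apply `program` to every element, visiting elements in the given order."""
    out = [0.0] * len(data)
    for i in order:
        # Outputs never feed back into inputs, so any order (or all at once)
        # produces the same result.
        out[i] = program(i, data)
    return out

if __name__ == "__main__":
    data = [float(v) for v in range(8)]
    in_order = run_over_elements(smooth_program, data, range(len(data)))
    shuffled = list(range(len(data)))
    random.Random(42).shuffle(shuffled)
    assert run_over_elements(smooth_program, data, shuffled) == in_order
    print(in_order)
```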

Computing on the GPU

Programming a GPU for Graphics

1. The programmer specifies geometry covering a screen region; the rasterizer generates a fragment at each pixel location.

2. Each fragment is shaded by the fragment program (FP).

3. The FP computes the fragment’s value using a combination of math operations and global memory reads.

4. The resulting image can be used as a texture in future passes.
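These four steps can be mimicked on the CPU with a short Python sketch. It assumes a hypothetical `run_pass` helper standing in for drawing a screen-sized quad, with a Python lambda playing the role of the fragment program; a real implementation would write the fragment program in a shading language such as GLSL or Cg. The second pass reads the first pass's output as a texture.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]

def run_pass(width: int, height: int,
             fragment_program: Callable[[int, int], float]) -> Image:
    """Steps 1-3: a screen-sized quad yields one fragment per pixel, and the
    fragment program computes each one independently."""
    return [[fragment_program(x, y) for x in range(width)] for y in range(height)]

def sample(texture: Image, x: int, y: int) -> float:
    """Texture fetch (a global memory read) with clamp-to-edge addressing."""
    h, w = len(texture), len(texture[0])
    return texture[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def two_pass_demo(width: int, height: int) -> Tuple[Image, Image]:
    """Assumes width >= 2 so the gradient below is well defined."""
    # Pass 1: a horizontal gradient computed from math operations alone.
    gradient = run_pass(width, height, lambda x, y: x / (width - 1))
    # Pass 2 (step 4): the pass-1 image is read back as a texture to
    # compute a central difference of the gradient.
    derivative = run_pass(
        width, height,
        lambda x, y: sample(gradient, x + 1, y) - sample(gradient, x - 1, y))
    return gradient, derivative
```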

Programming a GPU for General-Purpose Programs (Old)

1. The programmer specifies a geometric primitive covering the computation domain of interest; the rasterizer generates a fragment at each pixel location it covers.

2. Each fragment is shaded by an SPMD (single program, multiple data) general-purpose FP.

3. The FP computes the fragment’s value using a combination of math operations and ‘gather’ (read) accesses from global memory.

4. The resulting buffer can be used as an input in future passes.
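Because a fragment program in this model can only gather, algorithms are built from several passes, each writing a fresh output buffer that becomes the next pass's input. The Python sketch below shows one such formulation, a Hillis-Steele style inclusive prefix sum (scan) that needs about log2(n) passes. It is an illustrative choice, not necessarily the method used in the work cited in this post; each call to the hypothetical `gather_pass` stands in for one rendering pass.

```python
from typing import List

def gather_pass(src: List[int], offset: int) -> List[int]:
    """One rendering pass: every fragment i gathers (reads) src[i] and
    src[i - offset] and writes only its own output location i."""
    return [src[i] + (src[i - offset] if i >= offset else 0)
            for i in range(len(src))]

def multipass_inclusive_scan(values: List[int]) -> List[int]:
    """Inclusive prefix sum built only from gather passes. The output buffer
    of one pass is used as the input of the next (ping-pong buffering)."""
    buffer = list(values)
    offset = 1
    while offset < len(buffer):
        buffer = gather_pass(buffer, offset)  # a new output buffer each pass
        offset *= 2
    return buffer
```

For `[3, 1, 7, 0, 4, 1, 6, 3]` this takes three passes and returns `[3, 4, 11, 11, 15, 16, 22, 25]`.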

Programming a GPU for General-Purpose Programs (New)

1. The programmer directly defines the computation domain of interest as a structured grid of threads.

2. An SPMD general-purpose program computes the value of each thread.

3. Each thread’s value is computed by a combination of math operations and both ‘gather’ (read) accesses from and ‘scatter’ (write) accesses to global memory. The same buffer can be used for both, allowing more flexible algorithms.

4. The resulting buffer in global memory can then be used as an input in future computations.
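A Python sketch of this model, assuming a hypothetical `launch` helper that mimics a CUDA-style one-dimensional grid of thread blocks by looping on the CPU. The kernel performs stream compaction: each thread gathers its own element and, if the element is kept, scatters it to an output address computed by a prefix sum, a write pattern the older fragment-program model could not express directly. For brevity the flags and the scan are computed sequentially here; on a GPU they would be kernels too.

```python
from typing import Callable, List

def launch(grid_dim: int, block_dim: int, kernel: Callable[..., None], *args) -> None:
    """Hypothetical stand-in for a CUDA-style launch over a 1-D grid of
    thread blocks; runs every (block, thread) pair sequentially on the CPU."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def compact_kernel(block_idx: int, block_dim: int, thread_idx: int,
                   data: List[int], flags: List[int], addr: List[int],
                   out: List[int]) -> None:
    i = block_idx * block_dim + thread_idx  # global thread index in the grid
    if i >= len(data):                      # the grid may be larger than the data
        return
    if flags[i]:                            # gather: read this thread's input
        out[addr[i]] = data[i]              # scatter: write to a computed address

def exclusive_scan(xs: List[int]) -> List[int]:
    total, result = 0, []
    for x in xs:
        result.append(total)
        total += x
    return result

def compact(data: List[int], keep: Callable[[int], bool], block_dim: int = 4) -> List[int]:
    flags = [1 if keep(v) else 0 for v in data]
    addr = exclusive_scan(flags)            # output position of each kept element
    out = [0] * sum(flags)
    grid_dim = (len(data) + block_dim - 1) // block_dim  # ceil(n / block_dim)
    launch(grid_dim, block_dim, compact_kernel, data, flags, addr, out)
    return out
```

For example, `compact([5, 2, 8, 1, 9, 4, 7], lambda v: v > 4, 3)` uses a grid of 3 blocks of 3 threads (two idle) and returns `[5, 8, 9, 7]`.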

Software Environments

· BrookGPU

· Microsoft’s Accelerator

· Vendor-specific GPGPU systems

o AMD ATI’s CTM (Close to the Metal)

o NVIDIA’s CUDA (Compute Unified Device Architecture)

Scan performance on CPU, OpenGL and CUDA

Scan performance on CPU, graphics-based GPU (using OpenGL), and direct-compute GPU (using CUDA). Results obtained on a GeForce 8800 GTX GPU and Intel Core2-Duo Extreme 2.93 GHz CPU.

(Figure adapted from Harris et al.)
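For readers unfamiliar with scan: the exclusive scan (prefix sum) of `[3, 1, 7, 0]` is `[0, 3, 4, 11]`. Below is a sequential Python sketch of one well-known parallel scan, the work-efficient up-sweep/down-sweep (Blelloch) algorithm; it is not claimed to be the exact code behind the results above. Each inner `for` loop corresponds to one parallel step on the GPU, and the sketch assumes the input length is a power of two.

```python
from typing import List

def blelloch_exclusive_scan(values: List[int]) -> List[int]:
    """Work-efficient exclusive prefix sum (up-sweep, then down-sweep).
    Assumes len(values) is a power of two; each inner loop is one parallel step."""
    n = len(values)
    if n == 0:
        return []
    if n & (n - 1):
        raise ValueError("length must be a power of two in this sketch")
    a = list(values)
    # Up-sweep (reduce): build partial sums in a balanced binary tree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):  # independent updates
            a[i] += a[i - stride]
        stride *= 2
    # Down-sweep: clear the root, then push prefix sums back down the tree.
    a[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):  # independent updates
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a
```

Unlike a gather-only multipass scan, which does O(n log n) additions, this version does O(n) work in total, which is why it is called work-efficient.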

Future…

· Support for double-precision floating point

· A higher-bandwidth path between CPU and GPU (like AMD’s HyperTransport)

· More tightly coupled CPU and GPU (AMD’s Fusion or NVIDIA’s nForce)

· NVIDIA Quadro for Multiple GPU Collaboration

Finally, let us wait for the new era when GPU computing will rule.

Thank You

It must be noted that this blog post was generated from presentation slides that I originally created, as an undergraduate student, for the first seminar of the course "Seminar" at the Department of Computer Science and Engineering, Khulna University of Engineering and Technology (KUET).