General Programming on the GPU - Confoo

GPUs: Not Just for Graphics Anymore
David Ostrovsky | Couchbase

GPGPU (general-purpose computing on graphics processing units) refers to using a GPU to perform computation in applications traditionally handled by the CPU.

Embarrassingly Parallel Problems
Work that splits into many independent tasks with little or no communication between them:
• Image processing, graphics rendering
• Fractal images (e.g. the Mandelbrot set)
• String matching
• Distributed queries, MapReduce
• Brute-force cryptographic attacks
• Bitcoin mining
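Fractal images are a textbook case: in the Mandelbrot set, each pixel's value depends only on its own coordinate. The following C++ sketch (the names mandelbrot_iterations and render are illustrative, not from the talk) computes an escape-time count per pixel on the CPU. Because no pixel reads another pixel's result, the body of the nested loop is exactly what a GPU kernel would run once per pixel.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Escape-time count for one point c: iterate z = z*z + c until |z| > 2
// or max_iter steps have passed. Depends only on c, never on other pixels.
int mandelbrot_iterations(std::complex<double> c, int max_iter) {
    assert(max_iter > 0);
    std::complex<double> z = 0.0;
    for (int i = 0; i < max_iter; ++i) {
        if (std::norm(z) > 4.0) return i;  // std::norm is |z|^2, so this tests |z| > 2
        z = z * z + c;
    }
    return max_iter;  // never escaped: treated as "inside the set"
}

// Render a width x height grid covering the region [-2, 1] x [-1.5, 1.5].
// Every (x, y) iteration is independent: this loop nest is what a GPU would
// replace with one work item per pixel.
std::vector<int> render(int width, int height, int max_iter) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::complex<double> c(-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height);
            out[static_cast<std::size_t>(y) * width + x] = mandelbrot_iterations(c, max_iter);
        }
    }
    return out;
}
```

Calling render(800, 600, 256) yields one iteration count per pixel, which can then be mapped to a color palette.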

Amdahl’s Law
The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program.
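In formula form, if a fraction p of the running time can be parallelized across n processors, the speedup is 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) as n grows. A minimal C++ helper (amdahl_speedup is a hypothetical name) makes that ceiling concrete:

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n)
//   p = fraction of the runtime that can run in parallel (0..1)
//   n = number of processors
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, with p = 0.9 the speedup on 10 processors is about 5.26, and no number of processors, GPU cores included, can push it past 10.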

GPGPU Concepts
• Texture: A common way to provide the read-only input data stream as a 2D grid.
• Frame Buffer: A write-only memory interface for output.
• Kernel: The operation to perform on each unit of data. Roughly similar to the body of a loop.

Parallelizing Your Code
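float func(float x) { return x * x; }  // example per-element operation (illustrative)

// in[] plays the role of the texture, out[] the frame buffer,
// and the loop body the kernel.
void compute(const float in[10000], float out[10000])
{
    for (int i = 0; i < 10000; i++)
        out[i] = func(in[i]);  // each iteration is independent of the others
}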
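The loop maps onto the kernel model by turning its body into a function of an index and letting a runtime decide how the indices are spread across cores. The sketch below is a CPU-only analogy, assuming standard C++ threads stand in for GPU work items; dispatch and its strided scheduling are hypothetical and not part of OpenCL, CUDA, or C++ AMP.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs kernel(i) for every i in [0, n), spreading the indices across
// `workers` threads. Worker t handles indices t, t + workers, t + 2*workers, ...
// (a CPU stand-in for a GPU runtime launching one work item per index).
void dispatch(int n, int workers, const std::function<void(int)>& kernel) {
    assert(workers > 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < workers; ++t) {
        pool.emplace_back([=, &kernel] {
            for (int i = t; i < n; i += workers)
                kernel(i);
        });
    }
    for (auto& th : pool) th.join();  // wait for every "work item" to finish
}
```

A call such as dispatch(n, 4, [&](int i) { out[i] = in[i] * in[i]; }) replaces the loop. Because each index is written by exactly one worker, no locking is needed; the same property is what lets a GPU run thousands of work items at once.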

GPGPU Frameworks
• OpenCL
  • Subset of C99
  • Implementations for Intel, AMD, and nVidia GPUs
• CUDA
  • C++ SDK, wrappers for other languages
  • Only supported on nVidia GPUs
• C++ AMP
  • Subset of C++
  • Microsoft implementation based on DirectX, integrated into Visual Studio
  • Supports most modern GPUs

Client Integration
• OpenCL
  • Vendor-specific SDKs, available from Intel, AMD, IBM, and nVidia
  • Wrappers for popular languages such as C#, Python, and Java
  • Supports multiple vendor-specific debuggers
• C++ AMP
  • Native C++ projects, P/Invoke from .NET, WinRT components, or any language that can interoperate with native libraries
  • Supports GPU debugging and profiling

Using C++ AMP
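#include <amp.h>
using namespace concurrency;

// Exported from the native DLL: squares each element of arr in place on the GPU.
extern "C" __declspec(dllexport) void __stdcall square_array(float* arr, int n)
{
    array_view<float, 1> dataView(n, &arr[0]);  // wrap host memory for the accelerator
    parallel_for_each(dataView.extent, [=](index<1> idx) restrict(amp)
    {
        dataView[idx] = dataView[idx] * dataView[idx];  // kernel: one element per thread
    });
    dataView.synchronize();  // copy the results back into arr
}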
Native DLL

Using C++ AMP
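using System.Runtime.InteropServices;

float[] arr = new[] { 1.0f, 2.0f, 3.0f, 4.0f };
unsafe {
    fixed (float* arrPt = &arr[0])  // pin the array so the GC can't move it during the call
        NativeMethods.square_array(arrPt, arr.Length);
}

static unsafe class NativeMethods {  // requires AllowUnsafeBlocks (/unsafe)
    [DllImport("NativeAmpLibrary", CallingConvention = CallingConvention.StdCall)]
    public static extern void square_array(float* array, int length);
}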
Managed Code

Using OpenCL
[Screenshots: C# project, NuGet package]

Using Aparapi (OpenCL)
Aparapi Java Code
• Converts Java bytecode to OpenCL at runtime
• Syntax somewhat similar to C++ AMP
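import com.aparapi.Kernel;  // older AMD releases: com.amd.aparapi.Kernel / Range
import com.aparapi.Range;

final float[] data = new float[size];
Kernel kernel = new Kernel() {
    @Override public void run() {
        int gid = getGlobalId();             // index of this work item
        data[gid] = data[gid] * data[gid];
    }
};
kernel.execute(Range.create(data.length));  // one work item per element
kernel.dispose();                            // release OpenCL resources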

Demo Time!
Simple GPGPU Applications

Case Study 1: Edge Detection
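Goal: find all the points in the image where the brightness changes sharply.
Technique: the Sobel operator, which estimates the brightness gradient at each pixel from its 3×3 neighborhood.
Each pixel depends only on its neighbors in the input image, so pixels can be checked in parallel.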
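A minimal CPU sketch of the Sobel operator, assuming a grayscale image stored row-major in a std::vector<float> (sobel_magnitude is an illustrative name, not the demo's code). Each output pixel is computed from its 3×3 neighborhood only, so the two nested loops are exactly the part a GPU kernel would replace; border pixels are simply left at zero here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gradient magnitude using the 3x3 Sobel kernels
//   Gx = [-1 0 1; -2 0 2; -1 0 1]   Gy = [-1 -2 -1; 0 0 0; 1 2 1]
// Each interior pixel reads only its own neighborhood, so every (x, y)
// could be handled by an independent GPU work item.
std::vector<float> sobel_magnitude(const std::vector<float>& img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> out(img.size(), 0.0f);  // border pixels stay 0
    auto at = [&](int x, int y) { return img[static_cast<std::size_t>(y) * width + x]; };
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            float gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                       - 2 * at(x - 1, y) + 2 * at(x + 1, y)
                       - at(x - 1, y + 1) + at(x + 1, y + 1);
            float gy = -at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1)
                       + at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1);
            out[static_cast<std::size_t>(y) * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

Thresholding the returned magnitudes (keeping only values above some cutoff) produces the edge map; uniform regions give a magnitude of zero.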

More Demo Time!
Processing a Video Stream

Case Study 2: Password Cracking
Passwords are commonly stored as hashes of the original plaintext, e.g. SHA-256("12345") = "5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5"
Cracking a password by brute force means hashing candidate guesses until one matches the stored hash. Each guess is independent of the others, so the search parallelizes effectively.
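The key to parallelizing brute force is that every candidate can be derived from an integer index, so each GPU work item can take its global ID, build its own guess, and hash it without coordinating with anyone else. The sketch below assumes a fixed-length candidate over a given alphabet and, to stay short and self-contained, swaps in the 64-bit FNV-1a hash as a stand-in for SHA-256. FNV-1a is not a cryptographic hash, and fnv1a, candidate, and brute_force are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// 64-bit FNV-1a. NOT cryptographic: a stand-in for SHA-256 in this sketch.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Map an index to a fixed-length candidate, like a mixed-radix number:
// with alphabet "0123456789" and length 5, index 12345 -> "12345".
std::string candidate(std::uint64_t index, int length, const std::string& alphabet) {
    std::string s(length, alphabet[0]);
    for (int pos = length - 1; pos >= 0; --pos) {
        s[pos] = alphabet[index % alphabet.size()];
        index /= alphabet.size();
    }
    return s;
}

// Try every candidate; each id is independent and could be one GPU work item.
std::optional<std::string> brute_force(std::uint64_t target, int length, const std::string& alphabet) {
    assert(!alphabet.empty() && length > 0);
    std::uint64_t total = 1;
    for (int i = 0; i < length; ++i) total *= alphabet.size();
    for (std::uint64_t id = 0; id < total; ++id) {
        std::string guess = candidate(id, length, alphabet);
        if (fnv1a(guess) == target) return guess;
    }
    return std::nullopt;
}
```

On a GPU, the for loop in brute_force disappears: the runtime launches one work item per id, and a work item that finds a match writes it to an output buffer.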

Even More Demos!
Cracking a Single Password Hash with a Dictionary Attack

Thank you!
@DavidOstrovsky
CodeHardBlog.azurewebsites.net
linkedin.com/in/davidostrovsky
davido@couchbase.com