Collecting OpenCL GPU Kernel Performance Counters

CodeXL

CodeXL User Guide

Run a Performance Counters GPU Profile session

1. Open or create a CodeXL project.

2. Select the GPU: Performance Counters profile mode.

3. Click the Start Profiling toolbar button to start profiling.

4. Stop the profiled application when the part of the application that is under investigation has completed its execution.

When the profiled application’s execution is done, CodeXL displays the session.

The GPU kernel performance counters help locate possible bottlenecks in kernel execution. The list of performance counters supported by AMD Radeon™ GPUs is in the tool documentation. Once you have used the trace data to identify the kernel that most needs optimization, you can collect the GPU performance counters to drill down into that kernel's execution on a GPU device. Using the performance counters, you can:

· Find the amount of each resource (general-purpose registers, local memory size, and flow-control stack size) allocated for the kernel. These resources limit the number of wavefronts that can be in flight on the GPU. More in-flight wavefronts hide memory latency better.

· Determine the number of ALU, global memory, and local memory instructions executed by the GPU.

· Determine the number of bytes fetched from, and written to, the global memory.

· Determine the utilization of the SIMD engines and memory units in the system.

· View how efficiently the shader compiler packs ALU instructions into the VLIW instructions used by AMD GPUs.

· View any local memory (Local Data Share, LDS) bank conflicts.

The Session View (see the screenshot above) shows the performance counters for a profile session. The output data is recorded in comma-separated values (.csv) format. You can also click the kernel name entry in the "Method" column to view the OpenCL kernel source, AMD Intermediate Language (IL), GPU ISA, or CPU assembly code for that kernel.
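Because the session data is recorded as .csv, it can also be post-processed outside CodeXL. The following is a minimal Python sketch, not part of CodeXL, that ranks kernels by one counter. The sample rows, the kernel names, and the counter column names other than "Method" (Time, VGPRs, LDSBankConflict) are assumptions made for illustration, as is the handling of "#" comment lines. Check the header row of your own session file for the exact column names.

```python
import csv
import io

# Hypothetical excerpt of a performance-counter session file. The kernel
# names, values, and counter column names are assumed for illustration;
# only the "Method" column is named in this guide.
SAMPLE_CSV = """\
Method,Time,VGPRs,LDSBankConflict
matmul_k1,1.25,24,0.00
reduce_k2,0.40,12,18.50
transpose_k3,0.90,8,42.75
"""


def load_counters(stream):
    """Parse counter rows, skipping blank lines and lines starting with '#'.

    Skipping '#' lines is an assumption about possible comment headers;
    it is harmless if the file has none.
    """
    lines = (ln for ln in stream if ln.strip() and not ln.startswith("#"))
    return list(csv.DictReader(lines))


def rank_by_counter(rows, counter, top=3):
    """Return up to `top` (kernel, value) pairs, highest counter value first."""
    pairs = [(r["Method"], float(r[counter])) for r in rows if r.get(counter)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:top]


rows = load_counters(io.StringIO(SAMPLE_CSV))
for name, value in rank_by_counter(rows, "LDSBankConflict"):
    print(f"{name}: {value:.2f}")
```

To use the sketch on a real session, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for example `open(path, newline="")`, where `path` points to the session's .csv file.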