Medical Image Processing.pdf

L. Domanski et al.

Intuitively, one would expect that effort is best directed towards improving the performance of the steps that consume the largest percentage of time, assuming they can be accelerated. Therefore, we will focus our attention on the implementation of the initial linear feature detection on the GPU.

8.3.3 Parallel MDNMS on GPUs

The MDNMS algorithm with extensions to support symmetry checks and dual local maxima detection (Sects. 8.2.2 and 8.2.3) consists of four steps for each window orientation:

1. Detecting primary maxima by NMS with given window size.

2. Detecting candidate secondary maxima by NMS with a smaller window size.

3. Symmetry check on primary maxima.

4. Secondary maxima search and symmetry check in presence of positive primary detection.

NMS is achieved most simply using a brute force neighborhood filter. In this case, each pixel is compared directly to every pixel within its local linear window to determine whether it is the maximum in the window. Although it is possible to reuse the max operator observations across nearby pixels to improve performance in a serial context [2], this would not work effectively in a parallel implementation where nearby pixels can be processed concurrently and where the order of operations is not predictable. The symmetry checks are performed using similar brute force filters, but carry out different operations on the values within a pixel’s linear window.
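
The brute force filter described above can be sketched in a few lines. The following Python sketch is illustrative only and is not the authors' Listing 8.1: the function names, the integer step `(dx, dy)` used to represent a window orientation, the treatment of ties as maxima, and the skipping of out-of-image neighbors are all assumptions. `is_window_max` corresponds to the work one GPU thread would do for one output pixel.

```python
def window_offsets(dx, dy, half_len):
    """Offsets of a linear window centred on a pixel, excluding the centre.

    (dx, dy) is an assumed integer step along the window orientation.
    """
    return [(k * dx, k * dy) for k in range(-half_len, half_len + 1) if k != 0]


def is_window_max(img, x, y, dx, dy, half_len):
    """Return True if no pixel in the linear window exceeds img[y][x]."""
    h, w = len(img), len(img[0])
    centre = img[y][x]
    for ox, oy in window_offsets(dx, dy, half_len):
        nx, ny = x + ox, y + oy
        # Assumed border policy: neighbours outside the image are ignored.
        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] > centre:
            return False  # suppressed by a larger neighbour
    return True


def nms(img, dx, dy, half_len):
    """Apply the per-pixel test to every pixel (serial stand-in for the GPU)."""
    h, w = len(img), len(img[0])
    return [[is_window_max(img, x, y, dx, dy, half_len) for x in range(w)]
            for y in range(h)]


row = [[1, 5, 2, 7, 3]]
print(nms(row, 1, 0, 1))  # [[False, True, False, True, False]]
print(nms(row, 1, 0, 2))  # [[False, False, False, True, False]]
```

Each call to `is_window_max` reads only the input image and never the output of another pixel, so the calls can run in any order. This is the property that permits the one-thread-per-pixel mapping.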

In each of these brute force neighborhood filters, the result for a single pixel does not depend on the output for other pixels, so the filter can be performed in parallel on the GPU using one thread to calculate each output pixel. Examples of the parallel filter kernels are shown in Listing 8.1 through Listing 8.3. Note that the NMS kernel is executed with two different window sizes to produce the primary and secondary maxima images “prim” and “sec.” Each of these parallel kernels can be executed using a simple 2D grid and block configuration that assigns one thread to each output pixel.

Listing 8.1 Parallel NMS kernel

8 High-Throughput Detection of Linear Features: Selected Applications...

Listing 8.2 Parallel symmetry check kernel

Listing 8.3 Parallel secondary maxima search and symmetry check
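
The one-thread-per-pixel mapping can be emulated on the CPU to make the index arithmetic explicit. The sketch below is a plain Python stand-in for a kernel launch, not GPU code; the 16 × 16 block size and the names `launch_2d` and `mark_kernel` are assumptions made for illustration. The grid is rounded up to cover the whole image, so threads that fall outside the image have to be guarded.

```python
import math

BLOCK_X, BLOCK_Y = 16, 16  # assumed block size, not taken from the chapter


def launch_2d(kernel, width, height, *args):
    """Emulate a 2D grid of 2D blocks with one 'thread' per output pixel."""
    grid_x = math.ceil(width / BLOCK_X)   # blocks needed to cover each row
    grid_y = math.ceil(height / BLOCK_Y)  # blocks needed to cover each column
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(BLOCK_Y):
                for tx in range(BLOCK_X):
                    # Global pixel coordinates from block and thread indices.
                    x = bx * BLOCK_X + tx
                    y = by * BLOCK_Y + ty
                    if x < width and y < height:  # guard out-of-image threads
                        kernel(x, y, *args)
    return grid_x, grid_y


def mark_kernel(x, y, out):
    """Hypothetical kernel body: count how often each pixel is visited."""
    out[y][x] += 1


visits = [[0] * 40 for _ in range(20)]
grid = launch_2d(mark_kernel, 40, 20, visits)
print(grid)  # (3, 2)
```

For a 40 × 20 image, the rounded-up grid contains 3 × 2 blocks, and the bounds check ensures that every pixel is processed exactly once.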

8.3.4 Combining Steps for Efficiency

In Sect. 8.3.1, we discussed the high latency of GPU RAM accesses (where the input image resides) compared to on-chip data access to registers, shared memory or cache memories. Because of these latencies, it is important to minimize transfers to GPU RAM where practical. The steps outlined in Sect. 8.3.3 can easily be performed using separate parallel image filters by executing separate GPU kernel functions for each step. However, a solution with fewer GPU RAM accesses can be developed if we consider the following properties of the filters:

1. NMS: A pixel can be compared with the pixels in its linear window in any order to determine its suppression status.

2. NMS: Testing a pixel’s suppression status in a given linear window is a subtask of doing the same for a larger linear window at the same orientation.

3. Symmetry check: Performing a symmetry check on a maxima pixel requires the same set of values as its primary suppression test.

With these properties identified, we can combine the two NMS steps and the primary maxima symmetry check into a single kernel, shown in Listing 8.4. The code in this kernel visits the first and second halves of a pixel’s linear window separately, allowing it to calculate the average value in each half-window while simultaneously checking whether the pixel is a maximum. It also assesses the secondary NMS result for a pixel at the same time as the primary NMS. This allows everything except the secondary maxima symmetry check to be calculated after reading a pixel’s primary linear window values only once from GPU RAM. In contrast, using a separate kernel for each filter requires these values to be read multiple times from GPU RAM, since on-chip storage is not persistent between kernel launches. It would also require the results of the primary NMS kernel to be communicated to the primary symmetry check kernel via GPU RAM. It should be noted that this combination of steps might also speed up the CPU algorithm by reducing its overall workload and memory accesses.
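
The single-pass idea can be sketched as follows. This is a hedged Python illustration of the per-pixel work, not a reproduction of Listing 8.4. The names `fused_pixel` and `is_symmetric`, the truncation of the window at the image border, and in particular the symmetry criterion (the drops from the center to the two half-window averages must be similar) are assumptions; the chapter's actual criterion is defined in Sect. 8.2.2.

```python
def is_symmetric(centre, mean_a, mean_b, tol):
    """Assumed placeholder test: similar drop from the centre on both sides."""
    drop_a, drop_b = centre - mean_a, centre - mean_b
    return abs(drop_a - drop_b) <= tol * max(drop_a, drop_b)


def fused_pixel(img, x, y, dx, dy, prim_half, sec_half):
    """Primary NMS, secondary NMS and half-window means in one window pass.

    Returns (is_primary_max, is_secondary_max, mean_first_half, mean_second_half).
    """
    h, w = len(img), len(img[0])
    centre = img[y][x]
    is_prim = is_sec = True
    means = []
    for sign in (-1, 1):  # first half-window, then second half-window
        total, count = 0.0, 0
        for k in range(1, prim_half + 1):
            nx, ny = x + sign * k * dx, y + sign * k * dy
            if not (0 <= nx < w and 0 <= ny < h):
                break  # assumed border policy: truncate the window
            v = img[ny][nx]  # each neighbour is read exactly once
            total += v
            count += 1
            if v > centre:
                is_prim = False
                if k <= sec_half:  # the smaller window is nested in the larger
                    is_sec = False
        means.append(total / count if count else centre)
    return is_prim, is_sec, means[0], means[1]


img = [[1, 5, 2, 7, 3, 1, 0]]
print(fused_pixel(img, 1, 0, 1, 0, 3, 1))  # (False, True, 1.0, 4.0)
```

In the example, the pixel with value 5 is suppressed in the primary window by the 7 two steps away, yet remains a maximum in the smaller secondary window. Both results and both half-window averages come from a single traversal of the primary window.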

The secondary maxima search and symmetry check around a primary maximum cannot be performed efficiently in the same kernel as the other filters. This is because the thread responsible for a pixel requires both the secondary NMS and half-window average values calculated by other threads to avoid recalculating them. It is non-trivial to share these values using shared memory, as many threads with data interdependency relationships will belong to different processing blocks. These threads will not be able to access each other’s shared memory space or synchronize their execution. All NMS and half-window averages are, therefore, written to global memory by the kernel in Listing 8.4, and a different kernel is utilized to facilitate the exchange of values and perform the necessary processing.
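
The second stage can be pictured as a separate pass that only reads buffers written by the first. The Python sketch below is an assumption-laden illustration, not Listing 8.3: the buffer names, the `is_symmetric` placeholder criterion, and the choice to return a mask of accepted secondary maxima are all hypothetical. The lists stand in for arrays in GPU global memory, and each iteration of the outer loops corresponds to one thread.

```python
def is_symmetric(centre, mean_a, mean_b, tol):
    """Assumed placeholder test: similar drop from the centre on both sides."""
    drop_a, drop_b = centre - mean_a, centre - mean_b
    return abs(drop_a - drop_b) <= tol * max(drop_a, drop_b)


def secondary_search(img, prim, sec, mean_a, mean_b, dx, dy, prim_half, tol):
    """For each primary maximum, accept symmetric secondary maxima in its window.

    prim, sec, mean_a and mean_b are per-pixel buffers produced by an earlier
    pass; this pass never recomputes them.
    """
    h, w = len(img), len(img[0])
    accepted = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):  # one GPU thread per (x, y)
            if not prim[y][x]:
                continue  # only primary maxima trigger a search
            for k in range(-prim_half, prim_half + 1):
                nx, ny = x + k * dx, y + k * dy
                if k == 0 or not (0 <= nx < w and 0 <= ny < h):
                    continue
                # Values below were calculated by the thread for (nx, ny).
                if sec[ny][nx] and is_symmetric(
                        img[ny][nx], mean_a[ny][nx], mean_b[ny][nx], tol):
                    accepted[ny][nx] = True
    return accepted


# Hand-made buffers for a single image row (hypothetical values).
img = [[0, 4, 0, 9, 0, 5, 0]]
prim = [[False, False, False, True, False, False, False]]
sec = [[False, True, False, True, False, True, False]]
mean_a = [[0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0]]
mean_b = [[0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0]]
print(secondary_search(img, prim, sec, mean_a, mean_b, 1, 0, 3, 0.5))
# [[False, True, False, False, False, False, False]]
```

The pixel with value 4 is accepted because its two half-window averages are similar, while the pixel with value 5 is rejected as asymmetric. Every value the search needs for a neighboring pixel is looked up in a buffer, which is why the first kernel has to write these values to global memory.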

Listing 8.4 Parallel kernel calculating primary and secondary NMS, and primary symmetry check
