Skip to content

Median filter in 2D images optimized for CUDA

I recently was thinking about how a median filter could be implemented fast and precisely in 2D for small range kernels. I will skip the entire part of boundary conditions and walking over the image because I think that is quite straightforward to understand. In the end, we will have a 3x3, 5x5 or higher 2n+1 sized patch and want to get the median from it. So the challenge is to sort a uneven length array partially and retrieve the middle element.

Previous work

I found other implementations of this online but they did not work. I tried to use the famous implementation of Morgan McGuire, Williams College which suprisingly does not give the correct result for 1/3 of my test cases when using the 5x5 filter. A simple example that fails is

1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 1 1
1 1 1 1 1

which will return 1 as the median.

I sadly do not have access to the mentioned book and could not double check if his implementation. The 3x3 implementation seems to work and is also based on a really clever approach of excluding the min and max. Nevertheless, that implementation cannot be easily adapted to the 25 element case.

Concept

I derived the the median filter from a bitonic sorting network. The advantage of this architecture is that the sorting steps can be parallelized. It is furthermore a network with a fixed order of comparisons which is optimal for the GPU since we do not have divergence of the threads.

The bitonic networks are designed to work with 2n elements but we will just fill the lanes on top and bottom with inf and then exclude all the comparisons that will never be swapped (dashed lines). I drew the network for the simple case of deriving a nine element median filter from a 16 element sorting network below.

9 element median filter network

The blue operations do not need to be performed since they are not relevant for the median filter. The green operation will be undone in the next step and hence does not need to be performed. The purple operation can be performed by defining the lane index. The same operation can be done for the 5x5 median filter.

25 element median filter network

Due to the nature of the network, multiple cores can sort in parallel which is highly beneficial for parallelized architecture such as CUDA.

A simplistic implementation of the 25 element version is shown below.

C++
#define EXCH(a, b)                       \
  temp = window[a];                      \
  window[a] = min(window[a], window[b]); \
  window[b] = max(temp, window[b]);

#define EXCH23(a, b, c, d, e, f) \
  EXCH(a, b);                    \
  EXCH(c, d);                    \
  EXCH(e, f);

#define EXCH24(a, b, c, d, e, f, g, h) \
  EXCH(a, b);                          \
  EXCH(c, d);                          \
  EXCH(e, f);                          \
  EXCH(g, h)

__device__ __host__ float Median25(std::array<float, 25>& window) {
  float temp;
  EXCH24(0, 1, 3, 2, 4, 5, 7, 6);
  EXCH24(8, 9, 11, 10, 12, 13, 15, 14);
  EXCH24(16, 17, 19, 18, 20, 21, 23, 22);

  EXCH24(3, 1, 2, 0, 1, 0, 3, 2);
  EXCH24(4, 6, 5, 7, 4, 5, 6, 7);
  EXCH24(11, 9, 10, 8, 9, 8, 11, 10);
  EXCH24(12, 14, 13, 15, 12, 13, 14, 15)
  EXCH24(19, 17, 18, 16, 17, 16, 19, 18);
  EXCH24(20, 22, 21, 23, 20, 21, 22, 23);

  EXCH24(11, 7, 10, 6, 9, 5, 8, 4);
  EXCH24(12, 16, 13, 17, 14, 18, 15, 19);
  EXCH(24, 20)

  EXCH24(0, 2, 1, 3, 0, 1, 2, 3)
  EXCH24(7, 5, 6, 4, 7, 6, 5, 4)
  EXCH24(11, 9, 10, 8, 11, 10, 9, 8)
  EXCH24(12, 14, 13, 15, 12, 13, 14, 15)
  EXCH24(16, 18, 17, 19, 16, 17, 18, 19)
  EXCH24(23, 21, 22, 20, 23, 22, 21, 20)
  EXCH24(0, 8, 1, 9, 2, 10, 3, 11)
  EXCH24(24, 16, 23, 15, 22, 14, 21, 13)
  EXCH(20, 12)

  EXCH24(4, 8, 5, 9, 6, 10, 7, 11)
  EXCH24(19, 15, 18, 14, 17, 13, 16, 12)
  EXCH(24, 20)
  EXCH24(0, 2, 1, 3, 0, 1, 2, 3)
  EXCH24(4, 6, 5, 7, 4, 5, 6, 7)
  EXCH24(8, 10, 9, 11, 8, 9, 10, 11)
  EXCH24(15, 13, 14, 12, 15, 14, 13, 12)
  EXCH24(19, 17, 18, 16, 19, 18, 17, 16)
  EXCH24(23, 21, 22, 20, 23, 22, 21, 20)

  EXCH24(0, 16, 1, 17, 2, 18, 3, 19)
  EXCH24(4, 20, 5, 21, 6, 22, 7, 23)
  EXCH(8, 24)

  EXCH24(12, 20, 13, 21, 14, 22, 15, 23)
  EXCH24(16, 24, 17, 9, 18, 10, 19, 11)
  EXCH24(12, 16, 13, 17, 14, 18, 15, 19)
  EXCH23(12, 14, 13, 15, 14, 15);
  return window[15];
}

The implementation was tested against all 225 binary combinations as well as the same number of random patches of float values. The sorting consists of a total of 135 compare and swaps.

Bibtex

Anything written here is without guarantee to work but feel free to use it in any way. Feel free to tag the idea in any publications.

@inproceedings{McGuire2008Median, 
  author = {Urs Hofmann}, 
  title = {Deriving small sized median filters from bitonic sorting networks},
  month = {December},
  year = {2024}, 
  url = {https://hofmannu.org/articles/median-filter-2d.html}, 
}

Created by Urs Hofmann