Median filter in 2D images optimized for CUDA
I recently was thinking about how a median filter could be implemented fast and precisely in 2D for small range kernels. I will skip the entire part of boundary conditions and walking over the image because I think that is quite straightforward to understand. In the end, we will have a 3x3, 5x5 or higher
Previous work
I found other implementations of this online but they did not work. I tried to use the famous implementation of Morgan McGuire, Williams College which suprisingly does not give the correct result for 1/3 of my test cases when using the 5x5 filter. A simple example that fails is
1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 1 1
1 1 1 1 1
which will return 1 as the median.
I sadly do not have access to the mentioned book and could not double check if his implementation. The 3x3 implementation seems to work and is also based on a really clever approach of excluding the min and max. Nevertheless, that implementation cannot be easily adapted to the 25 element case.
Concept
I derived the the median filter from a bitonic sorting network. The advantage of this architecture is that the sorting steps can be parallelized. It is furthermore a network with a fixed order of comparisons which is optimal for the GPU since we do not have divergence of the threads.
The bitonic networks are designed to work with
The blue operations do not need to be performed since they are not relevant for the median filter. The green operation will be undone in the next step and hence does not need to be performed. The purple operation can be performed by defining the lane index. The same operation can be done for the 5x5 median filter.
Due to the nature of the network, multiple cores can sort in parallel which is highly beneficial for parallelized architecture such as CUDA.
A simplistic implementation of the 25 element version is shown below.
#define EXCH(a, b) \
temp = window[a]; \
window[a] = min(window[a], window[b]); \
window[b] = max(temp, window[b]);
#define EXCH23(a, b, c, d, e, f) \
EXCH(a, b); \
EXCH(c, d); \
EXCH(e, f);
#define EXCH24(a, b, c, d, e, f, g, h) \
EXCH(a, b); \
EXCH(c, d); \
EXCH(e, f); \
EXCH(g, h)
__device__ __host__ float Median25(std::array<float, 25>& window) {
float temp;
EXCH24(0, 1, 3, 2, 4, 5, 7, 6);
EXCH24(8, 9, 11, 10, 12, 13, 15, 14);
EXCH24(16, 17, 19, 18, 20, 21, 23, 22);
EXCH24(3, 1, 2, 0, 1, 0, 3, 2);
EXCH24(4, 6, 5, 7, 4, 5, 6, 7);
EXCH24(11, 9, 10, 8, 9, 8, 11, 10);
EXCH24(12, 14, 13, 15, 12, 13, 14, 15)
EXCH24(19, 17, 18, 16, 17, 16, 19, 18);
EXCH24(20, 22, 21, 23, 20, 21, 22, 23);
EXCH24(11, 7, 10, 6, 9, 5, 8, 4);
EXCH24(12, 16, 13, 17, 14, 18, 15, 19);
EXCH(24, 20)
EXCH24(0, 2, 1, 3, 0, 1, 2, 3)
EXCH24(7, 5, 6, 4, 7, 6, 5, 4)
EXCH24(11, 9, 10, 8, 11, 10, 9, 8)
EXCH24(12, 14, 13, 15, 12, 13, 14, 15)
EXCH24(16, 18, 17, 19, 16, 17, 18, 19)
EXCH24(23, 21, 22, 20, 23, 22, 21, 20)
EXCH24(0, 8, 1, 9, 2, 10, 3, 11)
EXCH24(24, 16, 23, 15, 22, 14, 21, 13)
EXCH(20, 12)
EXCH24(4, 8, 5, 9, 6, 10, 7, 11)
EXCH24(19, 15, 18, 14, 17, 13, 16, 12)
EXCH(24, 20)
EXCH24(0, 2, 1, 3, 0, 1, 2, 3)
EXCH24(4, 6, 5, 7, 4, 5, 6, 7)
EXCH24(8, 10, 9, 11, 8, 9, 10, 11)
EXCH24(15, 13, 14, 12, 15, 14, 13, 12)
EXCH24(19, 17, 18, 16, 19, 18, 17, 16)
EXCH24(23, 21, 22, 20, 23, 22, 21, 20)
EXCH24(0, 16, 1, 17, 2, 18, 3, 19)
EXCH24(4, 20, 5, 21, 6, 22, 7, 23)
EXCH(8, 24)
EXCH24(12, 20, 13, 21, 14, 22, 15, 23)
EXCH24(16, 24, 17, 9, 18, 10, 19, 11)
EXCH24(12, 16, 13, 17, 14, 18, 15, 19)
EXCH23(12, 14, 13, 15, 14, 15);
return window[15];
}
The implementation was tested against all
Bibtex
Anything written here is without guarantee to work but feel free to use it in any way. Feel free to tag the idea in any publications.
@inproceedings{McGuire2008Median,
author = {Urs Hofmann},
title = {Deriving small sized median filters from bitonic sorting networks},
month = {December},
year = {2024},
url = {https://hofmannu.org/articles/median-filter-2d.html},
}