This post was written by Risto Vuorio as part of AIHelsinki Academia study group on Natural Image Statistics. This text mostly covers material from Natural Image Statistics book chapter 2 available here http://www.naturalimagestatistics.net/.

# Linear Filters and Frequency Analysis

This post is written as part of AIHelsinki study group on Natural Image Statistics. This text mostly covers material from Natural Image Statistics book chapter 2 available here http://www.naturalimagestatistics.net/. The format of this post is something between learning diary and a summary but I tried to include some original examples with reproduction instructions in the text.

The topics covered in this post are mostly the same as in the textbook chapter 2 including linear filtering, frequency analysis, and space-frequency analysis. Some material has been left out and there may be inaccuracies so please consider this post supplemental to the textbook chapter.

## Linear Filtering, Convolution and Impulse Response

Linear filtering is an image processing technique where a filter is applied to an input image to produce an output. Inputs and outputs of the linear filtering operation are images. The filter itself can be thought of as an image as well. The filtering operation corresponds to sliding the filter on the input pixel by pixel and computing the output.

The output \(O(x,y)\) is given by following equation \[\label{eq:linear-filtering}

O(x,y) = \sum_{x_*=-K}^{K}\sum_{y_*=-K}^{K}{W(x_*,y_*)I(x+x_*,y+y_*)} \qquad(1) \] where \(W\) is the filter of size \((2K+1)*(2K+1)\), \(I\) is the input image and \(O\) is the output image. For each position \((x,y)\) the output is given by the linear correlation of the filter with the sub-area in the same position in the input image. The output pixel at \((x,y)\) is computed by taking the sum over the products of the filter values at positions \((x_*,y_*)\) and the input values centered at \((x,y)\) offset by \((x_*,y_*)\).

Convolution is an operation closely related to linear correlation. The term convolution might be familiar to people following news on neural networks and deep learning. We are talking about the convolution in Convolutional Neural Networks. Convolution is defined as \[I_1(x,y) \ast I_2(x,y) = \sum_{x_*=-\infty}^{\infty}\sum_{y_*=-\infty}^{\infty}{I_1(x-x_*,y-y_*)I_2(x_*,y_*)} \qquad(2) \] where \(I_1\) and \(I_2\) are images. As can be seen here the main difference between linear correlation and convolution is the sign of the terms \(x_*\) and \(y_*\). Convolution and linear filtering are related through what is called the impulse response.

Impulse response \(H(x,y)\) is the response of a filter to a short impulse. Short impulse is defined as either Dirac delta function for continuous systems or Kronecker delta function for discrete systems such as 2D images. The Kronecker delta function is given by \[\delta(x,y) = \begin{cases}

1, & \text{if x = y}\\

0, & \text{otherwise}

\end{cases} \qquad(3) \]

The impulse response \(H(x,y)\) of a filter \(W(x,y)\) can be computed by plugging the delta function above to the equation of linear filtering [eq:linear-filtering] defined at the top of the page. By doing so you can observe that \(H(x,y) = W(-x,-y)\). This defines the relationship between convolution and linear correlation. The filter used in convolution is thus a ”mirror image“ of the weights used in linear correlation.

Why would anyone bother mirroring their weight matrix to compute convolutions instead of linear correlations you may ask? There are two reasons. In the context of natural image statistics this is done because the impulse response has desired properties in the frequency based analysis of linear filtering, which we will come back to in the next section. The other reason is that the convolution operation offers mathematical convenience as it is a commutative operation http://www.deeplearningbook.org/contents/convnets.html.

With convolution in our toolbox we can express the filtering operation using the impulse response as \[O(x,y) = I(x,y) \ast H(x,y) \qquad(4) \]

## Frequency-based representation

By looking at the values of a 2D filter it is often hard to tell what its effect is on the input image. Frequency-based representation is an useful tool for the analysis of such linear filters. Frequency-based representation is a way of representing signal in terms of sinusoidal components.

The regular way to represent images is called the spatial representation. The spatial representation of an image is a set of numbers which represent the intensity of the pixel in that channel (in grayscale there’s one channel, in RGB there are three etc.). In the frequency based representation the image is given as a set of numbers that describe the *amplitudes* and *phases* of the sinusoidal components of the image. The sinusoidal signal is of the form \(A \cos(\omega x + \Psi)\) where \(A\) is the amplitude, \(\omega\) is the frequency and \(\Psi\) is the phase.

The frequency-based representation of a one dimensional signal is given by \[I(x) = \sum_{u=0}^{U-1}{A_{u}\cos(\omega_{u}x+\Psi_{u})} \qquad(5) \] where \(U = (N-1)/2\) if the original signal contains N real-valued numbers. For images and other two-dimensional data the signal becomes a bit more complicated to account for the extra dimension. Two dimensional frequency-based representation of a signal is given by \(A \cos(\omega_{x} x + \omega_{y} y +

\Psi)\), where \(\omega_{x}\) and \(\omega_{y}\) stand for frequencies in corresponding directions. The two dimensional sinusoidal components are basically just cosine waves on a plane with different frequencies, amplitudes and phases. The figure [fig:sinusoidal-waves] present examples of such waves.

The transformation from the spatial domain to the frequency domain can be done using discrete Fourier transformation. The theory of discrete Fourier transformation is quite involved and will not be reviewed here. The textbook chapter 20 provides a mathematical treatment of the theory of Fourier analysis. Details on how DFT is used for acquiring the frequency based representation are provided in the textbook chapter 2. For now we are going to focus on the uses of the frequency-based representation.

Linear filters and sinusoidals have a special relationship. The output of linear filtering on a sinusoidal wave is a sinusoidal wave with same frequency but possibly different phase and amplitude. If the input signal has the following frequency-based representation \[I(x,y) = \sum_{\omega_{x}}{\sum_{\omega_{y}}{A_{\omega_{x},\omega_{y}}\cos(\omega_{x}x+\omega_{y}y+\Psi_{\omega_{x},\omega_{y}})}} \qquad(6)\] the response of the linear filter has the following frequency-based representation \[\begin{split}

O(x,y) & = H(x,y) \ast I(x,y) = \\

& \sum_{\omega_{x}}{\sum_{\omega_{y}}{\lvert \tilde{H}(\omega_{x},\omega_{y})\rvert A_{\omega_{x},\omega_{y}} \cos(\omega_{x} x + \omega_{y} y + \Psi_{\omega_{x},\omega_{y}} + \angle\tilde{H}(\omega_{x},\omega_{y}))}}

\end{split} \qquad(7) \] where \(\lvert\tilde{H}(\omega_{x},\omega_{y})\rvert\) denotes the amplitude magnification factor of the linear filter and \(\angle\tilde{H}(\omega_{x},\omega_{y})\) denotes the phase shift of the filter. The \(\lvert\tilde{H}(\omega_{x},\omega_{y})\rvert A_{\omega_{x},\omega_{y}}\) part of the equation is called the amplitude response.

The use of this property of the frequency-based representation makes analysis of the linear filters straightforward. It is difficult to describe the behavior of a filter based solely on its spatial representation but if its amplitude response is available you can tell what the filter does for different frequencies.

### Example of Linear Filtering

The concepts described above are demonstrated in the following code snippet. NumPy is used throughout the example and the library name abbreviated as np.

First a gaussian kernel is extracted from the SciPy gaussian filtering operation using the properties of impulse response. Next the filter is applied to an input image using 2d convolution. The input image is just a 2d NumPy array containing the grayscale input image. The image can be read using PIL, SciPy, OpenCV or similar. And finally the frequency-based representation of the filter is obtained using Fast-Fourier Transformation.

```
# Here we make use of the properties of the impulse
# response to get a gaussian kernel using the SciPy
# gaussian filter. Create an array with all zeros
# except for one element in the center.
impulse = np.zeros((21,21))
impulse[10,10] = 1.
# Apply the gaussian filter to the impulse to obtain
# the gaussian kernel. gaussian_filter is
# scipy.ndimage.filters.gaussian_filter
gaussian_kernel = gaussian_filter(impulse, 3)
# Convolve the kernel with the example image
filtered_input = signal.convolve2d(img, gaussian_kernel)
# Compute the frequency-based representation of the
# filter. fftshift is a convenience method for
# organizing the fft-output
f_b_representation = np.abs(
np.fft.fftshift(np.fft.fft2(gaussian_kernel)))
```

The gaussian kernel extracted from the SciPy filtering operation and its amplitude response are shown in the figure [fig:gaussian-kernel]. The input image for the convolution and the results of convolution with Gaussian are presented in the figure [fig:gaussian-input-output]. (source: https://commons.wikimedia.org/wiki/File:Carl_Friedrich_Gauss_1840_by_Jensen.jpg)

The amplitude response of the filter kernel is shown in figure [fig:gaussian-kernel]. In the picture dark pixels correspond to small amplitude and light pixels correspond to high amplitude. The amplitude of the 0 frequency is shown in the center of the picture. The pixels away from the center have higher frequencies. The direction where the pixel is from the center corresponds to the direction of the wave.

The Gaussian filter does not have any surprising properties in the frequency domain. In the frequency-based representation only the components very close to the origin remain, which means the high-frequency components have been removed. The final image is blurred as a result.

With a Gaussian kernel computed with smaller standard deviation more of the high frequency components are retained. Figure [fig:gaussian-kernel-2] presents the new kernel and its amplitude response. The corresponding amplitude response shows that there are more high frequency components retained (bright pixels away from origin). Filtering with such kernel will result to less blurred image compared to the kernel in figure [fig:gaussian-kernel].

## Frequency-based Representation as a Basis

The frequency-based representation can be viewed as defining a linear basis and finding coefficients in that basis. The idea is that any signal of the form \(A

\cos(\omega x + \Psi)\) can be represented in the form \(C\cos(\omega x) +

S\sin(\omega x)\) where \(S\) and \(C\) are some coefficients in our basis. This is useful because finding \(S\) and \(C\) in a linear basis are linear operations and it makes the calculation of amplitude \(A\) and phase \(\Psi\) simple. The formulas for \(A\) and \(\Psi\) in terms of \(S\) and \(C\) are given in the following equations [eq:amplitude-s-c], [eq:phase-s-c]. \[\label{eq:amplitude-s-c}

A = \sqrt{C^{2}+S^{2}} \qquad(8) \] \[\label{eq:phase-s-c}

\Psi = -\arctan{\frac{S}{C}} \qquad(9) \] This rather superficial treatment is augmented in the textbook section 2.3.

## Space-frequency Analysis

The tools described above are useful for analyzing linear filters but run into trouble if we want to gain understanding about larger images. Running DFT on a large image will give us a very high-level view on what sort of components the image is made of, but it completely hides the spatial information in the original image. Next we are going to look at methods that combine spatial and frequential information.

Space-frequency analysis can be conducted by filtering the image with two localized sinusoidal filters and computing the amplitudes of the filters. The localized sinusoidal filters are called Gabor filters. On a high-level their idea is that the sinusoidal plane wave is weighted by a windowing function, making it very localized. The formula for coefficients \(C\) becomes \[C(x,y) = \sum_{x}{\sum_{y}{W(x-x_{0}, y-y_{0}) \cos(\omega_{x} (x – x_{0}) + \omega_{y} (y – y_{0})) }} \qquad(10) \] The formula for coefficients \(S\) is similiar. With the coefficients \(C\) and \(S\) at hand, the amplitudes can be calculated using the formula [eq:amplitude-s-c] given in the previous section.

The windowing function \(W(x,y)\) in the above formula for Gabor filters is the Gaussian function. The width of the window can be controlled by changing the \(\sigma\) parameter of the Gaussian function. In 2D case there are separate standard deviation parameters for both axes. In practical computation truncated windows are used as the gaussian function falls off close to zero quickly.

### Example of Gabor Filtering

In the following example Gabor filters are applied to an image to obtain a spatial amplitude map of a vertical frequency component. The amplitude is computed from the equation \(A = \sqrt{C^{2}+S^{2}}\). The Gabor filters are presented in figure [fig:gabor-kernel]. The filters have a phase difference of \(\pi\). The output of the amplitude calculation is presented in figure [fig:gabor-input-output]. As expected from the filter images the figure shows that the selected filters resulted in high output where vertical features are present in the input image.

*The input image and corresponding spatial amplitude map computed by convolving the gabor kernels with the image and computing the amplitude according to equation*

[\label{eq:linear-filtering}\]

## Leave a Reply