2. Image pyramid tutorial

2.1. Content

  • Laplacian pyramids

    • functions for constructing / manipulating Laplacian pyramids

    • aliasing in the Gaussian and Laplacian pyramids

    • Projection and Basis functions

  • QMF/wavelet pyramids

    • functions for constructing / manipulating QMF/Wavelet pyramids

    • perfect reconstruction: Haar and Daubechies wavelets

    • aliasing in wavelet transforms

A brief introduction to multi-scale pyramid decompositions for image processing. You should go through this, reading the comments, and executing the corresponding code. This file assumes a basic familiarity with matrix algebra, with linear systems and Fourier theory, and with Python. If you don’t understand a particular function call, execute help(functionName) to see documentation.

for basic tools see TUTORIALS/01_tools.ipynb
for multi-scale steerable pyramids see TUTORIALS/03_steerable.ipynb
[1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

import pyrtools as pt
%load_ext autoreload
%autoreload 2
[2]:
im = plt.imread('../DATA/Einstein.pgm')
# im = pt.blurDn(im, 1, 'qmf9')
pt.imshow(im);
../_images/tutorials_02_pyramids_2_0.png

2.1.1. Laplacian Pyramids:

Images may be decomposed into information at different scales.

[3]:
# Blurring eliminates the fine scales (detail):
binom5 = pt.binomial_filter(5)
lo_filt = binom5*binom5.T
blurred = pt.rconv2(im, lo_filt) # NOTE this convolution is a slow, see corrDn instead
# rconv2: Convolution of two matrices, with boundaries handled via reflection about the edge pixels.

pt.imshow([im, blurred], title=['im', 'blurred']);
# help(pt.rconv2)
../_images/tutorials_02_pyramids_4_0.png
[4]:
# subtracting the blurred image from the original leaves ONLY the fine scale detail:
fine0 = im - blurred
pt.imshow([fine0, blurred], title=['fine0', 'blurred']);
../_images/tutorials_02_pyramids_5_0.png

The blurred and fine images contain all the information found in the original image. Trivially, adding the blurred image to the fine scale detail will reconstruct the original. We can compare the original image to the sum of blurred and fine using the “image_stats” function, which reports on the statistics of the difference between it’s arguments:

[5]:
# pt.imshow([im, blurred, fine0], title=None);
pt.image_compare(im, blurred+fine0);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf

Since the filter is a lowpass filter, we might want to subsample the blurred image. This may cause some aliasing (depends on the filter), but the decomposition structure given above will still be possible. The corrDn function correlates (same as convolution, but flipped filter) and downsamples in a single operation (for efficiency). The string ‘reflect1’ tells the function to handle boundaries by reflecting the image about the edge pixels. Notice that the blurred1 image is half the size (in each dimension) of the original image.

[6]:
lo_filt = 2*binom5*binom5.T   # construct a separable 2D filter
blurred1 = pt.corrDn(image = im, filt = lo_filt, step = (2,2))
#corrDn: Compute correlation of matrices image with `filt, followed by downsampling.

pt.imshow([fine0, blurred1], title=['fine0', 'blurred1']);
../_images/tutorials_02_pyramids_9_0.png
[7]:
fine1 = im - pt.upConv(image = blurred1, filt = lo_filt, step = (2, 2))
# upConv: Upsample matrix image, followed by convolution with matrix filt.
pt.imshow([fine1, blurred1], title=['fine1', 'blurred1']);
../_images/tutorials_02_pyramids_10_0.png

We now have a technique that takes an image, computes two new images (blurred1 and fine1) containing the coarse scale information and the fine scale information. We can also (trivially) reconstruct the original from these two (even if the subsampling of the blurred1 image caused aliasing):

[8]:
recon = fine1 + pt.upConv(image=blurred1, filt=lo_filt, step=(2, 2),start=(0, 0), stop=(256, 256))
pt.image_compare(im, recon);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf

Thus, we have described an INVERTIBLE linear transform that maps an input image to the two images blurred1 and fine1. The inverse transformation maps blurred1 and fine1 to the result. This is depicted graphically with a system diagram:

 (IM) --> blur/down2: (BLURRED1) -------> up2/blur -----> add --> (RECON)
  |                     |                                  ^
  |                     |                                  |
  |                     V                                  |
  |                  up2/blur                              |
  |                     |                                  |
  |                     |                                  |
  |                     V                                  |
   ---------------> subtract: (FINE1) ---------------------

Note that the number of samples in the representation (i.e., total samples in BLURRED1 and FINE1) is 1.5 times the number of samples in the original IM. Thus, this representation is OVERCOMPLETE.

[9]:
blurred2 = pt.corrDn(image = blurred1, filt = lo_filt, step = (2, 2))
pt.imshow(blurred2, 'auto', 4);
../_images/tutorials_02_pyramids_14_0.png
[10]:
fine2 = blurred1 - pt.upConv(image = blurred2, filt = lo_filt, step = (2, 2))
pt.imshow(fine2, 'auto', 2);
../_images/tutorials_02_pyramids_15_0.png

Since blurred2 and fine2 can be used to reconstruct blurred1, and blurred1 and fine1 can be used to reconstruct the original image, the set of THREE images (also known as “subbands”) {blurred2, fine2, fine1} constitute a complete representation of the original image. Note that the three subbands are displayed at the same size, but they are actually three different sizes.

[11]:
pt.imshow([fine1, fine2, blurred2], title=['fine1', 'fine2', 'blurred2'], col_wrap=3);
../_images/tutorials_02_pyramids_17_0.png

It is useful to consider exactly what information is stored in each of the pyramid subbands. The reconstruction process involves recursively interpolating these images and then adding them to the image at the next finer scale. To see the contribution of ONE of the representation images (say blurred2) to the reconstruction, we imagine filling all the other subbands with zeros and then following our reconstruction procedure. For the blurred2 subband, this is equivalent to simply calling upConv twice:

[12]:
blurred2_tmp  = pt.upConv(image = blurred2, filt = lo_filt, step = (2, 2))
blurred2_full = pt.upConv(image = blurred2_tmp, filt = lo_filt, step = (2, 2))
pt.imshow([fine1, fine2, blurred2_full], title=['fine1', 'fine2', 'blurred2_full'], col_wrap=3);
../_images/tutorials_02_pyramids_19_0.png
[13]:
# For the fine2 subband, this is equivalent to calling upConv once:
fine2_full = pt.upConv(image = fine2, filt = lo_filt, step = (2, 2))
pt.imshow([fine1, fine2_full, blurred2_full], title=['fine1', 'fine2_full', 'blurred2_full'], col_wrap=3);
../_images/tutorials_02_pyramids_20_0.png
[14]:
# If we did everything correctly, we should be able to add together
# these three full-size images to reconstruct the original image:
recon = blurred2_full + fine2_full + fine1
pt.image_compare(im, recon);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
[15]:
pt.imshow([recon, im]);
../_images/tutorials_02_pyramids_22_0.png

FUNCTIONS for CONSTRUCTING/MANIPULATING LAPLACIAN PYRAMIDS

We can continue this process, recursively splitting off finer and finer details from the blurred image (like peeling off the outer layers of an onion). The resulting data structure is known as a “Laplacian Pyramid”. To make things easier, we have written a function called Lpyr to construct this object. The function returns a long vector containing the subbands of the pyramid. The display routine showLpyr shows all the subbands of the pyramid, at their correct relative sizes. It should now be clearer why these data structures are called “pyramids”.

[16]:
pyr = pt.pyramids.LaplacianPyramid(im)
pt.pyrshow(pyr.pyr_coeffs);
../_images/tutorials_02_pyramids_24_0.png
[17]:
# To access the coefficients of a single band, use: pyr.pyr_coeffs[(level, band)]
pt.imshow(pyr.pyr_coeffs[(1, 0)]);
../_images/tutorials_02_pyramids_25_0.png
[18]:
# The reconLpyr function allows you to reconstruct from a laplacian pyramid.
# The third (optional) arg allows you to select any subset of pyramid bands
# (default is to use ALL of them).
pt.imshow(pyr.recon_pyr(levels=[0, 2]));  # reconstruct with just the first and third band
../_images/tutorials_02_pyramids_26_0.png
[19]:
fullres = pyr.recon_pyr()
pt.imshow(fullres);
pt.image_compare(im,fullres);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
../_images/tutorials_02_pyramids_27_1.png
[20]:
# buildLpyr uses 5-tap filters by default for building Laplacian
# pyramids.  You can specify other filters:
print (pt.named_filter('binom3'))

pyr3 = pt.pyramids.LaplacianPyramid(im, height=7, downsample_filter_name='binom3')
pt.pyrshow(pyr3.pyr_coeffs);

fullres3 = pyr3.recon_pyr(levels='all', upsample_filter_name='binom3')
pt.image_compare(im,fullres3);
[[0.35355339]
 [0.70710678]
 [0.35355339]]
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
../_images/tutorials_02_pyramids_28_1.png

Here we build a “Laplacian” pyramid using random filters. filt1 is used with the downsampling operations and filt2 is used with the upsampling operations. We normalize the filters for display purposes. Of course, these filters are (almost certainly) not very “Gaussian”, and the subbands of such a pyramid will be garbage! Nevertheless, it is a simple property of the Laplacian pyramid that we can use ANY filters and we will still be able to reconstruct perfectly. Indeed fine bands are produced by subtracting the parent from blurred, so adding them back gives the parent back.

[21]:
filt1 = np.random.rand(5,)
filt1 = np.sqrt(2)*filt1/sum(filt1)
plt.subplot(2,1,1)
plt.stem(filt1)
plt.subplot(2,1,2)
filt2 = np.random.rand(3,)
filt2 = np.sqrt(2)*filt2/sum(filt2)
plt.stem(filt2)
plt.show()
../_images/tutorials_02_pyramids_30_0.png
[22]:
pyrr = pt.pyramids.LaplacianPyramid(im, height=7,
                                    downsample_filter_name=filt1,
                                    upsample_filter_name=filt2)
pt.pyrshow(pyrr.pyr_coeffs)

fullresr = pyrr.recon_pyr(filt2)

pt.image_compare(im, fullresr);
pt.imshow([im, fullresr]);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 320.773103
../_images/tutorials_02_pyramids_31_1.png
../_images/tutorials_02_pyramids_31_2.png

ALIASING in the Gaussian and Laplacian pyramids:

Unless one is careful, the subsampling operations will introduce aliasing artifacts in these pyramid transforms. This is true even though the Laplacian pyramid can be used to reconstruct the original image perfectly. When reconstructing, the pyramid is designed in such a way that these aliasing artifacts cancel out. So it’s not a problem if the only thing we want to do is reconstruct. However, it can be a serious problem if we intend to process each of the subbands independently.

One way to see the consequences of the aliasing artifacts is by examining variations that occur when the input is shifted. We choose an image and shift it by some number of pixels. Then blur (filter-downsample-upsample-filter) the original image and blur the shifted image. If there’s no aliasing, then the blur and shift operations should commute (i.e., shift-filter-downsample-upsample-filter is the same as filter-downsample-upsample-filter-shift). Try this for 2 different filters (by replacing ‘binom3’ with ‘binom5’ or ‘binom7’ below), and you’ll see that the aliasing is much worse for the 3 tap filter.

[23]:
sig = 100 * np.random.randn(1, 16)
sh = (0,7)   # shift amount
lev = 2      # level of pyramid to look at
#filter to use:
flt = pt.named_filter('binom3')
# flt = pt.named_filter('binom5')
# flt = pt.named_filter('binom7')

shiftIm = np.roll(sig,sh)
pyr = pt.pyramids.LaplacianPyramid(shiftIm, lev, flt, flt, 'circular')
shiftBlur = pyr.recon_pyr(flt, 'circular', lev-1)

pyr2 = pt.pyramids.LaplacianPyramid(sig, lev, flt, flt, 'circular')
res = pyr2.recon_pyr(flt, 'circular', lev-1)
blurShift = np.roll(res,sh)

plt.figure()
plt.plot(blurShift.T)
plt.plot(shiftBlur.T)
plt.legend(['blurShift', 'shiftBlur'])
plt.show()

pt.image_compare(blurShift,shiftBlur);
../_images/tutorials_02_pyramids_33_0.png
Difference statistics:
  Range: [-37, 56]
  Mean: 0.000000,  Stdev (rmse): 27.396116,  SNR (dB): 4.143998

2.1.2. PROJECTION and BASIS functions:

An invertible, linear transform can be characterized in terms of a set of PROJECTION and BASIS functions.
In python matrix notation:
c = np.dot(P.T, x)
x = np.dot(B, c)

where x is an input, c are the transform coefficients, P and B are matrices. The columns of \(P^T\) are the projection functions (the input is projected onto the the columns of P to get each successive transform coefficient). The columns of B are the basis functions (x is a linear combination of the columns of B).

Since the Laplacian pyramid is a linear transform, we can ask: what are its BASIS functions? We consider these in one dimension for simplicity. The BASIS function corresponding to a given coefficient tells us how much that coefficient contributes to each pixel in the reconstructed image. We can construct a single basis function by setting one sample of one subband equal to 1.0 (and all others to zero) and reconstructing. To build the entire matrix, we have to do this for every sample of every subband:

[24]:
sz = 36
sig = np.zeros((1,sz))
pyr = pt.pyramids.LaplacianPyramid(sig,)

# find size of basis
Nelements = 0
for n in pyr.pyr_size.values():
    Nelements += np.prod(n)

basis = np.zeros((sz, Nelements))
ctr = 0

for k in pyr.pyr_coeffs.keys():
    for i in range(pyr.pyr_size[k][1]):
        pyr.pyr_coeffs[k][0, i] = 1
        basis[:,ctr] = pyr.recon_pyr().flatten()
        ctr += 1
        pyr.pyr_coeffs[k][0, i] = 0
pt.imshow(basis, 'auto', 4);
../_images/tutorials_02_pyramids_36_0.png

So the first pixel of the reconstructed image, out of the 36 pixels, receives constributions from the first, 37th, 38th, 55th, 56th, 64th and 65th dimensions of the pyramid representation.

The columns of the basis matrix are the basis functions. The matrix is short and fat, corresponding to the fact that the representation is OVERCOMPLETE. Below, we plot the middle one from each subband, starting with the finest scale. Note that all of these basis functions are lowpass (Gaussian-like) functions.

[25]:
locations = pt.matlab_round(sz * (2 - 3./2.**np.array(range(1,len(pyr.pyr_size)+1))))
# locations, Nelements

for lev in range(len(locations)):
    plt.subplot(2,2,lev+1)
    plt.plot(basis[:, int(locations[lev])])
    plt.axis([0, sz, 0, 1.1])
plt.show()
../_images/tutorials_02_pyramids_39_0.png

Now, we’d also like see the inverse (we’ll call them PROJECTION) functions. We need to ask how much of each sample of the input image contributes to a given pyramid coefficient. Thus, the matrix is constructed by building pyramids on the set of images with impulses at each possible location. The rows of this matrix are the projection functions.

[26]:
projection = np.zeros((Nelements,sz))
for pos in range(sz):
    pyr = pt.pyramids.LaplacianPyramid(pt.synthetic_images.impulse((1, sz), (0, pos)).T)
    proj = np.array([]).reshape((1, 0))
    for v in pyr.pyr_coeffs.values():
        proj = np.concatenate((proj, v.T), axis=1)
    projection[:,pos] = proj
pt.imshow(projection, 'auto', 4);
../_images/tutorials_02_pyramids_41_0.png
[27]:
# Looking at successive rows of this matrix, we can visualize the subsampling operation.

plt.figure(figsize=(10,3))
# Non-zero values are shifted by 1 position.
plt.subplot(1,3,1)
plt.plot(projection[18,:])
plt.plot(projection[19,:])
# Non-zero values are shifted by 2 positions, rather than 1.
plt.subplot(1,3,2)
plt.plot(projection[45,:])
plt.plot(projection[46,:])
# Non-zero values are shifted by 4 positions, rather than 1.
plt.subplot(1,3,3)
plt.plot(projection[58,:])
plt.plot(projection[59,:])
plt.show()
../_images/tutorials_02_pyramids_42_0.png

Building a pyramid corresponds to multiplication by the projection matrix. Reconstructing from this pyramid corresponds to multiplication by the basis matrix. Thus, the product of the two matrices (in this order) should be the identity matrix:

[28]:
pt.imshow(np.dot(basis, projection), 'auto', 4);
../_images/tutorials_02_pyramids_44_0.png

We can plot a few example projection functions at different scales. Note that all of the projection functions are bandpass functions, except for the coarsest subband which is lowpass.

[29]:
for lev in range(locations.shape[0]):
    plt.subplot(2,2,lev+1)

    plt.plot(projection[int(locations[lev])])
    print('mean: ' , np.round(np.mean(projection[int(locations[lev])]),4))
    plt.axis([0, sz, -.3, 0.8])
plt.show()
mean:  -0.0
mean:  -0.0
mean:  -0.0
mean:  0.0786
../_images/tutorials_02_pyramids_46_1.png

Now consider the frequency response of these functions, plotted over the range [-pi,pi]:

[30]:
for lev in range(len(locations)):
    plt.subplot(2,2,lev+1)
    proj = projection[int(locations[lev])]
    plt.plot(np.pi*np.array(range(-32,32))/32, np.fft.fftshift(np.abs(np.fft.fft(proj.T,64))))
    plt.axis([-np.pi, np.pi, -0.1, 3])
plt.show()
../_images/tutorials_02_pyramids_48_0.png

The first projection function is highpass, and the second is bandpass. Both of these look something like the Laplacian (2nd derivative) of a Gaussian. The last is lowpass, as are the basis functions. Thus, the basic operation used to create each level of the pyramid involves a simple highpass/lowpass split.

2.1.3. QMF/WAVELET PYRAMIDS.

Two things about Laplacian pyramids are a bit unsatisfactory. First, there are more pixels (coefficients) in the representation than in the original image. Specifically, the 1-dimensional transform is overcomplete by a factor of 4/3, and the 2-dimensional transform is overcomplete by a factor of 2. Secondly, the “bandpass” images (fineN) do not segregate information according to orientation.

There are other varieties of pyramid. One type that arose in the speech coding community is based on a particular pairs of filters known as a “Quadrature Mirror Filters” or QMFs. These are closely related to Wavelets (essentially, they are approximate wavelet filters).

Recall that the Laplacian pyramid is formed by simple hi/low splitting at each level. The lowpass band is subsampled by a factor of 2, but the highpass band is NOT subsampled. In the QMF pyramid, we apply two filters (hi- and lo- pass) and subsample BOTH by a factor of 2, thus eliminating the excess coefficients of the Laplacian pyramid.

The two filters must have a specific relationship to each other. In particular, let n be an index for the filter samples. The highpass filter may be constructed from the lowpass filter by

  1. modulating (multiplying) by (-1)^n (equivalent to shifting by pi in the Fourier domain)

  2. flipping (i.e., reversing the order of the taps)

  3. spatially shifting by one sample.

Try to convince yourself that the resulting filters will always be orthogonal to each other (i.e., their inner products will be zero) when shifted by any multiple of two.

The function modulateFlip performs the first two of these operations. The third (spatial shifting) step is built into the convolution code.

[31]:
flo = pt.named_filter('qmf9').T
# modulateFlip is a hidden method of the WaveletPyramid class
fhi = pt.pyramids.WaveletPyramid._modulate_flip(flo)
x = np.arange(9)

plt.subplot(2,1,1)
plt.stem(x, flo.T)
plt.axis([-1, 9, -0.5, 1.0])
plt.gca().set_title('lowpass')
plt.subplot(2,1,2)
plt.stem(x, fhi.T)
plt.axis([-1, 9, -0.5, 1.0])
plt.gca().set_title('highpass')

plt.tight_layout()

../_images/tutorials_02_pyramids_52_0.png

In the Fourier domain, these filters are (approximately) “power-complementary”: the sum of their squared power spectra is (approximately) a constant. But note that neither is a perfect bandlimiter (i.e., a sinc function), and thus subsampling by a factor of 2 will cause aliasing in each of the subbands. See below for a discussion of the effect of this aliasing.

[32]:
# Plot the two frequency responses:
freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft(flo,64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft(fhi,64)))[0]

plt.subplot(2,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('FFT magnitudes')
plt.subplot(2,1,2)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared magnitudes')
plt.tight_layout()
../_images/tutorials_02_pyramids_54_0.png
[33]:
# We can split an input signal into two bands as follows:
sig = pt.synthetic_images.pink_noise((1,64), 1.6)
plt.subplot(2,1,1)
plt.plot(sig.T)
plt.gca().set_title('signal')
plt.subplot(2,1,2)
lo1 = pt.corrDn(image = sig, filt = flo, step = (1, 2)) #subsampling by a factor of 2 (step=2)
plt.plot(lo1.T)
hi1 = pt.corrDn(image = sig, filt = fhi, step = (1, 2), start = (0, 1)) #subsampling by a factor of 2 (step=2)
plt.plot(hi1.T)
plt.legend(['low band','high band'])
plt.tight_layout()
../_images/tutorials_02_pyramids_55_0.png
Notice that the two subbands are half the size of the original image, due to the subsampling by a factor of 2.
One subtle point: the highpass and lowpass bands are subsampled on different lattices: the lowpass band retains the odd-numbered samples and the highpass band retains the even-numbered samples. This was the 1-sample shift relating the high and lowpass kernels (mentioned above).
We’ve used the ‘reflect1’ to handle boundaries, which works properly for symmetric odd-length QMFs.
[34]:
# We can reconstruct the original image by interpolating these two subbands
# USING THE SAME FILTERS:
reconlo = pt.upConv(image = lo1, filt = flo, step = (1, 2))
reconhi = pt.upConv(image = hi1, filt = fhi, step = (1, 2), start = (0, 1))
plt.plot(sig.T)
plt.plot((reconlo+reconhi).T, '--')
plt.show()

pt.image_compare(sig, (reconlo+reconhi));
../_images/tutorials_02_pyramids_57_0.png
Difference statistics:
  Range: [0, 0]
  Mean: 0.000218,  Stdev (rmse): 0.000901,  SNR (dB): 60.907085

We have described an INVERTIBLE linear transform that maps an input image to the two images lo1 and hi1. The inverse transformation maps these two images to the result. This is depicted graphically with a system diagram:

IM ---> flo/down2 --> LO1 --> up2/flo --> add --> RECON
    |                                      ^
    |                                     |
    |                                     |
     -> fhi/down2 --> HI1 --> up2/fhi -----

Note that the number of samples in the representation (i.e., total samples in LO1 and HI1) is equal to the number of samples in the original IM. Thus, this representation is exactly COMPLETE, or “critically sampled”.

So we’ve fixed one of the problems that we had with Laplacian pyramid. But the system diagram above places strong constraints on the filters. In particular, for these filters the reconstruction is no longer perfect. Turns out there are NO perfect-reconstruction symmetric filters that are power-complementary, except for the trivial case [1] and the nearly-trivial case [1 1]/sqrt(2).

Let’s consider the projection functions of this 2-band splitting operation. We can construct these by applying the transform to impulse input signals, for all possible impulse locations. The rows of the following matrix are the projection functions for each coefficient in the transform.

[35]:
M = np.concatenate((pt.corrDn(image = np.identity(32), filt = flo, edge_type = 'circular', step = (2, 1)),
                    pt.corrDn(image = np.identity(32), filt = fhi, edge_type = 'circular', step = (2, 1),
                               start = (1, 0))))
pt.imshow(M, vrange='auto', zoom=5, title="M");
../_images/tutorials_02_pyramids_60_0.png

The transform matrix is composed of two sub-matrices. The top half contains the lowpass kernel, shifted by increments of 2 samples. The bottom half contains the highpass. Now we compute the inverse of this matrix:

[36]:
M_inv = np.linalg.inv(M)
pt.imshow(M_inv, vrange='auto', zoom=5, title="M_inv");
../_images/tutorials_02_pyramids_62_0.png
[37]:
# The inverse is (very close to) the transpose of the original
# matrix!  In other words, the transform is orthonormal.

pt.image_compare(M_inv.T,M);

pt.imshow(np.dot(M_inv,M), zoom=5);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000004,  Stdev (rmse): 0.000263,  SNR (dB): 56.495757
../_images/tutorials_02_pyramids_63_1.png
This also points out a nice relationship between the corrDn and upConv functions, and the matrix representation:
- corrDn is equivalent to multiplication by a matrix with copies of the filter on the ROWS, translated in multiples of the downsampling factor. - upConv is equivalent to multiplication by a matrix with copies of the filter on the COLUMNS, translated by the upsampling factor.

Multiplying the coefficient vector, which consists of concatenation of low pass and high pass coefficients, with the inverse (=transpose) matrix is the same as upsampling and convolving (i.e. UpConv) with the same decomposition filters. Indeed each rows of the inverse matrix is a subsampled shifted copy of low and high pass filters. When this is multiplied by the vector of coefficients, it is equivalent to upsampling the vector of coeffs (by inserting zeros) and convolving the first half (low pass) with low pass filter and the second half with the high pass filter and then sum them up. This makes sense, because in order to go back to the original signal, we need a weighted sum of the low pass and high pass coeffs.

As in the Laplacian pyramid, we can recursively apply this QMF band-splitting operation to the lowpass band:

[38]:
lo2 = pt.corrDn(image = lo1, filt = flo, step = (1, 2))
hi2 = pt.corrDn(image = lo1, filt = fhi, step = (1, 2), start = (0, 1))

The representation of the original signal is now comprised of the three subbands {hi1, hi2, lo2} (we don’t hold onto lo1, because it can be reconstructed from lo2 and hi2). Note that hi1 is at 1/2 resolution, and hi2 and lo2 are at 1/4 resolution: The total number of samples in these three subbands is thus equal to the number of samples in the original signal.

[39]:
plt.figure(figsize = (5,5))
plt.subplot(4,1,1)
plt.plot(hi1.T)
plt.gca().set_title('hi1')
plt.axis([1, len(hi1[0]), 1.1*hi1.min(), 1.1*hi1.max()])
plt.subplot(4,1,2)
plt.plot(hi2.T)
plt.gca().set_title('hi2')
plt.axis([1, hi2.shape[1], 1.1*hi2.min(), 1.1*hi2.max()])
plt.subplot(4,1,3)
plt.plot(lo2.T)
plt.gca().set_title('lo2')
plt.axis([1, lo2.shape[1], 1.1*lo2.min(), 1.1*lo2.max()])
plt.tight_layout()
plt.subplot(4,1,4)
plt.plot(sig.T, 'k')
plt.gca().set_title('signal')
plt.axis([1, sig.shape[1], 1.1*sig.min(), 1.1*sig.max()])
plt.tight_layout()
../_images/tutorials_02_pyramids_69_0.png

Reconstruction proceeds as with the Laplacian pyramid: combine lo2 and hi2 to reconstruct lo1, which is then combined with hi1 to reconstruct the original signal:

[40]:
recon_lo1 = ( pt.upConv(image = hi2, filt = fhi, step = (1, 2), start = (0, 1)) +
              pt.upConv(image = lo2, filt = flo, step = (1, 2), start = (0, 0)) )
reconstructed = ( pt.upConv(image = hi1, filt = fhi, step = (1, 2), start = (0, 1)) +
                  pt.upConv(image = recon_lo1, filt = flo, step = (1, 2), start = (0, 0)) )
pt.image_compare(sig,reconstructed);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000337,  Stdev (rmse): 0.001278,  SNR (dB): 57.867947

2.1.4. FUNCTIONS for CONSTRUCTING/MANIPULATING QMF/Wavelet PYRAMIDS

To make things easier, we have bundled these qmf operations and data structures into an object in Python.

[41]:
# # If we feel like trying on another signal
# sig = pt.synthetic_images.pink_noise((1, 64), 1.5)
# plt.figure()
# plt.plot(sig.T)
# plt.show()
[42]:
pyr = pt.pyramids.WaveletPyramid(sig)
fig = pt.pyrshow(pyr.pyr_coeffs)
../_images/tutorials_02_pyramids_74_0.png
[43]:
# a different plot of the same thing
for i, (k, v) in enumerate(pyr.pyr_coeffs.items()):
    plt.subplot(pyr.num_scales+1, 1, i+1)
    x = np.array(range(pyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, pyr.pyr_size[k][1], 1.5*min(min(v)), 1.5*max(max(v))])
plt.tight_layout()
../_images/tutorials_02_pyramids_75_0.png
[44]:
res = pyr.recon_pyr()
pt.image_compare(sig, res);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000573,  Stdev (rmse): 0.001662,  SNR (dB): 55.586728

Now for 2D, we use separable filters. There are 4 ways to apply the two filters to the input image (followed by the relavent subsampling operation):

  1. lowpass in both x and y

  2. lowpass in x and highpass in y

  3. lowpass in y and highpass in x

  4. highpass in both x and y.

The pyramid is built by recursively subdividing the first of these bands into four new subbands.

[45]:
# First, we'll take a look at some of the basis functions.
sz = 40
zim = np.zeros((sz,sz))
flo = 'qmf9'
edge_type = 'reflect1'
pyr = pt.pyramids.WaveletPyramid(zim)
[46]:
### Put an  impulse into the middle of each band:
for k, v in pyr.pyr_size.items():
    mid = (v[0]//2, v[1]//2)
#     print(lev, mid)
    pyr.pyr_coeffs[k][mid] = 1
[47]:
# And take a look at the reconstruction of each band:
reconList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        reconList.append(pyr.recon_pyr(flo, edge_type, k[0], k[1]))
    else:
        reconList.append(pyr.recon_pyr(flo, edge_type, k))
pt.imshow(reconList, vrange='indep1', zoom=4, col_wrap=3);
../_images/tutorials_02_pyramids_80_0.png

Note that the first column contains horizontally oriented basis functions at different scales. The second contains vertically oriented basis functions. The third contains both diagonals (a checkerboard pattern). The bottom row shows a lowpass basis function.

Now look at the corresponding Fourier transform magnitudes (these are plotted over the frequency range [-pi, pi] ):

[48]:
freq = 2 * np.pi * np.array(range(-sz//2,(sz//2)))/sz
imgList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        basisFn = pyr.recon_pyr(flo, edge_type, k[0], k[1])
    else:
        basisFn = pyr.recon_pyr(flo, edge_type, k)
    basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn,(sz,sz))))
    imgList.append(basisFmag)

pt.imshow(imgList, vrange='indep1', zoom=4, col_wrap=3);
../_images/tutorials_02_pyramids_82_0.png
[49]:
# The filters at a given scale sum to a squarish annular region:
sumSpectra = np.zeros((sz,sz))
lnum = 2;
for bnum in range(3):
    basisFn = pyr.recon_pyr(flo, edge_type, [lnum], [bnum])
    basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn,(sz,sz))))
    sumSpectra = basisFmag**2 + sumSpectra;
pt.imshow(sumSpectra, vrange='auto', zoom=4, title='one scale');
../_images/tutorials_02_pyramids_83_0.png
[50]:
# Now decompose an image:
pyr = pt.pyramids.WaveletPyramid(im/255)
[51]:
# View all of the subbands (except lowpass), scaled to be the same size
# (requires a big figure window):
pt.pyrshow(pyr.pyr_coeffs, zoom=1);
../_images/tutorials_02_pyramids_85_0.png
[52]:
# The recon_pyr function can be used to reconstruct the entire pyramid:
reconstructed = pyr.recon_pyr()
pt.imshow([im,reconstructed]);
pt.image_compare(im/255,reconstructed);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000763,  Stdev (rmse): 0.000334,  SNR (dB): 53.416318
../_images/tutorials_02_pyramids_86_1.png
[53]:
# As with Laplacian pyramids, you can specify sub-levels and subbands
# to be included in the reconstruction.  For example:
pt.imshow(pyr.recon_pyr('qmf9', 'reflect1', levels='all', bands=[0]))  # Horizontal only, all scales
pt.imshow(pyr.recon_pyr('qmf9', 'reflect1', [1,2]));  # two middle scales
../_images/tutorials_02_pyramids_87_0.png
../_images/tutorials_02_pyramids_87_1.png

2.1.5. PERFECT RECONSTRUCTION: HAAR AND DEBAUCHIES WAVELETS

The symmetric QMF filters used above are not perfectly orthogonal. In fact, it’s impossible to construct a symmetric filter of size greater than 2 that is perfectly orthogonal to shifted copies (shifted by multiples of 2) of itself. For example, consider a symmetric kernel of length 3. Shift by two and the right end of the original kernel is aligned with the left end of the shifted one. Thus, the inner product of these two will be the square of the end tap, which will be non-zero.

However, one can easily create wavelet filters of length 2 that will do the job. This is the oldest known wavelet, known as the “Haar”. The two kernels are [1,1]/sqrt(2) and [1,-1]/sqrt(2). These are trivially seen to be orthogonal to each other, and shifts by multiples of two are also trivially orthogonal. The projection functions of the Haar transform are in the rows of the following matrix, constructed by applying the transform to impulse input signals, for all possible impulse locations:

[54]:
haarLo = pt.named_filter('haar')
haarHi = pt.pyramids.WaveletPyramid._modulate_flip(haarLo)
x = np.array(range(len(haarLo)))
plt.subplot(2,1,1)
plt.stem(x,haarLo)
plt.axis([-0.2, 1.2, -1, 1])
plt.gca().set_title('lowpass')
plt.subplot(2,1,2)
plt.stem(x, haarHi)
plt.axis([-0.2, 1.2, -1, 1])
plt.gca().set_title('highpass')
plt.tight_layout()
../_images/tutorials_02_pyramids_89_0.png
[55]:
M = np.concatenate((pt.corrDn(np.identity(32), haarLo, step=(1, 2)),
                    pt.corrDn(np.identity(32), haarHi.T, step=(1, 2), start=(0, 1))), axis=1).T
pt.imshow(M, 'auto', 5)
pt.imshow(np.dot(M, M.T), 'auto', 5);   # identity
../_images/tutorials_02_pyramids_90_0.png
../_images/tutorials_02_pyramids_90_1.png
[56]:
# As before, the filters are power-complementary (although the
# frequency isolation is rather poor, and thus the subbands will be
# heavily aliased):
# And the sum of the power spectra is again flat

freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft( haarLo.T, 64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft( haarHi.T, 64)))[0]

plt.subplot(3,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('Fourier magnitudes')
plt.subplot(3,1,2)
plt.plot(freq,flo_sp**2,'--')
plt.plot(freq,fhi_sp**2,'-')
plt.gca().set_title('squared Fourier magnitudes')
plt.subplot(3,1,3)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared Fourier magnitudes')
plt.tight_layout()
../_images/tutorials_02_pyramids_91_0.png
[57]:
sig = pt.synthetic_images.pink_noise((1,64),0.5)
pyr = pt.pyramids.WaveletPyramid(sig, 4, 'haar', 'reflect1')

pt.pyrshow(pyr.pyr_coeffs);
../_images/tutorials_02_pyramids_92_0.png
[58]:
# check perfect reconstruction:
res = pyr.recon_pyr('haar', 'reflect1')
pt.image_compare(sig,res);

plt.plot(sig.T)
plt.plot(res.T, '--')
plt.show()
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 302.340511
../_images/tutorials_02_pyramids_93_1.png

If you want perfect reconstruction, but don’t like the Haar transform, there’s another option: drop the symmetry requirement. Ingrid Daubechies developed one of the earliest sets of such perfect-reconstruction wavelets. The simplest of these is of length 4:

[59]:
daub_lo = pt.named_filter('daub2')
daub_hi = pt.pyramids.WaveletPyramid._modulate_flip(daub_lo)
# The daub_lo filter is constructed to be orthogonal to 2 shifted
# copy of itself.  For example:
print (np.dot(np.concatenate((daub_lo,[[0],[0]])).T, np.concatenate(([[0],[0]],daub_lo))))

M = np.concatenate((pt.corrDn(image = np.identity(32), filt = daub_lo, edge_type = 'circular', step = (2, 1), start = (1, 0)),
                    pt.corrDn(image = np.identity(32), filt = daub_hi, edge_type = 'circular', step = (2, 1), start = (1, 0))))
pt.imshow(M, 'auto', 5)
pt.imshow(np.dot(M, M.T), 'auto', 5);
[[2.90920066e-13]]
../_images/tutorials_02_pyramids_95_1.png
../_images/tutorials_02_pyramids_95_2.png
[60]:
plt.stem(M[10])
[60]:
<StemContainer object of 3 artists>
../_images/tutorials_02_pyramids_96_1.png
[61]:
# Again, they're power complementary and sum of the power spectra is flat

freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft(daub_lo.T,64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft(daub_hi.T,64)))[0]

plt.subplot(3,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('Fourier magnitudes')
plt.subplot(3,1,2)
plt.plot(freq,flo_sp**2,'--')
plt.plot(freq,fhi_sp**2,'-')
plt.gca().set_title('squared Fourier magnitudes')
plt.subplot(3,1,3)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared Fourier magnitudes')
plt.tight_layout()
../_images/tutorials_02_pyramids_97_0.png
[62]:
# Make a pyramid using the same code as before (except that we can't
# use reflected boundaries with asymmetric filters):
pyr = pt.pyramids.WaveletPyramid(sig, filter_name=daub_lo, edge_type='circular')

pt.pyrshow(pyr.pyr_coeffs, zoom=2);
../_images/tutorials_02_pyramids_98_0.png
[63]:
res = pyr.recon_pyr(daub_lo,'circular')
pt.image_compare(sig,res);

plt.plot(sig.T)
plt.plot(res.T, '--')
plt.show()
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 227.146682
../_images/tutorials_02_pyramids_99_1.png

2.1.6. ALIASING IN WAVELET TRANSFORMS

aka. what’s wrong with wavelets

All of these orthonormal pyramid/wavelet transforms have a lot of aliasing in the subbands. You can see that in the frequency response plots since the frequency response of each filter covers well more than half the frequency domain. The aliasing can have serious consequences…

[64]:
# Get one of the basis functions of the 2D Daubechies wavelet transform:
filt = pt.named_filter('daub4')
# filt = pt.named_filter('qmf13')
pyr = pt.pyramids.WaveletPyramid(np.zeros((1,64)),3,filt,'circular')
lev = 2;
mid = (0, int(np.ceil(pyr.pyr_size[(lev, 0)][1]/2.0)))
pyr.pyr_coeffs[(lev, 0)][mid] = 1
sig = pyr.recon_pyr(filt,'circular')
x = np.arange(64).reshape(64,1)
plt.stem(x,sig.T)
plt.axis([-2, 66, -0.65, 0.65])
plt.show()
../_images/tutorials_02_pyramids_101_0.png
[65]:
# Since the basis functions are orthonormal, building a pyramid using this
# input will yield a single non-zero coefficient.
pyr = pt.pyramids.WaveletPyramid(sig, 3, filt, 'circular')

for i, (k, v) in enumerate(pyr.pyr_coeffs.items()):
    plt.subplot(pyr.num_scales+1, 1, i+1)
    x = np.array(range(pyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, pyr.pyr_size[k][1], -.3, 1.3])
plt.tight_layout()
../_images/tutorials_02_pyramids_102_0.png
[66]:
# Now shift the input by one sample and re-build the pyramid.
shifted_sig = np.roll(sig,1)
spyr = pt.pyramids.WaveletPyramid(shifted_sig, 3, filt, 'circular')

# Plot each band of the unshifted and shifted decomposition
for i, (k, v) in enumerate(spyr.pyr_coeffs.items()):
    plt.subplot(spyr.num_scales+1, 1, i+1)
    x = np.array(range(spyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, spyr.pyr_size[k][1], -.3, 1.3])
plt.tight_layout()
../_images/tutorials_02_pyramids_103_0.png

In the third band, we expected the coefficients to move around because the signal was shifted. But notice that in the original signal decomposition, the other bands were filled with zeros. After the shift, they have significant content. Although these subbands are supposed to represent information at different scales, their content also depends on the relative POSITION of the input signal.

This problem is not unique to the Daubechies transform. The same is true for the QMF transform. Try it… In fact, the same kind of problem occurs for almost any orthogonal pyramid transform (the only exception is the limiting case in which the filter is a sinc function).

Orthogonal pyramid transforms are not shift-invariant. Although orthogonality may be an important property for some applications (e.g., data compression), orthogonal pyramid transforms are generally not so good for image analysis.

The overcompleteness of the Laplacian pyramid turns out to be a good thing in the end. By using an overcomplete representation (and by choosing the filters properly to avoid aliasing as much as possible), you end up with a representation that is useful for image analysis.

[ ]: