pyrtools

pypi-shield license-shield python-version-shield build binder doi

Pyrtools is a python package for multi-scale image processing, adapted from Eero Simoncelli’s matlabPyrTools.

The tools include:
  • Recursive multi-scale image decompositions (pyramids), including Laplacian pyramids, QMFs, Wavelets, and steerable pyramids. These operate on 1D or 2D signals of arbitrary dimension.

  • Fast 2D convolution routines, with subsampling and boundary-handling.

  • Fast point-operations, histograms, histogram-matching.

  • Fast synthetic image generation: sine gratings, zone plates, fractals, etc.

  • Display routines for images and pyramids. These include several auto-scaling options, rounding to integer zoom factors to avoid resampling artifacts, and useful labeling (dimensions and gray-range).

NOTE: If you are only interested in the complex steerable pyramid, we have a pytorch implementation in the plenoptic package; the implementation in plenoptic is differentiable.

Citing us

If you use pyrtools in a published academic article or presentation, please cite us! You can find the link to the most recent release on Zenodo here (though please specify the version you used not the most recent one!). You can also get a formatted citation at the top right of our GitHub repo

Quick Start

On Linux or macOS, open a shell and run:

pip install pyrtools

More instructions available at Installation.

In the python interpreter, then call:

import pyrtools as pt

Create pyramid:

pyr = pt.pyramids.LaplacianPyramid(img)

Reconstruct image from pyramid:

recon_img = pyr.recon_pyr()

For more details, see the jupyter notebooks included in the TUTORIALS/ directory, static versions of which are linked in the navigation sidebar. You can play around with a live version of them in order to test out the code before downloading on binder

Pyramid resources

If you would like to learn more about pyramids and why they’re helpful for image processing, here are some resources to get you started:

Installation

There are two ways to install pyrtools: via the pip package management system, or directly from source.

Attention

Windows support was added in version 1.0.3. If you are on Windows and get an installation error, make sure you are installing the newest version.

From source

Obtain the latest version of pyrtools:

git clone https://github.com/LabForComputationalVision/pyrtools

(If you have already cloned the repo, you can update it with git pull.)

Finally, the package is installed by running:

cd pyrtools
pip install -e .

This will install an editable version of the package, so changes made to the files within the pyrtools directory will be reflected in the version of pyrtools you use.

When installing from source on Linux or Mac, we require gcc version >= 6 in order for the C code to compile, because of this issue

When installing from source on Windows, Microsoft Visual C++ 14.0 or greater is required, which can be obtained with Microsoft C++ Build Tools.

Information for developers

Unit tests

For running all the unit tests and avoiding bugs, please simply run from the main folder:

python TESTS/unitTests.py

If all the tests pass, then you might be able to submit your Pull Request as explained in the next section!

Proposing a Pull Request(PR)

Each PR must be documented using docstings and must run the unit tests. If you are adding new functionality, add a new test to unitTests.py and provide an example.

Basic examples using the tools in pyrtools

Content

  • Load an image, and downsample to a size appropriate for the machine speed

  • Synthetic images (‘ramp’, ‘impulse’, etc.)

  • Point operations (lookup tables)

  • histogram Modification/matching

  • Convolution routines

  • Compare speed of convolution/downsampling routines

  • Handling of left and top boundaries (‘reflect1’, ‘reflect2’, ‘repeat’, ‘extend’, ‘zero’, ‘circular’, ‘dont-compute’)

for multi-scale pyramids see TUTORIALS/02_pyramids.ipynb
for multi-scale steerable pyramids see TUTORIALS/03_steerable_pyramids.ipynb
[1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pyrtools as pt

%load_ext autoreload
%autoreload 2

Load an image, and downsample to a size appropriate for the machine speed

[2]:
oim = plt.imread('../DATA/Einstein.pgm').astype(float)
[3]:
imSubSample = 1
im = pt.blurDn(oim, n_levels=imSubSample, filt='qmf9')
[4]:
# imshow: 3 types of automatic graylevel scaling,
# 2 types of automatic sizing, with or without title and Range information.
# to see the documentation for a function, run:`? pt.imshow`

pt.imshow([oim,im], title=['original', 'subsampled'], vrange='auto2', col_wrap=2);
_images/tutorials_01_tools_5_0.png

Statistics:

[5]:
pt.image_stats(im);
Image statistics:
  Range: [10.126404, 245.754672]
  Mean: 116.771655,  Stdev: 37.977892,  Kurtosis: 3.085949

Synthetic images

[6]:
# pick some parameters
size      = 256
direction = 2 * np.pi * np.random.rand(1)
slope     = 10 * np.random.rand(1) - 5
intercept = 10 * np.random.rand(1) - 5
origin    = np.round(size * np.random.rand(2,1)).astype(int)
exponent  = 0.8 + np.random.rand(1)
amplitude = 1 + 5 * np.random.rand(1)
phase     = 2 * np.pi * np.random.rand(1)
period    = 20
twidth    = 7
[7]:
pt.imshow(pt.synthetic_images.ramp(size, direction, slope, intercept, origin));
_images/tutorials_01_tools_10_0.png
[8]:
pt.imshow(pt.synthetic_images.impulse(size, origin, amplitude));
_images/tutorials_01_tools_11_0.png
[9]:
pt.imshow(pt.synthetic_images.polar_radius(size, exponent, origin));
_images/tutorials_01_tools_12_0.png
[10]:
pt.imshow(pt.synthetic_images.polar_angle(size, direction));
_images/tutorials_01_tools_13_0.png
[11]:
pt.imshow(pt.synthetic_images.disk(size, size/4, origin, twidth));
_images/tutorials_01_tools_14_0.png
[12]:
pt.imshow(pt.synthetic_images.gaussian(size, (size/6)**2, origin, 'norm'));
_images/tutorials_01_tools_15_0.png
[13]:
# TODO fix normalization - range
g = pt.synthetic_images.gaussian(size, (size/6)**2, origin, 'norm')
g.min(), g.max()
[13]:
(9.402979666533684e-14, 8.742642137616321e-05)
[14]:
pt.imshow(pt.synthetic_images.gaussian(size, ((size/8)**2,(size/3)**2), origin, amplitude));
_images/tutorials_01_tools_17_0.png
[15]:
# cov = (size * np.random.uniform(-1,1,(2,2)))
# cov = cov.dot(cov.T)
# print(np.round(cov))
cov = np.array((( 12571., -25233.),
                (-25233.,  52488.)))

pt.imshow(pt.synthetic_images.gaussian(size, cov, origin, amplitude));
_images/tutorials_01_tools_18_0.png
[16]:
fig=pt.imshow(pt.synthetic_images.zone_plate(size, amplitude, phase));
fig.savefig('zoneplate.png')
_images/tutorials_01_tools_19_0.png
[17]:
pt.imshow(pt.synthetic_images.angular_sine(size, 3, amplitude, phase, origin));
_images/tutorials_01_tools_20_0.png
[18]:
pt.imshow(pt.synthetic_images.sine(size, period, direction, amplitude=amplitude, phase=phase, origin=origin));
_images/tutorials_01_tools_21_0.png
[19]:
pt.imshow(pt.synthetic_images.sine(20, frequency=[1,2]));
_images/tutorials_01_tools_22_0.png
[20]:
pt.imshow(pt.synthetic_images.square_wave(size, period, direction, amplitude, phase=phase, origin=origin, twidth=twidth));
_images/tutorials_01_tools_23_0.png
[21]:
pt.imshow(pt.synthetic_images.pink_noise(size, exponent));
_images/tutorials_01_tools_24_0.png
[22]:
# Plotting complex valued images

sp = np.fft.fftshift(np.fft.fft2(im, norm='ortho'))

pt.imshow([im, sp], plot_complex='logpolar');
_images/tutorials_01_tools_25_0.png

Point operations (lookup tables):

[23]:
Xtbl,Ytbl = pt.rcosFn(width=20, position=25, values=(-1, 1))
plt.plot(Xtbl,Ytbl)
plt.show()
_images/tutorials_01_tools_27_0.png
[24]:
pt.imshow(pt.pointOp(pt.synthetic_images.polar_radius(size,1,[70,30]), Ytbl, Xtbl[0], Xtbl[1]-Xtbl[0], 0));
_images/tutorials_01_tools_28_0.png

Convolution routines: Compare speed of convolution/downsampling routines

[25]:
k = 5
size = 2 ** 9
noise = np.random.rand(size,size)
filt  = np.random.rand(k,k)
[26]:
%%time
res1 = pt.corrDn(noise, np.flipud(np.fliplr(filt)), 'reflect1', step=[2, 2])
CPU times: user 4.54 ms, sys: 8.26 ms, total: 12.8 ms
Wall time: 4.15 ms
[27]:
%%time
ires = pt.rconv2(noise,filt) # this is using scipy.signal.convolve
res2 = ires[0:size:2,0:size:2]
CPU times: user 48.9 ms, sys: 18.1 ms, total: 67 ms
Wall time: 22.3 ms
[28]:
res1 = pt.corrDn(noise, np.flipud(np.fliplr(filt)), 'reflect1', step=[2, 2])
ires = pt.rconv2(noise,filt) # this is using scipy.signal.convolve
res2 = ires[0:size:2,0:size:2]
pt.image_compare(res1, res2);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 294.068252
[29]:
res3 = pt.corrDn(noise, np.flipud(np.fliplr(filt)), 'reflect1', step=[1, 1])
pt.image_compare(res3, ires);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 294.006866
[30]:
im = plt.imread('../DATA/Einstein.pgm').astype(float)
# im = pt.synthetic_images.impulse(256, origin=None)
[31]:
binom5 = pt.binomial_filter(5)
# construct a separable 2D filter
lo_filt = 2*binom5*binom5.T
[32]:
blurred = pt.rconv2(im, lo_filt)
pt.imshow([im, blurred], title=['im', 'blurred']);
_images/tutorials_01_tools_37_0.png
[33]:
# much faster implementation:
b = pt.corrDn(im, np.flipud(np.fliplr(lo_filt)), 'reflect1', step=[1, 1])
pt.image_compare(blurred, b);

# each dimension separately
bx  = pt.corrDn(im, binom5.T, 'reflect1')
bxy = pt.corrDn(bx, binom5, 'reflect1')
bxy *= 2
pt.image_compare(bxy, b);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 305.282881
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
[34]:
# blur and downsample in a single step
blurred1 = pt.corrDn(image = im, filt = lo_filt, step = (2,2))

bx  = pt.corrDn(im, binom5.T, 'reflect1', step=[1, 2])
bxy = pt.corrDn(bx, binom5, 'reflect1', step=[2, 1])
bxy *= 2

pt.imshow([blurred1, bxy], zoom=2);
pt.image_compare(blurred1, bxy);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
_images/tutorials_01_tools_39_1.png
[35]:
# NOTE: here is helper to decide on subsampling depending on machine speed

import time
t = time.time()
pt.corrDn(oim, filt=np.ones((2,2)) / 4, edge_type='reflect1', step=(2, 2), start=(0, 0), stop=None)
elapsed = time.time() - t
imSubSample = min(max(np.floor(np.log2(elapsed)/2+3),0),2)
imSubSample
[35]:
0

Handling of left and top boundaries:

[36]:
fsz = [9, 9]
fmid = np.ceil((fsz[0]+1)/2)
imsz = (16, 16)

# pick one:
im = np.eye(imsz[0])
im = pt.synthetic_images.ramp(imsz, np.pi/6)
im = pt.synthetic_images.square_wave(imsz, 6, np.pi/6)

# pick one:
edges='reflect1'
edges='reflect2'
edges='repeat'
edges='extend'
edges='zero'
edges='circular'
edges='dont-compute'

filt = pt.synthetic_images.impulse(fsz,[0, 0])
pt.imshow(pt.corrDn(im,filt,edges), zoom=10, title='Edges: ' + str(edges))
# TODO
plt.plot([0,0,imsz[1],imsz[1],0]+fmid-1.5,
         [0,imsz[0],imsz[0],0,0]+fmid-1.5, lw=.5)
plt.show()
_images/tutorials_01_tools_42_0.png

Image pyramid tutorial

Content

  • Laplacian pyramids

    • functions for constructing / manipulating Laplacian pyramids

    • aliasing in the Gaussian and Laplacian pyramids

    • Projection and Basis functions

  • QMF/wavelet pyramids

    • functions for constructing / manipulating QMF/Wavelet pyramids

    • perfect reconstruction: Haar and Daubechies wavelets

    • aliasing in wavelet transforms

A brief introduction to multi-scale pyramid decompositions for image processing. You should go through this, reading the comments, and executing the corresponding code. This file assumes a basic familiarity with matrix algebra, with linear systems and Fourier theory, and with Python. If you don’t understand a particular function call, execute help(functionName) to see documentation.

for basic tools see TUTORIALS/01_tools.ipynb
for multi-scale steerable pyramids see TUTORIALS/03_steerable.ipynb
[1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

import pyrtools as pt
%load_ext autoreload
%autoreload 2
[2]:
im = plt.imread('../DATA/Einstein.pgm')
# im = pt.blurDn(im, 1, 'qmf9')
pt.imshow(im);
_images/tutorials_02_pyramids_2_0.png
Laplacian Pyramids:

Images may be decomposed into information at different scales.

[3]:
# Blurring eliminates the fine scales (detail):
binom5 = pt.binomial_filter(5)
lo_filt = binom5*binom5.T
blurred = pt.rconv2(im, lo_filt) # NOTE this convolution is a slow, see corrDn instead
# rconv2: Convolution of two matrices, with boundaries handled via reflection about the edge pixels.

pt.imshow([im, blurred], title=['im', 'blurred']);
# help(pt.rconv2)
_images/tutorials_02_pyramids_4_0.png
[4]:
# subtracting the blurred image from the original leaves ONLY the fine scale detail:
fine0 = im - blurred
pt.imshow([fine0, blurred], title=['fine0', 'blurred']);
_images/tutorials_02_pyramids_5_0.png

The blurred and fine images contain all the information found in the original image. Trivially, adding the blurred image to the fine scale detail will reconstruct the original. We can compare the original image to the sum of blurred and fine using the “image_stats” function, which reports on the statistics of the difference between it’s arguments:

[5]:
# pt.imshow([im, blurred, fine0], title=None);
pt.image_compare(im, blurred+fine0);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf

Since the filter is a lowpass filter, we might want to subsample the blurred image. This may cause some aliasing (depends on the filter), but the decomposition structure given above will still be possible. The corrDn function correlates (same as convolution, but flipped filter) and downsamples in a single operation (for efficiency). The string ‘reflect1’ tells the function to handle boundaries by reflecting the image about the edge pixels. Notice that the blurred1 image is half the size (in each dimension) of the original image.

[6]:
lo_filt = 2*binom5*binom5.T   # construct a separable 2D filter
blurred1 = pt.corrDn(image = im, filt = lo_filt, step = (2,2))
#corrDn: Compute correlation of matrices image with `filt, followed by downsampling.

pt.imshow([fine0, blurred1], title=['fine0', 'blurred1']);
_images/tutorials_02_pyramids_9_0.png
[7]:
fine1 = im - pt.upConv(image = blurred1, filt = lo_filt, step = (2, 2))
# upConv: Upsample matrix image, followed by convolution with matrix filt.
pt.imshow([fine1, blurred1], title=['fine1', 'blurred1']);
_images/tutorials_02_pyramids_10_0.png

We now have a technique that takes an image, computes two new images (blurred1 and fine1) containing the coarse scale information and the fine scale information. We can also (trivially) reconstruct the original from these two (even if the subsampling of the blurred1 image caused aliasing):

[8]:
recon = fine1 + pt.upConv(image=blurred1, filt=lo_filt, step=(2, 2),start=(0, 0), stop=(256, 256))
pt.image_compare(im, recon);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf

Thus, we have described an INVERTIBLE linear transform that maps an input image to the two images blurred1 and fine1. The inverse transformation maps blurred1 and fine1 to the result. This is depicted graphically with a system diagram:

 (IM) --> blur/down2: (BLURRED1) -------> up2/blur -----> add --> (RECON)
  |                     |                                  ^
  |                     |                                  |
  |                     V                                  |
  |                  up2/blur                              |
  |                     |                                  |
  |                     |                                  |
  |                     V                                  |
   ---------------> subtract: (FINE1) ---------------------

Note that the number of samples in the representation (i.e., total samples in BLURRED1 and FINE1) is 1.5 times the number of samples in the original IM. Thus, this representation is OVERCOMPLETE.

[9]:
blurred2 = pt.corrDn(image = blurred1, filt = lo_filt, step = (2, 2))
pt.imshow(blurred2, 'auto', 4);
_images/tutorials_02_pyramids_14_0.png
[10]:
fine2 = blurred1 - pt.upConv(image = blurred2, filt = lo_filt, step = (2, 2))
pt.imshow(fine2, 'auto', 2);
_images/tutorials_02_pyramids_15_0.png

Since blurred2 and fine2 can be used to reconstruct blurred1, and blurred1 and fine1 can be used to reconstruct the original image, the set of THREE images (also known as “subbands”) {blurred2, fine2, fine1} constitute a complete representation of the original image. Note that the three subbands are displayed at the same size, but they are actually three different sizes.

[11]:
pt.imshow([fine1, fine2, blurred2], title=['fine1', 'fine2', 'blurred2'], col_wrap=3);
_images/tutorials_02_pyramids_17_0.png

It is useful to consider exactly what information is stored in each of the pyramid subbands. The reconstruction process involves recursively interpolating these images and then adding them to the image at the next finer scale. To see the contribution of ONE of the representation images (say blurred2) to the reconstruction, we imagine filling all the other subbands with zeros and then following our reconstruction procedure. For the blurred2 subband, this is equivalent to simply calling upConv twice:

[12]:
blurred2_tmp  = pt.upConv(image = blurred2, filt = lo_filt, step = (2, 2))
blurred2_full = pt.upConv(image = blurred2_tmp, filt = lo_filt, step = (2, 2))
pt.imshow([fine1, fine2, blurred2_full], title=['fine1', 'fine2', 'blurred2_full'], col_wrap=3);
_images/tutorials_02_pyramids_19_0.png
[13]:
# For the fine2 subband, this is equivalent to calling upConv once:
fine2_full = pt.upConv(image = fine2, filt = lo_filt, step = (2, 2))
pt.imshow([fine1, fine2_full, blurred2_full], title=['fine1', 'fine2_full', 'blurred2_full'], col_wrap=3);
_images/tutorials_02_pyramids_20_0.png
[14]:
# If we did everything correctly, we should be able to add together
# these three full-size images to reconstruct the original image:
recon = blurred2_full + fine2_full + fine1
pt.image_compare(im, recon);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
[15]:
pt.imshow([recon, im]);
_images/tutorials_02_pyramids_22_0.png

FUNCTIONS for CONSTRUCTING/MANIPULATING LAPLACIAN PYRAMIDS

We can continue this process, recursively splitting off finer and finer details from the blurred image (like peeling off the outer layers of an onion). The resulting data structure is known as a “Laplacian Pyramid”. To make things easier, we have written a function called Lpyr to construct this object. The function returns a long vector containing the subbands of the pyramid. The display routine showLpyr shows all the subbands of the pyramid, at their correct relative sizes. It should now be clearer why these data structures are called “pyramids”.

[16]:
pyr = pt.pyramids.LaplacianPyramid(im)
pt.pyrshow(pyr.pyr_coeffs);
_images/tutorials_02_pyramids_24_0.png
[17]:
# To access the coefficients of a single band, use: pyr.pyr_coeffs[(level, band)]
pt.imshow(pyr.pyr_coeffs[(1, 0)]);
_images/tutorials_02_pyramids_25_0.png
[18]:
# The reconLpyr function allows you to reconstruct from a laplacian pyramid.
# The third (optional) arg allows you to select any subset of pyramid bands
# (default is to use ALL of them).
pt.imshow(pyr.recon_pyr(levels=[0, 2]));  # reconstruct with just the first and third band
_images/tutorials_02_pyramids_26_0.png
[19]:
fullres = pyr.recon_pyr()
pt.imshow(fullres);
pt.image_compare(im,fullres);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
_images/tutorials_02_pyramids_27_1.png
[20]:
# buildLpyr uses 5-tap filters by default for building Laplacian
# pyramids.  You can specify other filters:
print (pt.named_filter('binom3'))

pyr3 = pt.pyramids.LaplacianPyramid(im, height=7, downsample_filter_name='binom3')
pt.pyrshow(pyr3.pyr_coeffs);

fullres3 = pyr3.recon_pyr(levels='all', upsample_filter_name='binom3')
pt.image_compare(im,fullres3);
[[0.35355339]
 [0.70710678]
 [0.35355339]]
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): inf
_images/tutorials_02_pyramids_28_1.png

Here we build a “Laplacian” pyramid using random filters. filt1 is used with the downsampling operations and filt2 is used with the upsampling operations. We normalize the filters for display purposes. Of course, these filters are (almost certainly) not very “Gaussian”, and the subbands of such a pyramid will be garbage! Nevertheless, it is a simple property of the Laplacian pyramid that we can use ANY filters and we will still be able to reconstruct perfectly. Indeed fine bands are produced by subtracting the parent from blurred, so adding them back gives the parent back.

[21]:
filt1 = np.random.rand(5,)
filt1 = np.sqrt(2)*filt1/sum(filt1)
plt.subplot(2,1,1)
plt.stem(filt1)
plt.subplot(2,1,2)
filt2 = np.random.rand(3,)
filt2 = np.sqrt(2)*filt2/sum(filt2)
plt.stem(filt2)
plt.show()
_images/tutorials_02_pyramids_30_0.png
[22]:
pyrr = pt.pyramids.LaplacianPyramid(im, height=7,
                                    downsample_filter_name=filt1,
                                    upsample_filter_name=filt2)
pt.pyrshow(pyrr.pyr_coeffs)

fullresr = pyrr.recon_pyr(filt2)

pt.image_compare(im, fullresr);
pt.imshow([im, fullresr]);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 320.773103
_images/tutorials_02_pyramids_31_1.png
_images/tutorials_02_pyramids_31_2.png

ALIASING in the Gaussian and Laplacian pyramids:

Unless one is careful, the subsampling operations will introduce aliasing artifacts in these pyramid transforms. This is true even though the Laplacian pyramid can be used to reconstruct the original image perfectly. When reconstructing, the pyramid is designed in such a way that these aliasing artifacts cancel out. So it’s not a problem if the only thing we want to do is reconstruct. However, it can be a serious problem if we intend to process each of the subbands independently.

One way to see the consequences of the aliasing artifacts is by examining variations that occur when the input is shifted. We choose an image and shift it by some number of pixels. Then blur (filter-downsample-upsample-filter) the original image and blur the shifted image. If there’s no aliasing, then the blur and shift operations should commute (i.e., shift-filter-downsample-upsample-filter is the same as filter-downsample-upsample-filter-shift). Try this for 2 different filters (by replacing ‘binom3’ with ‘binom5’ or ‘binom7’ below), and you’ll see that the aliasing is much worse for the 3 tap filter.

[23]:
sig = 100 * np.random.randn(1, 16)
sh = (0,7)   # shift amount
lev = 2      # level of pyramid to look at
#filter to use:
flt = pt.named_filter('binom3')
# flt = pt.named_filter('binom5')
# flt = pt.named_filter('binom7')

shiftIm = np.roll(sig,sh)
pyr = pt.pyramids.LaplacianPyramid(shiftIm, lev, flt, flt, 'circular')
shiftBlur = pyr.recon_pyr(flt, 'circular', lev-1)

pyr2 = pt.pyramids.LaplacianPyramid(sig, lev, flt, flt, 'circular')
res = pyr2.recon_pyr(flt, 'circular', lev-1)
blurShift = np.roll(res,sh)

plt.figure()
plt.plot(blurShift.T)
plt.plot(shiftBlur.T)
plt.legend(['blurShift', 'shiftBlur'])
plt.show()

pt.image_compare(blurShift,shiftBlur);
_images/tutorials_02_pyramids_33_0.png
Difference statistics:
  Range: [-37, 56]
  Mean: 0.000000,  Stdev (rmse): 27.396116,  SNR (dB): 4.143998
PROJECTION and BASIS functions:
An invertible, linear transform can be characterized in terms of a set of PROJECTION and BASIS functions.
In python matrix notation:
c = np.dot(P.T, x)
x = np.dot(B, c)

where x is an input, c are the transform coefficients, P and B are matrices. The columns of \(P^T\) are the projection functions (the input is projected onto the the columns of P to get each successive transform coefficient). The columns of B are the basis functions (x is a linear combination of the columns of B).

Since the Laplacian pyramid is a linear transform, we can ask: what are its BASIS functions? We consider these in one dimension for simplicity. The BASIS function corresponding to a given coefficient tells us how much that coefficient contributes to each pixel in the reconstructed image. We can construct a single basis function by setting one sample of one subband equal to 1.0 (and all others to zero) and reconstructing. To build the entire matrix, we have to do this for every sample of every subband:

[24]:
sz = 36
sig = np.zeros((1,sz))
pyr = pt.pyramids.LaplacianPyramid(sig,)

# find size of basis
Nelements = 0
for n in pyr.pyr_size.values():
    Nelements += np.prod(n)

basis = np.zeros((sz, Nelements))
ctr = 0

for k in pyr.pyr_coeffs.keys():
    for i in range(pyr.pyr_size[k][1]):
        pyr.pyr_coeffs[k][0, i] = 1
        basis[:,ctr] = pyr.recon_pyr().flatten()
        ctr += 1
        pyr.pyr_coeffs[k][0, i] = 0
pt.imshow(basis, 'auto', 4);
_images/tutorials_02_pyramids_36_0.png

So the first pixel of the reconstructed image, out of the 36 pixels, receives constributions from the first, 37th, 38th, 55th, 56th, 64th and 65th dimensions of the pyramid representation.

The columns of the basis matrix are the basis functions. The matrix is short and fat, corresponding to the fact that the representation is OVERCOMPLETE. Below, we plot the middle one from each subband, starting with the finest scale. Note that all of these basis functions are lowpass (Gaussian-like) functions.

[25]:
locations = pt.matlab_round(sz * (2 - 3./2.**np.array(range(1,len(pyr.pyr_size)+1))))
# locations, Nelements

for lev in range(len(locations)):
    plt.subplot(2,2,lev+1)
    plt.plot(basis[:, int(locations[lev])])
    plt.axis([0, sz, 0, 1.1])
plt.show()
_images/tutorials_02_pyramids_39_0.png

Now, we’d also like see the inverse (we’ll call them PROJECTION) functions. We need to ask how much of each sample of the input image contributes to a given pyramid coefficient. Thus, the matrix is constructed by building pyramids on the set of images with impulses at each possible location. The rows of this matrix are the projection functions.

[26]:
projection = np.zeros((Nelements,sz))
for pos in range(sz):
    pyr = pt.pyramids.LaplacianPyramid(pt.synthetic_images.impulse((1, sz), (0, pos)).T)
    proj = np.array([]).reshape((1, 0))
    for v in pyr.pyr_coeffs.values():
        proj = np.concatenate((proj, v.T), axis=1)
    projection[:,pos] = proj
pt.imshow(projection, 'auto', 4);
_images/tutorials_02_pyramids_41_0.png
[27]:
# Looking at successive rows of this matrix, we can visualize the subsampling operation.

plt.figure(figsize=(10,3))
# Non-zero values are shifted by 1 position.
plt.subplot(1,3,1)
plt.plot(projection[18,:])
plt.plot(projection[19,:])
# Non-zero values are shifted by 2 positions, rather than 1.
plt.subplot(1,3,2)
plt.plot(projection[45,:])
plt.plot(projection[46,:])
# Non-zero values are shifted by 4 positions, rather than 1.
plt.subplot(1,3,3)
plt.plot(projection[58,:])
plt.plot(projection[59,:])
plt.show()
_images/tutorials_02_pyramids_42_0.png

Building a pyramid corresponds to multiplication by the projection matrix. Reconstructing from this pyramid corresponds to multiplication by the basis matrix. Thus, the product of the two matrices (in this order) should be the identity matrix:

[28]:
pt.imshow(np.dot(basis, projection), 'auto', 4);
_images/tutorials_02_pyramids_44_0.png

We can plot a few example projection functions at different scales. Note that all of the projection functions are bandpass functions, except for the coarsest subband which is lowpass.

[29]:
for lev in range(locations.shape[0]):
    plt.subplot(2,2,lev+1)

    plt.plot(projection[int(locations[lev])])
    print('mean: ' , np.round(np.mean(projection[int(locations[lev])]),4))
    plt.axis([0, sz, -.3, 0.8])
plt.show()
mean:  -0.0
mean:  -0.0
mean:  -0.0
mean:  0.0786
_images/tutorials_02_pyramids_46_1.png

Now consider the frequency response of these functions, plotted over the range [-pi,pi]:

[30]:
for lev in range(len(locations)):
    plt.subplot(2,2,lev+1)
    proj = projection[int(locations[lev])]
    plt.plot(np.pi*np.array(range(-32,32))/32, np.fft.fftshift(np.abs(np.fft.fft(proj.T,64))))
    plt.axis([-np.pi, np.pi, -0.1, 3])
plt.show()
_images/tutorials_02_pyramids_48_0.png

The first projection function is highpass, and the second is bandpass. Both of these look something like the Laplacian (2nd derivative) of a Gaussian. The last is lowpass, as are the basis functions. Thus, the basic operation used to create each level of the pyramid involves a simple highpass/lowpass split.

QMF/WAVELET PYRAMIDS.

Two things about Laplacian pyramids are a bit unsatisfactory. First, there are more pixels (coefficients) in the representation than in the original image. Specifically, the 1-dimensional transform is overcomplete by a factor of 4/3, and the 2-dimensional transform is overcomplete by a factor of 2. Secondly, the “bandpass” images (fineN) do not segregate information according to orientation.

There are other varieties of pyramid. One type that arose in the speech coding community is based on a particular pairs of filters known as a “Quadrature Mirror Filters” or QMFs. These are closely related to Wavelets (essentially, they are approximate wavelet filters).

Recall that the Laplacian pyramid is formed by simple hi/low splitting at each level. The lowpass band is subsampled by a factor of 2, but the highpass band is NOT subsampled. In the QMF pyramid, we apply two filters (hi- and lo- pass) and subsample BOTH by a factor of 2, thus eliminating the excess coefficients of the Laplacian pyramid.

The two filters must have a specific relationship to each other. In particular, let n be an index for the filter samples. The highpass filter may be constructed from the lowpass filter by

  1. modulating (multiplying) by (-1)^n (equivalent to shifting by pi in the Fourier domain)

  2. flipping (i.e., reversing the order of the taps)

  3. spatially shifting by one sample.

Try to convince yourself that the resulting filters will always be orthogonal to each other (i.e., their inner products will be zero) when shifted by any multiple of two.

The function modulateFlip performs the first two of these operations. The third (spatial shifting) step is built into the convolution code.

[31]:
flo = pt.named_filter('qmf9').T
# modulateFlip is a hidden method of the WaveletPyramid class
fhi = pt.pyramids.WaveletPyramid._modulate_flip(flo)
x = np.arange(9)

plt.subplot(2,1,1)
plt.stem(x, flo.T)
plt.axis([-1, 9, -0.5, 1.0])
plt.gca().set_title('lowpass')
plt.subplot(2,1,2)
plt.stem(x, fhi.T)
plt.axis([-1, 9, -0.5, 1.0])
plt.gca().set_title('highpass')

plt.tight_layout()

_images/tutorials_02_pyramids_52_0.png

In the Fourier domain, these filters are (approximately) “power-complementary”: the sum of their squared power spectra is (approximately) a constant. But note that neither is a perfect bandlimiter (i.e., a sinc function), and thus subsampling by a factor of 2 will cause aliasing in each of the subbands. See below for a discussion of the effect of this aliasing.

[32]:
# Plot the two frequency responses:
freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft(flo,64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft(fhi,64)))[0]

plt.subplot(2,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('FFT magnitudes')
plt.subplot(2,1,2)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared magnitudes')
plt.tight_layout()
_images/tutorials_02_pyramids_54_0.png
[33]:
# We can split an input signal into two bands as follows:
sig = pt.synthetic_images.pink_noise((1,64), 1.6)
plt.subplot(2,1,1)
plt.plot(sig.T)
plt.gca().set_title('signal')
plt.subplot(2,1,2)
lo1 = pt.corrDn(image = sig, filt = flo, step = (1, 2)) #subsampling by a factor of 2 (step=2)
plt.plot(lo1.T)
hi1 = pt.corrDn(image = sig, filt = fhi, step = (1, 2), start = (0, 1)) #subsampling by a factor of 2 (step=2)
plt.plot(hi1.T)
plt.legend(['low band','high band'])
plt.tight_layout()
_images/tutorials_02_pyramids_55_0.png
Notice that the two subbands are half the size of the original image, due to the subsampling by a factor of 2.
One subtle point: the highpass and lowpass bands are subsampled on different lattices: the lowpass band retains the odd-numbered samples and the highpass band retains the even-numbered samples. This was the 1-sample shift relating the high and lowpass kernels (mentioned above).
We’ve used the ‘reflect1’ to handle boundaries, which works properly for symmetric odd-length QMFs.
[34]:
# We can reconstruct the original image by interpolating these two subbands
# USING THE SAME FILTERS:
reconlo = pt.upConv(image = lo1, filt = flo, step = (1, 2))
reconhi = pt.upConv(image = hi1, filt = fhi, step = (1, 2), start = (0, 1))
plt.plot(sig.T)
plt.plot((reconlo+reconhi).T, '--')
plt.show()

pt.image_compare(sig, (reconlo+reconhi));
_images/tutorials_02_pyramids_57_0.png
Difference statistics:
  Range: [0, 0]
  Mean: 0.000218,  Stdev (rmse): 0.000901,  SNR (dB): 60.907085

We have described an INVERTIBLE linear transform that maps an input image to the two images lo1 and hi1. The inverse transformation maps these two images to the result. This is depicted graphically with a system diagram:

IM ---> flo/down2 --> LO1 --> up2/flo --> add --> RECON
    |                                      ^
    |                                     |
    |                                     |
     -> fhi/down2 --> HI1 --> up2/fhi -----

Note that the number of samples in the representation (i.e., total samples in LO1 and HI1) is equal to the number of samples in the original IM. Thus, this representation is exactly COMPLETE, or “critically sampled”.

So we’ve fixed one of the problems that we had with Laplacian pyramid. But the system diagram above places strong constraints on the filters. In particular, for these filters the reconstruction is no longer perfect. Turns out there are NO perfect-reconstruction symmetric filters that are power-complementary, except for the trivial case [1] and the nearly-trivial case [1 1]/sqrt(2).

Let’s consider the projection functions of this 2-band splitting operation. We can construct these by applying the transform to impulse input signals, for all possible impulse locations. The rows of the following matrix are the projection functions for each coefficient in the transform.

[35]:
M = np.concatenate((pt.corrDn(image = np.identity(32), filt = flo, edge_type = 'circular', step = (2, 1)),
                    pt.corrDn(image = np.identity(32), filt = fhi, edge_type = 'circular', step = (2, 1),
                               start = (1, 0))))
pt.imshow(M, vrange='auto', zoom=5, title="M");
_images/tutorials_02_pyramids_60_0.png

The transform matrix is composed of two sub-matrices. The top half contains the lowpass kernel, shifted by increments of 2 samples. The bottom half contains the highpass. Now we compute the inverse of this matrix:

[36]:
M_inv = np.linalg.inv(M)
pt.imshow(M_inv, vrange='auto', zoom=5, title="M_inv");
_images/tutorials_02_pyramids_62_0.png
[37]:
# The inverse is (very close to) the transpose of the original
# matrix!  In other words, the transform is orthonormal.

pt.image_compare(M_inv.T,M);

pt.imshow(np.dot(M_inv,M), zoom=5);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000004,  Stdev (rmse): 0.000263,  SNR (dB): 56.495757
_images/tutorials_02_pyramids_63_1.png
This also points out a nice relationship between the corrDn and upConv functions, and the matrix representation:
- corrDn is equivalent to multiplication by a matrix with copies of the filter on the ROWS, translated in multiples of the downsampling factor. - upConv is equivalent to multiplication by a matrix with copies of the filter on the COLUMNS, translated by the upsampling factor.

Multiplying the coefficient vector, which consists of concatenation of low pass and high pass coefficients, with the inverse (=transpose) matrix is the same as upsampling and convolving (i.e. UpConv) with the same decomposition filters. Indeed each rows of the inverse matrix is a subsampled shifted copy of low and high pass filters. When this is multiplied by the vector of coefficients, it is equivalent to upsampling the vector of coeffs (by inserting zeros) and convolving the first half (low pass) with low pass filter and the second half with the high pass filter and then sum them up. This makes sense, because in order to go back to the original signal, we need a weighted sum of the low pass and high pass coeffs.

As in the Laplacian pyramid, we can recursively apply this QMF band-splitting operation to the lowpass band:

[38]:
lo2 = pt.corrDn(image = lo1, filt = flo, step = (1, 2))
hi2 = pt.corrDn(image = lo1, filt = fhi, step = (1, 2), start = (0, 1))

The representation of the original signal is now comprised of the three subbands {hi1, hi2, lo2} (we don’t hold onto lo1, because it can be reconstructed from lo2 and hi2). Note that hi1 is at 1/2 resolution, and hi2 and lo2 are at 1/4 resolution: The total number of samples in these three subbands is thus equal to the number of samples in the original signal.

[39]:
plt.figure(figsize = (5,5))
plt.subplot(4,1,1)
plt.plot(hi1.T)
plt.gca().set_title('hi1')
plt.axis([1, len(hi1[0]), 1.1*hi1.min(), 1.1*hi1.max()])
plt.subplot(4,1,2)
plt.plot(hi2.T)
plt.gca().set_title('hi2')
plt.axis([1, hi2.shape[1], 1.1*hi2.min(), 1.1*hi2.max()])
plt.subplot(4,1,3)
plt.plot(lo2.T)
plt.gca().set_title('lo2')
plt.axis([1, lo2.shape[1], 1.1*lo2.min(), 1.1*lo2.max()])
plt.tight_layout()
plt.subplot(4,1,4)
plt.plot(sig.T, 'k')
plt.gca().set_title('signal')
plt.axis([1, sig.shape[1], 1.1*sig.min(), 1.1*sig.max()])
plt.tight_layout()
_images/tutorials_02_pyramids_69_0.png

Reconstruction proceeds as with the Laplacian pyramid: combine lo2 and hi2 to reconstruct lo1, which is then combined with hi1 to reconstruct the original signal:

[40]:
recon_lo1 = ( pt.upConv(image = hi2, filt = fhi, step = (1, 2), start = (0, 1)) +
              pt.upConv(image = lo2, filt = flo, step = (1, 2), start = (0, 0)) )
reconstructed = ( pt.upConv(image = hi1, filt = fhi, step = (1, 2), start = (0, 1)) +
                  pt.upConv(image = recon_lo1, filt = flo, step = (1, 2), start = (0, 0)) )
pt.image_compare(sig,reconstructed);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000337,  Stdev (rmse): 0.001278,  SNR (dB): 57.867947
FUNCTIONS for CONSTRUCTING/MANIPULATING QMF/Wavelet PYRAMIDS

To make things easier, we have bundled these qmf operations and data structures into an object in Python.

[41]:
# # If we feel like trying on another signal
# sig = pt.synthetic_images.pink_noise((1, 64), 1.5)
# plt.figure()
# plt.plot(sig.T)
# plt.show()
[42]:
pyr = pt.pyramids.WaveletPyramid(sig)
fig = pt.pyrshow(pyr.pyr_coeffs)
_images/tutorials_02_pyramids_74_0.png
[43]:
# a different plot of the same thing
for i, (k, v) in enumerate(pyr.pyr_coeffs.items()):
    plt.subplot(pyr.num_scales+1, 1, i+1)
    x = np.array(range(pyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, pyr.pyr_size[k][1], 1.5*min(min(v)), 1.5*max(max(v))])
plt.tight_layout()
_images/tutorials_02_pyramids_75_0.png
[44]:
res = pyr.recon_pyr()
pt.image_compare(sig, res);
Difference statistics:
  Range: [0, 0]
  Mean: 0.000573,  Stdev (rmse): 0.001662,  SNR (dB): 55.586728

Now for 2D, we use separable filters. There are 4 ways to apply the two filters to the input image (followed by the relavent subsampling operation):

  1. lowpass in both x and y

  2. lowpass in x and highpass in y

  3. lowpass in y and highpass in x

  4. highpass in both x and y.

The pyramid is built by recursively subdividing the first of these bands into four new subbands.

[45]:
# First, we'll take a look at some of the basis functions.
sz = 40
zim = np.zeros((sz,sz))
flo = 'qmf9'
edge_type = 'reflect1'
pyr = pt.pyramids.WaveletPyramid(zim)
[46]:
### Put an  impulse into the middle of each band:
for k, v in pyr.pyr_size.items():
    mid = (v[0]//2, v[1]//2)
#     print(lev, mid)
    pyr.pyr_coeffs[k][mid] = 1
[47]:
# And take a look at the reconstruction of each band:
reconList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        reconList.append(pyr.recon_pyr(flo, edge_type, k[0], k[1]))
    else:
        reconList.append(pyr.recon_pyr(flo, edge_type, k))
pt.imshow(reconList, vrange='indep1', zoom=4, col_wrap=3);
_images/tutorials_02_pyramids_80_0.png

Note that the first column contains horizontally oriented basis functions at different scales. The second contains vertically oriented basis functions. The third contains both diagonals (a checkerboard pattern). The bottom row shows a lowpass basis function.

Now look at the corresponding Fourier transform magnitudes (these are plotted over the frequency range [-pi, pi] ):

[48]:
freq = 2 * np.pi * np.array(range(-sz//2,(sz//2)))/sz
imgList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        basisFn = pyr.recon_pyr(flo, edge_type, k[0], k[1])
    else:
        basisFn = pyr.recon_pyr(flo, edge_type, k)
    basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn,(sz,sz))))
    imgList.append(basisFmag)

pt.imshow(imgList, vrange='indep1', zoom=4, col_wrap=3);
_images/tutorials_02_pyramids_82_0.png
[49]:
# The filters at a given scale sum to a squarish annular region:
sumSpectra = np.zeros((sz,sz))
lnum = 2;
for bnum in range(3):
    basisFn = pyr.recon_pyr(flo, edge_type, [lnum], [bnum])
    basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn,(sz,sz))))
    sumSpectra = basisFmag**2 + sumSpectra;
pt.imshow(sumSpectra, vrange='auto', zoom=4, title='one scale');
_images/tutorials_02_pyramids_83_0.png
[50]:
# Now decompose an image:
pyr = pt.pyramids.WaveletPyramid(im/255)
[51]:
# View all of the subbands (except lowpass), scaled to be the same size
# (requires a big figure window):
pt.pyrshow(pyr.pyr_coeffs, zoom=1);
_images/tutorials_02_pyramids_85_0.png
[52]:
# The recon_pyr function can be used to reconstruct the entire pyramid:
reconstructed = pyr.recon_pyr()
pt.imshow([im,reconstructed]);
pt.image_compare(im/255,reconstructed);
Difference statistics:
  Range: [0, 0]
  Mean: -0.000763,  Stdev (rmse): 0.000334,  SNR (dB): 53.416318
_images/tutorials_02_pyramids_86_1.png
[53]:
# As with Laplacian pyramids, you can specify sub-levels and subbands
# to be included in the reconstruction.  For example:
pt.imshow(pyr.recon_pyr('qmf9', 'reflect1', levels='all', bands=[0]))  # Horizontal only, all scales
pt.imshow(pyr.recon_pyr('qmf9', 'reflect1', [1,2]));  # two middle scales
_images/tutorials_02_pyramids_87_0.png
_images/tutorials_02_pyramids_87_1.png
PERFECT RECONSTRUCTION: HAAR AND DEBAUCHIES WAVELETS

The symmetric QMF filters used above are not perfectly orthogonal. In fact, it’s impossible to construct a symmetric filter of size greater than 2 that is perfectly orthogonal to shifted copies (shifted by multiples of 2) of itself. For example, consider a symmetric kernel of length 3. Shift by two and the right end of the original kernel is aligned with the left end of the shifted one. Thus, the inner product of these two will be the square of the end tap, which will be non-zero.

However, one can easily create wavelet filters of length 2 that will do the job. This is the oldest known wavelet, known as the “Haar”. The two kernels are [1,1]/sqrt(2) and [1,-1]/sqrt(2). These are trivially seen to be orthogonal to each other, and shifts by multiples of two are also trivially orthogonal. The projection functions of the Haar transform are in the rows of the following matrix, constructed by applying the transform to impulse input signals, for all possible impulse locations:

[54]:
haarLo = pt.named_filter('haar')
haarHi = pt.pyramids.WaveletPyramid._modulate_flip(haarLo)
x = np.array(range(len(haarLo)))
plt.subplot(2,1,1)
plt.stem(x,haarLo)
plt.axis([-0.2, 1.2, -1, 1])
plt.gca().set_title('lowpass')
plt.subplot(2,1,2)
plt.stem(x, haarHi)
plt.axis([-0.2, 1.2, -1, 1])
plt.gca().set_title('highpass')
plt.tight_layout()
_images/tutorials_02_pyramids_89_0.png
[55]:
M = np.concatenate((pt.corrDn(np.identity(32), haarLo, step=(1, 2)),
                    pt.corrDn(np.identity(32), haarHi.T, step=(1, 2), start=(0, 1))), axis=1).T
pt.imshow(M, 'auto', 5)
pt.imshow(np.dot(M, M.T), 'auto', 5);   # identity
_images/tutorials_02_pyramids_90_0.png
_images/tutorials_02_pyramids_90_1.png
[56]:
# As before, the filters are power-complementary (although the
# frequency isolation is rather poor, and thus the subbands will be
# heavily aliased):
# And the sum of the power spectra is again flat

freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft( haarLo.T, 64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft( haarHi.T, 64)))[0]

plt.subplot(3,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('Fourier magnitudes')
plt.subplot(3,1,2)
plt.plot(freq,flo_sp**2,'--')
plt.plot(freq,fhi_sp**2,'-')
plt.gca().set_title('squared Fourier magnitudes')
plt.subplot(3,1,3)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared Fourier magnitudes')
plt.tight_layout()
_images/tutorials_02_pyramids_91_0.png
[57]:
sig = pt.synthetic_images.pink_noise((1,64),0.5)
pyr = pt.pyramids.WaveletPyramid(sig, 4, 'haar', 'reflect1')

pt.pyrshow(pyr.pyr_coeffs);
_images/tutorials_02_pyramids_92_0.png
[58]:
# check perfect reconstruction:
res = pyr.recon_pyr('haar', 'reflect1')
pt.image_compare(sig,res);

plt.plot(sig.T)
plt.plot(res.T, '--')
plt.show()
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 302.340511
_images/tutorials_02_pyramids_93_1.png

If you want perfect reconstruction, but don’t like the Haar transform, there’s another option: drop the symmetry requirement. Ingrid Daubechies developed one of the earliest sets of such perfect-reconstruction wavelets. The simplest of these is of length 4:

[59]:
daub_lo = pt.named_filter('daub2')
daub_hi = pt.pyramids.WaveletPyramid._modulate_flip(daub_lo)
# The daub_lo filter is constructed to be orthogonal to 2 shifted
# copy of itself.  For example:
print (np.dot(np.concatenate((daub_lo,[[0],[0]])).T, np.concatenate(([[0],[0]],daub_lo))))

M = np.concatenate((pt.corrDn(image = np.identity(32), filt = daub_lo, edge_type = 'circular', step = (2, 1), start = (1, 0)),
                    pt.corrDn(image = np.identity(32), filt = daub_hi, edge_type = 'circular', step = (2, 1), start = (1, 0))))
pt.imshow(M, 'auto', 5)
pt.imshow(np.dot(M, M.T), 'auto', 5);
[[2.90920066e-13]]
_images/tutorials_02_pyramids_95_1.png
_images/tutorials_02_pyramids_95_2.png
[60]:
plt.stem(M[10])
[60]:
<StemContainer object of 3 artists>
_images/tutorials_02_pyramids_96_1.png
[61]:
# Again, they're power complementary and sum of the power spectra is flat

freq = np.pi*np.array(range(-32,32))/32
flo_sp = np.fft.fftshift(np.abs(np.fft.fft(daub_lo.T,64)))[0]
fhi_sp = np.fft.fftshift(np.abs(np.fft.fft(daub_hi.T,64)))[0]

plt.subplot(3,1,1)
plt.plot(freq, flo_sp, '--',
         freq, fhi_sp,'-')
plt.axis([-np.pi, np.pi, 0, 1.5])
plt.gca().set_title('Fourier magnitudes')
plt.subplot(3,1,2)
plt.plot(freq,flo_sp**2,'--')
plt.plot(freq,fhi_sp**2,'-')
plt.gca().set_title('squared Fourier magnitudes')
plt.subplot(3,1,3)
plt.plot(freq, flo_sp ** 2 + fhi_sp ** 2)
plt.axis([-np.pi, np.pi, 0, 2.2])
plt.gca().set_title('Sum of squared Fourier magnitudes')
plt.tight_layout()
_images/tutorials_02_pyramids_97_0.png
[62]:
# Make a pyramid using the same code as before (except that we can't
# use reflected boundaries with asymmetric filters):
pyr = pt.pyramids.WaveletPyramid(sig, filter_name=daub_lo, edge_type='circular')

pt.pyrshow(pyr.pyr_coeffs, zoom=2);
_images/tutorials_02_pyramids_98_0.png
[63]:
res = pyr.recon_pyr(daub_lo,'circular')
pt.image_compare(sig,res);

plt.plot(sig.T)
plt.plot(res.T, '--')
plt.show()
Difference statistics:
  Range: [0, 0]
  Mean: -0.000000,  Stdev (rmse): 0.000000,  SNR (dB): 227.146682
_images/tutorials_02_pyramids_99_1.png
ALIASING IN WAVELET TRANSFORMS

aka. what’s wrong with wavelets

All of these orthonormal pyramid/wavelet transforms have a lot of aliasing in the subbands. You can see that in the frequency response plots since the frequency response of each filter covers well more than half the frequency domain. The aliasing can have serious consequences…

[64]:
# Get one of the basis functions of the 2D Daubechies wavelet transform:
filt = pt.named_filter('daub4')
# filt = pt.named_filter('qmf13')
pyr = pt.pyramids.WaveletPyramid(np.zeros((1,64)),3,filt,'circular')
lev = 2;
mid = (0, int(np.ceil(pyr.pyr_size[(lev, 0)][1]/2.0)))
pyr.pyr_coeffs[(lev, 0)][mid] = 1
sig = pyr.recon_pyr(filt,'circular')
x = np.arange(64).reshape(64,1)
plt.stem(x,sig.T)
plt.axis([-2, 66, -0.65, 0.65])
plt.show()
_images/tutorials_02_pyramids_101_0.png
[65]:
# Since the basis functions are orthonormal, building a pyramid using this
# input will yield a single non-zero coefficient.
pyr = pt.pyramids.WaveletPyramid(sig, 3, filt, 'circular')

for i, (k, v) in enumerate(pyr.pyr_coeffs.items()):
    plt.subplot(pyr.num_scales+1, 1, i+1)
    x = np.array(range(pyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, pyr.pyr_size[k][1], -.3, 1.3])
plt.tight_layout()
_images/tutorials_02_pyramids_102_0.png
[66]:
# Now shift the input by one sample and re-build the pyramid.
shifted_sig = np.roll(sig,1)
spyr = pt.pyramids.WaveletPyramid(shifted_sig, 3, filt, 'circular')

# Plot each band of the unshifted and shifted decomposition
for i, (k, v) in enumerate(spyr.pyr_coeffs.items()):
    plt.subplot(spyr.num_scales+1, 1, i+1)
    x = np.array(range(spyr.pyr_size[k][1]))
    y = v.flatten()
    plt.stem(x, y)
    plt.axis([-1, spyr.pyr_size[k][1], -.3, 1.3])
plt.tight_layout()
_images/tutorials_02_pyramids_103_0.png

In the third band, we expected the coefficients to move around because the signal was shifted. But notice that in the original signal decomposition, the other bands were filled with zeros. After the shift, they have significant content. Although these subbands are supposed to represent information at different scales, their content also depends on the relative POSITION of the input signal.

This problem is not unique to the Daubechies transform. The same is true for the QMF transform. Try it… In fact, the same kind of problem occurs for almost any orthogonal pyramid transform (the only exception is the limiting case in which the filter is a sinc function).

Orthogonal pyramid transforms are not shift-invariant. Although orthogonality may be an important property for some applications (e.g., data compression), orthogonal pyramid transforms are generally not so good for image analysis.

The overcompleteness of the Laplacian pyramid turns out to be a good thing in the end. By using an overcomplete representation (and by choosing the filters properly to avoid aliasing as much as possible), you end up with a representation that is useful for image analysis.

[ ]:

[1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pyrtools as pt
%load_ext autoreload
%autoreload 2

Steerable pyramids

The steerable pyramid transform given below is described in: E P Simoncelli and W T Freeman.
The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation.
IEEE Second Int’l Conf on Image Processing.
Washington DC, October 1995.
Online access:

The steerable pyramid is a multi-scale representation that is translation-invariant, but that also includes representation of orientation. Furthermore, the representation of orientation is designed to be rotation-invariant. The basis/projection functions are oriented (steerable) filters, localized in space and frequency. It is overcomplete to avoid aliasing. And it is “self-inverting” (like the QMF/Wavelet transform): the projection functions and basis functions are identical. The mathematical phrase for a transform obeying this property is “tight frame”.

The system diagram for the steerable pyramid (described in the reference given below) is as follows:

 IM ---> fhi0 -----------------> H0 ---------------- fhi0 ---> RESULT
     |                                                     |
     |                                                     |
     |-> flo0 ---> fl1/down2 --> L1 --> up2/fl1 ---> flo0 -|
               |                                 |
               |----> fb0 -----> B0 ----> fb0 ---|
               |                                 |
               |----> fb1 -----> B1 ----> fb1 ---|
               .                                 .
               .                                 .
               |----> fbK -----> BK ----> fbK ---|

The filters {fhi0,flo0} are used to initially split the image into a highpass residual band H0 and a lowpass subband. This lowpass band is then split into a low(er)pass band L1 and K+1 oriented subbands {B0,B1,…,BK}. The representatation is substantially overcomplete. The pyramid is built by recursively splitting the lowpass band (L1) using the inner portion of the diagram (i.e., using the filters {fl1,fb0,fb1,…,fbK}). The resulting transform is overcomplete by a factor of 4k/3.

The scale tuning of the filters is constrained by the recursive system diagram. The orientation tuning is constrained by requiring the property of steerability. A set of filters form a steerable basis if they 1) are rotated copies of each other, and 2) a copy of the filter at any orientation may be computed as a linear combination of the basis filters. The simplest examples of steerable filters is a set of N+1 Nth-order directional derivatives.

Spatial Domain construction

Choose a filter set

options are: ‘sp0_filters’, ‘sp1_filters’, ‘sp3_filters’, ‘sp5_filters’

[2]:
filters = pt.steerable_filters('sp3_filters')

fsz = int(np.round(np.sqrt(filters['bfilts'].shape[0])))
fsz = np.array([fsz, fsz])
nfilts = filters['bfilts'].shape[1]
nrows = int(np.floor(np.sqrt(nfilts)))

import scipy.signal as sps
# Look at the oriented bandpass filters:
filtList = []
for f in range(nfilts):
    filtList.append(sps.convolve2d(filters['bfilts'][:,f].reshape(fsz), filters['lo0filt']))

pt.imshow(filtList, vrange='auto', zoom=3);
_images/tutorials_03_steerable_pyramids_4_0.png
[3]:
# Try "steering" to a new orientation (new_ori in degrees):
new_ori = 22
# new_ori = 360*np.random.rand(1)[0]

pt.imshow( sps.convolve2d( pt.pyramids.steer(filters['bfilts'],
                                      new_ori * np.pi/180).reshape(fsz),
                           filters['lo0filt']), 'auto', zoom=3);
_images/tutorials_03_steerable_pyramids_5_0.png
[4]:
# Look at Fourier transform magnitudes:
lo0filt = filters['lo0filt']
bfilts = filters['bfilts']
lo0 = np.fft.fftshift(np.abs(np.fft.fft2(filters['lo0filt'], (64,64)))) # zero padding
fsum = np.zeros(lo0.shape)
imgList = []
for f in range(bfilts.shape[1]):
    flt = bfilts[:,f].reshape(fsz)
    freq = lo0 * np.fft.fftshift(np.abs(np.fft.fft2(flt, (64,64))))
    fsum += freq**2
    imgList.append(freq)

pt.imshow(imgList, 'auto', zoom=3, col_wrap= 4);
_images/tutorials_03_steerable_pyramids_6_0.png
[5]:
# The filters sum to a smooth annular ring:
pt.imshow(fsum, 'auto', 3);
_images/tutorials_03_steerable_pyramids_7_0.png
Visualizing steerable pyramid coefficients of an image
[6]:
im = plt.imread('../DATA/Curie.pgm').astype(float)
# im = pt.blurDn(im, 1, 'qmf9')
# pt.imshow(im);

filt = 'sp3_filters' # There are 4 orientations for this filter
pyr = pt.pyramids.SteerablePyramidSpace(im, height=4, order=3)
[7]:
# Look at first (vertical) bands, different scales:
imgList = []
for s in range(pyr.num_scales):
    band = pyr.pyr_coeffs[(s,0)]
    imgList.append(band)
pt.imshow(imgList, col_wrap=4);
_images/tutorials_03_steerable_pyramids_10_0.png
[8]:
# look at all orientation bands at one level (scale):
imgList = []
for b in range(pyr.num_orientations):
    band = pyr.pyr_coeffs[(1,b)]
    imgList.append(band)

pt.imshow(imgList, zoom=2, col_wrap=4);
_images/tutorials_03_steerable_pyramids_11_0.png
[9]:
# look at one level and band (ie. scale and orientation):
pt.imshow(pyr.pyr_coeffs[(3,1)], 'auto', zoom=4);
_images/tutorials_03_steerable_pyramids_12_0.png
[10]:
# To access the high-pass and low-pass bands:
low = pyr.pyr_coeffs['residual_lowpass']
high = pyr.pyr_coeffs['residual_highpass']
pt.imshow([low, high]);
_images/tutorials_03_steerable_pyramids_13_0.png
[11]:
# Display the whole pyramid (except for the highpass residual band),
# Note that images are shown at same size for ease of visulization:
pt.pyrshow(pyr.pyr_coeffs);
_images/tutorials_03_steerable_pyramids_14_0.png
Steering

Spin a level of the pyramid, interpolating (steering to) intermediate orienations

[12]:
s = 2 # pick a scale
lev = np.array([pyr.pyr_coeffs[(s, i)] for i in range(pyr.num_orientations)])
n = lev[0].shape[0] * lev[0].shape[1]
# create a matrix containing outputs of the pyramid in long columns
lev2 = np.concatenate((lev[0].reshape((n,1)),
                       lev[1].reshape((n,1)),
                       lev[2].reshape((n,1)),
                       lev[3].reshape((n,1))),
                      axis=1)
# print(lev.shape, lev2.shape)

filters = pt.steerable_filters(filt)

k = 32
M = np.empty((k, lev.shape[1], lev.shape[2]))

for frame in range(k):
    steered_im = pt.pyramids.steer(lev2, 2 * np.pi*frame/k,
                                   filters['harmonics'],
                                   filters['mtx']).reshape(lev.shape[-2:])
    M[frame] = steered_im
[13]:
# NOTE: to show the animation properly, make sure ffmpeg is installed. For anaconda users do:
# conda install -c conda-forge ffmpeg

pt.animshow(M, framerate=6, zoom=4, repeat=True)
[13]:
Reconstruction

Note that the filters are not perfect, although they are good enough for most applications.

[14]:
res = pyr.recon_pyr()
pt.imshow([im, res, im - res]); # note the different range of the error subplot
pt.image_compare(im, res);
Difference statistics:
  Range: [-113, 22]
  Mean: -0.770771,  Stdev (rmse): 7.192672,  SNR (dB): 21.011911
_images/tutorials_03_steerable_pyramids_19_1.png

As with previous pyramids, you can select subsets of the levels and orientation bands to be included in the reconstruction. For example:

[15]:
# All levels (including highpass and lowpass residuals), one orientation:
pt.imshow(pyr.recon_pyr(levels='all', bands=[0]));
_images/tutorials_03_steerable_pyramids_21_0.png
[16]:
# Without the highpass and lowpass:
pt.imshow(pyr.recon_pyr(levels=range(pyr.num_scales), bands=[0]));
_images/tutorials_03_steerable_pyramids_22_0.png

Frequency domain construction

We also provide an implementation of the Steerable pyramid in the Frequency domain. The advantages are perfect-reconstruction (within floating-point error), and any number of orientation bands. The disadvantages are that it is typically slower, and the boundary handling is always circular.

Impulse response
[17]:
sz = 128
empty_image = np.zeros((sz,sz))
pyr = pt.pyramids.SteerablePyramidFreq(empty_image, height='auto', order=3)

### Put an  impulse into the middle of each band:
for k, v in pyr.pyr_size.items():
    mid = (v[0]//2, v[1]//2)
#     print(lev, mid)
    pyr.pyr_coeffs[k][mid] = 1
# pt.pyrshow(pyr.pyr_coeffs, vrange='indep1');

# And take a look at the reconstruction of each band:
reconList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        reconList.append(pyr.recon_pyr(k[0], k[1]))
for k in ['residual_highpass', 'residual_lowpass']:
    reconList.append(pyr.recon_pyr(k))

pt.imshow(reconList, col_wrap=pyr.num_orientations, vrange='indep1');
_images/tutorials_03_steerable_pyramids_24_0.png
[18]:
# now in the frequency domain
freq = 2 * np.pi * np.array(range(-sz//2,(sz//2)))/sz
imgList = []
for k in pyr.pyr_coeffs.keys():
    if isinstance(k, tuple):
        basisFn = pyr.recon_pyr(k[0], k[1])
        basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn, (sz,sz))))
        imgList.append(basisFmag)

for k in ['residual_highpass', 'residual_lowpass']:
    basisFn = pyr.recon_pyr(k)
    basisFmag = np.fft.fftshift(np.abs(np.fft.fft2(basisFn, (sz,sz))))
    imgList.append(basisFmag)

pt.imshow(imgList, col_wrap=pyr.num_orientations);
_images/tutorials_03_steerable_pyramids_25_0.png
Visualization and reconstruction
[19]:
# 4 levels, 5 orientation bands
pyr = pt.pyramids.SteerablePyramidFreq(im, height=4, order=4)
pt.pyrshow(pyr.pyr_coeffs);
_images/tutorials_03_steerable_pyramids_27_0.png
[20]:
res = pyr.recon_pyr()
pt.image_compare(im,res);  # nearly perfect

pt.imshow([im, res, im-res]); # note the different range of the error subplot
Difference statistics:
  Range: [0, 0]
  Mean: 0.000000,  Stdev (rmse): 0.000334,  SNR (dB): 107.682181
_images/tutorials_03_steerable_pyramids_28_1.png
Steerability in the frequency domain

Just as in the spatial domain, this frequency domain construction of the pyramid can be steered to other orientations than those used when constructing the pyramid.

We use again the steer_coeffs method, and visualize it’s effect both in the spatial and frequency domains.

[21]:
num_orientations_steered = 120
steered_coeffs, steering_vectors = pyr.steer_coeffs([i*np.pi/180
                                            for i in np.linspace(-180, 180, num_orientations_steered)])

steered_coeffs_dft = {}
for k, v in steered_coeffs.items():
    steered_coeffs_dft[k] = np.fft.fftshift(np.abs(np.fft.fft2(v,
                                                s=pyr.pyr_coeffs[(k[0], 0)].shape)))
[22]:
pt.animshow(np.array([steered_coeffs[(0, i)] for i in range(num_orientations_steered)]),
                   framerate=12, repeat=True)
[22]:
[23]:
pt.animshow(np.array([steered_coeffs_dft[(0, i)] for i in range(num_orientations_steered)]),
            framerate=12, repeat=True)
[23]:
[24]:
pt.animshow(np.array([steered_coeffs[(1, i)] for i in range(num_orientations_steered)]),
            framerate=12, zoom=2, repeat=True)
[24]:
[25]:
pt.animshow(np.array([steered_coeffs_dft[(1, i)] for i in range(num_orientations_steered)]),
            framerate=12, zoom=2, repeat=True)
[25]:
[ ]:

[26]:
# the steering weighting vectors
[27]:
plt.figure()
for i in range(pyr.num_orientations):
    plt.plot(np.vstack([steering_vectors[(0, j)] for j in range(num_orientations_steered)])[:, i])
plt.show()
_images/tutorials_03_steerable_pyramids_38_0.png
[28]:
# expected harmonics (up to sapling error)
i = np.random.choice(pyr.num_orientations)
sp = np.fft.fftshift(np.fft.fft(np.vstack([steering_vectors[(1, j)] for j in range(num_orientations_steered)])[:, i]))
plt.stem(np.abs(sp)[sp.shape[0] // 2 : sp.shape[0] // 2 + pyr.num_orientations * 2]);
_images/tutorials_03_steerable_pyramids_39_0.png
[ ]:

Complex valued Steerable Pyramid

as described in: A parametric texture model based on joint statistics of complex wavelet coefficients J Portilla and E P Simoncelli. Int’l Journal of Computer Vision, vol.40(1), pp. 49–71, Dec 2000 http://www.cns.nyu.edu/pub/eero/portilla99-reprint.pdf

[29]:
pyr = pt.pyramids.SteerablePyramidFreq(im, order=2, is_complex=True)
# twice as many orientations
# filters organized in quadrature pair
# note that this construction is not steerable
#note need additional argument pyr.is_complex to pyrshow because default is False
pt.pyrshow(pyr.pyr_coeffs, pyr.is_complex);
_images/tutorials_03_steerable_pyramids_42_0.png