Spain 10.08.2022
Constructing the Scattering Transform
Author: Javier Minguillón Sánchez
1 Introduction
As Mallat presents it in [8], the scattering transform appears from the pursuit of a representation of an $L^2(\mathbb{R}^2)$ function such that the representations of two similar functions are close in the representation space, say $\mathcal{H}$.
What do we mean by similar? Well, one relevant sense is comparing images, such as images of handwritten digits. We want the structure of $\mathcal{H}$ to bring together the representations of all handwritten renditions of each single digit 0 through 9, while simultaneously keeping different digits far from each other.
Choosing $\mathcal{H} = L^2(\mathbb{R}^2)$ with the usual norm cannot possibly serve this purpose.
Take a look at Figure 1. The first and second columns represent the digit 1 as a positive piecewise constant function on a compact subset of $\mathbb{R}^2$. In both rows, the third column represents the absolute value of the difference of the first two functions in that row. The norms of the difference functions on the right are comparable to the norm of the top-left function itself: in the $L^2$ sense, two renditions of the same digit are about as far apart as a digit and a blank image.
Figure 1. Digit 1 from MNIST [6]. These are grayscale images: the value 1 corresponds to black pixels and the value 0 to white pixels. There are shades of gray in between.
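The failure of the plain $L^2$ norm can be seen concretely in a minimal numpy sketch. The "digit 1" below is a crude stand-in (a vertical bar, not actual MNIST data), compared against a copy of itself shifted sideways by two pixels.

```python
import numpy as np

# Toy illustration (not actual MNIST data): a crude "digit 1" drawn as a
# vertical bar, compared with the same bar shifted two pixels sideways.
img = np.zeros((28, 28))
img[4:24, 13:15] = 1.0               # the stroke: value 1 = black

shifted = np.roll(img, 2, axis=1)    # a tiny translation of the same digit

l2 = lambda g: np.sqrt(np.sum(g ** 2))
rel_dist = l2(img - shifted) / l2(img)
print(rel_dist)   # sqrt(2): the norm sees the two renditions as far apart
```

Since the two strokes do not overlap at all after the shift, the relative $L^2$ distance is $\sqrt{2}$, the largest value possible between two images of equal norm: by this metric, two nearly identical digits look completely unrelated.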
Here we focus on functions on $\mathbb{R}^2$ because they are a way of modelling (grayscale) pictures. All of the discussion below has a much deeper general side to it that can be seen in [8]. There the scattering transform operator is developed for functions on $\mathbb{R}^d$ and an infinite number of parameters in the scattering transform. This is far from our scope.
The scattering transform here is going to be an operator $S \colon L^2(\mathbb{R}^2) \to \mathcal{H}$, where $\mathcal{H}$ is the space of representations of images. Regarding classification purposes, the idea is that two images $f$ and $g$ have to be sent to representations $Sf$ and $Sg$ that are closer together in the metric space $\mathcal{H}$ whenever the categories of $f$ and $g$ coincide, and comparatively further apart whenever the categories of $f$ and $g$ differ.
Ideally, the scattering transform would be translation invariant, rotation invariant,
continuous to deformations, and would keep high frequency information in order to be able to discriminate functions associated to signals. Examples of such signals are sound waves in the case of $d = 1$ or images in the case of $d = 2$.
We do not discuss any theoretical aspects of the rotation invariant representation that Mallat finds in [8], as they are rather sophisticated and beyond the purpose of our exposition. In practice the rotation invariant operator is not used
(see [1]). If necessary, the dataset is augmented with rotated copies of the original elements.
2 A Few Notations and Definitions
We need to introduce a few notations first, followed by two definitions. From the perspective of image classification, Definition 1 is a way to model deformations of an image using diffeomorphisms. Definition 2 says when a criterion of classification (an operator) is stable under deformations of images.
Let $d \ge 1$ and $1 \le p < \infty$. The space of $p$-integrable complex valued functions on $\mathbb{R}^d$ is $L^p(\mathbb{R}^d)$. The norm
$\|f\|_p = \left( \int_{\mathbb{R}^d} |f(x)|^p \, dx \right)^{1/p}$
denotes the usual $L^p$-norm of $f$.
Given $x \in \mathbb{R}^d$, we denote its Euclidean norm by $|x|$. The usual scalar product of $x, y \in \mathbb{R}^d$ is $x \cdot y$.
The norm of a linear map $A \colon E \to F$, where $E, F$ are normed vector spaces, is
$\|A\| = \sup_{\|x\|_E \le 1} \|Ax\|_F$. The space of bounded linear maps from $E$ to $F$ is denoted by $\mathcal{L}(E, F)$.
The Fourier transform of $f \in L^1(\mathbb{R}^d)$ is
$\hat{f}(\omega) = \int_{\mathbb{R}^d} f(x) \, e^{-i x \cdot \omega} \, dx,$
and the definition extends in the usual way to all of $L^2(\mathbb{R}^d)$ by density. The spectral energy of $f$ on a measurable set $\Omega \subseteq \mathbb{R}^d$ is given by
$\int_{\Omega} |\hat{f}(\omega)|^2 \, d\omega.$
Let $\tau \colon \mathbb{R}^d \to \mathbb{R}^d$ be a $C^2$ function. We denote the supremum of the norms of the differential linear maps by
$\|\nabla \tau\|_\infty = \sup_{x \in \mathbb{R}^d} \|\nabla \tau(x)\|.$
Likewise, the supremum of the norms of the Hessian tensors is denoted by
$\|H \tau\|_\infty = \sup_{x \in \mathbb{R}^d} \|H \tau(x)\|,$
and the supremum of the displacement caused by $\tau$ is
$\|\tau\|_\infty = \sup_{x \in \mathbb{R}^d} |\tau(x)|.$
We now proceed to define the action of a diffeomorphism on a function and the Lipschitz continuity to diffeomorphisms.
Definition 1
Let $\tau \in C^2(\mathbb{R}^d; \mathbb{R}^d)$. The action of $\tau$ on a function $f \colon \mathbb{R}^d \to \mathbb{C}$ is given by
$L_\tau f(x) = f(x - \tau(x)).$
Whenever $\tau = c$ is a constant function, the action is equivalent to the composition with a mere translation. In this case, we denote $L_\tau f$ as
$f_c(x) = f(x - c).$
We also call $L_\tau f$ the deformation of $f$ by the field $\tau$.
This is our way of modelling the tiny variations on an image. If we think of handwritten digits, for example, the digit ‘1’ takes many shapes that are continuous transformations of one another. See for example Figure 2.
Figure 2. A sample of a ‘1’ from MNIST and a deformation of it by a continuous field.
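The action $L_\tau f(x) = f(x - \tau(x))$ can be sketched on a discrete image. The fragment below uses an invented smooth displacement field $\tau$ (purely illustrative) and scipy's `map_coordinates` for the interpolated sampling.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Sketch of the action L_tau f(x) = f(x - tau(x)) on a discrete image.
# The image and the displacement field tau are illustrative inventions.
img = np.zeros((28, 28))
img[4:24, 13:15] = 1.0                       # a crude "digit 1"

ii, jj = np.meshgrid(np.arange(28.0), np.arange(28.0), indexing="ij")
tau = 2.0 * np.sin(np.pi * ii / 27)          # smooth horizontal displacement

# map_coordinates samples img at the coordinates x - tau(x), with linear
# interpolation; points falling outside the image are filled with 0.
deformed = map_coordinates(img, np.stack([ii, jj - tau]), order=1)
print(deformed.shape)  # (28, 28)
```

The stroke of the deformed digit bends smoothly, as in Figure 2, instead of being rigidly translated.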
Definition 2
Let $d \ge 1$ and let $(\mathcal{H}, \|\cdot\|_{\mathcal{H}})$ be a normed space. Let $\Phi \colon L^2(\mathbb{R}^d) \to \mathcal{H}$ be an operator
and $\tau$ be a $C^2$-diffeomorphism. We say the operator $\Phi$ is Lipschitz continuous to (the action of) $\tau$ if
$\|\Phi(L_\tau f) - \Phi(f)\|_{\mathcal{H}} \le C \, \|f\|_2 \left( \|\tau\|_\infty + \|\nabla \tau\|_\infty + \|H \tau\|_\infty \right)$
for some $C > 0$ not depending on $f$ or $\tau$.
Take an operator $\Phi$ between function spaces. Given $c \in \mathbb{R}^d$, we say the operator is translation covariant if
$\Phi(f_c) = (\Phi f)_c \quad \text{for all } f,$
and we say it is translation invariant if
$\Phi(f_c) = \Phi f \quad \text{for all } f.$
Let us make another comment on Definition 2. One possible choice of $\mathcal{H}$ is a Hilbert space of functions from $\mathbb{R}^d$ to $\mathbb{C}$, where the inner product of $f, g$ is given by
$\langle f, g \rangle = \int_{\mathbb{R}^d} f(x) \, \overline{g(x)} \, dx,$
while the associated norm is
$\|f\| = \langle f, f \rangle^{1/2}.$
3 Wavelets and Low-pass Filters
First of all, we need to define the filters that are used in the transformation. These are the basic components that we convolve with an image $f$. We define these filters first and then we give some relevant examples and explain what their convolution with $f$ produces.
Definition 3
A function $\psi \in L^1(\mathbb{R}^2) \cap L^2(\mathbb{R}^2)$ is a wavelet if
$\int_{\mathbb{R}^2} \psi(x) \, dx = 0.$
The above is also known as a 2D-wavelet in the literature (see [2]).
Definition 4
Let $\phi \in L^1(\mathbb{R}^2)$; we say it is a low-pass filter if
$\int_{\mathbb{R}^2} \phi(x) \, dx = 1.$
Let $\psi$ and $\phi$ be a wavelet and a low-pass filter respectively. Fix a positive integer $L$ and, for each $l \in \{0, 1, \dots, L-1\}$, denote by $r_l$ the counterclockwise rotation of the plane by an angle of $\pi l / L$.
Fix $J \in \mathbb{Z}$. For $j \in \mathbb{Z}$, $j < J$, and $r = r_l$, we define the $L$-directional child wavelets of $\psi$ as
$\psi_{j,r}(x) = 2^{-2j} \, \psi(2^{-j} r^{-1} x),$
and the low-pass filter for frequencies below $2^{-J}$ is
$\phi_J(x) = 2^{-2J} \, \phi(2^{-J} x).$
Observe that the respective Fourier transforms are
$\hat{\psi}_{j,r}(\omega) = \hat{\psi}(2^{j} r^{-1} \omega)$
and
$\hat{\phi}_J(\omega) = \hat{\phi}(2^{J} \omega).$
Suppose for a moment that $\operatorname{supp} \hat{\phi}$ is contained in the unit ball of $\mathbb{R}^2$. Since $\hat{\phi}_J(\omega) = \hat{\phi}(2^J \omega)$, we see from the above that convolution with $\phi_J$ removes
high frequencies: those of norm larger than $2^{-J}$. This justifies the name of the low-pass filter.
Common choices for $\psi$ are rescalings of the following functions:
• A complex Morlet wavelet (see Figures 3, 4) with parameters $\sigma > 0$ and $\xi \in \mathbb{R}^2$,
$\psi(u) = C_1 \left( e^{i u \cdot \xi} - C_2 \right) e^{-|u|^2 / (2\sigma^2)},$
with $C_2$ such that the integral of the above is equal to 0 and $C_1$ such that the squared modulus of the above has integral equal to 1. Both constants depend on $\sigma$ and $\xi$.
• A partial derivative wavelet, which is a partial derivative of the Gaussian function
A common choice for $\phi$ is a rescaling of the Gaussian function (see Figure 5)
$\phi(x) = C \, e^{-|x|^2 / 2},$
with
$C = \frac{1}{2\pi}$
chosen for $\phi$ to have integral equal to 1. It is a well known fact that the function $e^{-|x|^2/2}$ is, up to a constant factor, its own Fourier transform. Therefore, the above choice of $C$ gives us $\hat{\phi}(\omega) = e^{-|\omega|^2/2}$. Thus, the filter $\phi_J$ reduces to almost 0 all frequencies above some multiple of $2^{-J}$.
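The zero-integral condition on the Morlet wavelet can be checked numerically. The sketch below builds a Morlet wavelet on a grid with illustrative values of $\sigma$ and $\xi$ (not values prescribed by the text), using the closed form $C_2 = e^{-\sigma^2 |\xi|^2 / 2}$ that makes the integral vanish, and verifies that the Riemann sum of the wavelet is nearly zero.

```python
import numpy as np

# Numerical sketch of the Morlet construction in (15): the constant
# C_2 = exp(-sigma^2 |xi|^2 / 2) makes the integral vanish. The values
# of sigma and xi below are illustrative, not taken from the text.
sigma, xi = 0.85, np.array([3.0, 0.0])

x = np.linspace(-4.0, 4.0, 129)
X, Y = np.meshgrid(x, x, indexing="ij")
envelope = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
C2 = np.exp(-sigma**2 * (xi @ xi) / 2)
morlet = (np.exp(1j * (xi[0] * X + xi[1] * Y)) - C2) * envelope

dx = x[1] - x[0]
integral = morlet.sum() * dx**2      # Riemann sum over the grid
print(abs(integral))                 # numerically close to 0
```

The normalization constant $C_1$ is omitted here, since it rescales the wavelet without affecting the zero-mean property.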
The purpose of $\phi_J$ and $\psi_{j,r}$, for $j < J$ and $r \in \{r_0, \dots, r_{L-1}\}$, is to be convolved with a function $f$ that represents an image (in black and white). These filters extract certain features from the image. For example, the position and orientation of edges between light and dark parts of an image.
Figure 3. Real part of the Morlet wavelet from (15).
Figure 4. Imaginary part of the Morlet wavelet from (15).
Figure 5. The Gaussian function from (17).
4 Wavelet transform
Let us define the wavelet transform, which is a key component of the scattering transform.
Definition 5
Let $f \in L^2(\mathbb{R}^2)$ and consider a wavelet $\psi$. Let $j \in \mathbb{Z}$, $j < J$, and fix a rotation $r$. The continuous wavelet transform, or simply wavelet transform, of scale $2^j$ and rotation $r$ is given by
$W_{j,r} f = f * \psi_{j,r}.$
The wavelet transform $W$ is defined as the sequence of the above together with the low-pass component,
$W f = \left( f * \phi_J, \; \{ f * \psi_{j,r} \}_{j,r} \right).$
There are many variants of the wavelet transform for 2D-wavelets.
The version of the wavelet transform that we use here is known as the Littlewood-Paley wavelet transform (see [8]). In general (see [2]), the wavelet transform can be defined depending on a scale parameter $a > 0$,
a displacement parameter $b \in \mathbb{R}^2$,
and an angle of rotation $\theta$.
In the above definition we consider several scales at once, taking $a = 2^j$ with $j < J$; the displacement is accounted for by keeping the whole convolution $f * \psi_{j,r}$; and we pick $\theta$ in the set of angles corresponding to the rotations $r_0, \dots, r_{L-1}$.
The wavelet transform operator also has the following property for a certain choice of the wavelets we introduced above (see [8] or [5]):
$\|W f\| \le \|f\|_2$
for all $f \in L^2(\mathbb{R}^2)$, where the norm on the space of sequences in which $Wf$ takes values is
$\|W f\|^2 = \|f * \phi_J\|_2^2 + \sum_{j < J} \sum_{l=0}^{L-1} \|f * \psi_{j,r_l}\|_2^2.$
Since $W$ is linear, the above also means that it is a weakly contractive operator from $L^2(\mathbb{R}^2)$ to that sequence space.
5 The Fourier Domain Perspective
There is a simplified interpretation of how $W$ acts on a function $f$. Notice that the Fourier transform of the convolutions in the wavelet transform gives
$\widehat{f * \psi_{j,r}}(\omega) = \hat{f}(\omega) \, \hat{\psi}(2^j r^{-1} \omega)$
and
$\widehat{f * \phi_J}(\omega) = \hat{f}(\omega) \, \hat{\phi}(2^J \omega).$
The wavelet transform extracts certain information out of the Fourier transform of $f$. Look at the above equation. If we think of $\hat{\psi}$ as compactly supported on a ball not centered at the origin, the scale $2^j$ and the angle of the rotation $r$ determine which part of the Fourier domain of $f$ is preserved by the transform with those parameters. See Figure 6.
The reader may be wondering why it makes any sense to think of the filters as compactly supported. The low-pass filter is simpler. If we choose $\phi$ as a Gaussian, then it is certainly not compactly supported. Nonetheless, the vast majority of its mass is concentrated on a ball centered at the origin, and, outside of this ball, all frequencies are reduced to almost zero.
To understand the rest of the filters, consider a simpler version of Morlet wavelets, called Gabor functions,
$g(u) = e^{i u \cdot \xi} \, e^{-|u|^2 / (2\sigma^2)};$
then
$\hat{g}(\omega) = 2\pi\sigma^2 \, e^{-\sigma^2 |\omega - \xi|^2 / 2},$
a translated Gaussian. This explains the interpretation in Figure 6.
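This translated-Gaussian picture can be verified numerically. The sketch below samples a Gabor function with illustrative values of $\sigma$ and $\xi$ (chosen so that $\xi$ falls exactly on a frequency bin) and locates the peak of its discrete Fourier transform.

```python
import numpy as np

# Numerical check of (24): a Gabor function e^{i xi . u} e^{-|u|^2/(2 sigma^2)}
# has a Fourier transform concentrated around xi. Here xi = (pi, 0) and
# sigma = 1; all parameter values are illustrative.
N, dx = 64, 0.25
x = (np.arange(N) - N // 2) * dx
X, Y = np.meshgrid(x, x, indexing="ij")
sigma, xi = 1.0, np.pi

gabor = np.exp(1j * xi * X) * np.exp(-(X**2 + Y**2) / (2 * sigma**2))
ghat = np.fft.fft2(np.fft.ifftshift(gabor))   # DFT with the origin at index 0
freqs = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # angular frequency grid

i, j = np.unravel_index(np.argmax(np.abs(ghat)), ghat.shape)
print(freqs[i], freqs[j])   # peak at (pi, 0): a Gaussian translated to xi
```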
In practice, if we consider Morlet wavelets from (15), we have that the larger $\sigma |\xi|$ is, the closer to 0 the constant $C_2$ is, and the closer the Morlet wavelet is to a Gabor function. The above interpretation using equation (24) is therefore an approximate one.
Figure 6. The filters in the frequency domain. In blue, the low-pass filter $\hat{\phi}_J$. In green and in red, the child wavelet filters $\hat{\psi}_{j,r}$ at two different scales, one per color, with the rotations $r_0, \dots, r_{L-1}$ at each scale.
6 Construction of the Scattering Transform
Let $f \in L^2(\mathbb{R}^2)$. This function represents an image in black and white. Eventually, it needs to be changed to a function with compact (and discrete) support. In this section we see how the scattering transform operator is constructed and what a transformed function looks like.
We need some auxiliary transformations first. The first transformation is given by a simple averaging of each value of $f$ with the surrounding values: the convolution
$f * \phi_J.$
Here $\phi_J$ preserves the frequencies of norm below $2^{-J}$ and removes the rest. See Figure 7 for an example.
Figure 7. On the left, an example of $f$ representing a grayscale image of a human hand, extracted from [9]. On the right, the convolution $f * \phi_J$ with a Gaussian filter $\phi$.
After that, we analyze higher frequencies by computing wavelet coefficients. We fix $J$
and find
$f * \psi_{j,r}, \quad 0 \le j < J, \quad r \in \{r_0, \dots, r_{L-1}\}.$
Let us make a comment on the bounds of the parameter $j$. The bound $j < J$ is part of the definition, but the choice $j \ge 0$ is related to the image processing perspective. Any scale lower than $2^0 = 1$ on a discrete function (an image) would not yield new information (it would be a scale smaller than a pixel).
The wavelet transforms in (27) commute with translations, in the sense that, given $c \in \mathbb{R}^2$,
$f_c * \psi_{j,r} = (f * \psi_{j,r})_c.$
This property of translation covariance comes solely from convolution. But this property also implies that the elements in (27) are not translation invariant. In other words, the relation $f_c * \psi_{j,r} = f * \psi_{j,r}$ does not hold in general for arbitrary $f$ and $c$.
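The covariance relation in (28) is easy to check numerically. The sketch below uses circular convolution computed via the FFT, for which a translation is an exact roll of the sampling grid; the image and the filter are random stand-ins, not actual wavelets.

```python
import numpy as np

# Numerical check of the covariance in (28), using circular convolution
# (computed via FFT), for which a translation is an exact roll of the grid.
# The image and the filter are random stand-ins, not actual wavelets.
rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))
psi = rng.standard_normal((32, 32))

conv = lambda a, b: np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

c = (5, -3)                                      # the translation
lhs = conv(np.roll(f, c, axis=(0, 1)), psi)      # (f_c) * psi
rhs = np.roll(conv(f, psi), c, axis=(0, 1))      # (f * psi)_c
print(np.allclose(lhs, rhs))                     # True
```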
We want a translation invariant representation of $f$, but $f * \psi_{j,r}$ is not one, as (28) makes clear. We can obtain a translation invariant quantity,
$\int_{\mathbb{R}^2} f * \psi_{j,r}(x) \, dx,$
but it is equal to $0$. Indeed, applying Fubini, it is
$\int_{\mathbb{R}^2} f(u) \left( \int_{\mathbb{R}^2} \psi_{j,r}(x - u) \, dx \right) du,$
and the inner integral is $0$ for all $u$ because of our definition of wavelet.
We can obtain a translation invariant operator from the integral by composing a non-linear function $M$ with the integrand, as in
$\int_{\mathbb{R}^2} M\big( f * \psi_{j,r}(x) \big) \, dx.$
There are several possible non-linearities. But the chosen one is the complex modulus.
There are two reasons for this.
Firstly, we want norm preservation: $\| \, |f| \, \|_2 = \|f\|_2$. Secondly, we want contractiveness, that is, $\| \, |f| - |g| \, \|_2 \le \|f - g\|_2$. These properties are key to preserving the boundedness and contractiveness that the wavelet transform already has. One need only look at (20) and (21) to see that a composition with the modulus preserves the aforementioned properties.
Setting $M = |\cdot|$, we have a quantity that is invariant to translations of $f$, given by the integral
$\int_{\mathbb{R}^2} |f * \psi_{j,r}(x)| \, dx,$
for each $j$ and $r$. This is not satisfactory yet. Yes, the above positive quantities in (33) are translation invariant with respect to $f$, and they are also stable to deformations, because the integrand is [8]. However, the averaging over all of $\mathbb{R}^2$ loses too much information [8]. We do not settle for these quantities.
Suppose that, instead of taking an average over $\mathbb{R}^2$, we take averages over local sets (for the sake of fixing ideas, think about balls of radius proportional to $2^J$). We would end up with the functions
$|f * \psi_{j,r}| * \phi_J.$
The above functions happen to be Lipschitz continuous to deformations (see Theorem 6).
They also turn out to be locally translation invariant (also thanks to Theorem 6) for a translation of a scale below $2^J$.
The above local averages constitute the second auxiliary transform, given by the family
$\{ \, |f * \psi_{j,r}| * \phi_J \, \}_{j,r}.$
See Figure 8 for an example.
Figure 8. Components of the second auxiliary transform for a fixed scale and several rotations. The low-pass filter is a Gaussian and the rest of the filters are Morlet wavelets. This output is normalized: white pixels represent the largest value on any single image and black pixels represent the value 0. There are 255 shades of gray in total.
Figure 9. The third image in Figure 8: a component $|f * \psi_{j,r}| * \phi_J$ in which the scale $2^j$ is small compared to the support of $f$. This makes the Morlet wavelet detect oriented edges, according to how the rotation $r$ affects $\xi$ in (15). For this particular rotation, the value at horizontal edges is visibly the largest.
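A single component of this second auxiliary transform fits in a short numpy sketch. The filters below are toy Gaussian bumps defined directly in the Fourier domain, not the Morlet filters of Section 3; all bandwidths and center frequencies are illustrative.

```python
import numpy as np

# Sketch of a second-auxiliary-transform component |f * psi| * phi_J,
# with toy Gaussian filters built directly in the Fourier domain (all
# bandwidths and center frequencies are illustrative).
N = 64
w = 2 * np.pi * np.fft.fftfreq(N)                 # angular frequency grid
WX, WY = np.meshgrid(w, w, indexing="ij")

phi_hat = np.exp(-(WX**2 + WY**2) / (2 * 0.3**2))            # low-pass
psi_hat = np.exp(-((WX - 1.5)**2 + WY**2) / (2 * 0.3**2))    # band-pass bump

rng = np.random.default_rng(1)
f = rng.random((N, N))                            # a random stand-in image

U1 = np.abs(np.fft.ifft2(np.fft.fft2(f) * psi_hat))    # |f * psi|
S1 = np.real(np.fft.ifft2(np.fft.fft2(U1) * phi_hat))  # |f * psi| * phi_J
print(S1.shape)   # a smoothed, locally translation invariant feature map
```

Note that the modulus is applied before the low-pass averaging; averaging $f * \psi$ directly would give nearly zero, as discussed around (31).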
Now, $\phi_J$ is a low-pass filter, and the convolution with it removes from $|f * \psi_{j_1,r_1}|$ the high frequencies of Euclidean norm larger than $2^{-J}$. In order to take into account some of those frequencies, we convolve with another wavelet: we perform the wavelet transform of $|f * \psi_{j_1,r_1}|$.
We obtain, for some $j_2$ such that $j_1 < j_2 < J$,
$\big| \, |f * \psi_{j_1,r_1}| * \psi_{j_2,r_2} \big| * \phi_J.$
There is a reason for the choice of the parameter $j_2$ as an integer greater than $j_1$: the energy of $|f * \psi_{j_1,r_1}|$ is concentrated at frequencies of norm below $2^{-j_1}$, so the components with $j_2 \le j_1$ carry a negligible amount of energy [8].
See Figure 10 for an example.
Figure 10. Output of the third auxiliary transform for fixed $j_1$, $r_1$ and, from left to right, several rotations $r_2$. The output is normalized: white is the maximum value and black is 0.
Let $m \ge 1$. The general $m$-th auxiliary transform is an iteration of the above, given by
$\Big| \cdots \big| \, |f * \psi_{j_1,r_1}| * \psi_{j_2,r_2} \big| \cdots * \psi_{j_m,r_m} \Big| * \phi_J.$
Each choice of parameters determines a path, that is, an ordered sequence of pairs $(j_1, r_1), \dots, (j_m, r_m)$, where $j_1 < j_2 < \cdots < j_m < J$. The number of pairs is the length of the path.
We finally arrive at the scattering transform of $f$ with path lengths less than or equal to $m$. It is given by the collection of the outputs of all the auxiliary transforms over all such paths,
where the dimension of the output comes from the choice of parameters at each of the auxiliary transforms. Those are all the possible choices of increasing scales in at most $m$ steps, with $L$ possible rotations at each step.
For most practical classification purposes the parameter $m$ from (39) can be set to $m = 2$, and it is enough. In fact, the package Kymatio [1] only implements scattering of images up to $m = 2$ for this reason, which is explained in [4]. The parameters $J$ and $L$ are less straightforward to choose. Theorem 6 is helpful for guessing a good choice of $J$. Regarding the parameter $L$, it only needs to be somewhat high; a common choice would be $L = 8$. Of course, instead of guessing, one can perform cross validation over some values of the parameters to decide, as is done in [5].
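The whole cascade up to order $m = 2$ fits in a short numpy sketch. The filters below are toy Gaussian bumps in the Fourier domain, not the Morlet wavelets of Section 3, and every numerical parameter is illustrative.

```python
import numpy as np

# Minimal numpy sketch of the full cascade (39) up to order m = 2, with toy
# Gaussian filters in the Fourier domain instead of true Morlet wavelets.
# Every parameter value here is illustrative, not prescribed by the text.
N, J, L = 32, 3, 4
w = np.fft.fftfreq(N, d=1.0 / N)
WX, WY = np.meshgrid(w, w, indexing="ij")

def psi_hat(j, l):
    # band-pass bump at a frequency of norm ~ N / 2^{j+2}, rotated by pi*l/L
    rho, sig = 0.25 * N / 2**j, 0.1 * N / 2**j
    th = np.pi * l / L
    return np.exp(-((WX - rho * np.cos(th))**2
                    + (WY - rho * np.sin(th))**2) / (2 * sig**2))

phi_hat = np.exp(-(WX**2 + WY**2) / (2 * (0.25 * N / 2**J)**2))  # low-pass

conv = lambda a, h: np.fft.ifft2(np.fft.fft2(a) * h)

rng = np.random.default_rng(0)
f = rng.random((N, N))                            # a random stand-in image

S = [np.real(conv(f, phi_hat))]                   # order 0: f * phi_J
U1 = {}
for j1 in range(J):
    for l1 in range(L):
        u = np.abs(conv(f, psi_hat(j1, l1)))      # |f * psi_{j1, r1}|
        U1[j1, l1] = u
        S.append(np.real(conv(u, phi_hat)))       # order-1 coefficients
for (j1, l1), u in U1.items():
    for j2 in range(j1 + 1, J):                   # only paths with j2 > j1
        for l2 in range(L):
            u2 = np.abs(conv(u, psi_hat(j2, l2)))
            S.append(np.real(conv(u2, phi_hat)))  # order-2 coefficients

S = np.stack(S)
print(S.shape)   # 1 + J*L + L**2 * J*(J-1)//2 feature maps
```

With $J = 3$ and $L = 4$ this yields $1 + 12 + 48 = 61$ maps; a production implementation such as Kymatio [1] would additionally subsample each map, since the low-pass outputs vary slowly.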
7 Lipschitz Continuity to Deformations
The following result tells us that the scattering transform as we have defined it is both Lipschitz continuous to diffeomorphisms and translation invariant whenever the maximum displacement $\|\tau\|_\infty$ is small with respect to the scale $2^J$. It is a consequence of Corollary 1 in [8], a simpler version of it that is enough for our purposes.
Theorem 6
There exists $C > 0$ such that, given $f \in L^2(\mathbb{R}^2)$ with compact support, for all $C^2$-diffeomorphisms of the form $x \mapsto x - \tau(x)$ with $\|\nabla \tau\|_\infty \le 1/2$ and $\|H\tau\|_\infty < \infty$, it holds that
$\| S(L_\tau f) - S(f) \| \le C \, \|f\|_2 \left( 2^{-J} \|\tau\|_\infty + \|\nabla \tau\|_\infty + \|H \tau\|_\infty \right).$
8 Quick Comparison with a Convolutional Neural Network
For those readers familiar with the basic structure of a Convolutional Neural Network (abbreviated CNN), let us take another look at (39). We can think of $f$ as the input image, and of the $\psi_{j,r}$, for $j < J$ and $r \in \{r_0, \dots, r_{L-1}\}$, as the filters. The filters remain unchanged from beginning to end here, in contrast to the filters in CNNs, which do change with training.
The complex modulus performs a pooling: it puts the real and imaginary parts together. This gives us $|f * \psi_{j,r}|$, which is the output of the first layer and the input of the second layer. In the second layer, we do another filtering with the wavelets, and yet another pooling with the complex modulus. We repeat this process until the final layer.
The low-pass filter $\phi_J$ produces another pooling. It is an average pooling over regions of size proportional to $2^J$. We apply it to the output of the network. But there is a caveat: unlike in CNNs, whose output is the output of the final layer, here we apply this pooling to the output of each layer and to the input itself. This gives us the scattering coefficients, which constitute the final output $Sf$.
9 Plotting the Scattering transform
In order to have a broader intuition of how the output of the scattering transform looks, we include a folder (.zip) with a Python script (my_scattering.py). The script generates two files (scattering_slow.gif, scattering_fast.gif) that contain the whole output that the above Figures 8, 9 and 10 represent only in part. It also generates an image (scattering.png).
There are six relevant variables with assigned values, whose definitions appear among the first 40 lines of the script. We encourage the reader to tinker with their values and compare the resulting outputs.
scale = 3
rotations = 8
resize = True
enhance_each = False
enhance_global = True
In fact, the reader can use a grayscale image of their choice instead of the example image of a digit 1 from MNIST [6]. The only requirement is to put the grayscale image file in the same folder as the script and change the variable
filename = 'one.png'
to match the name and extension of the file.
References
[1] Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim Andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, and Michael Eickenberg. Kymatio: Scattering Transforms in Python. 2019. http://arxiv.org/pdf/1812.11214v2
[2] Jean P. Antoine and Romain Murenzi. Two-dimensional directional wavelets and the scale-angle representation. Signal Processing, 52(3):259–281, 1996. https://www.sciencedirect.com/science/article/pii/0165168496000655
[3] Alexandre Bernardino and José Santos-Victor. A real-time Gabor primal sketch for visual attention. In David Hutchison et al., editors, Pattern Recognition and Image Analysis, volume 3522 of Lecture Notes in Computer Science, pages 335–342. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005. https://www.researchgate.net/publication/221258995_A_Real-Time_Gabor_Primal_Sketch_for_Visual_Attention
[4] Joan Bruna and Stéphane Mallat. Classification with scattering operators. 2013. https://arxiv.org/pdf/1011.3023
[5] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. 2012. https://arxiv.org/pdf/1203.1513
[6] Li Deng. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine, 29(6):141–142, 2012. https://ieeexplore.ieee.org/document/6296535
[7] David Hutchison, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos, Doug Tygar, Moshe Y. Vardi, Gerhard Weikum, Jorge S. Marques, Nicolás Pérez de la Blanca, and Pedro Pina, editors. Pattern Recognition and Image Analysis, volume 3522 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005. https://link.springer.com/book/10.1007/b136825
[8] Stéphane Mallat. Group Invariant Scattering. 2012. https://arxiv.org/pdf/1101.2286
[9] Shyambhu Mukherjee. Dataset: Hands and Palm Images Dataset, Version 2, March 2021. https://www.kaggle.com/shyambhu/hands-and-palm-images-dataset/metadata