In this work we present the wavScalogram R package, which contains methods based on wavelet scalograms for time series analysis. These methods are related to two main wavelet tools: the windowed scalogram difference and the scale index. The windowed scalogram difference compares two time series, identifying if their scalograms follow similar patterns at different scales and times, and it is thus a useful complement to other comparison tools such as the squared wavelet coherence. On the other hand, the scale index provides a numerical estimation of the degree of non-periodicity of a time series and it is widely used in many scientific areas.
Since the works of Mallat (2008) and Daubechies (1992), wavelet analysis has become, in the last few decades, a standard tool in the field of time series analysis. Its ability to simultaneously analyze a signal in frequency space (scales) and in time, allows it to overcome many of the limitations that Fourier analysis presents for non-stationary time series. Furthermore, the algorithms for calculating the different wavelet transforms are characterized by their speed and ease of implementation.
There are currently many software packages that implement functions for wavelet analysis of time series (MATLAB’s Wavelet Toolbox, Wavelab, etc.), and in recent years, the exponential growth of the R ecosystem has not been outside the field of wavelet analysis. Within CRAN there are many packages related to wavelet analysis for time series. Specifically, as collected in the TimeSeries Task View, the wavelets package (Aldrich 2020) , the WaveletComp and biwavelet packages (Roesch and Schmidbauer 2018; Gouhier et al. 2021) , the mvLSW package (Taylor et al. 2019) and other packages such as hwwntest (Savchev and Nason 2018), rwt (Roebuck and Rice University’s DSP group 2022), waveslim (Whitcher 2020) and wavethresh (Nason 2022).
In this work we will describe in depth the wavScalogram package (Bolós and Benı́tez 2021) (also mentioned in the TimeSeries Task View). In this package, methods based on the wavelet scalogram are introduced as defined in Benı́tez et al. (2010; Bolós et al. 2017, 2020). These methods are basically related to two main wavelet tools: the windowed scalogram difference and the scale index. The first one, the windowed scalogram difference, was introduced in Bolós et al. (2017). It allows to compare two time series at different scales and times, determining if their scalograms follow similar patterns. In this sense, it is a complement to other wavelet tools for comparing time series such as the squared wavelet coherence and the phase difference, since there are certain differences in time series that these measurements are not capable of detecting while the windowed scalogram difference can. The second tool is the scale index introduced in Benı́tez et al. (2010). It focuses on the analysis of the non-periodicity of a signal, giving a numerical measure of its degree of non-periodicity, taking the value \(0\) if the signal is periodic and a value close to \(1\) if the signal is totally aperiodic (for example, a purely stochastic signal). The scale index has been used in many scientific areas, being the evaluation of the quality of pseudo-random number generators the area where it has been used the most. In addition, the scale index also has a “windowed” version, in which the windowed scalogram is used to calculate the scale index instead, allowing to measure the evolution of the scale index over time, which is useful in the case of non-stationary time series (see Bolós et al. (2020)).
The article is organized as follows: In the next section, we describe the basics of the wavelet analysis and how to use them in the wavScalogram package. Then, a description of the wavelet scalogram and its implementation is given. The following sections are devoted to the windowed scalogram difference and the scale index, in its original and windowed versions. Finally we illustrate the use of the package with some examples in applied problems, such as the analysis of time series of sunspots or the use of the windowed scalogram difference in the clustering of time series, particularly the interest rate series of sovereign bonds.
A wavelet (or mother wavelet) is a function \(\psi \in L^2\left( \mathbb{R}\right)\) with zero average (i.e. \(\int _{\mathbb{R}} \psi =0\)), unit energy (\(\| \psi \| =1\), i.e. normalized) and centered in the neighborhood of \(t=0\) (Mallat 2008). There exists a wide variety of wavelets but in this package we use the following, described in Torrence and Compo (1998) (see Figure 1):
Moreover, we have added:
Scaling a wavelet \(\psi\) by \(s>0\) and translating it by \(u\), we create a family of unit energy “time-frequency atoms”, called daughter wavelets, \(\psi _{u,s}\), as follows \[\label{eq:psius} \psi_{u,s}(t)=\frac{1}{\sqrt{s}}\psi \left( \frac{t-u}{s}\right) . \tag{1}\]
Remark 1 (Fourier factor). Usually, the Fourier wavelength of a daughter wavelet does not coincide with its scale \(s\). Nevertheless, they are proportional, and this proportionality factor for converting scales into Fourier periods is called Fourier factor. This Fourier factor is taken \(4\pi/\left( \omega _0+\sqrt{2+\omega_0^2}\right)\), \(4\pi/\left( 2m+1\right)\) and \(2\pi / \sqrt{m+1/2}\) for Morlet, Paul and DoG wavelets respectively (Torrence and Compo 1998). For the default parameter values, the Fourier factor is approximately \(1.033\), \(1.3963\), and \(3.9738\) respectively. For Haar wavelet, the Fourier factor is \(1\).
Given a function \(f\in L^2\left( \mathbb{R}\right)\), that we will identify with a signal or time series, the continuous wavelet transform (CWT) of \(f\) at time \(u\) and scale \(s>0\) is defined as \[\label{eq:cwt} \mathcal{W}f\left( u,s\right) =\int _{-\infty}^{+\infty}f(t)\psi_{u,s}^* (t) \, \textrm{d}t, \tag{2}\] where \(^*\) denotes the complex conjugate. The CWT allows us to obtain the frequency components (or details) of \(f\) corresponding to scale \(s\) and time location \(u\).
In practical situations, however, it is common to deal with finite signals. That is, given a time signal \(x\), and a finite time interval \([0,T]\), we shall consider the finite sequence \(x_n=x\left(t_n\right)\), for \(n=0,\ldots,N\). Here, \(t_0,\ldots,t_{N}\) is a discretization of the interval \([0,T]\), i.e. \(t_n = nh\), being \(h = T/N\) the time step. According to (1) and (2), the CWT of \(x\) at scale \(s>0\) is defined as the sequence \[\label{eq:wxns} \mathcal{W}x_n(s)=h\sum _{i=0}^{N} x_i \psi _{n,s}^*\left( t_i\right) , \tag{3}\] where \(n=0,\ldots,N\) and \[\label{eq:psins} \psi_{n,s}(t)=\frac{1}{\sqrt{s}}\psi \left( \frac{t-t_n}{s}\right) . \tag{4}\] Note that \(\psi_{n,s}(t)\) is in fact \(\psi_{t_n,s}(t)\), but this abuse of notation between (1) and (4) is assumed for the sake of readability. Using Fourier transform tools, one can calculate (3) for all \(n=0,\ldots N\) simultaneously and efficiently (Torrence and Compo 1998).
Remark 2 (Energy density). It is known that the CWT coefficients
are biased in favour of large scales (Liu et al. 2007). Nevertheless, if the
mother and daughter wavelets are normalized by the \(L^1\)-norm (as in the
Rwave package, by Carmona and Torresani (2021))
instead of the \(L^2\)-norm (as in our package), this bias is not produced
. Hence, to rectify the bias, the CWT in (2) and
(3) can be multiplied by the factor \(\frac{1}{\sqrt{s}}\).
This rectification will be specially useful in some wavelet tools of our
package that quantify the “energy density” of a signal, such as the
wavelet power spectrum, the scalograms and the windowed scalogram
difference. On the other hand, in the case of the scale index, this
correction will not be advisable (see Remark 7). Usually,
the wavelet tools of our package have a logical parameter called
energy_density
that switches this correction.
For computing the CWT of a time series \(x\) at a given set of scales, we
use cwt_wst
. For example,
# install.packages("wavScalogram")
library(wavScalogram)
<- 0.1
h <- 1000
N <- seq(from = 0, to = N * h, by = h)
time <- sin(pi * time)
signal <- seq(from = 0.5, to = 4, by = 0.05)
scales <- cwt_wst(signal = signal, dt = h,
cwt scales = scales, powerscales = FALSE,
wname = "DOG", wparam = 6)
computes the CWT of signal
at scales
from \(s_a=0.5\) to \(s_b=4\) using
DoG wavelet with \(m=6\). The parameter wname
indicates the wavelet
used, and it can be "MORLET"
(default value), "PAUL"
, "DOG"
,
"HAAR"
or "HAAR2"
. The difference between these two last values is
that "HAAR2"
provides a more accurate but slower algorithm than the
one provided by "HAAR"
. Moreover, we can specify by means of wparam
the value of the parameters \(\omega_0\) or \(m\). As it has been stated
before, the default values of these parameters are \(\omega_0=6\) for
Morlet wavelet, \(m=4\) for Paul wavelet and \(m=2\) for DoG wavelet.
If the set of scales is a base \(2\) power scales set (Torrence and Compo 1998), the
parameter scales
can be a vector of three elements with the lowest
scale \(s_a\), the highest scale \(s_b\) and the number of suboctaves per
octave. This vector is internally passed to function pow2scales
that
returns the constructed base \(2\) power scales set. For example,
<- c(0.5, 4, 16)
scales <- cwt_wst(signal = signal, dt = h, scales = scales, powerscales = TRUE) cwt
computes the CWT of signal
at scales from \(s_a=0.5\) to \(s_b=4\), with
\(16\) suboctaves per octave. Since parameter powerscales
is TRUE
by
default, it is not necessary to specify it in the function call. If
scales = NULL
(default value), then the function constructs the scales
set automatically: \(s_a\) is chosen so that its equivalent Fourier period
is \(2h\) (Torrence and Compo 1998), and \(s_b=Nh/2r_w\), where \(r_w\) is the corresponding
wavelet radius. Note that in this case \(s_b\) maximizes the length of
the scales interval taking into account the cone of influence. The
wavelet radius and the cone of influence are defined and discussed in
Remark 4.
The output cwt
is a list containing the following fields:
coefs
is an \((N+1)\times\)length(scales)
array (either real or
complex depending on the wavelet used) containing the corresponding
CWT coefficients, i.e. cwt$coefs[i,j]
is the CWT coefficient at
the i-th time and j-th scale.scales
is the vector of scales used, either provided by the user
or constructed by the function itself.fourier_factor
is the scalar used to transform scales into Fourier
periods (see Remark 1).coi_maxscale
is a numeric vector of size \(N+1\) that defines the
cone of influence (see Remark 4).Remark 3 (Border effects). In (3) (or (2) for finite length signals) there appear border effects (or edge effects) when the support of the daughter wavelets is not entirely contained in the time domain \(\left[ t_0,t_N\right]\). In order to try to mitigate border effects, we can construct from the original time series \(x\) an infinite time series \(\bar{x}\) on \(t_i=t_0+ih\) for \(i\in \mathbb{Z}\) and then we define \[\label{eq:wbarxns} \bar{\mathcal{W}}x_n(s)=\mathcal{W}\bar{x}_n(s)=h\sum _{i\in \mathbb{Z}} \bar{x}_i \psi _{n,s}^*\left( t_i\right) , \tag{5}\] where \(n=0,\ldots ,N\). The most usual ways to construct \(\bar{x}\) are the following:
Depending on the nature of \(x\), it may be preferable to use one construction or another for minimizing the undesirable border effects (see Figure 2). For example, a periodization is advised for stationary short time series, and symmetric catenation for non-stationary short time series. On the other hand, for long time series, border effects are less important and then we can just pad with zeroes, i.e. use the original CWT given in (3).
How the border effects are treated by function cwt_wst
is determined
via the border_effects
parameter. Possible values for this parameter
are "BE"
(raw border effects, padding with zeroes), "PER"
(periodization) and "SYM"
(symmetric catenation), corresponding to the
three options described above.
Remark 4 (Cone of influence). The cone of influence (CoI) is defined
by the scales for which border effects become important at each time.
The field coi_maxscale
of the output in cwt_wst
function contains,
for each time, the maximum scale at which border effects are negligible,
and consequently determines the CoI.
In order to compute the CoI, we have to set a criterion to distinguish between relevant and negligible border effects. In Torrence and Compo (1998), the CoI is defined by the \(e\)-folding time for the autocorrelation of wavelet power spectrum (see the next section) at each scale \(s\), and this \(e\)-folding time is chosen so that the wavelet power spectrum for a discontinuity at the edge drops by a factor \(e^{-2}\). For Morlet and DoG wavelets, this \(e\)-folding time is \(\sqrt{2}s\), and for Paul wavelets is \(s/\sqrt{2}\).
For wavelets with symmetric modulus such as Morlet, Paul and DoG, the \(e\)-folding time at \(s=1\) is interpreted as a wavelet radius \(r_w\) that defines an effective support \(\left[ -r_w,r_w\right]\) for the mother wavelet. Therefore, the CoI is given by the scales from which the corresponding effective supports of the daughter wavelets \(\left[ u-sr_w,u+sr_w\right]\) are not entirely contained in the time domain.
The wavelet radius \(r_w\) determines the CoI in the different functions
of this package by means of the parameter waverad
. If it is NULL
(default value) we consider \(r_w=\sqrt{2}\) for Morlet and DoG wavelets,
and \(r_w=1/\sqrt{2}\) for Paul wavelets, following Torrence and Compo (1998). On the other
hand, we take \(r_w=0.5\) for Haar wavelet, i.e. we assume that its
effective support is in fact its support. Nevertheless we can introduce
a custom \(r_w\) for any wavelet, allowing us in this way to adjust the
importance of border effects in the construction of the CoI. For
example,
<- cwt_wst(signal = signal, dt = h,
cwt wname = "DOG", wparam = 6, waverad = 2)
computes the CWT coefficients of signal
for DoG wavelet with \(m=6\).
Here, cwt$coi_maxscale
is obtained assuming that the wavelet radius is
\(r_w=2\). Note that the value of waverad
does not affect the
computation of the CWT coefficients.
The wavelet power spectrum of a signal \(f\in L^2\left( \mathbb{R}\right)\) at time \(u\) and scale \(s>0\) is defined as \[\label{eq:wpsfus} \mathcal{WPS}f\left( u,s\right) = |\mathcal{W}f\left( u,s\right) |^2. \tag{6}\] Analogously to (6), the wavelet power spectrum of a time series \(x\) at scale \(s>0\) is given by the sequence \[\label{eq:wpsxns} \mathcal{WPS}x_n(s) = |\mathcal{W}x_n(s) |^2, \tag{7}\] where \(n=0,\ldots,N\).
We can plot the wavelet power spectrum of a time series \(x\) at a given
set of scales through function cwt_wst
if the parameter makefigure
is TRUE
(default value). There are other parameters regarding this
plot:
time_values
is a vector that provides customized values in the
time axis.energy_density
is a logical parameter. If it is TRUE
, it is
plotted the wavelet power spectrum divided by the scales, according
to Remark 2. By default, it is FALSE
.figureperiod
is a logical parameter that indicates if they are
represented periods or scales in the \(y\)-axis (see Remark
1). By default, it is TRUE
.xlab, ylab, main, zlim
are parameters to customize the figure.For example,
<- 1 / 12
h <- seq(from = 1920, to = 1970 - h, by = h)
time1 <- seq(from = 1970, to = 2020, by = h)
time2 <- c(sin(pi * time1), sin(pi * time2 / 2))
signal <- cwt_wst(signal = signal, dt = h, time_values = c(time1, time2))
cwt_a <- cwt_wst(signal = signal, dt = h, time_values = c(time1, time2),
cwt_b energy_density = TRUE)
plots Figure 3 (a) and (b) respectively. In this figure it
is shown how the wavelet power spectrum is biased in favour of large
scales, as it is pointed out in Remark 2. The parameter
energy_density
corrects it and so, values for different scales become
comparable. Note that energy_density
only affects the plot and hence,
cwt_a
is identical to cwt_b
.
The scalogram of \(f\) at scale \(s\) is defined as \[\label{eq:sfs} \mathcal{S}f(s)= \left( \int _{-\infty } ^{+\infty } | \mathcal{W}f\left( u,s\right) | ^2 \, \textrm{d}u \right) ^{1/2} . \tag{8}\] It gives the contribution of each scale to the total “energy” of the signal and so, the notion of scalogram here is analogous to the spectrum of the Fourier transform. It is important to note that the term “scalogram” is often used to refer the wavelet power spectrum, but in this package, we call “scalogram” to (8).
If \(f\) is a finite length signal with time domain \(I=\left[ a,b\right]\), it is usual to consider a normalized version of the scalogram for comparison purposes, given by \[\label{eq:sfsn} \mathcal{S}f(s)= \left( \frac{1}{b-a}\int _{a}^{b} | \mathcal{W}f\left( u,s\right) | ^2 \, \textrm{d}u \right) ^{1/2} . \tag{9}\] Hence, according to (9), the (normalized) scalogram of \(x\) at scale \(s\) is given by \[\label{eq:sxsn} \mathcal{S}x(s) = \left( \frac{1}{N+1}\sum _{i=0}^{N} | \mathcal{W}x_i(s) | ^2 \right) ^{1/2}. \tag{10}\] The normalization coefficient \(1/(N+1)\) in (10) allows us to compare scalograms of time series with different lengths.
We can compute the (normalized) scalograms of a time series \(x\) at a
given set of scales by means of function scalogram
. This function
follows the same rules as cwt_wst
regarding the data entry,
construction of scales, choice of the wavelet, border effects and
application of Fourier factor. So, parameters signal
, dt
, scales
,
powerscales
, wname
, wparam
, waverad
, border_effects
,
makefigure
, figureperiod
, xlab
, ylab
and main
are analogous to
those in function cwt_wst
with same default values. On the other hand,
parameter energy_density
is TRUE
by default. For example,
<- scalogram(signal = signal, dt = h,
sc_a energy_density = FALSE)
computes the (normalized) scalogram of signal
given by (10)
at a base \(2\) power scales set constructed automatically and plots
Figure 4 (a). The output sc_a
is a list with the following
fields:
scalog
is a vector of length length(scales)
with the values of
the (normalized) scalogram at each scale.energy
is the total energy of scalog
(i.e. its \(L^2\)-norm) if
parameter energy_density
is TRUE
.scales
and fourier_factor
are analogous to those in the output
of function cwt_wst
.