This paper introduces package ConvergenceClubs, which implements functions to perform the Phillips and Sul (2007, 2009) club convergence clustering procedure in a simple and reproducible manner. The approach proposed by Phillips and Sul to analyse the convergence patterns of groups of economies is formulated as a nonlinear time varying factor model that allows for different time paths as well as individual heterogeneity. Unlike other approaches in which economies are grouped a priori, it also allows the endogenous determination of convergence clubs. The algorithm, usage, and implementation details are discussed.
Economic convergence refers to the idea that per–capita incomes of poorer economies will tend to grow at faster rates than those of richer economies. The issue has been widely investigated in the economic literature since the classical contributions on economic growth and development (Solow 1956; Myrdal 1957). In addition to the traditional concepts of beta and sigma convergence, a growing literature has recently emerged on the concept of club convergence. This notion was originally introduced by Baumol (1986) to describe convergence among a subset of national economies, and it has also quickly spread to the regional level. Several contributions have investigated the topic empirically, proposing different methodologies. For example, Quah (1996) developed a Markov chain model with probability transitions to estimate the evolution of income distribution. Le Gallo and Dall’Erba (2005) proposed a spatial approach to detect convergence clubs using the Getis–Ord statistic. Corrado et al. (2005) introduced a multivariate stationarity test in order to endogenously identify regional club clustering.
More recently, Phillips and Sul (2007, 2009) proposed a
time-varying factor model that allows for individual and transitional
heterogeneity to identify convergence clubs. Due to its positive
attributes, this methodology has become predominant in the analysis of
the convergence patterns of economies. In fact, it has several
advantages. First, it allows for different time paths as well as
individual heterogeneity; therefore, different transitional paths are
possible.
As for existing routines, Phillips and Sul (2007, 2009) provided the Gauss (Aptech Systems 2016) code used in their empirical studies. Schnurbus et al. (2017) provided a set of R functions to replicate the key results of Phillips and Sul (2009), while Du (2018) developed a full Stata (StataCorp 2017) package to perform the club convergence algorithm. A dedicated R package for this methodology has been missing. The ConvergenceClubs (Sichera and Pizzuto 2019) package fills this gap, since it makes it possible to carry out Phillips and Sul’s methodology in a simple and reproducible fashion, with easy definition of the parameters. Moreover, our package also implements the alternative club merging algorithm developed by Lyncker and Thoennessen (2017).
The remainder of the paper is organised as follows. First, the club convergence methodology is presented. Then, the main features of the package are listed and described. Finally, an example based on Phillips and Sul (2009) data is provided.
The approach proposed by Phillips and Sul is based on a modification of
the conventional panel data decomposition of the variable of interest.
In fact, panel data
where
In order to test if different economies converge, a key role is played
by the estimation of
In presence of convergence, there should be a common limit in the
transition path of each economy and the coefficient
In order to construct a formal statistical test for convergence,
Phillips and Sul (2007, 2009) assume the following
semi–parametric specification of
More formally, to test for the presence of convergence among different
economies, Phillips and Sul (2007, 2009) suggest estimating the
following regression model by ordinary least squares:
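As an illustration only, the regression can be sketched in base R. The sketch assumes the log t regression form of Phillips and Sul (2007), log(H_1/H_t) - 2 log(log t) = alpha + beta log t + u_t, estimated on the periods that remain after discarding an initial fraction r of the sample. The helper name log_t_sketch, the rounding rule for the trimmed periods, and the use of plain OLS standard errors are assumptions made here; the package relies on HAC standard errors, so its t-values will differ.

```r
# Illustrative sketch only (not the package's implementation).
# X: numeric matrix, one row per unit, one column per time period.
# r: fraction of initial periods to discard (hypothetical argument name).
log_t_sketch <- function(X, r = 1/3) {
  X <- as.matrix(X)
  n_periods <- ncol(X)
  # Relative transition paths: each unit divided by the cross-sectional mean
  h <- sweep(X, 2, colMeans(X), "/")
  # Cross-sectional variance of the relative transition paths
  H <- colMeans((h - 1)^2)
  # Keep periods from floor(r * T) onwards (assumed rounding rule);
  # t >= 2 is required because log(log(1)) is not finite
  keep <- max(2, floor(r * n_periods)):n_periods
  y <- log(H[1] / H[keep]) - 2 * log(log(keep))
  fit <- lm(y ~ log(keep))
  est <- summary(fit)$coefficients["log(keep)", ]
  c(beta    = unname(est["Estimate"]),
    std.err = unname(est["Std. Error"]),
    tvalue  = unname(est["t value"]))
}

# Toy panels (made-up data): deviations from the mean shrink or grow over time
periods <- 1:30
dev <- c(-0.2, 0, 0.2)
converging <- 1 + outer(dev, 1 / periods)
diverging  <- 1 + outer(dev * 0.1, sqrt(periods))
log_t_sketch(converging)["beta"]  # positive slope
log_t_sketch(diverging)["beta"]   # negative slope
```

In the toy data the estimated slope is positive when the cross-sectional dispersion shrinks over time and negative when it grows, which is the pattern the test is designed to pick up.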
When the null hypothesis of convergence is rejected by the log–t test for the whole sample, the test procedure should be repeated according to the following clustering mechanism:
(Cross–section last observation ordering): Sort units in descending order according to the last panel observation of the period;
(Core group formation): Run the log–t regression for the first k
units (
(Sieve the data for club membership): After the core group
(Recursion and stopping rule): If there are units for which the
previous condition fails, gather all these units in one group and
run the log–t test to see if the condition
Phillips and Sul (2007) suggest to make sure
Due to the fact that the number of identified clubs strongly depends on
the core group formation, a key role is played by the critical value
However, as the same authors suggest, a high value of
Take the first two groups detected in the basic clustering mechanism
and run the log–t test. If the t statistic is larger than
Repeat the test adding the next group and continue until the basic
condition (t statistic
If the convergence hypothesis is rejected, conclude that all previous groups converge, except the last one. Hence, restart the merging algorithm from the group for which the hypothesis of convergence did not hold.
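The merging steps above can be sketched as a simple loop. This is a schematic illustration rather than the package's code: merge_adjacent() and the function passed as tstat_fun are hypothetical names, and the toy function used below merely stands in for the log–t test.

```r
# Schematic sketch of merging adjacent groups (hypothetical helper names).
# groups:    list of groups, ordered as returned by the clustering step
# tstat_fun: function returning the t statistic of a convergence test
#            run on the pooled units of a candidate group
# threshold: critical value; merging requires a t statistic above it
merge_adjacent <- function(groups, tstat_fun, threshold = -1.65) {
  merged <- list()
  current <- groups[[1]]
  for (g in groups[-1]) {
    candidate <- c(current, g)
    if (tstat_fun(candidate) > threshold) {
      # Convergence not rejected: keep the enlarged group, try the next one
      current <- candidate
    } else {
      # Convergence rejected: close the current group and restart from g
      merged[[length(merged) + 1]] <- current
      current <- g
    }
  }
  merged[[length(merged) + 1]] <- current
  merged
}

# Toy stand-in for the log-t test: "converge" if values lie within 1 unit
toy_tstat <- function(x) if (diff(range(x)) <= 1) 2 else -5
groups <- list(c(10, 10.4), c(10.8), c(5, 5.2), c(5.9), c(1))
merged <- merge_adjacent(groups, toy_tstat)
lengths(merged)  # 3 3 1
```

Passing the test as a function keeps the loop independent of how the t statistic is computed, so the same skeleton can be used with any convergence test.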
In our package we also provide the implementation in R of an alternative club merging algorithm developed by Lyncker and Thoennessen (2017). They introduce two innovations in the club merging algorithm by Phillips and Sul. First, they add a further condition to the club clustering algorithm to avoid mistakes in merging procedures in the case of transition across clubs. Second, they propose an algorithm for diverging units. The first algorithm works as follows:
Take all the
Merge for adjacent groups starting from the first, under the
conditions
For the last element of vector M (the value of the last two clubs)
the only condition required for merging is
For the second algorithm, Lyncker and Thoennessen (2017) claim that units identified as divergent by the original clustering procedure by Phillips and Sul might not necessarily still diverge in the case of new convergence clubs detected with the club merging algorithm. To test if divergent units may be included in one of the new convergence clubs, they propose the following algorithm:
Run a log–t test for all diverging units; if
Run a log–t test for each diverging unit and each club, creating a
matrix of t–statistic values with dimension
Take the highest t–value greater than a critical parameter
The algorithm stops when no t–value
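A schematic version of this loop is given below. It is a sketch under stated assumptions, not the package's code: assign_divergent() and tstat_fun are hypothetical names, the toy test function stands in for the log–t test, and the sketch assumes that the unit attaining the highest t-value above the critical parameter is added to the corresponding club before the matrix is recomputed. The preliminary test on all diverging units together is not covered.

```r
# Schematic sketch (hypothetical helper names, assumed update rule).
# clubs:     list of clubs (vectors of units)
# divergent: vector of diverging units
# tstat_fun: function returning a t statistic for a candidate club
# estar:     critical parameter a t-value must exceed
assign_divergent <- function(clubs, divergent, tstat_fun, estar = -1.65) {
  while (length(divergent) > 0) {
    # Matrix of t statistics: rows = diverging units, columns = clubs
    tmat <- matrix(NA_real_, nrow = length(divergent), ncol = length(clubs))
    for (i in seq_along(divergent)) {
      for (j in seq_along(clubs)) {
        tmat[i, j] <- tstat_fun(c(clubs[[j]], divergent[i]))
      }
    }
    # Stop when no t-value exceeds the critical parameter
    if (max(tmat) <= estar) break
    # Assumed rule: move the unit with the highest t-value into that club
    best <- which(tmat == max(tmat), arr.ind = TRUE)[1, ]
    row <- best[["row"]]
    col <- best[["col"]]
    clubs[[col]] <- c(clubs[[col]], divergent[row])
    divergent <- divergent[-row]
  }
  list(clubs = clubs, divergent = divergent)
}

# Toy stand-in for the log-t test: tighter groups get higher values
toy_tstat <- function(x) 1 - diff(range(x))
res <- assign_divergent(list(c(10, 10.4), c(5, 5.2)),
                        divergent = c(10.8, 5.5, 1),
                        tstat_fun = toy_tstat, estar = 0)
res$divergent  # 1
```

In the toy run, 5.5 joins the second club first, then 10.8 joins the first club, and the remaining unit stays divergent because none of its t-values exceeds the critical parameter.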
ConvergenceClubs aims to make the clustering procedure described above easy to perform and readily reproducible.
The log–t test is performed by function estimateMod(). It takes as main input the vector of cross–sectional variances H, which can be obtained through function computeH():
# Compute cross-sectional variances
computeH(X, quantity = "H", id)
# Perform the log-t test
estimateMod(H, time_trim=1/3, HACmethod = c("FQSB", "AQSB"))
Function computeH() takes a matrix or data.frame object containing time series data and returns the quantity selected through argument quantity (by default, quantity = "H", the vector of cross–sectional variances). These quantities can also be computed on a subset of units by selecting the unit IDs through argument id. Function estimateMod() takes two additional arguments, time_trim and HACmethod, described later. These two functions are available for the user who wants to test the convergence hypothesis on a set of units. This is especially useful in the initial phase of a study, to assess whether the clustering procedure is worth carrying out.
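For instance, the convergence hypothesis could be tested on a subset of units as in the following sketch, which uses the filteredGDP dataset shipped with the package. It assumes that argument id accepts the row indices of the units to retain; the choice of the first 20 rows is arbitrary and purely illustrative.

```r
library(ConvergenceClubs)
data("filteredGDP")

# Assumption: `id` is given as row indices of the units to retain
H_sub <- computeH(filteredGDP[, -1], quantity = "H", id = 1:20)

# Log-t test on the selected units only
estimateMod(H_sub, time_trim = 1/3, HACmethod = "FQSB")
```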
Nonetheless, the log–t test over the whole sample is automatically performed before starting the clustering procedure by function findClubs(). This is the main function of the package, as it carries out Phillips and Sul’s clustering algorithm:
findClubs(X, dataCols, unit_names = NULL,
refCol, time_trim = 1/3, cstar = 0, HACmethod = c("FQSB", "AQSB"))
where X is a data frame containing the data, dataCols is an integer vector indicating the column indices of the time series data, and unit_names is an integer scalar indicating the index of the column of X that includes id codes for the units (e.g. the names of the countries/regions). The parameters of the clustering procedure are regulated by the following arguments:
refCol: takes an integer value representing the index of the column to use for ordering data;
time_trim: accepts a numeric scalar between 0 and 1, and indicates the portion of time periods to trim when running the log–t regression model. By default, time_trim=1/3, which means that the first third of the time series period is discarded, as suggested by Phillips and Sul (2007, 2009);
cstar: takes a scalar indicating the threshold value of the sieve criterion (c*);
HACmethod: accepts a character string indicating whether a Fixed Quadratic Spectral Bandwidth (HACmethod="FQSB") or an Adaptive Quadratic Spectral Bandwidth (HACmethod="AQSB") should be used for the truncation of the Quadratic Spectral kernel in estimating the log–t regression model with heteroskedasticity and autocorrelation consistent standard errors. The default method is FQSB.
The clustering procedure is performed by iteratively calling two internal functions, coreG() and club(), which implement steps 2 and 3 of Phillips and Sul’s clustering algorithm, respectively.
Function findClubs() returns an object belonging to the S3 class "convergence.clubs". Objects belonging to this class are lists that include results about clubs and divergent units that have been detected by the clustering procedure. Their structure can be analysed through function str(), and their elements can be accessed as commonly done with list elements.
Information about clubs and divergent units can be easily displayed by means of functions print() and summary(), for which the package provides specific methods for class "convergence.clubs". A plot() method is also available for class "convergence.clubs"; it provides a way to visualise the transition paths of the units included in convergence clubs, as well as the average transition paths for each club:
plot(x, y = NULL, nrows = NULL, ncols = NULL, clubs, avgTP = TRUE, avgTP_clubs,
y_fixed = FALSE, legend = FALSE, save = FALSE, filename, path, width = 7,
height = 7, device = c("pdf", "png", "jpeg"), res, ...)
Plot customisation (e.g. the clubs to be displayed or the number of rows and
columns of the graphical layout) and options to export the plot to a file are
discussed in more detail in the package manual (Sichera and Pizzuto 2019).
Finally, the merging algorithms described in the previous section are implemented in function mergeClubs():
mergeClubs(clubs, time_trim, mergeMethod = c("PS", "vLT"),
threshold = -1.65, mergeDivergent = FALSE, estar = -1.65)
Merging is performed on argument clubs, an object of class "convergence.clubs", by means of either the Phillips and Sul (2009) or the Lyncker and Thoennessen (2017) algorithm, selected through argument mergeMethod. Through argument threshold it is possible to change the critical value, and hence the significance level, of the log–t test for club merging. Moreover, argument mergeDivergent determines whether the test for diverging units according to Lyncker and Thoennessen (2017) should be performed, while argument estar is used to set the value of the critical parameter (e*) of that test. Function mergeClubs() returns an object of class "convergence.clubs" as well, thus information about the new clubs can be accessed and summarised as previously discussed.
A detailed example of all functionalities of the package is presented in the next section.
In this section we provide an example that replicates the results of Phillips and Sul (2009). The dataset GDP, available in package ConvergenceClubs, covers a panel of 152 countries for the period 1970-2003.
First, we filter the data using the Hodrick-Prescott filter methodology by means of function hpfilter in package mFilter (Balcilar 2018). Filtered data are also available in the package through dataset filteredGDP.
### Load ConvergenceClubs package
library(ConvergenceClubs)
### Load GDP data
data("GDP")
### Filter data
logGDP <- log(GDP[,-1])
filteredGDP <- apply(logGDP, 1,
function(x){mFilter::hpfilter(x, freq=400, type="lambda")$trend} )
filteredGDP <- data.frame(Countries = GDP[,1], t(filteredGDP), stringsAsFactors=FALSE )
colnames(filteredGDP) <- colnames(GDP)
## Filtered data are available in the package
data(filteredGDP)
By using the estimateMod() function we perform the log–t test over the whole sample in order to verify whether all units converge.
### log-t test over all units
H <- computeH(filteredGDP[,-1], quantity = "H")
round(estimateMod(H, time_trim=1/3, HACmethod = "FQSB"), 3)
# beta std.err tvalue pvalue
# -0.875 0.005 -159.555 0.000
The null hypothesis of convergence is rejected, since the t-value (-159.555) is far below the critical value of -1.65. We therefore proceed to identify convergence clubs through the findClubs() function. As for the arguments, we set:
unit_names=1 indicates that Countries’ IDs are represented in the first column of the dataset;
dataCols=2:35 indicates the columns (years) for which the test should be performed;
refCol=35 represents the final period according to which data should be ordered (see step 1 of the clustering algorithm);
time_trim=1/3 represents the portion of time periods to trim when running the log–t regression model;
cstar=0 is the threshold value of the sieve criterion;
HACmethod='FQSB' indicates that the Fixed Quadratic Spectral Bandwidth is used for the truncation of the Quadratic Spectral kernel in estimating the log–t regression model.
### Cluster Countries using GDP from year 1970 to year 2003, with 2003 as reference year
clubs <- findClubs(filteredGDP, dataCols=2:35, unit_names = 1, refCol=35,
time_trim=1/3, cstar=0, HACmethod = 'FQSB')
class(clubs)
# [1] "convergence.clubs" "list"
As we can see, clubs is an object of class "convergence.clubs", that is, a common list whose structure can be displayed through function str(), and whose elements can be accessed as usual:
str(clubs, give.attr=FALSE)
A method for function summary() is provided for class "convergence.clubs". It produces a summary table with the key results of the clustering procedure:
summary(clubs)
# Number of convergence clubs: 7
# Number of divergent units: 0
#
# | # of units | beta | std.err | tvalue
# -------- ------------- ---------- ---------- ----------
# club1 | 50 | 0.382 | 0.041 | 9.282
# club2 | 30 | 0.24 | 0.035 | 6.904
# club3 | 21 | 0.11 | 0.032 | 3.402
# club4 | 24 | 0.131 | 0.064 | 2.055
# club5 | 14 | 0.19 | 0.111 | 1.701
# club6 | 11 | 1.003 | 0.166 | 6.024
# club7 | 2 | -0.47 | 0.842 | -0.559
The summary shows that there are 7 clubs and no divergent units. For each club, the summary also reports how many units are included, the beta coefficient of the log–t test, its standard error, and the value of the t–statistic. This exercise replicates the results obtained by Phillips and Sul (2009), with one minor difference concerning the last two clubs (6 and 7). In the original paper, Phillips and Sul reported a divergent group of 13 countries. However, another iteration of the algorithm using these 13 countries suggests the presence of two clubs consisting of 11 and 2 countries, respectively (on this point see also Schnurbus et al. (2017) and Du (2018)).
As shown in the following example, information about the club composition can be obtained using the print() function. For brevity, only the output for the first club is shown.
## Print results
print(clubs)
# or just
clubs
# ========================================================================================
# club 1
# ----------------------------------------------------------------------------------------
# United.States, Norway, Bermuda, United.Arab.Emirates, Qatar, Luxembourg, Singapore,
# Switzerland, Hong.Kong, Denmark, Ireland, Austria, Australia, Canada, Macao,
# Netherlands, Kuwait, Iceland, United.Kingdom, Germany, France, Sweden, Belgium, Japan,
# Brunei, Finland, Italy, Cyprus, Puerto.Rico, Israel, New.Zealand, Taiwan, Spain, Malta,
# Korea..Republic.of, Portugal, Oman, Mauritius, Antigua, St..Kitts...Nevis, Chile,
# Malaysia, Equatorial.Guinea, Dominica, St.Vincent...Grenadines, Botswana, Thailand,
# Cape.Verde, China, Maldives
#
# beta: 0.3816
# std.err: 0.0411
# tvalue 9.2823
# pvalue: 1
#
# [...]
Transition path plots can be generated through the function plot(). Here we show some examples.
# Plot Transition Paths for all units in each club and average Transition Paths
# for all clubs
plot(clubs)
# Plot Transition Paths
plot(clubs, avgTP = FALSE, nrows = 4, ncols = 2, plot_args = list(type='l'))
# Plot only average Transition Paths of each club
plot(clubs, clubs=NULL, avgTP = TRUE, legend=TRUE, plot_args = list(type='o'))
The second and third commands produce the plots of the transition paths within each club and of the average transition paths of all clubs, respectively. In the first case, we can see how economies approach the steady state of each club. In the second case, the average transitional behaviour of the clubs can be compared.
Finally, we assess whether some clubs can be merged by using function mergeClubs(). The Phillips and Sul (2009) merging algorithm is chosen through the argument mergeMethod='PS'.
# Merge clusters using Phillips and Sul (2009) method
mclubs <- mergeClubs(clubs, mergeMethod='PS')
summary(mclubs)
# Number of convergence clubs: 6
# Number of divergent units: 0
#
# | merged clubs | # of regions | beta | std.err | tvalue
# -------- --------------- --------------- ---------- ---------- ----------
# club1 | clubs: 1 | 50 | 0.382 | 0.041 | 9.282
# club2 | clubs: 2 | 30 | 0.24 | 0.035 | 6.904
# club3 | clubs: 3 | 21 | 0.11 | 0.032 | 3.402
# club4 | clubs: 4, 5 | 38 | -0.044 | 0.07 | -0.636
# club5 | clubs: 6 | 11 | 1.003 | 0.166 | 6.024
# club6 | clubs: 7 | 2 | -0.47 | 0.842 | -0.559
According to the Phillips and Sul merging algorithm, former clubs 4 and 5 have been merged to form a new club (club 4), which now includes 38 countries (24+14).
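The alternative algorithm by Lyncker and Thoennessen (2017) can be requested through the same function. The call below is a minimal sketch that only relies on the arguments described earlier; the object name mclubs_vlt is arbitrary, and the resulting summary is not reported here.

```r
library(ConvergenceClubs)
data("filteredGDP")

# Re-create the clubs object from the clustering step
clubs <- findClubs(filteredGDP, dataCols = 2:35, unit_names = 1, refCol = 35,
                   time_trim = 1/3, cstar = 0, HACmethod = "FQSB")

# Merge clubs using the Lyncker and Thoennessen (2017) algorithm
mclubs_vlt <- mergeClubs(clubs, mergeMethod = "vLT")
summary(mclubs_vlt)
```

Since the returned object is again of class "convergence.clubs", it can be printed and plotted in the same way as the output of findClubs().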
In this paper, we have discussed the implementation in R of the Phillips and Sul (2007, 2009) clustering procedure by presenting the ConvergenceClubs package. The package allows for simple and intuitive application of this methodology, which has become predominant in the analysis of the convergence patterns of economies due to its positive attributes. We have provided functions to perform the log–t test and cluster units, as well as to merge existing clubs. We have also described functions to summarise and plot the information obtained through the application of the clustering algorithm, as well as a detailed example of the package functionalities.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sichera & Pizzuto, "ConvergenceClubs: A Package for Performing the Phillips and Sul's Club Convergence Clustering Procedure", The R Journal, 2019
BibTeX citation
@article{RJ-2019-021, author = {Sichera, Roberto and Pizzuto, Pietro}, title = {ConvergenceClubs: A Package for Performing the Phillips and Sul's Club Convergence Clustering Procedure}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {142-151} }