In network analysis, many community detection algorithms have been developed. However, their implementation leaves unaddressed the question of the statistical validation of the results. Here, we present robin (ROBustness In Network), an R package to assess the robustness of the community structure of a network found by one or more methods to give indications about their reliability. The procedure initially detects if the community structure found by a set of algorithms is statistically significant and then compares two selected detection algorithms on the same graph to choose the one that better fits the network of interest. We demonstrate the use of our package on the American College Football benchmark dataset.
Over the last twenty years, network science has become a strategic field of research thanks to the strong development of high-performance computing technologies. The activity and interaction of thousands of elements can now be measured simultaneously, allowing us to model cellular networks, social networks, communication networks, power grids, and trade networks, to cite a few examples. Different types of data will produce different types of networks in terms of structure, connectivity, and complexity. In the study of complex networks, a network is said to have a community structure if the nodes are densely connected within groups but sparsely connected between them (Girvan and Newman 2002). The inference of the community structure of a network is an important task. Communities allow us to create a large-scale map of a network since individual communities act like meta-nodes in the network, which makes its study easier. Moreover, community detection can predict missing links and identify false links in the network. Despite its difficulty, a huge number of methods for community detection have been developed to deal with different size complexity and made available to the scientific community by open-source software packages. In this paper, we will address a specific question: are the detected communities significant, or are they a result of chance only due to the positions of edges in the network?
An important answer to this question is the Order Statistics Local
Optimisation Method (
Another interesting approach is the Extraction of Statistically
Significant Communities (
(Kojaku and Masuda 2018) introduced the QStest (https://github.com/skojaku/qstest/), a method to statistically test the significance of individual communities in a given network. Their algorithm works with different detection algorithms using a quality function that is consistent with the one used in community detection and takes into account the dependence of the quality function value on the community size. QStest assesses the statistical significance under the configuration model too.
Very recently, (He et al. 2020) suggested the Detecting statistically Significant Communities (DSC) method, a significance-based community detection algorithm that uses a tight upper bound on the p-value under the configuration model coupled with an iterative local search method.
OSLOM, ESSC, and DSC assess the statistical significance of every single community analytically while QStest adopts the sampling method to calculate the p-value of a given community. Moreover, all of them detect statistically significant communities under the configuration model, and only QStest is independent of the detection algorithm.
We present robin (ROBustness In Network), an R/CRAN package whose purpose is to give clear indications about the reliability of one or more community detection algorithms understudy, analyzing their robustness with respect to random perturbations. The idea behind robin is that if a partition is significant, it will be recovered even if the structure of the graph is modified. Alternatively, if the partition is not significant, minimal modifications of the graph will be sufficient to change it. robin is inspired by the concept presented by (Carissimo et al. 2018), who studied the stability of the recovered partition against random perturbations of the original graph structure using tools from Functional Data Analysis (FDA).
robin provides the best choice among the variety of the existing methods for the network of interest. It is based on a procedure that gives the opportunity to use the community detection techniques implemented in the igraph package (Csardi and Nepusz 2019) while providing the user with the possibility to include other community detection algorithms. robin initially detects if the community structure found by some algorithms is statistically significant, then it compares the different selected detection algorithms on the same network. robin assumes undirected graphs without loops and multiple edges.
robin looks at the global stability of the detected partition and not of single communities but accepts any detection algorithm and any random model, and these aspects differentiate it from OSLOM, ESSC, DSC, and QStest. Unlike other studies that treat the comparison between algorithms in a theoretical way, such as (Yang et al. 2016), robin aims to give a practical answer to such a comparison that can vary with the network of interest.
robin implements a methodology that examines the stability of the recovered partition by one or more algorithms. The methodology is useful for two purposes: to detect if the community structure found is statistically significant or is a result of chance; to choose the detection algorithm that better fits the network under study. These are implemented following two different workflows.
The first workflow tests the stability of the partitions found by a single community detection algorithm against random perturbations of the original graph structure. To address this issue, we specify a perturbation strategy (see subsection 2.1) and a null model to build some procedures based on a prefixed stability measure (see subsection 2.2). Given:
Our process builds two curves as functions of the perturbation level
The first curve
The comparison between the two M curves enables us to reconsider the problem regarding the significance of the retrieved community structure in the context of stability/robustness of the recovered partition against perturbations. The basic idea is that if small changes in the network cause a completely different grouping of the data, the detected communities are not reliable. For a better understanding of this point, we refer the reader to the paper (Carissimo et al. 2018) where the methodology was developed.
The choice of the null model plays a key role because we would expect it
to reproduce the same structure of the real network but with completely
random edges. For this reason, robin offers two possibilities: a
degree preserving randomization by using the rewire
function of the
igraph package or a model
chosen by the user.
The degree preserving randomization, i.e., Configuration Model (CM), is a model able to capture and preserve strongly heterogeneous degree distributions often encountered in real network data sets and is the standard null model for empirical patterns. Nevertheless, it can happen that it is not sufficient to preserve only the degree of the graph understudy, so robin allows the user to include their own null model.
In section 4, we explore the
The first workflow is summarised as follows (see Figure 2):
find a partition
perturb both networks,
retrieve two new partitions
calculate two clustering distances (for the real network and the
null network) between the original partitions and the ones obtained
from the perturbed network as:
Steps 2) - 4) are computed at different perturbation levels
This procedure allows the filtering of the detection algorithms according to their performance. Moreover, the selected ones can be compared using the second workflow.
The second workflow helps to choose among different community detection algorithms the one that best fits the network of interest, comparing their robustness two at a time. More precisely, the technique (see Figure 3):
find two partitions
perturb the network creating a new one,
retrieve two new partitions
evaluate
Steps 2) - 4) are repeated at different perturbation levels
The perturbed network has been restricted to have the same number of
vertices and edges as the original unperturbed network. Therefore, only
the positions of the edges are changed. It is expected that if a
community structure is robust, it should be stable under small
perturbations of the edges. This is because perturbing the network edges
by a small amount will imply just a small percentage of nodes to be
moved in different communities; on the other hand, perturbing a high
percentage of the edges in the network will produce random clusters.
Note that zero perturbation
Two different procedures for the perturbation strategy are implemented,
namely independent and dependent types. The independent strategy
introduces a percentage
In particular, we noticed that the greatest modification of the network
structure happens for a perturbation level between
We stress again that the M curve for a network with a strong structure
grows rapidly (perturbation level between 0
Moreover, the choice
Varying the percentage of perturbation, many graphs are generated and
compared by means of the stability measure chosen. For each perturbation
level, we generated 10 perturbed graphs and calculated the stability
measure. From each of these graphs, we generated 9 more by rewiring an
additional
The procedure we implemented is based on four different stability measures:
VI measures the amount of information lost and gained in changing from
one cluster to another, while split-join distance calculates the number
of nodes that have to be exchanged to transform any of the two
clusterings into the other; but for both of them, low values represent
more similar clusters, and high values represent more different
clusters. On the contrary, NMI and ARI are similarity measures, and
therefore, lower values identify more different clusters and higher
values more similar ones. To make all the measures comparable, we
considered the 1-1 transformation for the NMI and the ARI since they
vary between
robin allows different multiple statistical tests to check the
differences between the real and the random curve or between the curves
built from two different detection algorithms. The variation of
The first is a test based on the Gaussian Process regression (GP)
described in (Kalaitzis and Lawrence 2011b). In this paper, the authors
use GP to compare treatment and control profiles in biological
time-course experiments. The main idea is to test if two time series
represent the same or two different temporal processes. A Gaussian
process is a collection of random variables, any finite number of which
have a joint Gaussian distribution and is completely specified by its
mean function and its covariance function, see e.g., (Rasmussen and Williams 2006).
Given the mean function
In this framework, the hypothesis testing problem over the perturbation
interval
The second test implemented is based on the Interval Testing Procedure (ITP) described in (Pini and Vantini 2016). The ITP provides an interval-wise nonparametric functional testing and is not only able to assess the equality in distribution between functions, but also to underline specific differences. Indeed, users can see where are localized the differences between the two curves. The Interval Testing Procedure is based on:
Basis Expansion: functional data are projected on a functional basis (i.e. Fourier or B-splines expansion);
Interval-Wise Testing: statistical tests are performed on each interval of basis coefficients;
Multiple Correction: for each component of the basis expansion, an adjusted p-value is computed from the p-values of the tests performed in the previous step.
In summary, GP provides a global answer on the dissimilarity of the two M curves, while ITP points out local changes between such curves. As a rule of thumb, we suggest initially using GP to flag a difference and then ITP to understand at which level of perturbation such a difference is locally significant.
We also provide a global method to quantify the differences between the curves when they are very close. This is based on the calculation of the area under the curves with a spline approach.
Once in the R environment, it is possible to install and load the robin package with its dependencies, as follows:
install.packages("robin")
The robin package includes as dependencies igraph (Csardi and Nepusz 2019), networkD3 (Gandrud et al. 2017), ggplot2 (Wickham 2019), gridExtra (Auguie 2017), fdatest (Pini and Vantini 2015) , gprege (Kalaitzis and Lawrence 2011a), and DescTools (Signorell and al. 2019) packages. All, except gprege which is a Bioconductor package, are automatically loaded with the command:
library(robin).
To install the gprege package, start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("gprege")
robin is a user-friendly software providing some additional functions
for data import and visualization, such as prepGraph
, plotGraph
, and
plotComm
. The function prepGraph
, required by the procedure, reads,
and simplifies undirected graphs removing loops and multiple edges. The
available input graphs formats are: “edgelist”, “pajek”, “ncol”, “lgl”,
“graphml”, “dimacs”, “graphdb”, “gml”, “dl”, and an igraph object. The
function plotGraph
, with the aid of the network3D package, starting
from an igraph object loaded with prepGraph
, shows an interactive 3D
graphical representation of the network, useful to visualize the network
of interest before the analysis. Furthermore, the function plotComm
helps to plot a graph with colorful nodes that simplifies the
visualization of the detected communities, given the membership of the
communities.
robin embeds all the community detection algorithms present in igraph. They can be classified as in (Fortunato 2009)
modularity based methods:
cluster_fast_greedy
(Clauset et al. 2005)cluster_leading_eigen
(Newman 2006)cluster_louvain
(Blondel et al. 2008)divisive algorithms:
cluster_edge_betweenness
(Newman and Girvan 2004)methods based on statistical inference:
cluster_infomap
(Rosvall and Bergstrom 2008)dynamic algorithms:
cluster_spinglass
(Reichardt and Bornholdt 2006)cluster_walktrap
(Pons and Latapy 2005)alternative methods:
cluster_label_prop
(Raghavan et al. 2007).robin gives the possibility to input a custom external function to
detect the communities. The user can provide his own function as a value
of the parameter FUN
in both analyses, implemented into the functions
robinRobust
and robinCompare
. These two functions create the
internal process for perturbation and measurement of communities
stability. In particular robinRobust
tests the robustness of a chosen
detection algorithm and robinCompare
compares two different detection
algorithms. The option measure
in the robinRobust
and robinCompare
functions provides the flexibility to choose between the four different
measures listed in the subsection 2.2.
robin offers two choices for the null model to set up for
robinRobust
:
random
.The function random
creates a random graph with the same degree
distribution as the original graph, but with completely random edges, by
using the rewire
function of the igraph package with the
keeping_degseq
option that preserves the degree distribution of the
original network. The function rewire
randomly assigns a number of
edges between vertices with the given degree distribution. Note that
robin assumes undirected graphs without loops and multiple edges which
are directly created, from any input graph, by the function prepGraph
.
The plotRobin
function allows the user to generate two curves based on
the computation of the chosen stability measures.
When plotRobin
is used on the output of robinRobust
, i.e., the first
step of the overall procedure, the first curve represents the measure
between the partition of the original unperturbed graph and the
partition of each perturbed graph (blue curve in Figure
4-Left panel), and the second curve is obtained
in the same way but considering as the original graph the random graph
(red curve in Figure 4-Left panel). The
comparison between the two curves turns the question about the
significance of the retrieved community structure into the study of the
robustness of the recovered partition against perturbation.
When plotRobin
is used on the output of robinCompare
, i.e. the
second step of the overall procedure, it generates a plot that depicts
two curves, one for each clustering algorithm. In the right panel of
Figure 4, each curve is obtained by computing the
measure between the partition of the original unperturbed graph with the
partition of each perturbed graph, where the partition method is either
Louvain (blue curve) or Fast Greedy (red curve).
The GP test is implemented in robinGPTest
and uses the R package
gprege (Kalaitzis and Lawrence 2011a). The ITP test is implemented in
robinFDATest
and uses the R package fdatest (Pini and Vantini 2015). The
area under the curves is calculated by the function robinAUC
and
relies on the DescTools package. Figure 5
shows the curves for the comparison of Louvain and Fast greedy
algorithms’ performance generated by the VI stability measure using the
Interval Testing Procedure on the American College Football network
(left panel) (Girvan and Newman 2002) and corresponding adjusted
p-values (right panel).
All the functions implemented in robin are summarized in Table 1.
Function | Description |
---|---|
Import/Manipulation | |
prepGraph |
Management and preprocessing of input graph |
random |
Building of null model |
Analysis | |
robinRobust |
Comparison of a community detection method versus random perturbations of the original graph |
robinCompare |
Comparison of two different community detection methods |
Visualization | |
methodCommunity |
Detection of the community structure |
membershipCommunities |
Detection of the membership vector of the community structure |
plotGraph |
Graphical interactive representation of the network |
plotComm |
Graphical interactive representation of the network and its communities |
plotRobin |
Plots of the two curves |
Test | |
robinGPTest |
GP test and evaluation of the Bayes factor |
robinFDATest |
ITP test and evaluation of the adjusted p-values |
robinAUC |
Evaluation of the area under the curve |
The time complexity of the proposed strategy, more precisely of the
robinRobust
function, is evaluated as follows. Generating a rewired
network with
In Table 2, we show the computational time evaluated on a
computer with an Intel 4 GHz i7-4790K processor and 24GB of memory. In
this example, we used
100 |
500 | 2.1 |
1000 |
9267 | 36.1 |
10000 |
100024 | 361.8 |
100000 |
994053 | 9411.6 |
1000000 |
8105913 | 110981.5 |
robin includes the American College football benchmark dataset as an analysis example that is freely available at http://www-personal.umich.edu/~mejn/netdata/. The dataset contains the network of United States football games between Division I colleges during the 2000 season (Girvan and Newman 2002). It is a network of 115 vertices that represent teams (identified by their college names) and 613 edges that represent regular-season games between the two teams they connect. The graph has the ground truth, where each node has a value that indicates to which of the 12 conferences it belongs, and this offers a good opportunity to test the ability of robin to validate the community robustness. It is known that each conference contains around 8-12 teams. The games are more frequent between members of the same conference than between members of different conferences. They are on average seven between teams of the same conference and four between different ones. We applied all the methods listed in subsection 3.3 to this network, choosing as measure the VI metric.
library(robin)
my_network <- system.file("example/football.gml", package="robin")
graph <- prepGraph(file=my_network, file.format="gml")
attributes <- vertex_attr(graph, index = V(graph))
real <- attributes$value
real <- as_membership(real)
set.seed(10)
members_In <- membershipCommunities(graph=graph, method=DA)
VI_In <- compare(real, members_In, method="vi")
Note that the variable compare
is contained in the
package igraph and permits the assessment of the distance between two
community structures according to the chosen method.
Table 3 summarizes the VI results calculated between the real communities and the ones that the detection algorithms created.
cluster_infomap |
0.054 |
cluster_spinglass |
0.063 |
cluster_louvain |
0.076 |
cluster_label_prop |
0.076 |
cluster_walktrap |
0.078 |
cluster_edge_betweenness |
0.083 |
cluster_fast_greedy |
0.185 |
cluster_leading_eigen |
0.196 |
It is possible to observe that the best performance is offered by Infomap, having the lowest VI value, followed by Spinglass. Louvain, Propagating Labels, Walktrap, and Edge betweenness have a similar intermediate VI value, while the worst performance is given by Fast greedy and Leading eigenvector. Then, we used robin to check if the results are confirmed by looking at the VI curves and the results of the testing procedure for the second workflow, i.e., the one comparing two detection algorithms, considering Infomap versus all the others.
comp <- robinCompare(graph=graph, method1=DA1,
method2=DA2, measure="vi", type="independent")
plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2,
measure="vi")
Figure 6 shows the results we
obtained. If we focus on the perturbation interval
In our overall procedure, we explored two different ways of generating a null model, namely the Configuration Model (CM) and the dk-series model.
graphRandomCM <- random(graph=graph)
graphRandomDK <- prepGraph(file="dk2.1_footballEdgelist.txt",
file.format = "edgelist")
plotGraph(graph)
plotGraph(graphRandomCM)
plotGraph(graphRandomDK)
The different structures provided by the real data network, CM, and
dk-series with
The CM generates a random graph with the same degree sequence as the
original one but with a randomized group structure. Our experiments show
that CM is not a good null model when using Propagating Labels and
Infomap as community extraction methods (Figure
8). In fact, when the modularity is low,
these two algorithms tend to assign all the nodes to the same community,
hence resulting in a flat stability measure curve. We launched the
function robinRobust
to assess the robustness of each detection
algorithm.
proc_CM <- robinRobust(graph=graph, graphRandom=graphRandomCM,
measure="vi", method=DA, type="independent")
plotRobin(graph=graph, model1=proc_CM$Mean,model2=proc_CM$MeanRandom,
measure="vi")
The dk-series model generates a random graph preserving the global
organization of the original network at various increasing levels of
details chosen by the user via the setting of the parameter
proc_DK <- robinRobust(graph=graph, graphRandom=graphRandomDK,
measure="vi", method="fastGreedy", type="independent")
plotRobin(graph=graph, model1=proc_DK$Mean, model2=proc_DK$MeanRandom,
measure="vi")
Figure 9 shows the stability measure curves of each detection algorithm compared to dk 2.1 null model. For all the methods, the two curves are very close due to the capability of the null model to preserve a structure similar to the real network and visually confirm the results in Table 3.
Moreover, for the dk-series model, we tested the differences between
the two curves using the GP methodology implemented in the function in
robinGPTest
.
BFdk <- robinGPTest(model1=proc_DK$Mean, model2=proc_DK$MeanRandom)
The results are shown in Table 4 and agree with those shown in Table 3.
AUC | ||
---|---|---|
cluster_infomap |
53.67 | 1.133 |
cluster_spinglass |
40.55 | 1.249 |
cluster_louvain |
31.52 | 1.208 |
cluster_label_prop |
102.2 | 0.852 |
cluster_walktrap |
30.74 | 1.204 |
cluster_edge_betweenness |
31.10 | 1.194 |
cluster_fast_greedy |
0.001 | 1.017 |
cluster_leading_eigen |
8.474 | 1.042 |
Fastgreedy clearly fails in recovering the communities. LeadingEigen has
stronger evidence but too weak when compared to the other methods.
Louvain, Walktrap, and EdgeBetweenness have the same strong evidence
followed by Spinglass and Infomap. LabelProp shows the strongest
evidence, but the result is obviously influenced by the swap between the
two curves when the perturbation is greater than 20
AUC <- robinAUC(graph=graph, model1=proc_DK$Mean,
model2=proc_DK$MeanRandom, measure="vi")
AUCdkratio <- AUC$area2/AUC$area1
Also, note that LabelProp originates the paradox that the AUC of the real model curve exceeds the AUC of the null network, despite the hypothesis testing result is positive. Hence, it is always the case to look at the plots and AUC ratios.
In this paper, we presented robin, an R/CRAN package, to assess the
robustness of the community structure of a network found by one or more
detection methods, providing an indication of their reliability. The
procedure implemented is useful for comparing different community
detection algorithms and choosing the one that best fits the network of
interest. More precisely, robin initially detects if the community
structure found by some algorithms is statistically significant, then it
compares the different selected detection algorithms on the same
network. robin uses analysis tools set up for functional data
analysis, such as robinRobust
and robinCompare
, which build the stability
measure curves for the null model and the network understudy for a fixed
detection algorithm and the stability measure for the network understudy
for two detection algorithms, respectively. Moreover, robinGPTest
and
robinFDATest
implement the GP test and the ITP test. We illustrated
the usage of the package on a benchmark dataset. The package is
available on CRAN at https://CRAN.R-project.org/package=robin.
The results in this paper were obtained using R 3.6.1 with the packages igraph version 1.2.4.2, networkD3 version 0.4, ggplot2 version 3.2.1, gridExtra version 2.3, fdatest version 2.1, gprege version 1.30.0, and DescTools version 0.99.31. R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.
This work was supported by the project Piattaforma Tecnologica ADVISE - Regione Campania, by the project “TAILOR” (H2020-ICT-48 GA: 952215) and by DiSTABiF at the University of Campania Luigi Vanvitelli that is coordinating V.P. PhD program.
V.P. implemented the software and analyzed its properties. D.R. supported V.P. in R package implementation. A.C., L.C., and I.D.F. conceived the work, equally contributed to the development and implementation of the concept, discussed and analyzed the results. A.C., L.C., I.D.F., and V.P. wrote the manuscript.
robin, igraph, networkD3, ggplot2, gridExtra, fdatest, DescTools
FunctionalData, GraphicalModels, MissingData, Optimization, Phylogenetics, Spatial, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Policastro, et al., "ROBustness In Network (robin): an R Package for Comparison and Validation of Communities ", The R Journal, 2021
BibTeX citation
@article{RJ-2021-040, author = {Policastro, Valeria and Righelli, Dario and Carissimo, Annamaria and Cutillo, Luisa and Feis, Italia De}, title = {ROBustness In Network (robin): an R Package for Comparison and Validation of Communities }, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {1}, issn = {2073-4859}, pages = {276-293} }