The latent budget model is a mixture model for compositional data sets in which the entries, a contingency table, may be either realizations from a product multinomial distribution or distribution free. Based on this model, the latent budget analysis considers the interactions of two variables: the explanatory (row) and the response (column) variables. The package lba uses expectation-maximization and the active constraints method (ACM) to carry out, respectively, the maximum likelihood and the least squares estimation of the model parameters. It contains three main functions: lba, which performs the analysis; goodnessfit, for model selection and goodness of fit; and the plotting functions plotcorr and plotlba, used as a help in the interpretation of the results.
The idea of a latent budget was first proposed by Goodman (1974), who wanted to show a method to analyse the relationship between a set of qualitative variables when some of them are manifest variables and others are unobservable, or latent, variables. These ideas were later elaborated by Clogg (1981), who interpreted a simple latent class model in an asymmetric way. Independently, de Leeuw and van der Heijden (1988) introduced the model and named it latent budget analysis because they used it to analyse time-budget data. The model was also introduced independently in geology by Renner (1988), where it is known as the endmember model.
LBA is an analysis method for compositional data which is basically an
LBA allows us to find out which categories of the response variables are related to different groups of the explanatory categories. If the table has a product multinomial distribution we can understand the latent budget model (LBM) as explaining the relationship between the explanatory and the response variables. It is done by assuming that conditioned on the latent variable they are independent. In that sense, the latent budgets, which are categories of a latent variable, are hidden values which explain the relationship between the explanatory and response variables. LBA reduces the dimensionality of the original problem, thus making it easier to understand its hidden relations.
Examples of latent budget models in sociological research, political sciences and other areas where categorical variables are used can be found in Van der Ark (1999a). Generalizations of the LBA method may be found in Siciliano and Heijden (1994), Siciliano and Mooijaart (2001) and Aria (2008). However, we found few articles in our literature search that used LBA in their data analyses. Larrosa (2005) shows how to use LBA in applications to economics. Tambrea and Siciliano (1999) proposed an LBA approach for three-way tables in a business field. We can also cite Aquilia et al. (2015) in geology, Ros-Freixedes and Estany (2014) in biology, and Aria et al. (2003) in food engineering.
In our view, LBA is not more widely used because of the lack of available software. The software “A freeware computer program to perform latent budget analysis”, written by L. Andries van der Ark (Van der Ark 1999b) in Borland Pascal 7.0, only runs under MS-DOS. Unfortunately, it is no longer available at http://come.to/lba/software. The only software we could find that performs LBA is CoDaPack (http://ima.udg.edu/codapack/), which until recently only ran on the Windows operating system. Therefore, because of the importance LBA has in categorical data analysis, the authors decided to write the lba package.
This is the first package for this type of analysis in R. The package is available from both the Comprehensive R Archive Network at http://cran.r-project.org/web/packages/lba/index.html and the lba project web site at https://github.com/ivanalaman/lba.
LBM is a mixture model for compositional data. A row of compositional
data is called a composition or a budget and its elements are the
components. We will follow the notation of (Van der Ark 1999a). He says: “by
performing LBA we approximate
The original contingency table is
row total:
column total:
total:
The compositional data matrix
The row vectors
The latent budgets are represented by
The elements of
Quoting (Van der Ark 1999a)
The latent budgets can be characterized by being compared to the latent budgets of LBM(1). LBM(1) is the independence model with $K = 1$ and $\beta_{j|1} = p_{+j}$, in this case $\pi_{j|i} = p_{+j}$. Hence, if latent component $\beta_{j|k} > p_{+j}$, then $\beta_k$ is characterized by the $j$-th category. On the other hand, if $\beta_{j|k} < p_{+j}$, then the $j$-th category is of lesser importance. The relative importance of each latent budget, in terms of how much of the expected data they account for, is expressed by the budget proportions $\pi_k$. The $\pi_k$ parameter also denotes the probability of latent budget $k$ when there is no information about the level of the row variable. To understand how the expected budgets are constructed to form the latent budgets, we must compare the mixing parameters $\alpha_{k|i}$ to $\pi_k$. If $\alpha_{k|i} > \pi_k$ then the expected budget $\pi_i$ is characterized more than average by latent budget $\beta_k$; otherwise, if $\alpha_{k|i} < \pi_k$, then the expected budget $\pi_i$ is characterized less than average by latent budget $\beta_k$. In practice, the mixture model interpretation is easier to carry out when we first characterize the latent budgets and then interpret the expected budgets in terms of them.
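The comparison rule described above can be tried out on a small, made-up set of parameters. The sketch below is only an illustration in base R: the matrices `A`, `B` and the row proportions `row_prop` are invented numbers, not output of the lba package, and the budget proportions are assumed to be the row-proportion-weighted averages of the mixing parameters.

```r
# Made-up parameters for an LBM with I = 3 rows, J = 4 columns and K = 2 budgets.
A <- rbind(c(0.9, 0.1),               # mixing parameters: rows sum to one
           c(0.5, 0.5),
           c(0.2, 0.8))
B <- cbind(c(0.6, 0.2, 0.1, 0.1),     # latent components: columns sum to one
           c(0.1, 0.1, 0.3, 0.5))
row_prop <- c(0.5, 0.3, 0.2)          # hypothetical row proportions

# Expected budgets: each row is a mixture of the K latent budgets.
Pi <- A %*% t(B)

# Reference budget of LBM(1) implied by the model, and the budget proportions.
p_col <- as.vector(row_prop %*% Pi)   # marginal column proportions
pi_k  <- as.vector(row_prop %*% A)    # budget proportions

# Components above the marginal proportion characterize a latent budget.
char_B <- B > p_col                   # J x K logical matrix
# Mixing parameters above pi_k mark budgets that are over-represented in a row.
char_A <- sweep(A, 2, pi_k, ">")

print(pi_k)
print(char_B)
print(char_A)
```

Reading `char_B` column by column tells which categories characterize each latent budget, and reading `char_A` row by row tells which latent budgets weigh more than average in each expected budget.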
Compositional data that follow the product multinomial sampling scheme may be estimated by maximum likelihood (MLE), computed with the EM algorithm (Dempster et al. 1977). On the other hand, if the data cannot be assumed to follow that distribution, using MLE is not recommended. Following Van der Ark (1999a), we opted to estimate by weighted least squares (WLS), since it is a distribution free method, that is, one which does not assume any probability distribution for the data; see Mooijaart et al. (1999).
The MLE for compositional data under a product-multinomial distribution is described by de Leeuw et al. (1990). The log likelihood is:
In this case,
The
The EM algorithm proceeds iteratively. Beginning with arbitrary initial
values of
In the maximization (M) step, given
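The E and M steps can be sketched in a few lines of base R. This is a minimal illustration of the standard EM updates for a mixture of this kind, not the code used inside the lba package; the function name `lbm_em` and its arguments are hypothetical, and the table is assumed to have no empty rows or columns. The counts in the usage lines are those of the BMI table analysed later in the article.

```r
# Minimal EM sketch for the unconstrained latent budget model.
# N: I x J matrix of counts; K: number of latent budgets.
# A (I x K) has rows summing to one; B (J x K) has columns summing to one.
lbm_em <- function(N, K, max_iter = 500, tol = 1e-8) {
  I <- nrow(N); J <- ncol(N)
  A <- matrix(runif(I * K), I, K); A <- A / rowSums(A)
  B <- matrix(runif(J * K), J, K); B <- sweep(B, 2, colSums(B), "/")
  # Log likelihood up to an additive constant; empty cells contribute zero.
  loglik <- function(A, B) {
    Pi <- A %*% t(B)
    sum(N[N > 0] * log(Pi[N > 0]))
  }
  ll_old <- loglik(A, B)
  ll_new <- ll_old
  for (iter in seq_len(max_iter)) {
    Pi <- A %*% t(B)
    A_new <- matrix(0, I, K); B_new <- matrix(0, J, K)
    for (k in seq_len(K)) {
      # E step: share of cell (i, j) attributed to latent budget k.
      post <- ifelse(Pi > 0, outer(A[, k], B[, k]) / Pi, 0)
      Nk <- N * post
      # M step: accumulate the expected counts per row and per column.
      A_new[, k] <- rowSums(Nk)
      B_new[, k] <- colSums(Nk)
    }
    A <- A_new / rowSums(A_new)                 # rows of A sum to one
    B <- sweep(B_new, 2, colSums(B_new), "/")   # columns of B sum to one
    ll_new <- loglik(A, B)
    if (abs(ll_new - ll_old) < tol) break
    ll_old <- ll_new
  }
  list(A = A, B = B, logLik = ll_new, iterations = iter)
}

# Usage on the BMI counts (rows: BMI classes, columns: conditions).
N <- rbind(c(29, 14, 28, 8, 18),
           c( 4,  2, 15, 6,  0),
           c( 1,  2,  6, 5,  0))
set.seed(1)
fit <- lbm_em(N, K = 2)
round(fit$A, 3)
```

The estimates returned by such a sketch are unidentified, so they should not be expected to match the identified matrices reported by lba.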
The WLS function to be minimized is:
The weights
The lba package follows the algorithm completely described in
(Van der Ark 1999a) and (Mooijaart et al. 1999) called the active constraints method
(ACM). The method is a minimization with constraints. The equality
constraints are the row sums of matrix
In general, the LBM is not identifiable, meaning that there could be
various sets of parameters yielding the same goodness of fit
(van der Ark et al. 1999). In fact, as (de Leeuw et al. 1990) show,
We follow (van der Ark et al. 1999) in which he shows that
The lba package follows the solutions proposed by (de Leeuw et al. 1990) for LBM(2), as described in (Van der Ark 1999a) Chapter 2, when discussing the geometry of LBM(2).
In LBM(2), the unidentified latent budgets $\beta_1$ and $\beta_2$ can be viewed as two vectors in a J-dimensional space. The heads of any two vectors can be connected by a line segment, denoted by V, which is a subset of a line S. The expected budgets $\pi_i$ are J-dimensional vectors and convex combinations of $\beta_1$ and $\beta_2$. Therefore, the heads of $\pi_i$ lie on V, and the relative distance from $\pi_i$ to $\beta_1$ and $\beta_2$ is expressed by the mixing parameters. The unidentified latent budgets $\beta_1$ and $\beta_2$, collected in $B$, can be transformed into another pair of latent budgets on S without changing the expected budgets.
The region of budgets is denoted by
The matrices
In the outer extreme solution the latent budgets are as different as possible, simplifying their interpretation in most cases. In the inner extreme solution, the latent budgets are as similar as possible. At the same time, the mixing parameters will be as different as possible.
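The lack of identifiability behind the inner and outer extreme solutions can be checked numerically. The following base R sketch uses invented matrices and a hypothetical transformation matrix `Tm` whose rows sum to one; it assumes the convention that the rows of A and the columns of B sum to one, so that the expected budgets are `A %*% t(B)`.

```r
# Made-up unidentified solution for I = 3, J = 3, K = 2.
A <- rbind(c(0.8, 0.2),
           c(0.5, 0.5),
           c(0.3, 0.7))
B <- cbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.3, 0.6))

# Hypothetical transformation matrix; each row sums to one.
Tm <- rbind(c(0.9, 0.1),
            c(0.2, 0.8))

A_star <- A %*% solve(Tm)   # transformed mixing parameters
B_star <- B %*% t(Tm)       # transformed latent budgets (columns of B_star)

Pi      <- A %*% t(B)
Pi_star <- A_star %*% t(B_star)

all.equal(Pi, Pi_star)      # both parameter sets give the same expected budgets
round(A_star, 3)
round(B_star, 3)
```

With this choice of `Tm` the two latent budgets move towards each other while the mixing parameters move apart, which is the direction of the inner extreme solution; the expected budgets, and therefore the fit, are unchanged.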
(Van der Ark 1999a) uses the following criteria to identify the solutions:
minimize
Where
In order to find those minimal solutions, the lba package uses the constrOptim.nl function from the alabama package (Varadhan 2015). In this case the “BFGS” algorithm is used.
Parameters in LBM may be subject to optional constraints, which can be imposed by a researcher either to test specific hypotheses about the model, to facilitate its interpretation, or to build complex models.
There are three different types of optional constraints, namely fixed value constraints, equality constraints, and multinomial logit constraints.
Fixed value constraints have the form
Equality constraints have the form
The multinomial logit constraints were introduced in LBA by
(van der Heijden et al. 1992), and have the following form:
The
The degrees of freedom, according to de Leeuw et al. (1990), are the number of independent cells minus the number of independent parameters. For compositional data the number of independent cells is always $I(J - 1)$.
The lba package only runs the identifiability function when there
are no optional constraints in the model. This is due to the fact that
it is not possible to maintain the constraints while running that
function. Therefore users who use optional constraints should use
The maximum likelihood estimation is adjusted for fixed value constraint according to (van der Heijden et al. 1992).
Let
Optional equality constraints for parameter estimates obtained with the
EM algorithm are described in (Mooijaart and van der Heijden 1992); see also
(van der Heijden et al. 1992). For the mixing parameters, if
The remaining parameters should be updated by using equations (1) and (2).
van der Heijden et al. (1992) warn that the estimation of optional equality constraints (in combination with fixed value constraints) by the EM algorithm is not always correct. The lba package takes this into account automatically and uses the alabama package to estimate the parameters when necessary.
The estimation of the parameters under multinomial logit constraints is described in van der Heijden et al. (1992). The complete data log likelihood function can be split into two parts, where one depends only on the row covariates and the other only on the column covariates; therefore, the E-step of the EM algorithm is the same as in the unconstrained LBM. The M-step is implemented by making use of the optim function and the alabama package.
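To make the idea of a multinomial logit constraint concrete, the sketch below builds mixing parameters from row covariates in base R. It assumes the usual multinomial logit form, in which each mixing parameter is proportional to the exponential of a linear combination of the row covariates; the covariate matrix `X`, the coefficient matrix `omega` and the helper `softmax_rows` are hypothetical names chosen for the illustration and are not arguments or functions of the lba package.

```r
# Row-wise softmax: turns linear predictors into probabilities per row.
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)   # subtract row maxima to avoid overflow in exp()
  e <- exp(eta)
  e / rowSums(e)
}

# Hypothetical design: an intercept plus one covariate for I = 4 rows.
X <- cbind(1, c(-1, 0, 1, 2))
# Hypothetical coefficients for K = 2 budgets; the first budget is the reference.
omega <- cbind(c(0, 0), c(0.5, -1.2))

A <- softmax_rows(X %*% omega)      # I x K mixing parameters, rows sum to one
round(A, 3)
```

In this parameterization the constraints on the rows of A hold automatically, and the number of free parameters depends on the number of covariates rather than on the number of rows.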
Depending on the values of the matrices 1e6
. Also, whenever the values of the
row or column covariates are not supplied, lba creates random
values from the standard normal distribution.
Note:
Depending on the starting parameters, all algorithms cited above may
only locate a local, rather than global, maximum or minimum. This
becomes more and more of a problem as
Sometimes label switching may occur. Usually the interpretation remains the same but the labels of the budgets might not be the same. The lba package tries to minimize those occurrences; nevertheless, they may still occur.
Latent budget analysis has a great variety of tools available for
assessing model fit and determining an appropriate number of latent
budgets
Adding an additional budget to a latent budget model will increase the
fit of the model, but at the risk of fitting too much noise, and at the
expense of estimating further
The most used criteria can be found in Table 1.
Statistics | Formula |
---|---|
Likelihood ratio statistic | |
Pearson chi-squared statistic | |
Residual sum of squares | |
Weighted residual sum of squares | |
Akaike information criterion | |
Corrected AIC | |
Bayesian information criterion | |
The lba package calculates a great variety of goodness of fit statistics (GFS). Some can only be used with the data following the product multinomial distribution; others, on the other hand, may be used with distribution free data. A few have an asymptotic chi-square distribution, called exact GFS, but most have an unknown distribution.
Every numerical method that estimates many parameters through an optimization algorithm is computationally time consuming. The lba package in particular uses a maximization with constraints in order to identify the parameters being estimated. The alabama package is used in this case, and the restrictions that the row sums of matrix A and the column sums of matrix B must equal one are programmed in alabama in a very complex way. All those conditions tend to make alabama somewhat slow; nevertheless, in our experience, the running times are not too long. Figure 1 shows that the parameter with the greatest effect on running time is the number of latent budgets, K.
The main function of package lba is lba. The input of this function may be an object of class "formula", "matrix" or "table". The lba function can be called by:
lba(obj, ...)
If the object is of class "formula", the method lba.formula will be called:
lba(formula, data, A = NULL, B = NULL, K = 1L, cA = NULL,
cB = NULL, logitA = NULL, logitB = NULL, omsk = NULL, psitk = NULL,
S = NULL, T = NULL, row.weights = NULL, col.weights = NULL,
tolG = 1e-10, tolA = 1e-05, tolB = 1e-05, itmax.unide = 1000,
itmax.ide = 1000, trace.lba = TRUE, toltype = "all", method = c("ls",
"mle"), what = c("inner", "outer"), ...)
Objects of class "formula" follow the same logic as linear models, that is, dependent variables as a function of independent variables. The argument data must have objects of class "data.frame" as input.
Objects of class "matrix" are executed by the method lba.matrix:
lba.matrix(obj, A = NULL, B = NULL, K = 1L, cA = NULL, cB = NULL,
logitA = NULL, logitB = NULL, omsk = NULL, psitk = NULL,
S = NULL, T = NULL, row.weights = NULL, col.weights = NULL,
tolG = 1e-10, tolA = 1e-05, tolB = 1e-05, itmax.unide = 1000,
itmax.ide = 1000, trace.lba = TRUE, toltype = "all", method = c("ls",
"mle"), what = c("inner", "outer"), ...)
Objects of class "table" are executed by the method lba.table:
lba.table(obj, A = NULL, B = NULL, K = 1L, cA = NULL, cB = NULL,
logitA = NULL, logitB = NULL, omsk = NULL, psitk = NULL,
S = NULL, T = NULL, row.weights = NULL, col.weights = NULL,
tolG = 1e-10, tolA = 1e-05, tolB = 1e-05, itmax.unide = 1000,
itmax.ide = 1000, trace.lba = TRUE, toltype = "all", method = c("ls",
"mle"), what = c("inner", "outer"), ...)
The default method of estimation is weighted least squares, with the row and column weights given by the arguments row.weights and col.weights. If all those values equal one, then we get the ordinary least squares. The other available method is the maximum likelihood estimator.
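The relation between weighted and ordinary least squares can be illustrated with a short base R sketch. The form of the criterion used here, a sum of squared differences between observed and expected budgets multiplied by a row weight and a column weight, is an assumption made for the illustration; the helper `wls_loss` and the vectors `v` and `w` are hypothetical and only play the role of row.weights and col.weights. The counts are those of the BMI table used in the first example.

```r
# Assumed criterion: sum over i, j of v[i] * w[j] * (P[i, j] - Pi[i, j])^2.
wls_loss <- function(P, Pi, v = rep(1, nrow(P)), w = rep(1, ncol(P))) {
  sum(outer(v, w) * (P - Pi)^2)
}

N <- rbind(c(29, 14, 28, 8, 18),
           c( 4,  2, 15, 6,  0),
           c( 1,  2,  6, 5,  0))
P <- N / rowSums(N)                       # observed budgets

# Expected budgets under LBM(1): every row equals the marginal column proportions.
Pi1 <- matrix(colSums(N) / sum(N), nrow(N), ncol(N), byrow = TRUE)

ols <- wls_loss(P, Pi1)                               # unit weights: ordinary least squares
wls <- wls_loss(P, Pi1, v = rowSums(N) / sum(N))      # rows weighted by their share of the total
c(ols = ols, wls = wls)
```

With unit weights the function reduces to the plain residual sum of squares, which is the ordinary least squares case mentioned above.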
The arguments A and B are used whenever the user wants to set the initial values of the mixing parameters or latent components, respectively. If the user has no initial values to set, those matrices are randomly set by using a Dirichlet distribution. For matrix A, the distribution parameters are I and alphavec, where I is the number of rows of the compositional data matrix and alphavec is randomly generated from a uniform distribution with parameter K (the number of latent budgets). For B, the parameters are K and alphavec, which is randomly generated from a uniform distribution with parameter J, where J is the number of columns of the compositional data matrix.
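A random start of this kind can be sketched in base R. The helper `rdirichlet_rows` below is a hypothetical stand-in written for this illustration (it draws Dirichlet vectors as normalised gamma variates) and is not the function used by lba; reading "uniform distribution with parameter K" as K draws from a standard uniform is also an assumption.

```r
# Draw n Dirichlet vectors with parameter vector alpha, one per row.
rdirichlet_rows <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha),
              nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(1)
I <- 3; J <- 5; K <- 2
A0 <- rdirichlet_rows(I, runif(K))       # I x K start, rows sum to one
B0 <- t(rdirichlet_rows(K, runif(J)))    # J x K start, columns sum to one

rowSums(A0)
colSums(B0)
```

Matrices built this way satisfy the sum-to-one restrictions from the first iteration, so they could be passed as the A and B arguments if the user prefers to control the starting values.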
The arguments cA and cB must be used whenever the estimation process is done with constraints on the parameters. Use help(lba) for the remaining parameters.
The lba package can produce plots of the mixing parameters and latent components matrices, which are stored in the fitted object as Aoi and Boi. There are two plotting functions: one is called plotcorr, the other one is called plotlba.
It should be noted that when plotting the latent components using the
triangular coordinate system, lba uses the rescaled latent
components matrix whose values are:
The functions plotlba and plotcorr use the generic functions plot, axis, text, points, segments and legend for K = 2. For K = 3, the function plotlba uses the functions triax.plot, triax.points and thigmophobe.labels from package plotrix and also segments and legend. The function plotcorr uses the generic functions for K = 2. Whenever K reaches 4, only the function plotcorr is used. In this case the function scatterplot3d from package scatterplot3d is internally called. Finally, if the argument rgl = TRUE is used, then the function plot3d from package rgl will be called.
The goodness of fit results can be obtained by making use of the function goodnessfit(obj,...), where obj is an object of class "lba".
Main et al. (2015) studied pregnancy related maternal deaths in California. They examined five distinct clinical conditions that account for nearly 70% of all pregnancy related deaths:
cardiovascular diseases, CVD
preeclampsia/eclampsia, Pre.E
obstetric hemorrhage, OH
deep vein thrombosis - pulmonary embolism, DVTPE
amniotic fluid embolism, AFE
Together with these conditions, they also collected data about
maternal age,
parity,
gestational age at delivery,
maternal race and country of birth,
body mass index - BMI.
which will be the rows or budgets of the data matrices.
Table 2 shows the number of deaths related to the five conditions for women with BMI less than 30, between 30 and 40 and above 40.
BMI | Pre.E | OH | CVD | DVTPE | AFE |
---|---|---|---|---|---|
<30 | 29 | 14 | 28 | 8 | 18 |
30-40b | 4 | 2 | 15 | 6 | 0 |
>40 | 1 | 2 | 6 | 5 | 0 |
The lba function was performed on data matrix bmi, as shown below.
> library(lba)
> data(pregnancy)
> bmi <- pregnancy[5:7,]
> set.seed(1)
> bmilba <- lba(bmi, K = 2, method = "mle", what = 'outer',
+ trace.lba = FALSE)
Since all rows of the BMI matrix are independent, the product multinomial model can be used and the maximum likelihood estimation (MLE) method applies. We also use set.seed so that the user who wishes to replicate the analysis may get the same results as the ones shown below.
> goodnessfit(bmilba)
Likelihood ratio statistic:
K budget Baseline
G2 value 1.809 2.95e+01
P-value 0.613 2.63e-04
The goodness of fit result shows that the likelihood ratio statistic (G2) used to test the model gave a p-value of 0.613. The summary of both lba and goodnessfit functions gives complete results. The only other possible model is the independence model, or baseline model, which has a p-value of 2.63e-04. The fitted model can be plotted with the function plotcorr.
The plot has just one dimension, so that the points are spread only along the horizontal axis; see the documentation of function plotcorr. The first latent budget (LB1) is composed of CVD and DVTPE, which are more general conditions; the second latent budget (LB2) is composed of AFE and Pre.E, which could be considered as pregnancy-related conditions. The OH condition can be considered neutral. Among the mixing parameters, BMI less than 30 is related to LB2 and the more obese women to LB1, as is expected since obese people are more affected by the general conditions.
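As a cross-check of the Baseline column in the goodnessfit output of this example, the likelihood ratio statistic of the independence model can be computed directly from the counts of Table 2 in base R. This is a sketch using the textbook formulas for G2 and the Pearson chi-squared statistic, with empty cells contributing zero to G2; it does not call the lba package.

```r
# Counts from Table 2 (rows: BMI classes; columns: Pre.E, OH, CVD, DVTPE, AFE).
N <- rbind(c(29, 14, 28, 8, 18),
           c( 4,  2, 15, 6,  0),
           c( 1,  2,  6, 5,  0))

expected <- outer(rowSums(N), colSums(N)) / sum(N)   # fitted counts under LBM(1)
pos <- N > 0                                         # 0 * log(0) is taken as 0

G2 <- 2 * sum(N[pos] * log(N[pos] / expected[pos]))  # likelihood ratio statistic
X2 <- sum((N - expected)^2 / expected)               # Pearson chi-squared statistic
df <- (nrow(N) - 1) * (ncol(N) - 1)
p_value <- pchisq(G2, df, lower.tail = FALSE)

c(G2 = G2, X2 = X2, df = df, p_value = p_value)
```

G2 comes out at about 29.5 on 8 degrees of freedom, in agreement with the baseline value printed by goodnessfit.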
In this example, we consider three other explanatory variables connected to pregnancy-related death: parity, maternal age and gestational age at delivery/fetal demise. The resulting data matrix is:
Category | Pre.E | OH | CVD | DVTPE | AFE |
---|---|---|---|---|---|
1 | 16 | 3 | 13 | 3 | 3 |
2-4 | 16 | 13 | 31 | 14 | 10 |
5+ | 4 | 4 | 5 | 3 | 5 |
<30 | 12 | 5 | 25 | 11 | 4 |
30-40a | 18 | 13 | 22 | 8 | 13 |
>40 | 6 | 2 | 2 | 1 | 1 |
<32w | 6 | 5 | 8 | 0 | 0 |
32-36w | 16 | 5 | 8 | 8 | 1 |
>36w | 14 | 10 | 33 | 12 | 17 |
Unlike Example 1, the matrix rows are not independent, since the same women are counted in each one of the row variables. Therefore the MLE method does not apply here and we use the least squares method in order to estimate the model parameters. The functions to call are:
> mcd <- pregnancy[8:16,]
> set.seed(1)
> mcdlba <- lba(mcd, K = 2, method = 'ls', what = 'outer', trace.lba = FALSE)
> set.seed(1)
> mcdlba1 <- lba(mcd, K = 3, method = 'ls', what = 'outer', trace.lba = FALSE)
> set.seed(1)
> mcdlba2 <- lba(mcd, K = 4, method = 'ls', what = 'outer', trace.lba = FALSE)
In order to get the values of Table 4, the function goodnessfit is used as follows.
> summary(goodnessfit(mcdlba))
> summary(goodnessfit(mcdlba1))
> summary(goodnessfit(mcdlba2))
Number of latent budgets | df | wRSS | Actual decrease | Required decrease | Fit improved? |
---|---|---|---|---|---|
1 | 32 | 0.31 | NA | NA | NA |
2 | 21 | 0.14 | 0.17 | 0.11 | yes |
3 | 12 | 0.06 | 0.11 | 0.09 | yes |
4 | 5 | 0.02 | 0.04 | 0.07 | no |
The weighted residual sum of squares between the observed components and the expected components (wRSS) was used as a goodness of fit statistic, and the independence model, LBM(1), as a baseline model.
We followed (Van der Ark 1999a) for the following guidelines in order to make a decision on the number of latent budgets to be used:
The proportion of lack of fit with respect to the baseline model should be the largest one.
The improvement of adding an extra latent budget should be large enough to justify the increased effort of interpreting the extra set of parameter estimates. In order to achieve this we use the criterion that the average improvement of fit per degree of freedom, as shown in the summary(goodnessfit), times the difference in degrees of freedom between two values of K gives the required decrease, which the actual decrease should exceed.
The results should be interpretable.
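The df column of Table 4 and the second guideline can be reproduced with a few lines of base R. The degrees of freedom of an unconstrained LBM(K) on an I x J table are taken here as (I - K)(J - K), which agrees with the df columns of all the tables in this article. The rule used for the required decrease, the baseline wRSS per degree of freedom times the drop in degrees of freedom, is only one reading that reproduces the rounded values of Table 4; the helper `lbm_df` is a hypothetical name and this is not necessarily the exact computation done by goodnessfit.

```r
# Degrees of freedom of an unconstrained LBM(K) on an I x J table.
lbm_df <- function(I, J, K) (I - K) * (J - K)

# Example 2: I = 9 rows, J = 5 columns, K = 1, ..., 4.
df <- lbm_df(9, 5, 1:4)
df                                   # 32 21 12 5

# Required decrease from the rounded baseline wRSS in Table 4.
wrss_baseline <- 0.31
required <- wrss_baseline / df[1] * -diff(df)
round(required, 2)                   # 0.11 0.09 0.07
```

Comparing each required decrease with the corresponding actual decrease gives the "Fit improved?" column: the fit improves for K = 2 and K = 3, but not for K = 4.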
Both the models with
The interpretation of the latent budgets goes as follows: LB2 is explained by pre-eclampsia/eclampsia, a condition that occurs only in pregnant women and is characterized by high blood pressure. LB1 has two conditions: AFE, which is a pregnancy condition, and DVTPE, which is a more general condition. LB3 is explained by CVD. Unlike the first example, this model does not put CVD and DVTPE together (Figure 3).
The mixing parameters connected to LB2 are gestational age at delivery of 32 to 36 weeks and maternal age older than 40 years. Pre.E may occur any time after the twentieth week. The LB1 is connected to women with more than 5 children, and age from 30 to 40 years. LB3 is connected to younger women with 2 to 4 children and early delivery (Figure 3).
In this example we consider maternal race and country of birth as explanatory variables in connection to pregnancy related death. The resulting data matrix is in Table 5.
Race and country of birth | Pre.E | OH | CVD | DVTPE | AFE |
---|---|---|---|---|---|
Hispanic,foreign-born | 18 | 5 | 8 | 4 | 5 |
Hispanic, us-born | 6 | 4 | 9 | 5 | 1 |
White, non-hispanic | 6 | 7 | 11 | 6 | 4 |
Black, non-hispanic | 5 | 2 | 19 | 5 | 5 |
This matrix has independent rows so that the product multinomial model and the MLE method are used to estimate the mixing parameters and latent components.
Number of latent budgets | df | G2 | p-value |
---|---|---|---|
1 | 12 | 20.5 | 0.06 |
2 | 6 | 6.8 | 0.34 |
3 | 2 | 1.61 | 0.45 |
Table 6 shows the results of goodness of fit. Both
> set.seed(1)
> mrd2 <- pregnancy[1:4,]
> rownames(mrd2) <- c("Hispanic,foreign-born", "Hispanic, us-born",
+ "White, non-hispanic", "Black, non-hispanic")
> mrd2lbaa <- lba(mrd2, K = 2, method = "mle", what = 'outer', trace.lba = FALSE)
> par(mfrow = c(1,2))
> plotcorr(mrd2lbaa, pch.points = 20, xlim = c(-3,2.5),
+ labels.points = rownames(mrd2lbaa$Aoi), col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
> plotcorr(mrd2lbaa, with.ml = 'lat', pch.points = 20,
+ labels.points = rownames(mrd2lbaa$Boi), col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
The graphs in Figure 4 are both one dimensional. What is important is the position on the horizontal axis. By default, the plotcorr function lays the numbered points off the line so that they do not overlap, which often happens when the number of parameters increases. The latent components show that LB2 is composed of CVD and DVTPE, which are the more general conditions, and LB1 is composed of Pre.E and OH, which are conditions more specific to pregnancy. AFE is around the origin and does not affect either one of the budgets. It should be noted that CVD and Pre.E are the ones with the strongest influence on their respective budgets because they are farther away from the origin. Looking at the mixing parameters we can see that Black women belong to LB2, that is, Black women have a strong connection to more general conditions, and Hispanic foreign-born women are mostly affected by specific pregnancy conditions.
> set.seed(1)
> mrd2lba <- lba(mrd2, K = 3, method = "mle", what = 'outer', trace.lba = FALSE)
> par(mfrow = c(1,2))
> plotcorr(mrd2lba, with.ml = 'mix', xlim = c(-4, 3.5), ylim = c(-1.5, 2),
+ pch.points = 20, col.points = 4, pos.points = c(3, 2, 4, 3),
+ labels.points = rownames(mrd2lba$Aoi),
+ col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
> plotcorr(mrd2lba, with.ml = 'lat', xlim = c(-2,2), ylim = c(-1.5,2.5),
+ pch.points = 20, col.points = 4, pos.points = c(1, 3, 3, 3, 3),
+ labels.points = rownames(mrd2lba$Boi),
+ col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
The three budgets model is shown in Figure 5. CVD and Pre.E
each form a budget, LB1 and LB3 respectively; LB2 is composed of DVTPE
and OH. Most important is to see that Hispanic foreign born women are
strongly connected to Pre.E and Black women to CVD. This gives more
emphasis to the
One theory of post-materialism states that political values change with the industrial and economic growth of a society. It says that people can be classified into two major groups with respect to their political values, namely materialistic, who seek security and materialistic supply, and post-materialistic, who try to bring about idealistic goals. Those two views, according to the theory, could be regarded as the extremes of a continuum. The dataset consists of seven categories ranging from materialism (m..) to post-materialism (pm..) from a survey across Europe, coded in a contingency table with 13 countries (rows) and the 7 levels of the in-depth post-materialism index.
The countries included in the survey are:
B=Belgium, D=Germany, DK=Denmark, E=Spain, F=France, GB=Great Britain, GR=Greece, I=Italy, IRL=Ireland, L=Luxembourg, NIRL=Northern Ireland, NL=Netherlands, P=Portugal.
The complete table is part of the postmater
dataset contained in
package lba. In order to find out how many typical societies are
needed to explain the data, four models for goodnessfit
using the
Number of latent budgets | df | G2 | p-value |
---|---|---|---|
1 | 72 | 855.6 | 0.00 |
2 | 55 | 93.7 | 0.00 |
3 | 40 | 66.3 | 0.01 |
4 | 27 | 36.9 | 0.03 |
We will now discuss the graphical results from the function plotcorr.
> data(postmater)
> new_post <- as.matrix(postmater[,-1])
> row.names(new_post) <- postmater[,1]
>
> set.seed(1)
> ex4 <- lba(new_post, method = "mle", what = 'outer', K = 4, tolG = 1e-5,
+ itmax.unide = 1e4, trace.lba = FALSE)
> par(mfrow = c(1,2))
> plotcorr(ex4, main = "Mixing Parameters", ylim = c(-1.5,2.5), zlim = c(-3,3),
+ pch.points = 20, col.points = 4, labels.points = rownames(ex4$Aoi),
+ col.budget = 'gray20', args.legend = list(plot = FALSE))
> plotcorr(ex4, with.ml = "lat", main = "Latent Components", pch.points = 20,
+ col.points = 4, labels.points = rownames(ex4$Boi), col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
In the first graph shown in Figure 6,
Should the user wish to use a dynamic visualization of 3D graphics, the function plotcorr has the argument rgl = TRUE.
> tex4 <- ex4
> class(tex4) <- c("lba.2d", "lba.mle", "lba.matrix", "lba")
> par(mfrow=c(1,2))
> plotcorr(tex4, main = "Mixing Parameters", xlim = c(-2,2), ylim = c(-1.5,4),
+ pch.points = 20, col.points = 4, labels.points = rownames(tex4$Aoi),
+ col.budget = 'gray20', args.legend = list(plot = FALSE))
> plotcorr(tex4, with.ml = "lat", main = "Latent Components", xlim = c(-1.5,2.5),
+ ylim = c(-2,2.5), pch.points = 20, col.points = 4,
+ labels.points = rownames(tex4$Boi), col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
The graph shown in Figure 7 shows, for
> set.seed(1)
> ex3 <- lba(new_post, method = "mle", what = 'outer', K = 3, tolG = 1e-5,
+ itmax.unide = 1e4, trace.lba = FALSE)
> par(mfrow = c(1,2))
> plotcorr(ex3, xlim = c(-2.5,2), ylim = c(-2,2), main = "Mixing Parameters",
+ pch.points = 20, col.points = 4, labels.points = rownames(ex3$Aoi),
+ col.budget = 'gray20', args.legend = list(plot = FALSE))
> plotcorr(ex3, with.ml = "lat", main = "Latent Components", xlim = c(-2,2.5),
+ ylim = c(-1.5,2.5), pch.points = 20, col.points = 4,
+ labels.points = rownames(ex3$Boi), col.budget = 'gray20',
+ args.legend = list(plot = FALSE))
In Figure 8 we have two graphs. Looking at the latent components graph, we clearly see three latent budgets. These are:
LB1 consisting of m.. that means the clearly materialistic countries.
LB2 consisting of pm, pm. and pm.. which means the most post-materialistic countries.
LB3 consisting of m., m and
The mixing parameters show that:
The materialistic countries, belonging to LB1, are: Greece, Northern Ireland and Ireland.
Those in midway, belonging to LB3, are: Belgium and Italy.
The post-materialistic countries, belonging to LB2, are: France, Germany and Netherlands.
We could say that Great Britain and Luxembourg are in a group and Spain is also in a group apart. Portugal is midway between LB1 and LB3.
Finally, it becomes very interesting when we look at Figure
9, only the mixing parameters for
In this case there are two budgets; LB1 representing the post-materialism and LB2 representing the materialism. The graph shows a clear continuum from materialistic to post-materialistic countries where some groups, as we go from one end to another, become clear. They are:
Greece, Ireland, Northern Ireland, and Portugal the most materialistic,
Belgium and Spain,
Great Britain, Italy, and Luxembourg
Denmark, France, and Germany,
Netherlands the most post-materialistic.
> set.seed(1)
> ex2 <- lba(new_post, method = "mle", what = 'outer', tolG = 1e-5,
+ itmax.unide = 1e4, K = 2, trace.lba = FALSE)
> plotcorr(ex2, pch.points = 20, labels.points = rownames(ex2$Aoi),
+ col.budget = 'gray20', args.legend = list(plot = FALSE))
For more details see (Van der Ark 1999a) page 172.
The lba package permits different approaches in latent budget analysis, many more than we could cover in this article, and we strongly suggest reading Van der Ark (1999a) to get a full idea of them.
We have presented the lba package for latent budget analysis, which is derived from “A freeware computer program to perform latent budget analysis” (Van der Ark 1999b). All unconstrained and constrained methods found in Van der Ark (1999a) were implemented.
We added some new features, such as the possibility to assign any value between zero and one as a fixed value constraint for both mixing parameters and latent components, and the implementation of two types of plots, which greatly facilitates the interpretation of the model.
The lba package does not have the capability to analyze longitudinal data and neural networks yet. It is part of our plan to add those features to the package’s capabilities.
Our next goal is to implement a new algorithm to perform identification and replace the alabama package. This might be less computationally time consuming.
All the programming in R was done through the Tinn-R interface (Faria et al. 2015).
The package depends on the packages MASS (Venables and Ripley 2002), alabama (Varadhan 2015), plotrix (Lemon 2006), scatterplot3d (Ligges and Mächler 2003), and rgl (Adler et al. 2016).
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Jelihovschi & Allaman, "lba: An R Package for Latent Budget Analysis", The R Journal, 2018
BibTeX citation
@article{RJ-2018-026, author = {Jelihovschi, Enio and Allaman, Ivan Bezerra}, title = {lba: An R Package for Latent Budget Analysis}, journal = {The R Journal}, year = {2018}, note = {https://rjournal.github.io/}, volume = {10}, issue = {1}, issn = {2073-4859}, pages = {269-287} }