The functional logit regression model was proposed by Escabias et al. (2004) to model a scalar binary response variable from a functional predictor. In that work the model was estimated in a finite-dimensional subspace of \(L^2(T)\), the space of square-integrable functions, generated by a finite set of basis functions. The estimation assumed that the curves of the functional predictor and the functional parameter of the model belong to the same finite-dimensional subspace. The resulting estimates were affected by severe multicollinearity, and the proposed solutions were based on different functional principal component analyses. The *logitFD* package introduced here provides a toolbox for fitting these models: it implements the different proposed solutions and generalizes the 2004 model to the case of several functional and non-functional predictors. The performance of the functions is illustrated with functional data sets included in the *fda.usc* package, available from CRAN.

A functional variable is one whose values depend on a continuous magnitude such as time. Such data are functional in the sense that they can be evaluated at any point of the domain, rather than only at the discrete points at which they were originally measured or observed (see, for example, Ramsay and Silverman 2005). Different approaches have been used for the study of functional data, among others the nonparametric methods proposed by Müller and Stadtmüller (2005) and Ferraty and Vieu (2006), and the basis expansion methods considered in Ramsay and Silverman (2005). Most multivariate statistical techniques have been extended to functional data; their basic theory and inferential aspects are collected in the books by Horvath and Kokoszka (2012), Zhang (2014) and Kokoszka and Reimherr (2018). The basic tools for reducing the dimension of the functional space to which the curves belong are Functional Principal and Independent Component Analysis (FPCA) (Ramsay and Silverman 2005; Acal et al. 2020; Vidal et al. 2021) and Functional Partial Least Squares (FPLS) (Preda and Saporta 2005; Aguilera et al. 2010; Aguilera et al. 2016).

In the last decade of the 20th century and the first decade of the 21st
century, when functional data methods began to be developed, there was
no adequate software available for fitting them. Even today, classical
statistical software such as SPSS or STATA does not include a toolbox
for functional data analysis. The development of object-oriented
software such as R, Matlab or S-plus, together with the great activity
of the scientific community in this field, has made possible the
emergence of different packages, mainly in R, for functional data
analysis (FDA). Each package reflects the point of view of its
developers and the methodology they use to fit functional data methods.
For example, Febrero-Bande and Oviedo (2012) used nonparametric methods
in their *fda.usc* package, Ramsay et al. (2009) designed their *fda*
package under the basis expansion philosophy, and the Principal Analysis
by Conditional Estimation (PACE) algorithm (see Zhu et al. 2014) was
used for curve alignment, PCA and regression in the *fdasrvf* package
(see https://cran.r-project.org/web/packages/fdasrvf/index.html).
Recently, Fabian Scheipl has summarized the available R packages for
FDA (see https://cran.r-project.org/web/views/FunctionalData.html).

This paper is devoted to *logitFD*, an R package for fitting the
different functional principal component logit regression approaches
proposed by Escabias et al. (2004). Functional logit regression models a
scalar binary response variable in three situations: first, from a
single functional predictor; second, from several functional predictors;
and third, from several functional and nonfunctional predictors, which
is the most general case. Some R functions already serve this purpose,
such as the `fregre.glm` function of the *fda.usc* package (see
https://rpubs.com/moviedo/fda_usc_regression). Unlike that function, the
methods proposed by Escabias et al. (2004) and implemented in *logitFD*
are basis expansion methods, which makes the logit model suffer from
multicollinearity. The proposed solutions were based on different
functional principal component analyses: ordinary FPCA and filtered
FPCA (see Escabias et al. 2014). These models have been successfully
applied to environmental problems (Aguilera et al. 2008b; Escabias et
al. 2005; Escabias et al. 2013) and to classification problems in the
food industry (Aguilera-Morillo and Aguilera 2015). Extensions to sparse
and correlated data and to generalized models have also been studied
(James 2002; Müller and Stadtmüller 2005; Aguilera-Morillo et al. 2013;
Mousavi and Sørensen 2018; Tapia et al. 2019; Bianco et al. 2021).

This package adopts the basis expansion philosophy of the *fda* package
(Ramsay et al. 2009) and is designed to use objects inherited from the
ones defined there. For this reason the *fda* package is required by
*logitFD*. The package consists of four functions, each of which fits a
functional principal component logit regression model in one of four
situations:

- Filtered functional principal components of the functional predictors, included in the model according to the variability they explain.

- Filtered functional principal components of the functional predictors, included in the model automatically according to their prediction ability by stepwise methods.

- Ordinary functional principal components of the functional predictors, included in the model according to the variability they explain.

- Ordinary functional principal components of the functional predictors, included in the model automatically according to their prediction ability by stepwise methods.

The functions of our package take as input the `fd` objects provided by
the *fda* package and also return `fd` objects, among other elements.

This paper is structured as follows. After this introduction, the second
section presents the generalities of the package, with the required
definitions and objects of functional data analysis, functional logit
regression and extended functional logit regression. The third and
fourth sections address ordinary and filtered functional principal
component logit regression, respectively. The fifth section addresses
ordinary and filtered functional principal component logit regression in
which the components are included according to their prediction ability
by stepwise methods. Every section summarizes the theoretical aspects of
the models involved and presents a practical application with functional
data contained in the *fda.usc* package (Febrero-Bande and Oviedo 2012).

A functional data set is a set of curves \(\left\{ x_1(t),\ldots, x_n (t) \right\},\) with \(t\) in a real interval \(T\) (\(t \in T\)). Each curve can be observed at different time points of its argument \(t\) as \(x_{i}=\left( x_{i}\left( t_{0}\right),\ldots ,x_{i}\left(t_{m_{i}}\right) \right)^{\prime}\) for the set of times \(t_{0},\ldots,t_{m_{i}},\;i=1,\ldots ,n\) and these are not necessarily the same for each curve.

Basis expansion methods assume that the curves belong to a finite dimensional space generated by a basis of functions \(\left\{ \phi _{1}\left( t\right) ,\ldots ,\phi_{p}\left( t\right) \right\}\) and so they can be expressed as \[\label{BasisExpan} x_{i}\left( t\right) =\sum_{j=1}^{p}a_{ij}\phi _{j}\left( t\right), \; i=1,\ldots,n. \tag{1}\] The functional form of the curves is determined when the basis coefficients \(a_i=\left(a_{i1},\ldots,a_{ip}\right)^{\prime}\) are known. These can be obtained from the discrete observations either by least squares or by interpolation methods (see, for example, (Escabias et al. 2005) and (Escabias et al. 2007)).

Depending on the characteristics of the curves and the observations,
various types of basis can be used (see, for example, Ramsay and
Silverman 2005). In practice, the most commonly used are, on the one
hand, the basis of trigonometric functions for regular, periodic,
continuous and differentiable curves and, on the other hand, the basis
of B-spline functions, which provides better local behavior (see De Boor
2001). The *fda* package provides B-spline, constant, exponential,
Fourier, monomial, polygonal and power bases (Ramsay et al. 2009).
Because the *logitFD* package uses `fd` objects from the *fda* package,
the same types of basis can be used.

To illustrate the use of the *logitFD* package we use the `aemet` data
included in the *fda.usc* package of Febrero-Bande and Oviedo (2012). As
stated in the package manual, the `aemet` data consist of meteorological
data from 73 Spanish weather stations. The data set contains functional
and nonfunctional variables observed at all 73 stations. The information
used to illustrate our *logitFD* package is the following:

- `aemet$temp`: matrix with 73 rows and 365 columns with the average daily temperature for the period 1980-2009, in degrees Celsius, for each weather station.

- `aemet$logprec`: matrix with 73 rows and 365 columns with the average logarithm of precipitation for the period 1980-2009 for each weather station. We use the precipitation itself, that is, `exp(aemet$logprec)`.

- `aemet$wind.speed`: matrix with 73 rows and 365 columns with the average wind speed for the period 1980-2009 for each weather station.

- `aemet$df[,c("ind","altitude","longitude","latitude")]`: data frame with 73 rows and 4 columns with the identification code of each weather station, its altitude in meters above sea level, and its longitude and latitude.

The problem with daily data is that they are too wiggly: if smooth
curves with few basis functions are needed, the loss of information is
large. For this reason, to illustrate the use of the *logitFD* package
we work with mean monthly data, computed from each of the matrices
defined above. In addition, `logprec` is also a very wiggly data set, so
we consider its exponential. The final data sets are the following:

- `TempMonth`: matrix with 73 rows and 12 columns with the mean monthly temperature of `aemet$temp`.

- `PrecMonth`: matrix with 73 rows and 12 columns with the mean monthly exponential of `aemet$logprec`.

- `WindMonth`: matrix with 73 rows and 12 columns with the mean monthly wind speed of `aemet$wind.speed`.

As binary response we consider the variable that takes the value \(1\) if a weather station is located in the north of Spain (above Madrid, the capital, which lies in the geographic center of the country) and \(0\) otherwise (stations in the south). Our objective is to model the location of the weather stations (north/south) from their meteorological information. This is a deliberately artificial problem that tries to explain the climate characteristics of Spanish weather stations classified according to their geographical location. Note that latitude alone is enough to determine the location of a weather station in the sense defined here. In fact, latitude yields complete separation, which makes estimation of the logit model impossible (see, for example, Hosmer et al. 2013).
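The effect of complete separation can be illustrated on simulated data. The sketch below does not use the `aemet` data: the latitudes are drawn at random and the threshold of 40.4 degrees is an assumption chosen only to mimic a north/south split. When the response is a deterministic function of the predictor, `glm()` typically reports warnings and returns very large coefficients, because the likelihood has no finite maximum.

```r
# Simulated illustration of complete separation (hypothetical data, not aemet).
set.seed(123)
latitude <- runif(73, min = 36, max = 44)  # hypothetical station latitudes
north <- as.integer(latitude > 40.4)       # assumed north/south threshold

# Collect the warnings raised by glm() instead of printing them.
glm_warnings <- character(0)
fit <- withCallingHandlers(
  glm(north ~ latitude, family = binomial),
  warning = function(w) {
    glm_warnings <<- c(glm_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The two groups do not overlap in latitude: this is complete separation.
separated <- max(latitude[north == 0]) < min(latitude[north == 1])
print(separated)
print(glm_warnings)
print(coef(fit))  # coefficients drift towards +/- infinity
```

This is the reason latitude is left out of the set of predictors in the illustration.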

The steps for reading data would be

```
library(fda.usc)
data(aemet)
# Daily functional variables (73 stations x 365 days)
Temp <- aemet$temp$data
Prec <- exp(aemet$logprec$data)
Wind <- aemet$wind.speed$data
# Nonfunctional station variables
StationsVars <- aemet$df[,c("ind","altitude","longitude","latitude")]
# Binary response: 1 = north, 0 = south
StationsVars$North <- c(1,1,1,1,0,0,0,0,1,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,1,0,0,1,1,1,1,1,
0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1)
```

The transformation from daily to mean monthly data, shown here only for temperature, is

```
TempMonth <- matrix(0,73,12)
for (i in 1:nrow(TempMonth)){
TempMonth[i,1] <- mean(Temp[i,1:31])
TempMonth[i,2] <- mean(Temp[i,32:59])
TempMonth[i,3] <- mean(Temp[i,60:90])
TempMonth[i,4] <- mean(Temp[i,91:120])
TempMonth[i,5] <- mean(Temp[i,121:151])
TempMonth[i,6] <- mean(Temp[i,152:181])
TempMonth[i,7] <- mean(Temp[i,182:212])
TempMonth[i,8] <- mean(Temp[i,213:243])
TempMonth[i,9] <- mean(Temp[i,244:273])
TempMonth[i,10] <- mean(Temp[i,274:304])
TempMonth[i,11] <- mean(Temp[i,305:334])
TempMonth[i,12] <- mean(Temp[i,335:365])
}
```

The remaining matrices (`PrecMonth` and `WindMonth`) were obtained in
the same way.
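The month-by-month averaging can also be written once and reused for every daily matrix. The following is a minimal sketch, not code from the package: `monthly_means` is a hypothetical helper, and it assumes a 365-column matrix with the same non-leap-year month boundaries as the loop above. It is demonstrated on simulated data so that it runs without *fda.usc*.

```r
# Hypothetical helper: row-wise monthly means of a (stations x 365) matrix.
monthly_means <- function(daily) {
  stopifnot(is.matrix(daily), ncol(daily) == 365)
  days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  month_of_day <- rep(1:12, times = days_in_month)  # month label of each day
  out <- vapply(
    1:12,
    function(m) rowMeans(daily[, month_of_day == m, drop = FALSE]),
    numeric(nrow(daily))
  )
  matrix(out, nrow = nrow(daily), ncol = 12)
}

# Simulated daily data for 5 hypothetical stations.
set.seed(1)
Daily <- matrix(rnorm(5 * 365), nrow = 5, ncol = 365)
Monthly <- monthly_means(Daily)
dim(Monthly)
```

With the objects defined earlier, the same call would presumably be applied as `monthly_means(Prec)` and `monthly_means(Wind)`.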

*logitFD* is an R package for fitting functional principal component
logit regression based on ordinary and filtered functional principal
components described in previous sections. As stated in the
introduction, this package follows the basis expansion philosophy of the
*fda* package and is designed to use objects inherited from the ones
defined there. For this reason the *fda* package is required by
*logitFD*. The R functions of our package take as input the `fd` objects
provided by the *fda* package and also return `fd` objects, among other
elements. To use our package, the reader is assumed to be familiar with
the *fda* package, its objects and its functions.

Let us begin with a brief explanation of the *fda* objects required in
our proposal. From discrete observations of curves, the *fda* package
builds an `fd` object (named `fdobj`) that is used by the *logitFD*
methods. Let \(X_{n\times m}=(x_i(t_k)),\; i=1,\ldots,n;\; k=1,\ldots,m\) be
the matrix of discrete observations of the curves
\(x_{1}\left( t\right) ,x_{2}\left( t\right) ,\ldots ,x_{n}\left( t\right)\)
at the same time points \(t_{1},t_{2},\ldots ,t_{m}\). An `fd` object is
an `R` list with elements:

- `coefs`: the matrix of basis coefficients.

- `basis`: an object of type `basis` with the information needed to build the functional form of the curves by the basis expansion methods explained above. Depending on the selected basis, the list of objects contained in the `basis` object can differ (see the *fda* reference manual).

- `fdnames`: a list containing names for the arguments, function values and variables. This element is optional.

The matrix of basis coefficients
\(A_{n \times p}=(a_{ij}), \; i=1,\ldots,n;\; j=1,\ldots,p\) (`coefs`) of
all curves is obtained by least squares as
\(A^{T}=\left( \Phi ^{T}\Phi \right) ^{-1}\Phi ^{T}X^{T}\), where
\(\Phi_{m \times p} = (\phi _{j}\left( t_{k}\right)),\; j=1,\ldots,p; \; k=1,\ldots,m\)
is the matrix of basis functions evaluated at the sampling points.
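The least squares formula can be checked numerically in base R. The sketch below is only illustrative: it builds an unnormalized Fourier-type basis by hand (the period of 12 and the noise level are assumptions, and the scaling differs from the one used by *fda*), simulates curves with known coefficients and recovers them with \(A^{T}=\left( \Phi ^{T}\Phi \right) ^{-1}\Phi ^{T}X^{T}\).

```r
# Least squares basis coefficients, checked on simulated curves.
set.seed(42)
t_k <- 1:12               # m = 12 sampling points
n <- 5                    # number of simulated curves
omega <- 2 * pi / 12      # assumed period of 12

# Phi is m x p: hand-built basis functions evaluated at the sampling points.
Phi <- cbind(1,
             sin(omega * t_k),     cos(omega * t_k),
             sin(2 * omega * t_k), cos(2 * omega * t_k),
             sin(3 * omega * t_k), cos(3 * omega * t_k))
p <- ncol(Phi)

# Known coefficients (n x p) and noisy discrete observations X (n x m).
A_true <- matrix(rnorm(n * p), nrow = n, ncol = p)
X <- A_true %*% t(Phi) + matrix(rnorm(n * length(t_k), sd = 0.01), nrow = n)

# A^T = (Phi^T Phi)^{-1} Phi^T X^T, solved without forming the inverse.
A_hat <- t(solve(crossprod(Phi), crossprod(Phi, t(X))))

# Cross-check against a QR-based least squares solver.
A_qr <- t(qr.solve(Phi, t(X)))
max(abs(A_hat - A_true))
```

Using `solve()` on the normal equations mirrors the formula directly; in practice a QR-based solver such as `qr.solve()` is usually preferred for numerical stability.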

The `basis` object allows the basis expansion (1) of the curves. For the
aemet data we consider these two bases:

- a Fourier basis of length \(7\) for Temperature;

- a cubic B-spline basis of length \(8\) for Precipitation and Wind.

The `R` parameters needed to define the basis object depend on the type
of basis used (see the *fda* reference manual). A Fourier basis only
needs the interval on which the basis functions are defined and the
dimension of the basis. A B-spline basis also needs the degree of the
polynomials that define the basis functions; the default degree is 3.

The bases were created with the following code:

```
FourierBasis <- create.fourier.basis(rangeval = c(1,12),nbasis=7)
BsplineBasis <- create.bspline.basis(rangeval = c(1,12),nbasis=8)
```
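A basis object can be inspected before it is used for smoothing. The following sketch assumes the *fda* package is installed; the object names with the `demo_` prefix are hypothetical and chosen so as not to overwrite the bases defined above. It evaluates both bases at the twelve monthly points with `eval.basis()`, which gives the matrix \(\Phi\) of basis functions evaluated at the sampling points.

```r
library(fda)

# Same settings as in the text, under hypothetical names.
demo_fourier <- create.fourier.basis(rangeval = c(1, 12), nbasis = 7)
demo_bspline <- create.bspline.basis(rangeval = c(1, 12), nbasis = 8)

# Basic information stored in the basis objects.
demo_fourier$type
demo_fourier$nbasis
demo_bspline$type
demo_bspline$nbasis

# Basis functions evaluated at the monthly sampling points (12 x nbasis).
Phi_fourier <- eval.basis(1:12, demo_fourier)
Phi_bspline <- eval.basis(1:12, demo_bspline)
dim(Phi_fourier)
dim(Phi_bspline)

# B-spline basis functions sum to one at every point of the interval.
bspline_row_sums <- unname(rowSums(Phi_bspline))
```

Calling `plot(demo_fourier)` or `plot(demo_bspline)` draws the individual basis functions, which helps when choosing the basis dimension.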

The main function of the *fda* package for obtaining an `fdobj` object
from discrete data stored in a matrix is `Data2fd` (see the *fda*
reference manual). Our `fdobj` objects were obtained with the code

```
TempMonth.fd <- Data2fd(argvals = c(1:12), y=t(TempMonth),basisobj = FourierBasis)
PrecMonth.fd <- Data2fd(argvals = c(1:12), y=t(PrecMonth),basisobj = BsplineBasis)
WindMonth.fd <- Data2fd(argvals = c(1:12), y=t(WindMonth),basisobj = BsplineBasis)
```

An `fdobj` allows all curves to be plotted with the `R` `plot()` command.
The functional data so obtained can be seen in Figure 1.
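The structure of an `fd` object and its evaluation on a fine grid can be sketched as follows. The example assumes the *fda* package is installed and uses simulated monthly curves for four hypothetical stations rather than the `aemet` data; the names `Sim`, `Sim.fd` and `Sim.hat` are illustrative only.

```r
library(fda)

# Simulated monthly curves for 4 hypothetical stations (4 x 12 matrix).
set.seed(1)
months <- 1:12
Sim <- t(sapply(1:4, function(i) {
  15 + 8 * sin(2 * pi * (months - 4) / 12) + rnorm(12, sd = 0.5)
}))

# Smooth the discrete data with a Fourier basis of dimension 7.
sim_basis <- create.fourier.basis(rangeval = c(1, 12), nbasis = 7)
Sim.fd <- Data2fd(argvals = months, y = t(Sim), basisobj = sim_basis)

# Elements of the fd object: coefficients are stored as nbasis x curves.
class(Sim.fd)
dim(Sim.fd$coefs)

# Evaluate the smoothed curves at any point of the domain.
fine_grid <- seq(1, 12, length.out = 101)
Sim.hat <- eval.fd(fine_grid, Sim.fd)
dim(Sim.hat)

# Plot all curves of the fd object.
plot(Sim.fd)
```

Note that `y` is passed with time points in rows and curves in columns, which is why the data matrix is transposed in the call to `Data2fd`.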