kStatistics: Unbiased Estimates of Joint Cumulant Products from the Multivariate Faà Di Bruno’s Formula

Abstract:

kStatistics is a package in R that serves as a unified framework for estimating univariate and multivariate cumulants as well as products of univariate and multivariate cumulants of a random sample, using unbiased estimators with minimum variance. The main computational machinery of kStatistics is an algorithm for computing multi-index partitions. The same algorithm underlies the general-purpose multivariate Faà di Bruno’s formula, which therefore has been included in the last release of the package. This formula gives the coefficients of formal power series compositions as well as the partial derivatives of multivariable function compositions. One of the most significant applications of this formula is the possibility to generate many well-known polynomial families as special cases. So, in the package, there are special functions for generating very popular polynomial families, such as the Bell polynomials. However, further families can be obtained, for suitable choices of the formal power series involved in the composition or when suitable symbolic strategies are employed. In both cases, we give examples on how to modify the R codes of the package to accomplish this task. Future developments are addressed at the end of the paper

Cite PDF Tweet

Published

Oct. 11, 2022

Received

Aug 23, 2021

DOI

10.32614/RJ-2022-033

Volume

Pages

14/2

208 - 228


1 Introduction

Joint cumulants are usually employed for measuring interactions among two or more random variables simultaneously, extending the familiar notion of covariance to higher orders. More in details, suppose Y a random vector with moment generating function MY(z), for z=(z1,,zm) in a suitable neighborhood of 0. Thus MY(z) can be expressed as MY(z)=exp(KY(z))

where KY(z) is the cumulant generating function of Y. IfIf iNm0 is a multi-index then we set i!=i1!im! and |i|=i1++im. iNm0 and MY(z)=1+|i|>0E[Yi]i!ziKY(z)=|i|>0ki(Y)i!zi
then {ki(Y)} are said the joint cumulants of {E[Yi]}. From a theoretical point of view, cumulants are a useful sequence due to the following properties :

Thanks to all these properties, joint cumulants have a wide range of applications: from statistical inference and time series to asymptotic theory , from spatial statistics modeling to signal processing , from non-linear systems identification to Wiener chaos , just to mention a few. Indeed it is also well known that cumulants of order greater than two are zero for random vectors which are Gaussian. Therefore, higher order cumulants are often used in testing for multivariate Gaussianity .

The i-th multivariate k-statistic is a symmetric function of the multivariate random sample whose expectation is the joint cumulant of order i of the population characters. These estimators have minimum variance when compared to all other unbiased estimators and are built by free-distribution methods without using sample moments. Due to the properties of joint cumulants, multivariate k-statistics are employed to check multivariate gaussianity or to quantify high-order interactions among data , for applications in topology inference , in neuronal science and in mathematical finance . Polykays are unbiased estimators of cumulant products and are particularly useful in estimating covariances between k-statistics . In the kStatistics package , the nPolyk function provides k-statistics and polykays as well as their multivariate generalizations. Further implementations are in Phyton , in Maple and in Mathematica .

All these estimators are described with a wealth of details by and and their construction relied on some well-known change of bases in the ring of symmetric polynomials. In a different approach is followed using suitable polynomial families and symbolic strategies. This procedure was the core of the first release (version 1.0) of the kStatistics package , as the initial goal was to implement tools for the estimation of cumulants and cumulant products, both in the univariate and in the multivariate case. As the referred polynomial families can be traced back to the generalized (complete exponential) Bell polynomials, the latest version of the package has also included procedures to generate these polynomials together with a number of special cases.

Let us recall that the generalized (complete exponential) Bell polynomials are a family of polynomials involving multivariable Sheffer sequences . Among its various applications, we recall the cumulant polynomial sequences and their connection with special families of stochastic processes . Indeed, cumulant polynomials allow us to compute moments and cumulants of multivariate Lévy processes , subordinated multivariate Lévy processes and multivariate compound Poisson processes . Further examples can be found in , , or .

The generalized (complete exponential) Bell polynomials arise from the multivariate Faà di Bruno’s formula, whose computation has been included in the latest version of the kStatistics package. In enumerative combinatorics, Faà di Bruno’s formula is employed in dealing with formal power series. In particular the multivariate Faà di Bruno’s formula gives the i-th coefficient of the composition h(z)=f(g1(z)1,,gn(z)1)

where f and gj for j=1,,n are (exponential) formal power series f(x)=|t|0ftxtt!andgj(z)=|s|0gj;szss!,
with x=(x1,,xn),z=(z1,,zm) andWe use these notations independently if the powers or the subscripts are row vectors or column vectors. xt=xt11xtnn, zs=zs11zsmm, ft=ft1,,tn, gj;s=gj;s1,,sm for j=1,,n, and f0=g1;0==gn;0=1. For instance, from (1) and (2) joint moments can be recovered from joint cumulants using the multivariate Faà di Bruno’s formula for n=1, g(z)=1+KY(z) and f(x)=exp(x). As 1+KY(z)=1+ log([MY(z)1]+1) then joint cumulants can be recovered from joint moments using the multivariate Faà di Bruno’s formula for n=1,g(z)=MY(z) and f(x)=1+log(1+x). Let us remark that the exponential form (4) of the formal power series f and {gj} is not a constraint. To work with ordinary formal power series, the multi-index sequence {ft} needs to be replaced by the sequence {t!ft} as well as the multi-index sequence {gj;s} by the sequence {s!gj;s} for j=1,,n. In this case, the multivariate Faà di Bruno’s formula gives the coefficient i!˜hi with ˜hi the i-th coefficient of the (ordinary) formal power series composition (3).

The problem of finding suitable and easily manageable expressions of the multivariate Faà di Bruno’s formula has received attention from several researchers over the years. This is because the multivariate Faà di Bruno’s formula is a very general-purpose tool with many applications. We refer to the paper of for a detailed list of references on this subject and a detailed account of its applications. Further applications can be found in , , and . A classical way to generate the multivariate Faà di Bruno’s formula involves the partial derivatives of a composition of multivariable functions. Suppose f(x) and g1(z),,gn(z) in (3) be differentiable functions a certain number of times. The multivariate Faà di Bruno’s formula gives the partial derivative of order i of h(z) in z0 hi=|i|zi11zimmh(z1,,zm)|z=z0for |i|>0,

assuming the partial derivatives of order t of f(x) exist in x0= (g1(z0),,gn(z0)) ft=|t|xt11xtnnf(x1,,xn)|x=x0for 0<|t||i|,
and the partial derivatives of order s of gj(z) exist in z0 for j=1,,n gj,s=|s|zs11zsmmgj(z1,,zm)|z=z0for 0<|s||i|.

There are various ways to express hi in (5), see for example , and . Symbolic manipulation using Macsyma, Maple, Mathematica, etc. can produce any required order of (5), by applying the chain rule recursively and using a function that provides partial derivatives. Also in R, there are some functions for computing partial derivatives . Despite its conceptual simplicity, applications of the chain rule become impractical for its cumbersome computation even for small values of its order. As the number of additive terms becomes huge, the output is often untidy and further manipulations are required to simplify the result. By using combinatorial methods, have carried out the following expression of the multivariate Faà di Bruno’s formula hi=i!1|t||i|ft|i|k=1pk(i,t)kj=1(glj)qjqj!(lj!)|qj|

where (gs)q=nj=1(gj,s)qj with q=(q1,,qn) and pk(i,t)={(q1,,qk;l1,,lk):|qj|>0,kj=1qj=t,kj=1|qj|lj=i}
with q1,,qkNn0 and l1,,lkNm0 such thatIf μ,νNm0 we have μν if |μ|<|ν| or |μ|=|ν| and μ1<ν1 or |μ|=|ν| and μ1=ν1,,μk=νk,μk+1<νk+1 for some 1k<m. 0l1lk.

A completely different approach concerns the combinatorics of partial derivatives as pointed out for the univariate-multivariate composition using multisets and collapsing partitions. Motivated by his results and using the umbral calculus, which is a symbolic method particularly useful in dealing with formal power series (4), the combinatorics behind (6) has been simplified and a different expression has been given in . The key tool is the notion of partition of a multi-index which parallels the multiset partitions given in .

The contribution of this paper is multi-sided. We explain how to recover in R a multi-index partition, which is a combinatorial device. For statistical purposes, we show how to recover k-statistics and their multivariate generalizations using the referred polynomial approach and multi-index partitions. Then, we explain the main steps of the MFB function producing the multivariate Faà di Bruno’s formula, without any reference to the umbral calculus or chain rules and whose applications go beyond statistical purposes. The main idea is to expand the multivariable polynomial (is1,,sn)q1,s1(y1)qn,sn(yn)

where q1,s1(y1)qn,sn(yn) are suitable polynomials and the sum is over all the compositions of i in n parts, that is all the n-tuples (s1,,sn) of non-negative integer m-tuples such that s1++sn=i. Readers interested in the umbral setting may refer to and references therein.

Consequently, the MFB function gives an efficient computation of the following compositions:

The kStatistics package includes additional functions, for some of the most widespread applications of the multivariate Faà di Bruno’s formula. Indeed, not only this formula permits to generate joint cumulants and their inverse relations, but also further general families of polynomials. Therefore, we have set up special procedures for those families used very often in applications. These functions should be considered an easy to manage interfaces of the MFB function, with the aim of simplifying its application. Moreover, since the R codes are free, the user might follow similar steps to generate polynomial families not included in the package but always coming from the multivariate Faà di Bruno’s formula. The construction of new families of polynomials can be done mainly in two ways. The first way is to choose appropriately the coefficients {ft} and {gj;s} in (4). The second way is to use some suitable symbolic strategies, as discussed in . For both cases, we provide examples.

The paper is organized as follows. The next section explains the main steps of the algorithm that produces multi-index partitions with particular emphasis on its combinatorics. Then we present the symbolic strategy to generate k-statistics and their generalizations using suitable polynomial sequences and multi-index partitions. The subsequent section deals with generalized (complete exponential) Bell polynomials and some special cases corresponding to well-known families of polynomials. We have also included the procedures to generate joint cumulants from joint moments and vice versa. In the last section we explain the main steps of the algorithm to produce the multivariate Faà di Bruno’s formula. We give examples of how to build new polynomials not included in the package. Some concluding remarks end the paper.

2 Partitions of a multi-index

Most routines of the kStatistics package use the partitions of a multi-index i. Therefore, before describing any of these routines, we recall the notion of multi-index partition and describe the algorithm for its construction as implemented in the mkmSet function of the package.

A partition of the multi-index i=(i1,,im)Nm0 is a matrix Λ=(λr11,λr22,) of non-negative integers with m rows and no zero columns such that

We write Λi to denote that Λ is a partition of i. Some further notations are:

Example 1: The partitions of i=(2,1) are the matrices (21),(0210),(1101),(011100)=(λ1,λ22),

with λ1=(01)andλ2=(10).

The algorithm to get all the partitions of a multi-index resorts to multiset subdivisions. Let’s start by recalling the notion of multiset. A multiset M is a “set with multiplicities”. Suppose aM. Then the multiplicity of a is the number of times a occurs in M as a member. For example, the integers 3 and 2 are the multiplicities of a and b respectively in M={a,a,a,b,b}. A subdivision of the multiset M is a multiset of sub-multisets of M, such that their disjoint union returns M. Examples of subdivisions of M={a,a,a,b,b} are S1={{a},{a,b},{a,b}},S2={{a},{a,a,b},{b}},

S3={{a},{a,a},{b},{b}}.
The subdivisions of the multiset M={a,a,a,b,b} are in one-to-one correspondence with the partitions Λ(3,2). For example, the subdivisions (7) correspond to the partitions Λ1=(λ2,λ23) and Λ2=(λ1,λ2,λ5) respectively, while (8) to Λ3=(λ21,λ2,λ4) with λ1=(01){b}λ2=(10){a}
λ3=(11){a,b}λ4=(20){a,a}λ5=(21){a,a,b}.
Multiset subdivisions can be recovered by using collapsing set partitions . If the members 1,2,3 of the set {1,2,3,4,5} are made indistinguishable from each other and called a, and 4 and 5 are made indistinguishable from each other and called b, then the set {1,2,3,4,5} has “collapsed” to the multiset M={a,a,a,b,b}. Therefore the subdivisions of M can be recovered using the same substitution in the partitions of {1,2,3,4,5}. For example, S1 in (7) can be recovered from {{1,4},{2,5},{3}} or {{3,5},{2,4},{1}} and so on. As this last example shows, a subdivision might correspond to several partitions. The number of partitions corresponding to the same subdivision can be computed using the countP function of the package. However, to find multi-index partitions using set partitions is not a particularly efficient algorithm since the computational cost is proportional to the n-th Bell number, if n is the sum of the multi-index components .

The mkmSet function is based on a different strategy which takes into account the partitions of the multi-index components. When m=1, the mkmSet function lists all the partitions λ of the integer i. Recall that a partition of an integer i is a sequence λ=(λ1,λ2,) of weakly decreasing positive integers, named parts of λ, such that λ1+λ2+=i. A different notation is λ=(1r1,2r2,), where r1,r2, are the number of parts of λ equal to 1,2, respectively. The length of the partition is l(λ)=r1+r2+. We write λi to denote that λ is a partition of i. In the following, we describe the main steps of the mkmSet function by working on an example.

Suppose we want to generate all the partitions of (3,2). First consider the partitions of (3,0) obtained from the partitions (3),(1,2),(13) of the integer 3, and the partitions of (0,2) obtained from the partitions (2),(12) of the integer 2, that is Λ1=(30),Λ2=(1200),Λ3=(111000)(30)

Λ4=(02),Λ5=(0011)(02).

The following iterated adding-appending rule is thus implemented.

1. Consider the partition Λ5 in (10).

1.1 Add the first column of Λ5 to each column of Λ1,Λ2 and Λ3 in (9) one by one with the following rules: the sum must be done only once (if the column has multiplicities greater than one) taking as reference the first column; the sum can be done only to columns whose second component is zero and without subsequent elements (in the same row) greater than or equal to the integer we are adding. Then we have Λ(1,1)1=(31)Λ(1,1)2=(1210)Λ(2,1)2=(1201)Λ(1,1)3=(111100).

1.2 Append the same column to each partition Λ1,Λ2 and Λ3 in (10), that is
Λ(1,2)1=(3001)Λ(1,2)2=(120001)Λ(1,2)3=(11100001).
1.3 Repeat steps 1.1 and 1.2 for the second column of Λ5 with respect to the partitions generated in (11) and (12):
Λ(1,1)1=(31)addrule outappend(3011)Λ(1,1)2=(1210)add(1211)append(120101)Λ(2,1)2=(1201)addrule outappend(120011)Λ(1,1)3=(111100)add(111110)append(11101001)Λ(1,2)1=(3001)addrule outappend(300011)Λ(1,2)2=(120001)addrule outappend(12000011)Λ(1,2)3=(11100001)addrule outappend(1110000011)
2. Repeat step 1 for Λ4 in (10):
Λ1=(30)add(32)append(3002)Λ2=(1200)add(1220),(1202)append(120002)Λ3=(111000)add(111200)append(11100002)
More generally, the mkmSet function lists all the partitions Λi, with the columns reordered in increasing lexicographic order, together with the number of set partitions corresponding to the same multi-index partition, that is i!/Λ!m(Λ)!. In the latest version of the kStatistics package, among the input parameters of the mkmSet function, an input flag parameter has been inserted aiming to print the multi-index partitions in a more compact form. See the following example.

Example 2: To get all the partitions of (2,1) run

> mkmSet(c(2,1),TRUE)
[( 0 1 )( 1 0 )( 1 0 ),  1 ]
[( 0 1 )( 2 0 ),  1 ]
[( 1 0 )( 1 1 ),  2 ]   
[( 2 1 ),  1 ] 

Note that the integers 1,1,2,1 correspond to the coefficients 2!1!/Λ!m(Λ)!.

Example 3: To get all the partitions of the integer 3 run

> mkmSet(c(3),TRUE)
[( 1 )( 1 )( 1 ),  1 ]
[( 1 )( 2 ),  3 ]
[( 3 ),  1 ]

The mkmSet function is called by the intPart function, specifically designed with the purpose of listing only all the partitions of a given integer in increasing order. The input flag parameter allows us to print the partitions in a more compact form.

Example 4: To get all the partitions of the integer 4 run

> intPart(4,TRUE)
[ 1 1 1 1 ]
[ 1 1 2 ]
[ 2 2 ]
[ 1 3 ]
[ 4 ]

The parts function of the partitions package lists all the partitions of a given integer, but in decreasing order. Instead the get.partitions function of the nilde package finds all the partitions of a given integer with a fixed length l(λ) . If l(λ) is equal to the given integer, the get.partitions function lists all the partitions in increasing order.

3 kStatistics

The i-th k-statistic κi is the (unique) symmetric estimator whose expectation is the i-th cumulant ki(Y) of a population character Y and whose variance is a minimum relative to all other unbiased estimators.

The nKS function generates the numerical value of the i-th k-statistic starting from a data sample. The computation relies on the following polynomials Pt(y)=tj=1yjS(t,j)(1)j1(j1)!fort=1,,i

where {S(t,j)} are the Stirling numbers of second kind, generated trough the nStirling2 function. In detail, suppose to have a sample {a1,,aN} of N numerical data and denote with pt the t-th power sum in the numerical data pt(a1,,aN)=Nj=1atj,fort1.
To carry out the numerical value of the i-th k-statistic for iN, the nKS function computes the explicit expression of the polynomial of degree i Qi(y)=λidλPλ(y)pλ
where the sum is over all the partitions λ=(1r1,2r2,)i, and dλ=i!(1!)r1r1!(2!)r2r2!Pλ(y)=[P1(y)]r1[P2(y)]r2pλ=[p1]r1[p2]r2
with {Pt(y)} and {pt} given in (13) and (14) respectively. The final step is to replace the powers yt in the explicit form of the polynomial (15) with (1)t1(t1)!/(N)t for t=1,,i.

The main steps of the nKS function are summarized in the following.

Function nKS

i) Compute the power sums pt in (14) for t=1,,i.

ii) Compute S(t,j)(1)j1(j1)! in (13) for j=1,,t and t=1,,i.

iii) Using the mkmSet function, compute all the partitions λi.

iv) For a given partition λ, expand the product Pλ(y) in (15) and compute the coefficient dλpλ of each monomial in Qi(y) using (16).

v) For t=1,,i multiply (1)t1(t1)!/(N)t with the coefficients of the monomial of degree t carried out at the previous step and do the sum over all the resulting numerical values.

vi) Repeat steps iv) and v) for all the partitions λ carried out at step iii) and do the sum over all the resulting numerical values.

Example 5: Using (15) for i=1, we have Q1(y)=P1(y)p1=yNj=1aj and plugging 1/N in place of y, the sample mean is recovered. Using (15) for i=2, we have Q2(y)=P2(y)p2+(P1(y)p1)2=yNj=1a2j+y2((Nj=1aj)2Nj=1a2j)

and plugging 1/N in place of y and 1/N(N1) in place of y2, the sample variance is recovered. Compare the values of the sample mean, computed with the nKS function and the mean function, and the sample variance, computed with the nKS function and the var function, for the following dataset:

> data<-c(16.34, 10.76, 11.84, 13.55, 15.85, 18.20, 7.51, 10.22, 12.52, 14.68, 
16.08, 19.43, 8.12, 11.20, 12.95, 14.77, 16.83, 19.80, 8.55, 11.58, 12.10, 15.02, 
16.83, 16.98, 19.92, 9.47, 11.68, 13.41, 15.35, 19.11)
> nKS(1,data)
[1] 14.02167
> mean(data)
[1] 14.02167
> nKS(2,data)
[1] 12.65007
> var(data)
[1] 12.65007

Using the nKS function, for instance, the sample skewness and the sample kurtosis can be computed. Let us recall that the sample skewness is a measure of the central tendency of a univariate sample and can be computed as κ3/κ3/22 where κ2 and κ3 are the second and the third k-statistics respectively . The sample kurtosis is a measure of the tail-heaviness of a sample distribution. The sample excess kurtosis is defined as the sample kurtosis minus 3 and can be computed as κ4/κ22 where κ2 and κ4 are the second and the fourth k-statistics respectively .

> nKS(3,data)/sqrt(nKS(2,data))^(3/2)
[1] -0.03216229
> nKS(4,data)/nKS(2,data)^2 + 3
[1] 2.114708

A similar strategy is employed to compute multivariate k-statistics (the nKM function) of a sample data matrix whose columns each represent a population character. To simplify the notation, in the following we deal with the case of a bivariate data set {(a1,1,a2,1),(a1,N,a2,N)} of N paired numerical data. Denote with p(s,t) the bivariate power sum in the paired data p(s,t)[(a1,1,a2,1),,(a1,N,a2,N)]=Nj=1as1,jat2,jfors,t1.

Suppose i=(i1,i2) with i1,i2N and set i=i1+i2. To carry out the numerical value of the i-th multivariate k-statistic, the nKM function finds the explicit expression of the polynomial Qi(y)=ΛidΛPΛ(y)pΛ
where the sum is over all the partitions Λ=(λr11,λr22,)i, and dΛ=i!Λ!m(Λ)!PΛ(y)=[P|λ1|(y)]r1[P|λ2|(y)]r2pΛ=[pλ1]r1[pλ2]r2
with {Pt(y)} and {p(s,t)} given in (13) and (17) respectively. As for the univariate k-statistics, the final step consists in replacing the powers yj in the explicit expression of the polynomial (18) with the numerical values (1)j1(j1)!/(N)j for j=1,,i.

The main steps of the nKM function are summarized in the following.

Function nKM

i) Compute the bivariate power sums p(s,t) in (17) for s=1,,i1 and t=1,,i2.

ii) For i=i1+i2, compute S(t,j)(1)j1(j1)! in (13) for j=1,,t and t=1,,i.

iii) Using the mkmSet function, compute all the partitions Λi.

iv) For a given partition Λ, expand the product PΛ(y) in (18) and compute the coefficient dΛpΛ of each monomial in Qi(y) using (19).

v) For j=1,,i, multiply (1)j1(j1)!/(N)j with the coefficient of the monomial of degree j carried out at the previous step and do the sum over all the resulting numerical values.

vi) Repeat steps iv) and v) for all the partitions Λ carried out at step iii) and do the sum over all the resulting numerical values.

Example 6: To estimate the joint cumulant c2,1 of the following dataset, run

> data1<-list(c(5.31,11.16),c(3.26,3.26),c(2.35,2.35),c(8.32,14.34),
c(13.48,49.45),c(6.25,15.05),c(7.01,7.01),c(8.52,8.52),c(0.45,0.45),
c(12.08,12.08),c(19.39,10.42))
> nKM(c(2,1),data1)
[1] -23.7379

If the first column are observations of a population character X and the second column observations of a population character Y, then c2,1 measures how far from connectedness (as opposite to independence) are X2 and Y . A similar meaning has the estimation of the joint cumulant c2,2,2 of the following dataset:

> data2<-list(c(5.31,11.16,4.23),c(3.26,3.26,4.10),c(2.35,2.35,2.27),
c(4.31,10.16,6.45),c(3.1,2.3,3.2),c(3.20, 2.31, 7.3))
> nKM(c(2,2,2),data2)
[1] 678.1045

4 Polykays

Similarly to k-statistics, polykays are symmetric unbiased estimators of cumulant products. More in detail, when evaluated on a random sample, the i-th polykay gives an estimation of the product ki1(Y)kim(Y), where i=(i1,,im)Nm0 and {kij(Y)} are cumulants of a population character Y.

To simplify the notation, in the following we show how to compute the i-th polykay of N numerical data {a1,,aN} using the nPS function for i=(i1,i2). Set i=i1+i2 and suppose iN. The computation relies on the so-called logarithmic polynomials ˜Pt(y1,,yi)=λtyλdλ(1)l(λ)1(l(λ)1)!

for t=1,,i where the sum is over all the partitions λ=(1r1,2r2,)t, dλ is given in (16) and yλ=yr11yr22. To compute the polykay of order (i1,i2), the nPS function finds the explicit expression of the polynomial Ai(y1,,yi)=λidλ˜Pλ(y1,,yi)pλ
where the sum is over all the partitions λ=(1r1,2r2,)i, dλ and pλ are given in (16) and ˜Pλ(y1,,yi)=[˜P1(y1,,yi)]r1[˜P2(y1,,yi)]r2
with {˜Pt(y1,,yi)} given in (20). Note that the monomials in Ai(y1,,yi) are of type yλ=yr11yr22 with λ=(1r1,2r2,)i. The final step is to plug suitable numerical values in place of yλ depending on how the partition λ is constructed. Indeed, set ˜q(i1,i2)={λ+λi|λ=(1s1,2s2,)i1,λ=(1t1,2t2,)i2}
where λ+λ=(1r1,2r2,) with rj=sj+tj for j=1,2,. Then yλ is replaced by 0 if λ˜q(i1,i2) otherwise by (1)l(λ)1(l(λ)1)!(1)l(λ)1(l(λ)1)!(N)l(λ)+l(λ)dλdλdλ+λ.
Note that dλ and dλ in (23) are recovered from (16).

The main steps of the nPS function are summarized in the following.

Function nPS

i) Set i=i1+i2 and compute the power sums pt in (14) for t=1,,i.

ii) Generate the polynomials ˜Pt(y1,,yi) in (20) for t=1,,i.

iii) Using the mkmSet function, compute all the partitions λi.

iv) For a given partition λ, expand the product ˜Pλ(y1,,yi) in (21); then plug (23) or 0 in each monomial yλ, depending if λ is or not in the set ˜q(i1,i2) given in (22).

v) Multiply the numerical value of ˜Pλ carried out at step iv) with dλpλ given in (16).

vi) Repeat steps iv) and v) for all the partitions λ carried out at step iii) and do the sum over all the resulting numerical values.

Example 7: Suppose we need to estimate the square of the variance σ2 of the population character Y from which the data of Example 5 have been sampled. We have

> nKS(2,data)^2
[1] 160.0243
> var(data)^2
[1] 160.0243

but k22 is not an unbiased estimator of the square of σ2. An unbiased estimator of such a square is the polykay of order (2,2), that is

> nPS(c(2,2),data)
[1] 154.1177

Multivariate polykays are unbiased estimators of products of multivariate cumulants and the nPM function returns a numerical value for these estimators when evaluated on a random sample. As before, to show how the nPM function works, we consider a bivariate sample of N numerical data, that is {(a1,1,a2,1),(a1,N,a2,N)}. If we choose i=(i1,i2) and j=(j1,j2) with i1+i2+j1+j2N as input of the nPM function, the output is a numerical value which represents an estimated value of the product ki(X,Y)kj(X,Y), where ki(X,Y) and kj(X,Y) are cumulants of the population characters (X,Y). The computation relies on suitable polynomials in the indeterminates {y(s,t)} for s=0,,w1,t=0,,w2, with s+t>0 and w1=i1+j1,w2=i2+j2. These polynomials are a multivariable generalization of (20), that is ˜Pk({y(s,t)})=ΛkyΛdΛ(1)l(Λ)1(l(Λ)1)!

for 0<kw=(w1,w2), where the sum is over all the partitions Λ=(λr11,λr22,)k and yΛ=yr1λ1yr2λ2. To compute the multivariate polykay of order (i,j), the nPM function finds the explicit expression of the polynomial Aw({y(s,t)})=ΛwdΛ˜PΛ({y(s,t)})pΛ
where the sum is over all the partitions Λ=(λr11,λr22,)w, dΛ and pΛ are given in (19), and ˜PΛ({y(s,t)})=[˜Pλ1({y(s,t)})]r1[˜Pλ2({y(s,t)})]r2
with {˜Pλ({y(s,t)})} given in (24). The monomials in Aw({y(s,t)}) are of type yΛ with Λw. The final step is to plug suitable numerical values in place of yΛ depending on how the partition Λ is constructed. Indeed, set ˜q(w)={Λ+Λw|Λ=(λs11,λs22,)i,Λ=(λt11,λt22,)j},
where Λ+Λ=(˜λr11,˜λr22,) is built with the columns of Λ and Λ rearranged in increasing lexicographic order and such that rj=sj if ˜λj=λj or rj=tj if ˜λj=λj or rj=sj+tj if ˜λj=λj=λj. Therefore in the explicit expression of (25), yΛ is replaced by 0 if Λ˜q(w) otherwise by (1)l(Λ)1(l(Λ)1)!(1)l(Λ)1(l(Λ)1)!(N)l(Λ)+l(Λ)dΛdΛdΛ+Λ.
Note that dΛ and dΛ in (27) are recovered from (19).

The main steps of the nPM function are summarized in the following.

Function nPM

i) Set w1=i1+j1 and w2=i2+j2; compute the power sums p(s,t) in (17) for s=1,,w1 and t=1,,w2.

ii) Generate the polynomials ˜Pk({y(s,t)}) in (24) for 0<kw=(w1,w2).

iii) Using the mkmSet function, compute all the partitions Λw.

iv) For a given partition Λ, expand the product ˜PΛ({y(s,t)}) in (25) and plug (27) or 0 in each obtained monomial of type yΛ depending if Λ is or not in ˜q(w) given in (26).

v) Multiply the numerical value of ˜PΛ obtained at step iv) with dΛpΛ given in (19).

vi) Repeat steps iv) and v) for all the partitions Λ carried out at step iii) and do the sum over all the resulting numerical values.

Example 8: For the same dataset employed in Example 6, to estimate k(2,1)(X,Y)k(1,0)(X,Y) run

> nPM(list(c(2,1),c(1,0)),data1)
[1] 48.43243

Remark 1: The master nPolyk function runs one of the nKS, nKM, nPS and nPM functions depending if we ask for simple k-statistics, multivariate k-statistics, simple polykays or multivariate polykays.

5 Bell polynomials and generalizations

The algorithms to produce k-statistics and polykays rely on handling suitable polynomial families which are special cases of generalizations of Bell polynomials, as introduced in this section. Moreover, there are further families of polynomials widely used in applications which are special cases of these polynomials. For the most popular ones, we have implemented special functions in the kStatistics package. The list is not exhaustive, see for instance . Furthermore additional families of polynomials might be recovered using the multivariate Faà di Bruno’s formula. We will give some examples in the next section.

The i-th generalized (complete exponential) Bell polynomial in the indeterminates y1,,yn is hi(y1,,yn)=i!Λs1,,˜Λsns1++sn=iyl(Λ)1yl(˜Λ)ng1,Λgn,˜ΛΛ!˜Λ!m(Λ)!m(˜Λ)!

where the sum is over all the partitions Λs1,,˜Λsn with s1,,sn m-tuples of non-negative integers such that s1++sn=i and g1,Λ=gr11;λ1gr21;λ2forΛ=(λr11,λr22,)gn,˜Λ=gt1n;˜λ1gt2n;˜λ2for˜Λ=(˜λt11,˜λt22,)
with {g1;λ},,{gn;˜λ} multi-indexed sequences. These polynomials are the output of the GCBellPol function.

Example 9: To get h(1,1)(y1,y2) run

> GCBellPol(c(1,1),2)
[1] (y1)(y2)g1[0,1]g2[1,0] + (y1)(y2)g1[1,0]g2[0,1] + (y1^2)g1[0,1]g1[1,0] +
(y1)g1[1,1] + (y2^2)g2[0,1]g2[1,0] + (y2)g2[1,1]

The e_GCBellPol function evaluates hi(y1,,yn) when its indeterminates y1,,yn and/or its coefficients are substituted with numerical values.

Example 10: To plug the values from 1 to 6 respectively into the coefficients g1[ , ] and g2[ , ] of the polynomial h(1,1)(y1,y2) given in Example 9 run

> e_GCBellPol(c(1,1), 2, "g1[0,1]=1, g1[1,0]=2, g1[1,1]=3, g2[0,1]=4, g2[1,0]=5, 
g2[1,1]=6")
[1] 13(y1)(y2) + 2(y1^2) + 3(y1) + 20(y2^2) + 6(y2)

To evaluate h(1,1)(1,5) run

> e_GCBellPol(c(1,1), 2, "y1=1,  y2=5, g1[0,1]=1, g1[1,0]=2, g1[1,1]=3, g2[0,1]=4, 
g2[1,0]=5, g2[1,1]=6")
[1] 600

When the multi-indexed sequences {g1;λ},,{gn;˜λ} are all equal, the number of distinct addends in (28) might reduce and the corresponding generalized Bell polynomial is denoted by ˜hi(y1,,yn). To deal with this special case, we have inserted an input flag parameter in the e_GCBellPol function.

Example 11: To compare ˜h(1,1)(y1,y2) with h(1,1)(y1,y2) given in Example 9 run

> GCBellPol(c(1,1),2,TRUE)
[1] 2(y1)(y2)g[0,1]g[1,0] + (y1^2)g[0,1]g[1,0] + (y1)g[1,1] + (y2^2)g[0,1]g[1,0] +
(y2)g[1,1]

Set n=1 in (28). Then hi(y1,,yn) reduces to the univariate polynomial
hi(y)=Λiyl(Λ)dΛgΛ

where the sum is over all the partitions Λ=(λr11,λr22,)i, dΛ is given in (19) and gΛ=gr1λ1gr2λ2.

Example 12: To get h(1,1)(y) run

> GCBellPol(c(1,1),1)
[1] (y^2)g[0,1]g[1,0] + (y)g[1,1]

Remark 2: For all iNm0, we have hi(y1++yn)=˜hi(y1,,yn), where ˜hi(y1,,yn) is the i-th generalized Bell polynomial (28) corresponding to all equal multi-indexed sequences {g1,λ},,{gn,˜λ} . Therefore the e_GCBellPol function, with the input flag TRUE, produces also an explicit expression of hi(y1++yn).

The algorithm to generate joint moments in terms of joint cumulants and vice versa follows the same pattern designed to generate {hi(y)}. Indeed if {ki(Y)} and {mi(Y)} denote the sequences of joint cumulants and joint moments of a random vector Y respectively, then mi(Y)=ΛidΛkΛ(Y)andki(Y)=Λi(1)l(Λ)1(l(Λ)1)!dΛmΛ(Y),

where the sum is over all the partitions Λ=(λr11,λr22,)i, dΛ is given in (19) and mΛ(Y)=[mλ1(Y)]r1[mλ2(Y)]r2kΛ(Y)=[kλ1(Y)]r1[kλ2(Y)]r2.
In particular

When the multi-index i reduces to an integer i, formulae (31) are the classical expressions of univariate moments in terms of univariate cumulants and vice versa. The mom2cum and cum2mom functions do the same when the input is an integer.

Example 13: To get m(3,1) in terms of k(i,j) run

> mom2cum(c(3,1))
[1] k[0,1]k[1,0]^3 + 3k[0,1]k[1,0]k[2,0] + k[0,1]k[3,0] + 3k[1,0]^2k[1,1] +
3k[1,0]k[2,1] + 3k[1,1]k[2,0] + k[3,1]

To get k(3,1) in terms of m(i,j) run

> cum2mom(c(3,1))
[1]  - 6m[0,1]m[1,0]^3 + 6m[0,1]m[1,0]m[2,0] - m[0,1]m[3,0] + 
6m[1,0]^2m[1,1] - 3m[1,0]m[2,1] - 3m[1,1]m[2,0] + m[3,1]

Remark 3: There are different functions in R performing similar computations for cumulants and moments: for instance see for the multivariate case. A different strategy would rely on the recursive relations between cumulants and moments .

Similarly to (31), some of the polynomials employed in the previous sections are generated using the same pattern developed to find the explicit expression of hi(y) in (30):

Note that when the multi-index i in (30) reduces to a positive integer i, then the polynomial hi(y) becomes hi(y)=λidλyl(λ)gλ

where the sum is over all the partitions λ=(1r1,2r2,)i, dλ is given in (16) and gλ=gr11gr22 with {gj} a suitable sequence.

Example 14: To get h3(y) run

> GCBellPol(c(3),1)
[1] (y^3)g[1]^3 + 3(y^2)g[1]g[2] + (y)g[3]

With a combinatorial structure very similar to (32), the i-th general partition polynomial has the following expression in the indeterminates y1,,yi Gi(a1,,ai;y1,,yi)=λidλal(λ)yλ

where the sum is over all the partitions λ=(1r1,2r2,)i, dλ is given in (16), {aj} is a suitable numerical sequence and yλ=yr11yr22. It’s a straightforward exercise to prove that Gi(a1,,ai;y1,,yi)=ij=1ajBi,j(y1,,yij+1),
where {Bi,j} are the (partial) exponential Bell polynomials Bi,j(y1,,yij+1)=ˉp(i,j)dλyλ
where ˉp(i,j)={λ=(1r1,2r2,)i|l(λ)=j}, dλ is given in (16) and yλ=yr11yr22. The polynomials in (33) are widely used in applications such as combinatorics, probability theory and statistics . As particular cases, they include the exponential polynomials and their inverses, the logarithmic polynomials (20), the potential polynomials and many others . The general partition polynomials are the output of the gpPart function.

Example 15: To get G4(a1,a2,a3,a4;y1,y2,y3,y4) run

> gpPart(4)
[1] a4(y1^4) + 6a3(y1^2)(y2) + 3a2(y2^2) + 4a2(y1)(y3) + a1(y4)

When a1==ai=1, the i-th general partition polynomial in (34) reduces to the complete (exponential) Bell polynomial Gi(1,,1;y1,,yi)=ij=1Bi,j(y1,,yij+1)

where {Bi,j} are the (partial) exponential Bell polynomials (35). For instance, the polynomial Qi(y) in (15) is generated using the same pattern developed to generate (36) with Pj(y)pj in place of yj.

The eBellPol function returns the complete (exponential) Bell polynomials (36). The same function also produces the (partial) exponential Bell polynomial Bi,j(y1,,yij+1) using (33) with ak=δk,j (the Kronecker delta) for k=1,,i. gives a rather extensive survey of applications of these homogeneous polynomials.

Example 16: To get B5,3(y1,y2,y3) run

> eBellPol(5,3)
[1] 15(y1)(y2^2) + 10(y1^2)(y3)

To get G4(1,1,1,1;y1,y2,y3,y4) run

> eBellPol(4)
[1] (y1^4) + 6(y1^2)(y2) + 3(y2^2) + 4(y1)(y3) + (y4)

The oBellPol function returns the partial (ordinary) Bell polynomials ˆBi,j(y1,,yij+1)=j!i!Bi,j(1!y1,2!y2,,(ij+1)!yij+1)

and the complete (ordinary) Bell polynomials ˆGi(y1,,yi)=Gi(1,,1;1!y1,2!y2,,i!yi).

Example 17: To get ˆB5,3(y1,y2,y3) run

> oBellPol(5,3)
[1] 1/120( 360(y1)(y2^2) + 360(y1^2)(y3) )

To get ˆG3(y1,y2,y3,y4) run

> oBellPol(4)
[1] 1/24( 24(y1^4) + 72(y1^2)(y2) + 24(y2^2) + 48(y1)(y3) + 24(y4) )

The e_eBellPol function evaluates the exponential Bell polynomials when the indeterminates are substituted with numerical values. In Table 1 some special sequence of numbers are given obtained using this procedure.

Table 1: Numerical sequences (second column) obtained evaluating the exponential Bell polynomials (last column) when suitable numerical values replace indeterminates.  SequenceBell polynomialsLah numbersi!j!(i1j1)Bi,j(1!,2!,3!,)Stirling numbers of first kinds(i,j)Bi,j(0!,1!,2!,)unsigned Stirling numbers of first kind|s(i,j)|Bi,j(0!,1!,2!,)Stirling numbers of second kindS(i,j)Bi,j(1,1,1,)idempotent numbers(ij)jijBi,j(1,2,3,)Bell numbersBiij=0Bi,j(1,1,1,)

By default, the e_eBellPol function returns the Stirling numbers of second kind, as the following example shows.

Example 18: To get S(5,3) run

> e_eBellPol(5,3)
[1] 25
> e_eBellPol(5,3,c(1,1,1,1,1))
[1] 25

To get the 5-th Bell number B5 run

> e_eBellPol(5)
[1] 52

To get s(5,3) run

> e_eBellPol(5,3, c(1,-1,2,-6,24))
[1] 35

6 Composition of formal power series

In (3), suppose ft the t-th coefficient of f(x) and g1;s1,,gn;sn the s1-th,…,sn-th coefficients of g1(z),,gn(z) respectively. Using multi-index partitions, the multivariate Faà di Bruno’s formula (6) can be written as hi=i!Λs1,,˜Λsns1++sn=if(l(Λ),,l(˜Λ))g1,Λgn,˜ΛΛ!˜Λ!m(Λ)!m(˜Λ)!

where g1,Λ,,gn,˜Λ are given in (29) and the sum is over all the partitions Λs1,,˜Λsn, with s1,,sn m-tuples of non-negative integers such that s1++sn=i.

The MFB function generates all the summands of (37). Its first step is to find the set ˜p(n,i) of all the compositions of i in n parts, that is all the n-tuples (s1,,sn) of non-negative integer m-tuples such that s1++sn=i. This task is performed by the mkT function.

Function mkT

i) Find all the partitions Λi, using the mkmSet function.

ii) Select the first partition Λ. If l(Λ)=n, then the columns of Λ are the m-tuples (s1,,sn) such that s1++sn=i. If l(Λ)<n, add nl(Λ) zero columns to Λ.

iii) Generate all the permutations of the columns of Λ as collected at step ii).

iv) Repeat steps ii) and iii) for each partition Λ carried out at step i).

In the mkT function an input flag variable permits to obtain the output in a more compact set up. See the following example.

Example 19: Suppose we are looking for the elements of the set ˜p(2,(2,1)), that is the pairs (s1,s2) such that s1+s2=(2,1). Then run

> mkT(c(2,1),2,TRUE)
[( 0 1 )( 2 0 )]
[( 2 0 )( 0 1 )]
[( 1 0 )( 1 1 )]
[( 1 1 )( 1 0 )]
[( 2 1 )( 0 0 )]
[( 0 0 )( 2 1 )]

Consider the partitions of (2,1) as given in Example 2. Note that [( 2 1 )( 0 0 )] and [( 0 0 )( 2 1 )] are obtained adding a zero column to the partition [( 2 1 ), 1 ], and then permuting the two columns. No zero columns are added to [( 2 0 )( 0 1 )] as the length of the partition is 2. The same is true for [( 0 1 )( 2 0 )] or [( 1 1 )( 1 0 )] which are only permuted.

The MFB function produces the multivariate Faà di Bruno’s formula (37) making use of the following steps.

Function MFB

i) Find all the m-tuples (s1,,sn) in ˜p(n,i) using the mkT function.

ii) Let y1,,yn be indeterminates. For each j=1,,n, compute all the partitions Λsj using the mkmSet function and find the explicit expression of the polynomial qj,sj(yj)=sj!Λsjyl(Λ)jgj,ΛΛ!m(Λ)!.

iii) Make the products q1,s1(y1)qn,sn(yn) in the multivariable polynomial hi(y1,,yn)=(s1,,sn)˜p(n,i)(is1,,sn)q1,s1(y1)qn,sn(yn)

and compute its explicit expression.

iv) In the explicit expression of the polynomial hi(y1,,yn) as carried out at the previous step iii), replace the occurrences of the products yl(Λ)1yl(˜Λ)n with f(l(Λ),,l(˜Λ)).

Step iii) is performed by the joint function. This function is not directly accessible in the package, as defined locally in the MFB function. The joint function realizes a recursive pair matching: each coefficient g1,Λ of q1,s1(y1) is matched with each coefficient g2,˜Λ of q2,s2(y2), then each paired coefficient g1,Λg2,˜Λ is matched with each coefficient g3,Λ of q3,s3(y3) and so on. Step iv) consists of multiplying each coefficient found at step iii) with ft, where t is the multi-index whose j-th component gives how many times gj, appears in this coefficient. See the following example.

Example 20: Suppose n=m=2 and i=(1,1). To get h(1,1) in (37) run

> MFB(c(1,1),2)
[1] f[1,1]g1[0,1]g2[1,0] + f[1,1]g1[1,0]g2[0,1] + f[2,0]g1[0,1]g1[1,0] + 
f[1,0]g1[1,1] + f[0,2]g2[0,1]g2[1,0] + f[0,1]g2[1,1]

Taking into account (4), in the previous output f[i,j] corresponds to f(i,j) as well as g1[i,j] and g2[i,j] correspond to g1;(i,j) and g2;(i,j) respectively for i,j=0,1,2. Note that g1[1,1] is multiplied with f[1,0] as there is one occurrence of g1 and no occurrence of g2. In the same way, g1[1,0]g1[0,1] is multiplied with f[2,0] as there are two occurrences of g1 and no occurrence of g2 and g1[1,0]g2[0,1] is multiplied with f[1,1] as there is one occurrence of g1 and one occurrence of g2 and so on. Compare the previous output with the one obtained in Maple running

diff(f(g1(x1,x2),g2(x1,x2),x1,x2): D2,2(f)(g1(x1,x2),g2(x1,x2))(x1g2(x1,x2))(x2g2(x1,x2))+D1,2(f)(g1(x1,x2),g2(x1,x2))(x2g1(x1,x2))(x1g2(x1,x2))+D1,2(f)(g1(x1,x2),g2(x1,x2))(x1g1(x1,x2)))(x2g2(x1,x2))+D1,1(f)(g1(x1,x2),g2(x1,x2))(x1g1(x1,x2)))(x2g1(x1,x2)))+D2(f)(g1(x1,x2),g2(x1,x2))(2x2x1g2(x1,x2))+D1(f)(g1(x1,x2),g2(x1,x2))(2x2x1g1(x1,x2))

where D1(f) denotes f(x1,x2)/x1,D2(f) denotes f(x1,x2)/x2 and similarly D1,1(f)2f(x1,x2)x21,D2,2(f)2f(x1,x2)x22,D1,2(f)2f(x1,x2)x1x2.

The eMFB function evaluates the multivariate Faà di Bruno’s formula (37) when the coefficients of the formal power series f and g1,,gn in (4) are substituted with numerical values.

Example 21: To evaluate the output of Example 20 for some numerical values of the coefficients, run

> cfVal<-"f[0,1]=2, f[0,2]=5, f[1,0]=13, f[1,1]=-4, f[2,0]=0"
> cgVal<-"g1[0,1]=-2.1, g1[1,0]=2,g1[1,1]=3.1,g2[0,1]=5,g2[1,0]=0,g2[1,1]=6.1"
> cVal<-paste0(cfVal,",",cgVal)
> e_MFB(c(1,1),2,cVal)
[1] 12.5

The polynomial families discussed in the previous sections are generated using the MFB function. Indeed, the generalized (complete exponential) Bell polynomials in (28) are coefficients of the following formal power series H(y1,,yn;z)=1+|i|>0hi(y1,,yn)zii!=exp[ni=1yi(gi(z)1)],

which turns to be a composition (3), with f(x1,,xn)= exp(x1y1++xnyn) and ft=yt11ytnn for tNn0. In this case, y1,,yn play the role of indeterminates. The i-th coefficient hi(y1,,yn) - output of the GCBellPol function - is obtained from the multivariate Faà di Bruno’s formula (37) dealing with y1,,yn as they were constants. When {g1(z),,gn(z)} are the same formal power series g(z), the formal power series H(y1,,yn;z) in (38) reduces to H(y1,,yn;z)=exp[(y1++yn)(g(z)1)]
with coefficients ˜hi(y1,,yn) as given in the previous section.

If n=1 then H(y1,,yn;z) reduces to the composition exp[y(g(z)1)] whose coefficients are the polynomials given in (30). More in general the coefficients of f(g(z)1) are hi=ΛidΛfl(Λ)gΛ

where the sum is over all the partitions Λ=(λr11,λr22,)i, dΛ is given in (19) and gΛ=gr1λ1gr2λ2. If also m=1, then hi in (40) reduces to hi=λidλfl(λ)gλ
where the sum is over all the partitions λ=(1r1,2r2,)i, dλ is given in (16) and gλ=gr11gr22. Formula (41) corresponds to the univariate Faà di Bruno’s formula and gives the i-th coefficient of f(g(z)1) with f(x)=1+j1fjxjj!andg(z)=1+s1gszss!.
Example 22: To get h5 in (41) run

> MFB(c(5), 1)
[1] f[5]g[1]^5 + 10f[4]g[1]^3g[2] + 15f[3]g[1]g[2]^2 + 10f[3]g[1]^2g[3] + 
10f[2]g[2]g[3] + 5f[2]g[1]g[4] + f[1]g[5]

For instance, the i-th general partition polynomial in (33) is generated using the MFB function: in such a case the univariate Faà di Bruno’s formula (41) is generated with {ys} in place of {gs} and {aj} in place of {fj}.

Examples of how to generate polynomials not included in the kStatistics package

In the following we give some suggestions on how to use the R codes of the kStatistics package to generate additional polynomial families.

The pPart function is an example of how to use the univariate Faà di Bruno’s formula and a symbolic strategy different from those presented so far. Indeed the pPart function generates the so-called partition polynomial Fi(y) of degree i, whose coefficients are the number of partitions of i with j parts for j=1,,i . The partition polynomial Fi(y) is obtained from the univariate Faà di Bruno’s formula (41) setting fj=1/i!andgrss=(s!)rsrs!yrs

for s=1,,ij+1,j=1,,i and rs=1,,i. Note the symbolic substitution of grss with the powers yrs.

Example 23: To get F5(y) run

> pPart(5)
[1] y^5 + y^4 + 2y^3 + 2y^2 + y

Note that F5(y) is obtained from the output of Example 22 using (42).

Example 24: The following code shows how to evaluate F11(y) in y=7.

> s<-pPart(11)         # generate the partition polynomial of degree 11
> s<-paste0("1",s)     # add the coefficient to the first term  
> s<-gsub(" y","1y",s) # replace the variable y without coefficient  
> s<-gsub("y", "*7",s) # assign y = 7
> eval(parse(text=s))  # evaluation of the expression
[1] 3.476775e+182

We give a further example on how to generate a polynomial family not introduced so far but still coming from (41) for suitable choices of {fj} and {gs}. Consider the elementary symmetric polynomials in the indeterminates y1,,yn ei(y1,,yn)={1j1<<jinyj1yji,1in,0,i>n.

A well-known result states that these polynomials can be expressed in terms of the power sum symmetric polynomials (14) in the same indeterminates y1,,yn, using the general partition polynomials (34), that is ei=(1)ii!Gi(1,,1;p1,1!p2,2!p3,,(i1)!pi)
for i=1,,n. The following e2p function expresses the i-th elementary symmetric polynomial ei in terms of the power sum symmetric polynomials p1,,pi, using (44) and the MFB function.

> e2p <-  function(n=0){
+     v<-MFB(n,1);                      # Call the MFB Function
+     v<-MFB2Set( v );                  # Expression to vector
+     for (j in 1:length(v)) {
+         # ----- read -----------[ fix block ]-----------------------#
+         c <- as.character(v[[j]][2]);    # coefficient
+         x <- v[[j]][3];                  # variable
+         i <- v[[j]][4];                  # subscript
+         k <- strtoi(v[[j]][5]);          # power
+         # ----- change --------------------------------------------#
+         if (x=="f") {
+             c<-paste0(c,"*( (-1)^",n,")");
+             x<-"";
+             i<-"";
+         }
+         else if (x=="g") {
+             c<-paste0(c,"*((-factorial(",strtoi(i)-1,"))^",k,")");
+             x<-paste0("(p",i,ifelse(k>1,paste0("^",k),""),")");
+             i<-"";k<-1;
+         }
+         # ----- write ---------[ fix block ]-----------------------#
+         v[[j]][2] <- c;
+         v[[j]][3] <- x;
+         v[[j]][4] <- i;
+         v[[j]][5] <- k;
+         # ---------------------------------------------------------#
+     }
+     noquote(paste0("1/",factorial(n),"( ",Set2expr(v), " )"));
+ }

This function starts by initializing the vector v with (41) by means of the MFB function. There is a first code snippet [fix block] for extracting a set with the coefficients, variables, indexes and powers of v by means of the MFB2Set function. This first code snippet should not be changed whatever polynomial family we are generating. The second code snippet change includes instructions that can be changed according to the expressions of the coefficients {fj} and {gs} in (41). To get (44), we set fj=1 and gs=(s1)!ps. Once these coefficients have been changed, the last code snippet [fix block] updates the vector v. The Set2expr function assembles the final expression.

Example 25: To get e4 in (44) run

> e2p(4)
[1] 1/24( (p1^4) - 6(p1^2)(p2) + 3(p2^2) + 8(p1)(p3) - 6(p4) )

7 Concluding remarks

We have developed the kStatistics package with the aim to generate univariate and multivariate k-statistics/polykays, togheter with the multivariate Faà di Bruno’s formula and various user-friendly functions related to this formula. The paper briefly introduces the combinatorial tools involved in the package and presents, in detail, the core function of the package which generates multi-index partitions. We emphasize that the algorithms presented here have been designed with the aid of the umbral calculus, even if we did not mentioned this method in the paper.

One of the main applications we have dealt with is the generation and evaluation of various families of polynomials: from generalized complete Bell polynomials to general partition polynomials, from partial Bell polynomials to complete Bell polynomials. Numerical sequences obtained from the Bell polynomials can also be generated.

All these utilities intend to make the kStatistics package a useful tool not only for statisticians but also for users who need to work with families of polynomials usually available in symbolic software or tables. Indeed, we have provided examples on how to generate polynomial families not included in the package but which can still be recovered using the Faà di Bruno’s formula and suitable strategies, both numerical and symbolic. Following this approach, also the estimations of joint cumulants or products of joint cumulants is one further example of symbolic strategy coming from the multivariate Faà di Bruno’s formula.

Future works consist in expanding the kStatistics package by including extensions of the multivariate Faà di Bruno’s formula, as addressed in and references therein, aiming to manage nested compositions, as the BellY function in the Wolfram Language and System does. Moreover, further procedures can be inserted relied on symbolic strategies not apparently related to the multivariate Faà di Bruno’s formula but referable to this formula, as for example the central Bell polynomials .

The results in this paper were obtained using the kStatistics 2.1.1 package. The package is currently available with a general public license (GPL) from the Comprehensive R Archive Network.

8 Acknowledgements

The authors would like to thank the reviewers for their constructive feedback.

Supplementary materials

Supplementary materials are available in addition to this article. It can be downloaded at RJ-2022-033.zip

CRAN packages used

kStatistics, partitions, nilde

CRAN Task Views implied by cited packages

NumericalMathematics, Optimization

Footnotes

  1. If iNm0 is a multi-index then we set i!=i1!im! and |i|=i1++im.[↩]
  2. We use these notations independently if the powers or the subscripts are row vectors or column vectors.[↩]
  3. If μ,νNm0 we have μν if |μ|<|ν| or |μ|=|ν| and μ1<ν1 or |μ|=|ν| and μ1=ν1,,μk=νk,μk+1<νk+1 for some 1k<m.[↩]
  4. As example (a1,b1)<(a2,b2) if a1<a2 or a1=a2 and b1<b2.[↩]

References

N. P. Arnqvist, V. Voinov, R. Makarov and Y. Voinov. Nilde: Nonnegative integer solutions of linear diophantine equations with applications. 2021. URL https://CRAN.R-project.org/package=nilde. R package version 1.1-6.
A. Bernardini, P. Natalini and P. E. Ricci. Multidimensional Bell polynomials of higher order. Comput. Math. Appl., 50(10-12): 1697–1708, 2005. URL https://doi.org/10.1016/j.camwa.2005.05.008.
R. P. Boyer and W. M. Y. Goh. Partition polynomials: Asymptotics and zeros. In Tapas in experimental mathematics, pages. 99–111 2008. Amer. Math. Soc., Providence, RI. URL https://doi.org/10.1090/conm/457/08904.
J. W. Brown. On multivariable Sheffer sequences. J. Math. Anal. Appl., 69(2): 398–410, 1979. URL https://doi.org/10.1016/0022-247X(79)90151-3.
J. E. Chacón and T. Duong. Efficient recursive algorithms for functionals based on higher order derivatives of the multivariate Gaussian density. Statistics and Computing, 25(5): 959–974, 2015. URL https://doi.org/10.1007/s11222-014-9465-1.
C. A. Charalambides. Enumerative combinatorics. Chapman & Hall/CRC, Boca Raton, FL, 2002.
A. Clausen and S. Sokol. Deriv: R-based symbolic differentiation. 2020. URL https://CRAN.R-project.org/package=Deriv. Deriv package version 4.1.
G. M. Constantine and T. H. Savits. A multivariate Faà di Bruno formula with applications. Transactions of the American Mathematical Society, 348(2): 503–520, 1996. URL https://doi.org/10.1090/S0002-9947-96-01501-2.
De Leeuw, J. Multivariate Cumulates in R. 2012. URL https://escholarship.org/uc/item/1fw1h53c.
E. Di Nardo. On multivariable cumulant polynomial sequences with applications. Journal of Algebraic Statistics, 7(1): 72–89, 2016a. URL https://doi.org/10.18409/jas.v7i1.49.
E. Di Nardo. On photon statistics parametrized by a non-central Wishart random matrix. Journal of Statistical Planning and Inference, 169: 1–12, 2016b. URL https://doi.org/10.1016/j.jspi.2015.07.002.
E. Di Nardo. Symbolic calculus in mathematical statistics: A review. Séminaire Lotharingien de Combinatoire, 67: Art. B67a, 72, 2011. URL https://www.mat.univie.ac.at/~slc/wpapers/s67dinardo.pdf.
E. Di Nardo and G. Guarino. kStatistics: Unbiased estimators for cumulant products and faa di bruno’s formula. 2021. URL https://CRAN.R-project.org/package=kStatistics. R package version 2.1.
E. Di Nardo and G. Guarino. Unbiased estimators for cumulant products. 2019. URL https://cran.r-project.org/web/packages/kStatistics/index.html. R package version 1.0.
E. Di Nardo, G. Guarino and D. Senato. A new algorithm for computing the multivariate Faà di Bruno’s formula. 217(13): 6286–6295, 2011. URL https://doi.org/10.1016/j.amc.2011.01.001.
E. Di Nardo, M. Marena and P. Semeraro. On non-linear dependence of multivariate subordinated Lévy processes. Statistics & Probability Letters, 166: 108870–108877, 2020. URL https://doi.org/10.1016/j.spl.2020.108870.
E. Di Nardo and I. Oliva. On a symbolic version of multivariate lévy processes. American Institute of Physics Conference Proceedings, 1389(1): 345–348, 2011. URL https://doi.org/10.1063/1.3636735.
R. Dimitrakopoulos, H. Mustapha and E. Gloaguen. High-order statistics of spatial random fields: Exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Mathematical Geosciences, 42(1): 65–99, 2010. URL https://doi.org/10.1007/s11004-009-9258-9.
K. Domino, P. Gawron and Ł. Pawela. Efficient computation of higher-order cumulant tensors. SIAM J. Sci. Comput., 40(3): A1590–A1610, 2018. URL https://doi.org/10.1137/17M1149365.
P. G. Ferreira, J. Magueijo and J. Silk. Cumulants as non-gaussian qualifiers. Physical Review D, 56(8): 4592, 1997. URL https://doi.org/10.1103/PhysRevD.56.4592.
M. Geng, H. Liang and J. Wang. Research on methods of higher-order statistics for phase difference detection and frequency estimation. In 2011 4th international congress on image and signal processing, pages. 2189–2193 2011. URL https://doi.org/10.1109/CISP.2011.6100593.
G. B. Giannakis. Cumulants: A powerful tool in signal processing. Proceedings of the IEEE, 75(9): 1333–1334, 1987. URL https://doi.org/10.1109/PROC.1987.13884.
G. Guarino, D. Senato and E. Di Nardo. Fast maple algorithms for k-statistics, polykays and their multivariate generalization. 2009. URL https://www.maplesoft.com/applications/view.aspx?SID=33041.
R. K. S. Hankin. Additive integer partitions in r. Journal of Statistical Software, Code Snippets, 16: 2006.
M. Hardy. Combinatorics of partial derivatives. Electronic Journal of Combinatorics, 13(1): Research Paper 1, 13, 2006. URL http://www.combinatorics.org/Volume_13/Abstracts/v13i1r1.html.
L. Hernández Encinas and J. Muñoz Masqué. A short proof of the generalized Faà di Bruno’s formula. Applied Mathematics Letters. An International Journal of Rapid Publication, 16(6): 975–979, 2003. URL https://doi.org/10.1016/S0893-9659(03)90026-7.
S. R. Jammalamadaka, T. S. Rao and G. Terdik. Higher order cumulants of random vectors and applications to statistical inference and time series. Sankhyā. The Indian Journal of Statistics, 68(2): 326–356, 2006. URL https://www.jstor.org/stable/25053499.
D. N. Joanes and C. A. Gill. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1): 183–189, 1998.
T. Kim, D. S. Kim and G.-W. Jang. On central complete and incomplete bell polynomials i. Symmetry, 11(2): 2019. URL https://www.mdpi.com/2073-8994/11/2/288.
R. B. Leipnik and C. E. M. Pearce. The multivariate Faà di Bruno formula and multivariate Taylor expansions with explicit integral remainder term. The ANZIAM Journal. The Australian & New Zealand Industrial and Applied Mathematics Journal, 48(3): 327–341, 2007. URL https://doi.org/10.1017/S1446181100003527.
T.-W. Ma. Higher chain formula proved by combinatorics. Electronic Journal of Combinatorics, 16(1): N21, 7, 2009. URL https://doi.org/10.37236/259.
P. McCullagh. Tensor methods in statistics. Chapman & Hall, London, 1987.
M. Mihoubi. Polynômes multivariés de bell et polynômes de type binomial. 2008.
R. L. Mishkov. Generalization of the formula of Faà di Bruno for a composite function with a vector argument. International Journal of Mathematics and Mathematical Sciences, 24(7): 481–491, 2000. URL https://doi.org/10.1155/S0161171200002970.
J. Y. Nguwi, G. Penent and N. Privault. A deep branching solver for fully nonlinear partial differential equations. 2022. URL https://arxiv.org/abs/2203.03234.
H. Oualla, R. Fateh, A. Darif, S. Safi, M. Pouliquen and M. Frikel. Channel identification based on cumulants, binary measurements, and kernels. Systems, 9(2): 2021. URL https://doi.org/10.3390/systems9020046.
G. Peccati and M. S. Taqqu. Combinatorial expressions of cumulants and moments. In Wiener chaos: Moments, cumulants and diagrams., Ed M. Springer pages. 201–213 2011. Bocconi & Springer Series.
N. Privault. Recursive computation of the hawkes cumulants. Statistics and Probability Letters, 109161, 2021. URL https://doi.org/10.1016/j.spl.2021.109161.
T. S. Rao and W. K. Wong. Some contributions to multivariate nonlinear time series and to bilinear models. In Asymptotics, nonparametrics, and time series, pages. 259–294 1999. Dekker, New York.
D. L. Reiner. Multivariate sequences of binomial type. Studies in Applied Mathematics, 57(2): 119–133, 1976. URL https://doi.org/10.1002/sapm1977572119.
D. S. Robson. Applications of multivariate polykays to the theory of unbiased ratio-type estimation. Journal of the American Statistical Association, 52(280): 511–522, 1957. URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1957.10501407.
S. Roman. The umbral calculus. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, 1984.
C. Rose and M. D. Smith. Mathematical statistics with Mathematica®. Springer-Verlag, New York, 2002. URL https://doi.org/10.1007/978-1-4612-2072-5.
T. H. Savits. Some statistical applications of faà di Bruno. Journal of Multivariate Analysis, 97(10): 2131–2140, 2006. URL https://doi.org/10.1016/j.jmva.2006.03.001.
A. B. Shabat and M. Kh. Efendiev. On applications of Faà-di-Bruno formula. Ufa Mathematical Journal, 9(3): 131–136, 2017. URL https://doi.org/10.1007/s12572-017-0181-x.
H. Shrivastava. Multiindex multivariable Hermite polynomials. Math. Comput. Appl., 7(2): 139–149, 2002. URL https://doi.org/10.3390/mca7020139.
K. D. Smith. A tutorial on multivariate k-statistics and their computation. 2020. URL https://arxiv.org/abs/2005.08373.
K. D. Smith, S. Jafarpour, A. Swami and F. Bullo. Topology inference with multivariate cumulants: The möbius inference algorithm. IEEE/ACM Transactions on Networking, 1–15, 2022. URL https://doi.org/10.1109/TNET.2022.3164336.
B. Staude, S. Rotter and S. Grün. CuBIC: Cumulant based inference of higher-order correlations in massively parallel spike trains. Journal of Computational Neuroscience, 29(1-2): 327–350, 2010. URL https://doi.org/10.1007/s10827-009-0195-x.
A. Stuart and J. K. Ord. Kendall’s advanced theory of statistics. Vol. 1. Sixth Edward Arnold, London; copublished in the Americas by Halsted Press [John Wiley & Sons, Inc.], New York, 1994. Distribution theory.
V. Voinov and N. Pya Arnqvist. R-software for additive partitioning of positive integers. Mathematical Journal, 17(1): 69–76, 2017. URL http://www.math.kz/media/journal/journal2018-05-1574083.pdf.
C. S. Withers and S. Nadarajah. Multivariate Bell polynomials. International Journal of Computer Mathematics, 87(11): 2607–2611, 2010. URL https://doi.org/10.1080/00207160802702418.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Nardo & Guarino, "kStatistics: Unbiased Estimates of Joint Cumulant Products from the Multivariate Faà Di Bruno's Formula", The R Journal, 2022

BibTeX citation

@article{RJ-2022-033,
  author = {Nardo, Elvira Di and Guarino, Giuseppe},
  title = {kStatistics: Unbiased Estimates of Joint Cumulant Products from the Multivariate Faà Di Bruno's Formula},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/RJ-2022-033},
  doi = {10.32614/RJ-2022-033},
  volume = {14},
  issue = {2},
  issn = {2073-4859},
  pages = {208-228}
}