Skew-t Expected Information Matrix Evaluation and Use for Standard Error Calculations

Skew-t distributions derived from skew-normal distributions, as developed by Azzalini and several co-workers, are popular because of their theoretical foundation and the availability of computational methods in the R package sn. One difficulty with this skew-t family is that the elements of the expected information matrix do not have closed-form analytic formulas. We therefore developed a numerical integration method for computing the expected information matrix, implemented in the R package skewtInfo. The accuracy of our expected information matrix calculation was confirmed by comparing its result with an observed information matrix computed for a very large sample size. A Monte Carlo study of the accuracy of the standard errors obtained with our expected information matrix calculation, for three realistic skew-t parameter vectors, indicates that the expected information matrix yields standard errors that are as accurate as, and sometimes slightly more accurate than, those obtained with an observed information matrix.


Introduction
The primary motivation for this paper was our desire to use a flexible family of skew-t distributions for quantitative finance research, and in particular for empirical studies of the tail fatness and skewness of the returns distributions of various asset classes, e.g., stock returns and hedge fund returns. That said, we hasten to add that skew-t distributions have extensive applications across the physical and social sciences, engineering, and computer science. We believe that the main results of this paper will be of interest to this broader audience as well as to those interested in finance applications.
The remainder of this paper is organized as follows. The "Skew-t Distribution" section introduces the skew-t distribution family and provides the relationship between the skew-t direct parameters "dp" and the usual moment-based "cp" parameters, i.e., the mean, standard deviation, skewness, and excess kurtosis. The "Maximum Penalized Likelihood Estimation Method" section briefly describes the sn package MPLE method used to fit skew-t distributions, uses the method to compute skew-t parameters for each of the three sets of returns D1, D2, D3, compares those parameters to symmetric-t distribution maximum-likelihood estimates (MLEs), and computes the cp parameter values from the skew-t dp parameters. The "Skew-t Information Matrix" section discusses our numerical method for computing the information matrix, as implemented in the package skewtInfo, and describes how we verified the accuracy of our method by examples. That section also examines the ill-conditioning of the information matrices for the three sets of skew-t MPLEs, and computes and discusses the corresponding asymptotic correlations of the skew-t parameter MPLEs for D1, D2, D3. The "Standard Errors of Skew-t Parameter MPLEs" section uses Monte Carlo to evaluate the use of an expected information matrix for computing standard errors of skew-t MPLEs, in comparison both with the use of an observed information matrix and with Monte Carlo "true" standard errors, where the MPLEs for the stock returns D1, D2, D3 are used as the true skew-t parameters. As a preliminary to this Monte Carlo, the first part of that section discusses a problem that arises when computing skew-t MPLEs for a large number of replicates, and shows how to deal with it. Finally, the last section discusses related future research.

Skew-t distributions
It is well-known that a random variable X with a standard t-distribution, i.e., with location parameter of value zero, scale parameter of value one, and ν degrees of freedom, has the stochastic representation

X = Z / √(χ²_ν / ν),

where Z follows a standard normal distribution and the independent random variable χ²_ν follows a chi-squared distribution with ν degrees of freedom. The following results and related details concerning skew-t random variables and their distributions are provided in Azzalini and Capitanio (2014).
A standard skew-t random variable Y 0,1,α,ν ∼ ST(0, 1, α, ν), with slant (skewness) parameter α and degrees-of-freedom parameter ν, can be obtained by replacing Z with an independent standard skew-normal random variable Z 0,1,α ∼ SN(0, 1, α), whose probability density function is

f_SN(z; α) = 2 φ(z) Φ(αz),

where φ(x) is the standard normal density function and Φ(x) is the standard normal (cumulative) distribution function. The resulting standard skew-t random variable has the standard skew-t probability density

f_ST(z; α, ν) = 2 t_ν(z) T_{ν+1}( αz √((ν + 1)/(z² + ν)) ),

where t_ν(x) is the density function of a standard t-distributed random variable with ν degrees of freedom, and T_ν(x) is the corresponding cumulative distribution function. A general skew-t random variable Y ξ,ω,α,ν with location parameter ξ and scale parameter ω, whose distribution we denote as ST(ξ, ω, α, ν), has the stochastic representation

Y ξ,ω,α,ν = ξ + ω · Y 0,1,α,ν,

and its probability density function is therefore

f(y; ξ, ω, α, ν) = (1/ω) f_ST((y − ξ)/ω; α, ν).

With δ_α = α/√(1 + α²) and b_ν = √(ν/π) Γ((ν − 1)/2) / Γ(ν/2), the standardized skew-t random variable z = (y − ξ)/ω has mean b_ν δ_α and variance

var(z) = ν/(ν − 2) − (b_ν δ_α)².

Note that as the slant parameter α ranges over (−∞, +∞), the parameter δ_α ranges over the interval (−1, 1), and when there is no skewness, i.e., when α = δ_α = 0, the variance of z is ν/(ν − 2), that of a standard symmetric-t random variable.
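As a quick sanity check on the standard skew-t density above, the following base-R sketch (the helper name dst_manual is ours, and no sn dependency is assumed) implements the density directly and verifies that it integrates to one for the slant and degrees-of-freedom values used later for D2:

```r
# Standard skew-t density: f(z) = 2 * t_nu(z) * T_{nu+1}(alpha * z * sqrt((nu+1)/(z^2+nu)))
dst_manual <- function(z, alpha, nu) {
  2 * dt(z, df = nu) * pt(alpha * z * sqrt((nu + 1) / (z^2 + nu)), df = nu + 1)
}

# A proper density must integrate to 1 over the real line
total <- integrate(dst_manual, lower = -Inf, upper = Inf,
                   alpha = -1.06, nu = 4.42, rel.tol = 1e-9)$value
stopifnot(abs(total - 1) < 1e-6)
```

The same construction with dnorm() and pnorm() in place of dt() and pt() recovers the skew-normal density f_SN(z; α) = 2 φ(z) Φ(αz).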
It will sometimes be of interest to transform a skew-t distribution parameter vector (ξ, ω, α, ν) to the usual mean, standard deviation, skewness, and excess kurtosis vector (µ, σ, γ, κ). This can be done with the transformations provided on pages 103-104 of Azzalini and Capitanio (2014).
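The sn package performs this mapping with dp2cp(); as an illustration of the transformations themselves, here is a base-R sketch (the function name dp2cp_manual is ours) that computes (µ, σ, γ, κ) from (ξ, ω, α, ν) for ν > 4, using δ_α = α/√(1 + α²) and b_ν = √(ν/π) Γ((ν − 1)/2)/Γ(ν/2):

```r
# dp -> cp: map direct parameters (xi, omega, alpha, nu) to
# (mean, sd, skewness, excess kurtosis); all four moments require nu > 4
dp2cp_manual <- function(xi, omega, alpha, nu) {
  delta <- alpha / sqrt(1 + alpha^2)
  b_nu  <- sqrt(nu / pi) * gamma((nu - 1) / 2) / gamma(nu / 2)
  mu_z  <- b_nu * delta                        # mean of the standardized skew-t
  s2_z  <- nu / (nu - 2) - mu_z^2              # its variance
  g1 <- mu_z * (nu * (3 - delta^2) / (nu - 3) - 3 * nu / (nu - 2) + 2 * mu_z^2) / s2_z^1.5
  g2 <- (3 * nu^2 / ((nu - 2) * (nu - 4)) - 4 * mu_z^2 * nu * (3 - delta^2) / (nu - 3) +
         6 * mu_z^2 * nu / (nu - 2) - 3 * mu_z^4) / s2_z^2 - 3
  c(mean = xi + omega * mu_z, sd = omega * sqrt(s2_z), skew = g1, exkurt = g2)
}
```

For α = 0 this reduces to the symmetric-t moments, e.g., zero skewness and excess kurtosis 6/(ν − 4).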

The maximum penalized likelihood estimation method
The log-likelihood for a single observation y with density f(y; θ), where θ is a parameter vector, is ℓ(θ; y) = log f(y; θ). With θ = (ξ, ω, α, ν) and z = (y − ξ)/ω, it follows from (7) that for a skew-t distribution the log-likelihood function for a single observation y is

ℓ_ST(θ; y) = log(2/ω) + log t_ν(z) + log T_{ν+1}( αz √((ν + 1)/(z² + ν)) ).

For n independently and identically distributed skew-t observations y = (y_1, y_2, . . . , y_n) the log-likelihood is

ℓ_n(θ) = Σ_{i=1}^{n} ℓ_ST(θ; y_i).

Let θ̂_n = (ξ̂_n, ω̂_n, α̂_n, ν̂_n) denote a maximum-likelihood estimate (MLE) for sample size n. Azzalini and Arellano-Valle (2013) showed that with non-negligible probability the MLE α̂_n of the skewness parameter α diverges. Consequently, they proposed a maximum penalized likelihood estimator (MPLE) to solve this problem, based on a penalized log-likelihood of the form

ℓ_p,n(θ) = ℓ_n(θ) − Q(θ),

where Q(θ) = Q(α, ν) is a non-negative quantity that penalizes the divergence of α by virtue of tending to infinity as α → ±∞ for each fixed ν. Details may be found in Section 3 of Azzalini and Arellano-Valle (2013).

Application to stock returns data
We illustrate the use of the sn package function st.mple() to compute skew-t distribution MPLE fits to three disjoint time segments of the returns of a large-cap stock with ticker D and company name Dominion Resources, Inc. The code shown below creates the plot in Figure 1 of the monthly returns of this stock from December 1991 to September 2015. The entire time history of the D returns is longer than one would normally use for portfolio construction, and we note that this time interval contains several major market crises, e.g., the Russian default in 1998, the dot-com bubble collapse in 2000, and the financial markets crisis of 2007-2008. Thus we fit skew-t distributions, and also symmetric-t distributions, to the stock with ticker D for the sub-intervals 1991-1996, 1998-2009, and 2010-2015. The first interval, consisting of 61 months, contains no market crises, and the same is true of the third interval, which consists of 72 months. The second interval, consisting of 156 months, contains all three crises. For convenience, we refer to the stock D returns on the first, second, and third time intervals as D1, D2, and D3. Because D2 contains three market crises, and given the visual character of the D2 returns, we expected to obtain a skew-t fit with a negative slant parameter and rather small degrees of freedom for D2; for D1 and D3 we were not sure what to expect.

Skew-t information matrix
There are two equivalent forms of an information matrix: one involves products of the score functions, which are the first partial derivatives of the log-likelihood for a single observation with respect to each parameter, and the other involves the partial derivatives of the score functions (the second partial derivatives of the log-likelihood for a single observation) with respect to each parameter; see, for example, Casella and Berger (2001). In our case, θ = (ξ, ω, α, ν) and the skew-t log-likelihood for a single observation is ℓ_ST(θ; y). The scalar score functions are S_ξ(y), S_ω(y), S_α(y), and S_ν(y), and the score vector is the gradient of the log-likelihood:

S(θ; y) = ∇ℓ_ST(θ; y).

The two equivalent forms of the information matrix are:

I(θ) = E[ S(θ; y) S(θ; y)ᵀ ]     (17)
I(θ) = −E[ ∇²ℓ_ST(θ; y) ].       (18)

The importance of the information matrix I(θ) is that V(θ) = I(θ)⁻¹ is the asymptotic covariance matrix of a consistent and asymptotically normal MPLE θ̂_n = (ξ̂_n, ω̂_n, α̂_n, ν̂_n). Consequently V(θ̂_n) can be used to compute approximate standard errors in the usual way, i.e., by extracting the diagonal elements of V(θ̂_n), taking the square roots, and dividing by √n. In the remainder of this section, we focus on a numerical integration method for evaluating I(θ), verify the accuracy of the method, and analyze the character of the resulting I(θ) at the MPLE parameters for the stock returns D1, D2, and D3.
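The equivalence of the two forms can be checked numerically by integrating against the density. The base-R sketch below illustrates this for a deliberately simple one-parameter case of our own choosing, the scale σ of a N(0, σ²) distribution, where both forms equal 2/σ², rather than for the skew-t family itself:

```r
# Two equivalent forms of the Fisher information, illustrated for the scale
# parameter sigma of N(0, sigma^2): E[S^2] = -E[dS/dsigma] = 2 / sigma^2
sigma  <- 1.5
score  <- function(y) y^2 / sigma^3 - 1 / sigma        # d/dsigma of the log-density
dscore <- function(y) 1 / sigma^2 - 3 * y^2 / sigma^4  # second derivative
f      <- function(y) dnorm(y, sd = sigma)

I_outer <- integrate(function(y) score(y)^2 * f(y), -Inf, Inf)$value
I_hess  <- -integrate(function(y) dscore(y) * f(y), -Inf, Inf)$value

stopifnot(abs(I_outer - 2 / sigma^2) < 1e-6,
          abs(I_hess  - 2 / sigma^2) < 1e-6)
```

For the skew-t family the same two integrals must be evaluated for all ten distinct parameter pairs, which is what stInfoMat() automates.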

Numerical evaluation of the skew-t information matrix
The skew-t information matrix in (17)-(18) does not admit closed-form analytic expressions for its elements, so one must resort to numerical integration. It turns out that the outer-product form (17) is considerably simpler than the Hessian form (18) for purposes of numerical integration, and so we focus on the former. For this purpose we use the analytic expressions for the score functions (15) derived by DiCiccio and Monti (2011), which involve the digamma function ψ(x). We have developed the R package skewtInfo at https://github.com/chindhanai/skewtInfo for numerically evaluating Equation (17). The main function in this package is stInfoMat(), which has an argument type that controls the type of information matrix to be computed, with the choice type = "expected" resulting in an expected information matrix and the choice type = "observed" resulting in an observed information matrix. We note that the computation of an observed information matrix for skew-t distributions is already available via the function st.infoUv() in the sn package, but for the sake of stand-alone completeness of the skewtInfo package we included the capability to compute an observed information matrix.
The stInfoMat() function, with about 400 lines of R code, uses numerical quadrature as implemented in the function integrate() from the R stats package, with a tolerance of 10⁻⁹. The integrate() function maps an infinite integration range onto a finite interval, and then uses globally adaptive interval subdivision in conjunction with Wynn's epsilon algorithm extrapolation, with the basic step being Gauss-Kronrod quadrature. See Piessens et al. (1983) and the documentation of integrate() for more information.
Note that the expectations in the bottom row of the information matrix involve a double integration, due to the presence of ζ(z) in the last term of S_ν(y), and hence the presence of ζ(z) in the pairwise products of (22) with (19)-(21). For example, as part of computing the expected value of the score-function product S_α · S_ν, we need to evaluate the double integral (24). (We suppress the variable z dependency of w, τ, and ζ for a more compact expression of the integrals.) To compute such an integral in R, we do the following:

1. Define the innermost integrand of (24) as a function of u and ν.
2. Integrate the inner integrand with respect to u, and define this inner integral as a function of z.
3. Define the outer integrand to be the product of the integral from Step 2 and the remaining factor of the integrand.
4. Integrate the vectorized integrand of Step 3 with respect to y, using sapply() to vectorize over the range of y on the real line.
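The four steps above can be sketched as follows in base R. The integrands here are toy stand-ins of our own with a known closed-form answer (1/2 + E[z²] = 1.5 for a standard normal weight), not the actual skew-t score products, but the nesting of integrate() calls is the same:

```r
# Nested quadrature skeleton: inner integral over u, outer integral over the
# real line, with sapply() vectorizing the non-vectorized inner call.
inner_integrand <- function(u, z) u + z^2           # Step 1: innermost integrand
inner_integral  <- function(z) {                    # Step 2: integrate out u
  integrate(inner_integrand, lower = 0, upper = 1, z = z)$value
}
outer_integrand <- function(z) {                    # Step 3: product with a density
  sapply(z, inner_integral) * dnorm(z)              # sapply() vectorizes over z
}
value <- integrate(outer_integrand, -Inf, Inf)$value  # Step 4: outer integral
stopifnot(abs(value - 1.5) < 1e-6)                  # analytic answer: 1/2 + 1
```

The sapply() wrapper is needed because integrate() demands a vectorized integrand, while the inner integrate() call returns a single scalar per value of z.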

Usage example
The function stInfoMat() has three arguments:
1. y, a vector of skew-t observations used to compute an observed information matrix. Its default is NULL.
2. dp, the skew-t direct parameter vector used to compute an expected information matrix or an observed information matrix.
3. type, which selects the type of information matrix to compute, either "expected" or "observed".
The object returned by a call to stInfoMat() is a list having four objects: 1. dp, the skew-t direct parameter used to compute an expected information matrix or an observed information matrix.
2. stInfoMat, an expected information matrix when type = "expected", and an observed information matrix when type = "observed".
3. type, the type of output information matrix.
4. SEMat, the asymptotic standard errors of the skew-t parameters when type = "expected", and the element-wise standard error of the observed information matrix in the case of type = "observed".
Here the asymptotic standard errors of the skew-t MPLEs are the square roots of the diagonal elements of the inverse of the expected information matrix, and the element-wise standard errors of the observed information matrix are the standard errors of the element-wise sample means defined in equation (27) in the next subsection.
The following code illustrates the use of stInfoMat() to compute the expected information matrix I(θ_D2) for the skew-t parameter values θ_D2 = (0.0461, 0.0536, -1.06, 4.42), previously obtained as the MPLE for the D2 returns.

dpD2 <- c(0.0461, 0.0536, -1.06, 4.42)
stInfoMatD2 <- stInfoMat(dp = dpD2, type = "expected")

Expected information matrix accuracy verification
Recall the expression for the skew-t log-likelihood ℓ_n(θ) in (12) for a sample of size n, where ℓ_ST(θ; y_i) is the log-likelihood for a single observation, and the gradient of the log-likelihood is given by (16). It follows that the Hessian of ℓ_n(θ) is

∇²ℓ_n(θ) = Σ_{i=1}^{n} ∇²ℓ_ST(θ; y_i),

where the expressions for the second partial derivatives were derived by DiCiccio and Monti (2011). The observed information matrix I_obs,n(θ) for a sample of size n is:

I_obs,n(θ) = −(1/n) ∇²ℓ_n(θ).

It is a general result that the observed information matrix converges in probability to the expected information matrix:

−(1/n) ∇²ℓ_n(θ; y) →_P I(θ).

We shall confirm the accuracy of our numerical evaluation of I(θ) by evaluating the observed information matrix on the left-hand side of the above convergence result for very large n.
We could use either stInfoMat() with type = "observed" or the sn package function st.infoUv() to compute the observed information matrix. These two functions give virtually identical results, and since our skewtInfo package contains stInfoMat(), we use that function to compute both observed and expected information matrices for the skew-t parameter vector θ_D2 = (0.0461, 0.0536, -1.06, 4.42) with n = 100,000, and compare the results. Table 2 shows the absolute differences (AD) and relative differences (RD) between the expected and observed information matrices, along with the standard errors (SE) of the observed information matrix estimates, and the t-statistics (AD/SE) for testing whether or not the observed information matrix values are equal to those of the expected information matrix.

Table 2: Absolute differences (AD), relative differences (RD), standard errors (SE), and t-statistics (tstat.AD) of the diagonal elements (top) and off-diagonal elements (bottom) of the expected information matrix and the observed information matrix based on n = 100,000 for the skew-t parameter θ_D2.
The results in this table indicate that the observed information matrix for a very large sample size n is sufficiently close to the expected information matrix for us to conclude that our expected information matrix computation is accurate.

Information matrices condition numbers and parameters correlations
One obtains an MPLE asymptotic covariance matrix by inverting an expected information matrix, and the stability of that inversion depends on the condition number of the latter matrix. To get a feeling for the extent of ill-conditioning of the expected information matrices for the three skew-t MPLEs of the previous section, we computed the eigenvalues and condition numbers of the corresponding expected information matrices; the results are shown in Table 3.
We see that the expected information matrix for θ_D1 is very badly ill-conditioned, and the expected information matrices for θ_D2 and θ_D3 are also quite ill-conditioned, though less so than that for θ_D1. Nonetheless, all three of these information matrices are positive definite and their inverses exist. (The condition number of a positive definite real matrix A is κ(A) = λ_max/λ_min, where λ_min and λ_max are the smallest and largest eigenvalues of A.)

Asymptotic correlations of the four parameter MPLEs
The correlation matrix obtained from such an MPLE asymptotic covariance matrix indicates how correlated the skew-t MPLEs will be in large samples. To see what these correlations look like, we display below the MPLE asymptotic correlation matrices for the returns D1, D2, and D3. The six pairwise asymptotic correlations are rather similar across the three skew-t parameter vectors, as follows. The correlations between the location and scale parameters are quite positive (ranging from 0.734 to 0.885), the correlations between the location and slant parameters are strongly negative (ranging from -0.954 to -0.802), and the correlations between the location and DoF parameters are modest and positive (ranging from 0.394 to 0.443). The scale and slant parameters are substantially negatively correlated (ranging from -0.874 to -0.706), and the scale and DoF parameters are rather positively correlated (ranging from 0.641 to 0.769). Finally, the slant and DoF parameters are somewhat negatively correlated (ranging from -0.411 to -0.359).

Standard errors of skew-t parameter MPLEs
Since our expected information matrix computation is highly accurate, we will use the inverse of the expected information matrix to compute standard errors in the usual way, i.e., by taking the square roots of the variance estimates obtained as the diagonal elements of the estimated asymptotic covariance matrix (the inverse of the estimated expected information matrix) divided by n:

SE(θ̂_i) = √( [I(θ̂)⁻¹]_ii / n ).

We evaluate the performance of these standard errors in the following two ways: (a) we compare them with the "true" standard errors obtained via Monte Carlo, and (b) we compare them with the standard errors based on an observed information matrix computed with the sn package. These results will reveal whether or not there are any meaningful differences in the accuracy of the standard errors obtained using the expected information matrix versus the observed information matrix.
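This standard-error recipe is a one-liner in R; the sketch below applies it to a small hypothetical 2×2 information matrix of our own choosing:

```r
# Asymptotic standard errors from an expected information matrix I(theta):
# SE = sqrt(diag(I^{-1}) / n)
se_from_info <- function(I, n) sqrt(diag(solve(I)) / n)

I <- matrix(c(4, 1,
              1, 2), nrow = 2, byrow = TRUE)   # hypothetical information matrix
se <- se_from_info(I, n = 100)

# solve(I) has diagonal (2/7, 4/7), so the SEs are sqrt(c(2, 4) / 700)
stopifnot(all(abs(se - sqrt(c(2, 4) / 700)) < 1e-12))
```

Ill-conditioning of I matters precisely here: when the eigenvalue spread is large, solve(I) amplifies small numerical errors in the computed information matrix into large errors in the variance estimates.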
Before we make such comparisons, the next subsection first discusses a basic problem that occurs in carrying out Monte Carlo studies with many replicates of st.mple() fits, and how to deal with the problem.

A Monte Carlo skew-t fitting difficulty and a solution
When carrying out Monte Carlo studies that generate a large number of random sample replicates from a skew-t distribution and fit a skew-t distribution to each replicate with st.mple(), we often ran into the problem that st.mple() sometimes returns a huge value for the degrees-of-freedom estimate together with an essentially zero estimate for the slant parameter. This problem is associated with a well-known singularity of the skew-normal distribution when the slant (skewness) parameter is zero, and a near singularity when the slant parameter is close to zero. The problem occurs when a randomly generated skew-t replicate is close to normality, which happens more often at small sample sizes.
We illustrate the behavior using M = 1,000 replicates of samples of size n = 50, 100, 200, 400 from a skew-t distribution with parameter value θ_D2 = (0.0461, 0.0536, -1.06, 4.42), which we obtained as an st.mple() estimate in Table 1. The code below implements this for the case n = 50, with the replicates stored in the matrix retReps and the fits stored in the matrix skewtFits.
set.seed(123)
# Skew-t parameters corresponding to D2 returns
dp <- stFitD2
M <- 1000
n <- 50
retReps <- matrix(NA, ncol = n, nrow = M)
skewtFits <- matrix(NA, ncol = 4, nrow = M)
for (m in 1:M) {
  retReps[m, ] <- rst(n = n, dp = dp)
}
for (m in 1:M) {
  skewtFits[m, ] <- st.mple(y = retReps[m, ], penalty = "Qpenalty")$dp
}

Now for each sample size 50, 100, 200, and 400, we identify the sub-matrix skewtFitsHugeDof of skewtFits whose degrees-of-freedom estimates are larger than each of the thresholds 20, 40, 100, 500, and 1,000, and print the number of rows of skewtFitsHugeDof for each of those thresholds, along with the parameter estimates of the first such row. The code below accomplishes this for the threshold value 1,000.

skewtFitsHugeDof <- skewtFits[skewtFits[, 4] > 1000, , drop = FALSE]
nrow(skewtFitsHugeDof)
skewtFitsHugeDof[1, ]

Table 4 displays the number and percent of replicates for which the st.mple() fit for the parameter choice θ_D2 = (0.046, 0.054, -1.064, 4.416) has a degrees-of-freedom estimate larger than the thresholds 20, 40, 100, 500, and 1,000, for each of the sample sizes 50, 100, 200, 400. The percentages of replicates whose DoF estimates exceed these thresholds indicate that the fits are quite unreliable for n = 50 and n = 100, but much more reliable at sample size 200, where the percent of huge DoF estimates is less than 1% for all thresholds greater than 40; at sample size 400 it is only 0.3% for a threshold of 20, and 0.1% for larger thresholds. In order to carry out our Monte Carlo studies with 1,000 replicates, whose results we discuss in the remainder of this subsection, we need to obtain that many replicates after deleting any replicates for which the DoF estimate is too large. For purposes of our study, we use a threshold of 20, noting that any replicate with a DoF estimate larger than 20 is rather close to normality, and if significant skewness is indicated by the slant parameter estimate then one should fit a skew-normal distribution.
Thus we carry out our Monte Carlo study to obtain 1,000 clean replicates, with no DoF estimate larger than 20, by first generating 1,300 replicates, deleting the bad ones, which leaves more than 1,000 replicates in all cases considered, and taking the first 1,000 of the remaining replicates for the Monte Carlo calculations.

                  n = 50        n = 100     n = 200    n = 400
Fit DoF > …       136 (13.6%)   45 (4.5%)   6 (0.6%)   1 (0.1%)
Fit DoF > 500     126 (12.6%)   39 (3.9%)   5 (0.5%)   1 (0.1%)
Fit DoF > 1000    123 (12.3%)   36 (3.6%)   5 (0.5%)   1 (0.1%)

Table 4: Numbers and percentages of replicates for θ_D2 whose MPLE fit has DoF greater than 20, 40, 100, 500, 1000.
Our main goal here is to do a simple Monte Carlo comparison of the MPLE values with the true parameter values, where we take the true parameter values to be the three sets of values θ_Di, i = 1, 2, 3 in Table 1 for the skew-t distribution. So we compute the sample mean values MC_Di,n of the MPLEs across the 1,000 replicates, along with their Monte Carlo standard errors and their percent relative biases, 100 · (MC_Di,n − θ_Di)/θ_Di. However, before showing the results of deleting MPLEs whose DoF estimate is larger than 20, we first show the results obtained without such deletion. These results, shown in Table 5 for sample sizes n = 100, 200, 400, with standard errors of the MC values in parentheses, reveal that for all three parameter sets considered, the average values of the DoF estimates are hugely positively biased. This is because a fraction of the st.mple() estimates of the DoF are huge outliers that severely distort the DoF Monte Carlo average value. Table 6 then shows the results of deleting any MPLEs whose DoF estimate is greater than 20. In Table 6a, where the true DoF parameter has a value close to 32 and a rejection threshold of 20 is used, it is not surprising that the DoF values are very negatively biased. However, in Tables 6b and 6c, where the true DoF values are about 4.4 and 6.6 respectively, the DoF values are positively biased, and the relative bias is considerably smaller than in Table 6a.
Further comparison of the Monte Carlo average values and relative biases of the location, scale, and slant parameter estimates in Tables 5 and 6 reveals that for the stock returns data under consideration: (a) these parameter estimates are not much affected by whether or not MPLEs are rejected because their DoF estimate is larger than 20; and (b) these parameter estimates all have relative bias percentages that decrease with increasing sample size, with small single-digit relative bias percentages at sample size 400 for stocks D2 and D3. We emphasize that this is a very narrow conclusion based on the returns subsets of a particular stock, and a much more extensive study of large cross-sections of stocks is needed to explore this issue thoroughly.

Standard errors of skew-t MPLEs
A reason for studying the accuracy of the standard errors obtained from our skew-t expected information matrices is that, in general, one does not know whether the use of an expected information matrix yields standard errors that are more accurate, less accurate, or as accurate as those obtained using an observed information matrix. The statistical literature on this question includes, among others, an early paper by Efron and Hinkley (1978) that argues in favor of the observed information, and the more recent papers by Cao and Spall (2010) and Cao (2013) arguing that the expected information matrix is preferred.
For our Monte Carlo study of the relative performance of standard errors obtained using our expected information matrix versus an observed information matrix, we focus on the two skew-t parameter vectors θ_D2 and θ_D3 of the "Maximum Penalized Likelihood Estimation Method" section, and skip θ_D1 since its DoF parameter value calls for using a skew-normal or a normal distribution model. Noting that the Monte Carlo standard error of a standard-error estimate under normality, relative to the standard deviation of the observations, is approximately 1/√(2 · (M − 1)), where M is the number of Monte Carlo replicates, we use M = 4000 to obtain a relative standard error of about 1%. The detailed steps of our Monte Carlo are as follows.
For i = 2, 3 and for each of the sample sizes n = 50, 100, 200, 400, and 800, we do the following:
1. Generate a random sample of size n of skew-t random variables with parameter θ_Di.
2. Use the random sample to fit a skew-t distribution with the sn package function st.mple(), thereby obtaining the skew-t parameter estimates ξ̂_n,1, ω̂_n,1, α̂_n,1, ν̂_n,1.
3. Repeat Steps 1 and 2 for a large number of replicates, obtaining a parameter estimate vector for each replicate m.
4. Remove the parameter estimate vectors for which the DoF estimate ν̂_n,m is larger than 20, and take the first 4000 of the remaining skew-t parameter estimate vectors.
5. Compute the Monte Carlo standard error SE_MC of each of the four parameters as the sample standard deviation over the 4000 replicates.

The results in Table 7 are displayed graphically in the two Lattice plots of Figure 2. Each panel of the plots contains the three standard errors SE_MC, SE_obs, SE_exp for one of the parameters versus sample size, corresponding to columns in Table 7. The figure results indicate that there is often little difference between the standard errors computed with the expected information matrix and those computed with the observed information matrix, and when there are significant differences, such as for the scale and DoF estimates for D3 at small sample sizes, the expected information matrix typically yields somewhat more accurate standard errors than the observed information matrix. While sample size 100 seems adequate for obtaining accurate standard errors for the location and slant parameter MPLEs, it appears that one needs a sample size of at least 200 to obtain accurate standard errors for the scale parameter MPLEs. Furthermore, while the expected information matrix standard error of the DoF skew-t MPLE for D3 is quite accurate for sample sizes 100 and larger, this is not the case for D2, which indicates that quite large sample sizes are likely needed to obtain accurate standard errors for DoF skew-t MPLEs.
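The choice M = 4000 in the study above follows from the stated approximation for the relative Monte Carlo standard error of a standard-error estimate under normality; a two-line base-R check:

```r
# Relative Monte Carlo standard error of a standard-error estimate under
# normality, for M replicates: approximately 1 / sqrt(2 * (M - 1))
rel_se <- function(M) 1 / sqrt(2 * (M - 1))
stopifnot(abs(rel_se(4000) - 0.0112) < 1e-3)   # about 1.1% at M = 4000
```

Halving this relative error would require roughly quadrupling the number of replicates.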

Concluding comments
Since there is no analytic formula for the Azzalini skew-t expected information matrix, we developed a numerical integration method for evaluating it and implemented it in the function stInfoMat() in the R package skewtInfo. We made an initial exploration of the properties of the resulting expected information matrix for three skew-t parameter vectors obtained with the sn package function st.mple() from three sets of monthly returns D1, D2, D3 of a single stock with ticker D, on three disjoint time intervals between December 1991 and September 2015. We confirmed the accuracy of our expected information matrix evaluation method for one of these skew-t parameter vectors by using the well-known fact that the observed information matrix converges in probability to the expected information matrix, and showing that for n = 100,000 the observed information matrix values differ by a negligible amount from those of our expected information matrix. We examined the condition numbers of the expected information matrices for those three skew-t parameter vectors and found that although the matrices are positive definite, they are poorly conditioned to varying extents. The asymptotic correlation patterns of the three skew-t parameter MPLEs were also examined, and we found that they are fairly consistent in sign and magnitude across the three parameter vectors, as described in the "Asymptotic Correlations of the Four Parameter MPLEs" subsection.
For two of the three skew-t MPLE parameter vectors, those with single-digit DoF values, we carried out a Monte Carlo study of the standard errors obtained with the expected information matrix, compared with those from the observed information matrix, comparing both with Monte Carlo reference standard errors. The results indicate that accurate standard errors for the location and slant parameters can be obtained using both expected and observed information matrices for sample sizes of at least 100, but not for smaller sample sizes. For the scale and DoF parameters, it appears that sample sizes as large as 200 or 400 may be needed to obtain accurate standard errors, and in some cases the expected information matrix gives better results.