mixtox: An R Package for Mixture Toxicity Assessment

Mixture toxicity assessment is indeed necessary for humans and ecosystems that are continually exposed to a variety of chemical mixtures. This paper describes an R package, called mixtox, which offers a general framework of curve fitting, mixture experimental design, and mixture toxicity prediction for practitioners in toxicology. The unique features of mixtox include: (1) constructing a uniform table for mixture experimental design; and (2) predicting toxicity of a mixture with multiple components based on reference models such as concentration addition, independent action, and generalized concentration addition. We describe the various functions of the package and provide examples to illustrate their use and show the collaboration of mixtox with other existing packages (e.g., drc) in predicting toxicity of chemical mixtures.


Introduction
Environmental chemicals usually occur as complex mixtures rather than a single compound.Humans and ecosystems are continually exposed to a variety of chemical mixtures with changing composition under different circumstances.Public concern over environmental chemicals demands extensive risk assessment of various mixtures.Today, risk assessment of the effect of environmental mixtures is mainly based on a complex framework in the context of reference models (Backhaus and Faust, 2012).Predicting toxicity of mixtures using reference models is usually based on the fitting information of individual concentration response curves (Faust et al., 2003(Faust et al., , 2001)).Knowledge on the synergistic, antagonistic and additive effects of multiple stressors will be helpful for regulatory agencies in the risk assessment of environmental chemicals.
Curve fitting of individual concentration responses is the basis for predicting the effect of mixtures.Many sigmoidal regression functions have been proposed for the fitting of monotonic nonlinear concentration response data (Goutelle et al., 2008;Scholze et al., 2001;Spiess and Neumeyer, 2010).Sigmoidal functions (e.g., Logit and Weibull) with a lower limit of 0 and an upper limit of 1 (Scholze et al., 2001) are suitable for quantal response data.Other functions like three-or four-parameter Hill functions (Goutelle et al., 2008) and three-or four-parameter Weibull and Logistic functions (Spiess and Neumeyer, 2010) could be used for both quantal and continuous response data.Both proprietary (e.g., Origin) and free-to-download software (e.g., EPA Probit analysis program and BMDS; https://www.epa.gov/bmds) are available for curve fitting.Focusing on R packages we must mention drc (Ritz and Streibig, 2005), drfit (Ranke, 2016), and ezec (Kamvar, 2016).Package drc provides a suite of flexible and versatile model fitting and after-fitting functions to analyze concentration response data.
Appropriate mixture experiments need to be designed to analyze interactions (synergistic, antagonistic, or additive) between/among mixture components.Usually, fixed-ratio ray design is employed to systematically measure the effect of mixtures.One popular fixed-ratio ray design is the equal effect concentration ratio (EECR) (Faust et al., 2003(Faust et al., , 2001)), which designs mixtures according to the proportion of a particular effect concentration, say half maximal effect concentration (EC50), of individual chemicals.EECR has become a de facto standard for the experimental investigation of chemical interaction for all kind of mixtures.The call for simulating "environmentally realistic" mixtures (Backhaus and Faust, 2012) requires more sophisticated experimental design for mixtures.The uniform design concentration ratio (UDCR, or uniform design ray; Liu et al. 2015) was proposed to construct the "environmentally realistic" mixtures.Constructing appropriate uniform design tables (Ning et al., 2011) is the key to the sophisticated uniform experimental design.Uniform design tables with various number of runs, factors, and levels calculated by Professor Fang and his colleagues can be found on the website http://sites.stat.psu.edu/~rli/DMCE/UniformDesign/.However, those tables are not specifically tuned for mixture experimental design.The APTox program (Liu et al., 2012), assessment and prediction on toxicity of chemical mixtures, developed based on Visual Basic, provides a module to construct uniform design tables.To the best of our information, there is no R package available for the construction of uniform design table.
Concentration addition (CA) and independent action (IA) are two classical reference concepts that allow predicting toxicity of mixture substances (Backhaus and Faust, 2012).CA assumes to predict toxicity of mixture compounds with a similar mechanism of action.In contrast to CA, IA assumes mixture components act on different biological targets of an exposed organism independently.The The R Journal Vol.8/2, December 2016 ISSN 2073-4859 expected effect of mixtures can hence be calculated according to the joint probability of statistically independent events.The package drc (Ritz andStreibig, 2005, 2014) provides fitting models (e.g., CA, Hewlett, and Voelund) incorporating synergism/antagonism for binary and ternary mixtures.However, there is a need for evaluating the effect of mixtures with more than three components.The CA and IA models are suitable for fitting mixtures with quantal concentration response data (or quantal responses after proper transformation).Generalized concentration addition (GCA) (Howard and Webster, 2009) was proposed to examine mixtures containing partial agonists.GCA is appropriate for both quantal and continuous concentration response relationships.A following study (Hadrup et al., 2013) showed GCA could predict a larger range of the concentration response curve as compared with the CA and IA models.GCA is appropriate for both quantal and continuous concentration response relationships.
The goal of this paper is to describe an R package, called mixtox (Zhu, 2016), that allows to perform curve fitting, experimental design, and mixture toxicity assessment.In this paper, we will introduce curve fitting of individual concentration response data using a series of sigmoidal models, mixture experimental design based on different strategies, and mixture toxicity prediction based on CA, IA, and GCA, as well as follow-up analysis.
The paper is organized as follows.First, the statistical fundamentals underneath the curve fitting, experimental design, and mixture toxicity prediction are briefly recalled.Then the main functions in mixtox are listed and their use described.Finally, several examples are provided to illustrate the use of these functions.

Outline of statistical fundamentals
Consider a set of n concentration response data points (x 1 , y 1 ), (x 2 , y 2 ), • • • (x n , y n ) and model function y = f (x, θ) + e with p parameters θ = (θ 1 , θ 2 , • • • , θ p ).A non-linear least square regression was used to fit the concentration response data.The purpose of non-linear least square regression is to find the parameter θ to minimize the sum of squares of e (a vector of errors).With the help of the modified Levenberg-Marquardt algorithm, package minpack.lm(Elzhov et al., 2016) is more efficient than the built-in nls function in solving nonlinear least-squares problems.It was employed to fit the concentration response data instead of nls.

Goodness of fit statistics
The mixtox package provides 13 sigmoidal functions to fit concentration response data.A series of goodness of fit statistics were provided to select the best fit.These statistics include coefficient of determination (R 2 ), the bias-corrected coefficient of determination (R 2 adj ), root mean squared error (RMSE), mean absolute error (MAE), Akaike information criterion (AIC), the bias-corrected AICc, and Bayesian information criterion (BIC).The mathematical expression of these statistics can be found in Liao and McGee (2003) and Spiess and Neumeyer (2010).A variety of functions can be used to fit the concentration responses of one compound on an organism.Previously, the best model was selected based on the highest R 2 and R 2 adj , lowest RMSE, and lowest MAE.However, R 2 or even R 2 adj are not sensitive measures for nonlinear models as compared with AIC, AICc, and BIC (Spiess and Neumeyer, 2010).Here, we suggest that users select the best model based on the lowest AIC, AICc, and BIC.

Confidence intervals
The Delta method (Dybowski and Gant, 2001) is used to construct confidence intervals for predicted responses.The (non-simultaneous) confidence interval (CI) at the predictor value x is given as follows: where t is the inverse of Student's t cumulative distribution function, α is the confidence level (usually 95%); the superscript T denotes transpose; ν is the row vector of the Jacobian evaluated at a specified predictor value.
Var( θ) in (1) is the covariance matrix of the parameter estimates.
The R Journal Vol.8/2, December 2016 ISSN 2073-4859 where σ 2 is the squared residual standard error.J(θ) is the Jacobian matrix of f (θ) with respect to the coefficients.
The (non-simultaneous) prediction bounds for a new observation (Huet et al., 2004) (i.e., prediction interval, PI) were found to show better characterization on responses at the extremely low concentrations (Zhu et al., 2013).The PI (De Gryze et al., 2007;Huet et al., 2004) was recommended as the primary choice for uncertainty characterization: (3) The PI for new observations should be wider than the CI for the additional variance (σ 2 ) in predicting new responses (the fit plus random errors).

Uniform design table
Many experimental design methods such as generalized Latin hypercube design (Dette and Pepelyshev, 2010) and uniform design (Ning et al., 2011) can be used to construct chemical mixtures.The uniform design was incorporated into package mixtox.Uniform design is an efficient experimental design method to simulate environmentally realistic mixtures.It allows researchers to investigate mixtures with more chemicals (i.e., factors in uniform design) and concentrations (levels) while simultaneously minimizing the number of experiments (Liu et al., 2015).
As a prerequisite for constructing a uniform design table, researchers need to plan the number of runs, factors, and levels for a mixture experiment according to their experiment.Usually, the number of runs is an integer multiple of levels.Assume that we want to test the mixture effect of three compounds each with four different concentrations.Based on the three factors (compounds) and four levels (concentrations), a uniform design table with four runs would be suitable for this case.If we want to use a uniform design table with eight runs, then the levels (concentrations) need to be allocated in repetition to form pseudo-levels (Liang et al., 2001), the number of which equals that of runs.
The good lattice point method (Zhou and Fang, 2013) with a power generator was used to construct a uniform table based on the number of runs.The centered L 2 -discrepancy (CD 2 ) (Hickernell, 1996;Zhou and Fang, 2013) was employed to measure the uniformity and find the one with lowest discrepancy.
with the definition of θ ki and φ k, li as follows: and where n and s are the number of runs (levels or pseudo-levels) and the number of factors, respectively.

Reference models
For a well-defined mixture of n components, concentration addition is expressed mathematically as (Faust et al., 2001): where ECx i is the effect concentration of the i th component that causes an x% effect when applied individually at c i .The c i is expressed as: where p i is the proportion of the i th component in a mixture, c mix the concentration of the mixture and EC x, mix the concentration of the mixture that causes an effect of x%.The formula to predict the effect The R Journal Vol.8/2, December 2016 ISSN 2073-4859 of a mixture can be expressed as: The theoretical basis for independent action is defined as (Faust et al., 2001): where E(c mix ) is the total effect caused by a mixture at the concentration of c mix , and E(c i ) is the effect caused at c i of individual chemicals.For a fitted function ( f i ) based on the concentration response data of the i th component, E(c i ) is equal to f i (c i ).When E(c mix ) = x, (10) can be expressed as: (11) can be used to predict the mixture effect for IA.
Generalized concentration addition (GCA) is a natural extension of CA that could be applied to mixtures including full agonists and partial agonists or mixtures including full agonists and competitive antagonists (Howard and Webster, 2009).The general form of GCA is expressed as follows: Empirical data were used to fit function f i (c i ), and then predict the mixture response using the inverse function f −1 i (E).Previous studies used the Hill function with Hill coefficient equal to 1 to fit individual concentration response curves (Howard et al., 2010).It is suitable to fit response data with lower limit fixed at 0 and no fixed maximal effect.
where c is the concentration, K is the concentration causes 50% effect, and α is the maximal effect level of a chemical on a test organism.In mixtox, function (13) was labeled as Hill_two.The equation to predict the effect of a multiple-component mixture based on Hill_two is expressed as follows (Howard and Webster, 2009):

Overview of the package
This section provides a description of functions in the R package mixtox.We first detail the main functions that allow to fit concentration response data.Second, we describe a function to construct uniform design tables.Then we discuss functions for mixture toxicity prediction.
R> NOEC <-function(x, expr, sigLev = 0.05) The starting values (param) are indispensable for curveFit.The function tuneFit could help users to find proper starting values.Users need to provide their experimental concentration (conc), corresponding responses (rspn) and the equation (eq) they choose to fit the data (e.g., Weibull).Other terms (effv, rsq, highBar, bar, and sav) are optional and the detailed explanation can be found in the reference manual.
R> tuneFit <-function(conc, rspn, eq = "Weibull", effv, rsq = 0.6, highBar = 5000, + bar = 1000, sav = FALSE) tuneFit is essentially a sophisticated wrapper for nlsLM in the package minpack.lm.Like the function drm in package drc (Ritz and Streibig, 2005) which provides self starting values, tuneFit also provides starting values for all of the 13 sigmoidal functions.Those starting values were collected through the fitting of various concentration response data from various sources (e.g., our lab tests, published literature, and large databases such as ToxCast; Kavlock et al. 2012).The starting values for all of the 13 functions are stored in staval (e.g., staval$Logit would show all of the 1786 starting values for the Logit function).tuneFit provides a high frequency trial and error approach to deploy those starting values one by one until getting the best fit.This approach is quite efficient especially for the quantal concentration response data.

Construction of a uniform design table
The function unidTab can be called to generate uniform design tables.
R> unidTab <-function(lev, fac, algo = "cd2") To call unidTab, users need to provide the number of runs (levels, lev) and factors (fac).Normally, the number of runs equals the number of levels.If users need to do more runs than levels (the number of runs is n times that of levels, n = 1, 2, • • • ), a pseudo-level design needs to be performed.Usually, unidTab is incorporated into functions (e.g., caPred and iaPred) to predict toxicity of mixtures.Once users provide the levels (pseudo-levels) and fitting information of individual compounds, the unidTab will be called by caPred and iaPred to generate uniform table(s) with the lowest discrepancy.

Mixture toxicity prediction
The function caPred can be called to predict the effect of mixtures based on concentration addition.
R> caPred <-function(model, param, mixType = c("acr", "eecr", "udcr"), effv, + effPoints) Mixture toxicity prediction is based on the fitting information of individual concentration response curves.Users need to fit the individual concentration response data first and provide the fitting information (model used to fit individual concentration responses and corresponding fitted parameters param) to caPred.Function caPred provides three optional fix-ratio ray design methods: (1) arbitrary concentration ratio (acr), users can arbitrarily define the proportion of a component in a mixture.It also allows users to design mixtures according to certain experimental design methods (e.g., Latin hypercube design; Dette and Pepelyshev 2010); (2) equal effect concentration ratio (eecr); and (3) uniform design concentration ratio (udcr).If the mixture type is acr, the term effv is a vector of ratios or concentrations that will eventually be converted into the proportion of mixture components.If the mixture type is eecr or udcr, the argument effv is a numeric vector with single or multiple effect values.The argument effPoints is a vector of effect values (with a range of [0, 1]).
The function iaPred can be called to predict the effect of mixtures based on IA.
R> iaPred <-function(model, param, mixType = c("acr", "eecr", "udcr"), effv, + effPoints, lb = 1e-9, ub = 6) The R Journal Vol.8/2, December 2016 ISSN 2073-4859 Most of the arguments in iaPred are the same as those in caPred.The arguments of lb and ub are the lower and upper bounds of the concentration range where to find a solution for the constructed IA equation based on (11), respectively.Users can change these values based on their practical experimental system.
The function caPred can only predict mixture effects of chemicals with quantal concentration responses.That is, caPred only covers concentration response curves fitted by six functions (i.e., Hill, Weibull, Logit, BCW, BCL, and GL).
The function gcaHill can be used to predict the effect of mixtures based on ( 14) on the condition that all individual concentration responses should be fitted by the function Hill_two.

Illustrations
In this section, we illustrate several examples of the use of the functions described above.Example 1 deals with curve fitting.Example 2 deals with experimental design and mixture toxicity prediction based on CA, IA, and GCA.Example 3 deals with the connection of mixture toxicity prediction in mixtox with curve fitting in drc.Example 4 deals with follow-up analysis on mixture prediction.The mixtox package provides sigmoidal concentration response data (antibiotox) to demonstrate the usage of different functions.
The antibiotox dataset includes the long term toxicity of seven aminoglycosides (paromomycin sulfate (PAR), spectinomycin dihydrochloridehydrate (SPE), kanamycin sulfate (KAN), streptomycin sulfate (STR), dihydrostreptomycin sesquisulfate hydrate (DIH), gentamycin sulfate (GEN), and Neomycin sulfate (NEO)) and their 12 mixtures designed using different methods on fresh water photobacteria Vibro-qinghaiensis sp.Q67.The concentration unit of these antibiotics is mole per liter (mol/L).The bioluminesence of photobacteria was transformed into quantal form with response approaches 0 if chemical's concentration was extremely low and 100% inhibition if chemical's concentration was infinitely high.Detailed information on these data and test systems can be found in the reference manual.

Example 1: Curve fitting
Mixture toxicity prediction is based on the curve fitting information of individual compounds.For example, we need to use the Logit function to fit the concentration response data of PAR on photobacteria.To get proper starting values of α (location parameter) and β (slope parameter) for Logit, users need to use the function tuneFit.R> x <-antibiotox$PAR$x R> expr <-antibiotox$PAR$y R> y <-rowMeans(expr) R> tuneFit(x, y, eq = "Logit") The function tuneFit will return a list as follows:

1000001
The number 1000001 means the effect concentration (ecx) was not calculated because no effv was provided to tuneFit.These two functions could well describe the concentration response data of PAR on photobacteria (r 2 > 0.99).It is difficult to distinguish which fit is better purely based on r 2 .Users need to choose the better fit (i.e., Logit) based on AIC or BIC (the lower, the better).
Function tuneFit also provides an important feature of fitting a batch of concentration response data of different chemicals.Here, we would like to example curve fitting of two compounds in a single run.

$sta
Alpha Beta r2 adjr2 MAE RMSE AIC AICc BIC ecx fit_1 6.42 2.25 0.946 0.941 0.0447 0.0625 -30.7 -29.3 -29.7 1000001 fit_2 5.57 2.27 0.974 0.972 0.0452 0.0545 -34.0 -32.6 -33.0 1000001 In this case, r 2 is still useful to give an illustrative evaluation of the goodness of fit.In cases of various curves fitted by different functions, there is no way to compare their goodness of fit just based on AIC or BIC.Thus, we provide as many statistics as possible to allow users to choose the best fit in the context of their experimental systems.
Function curveFit could be used to fit the concentration response data of PAR using the starting values of 26.31 and 4.66 for Logit.
The NOEC and LOEC can be calculated using the function NOEC:

Example 2: Mixture toxicity prediction
The function caPred can be called to predict the effect of mixtures based on concentration addition.It provides three optional mixture design methods as mentioned before (ACR, EECR, and UDCR).The effect of mixtures of seven aminoglycosides combined on photobacteria according to the EECR design for 5% and 50% effect levels can be predicted by: R> model <-antibiotox$sgl$model R> param <-antibiotox$sgl$param R> caPred(model, param, mixType = "eecr", effv = c(0.05,0.5)) Function caPred will return a series of effects e in a range of [0, 1], predicted concentrations ca that cause effects e, and the proportion of individual components (pct) for the EECR mixture.The proportion of individual components is very useful for users to prepare mixtures in practical experiments.
Here we would like to show an example of connecting the curve fitting (drm) in drc with the mixture effects prediction (caPred and iaPred) in mixtox.Assume we need to predict the effect of mixtures of four antibiotics (PAR, SPE, KAN, and STR) on photobacteria.First, we need to fit these concentration response data (stored in antibiotox) using drm.The LL.2 function is selected to fit these response data.
The following code can be used to transform b and e in LL.2 into α and β in Logit numerically.

Example 4: Follow-up analysis
Figure 2 shows the comparison of CA and IA prediction with experimental observation of EECR mixtures.The code for this comparison is shown in Appendix.
We can conclude that the mixture toxicity of seven antibiotics can be predicted by the IA model since the IA curve locates in the 95% PI of the experimental curve.
Despite the CA and IA prediction at particular effect concentrations, it is also meaningful to know the CA and IA prediction at particular effects.For example, the luminescence of photobacteria will The R Journal Vol.8/2, December 2016 ISSN 2073-4859 The two functions first calculate the effect concentrations based on the fitted concentration response information of mixtures (i.e., the selected equations and associated coefficients contained in antibiotox$eecr.mix)according to the input effects effv (e.g., 5.0%).The concentration of an individual component (c i ) is computed from (8) based on the mixture's effect concentration and the ratio of components in the mixture antibiotox$pct (p i ).The IA predicted effect for EE5 and EE50 are 1.80% and 3.04%, respectively.The CA predicted effect of EE5 and EE50 are 2.78% and 4.25%, respectively.

Conclusions
Package mixtox mainly targets toxicologists in the study of chemicals' toxicity and the effect of mixtures.It offers a general framework of curve fitting, experimental design, and mixture prediction for practitioners in toxicology.The unique features of mixtox include: (1) construction of uniform design table; and (2) mixture toxicity prediction for multiple components based on CA, IA, and GCA.Function gcaHill is capable of examining mixtures containing partial agonists or even full agonists.However, we are lacking experimental data in mixtox to verify its ability.mixtox aims to predict the effect of mixtures with more than two components.It cannot make isoboles for binary mixtures.Users may need to draw isoboles using package drc or other software package based on the results calculated with mixtox.The current version of mixtox (version number 1.3.1) is available on CRAN.We received a lot of feedback from users since its release last year.Now it is still under rapid development to improve its performance and try to include more features in toxicology.

Figure 1 :
Figure 1: The fitting information of antibiotic PAR on photobacteria: (A) concentration response curves.Dot: observed; Black line: fitted CRC; Red dotted lines: 95% CI; Blue dotted lines: 95% PI; and (B) normal Q-Q plots of residuals.

Figure 2 :
Figure 2: Experimental, predicted CA, and IA concentration response curves of 2 mixtures designed by equal effect concentration ratio (EE05 and EE50).Dot: observed; Black line: fitted CRC; Grey dotted lines: 95% PI; Blue dashed line: IA prediction; Red dashed line: CA.The code is listed in Appendix.
Six functions (i.e., Hill, Weibull, Logit, BCW, BCL, and GL)are suitable to fit quantal responses in the range of [0, 1].The other 7 functions are suitable to fit continuous concentration responses with response range out of [0, 1].
The main function for curve fitting in mixtox is curveFit.This function provides various options.It requires concentration (x), experimental responses (expr), a suitable equation (eq), and corresponding The R Journal Vol.8/2, December 2016 ISSN 2073-4859 starting values (param).