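The package Pstat calculates $P_{ST}$ values to assess differentiation among populations from a set of quantitative traits, and provides bootstrapped distributions and confidence intervals for $P_{ST}$. Variation of $P_{ST}$ as a function of the parameter $c/h^2$ can also be studied. Finally, the package implements three transformations of the measured phenotypic traits to eliminate variation resulting from allometric growth: residuals from a linear regression, Reist standardization, and the Aitchison transform.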
Understanding the causes of patterns of morphological variation in the wild is a fundamental goal of evolutionary biology. In particular, the relative importance of selective and neutral processes in producing the observed differentiation remains a crucial question.
The number of studies comparing differentiation in quantitative traits and neutral
markers has increased significantly over the last ten years (Leinonen et al. 2013).
Typically, a set of populations is sampled and the degree of genetic
differentiation is estimated for a set of molecular markers with
Wright's fixation index $F_{ST}$.
As a consequence, the comparison between
Spitze (1993) introduced and defined the $Q_{ST}$ index, the quantitative-trait counterpart of $F_{ST}$.
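In the wild, estimating the additive genetic variance components
is challenging because breeding designs are impossible. Therefore, $P_{ST}$, which is computed from phenotypic rather than additive genetic variances, is often used as a proxy for $Q_{ST}$.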
A large number of studies have assessed the potential for natural
selection to affect morphological evolution by comparing phenotypic
divergence with neutral genetic divergence via a $P_{ST}$ versus $F_{ST}$ comparison.
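For reference, the two indices are usually written as follows (e.g. Brommer 2011). Here $\sigma^2_{GB}$ and $\sigma^2_{GW}$ are the additive genetic variances between and within populations, and $\sigma^2_{B}$ and $\sigma^2_{W}$ are the corresponding phenotypic variances. The quantity $c$ is the proportion of the between-population phenotypic variance due to additive genetic effects, and $h^2$ is the narrow-sense heritability. These symbols are our own notation and may differ from those used in the Pstat documentation.

$$
Q_{ST} = \frac{\sigma^2_{GB}}{\sigma^2_{GB} + 2\,\sigma^2_{GW}},
\qquad
P_{ST} = \frac{\frac{c}{h^2}\,\sigma^2_{B}}{\frac{c}{h^2}\,\sigma^2_{B} + 2\,\sigma^2_{W}}
$$

Because $c/h^2$ is rarely known in natural populations, $P_{ST}$ is typically examined over a range of plausible values of this ratio.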
After loading the package with `library(Pstat)`, load the sample data with `data(test)`. The first column of this data frame gives the population (A, B, C, D, or E) to which each individual belongs. The remaining columns hold eleven quantitative measures. An excerpt from the sample data is presented in Table 1.
|   | Populations | QM1 | QM2 | QM3 | QM4 |
|---|---|---|---|---|---|
| 1 | A | 0.18487253 | 0.4001979 | 0.1694021 | 42 |
| 2 | B | 0.24023500 | 0.4718000 | 0.2178500 | 46 |
| 3 | C | 0.23499676 | 0.4686213 | 0.2060222 | 25 |
| 4 | B | 0.20495223 | 0.3746026 | 0.1846816 | 51 |
| 5 | C | 0.20739220 | 0.4866461 | 0.2131618 | 19 |
| 6 | C | 0.22545341 | 0.3770903 | 0.1882165 | 28 |
| 7 | C | 0.18371681 | 0.4992361 | 0.2167194 | 25 |
The package can be used to transform data to eliminate variation resulting from allometric growth. Users can choose among three alternatives:
Residuals of a linear regression, with one of the quantitative variables used as the regressor (Kuhry and Marcus 1977);
The allometric transformation described in Reist (1985); or
Aitchison's log-ratio transform (Aitchison 1986).
Reist (1985) compared a variety of univariate transformations that aim to separate size and shape variation. He showed that two approaches are preferable: size adjustment by regression residuals (the first option) and allometric adjustment to a standard size (the second option). Both completely remove size variation and have minimal impact on the correlation and covariance structure of the data. The first two options drop the variable used as the regressor, whereas the third transformation keeps the same number of variables. We provide examples of each of the three alternatives below.
The first adjustment method provided by Pstat is a simple linear
regression. The `Res` function assumes a linear relationship between each
quantitative trait and the trait chosen as the regressor. It returns a new data frame containing the residuals of these regressions, and the regressor itself is dropped from the output.
The function's arguments are as follows:

- data: the data frame to be transformed, with as many rows as individuals. The first column must contain the population to which each individual belongs, and the other columns contain the quantitative variables.
- reg: the name or the rank of the variable chosen as the regressor (e.g. `reg=3` designates QM3).
- Rp: the names of the populations to be deleted. Default value: `Rp=0`, no population removed.
- Ri: the row numbers of individuals to be deleted. Default value: `Ri=0`, no individuals removed.
Below is sample output for the `test` data, using QM3 as the
regressor. An excerpt of the transformed data returned by `Res`
is presented in Table 2.
```r
## Using the explanatory variable QM3 as the regressor
Res(data=test, reg="QM3")
```
|   | Populations | QM1 | QM2 | QM4 |
|---|---|---|---|---|
| 1 | A | 0.0339245264 | 5.621424e-03 | 6.23817063 |
| 2 | B | 0.1001268662 | 4.497085e-02 | 8.44522196 |
| 3 | C | 0.0922422473 | 4.966613e-02 | -12.11705813 |
| 4 | B | 0.0574228940 | -3.014565e-02 | 14.67271191 |
| 5 | C | 0.0662351079 | 6.293798e-02 | -18.38127681 |
| 6 | C | 0.0787149904 | -3.001126e-02 | -8.45810788 |
| 7 | C | 0.0433557311 | 7.315960e-02 | -12.51293868 |
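To make the regression adjustment concrete, here is a minimal base-R sketch of the idea behind `Res`. It is not the package's code. `res_sketch` is a hypothetical helper, it accepts only a column name for the regressor, and the assumption that one regression is fitted over all individuals (rather than within each population) is ours.

```r
# Hypothetical helper (not part of Pstat): residuals of every quantitative
# trait regressed on one chosen trait, pooled over all individuals.
res_sketch <- function(data, reg) {
  traits <- setdiff(names(data)[-1], reg)   # column 1 holds the population
  out <- data[, 1, drop = FALSE]            # keep the population column
  for (v in traits) {
    fit <- lm(data[[v]] ~ data[[reg]])      # simple linear regression
    out[[v]] <- unname(residuals(fit))      # store residuals, not fitted values
  }
  out                                       # the regressor column is dropped
}

# Toy data: QM2 is an exact linear function of QM3
toy <- data.frame(
  Populations = c("A", "A", "B", "B", "C"),
  QM1 = c(1.0, 2.1, 2.9, 4.2, 5.0),
  QM2 = c(2, 4, 6, 8, 10),
  QM3 = c(1, 2, 3, 4, 5)
)
r <- res_sketch(toy, reg = "QM3")
print(r)
```

Because QM2 is exactly twice QM3 in the toy data, its residuals are zero up to rounding. As with any least-squares fit that includes an intercept, the residuals of QM1 sum to zero.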
In the second adjustment method provided by Pstat, all morphometric measurements are standardized using the transformation proposed by Reist (1985).
The `ReistTrans` function returns a corrected data frame. Using QM3 as
the explanatory variable, we present a sample of the transformed data
frame in Table 3.
```r
## Using QM3 as the explanatory variable (identified by column number)
ReistTrans(test, reg=3)
```
|   | Populations | QM1 | QM2 | QM4 |
|---|---|---|---|---|
| 1 | A | -0.7445410 | -0.3859875 | 1.631722 |
| 2 | B | -0.6004703 | -0.3462059 | 1.648348 |
| 3 | C | -0.6167708 | -0.3421063 | 1.388608 |
| 4 | B | -0.6893556 | -0.4255755 | 1.708186 |
| 5 | C | -0.6669355 | -0.3300087 | 1.266323 |
| 6 | C | -0.6456670 | -0.4250906 | 1.446049 |
| 7 | C | -0.7175846 | -0.3210021 | 1.384003 |
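The Reist standardization is commonly written as $M^{*}_{ij} = \log M_{ij} - \beta_j(\log L_i - \log \bar{L})$. Here $M_{ij}$ is trait $j$ of individual $i$, $L_i$ is the size (explanatory) variable, $\beta_j$ is the slope of $\log M_j$ on $\log L$, and $\bar{L}$ is a reference size. The base-R sketch below implements that formula. It is not the code of `ReistTrans`, and `reist_sketch` is a hypothetical helper. Three choices are our assumptions: base-10 logarithms, one slope per trait fitted over all individuals, and the arithmetic mean of the size variable as the reference size.

```r
# Hypothetical helper (not part of Pstat): Reist-style allometric adjustment.
# Assumptions: base-10 logs, one slope per trait estimated over all
# individuals, and the arithmetic mean of the size variable as reference.
reist_sketch <- function(data, reg) {
  logL <- log10(data[[reg]])
  logLref <- log10(mean(data[[reg]]))
  out <- data[, 1, drop = FALSE]
  for (v in setdiff(names(data)[-1], reg)) {
    logM <- log10(data[[v]])
    beta <- unname(coef(lm(logM ~ logL))[2])   # allometric slope
    out[[v]] <- logM - beta * (logL - logLref) # adjust to the reference size
  }
  out
}

toy <- data.frame(
  Populations = c("A", "A", "B", "B", "C"),
  QM1 = c(1.2, 2.0, 3.5, 3.9, 5.1),
  QM2 = 3 * (1:5)^2,   # exact allometry: QM2 = 3 * QM3^2
  QM3 = 1:5
)
r <- reist_sketch(toy, reg = "QM3")
print(r)
```

When a trait follows an exact power law of the size variable, as QM2 does here, the adjustment removes all size-related variation. Every individual then receives the same value, namely the trait predicted at the reference size (here $\log_{10} 27$).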
The third adjustment method provided by Pstat performs the Aitchison log-ratio transformation to account for individual size effects (Aitchison 1986).
The `AitTrans` function returns a corrected data frame. Sample output
is shown in Table 4.
```r
AitTrans(test)
```
|   | Populations | QM1 | QM2 | QM3 | QM4 |
|---|---|---|---|---|---|
| 1 | A | -1.947544 | -1.6121417 | -1.985498 | 0.408832854 |
| 2 | B | -1.910214 | -1.6170919 | -1.952692 | 0.371908012 |
| 3 | C | -1.834151 | -1.5343910 | -1.891299 | 0.192727037 |
| 4 | B | -1.901481 | -1.6395625 | -1.946710 | 0.494436831 |
| 5 | C | -1.832709 | -1.4622885 | -1.820792 | 0.129251912 |
| 6 | C | -1.801889 | -1.5785008 | -1.880288 | 0.292211929 |
| 7 | C | -1.938699 | -1.5045418 | -1.866950 | 0.195092172 |
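A common form of Aitchison's transform is the centred log-ratio. Each trait is divided by the geometric mean of that individual's traits before the logarithm is taken, so all variables are kept. The sketch below implements that form in base R. `ait_sketch` is a hypothetical helper, not `AitTrans`, and the use of base-10 logarithms is our assumption. In the first row of Table 4, the differences between columns do match base-10 log-ratios of the corresponding Table 1 values. However, we have not checked that the centring term is the same as the package's.

```r
# Hypothetical helper (not part of Pstat): centred log-ratio transform.
# Each individual's traits are log-transformed (base 10, an assumption) and
# centred on that individual's mean log value, which is equivalent to
# dividing by the geometric mean of its traits before taking logs.
ait_sketch <- function(data) {
  lx <- log10(as.matrix(data[, -1]))
  clr <- lx - rowMeans(lx)   # the row-mean vector recycles down each column
  data.frame(data[, 1, drop = FALSE], clr, check.names = FALSE)
}

toy <- data.frame(
  Populations = c("A", "B"),
  QM1 = c(0.2, 2.0),
  QM2 = c(0.4, 4.0),
  QM3 = c(42, 420)   # individual B is individual A scaled by 10
)
r <- ait_sketch(toy)
print(r)
```

Each transformed row sums to zero. Two individuals that differ only by an overall size factor get identical values, which is what makes this transform a size correction.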
We are interested in determining the phenotypic differentiation across
the five populations for each of the eleven quantitative traits of the
example dataset. The function `Pst` computes $P_{ST}$ values for these traits. Its arguments
are as follows:

- data: the input data frame, with as many rows as individuals. The first column must contain the population label and the others the quantitative variables.
- ci: if `ci=1`, bootstrap confidence intervals are added to the $P_{ST}$ values. Default value: `ci=0`.
- csh: the value of the ratio $c/h^2$ used to compute $P_{ST}$. Default value: `csh=1`.
- va: a vector containing the names or column numbers of the quantitative measures under consideration. If `va=0`, all the variables are selected. Default value: `va=0`.
- boot: the number of data frames generated to determine the confidence interval with the bootstrap method. Default value: `boot=1000`.
- Pw: the names of the two populations considered to obtain pairwise $P_{ST}$ values. Default value: `Pw=0`, no pairwise analysis.
- Rp: the names of the populations to be deleted. Default value: `Rp=0`, no populations removed.
- Ri: the row numbers of individuals to be deleted. Default value: `Ri=0`, no individuals removed.
- pe: the confidence level of the calculated interval. Default value: `pe=0.95`.
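To show how $P_{ST}$ can be computed for a single trait, here is a base-R sketch. It estimates the between- and within-population variance components from a one-way ANOVA, using the usual $n_0$ coefficient for unequal population sizes, and then applies the $c/h^2$-weighted ratio. This is one common estimator, not necessarily what `Pst` does internally. `pst_sketch` is a hypothetical helper, and setting negative between-population estimates to zero is our assumption.

```r
# Hypothetical helper (not Pstat's code): P_ST for one trait from one-way
# ANOVA variance components,
# P_ST = csh * s2_B / (csh * s2_B + 2 * s2_W), where csh stands for c/h^2.
pst_sketch <- function(y, pop, csh = 1) {
  pop <- factor(pop)
  ms <- anova(lm(y ~ pop))[["Mean Sq"]]          # ms[1]: between, ms[2]: within
  ni <- table(pop)
  N <- sum(ni)
  n0 <- (N - sum(ni^2) / N) / (length(ni) - 1)   # mean group size, unbalanced case
  sigma2_b <- max((ms[1] - ms[2]) / n0, 0)       # negative estimates set to 0
  sigma2_w <- ms[2]
  csh * sigma2_b / (csh * sigma2_b + 2 * sigma2_w)
}

y <- c(1, 2, 3, 4, 5, 6)
pop <- rep(c("A", "B"), each = 3)
p1 <- pst_sketch(y, pop)              # 25/37, about 0.676
p2 <- pst_sketch(y, pop, csh = 0.2)   # 5/17, about 0.294
```

In this toy example the between mean square is 13.5, the within mean square is 1, and $n_0 = 3$, so $\sigma^2_B = 12.5/3$. Lowering $c/h^2$ from 1 to 0.2 reduces $P_{ST}$ accordingly. Applying the function to each trait column in turn would give one value per trait.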
Let us apply the `Pst`
function to the `test` dataset. `Pst` returns a data frame:
```r
## Example 1: Pairwise Pst values using populations C and D
Pst(test, csh=0.2, Pw=c("C","D"))
[1] "Populations sizes are:"
 C  D
76 32
  Quant_Varia Pst_Values
1         QM1  0.1749659
2         QM2  0.7460913
...       ...        ...
4        QM10  0.9800028
```

```r
## Example 2: Pst for the 2nd variable and QM7 with 99% confidence intervals
Pst(test, va=c(2,"QM7"), ci=1, boot=10000, Ri=c(5,117:121), pe=0.99)
[1] "Populations sizes are:"
 A  B  C  D  E
12 76 72 30  4
  Quant_Varia Pst_Values 99 %_LowBoundCI 99 %_UpBoundCI
1         QM2  0.8561307       0.7826177      0.9198395
2         QM7  0.8851413       0.7722856      0.9376501
```
The bootstrapped $P_{ST}$ values returned by `BootPst` form a
distribution for the selected quantitative trait. In addition to the
arguments that are shared with `Pst`, the `BootPst` function has the
following arguments specific to the bootstrap procedure:

- opt: if `opt=0`, all the bootstrapped $P_{ST}$ values are returned. If `opt="ci"`, the ordered values and the confidence interval are returned. If `opt="hist"`, the ordered values and the distribution histogram of $P_{ST}$ are returned. Default value: `opt=0`.
- va: the name or column number of the quantitative measure considered.
- bars: the maximum number of bars the histogram may have. On the x-axis, the range of $P_{ST}$ values is divided into `bars` parts, so some bars may be empty. Default value: `bars=20`.
The output from the `BootPst` function is a vector of the bootstrapped
$P_{ST}$ values.
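The bootstrap idea can be sketched in base R as follows. Individuals are resampled with replacement within each population, $P_{ST}$ is recomputed on each resampled data set, and a percentile interval is taken from the sorted values. `pst_one` and `boot_pst_sketch` are hypothetical helpers. Resampling within populations and using percentile intervals are our assumptions, not a description of how `BootPst` works internally.

```r
# Hypothetical helpers (not Pstat's code).
# One-way ANOVA estimate of P_ST for a single trait; csh stands for c/h^2.
pst_one <- function(y, pop, csh = 1) {
  pop <- factor(pop)
  ms <- anova(lm(y ~ pop))[["Mean Sq"]]
  ni <- table(pop)
  N <- sum(ni)
  n0 <- (N - sum(ni^2) / N) / (length(ni) - 1)
  sb <- max((ms[1] - ms[2]) / n0, 0)
  csh * sb / (csh * sb + 2 * ms[2])
}

# Resample individuals with replacement within each population, recompute
# P_ST, and return the sorted values with a percentile confidence interval.
boot_pst_sketch <- function(y, pop, boot = 1000, csh = 1, pe = 0.95) {
  groups <- split(seq_along(y), pop)   # row indices per population
  vals <- replicate(boot, {
    idx <- unlist(lapply(groups, function(i) i[sample.int(length(i), replace = TRUE)]),
                  use.names = FALSE)
    pst_one(y[idx], pop[idx], csh)
  })
  vals <- sort(vals)
  ci <- quantile(vals, c((1 - pe) / 2, (1 + pe) / 2), names = FALSE)
  list(values = vals, ci = ci)
}

set.seed(1)
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 2))
pop <- rep(c("A", "B"), each = 20)
b <- boot_pst_sketch(y, pop, boot = 200)
print(b$ci)
```

Calling `hist(b$values, breaks = 20)` on the result gives a picture analogous to the `opt="hist"` output. Resampling within populations keeps each population's sample size fixed, so every bootstrap replicate has the same design as the original data.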
Let us apply the `BootPst` function to the `test` dataset:
```r
## Example 1: Bootstrapped 95% confidence intervals for three populations (B, C, and D).
## Note that populations A and E are dropped
BootPst(test, opt="ci", va="Body_length", Rp=c("A","E"))
[1] "The studied quantitative variable is:"
[1] "Body_length"
[1] "Populations sizes are:"
 B  C  D
76 76 32
[1] "95 % confidence interval determined by 1000 bootstrap values:"
[1] 0.8757057 0.9585423
[1] 0.7938426 0.8338286 0.8510682 0.8512374 0.8545911 0.8551115 0.8552097
[8] 0.8637057 0.8641575 0.8644145 0.8659723 0.8671139 0.8671265 0.8676122
[15] 0.8686147 0.8702277 0.8708352 0.8711419 0.8718030 0.8721783 0.8734932
...
[995] 0.9621794 0.9625852 0.9634700 0.9644283 0.9650500 0.9689611
```

```r
## Example 2: Histogram for the trait in column 3 (output in Figure 1)
BootPst(test, opt="hist", va=3, bars=50)
[1] "The studied quantitative variable is:"
[1] "QM3"
[1] "Populations sizes are:"
 A  B  C  D  E
12 76 76 32  4
[1] "1000 bootstrap values and Pst distribution:"
[1] 0.1062747 0.1076470 0.1269888 0.1593121 0.1775196 0.2050347 0.2111617
[8] 0.2327508 0.2401064 0.2487401 0.2588179 0.2589942 0.2623706 0.2722956
[15] 0.2827915 0.2860497 0.2935858 0.2947525 0.2954878 0.2995198 0.3003267
...
[995] 0.8211326 0.8253874 0.8293417 0.8318546 0.8420100 0.8635299
```
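Brommer (2011) and Lima (2012) offer plots that demonstrate how $P_{ST}$, and its comparison with $F_{ST}$, varies as a function of $c/h^2$. The `TracePst` function produces such plots.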
Arguments specific to `TracePst` include:

- va: a vector containing the names or numbers of the selected variables (i.e. the quantitative measures considered). If `va=0`, all the variables are selected. Default value: `va=0`.
- ci: if `ci=1`, the confidence interval of $P_{ST}$ is also plotted. Default value: `ci=1`.
- Fst: the value of Wright's $F_{ST}$ to be compared with $P_{ST}$. Default value: `Fst=-1`, no $F_{ST}$ value supplied.
- xm: the maximum of the x-axis (values of $c/h^2$). Default value: `xm=2`.
- pts: the number of points used to plot the curves. Default value: `pts=30`.
Let us apply the `TracePst` function to the `test` dataset. The resulting
plots are shown in Figure 2.
```r
# Aitchison adjustment method:
trans_test=AitTrans(test)
# Plots illustrating how comparisons between Fst and Pst depend on c/h^2:
TracePst(trans_test, Fst=0.3, xm=3)
[1] "Populations sizes are:"
 A  B  C  D  E
12 76 76 32  4
```
The use of
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Silva & Silva, "Pstat: An R Package to Assess Population Differentiation in Phenotypic Traits", The R Journal, 2018
BibTeX citation
@article{RJ-2018-010, author = {Silva, Stéphane Blondeau Da and Silva, Anne Da}, title = {Pstat: An R Package to Assess Population Differentiation in Phenotypic Traits}, journal = {The R Journal}, year = {2018}, note = {https://rjournal.github.io/}, volume = {10}, issue = {1}, issn = {2073-4859}, pages = {447-454} }