Peptides: a Package for Data Mining of Antimicrobial Peptides

Antimicrobial peptides (AMPs) are a promising source of antibiotics with broad-spectrum activity against bacteria and a low incidence of resistance development. The mechanism by which an AMP executes its function depends on a set of physicochemical properties that can be computed from the amino acid sequence. The Peptides package was designed to allow quick and easy computation of ten structural characteristics of antimicrobial peptides, with the aim of generating data to increase the accuracy of the classification and design of new amino acid sequences. Moreover, options to read and plot XVG output files from the GROMACS molecular dynamics package were included.


Introduction
Antimicrobial peptides are a promising source of antibiotics with broad-spectrum activity against bacteria and a low incidence of resistance development (Hancock, 2001). Multiple studies have concluded that the natural biological activities of these peptides are coordinated by a sophisticated modulation of hydrophobicity, amphipathicity, positive charge and a reduction in the hydrophobic moment (Yeaman and Yount, 2003; Fjell et al., 2012; Matsuzaki, 2009). In addition to these four properties, there are other descriptors that, used in conjunction, can provide useful information for the classification and design of antimicrobial peptides (Boman, 2003; Wang et al., 2009; Thomas et al., 2010; Piotto et al., 2012).
The computation of these properties for antimicrobial peptides is available free of charge through various web or native applications such as ExPASy-protparam (Gasteiger et al., 2005), EMBOSS-pepstats (Rice et al., 2000), BioPerl (Stajich et al., 2002), and the CAMP (Thomas et al., 2010) and APD (Wang et al., 2009) databases. However, no single application allows the computation of all properties. Some of these services compute only some properties for one sequence at a time, and offer no option for downloading and handling this information in editable files. Others accept more than one sequence, but write a separate output file for each of them, which makes the data difficult to handle and analyse.
Taking advantage of the handling of data, vectors and tables provided by R, we introduce the Peptides package. It allows quick and easy computation of ten structural characteristics of antimicrobial peptides (length, amino-acid composition, net charge, aliphatic index, molecular weight, isoelectric point, hydrophobic moment, potential peptide interaction index, instability index and GRAVY hydrophobicity index) in a single application. The Peptides package was designed with the aim of generating data to increase the accuracy of the classification of new amino acid sequences. In addition, the option to read and plot XVG output files from the GROMACS molecular dynamics package was included.
In this work we describe, step by step, the computation of the structural properties of the Lasiocepsin peptide (GLPRKILCAIAKKKGKCKGPLKLVCKC) using the Peptides package as an example. It is an alpha-helical antimicrobial peptide derived from the venom of the eusocial bee Lasioglossum laticeps, identified in the Protein Data Bank with the code 2MBD (Monincová et al., 2012). Moreover, an example of classification using a dataset of 23 variables computed for 100 peptides with Peptides is presented. Using this dataset, linear discriminant analysis and classification-regression trees classified antimicrobial peptides with an accuracy of 95% and 85%, respectively.

Installation and functions
Peptides includes thirteen functions and is available for download and installation through the CRAN servers. To install and load it, just type:
> install.packages("Peptides")
> library(Peptides)

Number of amino acids
Like all proteins, antimicrobial peptides are formed by linear chains of small residues known as amino acids, attached to each other by peptide bonds. Antimicrobial peptides are characterized by a short length; they generally comprise fewer than 50 amino acids. This property minimizes the probability of being degraded by bacterial proteases (Kim et al., 2013). The function lengthpep counts the number of amino acids in a sequence and returns a vector with the count for each peptide passed as an argument.
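The counting step can be illustrated without the package. The following is a minimal base-R sketch, not the implementation of lengthpep; the helper name count_residues is hypothetical and assumes sequences are given as one-letter code strings.

```r
# Base-R sketch (hypothetical helper, not the package's lengthpep):
# count the residues in each peptide given as a one-letter code string.
count_residues <- function(seqs) {
  # Strip any whitespace, then count the remaining characters.
  nchar(gsub("[[:space:]]", "", seqs))
}

lasiocepsin <- "GLPRKILCAIAKKKGKCKGPLKLVCKC"
count_residues(c(lasiocepsin, "KKK"))
```

For Lasiocepsin this count is 27, well under the 50-residue threshold mentioned above.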

Molecular weight
The molecular weight is the sum of the masses of each atom constituting a molecule. It is directly related to the length of the amino acid sequence and is expressed in units called daltons (Da). Because of their short length, antimicrobial peptides are characterized by a molecular weight below 10 kDa (10000 Da). In Peptides the function mw computes the molecular weight using the same formulas and weights as ExPASy's "compute pI/mw" tool (Gasteiger et al., 2005).
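As a rough illustration of this kind of calculation, the sketch below sums average residue masses and adds one water molecule for the free termini. It is a base-R approximation under stated assumptions, not the code of mw: the helper name approx_mw is hypothetical, the residue masses are the commonly tabulated average values, and the water mass is rounded to 18.015 Da.

```r
# Average residue masses in Da (commonly tabulated values; treat as assumptions).
residue_mass <- c(
  A = 71.0788,  R = 156.1875, N = 114.1038, D = 115.0886, C = 103.1388,
  E = 129.1155, Q = 128.1307, G = 57.0519,  H = 137.1411, I = 113.1594,
  L = 113.1594, K = 128.1741, M = 131.1926, F = 147.1766, P = 97.1167,
  S = 87.0782,  T = 101.1051, W = 186.2132, Y = 163.1760, V = 99.1326
)
water_mass <- 18.015  # one water added for the N- and C-terminal groups

# Hypothetical helper: approximate molecular weight of one peptide.
approx_mw <- function(pep) {
  aa <- strsplit(toupper(pep), "")[[1]]
  stopifnot(all(aa %in% names(residue_mass)))  # reject non-standard codes
  unname(sum(residue_mass[aa])) + water_mass
}

approx_mw("GLPRKILCAIAKKKGKCKGPLKLVCKC")
```

The result is in daltons; dividing by 1000 gives kDa for comparison with the 10 kDa bound quoted above.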

Amino acid composition
Amino acids are zwitterionic molecules with an amine and a carboxyl group present in their structure. Some amino acids possess side chains with specific properties that allow them to be grouped in different ways. The aacomp function classifies amino acids based on their size, side chains, hydrophobicity, charge and their response to pH 7, following the categories listed in Table 1. The output is a matrix with the number and percentage of amino acids of each class.
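The shape of such an output can be sketched in base R. The example below is illustrative only: the helper class_composition is hypothetical, and it uses just three classes (aliphatic, basic, acidic) chosen for the example rather than the full set of categories in Table 1.

```r
# Illustrative classes only; the package's Table 1 defines more categories.
example_classes <- list(
  aliphatic = c("A", "I", "L", "V"),
  basic     = c("R", "K", "H"),
  acidic    = c("D", "E")
)

# Hypothetical helper: number and percentage of residues in each class.
class_composition <- function(pep, classes = example_classes) {
  aa <- strsplit(toupper(pep), "")[[1]]
  counts <- vapply(classes, function(members) sum(aa %in% members), numeric(1))
  cbind(Number = counts, Percent = round(100 * counts / length(aa), 3))
}

class_composition("GLPRKILCAIAKKKGKCKGPLKLVCKC")
```

The result is a matrix with one row per class, which keeps the counts and percentages for many peptides easy to bind into a single table.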

Net charge
The side chains of certain amino acids can confer electric charge to proteins at certain pH values. The sum of the charges of each of the amino acids is called the net charge. Antimicrobial peptides have a positive net charge (of at least +2) at pH 7, which provides binding specificity to the negatively charged bacterial membranes through electrostatic interactions (Yeaman and Yount, 2003).
The charge function computes the net charge using Equation 1, a variation of the Henderson-Hasselbalch equation proposed by Moore (1985), where N is the number of groups and the indices j and i denote the acidic (Aspartic Acid, Glutamic Acid, Cysteine and Tyrosine) and basic (Arginine, Lysine, and Histidine) functional groups of amino acids, respectively.
The net charge of a protein can be calculated by specifying the pH value and one of the nine available pKa scales (Bjellqvist, Dawson, EMBOSS, Lehninger, Murray, Rodwell, Sillero, Solomon or Stryer).
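Equation 1 did not survive in this copy of the text. A form consistent with the definitions above (i for basic groups, j for acidic groups, N for the number of groups of each kind) is given below as a reconstruction, not as the authors' exact typesetting. In practice the terminal amine and carboxyl groups are usually counted among the basic and acidic groups as well; that is an assumption here.

```latex
\begin{equation}
\mathrm{charge} \;=\; \sum_{i} N_{i}\,\frac{+1}{1 + 10^{\,+(\mathrm{pH} - \mathrm{pKa}_{i})}}
\;+\; \sum_{j} N_{j}\,\frac{-1}{1 + 10^{\,-(\mathrm{pH} - \mathrm{pKa}_{j})}}
\tag{1}
\end{equation}
```

Each basic group contributes a charge between 0 and +1 and each acidic group a charge between 0 and -1, depending on how far the pH is from its pKa.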

Isoelectric point
The isoelectric point (pI) is the pH at which the net charge of the protein is equal to 0. It is a variable that affects the solubility of peptides under certain pH conditions. When the pH of the solvent is equal to the pI of the protein, the protein tends to precipitate and lose its biological function. Antimicrobial peptides have an isoelectric point close to 10 (Torrent et al., 2011), which is very similar to that of soap or detergent and consistent with the proposed mechanisms of action for these peptides.
The calculation of the isoelectric point of a peptide may be performed through the function pI specifying one of the nine pKa scales available (Bjellqvist, Dawson, EMBOSS, Lehninger, Murray, Rodwell, Sillero, Solomon or Stryer).
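The link between net charge and isoelectric point can be sketched in base R: the pI is the root of the charge curve. This is a minimal illustration, not the code of charge or pI. The helper names are hypothetical, and the pKa values are illustrative textbook-style numbers that are not guaranteed to match any of the nine scales in the package.

```r
# Illustrative pKa values (assumptions, not a scale taken from the package).
pka_basic  <- c(K = 10.5, R = 12.4, H = 6.0)
pka_acidic <- c(D = 3.86, E = 4.25, C = 8.33, Y = 10.0)
pka_nterm  <- 9.69   # terminal amine, treated as one extra basic group
pka_cterm  <- 2.34   # terminal carboxyl, treated as one extra acidic group

# Hypothetical helper: Henderson-Hasselbalch style net charge at a given pH.
net_charge <- function(pep, pH) {
  aa <- strsplit(toupper(pep), "")[[1]]
  n_basic  <- vapply(names(pka_basic),  function(r) sum(aa == r), numeric(1))
  n_acidic <- vapply(names(pka_acidic), function(r) sum(aa == r), numeric(1))
  positive <- sum(c(n_basic, 1)  / (1 + 10^(pH - c(pka_basic, pka_nterm))))
  negative <- sum(c(n_acidic, 1) / (1 + 10^(c(pka_acidic, pka_cterm) - pH)))
  positive - negative
}

# Hypothetical helper: pH at which the net charge crosses zero.
isoelectric_point <- function(pep) {
  uniroot(function(pH) net_charge(pep, pH), interval = c(0, 14), tol = 1e-6)$root
}

pep <- "GLPRKILCAIAKKKGKCKGPLKLVCKC"
net_charge(pep, pH = 7)
isoelectric_point(pep)
```

Because the charge decreases monotonically with pH, a bracketing root finder such as uniroot is sufficient; the exact values depend on the pKa scale chosen.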

Aliphatic index
It has been suggested that the aliphatic amino acids (A, I, L and V) are responsible for the thermal stability of proteins. The aliphatic index was proposed by Ikai (1980) and evaluates the thermostability of proteins based on the percentage of each of the aliphatic amino acids that make up the protein. This index is computed using Equation 2, where X_A, X_V, X_I and X_L are the mole percent (100 x mole fraction) of Alanine, Valine, Isoleucine and Leucine, respectively.
Antimicrobial peptides tend to be more thermostable than proteins in general. For the calculation of the aliphatic index, the function aindex was included.
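As a worked illustration, the index of Ikai (1980) is commonly written as AI = X_A + 2.9 X_V + 3.9 (X_I + X_L), with the mole percents defined above. The base-R sketch below applies that formula directly; the helper name aliphatic_index is hypothetical and this is not the code of aindex.

```r
# Hypothetical helper: aliphatic index from mole percents of A, V, I and L,
# using the coefficients 2.9 (Val) and 3.9 (Ile + Leu) from Ikai (1980).
aliphatic_index <- function(pep) {
  aa <- strsplit(toupper(pep), "")[[1]]
  mole_percent <- function(residues) 100 * sum(aa %in% residues) / length(aa)
  mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * mole_percent(c("I", "L"))
}

aliphatic_index("GLPRKILCAIAKKKGKCKGPLKLVCKC")
```

Lasiocepsin contains 2 Ala, 1 Val, 2 Ile and 4 Leu among its 27 residues, so the sketch evaluates 100 x (2 + 2.9 + 23.4) / 27, roughly 104.8.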

Instability index
The instability index was proposed by Guruprasad et al. (1990). This index predicts the stability of a protein based on its amino acid composition. It is calculated according to Equation 3, where L is the length of the amino acid sequence, X_iY_i is a dipeptide and DIWV is the dipeptide instability weight value derived from the amino acid sequences of stable proteins.
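Equation 3 is missing from this copy. A reconstruction consistent with the symbols defined above is given below; here X_iY_i is read as the dipeptide formed by the residue at position i and the one that follows it, which is an interpretive assumption.

```latex
\begin{equation}
II \;=\; \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\left(X_{i}Y_{i}\right)
\tag{3}
\end{equation}
```

A sequence of length L contains L - 1 overlapping dipeptides, which is why the sum stops at L - 1 while the normalization uses L.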
Despite their short length (a variable that this function penalizes), antimicrobial peptides tend to be considered stable, with index values less than 40. The instability index can be calculated using the function instaindex incorporated in Peptides.
> instaindex(seq = "GLPRKILCAIAKKKGKCKGPLKLVCKC")
[1] 2.237037
The R Journal Vol.XX/YY, AAAA 20ZZ ISSN 2073-4859

Boman index
The potential protein interaction index was proposed by Boman (2003) as an easy way to differentiate the mechanism of action of hormones (protein-protein) from that of antimicrobial peptides (protein-membrane). It is calculated using Equation 4 by summing the solubility values of the amino acids and dividing by the sequence length. This function predicts the potential of a peptide to interact with another protein.
During their mechanism of action, antimicrobial peptides tend not to interact with other proteins (the proposed mechanism of action is based on the interaction with membranes), so the values of the Boman index are usually negative or close to 0. To calculate the Boman index, the boman function is included within Peptides.
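Equation 4 is missing from this copy. Following the verbal description above, it can be written as below. The symbols are assumptions: S_i stands for the solubility value of the residue at position i and L for the sequence length, reusing the notation of Equation 3.

```latex
\begin{equation}
B_{\mathrm{index}} \;=\; \frac{1}{L} \sum_{i=1}^{L} S_{i}
\tag{4}
\end{equation}
```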

Hydrophobicity index
Hydrophobicity is an important stabilization force in protein folding; this force changes depending on the solvent in which the protein is found. It is considered the driving force of the peptide towards the core of the membrane. The hydrophobicity index is calculated following Equation 5, adding the hydrophobicity of the individual amino acids and dividing this value by the length of the sequence. Peptides that are highly likely to be transmembrane generally have hydrophobicity values higher than 0.5 on the Eisenberg scale.
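Equation 5 is missing from this copy. From the description above it is the mean residue hydrophobicity, written below with assumed symbols: H_i is the hydrophobicity of the residue at position i on the chosen scale and L is the sequence length.

```latex
\begin{equation}
H \;=\; \frac{1}{L} \sum_{i=1}^{L} H_{i}
\tag{5}
\end{equation}
```

Because the value is an average, it is comparable across peptides of different lengths, but it depends on the hydrophobicity scale used.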

Hydrophobic moment index
The hydrophobic moment was proposed by Eisenberg et al. (1982) as a quantitative measure of the amphiphilicity perpendicular to the axis of any periodic peptide structure. To calculate the hydrophobic moment, the function hmoment was included. The hydrophobic moment is computed according to Equation 6, using the standardized Eisenberg (1984) scale, windows (fragments of the sequence) of eleven amino acids and specifying the rotational angle at which it should be calculated. Peptides that are highly likely to be transmembrane generally have hydrophobic moment values lower than 0.2.
> hmoment(seq = "GLPRKILCAIAKKKGKCKGPLKLVCKC", angle = 100, window = 11)
[1] 0.6170697
> hmoment(seq = "GLPRKILCAIAKKKGKCKGPLKLVCKC", angle = 160, window = 11)
[1] 0.4617153

Membrane position
Eisenberg et al. (1982) found a correlation between hydrophobicity and hydrophobic moment that defines the protein section as globular, transmembrane or superficial. The function calculates the hydrophobicity (H) and hydrophobic moment (µH) based on the standardized scale of Eisenberg (1984).
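The two quantities H and µH can be sketched in base R. The helpers below are hypothetical and are not the code of hmoment. They assume the Eisenberg consensus scale values listed in the code, the usual form of the moment, sqrt((sum H_n sin(n δ))^2 + (sum H_n cos(n δ))^2), a normalization by the window length, and reporting the maximum over all windows; the last two choices are assumptions, so the numbers need not match the package output shown above.

```r
# Eisenberg consensus hydrophobicity scale (values assumed from the literature).
eisenberg <- c(
  A = 0.62,  R = -2.53, N = -0.78, D = -0.90, C = 0.29,
  Q = -0.85, E = -0.74, G = 0.48,  H = -0.40, I = 1.38,
  L = 1.06,  K = -1.50, M = 0.64,  F = 1.19,  P = 0.12,
  S = -0.18, T = -0.05, W = 0.81,  Y = 0.26,  V = 1.08
)

# Hypothetical helper: mean hydrophobicity H of a peptide.
mean_hydrophobicity <- function(pep) {
  aa <- strsplit(toupper(pep), "")[[1]]
  mean(unname(eisenberg[aa]))
}

# Hypothetical helper: largest window hydrophobic moment for a given angle (degrees).
hydrophobic_moment <- function(pep, angle = 100, window = 11) {
  aa <- strsplit(toupper(pep), "")[[1]]
  h <- unname(eisenberg[aa])
  window <- min(window, length(aa))      # shorten the window for short peptides
  delta <- angle * pi / 180              # rotation per residue in radians
  starts <- seq_len(length(aa) - window + 1)
  moments <- vapply(starts, function(s) {
    hw <- h[s:(s + window - 1)]
    n <- seq_len(window)
    sqrt(sum(hw * sin(n * delta))^2 + sum(hw * cos(n * delta))^2) / window
  }, numeric(1))
  max(moments)
}

pep <- "GLPRKILCAIAKKKGKCKGPLKLVCKC"
mean_hydrophobicity(pep)
hydrophobic_moment(pep, angle = 100, window = 11)  # alpha-helix periodicity
hydrophobic_moment(pep, angle = 160, window = 11)  # beta-sheet periodicity
```

Angles of 100 and 160 degrees correspond to the rotation per residue usually assumed for alpha-helices and beta-sheets; plotting µH against H for each window is the basis of the globular, transmembrane or superficial assignment described above.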

GROMACS files
Molecular dynamics is a source of in-silico biophysical data that contributes to the design, classification and testing of antimicrobial peptides. GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics package designed for simulations of proteins, lipids and nucleic acids. It is free and open source, released under the GNU General Public License.
The file format used by GROMACS is XVG. This format can be displayed in graphical form with the GRACE program on UNIX/Linux systems and with gnuplot on Windows. XVG files are plain text files containing tabular data separated by tabs and two types of comments, which contain data labels. Although manual editing is possible, it is not a viable option when working with multiple files of this type.
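To make the file layout concrete, the following base-R sketch parses an XVG-like file by dropping the two kinds of comment lines (those starting with # and those starting with @) and reading the rest as a table. The helper read_xvg_sketch and the toy file contents are invented for this illustration; this is not the package's reader.

```r
# Hypothetical helper: read the numeric table from an XVG-style text file.
read_xvg_sketch <- function(path) {
  lines <- readLines(path)
  is_meta <- grepl("^[[:space:]]*[#@]", lines)          # comment and label lines
  data_lines <- lines[!is_meta & nzchar(trimws(lines))] # keep non-empty data rows
  read.table(text = data_lines, header = FALSE)
}

# Toy file written only for this example (values are made up).
tmp <- tempfile(fileext = ".xvg")
writeLines(c(
  "# toy file written for this example",
  "@ title \"Distance\"",
  "@ xaxis label \"Time (ps)\"",
  "0\t1.50",
  "10\t1.42",
  "20\t1.37"
), tmp)

md <- read_xvg_sketch(tmp)
md
```

The resulting data frame can be passed to plot(md$V1, md$V2, type = "l") to draw the time series; the label lines could be parsed separately to recover axis titles.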
For ease of reading, information management and data plotting, the functions read.xvg and plot.xvg were incorporated. An example of how to read and plot the absolute distance between the center of mass of an antimicrobial peptide and a POPG (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoglycerol) pure lipid bilayer is presented below (Figure 1).
> file <- system.file(file="xvg-files/POPG.xvg", package="Peptides")
> md <- read.

> install.packages("caret", dependencies=TRUE)
> library(Peptides)
> library(caret)
The dataset was calculated using all the functions included in Peptides. It is available as a data.frame called pepdata, with 100 observations and 23 variables. It includes the physicochemical properties and calculable indices for 50 antimicrobial (group = 1) and 50 non-antimicrobial (group = 0) peptides downloaded from the APD database (Wang et al., 2009) and the Protein Data Bank (Bernstein et al., 1978), respectively.

> plot(train.rpart)
To evaluate the accuracy of the classifier a prediction with testing data was performed.

Figure 1: Absolute distance between the center of mass of an antimicrobial peptide with respect to a POPG pure lipid bilayer.

Figure 2: plot(train.rpart) shows the relationship between the complexity parameter and the resampled estimate of the area under the ROC curve used to find the best model. plot(tree) shows the rules to classify the antimicrobial peptides of the pepdata dataset.
The Peptides package requires R version 1.2.2 or higher. Development releases of the package are available on the GitHub repository http://github.com/dosorio/peptides.

Table 1: Classification of the amino acids according to the properties of size and side chain.
Antimicrobial peptides are amphipathic (with similar proportions of polar and non-polar amino acids) and charged molecules. In Peptides the amino acid composition can be computed using the function aacomp.
The membrane position is computed using windows of 11 amino acids to calculate the theoretical fragment type.
To evaluate the potential of these data, the best CART model was selected by a K-fold cross-validation loop.
The training and testing data sets were created with the following commands.
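The authors' commands are not preserved in this copy of the text. Purely as an illustration of the idea, and not as their code, a class-balanced split into training and testing sets can be sketched in base R as follows. The toy data frame stands in for pepdata, and the 70/30 proportion and the seed are assumptions.

```r
set.seed(1)  # arbitrary seed so that the split is reproducible

# Toy stand-in for the real dataset: 50 peptides per group, as described above.
toy <- data.frame(id = 1:100, group = rep(c(1, 0), each = 50))

p <- 0.7  # assumed proportion of each group assigned to training

# Sample the same fraction of row indices within each group.
train_idx <- unname(unlist(lapply(
  split(seq_len(nrow(toy)), toy$group),
  function(idx) sample(idx, size = round(p * length(idx)))
)))

training <- toy[train_idx, ]
testing  <- toy[-train_idx, ]

table(training$group)
table(testing$group)
```

Sampling within each group keeps the antimicrobial and non-antimicrobial classes balanced in both sets, which matters when accuracy is the reported metric.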