minval: An R package for MINimal VALidation of Stoichiometric Reactions

A genome-scale metabolic reconstruction is a compilation of all stoichiometric reactions that can describe the entire cellular metabolism of an organism, and they have become an indispensable tool for our understanding of biological phenomena, covering fields that range from systems biology to bioengineering. Interrogation of metabolic reconstructions are generally carried through Flux Balance Analysis, an optimization method in which the biological sense of the optimal solution is highly sensitive to thermodynamic unbalance caused by the presence of stoichiometric reactions whose compounds are not produced or consumed in any other reaction (orphan metabolites) and by mass unbalance. The minval package was designed as a tool to identify orphan metabolites and evaluate the mass and charge balance of stoichiometric reactions. The package also includes functions to characterize and write models in TSV and SBML formats, extract all reactants, products, metabolite names and compartments from a metabolic reconstruction.


Introduction
A chemical reaction is a process where a set of chemical compounds called reactants are transformed into others called products (Chen et al., 2013).The accepted way to represent a chemical reaction is called a stoichiometric reaction, where reactants are placed on the left and the products on the right separated by an arrow which indicates the direction of the reaction, as shown in equation 1 (Hendrickson, 1997).In biochemistry, a set of chemical reactions that transform a substrate into a product, after several chemical transformations is called a metabolic pathway (Lambert et al., 2011).The compilation of all stoichiometric reactions included in all metabolic pathways that can describe the entire cellular metabolism encoded in the genome of a particular organism is known as a genomescale metabolic reconstruction (Park et al., 2009) and has become an indispensable tool for studying metabolism of biological entities at the systems level (Thiele and Palsson, 2010).
Reconstruction of genome-scale metabolic models starts with a compilation of all known stoichiometric reactions for a given organism, according to the presence of enzyme-coding genes in its genome.The stoichiometric reactions catalyzed by these enzymes are usually downloaded from specialized databases such as KEGG (Kanehisa, 2000), BioCyc (Caspi et al., 2014), Reactome (Croft et al., 2014), BRENDA (Chang et al., 2015) or SMPDB (Jewison et al., 2014).However, the downloaded stoichiometric reactions are not always mass-charge balanced and don't represent complete pathways as to construct a high-quality metabolic reconstruction (Thiele and Palsson, 2010;Gevorgyan et al., 2008).Therefore the identification and curation of these type of reactions is a time-consuming process which the researcher have to complete manually using available literature or experimental data (Lakshmanan et al., 2014).
Genome-scale metabolic reconstructions are usually interrogated through Flux Balance Analysis (FBA), an optimization method that allows us to understand the metabolic status of the cell, to improve the production capability of a desired product or make a rapid evaluation of cellular physiology at genomic-scale (Kim et al., 2008;Park et al., 2009).Nevertheless, FBA method is high sensitive to thermodynamic unbalance, so in order to increase the validity of a biological extrapolation (i.e. an optimal solution) from a FBA analysis it is mandatory to avoid this type of unbalancing in mass conservation through all model reactions (Reznik et al., 2013).Another drawback when determining the validity of a metabolic reconstruction is the presence of reactions with compounds that are not produced or consumed in any other reaction (dead ends), generally known as orphan metabolites (Park et al., 2009;Thiele and Palsson, 2010).The presence of this type of metabolites can be problematic since they lead to an artificial cellular accumulation of metabolism products which generates a bias in the biological conclusions.Tracking these metabolites is also a time-consuming process, which most of the time has to be performed manually or partially automatized by in-house scripting.Given that typical genome-scale metabolic reconstructions account for hundreds or thousands of biochemical reactions, the manual curation of these models is a task that can lead to both, the introduction of new errors and to overlook some others.
The R Journal Vol.9/1, June 2017 ISSN 2073-4859 Two of the most popular implementations of FBA analysis are COBRA (Becker et al., 2007) and RAVEN (Agren et al., 2013) which operate as tools under the commercial MATLAB environment.On the R environment side sybil (Gelius-Dietrich et al., 2013) and abcdeFBA (Gangadharan and Rohatgi, 2012) are the most common ones.COBRA and RAVEN include some functions for mass and charge balance (checkMassChargeBalance and getElementalBalance respectively).These functions identify mass unbalanced reactions, based in the chemical formula or the IUPAC International Chemical Identifier (InChI) supplied manually by the user for each metabolite included in the genome-scale metabolic reconstruction.
With the aim of minimizing the manual introduction of thousands of chemical formulas in a genome-scale reconstruction as well as to avoid the sometimes limiting use of licensed software, we have developed the minval package.The minval package includes twelve functions designed to characterize, check and depurate metabolic reconstructions before its interrogation through Flux Balance Analysis (FBA).
To show the potential use of the functions included into the minval package, a human-readable model composed by a set of 19 stoichiometric reactions that represent the glycolysis process was included.Glycolysis is the metabolic pathway that converts a molecule of glucose (C 6 H 12 O 6 ), into two molecules of pyruvate (CH 3 COCOO − + H + ) through a sequence of ten enzyme-catalyzed reactions.Glycolysis occurs in most organisms in the cytosol of the cell and can be summarized as follows:

Installation and functions
The minval package includes twelve functions and is available for download and installation from CRAN, the Comprehensive R Archive Network.To install and load it, just type: > install.packages("minval")> library(minval) The minval package requires an R version 2.10 or higher.Development releases of the package are available in the GitHub repository http://github.com/gibbslab/minval.

Inputs and syntaxis
The functions included in minval package take as input a set of stoichiometric reactions where the metabolites should be separated by a plus symbol (+) between two blank spaces and may have just one stoichiometric coefficient before the name.The reactants should be separated from products by an arrow using the following symbol => for irreversible reactions and <=> for reversible reactions.The data can be loaded from traditional human-readable spreadsheets through other CRAN-available packages such as gdata, readxl or xlsx.To load the included glycolysis model just type: > glycolysisFile <-system.file("extdata","glycolysisModel.csv",package = "minval") > glycolysisModel <-read.csv(file= glycolysisFile,

Syntax validation
The first step for a metabolic reconstruction validation is to check the syntax of their stoichiometric reactions.The validateSyntax function validate the syntax (Equation 1) of all reactions in a metabolic reconstruction for several FBA implementations (i.e.COBRA and RAVEN) and returns a boolean value TRUE if the syntax is correct.Syntax validation is a critical step due valid stoichiometric reactions are required to write models in TSV or SBML formats.

Label Description
Default Value ID A list of single character strings containing the reaction abbreviations, Entries in the field abbreviation are used as reaction ids, so they must be unique.

Mandatory
DESCRIPTION A reaction description Optional (the column can be empty) REACTION A set of stoichiometric reactions with the previously described characteristics.
Mandatory GPR A set of genes joined by boolean operators as AND or OR, rules may be nested by parenthesis.GPR rules represent the relationship between genes to syntetize the required enzyme or enzymes to catalyze the stoichiometric reaction.
Optional (the column can be empty) LOWER.BOUND A list of numeric values containing the lower bounds of the reaction rates.If not set, zero is used for an irreversible reaction and -1000 for a reversible reaction.

SBML files
The standard format to share and store biological processes such as metabolic models is the Systems Biology Markup Language (SBML) format.The minval package includes the writeSBMLmod function which is able to write models in SBML format as follows: > writeSBMLmod(modelData = glycolysisModel, + modelID = "Glycolysis", + outputFile = "glycolysis.xml")solution is optimal value of objective function (fba): 6.000000 value of objective function (model): 6.000000 The R Journal Vol.9/1, June 2017 ISSN 2073-4859 The interrogated glycolysis model estimates a production of six molecules of pyruvate by each alpha-D-Glucose molecule, probably due a mass unbalance in their stoichiometric reactions.FBA methods are sensitive to thermodynamic (mass-charge) unbalance, so in order to achieve a valid biological extrapolation is mandatory to avoid this type of unbalancing in all model reactions.

Mass-Charge balance validation
The second step for a metabolic reconstruction validation is to check the stoichiometric reactions mass-charge balance.In a balanced stoichiometric reaction according to the Lomonosov-Lavoisier law, the mass comprising the reactants should be the same mass present in the products.This process requires the use of a reference with chemical formulas, molecular weights and/or net charges for each metabolite included in the metabolic model.
Reference values for each metabolite can be manually provided or downloaded through the downloadChEBI function included into the minval package from the Chemical Entities of Biological Interest (ChEBI) database, a freely available dictionary of molecular entities focused on 'small' chemical compounds involved in biochemical reactions.To download the latest version of the ChEBI database just type: > ChEBI <-downloadChEBI(release = "latest", + woAssociations = TRUE) The checkBalance function included into the minval package can test mass-charge balance using a user-given reference of formulas, masses or charges.The checkBalance function returns a boolean value TRUE if stoichiometric reaction is balanced.For this example an user provided reference was used.

] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
As is shown above, the third stoichiometric reaction is mass-unbalanced.It can be corrected replacing manually the unbalanced reaction by a balanced one as follows:

TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [16] TRUE TRUE TRUE TRUE
When all stoichiometric reactions are mass-balanced, then the model can be exported and loaded to be interrogated again: > writeSBMLmod(modelData = glycolysisModel, + modelID = "GlycolysisBalanced", + outputFile = "glycolysisBalanced.xml")> sybil::optimizeProb(sybilSBML::readSBMLmod("glycolysisBalanced.xml")) The R Journal Vol.9/ 2.000000 As shown above, the correct mass-charge balance allows predicting in an accurate way the net yield of pyruvate from an alpha-D-glucose molecule through the glycolytic pathway using FBA analysis.

Characterize model
A metabolic reconstruction generally includes three types of reactions compartmentalized, transport and exchange reactions.The compartmentalized reactions are those in where all involved metabolites (all reactants and products) are assigned to the same compartment.e.g.This function counts the number of reactions, computes the relative frequency of each reaction type (transport, exchange and compartmentalized), computes the relative frequency of reactions by compartment, counts the number of unique metabolites and computes the relative frequency of metabolites by compartment.The characterizeReactions function returns all these information as a labeled list.To show it potential use, the RECON 2.04 Human Metabolic Reconstruction (Thiele et al., 2013) was included in a human-readable format.To load and characterize it just type: > # Loading the Human Metabolic Reconstruction RECON 2.04 > RECON <-read.csv(system.file("extdata","rRECON2.csv",+ package = "minval")) > # Characterizing the stoichiometric reactions > charRECON <-characterizeReactions(reactionList = RECON$REACTION) > charRECON $nReactions [1]

Stoichiometric matrix
A metabolic reconstruction is often represented in a more compact form called the stoichiometry matrix (S).If a metabolic reconstruction has n reactions and m participating metabolites, then the stoichiometry matrix will have correspondingly m rows and n columns.Values in the stoichiometric matrix represent the metabolite coefficients in each reaction.To generate the stoichiometric matrix of a metabolic reconstruction just type:

Reactants and products
As described before, stoichiometric reactions represent the transformation of reactants into products in a chemical reaction.The reactants and products functions extract and return all reactants or products respectively in a stoichiometric reaction as a vector.If reaction is irreversible ( => ) then reactants and products are separated and returned afterward as follows: In reversible cases ( <=> ) all reactants at some point can act as products and vice versa, for that reason both functions return all reaction metabolites:

Metabolites
The metabolites function automatically identifies and lists all metabolites (with or without compartments) for a specific or a set of stoichiometric reactions.This list is usually required for programs that perform FBA analysis as an independent input spreadsheet.In this example we show how to extract all metabolites (reactants and products) included in a metabolic reconstruction with and without compartments.
generating dead-ends without flux.The orphanMetabolites function extracts all orphan compounds included into a metabolic reconstruction.

> orphanMetabolites(reactionList
Due not all orphans are not consumed and not produced, the orphanReactants function, identifies compounds that are not produced internally by any other reaction and should be added to the reconstruction, for instance, as an input exchange reaction following the protocol proposed by Thiele and Palsson (2010).

> orphanReactants(reactionList
By another side, the orphanProducts function, identifies compounds that are not consumed internally by any other reaction and should be added to the reconstruction, for instance, as an output exchange (sink) reaction.

Compartments
As well as in eukaryotic cells, in which not all reactions occur in all compartments, stoichiometric reactions in a metabolic reconstruction can be labeled to be restricted to a single compartment during FBA, by the assignment of a compartment label after each metabolite name.Some FBA implementations require the reporting of all compartments included in the metabolic reconstruction as an independent section of the human-readable input file.In this example, we show how to extract all compartments for all reactions included in the RECON 2.04 Human Metabolic Reconstruction (Thiele et al., 2013).

TSV files
Additional to the SBML format, the TSV format is the default input of metabolic models for the sybil R package.The TSV format is composed of three text files, following a character-separated (tab by default) value format where each line contains one entry (stoichiometric reaction and associated info).

Summary
We introduced the minval package to check the syntax validity, evaluate the mass-charge balance and extract all orphan metabolites of a set of stoichiometric reactions.Together, this steps represent the minimal validation that should be performed in a genome-scale metabolic reconstruction.Functions to characterize and export metabolic models in SBML and TSV formats as well as to extract all The R Journal Vol.9/1, June 2017 ISSN 2073-4859 reactants, products, metabolite names and compartments for a set of stoichiometric reactions were also introduced.Moreover, we also show in a step by step fashion, how this minimal evaluation process of mass balance can avoid an overestimation of the the net yield of pyruvate from an alpha-D-glucose molecule when using an unbalanced model of the glycolysis pathway.

Figure 1 :
Figure 1: Distribution by type (left) and by compartments (right) of the reactions included into the RECON 2.04 Human Metabolic Reconstruction(Thiele et al., 2013).
UPPER.BOUND A list of numeric values containing the upper bounds of the reaction rates.If not set, 1000 is used by default.
After load the metabolic model, it can be interrogated through FBA using the optimizeProb function of the sybil R package.In this case, the reaction R00200 was set as the objective function.The R00200 reaction describes the production of pyruvate from phosphoenolpyruvate, an alpha-D-Glucose derivate.
The transport reactions are those in where the involved metabolites are assigned to two or more compartments.e.g. 2 hco3[e] + na1[e] <=> 2 hco3[c] + na1[c], and finally, the exchange reactions are those used to import or release metabolites to the boundary.e.g.acetone[e] <=>.Characterize the stoichiometric reactions of a metabolic model is a required and time-consuming work.The minval package includes the characterizeReactions function to characterize the stoichiometric reactions and metabolites by type and compartment.