Nutrient pollution affects fresh and coastal waters around the globe. Planning mitigating actions requires tools to assess the fluxes of nutrient emissions to waters and the expected impacts of restoration measures. Conceptual river basin models take advantage of data on nutrient emissions and concentrations at monitoring stations, providing a physical interpretation of monitored conditions and enabling scenario analysis. The GREENeR package streamlines water quality modelling in a region of interest, considering nutrient pathways and the hydrological structure of the river network. The package merges data sources, analyzes local conditions, calibrates the model, and assesses yearly nutrient levels along the river network, determining the contributions of diffuse and point sources to freshwater loads. The package is enriched with functions to perform thorough parameter sensitivity analyses and to map nutrient sources and fluxes. The functionalities of the package are demonstrated using datasets from the Vistula river basin.
Nitrogen and phosphorus are key nutrients that heavily impact aquatic ecosystems. Detecting primary sources of nutrient pollution and their downstream spread, alongside assessing achievable reductions through restoration policies, are crucial for effective natural resource management. They aid in identifying priority intervention areas and planning actions to restore the ecological balance of receiving waters.
Modelling tools can be useful to assess the impacts of future scenarios, policy measures, and climate changes at the regional and continental scale (Seitzinger et al. 2005; Ludwig et al. 2010; Arheimer et al. 2012; Bouraoui et al. 2014; Bartosova et al. 2019; Beusen et al. 2022), and to check the coherence of different policy targets, for instance between water and agricultural policies. The assessment of policy scenarios requires a flexible and spatially detailed analysis to account for climatic, hydrological, and socio-economic gradients (Grizzetti et al. 2021).
Several types of models can be applied to predict the transport of nutrients in river basins (Fu et al. 2019). Among them, statistical or conceptual models have the advantage of being readily applied in large watersheds. These models rely on the calibration of a few parameters to establish links between emissions at the sources and their fate in the stream network. They take full advantage of nutrient emission data and of concentrations measured at monitoring stations, which are now accessible with increasing spatial and temporal resolution.
A classic example of river basin conceptual model is SPARROW (Smith et al. 1997; Schwarz et al. 2006), which is widely applied in the U.S. to assess water quality over large regions. SPARROW has inspired the Geospatial Regression Equation for European Nutrient (GREEN) losses model (Grizzetti et al. 2005; Grizzetti et al. 2012, 2021), which is adapted to European conditions. GREEN has been used for assessing the nutrient loads to the European seas (Grizzetti et al. 2012, 2021), nitrogen retention in European freshwaters (Grizzetti et al. 2012; La Notte et al. 2017), and for policy scenario analysis (Bouraoui et al. 2014; Leip et al. 2015; La Notte et al. 2017; Malagó et al. 2019; Grizzetti et al. 2021). Despite their usefulness, river basin models of this type are seldom compiled as dedicated R packages; for example, SPARROW has only some scripts available for preparing input information and analysing model results in the R environment.
The GREEN model application comprises several key steps, including data extraction and organization, data harmonization and integration, examination and validation of input data sets, model calibration and parameter selection, model run, and result visualization. All of these steps are now integrated into an R package (GREENeR) that provides functions to streamline the process and to evaluate and visualize each step, thus strengthening the robustness of model application.
GREEN (Grizzetti et al. 2005, 2008; Grizzetti et al. 2012, 2021) is a conceptual model to assess total nitrogen (TN) and total phosphorus (TP) loads from a region of interest (usually a river basin), accounting for both diffuse and point sources. The package allows the analysis of different scenarios of nutrient input in the region of interest, where “scenario” indicates a combination of annual time series of inputs (e.g. nitrogen or phosphorus sources) and the topological structure of the region. The model comprises several nutrient sources and pathways:
Once in the river network, all nutrient loads are reduced by in-stream retention in rivers and lakes.
Basin retention of agricultural sources is a decay function proportional to the inverse of the total annual precipitation in the catchment. In turn, river retention is a decay function proportional to the river length, considered as a proxy for water residence time. Finally, lake retention is simulated as a function of the lake residence time and average depth.
The basin is divided into spatial subunits (called catchments), with a given area, a river reach, an inlet node, and an outlet node. The catchments are topologically connected from the headwaters to the outlet in a cascading sequence. The sequence of nutrient load accumulation through the stream network is defined by Shreve’s order (Shreve 1966). Nutrient inputs from the different sources, basin retention and river retention are simulated in each catchment and routed through the river network. For each catchment \(i\) in the basin and year \(y\), the nutrient load \(L_{i,y}\) is estimated by the general equation:
\[\begin{equation} L_{i,y} = (1-Lret_i) \cdot (DSA_{i,y} \cdot (1-Bret_{i,y} )+ DSB_{i,y} + PS_{i,y} +U_{i,y} ) \cdot (1-Rret_i) \tag{1} \end{equation}\]

where \(L_{i,y}\) is the nutrient load at the catchment outlet node (ton/yr) in year \(y\), and the other variables represent the sources and sinks of nutrients described below.
Sources of nutrients are:
\(DSA_{i,y}\). Annual nutrient diffuse sources on agricultural land in the catchment (ton/yr): mineral and manure fertilization, atmospheric nitrogen deposition, plant and soil fixation for TN; mineral and manure fertilization, and background losses for TP.
\(DSB_{i,y}\). Annual nutrient diffuse sources in the catchment (ton/yr). For TN, these are related to scattered dwellings and to atmospheric nitrogen deposition on non-agricultural land (\(DSB_{i,y} = 0.38 \cdot FF_{i,y} \cdot AtmN_{i,y} + sd_{coeff} \cdot SdN_{i,y}\), where \(FF_{i,y}\) is the fraction of non-agricultural land cover in the catchment, \(AtmN_{i,y}\) is the annual atmospheric nitrogen deposition on the catchment (ton/yr), and \(SdN_{i,y}\) is the nitrogen input from scattered dwellings (ton/yr)). For TP, they are related to scattered dwellings and background losses on non-agricultural areas (\(DSB_{i,y} = sd_{coeff} \cdot SdP_{i,y}\), where \(SdP_{i,y}\) is the phosphorus input from scattered dwellings (ton/yr)).
\(PS_{i,y}\). Nutrient point sources in the catchment (ton/yr).
\(U_{i,y}\). Nutrient load from upstream catchments (ton/yr).
Sinks of nutrients are:
\(Lret_i\). Lake retention fraction in the catchment (Equation (2)), a function of the average lake depth \(z\) (m), the hydraulic residence time \(RT\) (yr), and a nutrient-related coefficient \(dref\) (\(dref = 7.3\) for TN and \(dref = 26\) for TP).
\(Bret_{i,y}\). Basin retention fraction (Equation (3)), a decay function of the calibrated coefficient \(alpha_P\) and of \(NrmInvRain_{i,y}\), the inverse of the annual precipitation (mm) of catchment \(i\) in year \(y\), normalized by its maximum (Frank and Todeschini 1994). To keep basin retention coefficients comparable across regions, the minimum precipitation \(minPrec\) is set to \(50\) mm/yr, and thus \(NrmInvRain_{i,y}\) is defined as
\[\begin{equation*} NrmInvRain_{i,y} = \frac{1 / \max(50, precipitation_{i,y})}{1 / minPrec} \end{equation*}\]

\(Rret_i\). River retention fraction (Equation (4)), a decay function of the calibrated coefficient \(alpha_L\) and of \(NrmLengthKm_i\), the length (km) of the catchment reach, normalized by the maximum in the dataset:
\[\begin{equation*} NrmLengthKm_i = \frac{\textrm{length of the reach of catchment } i \textrm{ (km)}}{\max(\textrm{reach length in the region (km)})} \end{equation*}\]

Since the maximum reach length depends on the region and its subdivision into network reaches, the calibrated coefficient \(alpha_L\) cannot be compared across regions that adopt different discretizations. Note that basin retention varies from year to year according to annual precipitation, whereas the fractions of river and lake retention for a given catchment are constant in time.
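For illustration, both normalisations can be reproduced with a few lines of R; precip_mm and reach_length_km below are toy vectors, not package objects:

# Toy annual precipitation (mm) and reach length (km) for three catchments
precip_mm       <- c(450, 38, 820)
reach_length_km <- c(2.1, 5.4, 0.8)

minPrec     <- 50                                             # floor on annual precipitation (mm/yr)
NrmInvRain  <- (1 / pmax(minPrec, precip_mm)) / (1 / minPrec)
NrmLengthKm <- reach_length_km / max(reach_length_km)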
Equation (1) is applied sequentially from the most upstream nodes to the basin outlet. The model parameters are the basin retention coefficient \(alpha_P\) (Equation (3)), the river retention coefficient \(alpha_L\) (Equation (4)), and the scattered dwellings coefficient \(sd_{coeff}\) (Equations (5) and (6)).
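As a minimal illustration of this cascading scheme (not the package implementation), the snippet below applies Equation (1) to a toy three-catchment network; the source values are arbitrary and the retention fractions \(Bret\), \(Rret\) and \(Lret\) are taken as given rather than computed from Equations (2)-(4):

# Toy network: catchments 1 and 2 drain into catchment 3 (the basin outlet, to = -1)
net <- data.frame(
  id   = c(1, 2, 3),
  to   = c(3, 3, -1),
  DSA  = c(40, 25, 10), DSB = c(5, 3, 2), PS = c(0, 8, 4),    # sources (ton/yr)
  Bret = c(0.30, 0.45, 0.20), Rret = c(0.05, 0.08, 0.10), Lret = c(0, 0, 0.15)
)

load_out <- setNames(numeric(nrow(net)), net$id)   # L_i: load at each catchment outlet
upstream <- setNames(numeric(nrow(net)), net$id)   # U_i: load received from upstream
for (k in seq_len(nrow(net))) {                    # rows assumed sorted upstream-first
  c_k <- net[k, ]
  L_k <- (1 - c_k$Lret) *
         (c_k$DSA * (1 - c_k$Bret) + c_k$DSB + c_k$PS + upstream[as.character(c_k$id)]) *
         (1 - c_k$Rret)
  load_out[as.character(c_k$id)] <- L_k
  if (c_k$to != -1) {
    upstream[as.character(c_k$to)] <- upstream[as.character(c_k$to)] + L_k
  }
}
load_out   # annual nutrient load (ton/yr) at the outlet of each catchment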
This work presents a new, efficient and enhanced implementation of the GREEN model as an R package named GREENeR. The package provides tools and methods for applying GREEN to an area of interest, assessing annual time series of nutrient loads in the river network and at the basin outlet, and quantifying the contributions of nutrient sources to these loads. Key features of the package comprise: a) functions for the creation of scenarios from data sources; b) computational efficiency; c) parallel-capable execution; d) an extended suite of fine-tuning options to control model calibration; e) built-in parameter sensitivity analysis; and f) functions to perform customised post-processing analyses.
The functions of the GREENeR package are arranged in three groups: i) functions to perform graphical summaries of model inputs; ii) functions to perform model parameter calibration and sensitivity analysis; iii) functions to compute, analyze and visualize, through graphs and maps, the model outputs, i.e. total loads and contributions by source. The package supports parallel processing, which helps reduce the computational load when handling large basins.
The time-series can represent either historic (current or past) conditions that can be associated to observations for model calibration, or hypothetical conditions foreseen under theoretical changes (e.g. to forecast results of nutrient management plans, or under climatic change).
A scenario contains:
The geospatial geometry of the catchments of the region, which currently is defined according to the Catchment Characterisation and Modelling Database for European Rivers v2 (CCM2) (De Jager and Vogt 2007).
The annual time series (1990-2018) of nutrient inputs per catchment.
Additional catchment information (annual precipitation, annual forest fraction, lake retention, reach length).
Observed total load (TN or TP) from monitoring stations.
Two historical scenarios, one for TN and one for TP, of the Lay basin (France) are provided with the GREENeR package. The Lay basin has an area of \(1971\) km\(^2\), and is divided into \(189\) CCM2 catchments. The mean catchment area is \(10.4\) km\(^2\). Nutrient observations comprised \(22\) TN entries from six monitoring stations and \(58\) TP data entries from eight monitoring stations (data from WISE Waterbase (EEA 2021)).
Further, in this article we present examples drawn from the TN historic scenario of the transboundary Vistula (Wisla) river basin, one of the largest river basins of Europe. The Vistula scenario comprises \(15465\) CCM2 catchments that compose the \(193894\) km\(^2\) river basin, TN inputs for 1990-2018, as well as 1364 observed TN loads from 412 monitoring stations. The TN input dataset is part of the larger dataset created to assess nutrient concentration in the European freshwaters (Vigiak et al. 2023).
A brief overview of the package and its main functions is given in Figure 1.
The key functions included in the package are:
read_geometry()
: imports the geospatial vector file (ESRI shapefile) with the spatial information of the catchments into R.
read_NSdata()
: imports the annual time series of nutrient inputs per catchment and type of nutrient source (manure, mineral fertilization, point sources, scattered dwellings), plus the forest fraction, and the observed annual nutrient loads from monitoring station data. The original data are stored in several comma-separated tables (CSV files).
input_maps()
: creates a map showing the mean nutrient load input by source.
input_plot()
: creates either a grouped barplot representing the average input load by source for the whole basin, or three density plots showing the distribution of nutrient sources.
input_Tserie()
: creates a time series plot showing basin inputs by source.
shreve()
: returns the Shreve order of the catchments, a useful indicator of stream size, discharge, and drainage area (Strahler 1957), calculated as the sum of the orders of the upstream tributaries (Shreve 1966). Commercial GIS software usually provides a Shreve calculation function, but in other software it is harder to find. In GREEN, Shreve’s order defines the cascade of upstream-downstream catchments.
calib_green()
: conducts sensitivity analysis of model calibration utilizing a Latin Hypercube Sampling (LHS) scheme (Manache and Melching 2004) for three model parameters. It evaluates model performance by calculating several “goodness-of-fit” (GoF) metrics (Refsgaard and Henriksen 2004) against available observations during the specified simulation period (years).
select_params()
: extracts the best parameter set according to a selected GoF metric from the object generated by the calibration function, calib_green()
.
calib_boxplot()
: creates a figure of boxplots that show the relationships between the best parameter sets determined by the specific GoF metric in each boxplot title, and six other commonly used hydrological metrics (Mauricio Zambrano-Bigiarini 2014; Althoff and Rodrigues 2021). In the lower panel, the figure also highlights the distribution of model parameters; the value of the best parameter set is marked as a red dot in each boxplot.
region_nut_balance()
: runs the GREEN model with the selected parameter set, and returns the mean annual mass balance of nutrient fluxes for the whole simulation period of the region. The results of this function can be visualized using a Sankey diagram via the N4_sankey()
function.
green()
: runs the GREEN model with the selected parameter set and returns nutrient load time series for the simulation period. It generates less information than the green_shares()
function, but its execution is faster, so it is used as a function for calibration iterations and is embedded in the calib_green()
function.
green_shares()
: runs the GREEN model with the selected parameter set and returns time series of nutrient loads and the contributions of each nutrient source in the simulation period. The results of the model can be examined by nutrient_tserie()
and nutrient_maps()
functions.
scatter_plot()
: generates dot plots correlating parameter realizations in the calibration data with a GoF metric, visualizing the impact of each parameter on model outcomes. The plot varies based on the GoF metric selected among those computed by the calib_green()
function.
simobs_annual_plot()
: generates scatter plots comparing model load predictions (PredictLoad) with observations (ObsLoad) for each year of the simulation period.
GREENeR requires information on the nutrient inputs, the topology, and the geospatial geometry of the region of interest. The spatio-temporal input data must include all nutrient source fields, and differs in the two nutrient scenarios (TN or TP). In the case of TN (e.g. Figure 2), fields are (Equation (5) in Appendix 1):
\(Atmospheric\): Annual amount of atmospheric nitrogen deposition (ton/yr).
\(Mineral\): Annual amount of nitrogen from mineral fertilisers (ton/yr).
\(Manure\): Annual amount of nitrogen in manure fertilisers (ton/yr).
\(Fix\): Annual amount of nitrogen fixation by leguminous crops and fodder (ton/yr).
\(Soil\): Annual amount of nitrogen fixation in soils (ton/yr).
\(Sc. Dwellings\): Nitrogen input from scattered dwellings (ton/yr).
\(PointS\): Nitrogen input from point sources (ton/yr).
In the case of TP, fields are (Equation (6) in Appendix 1):
\(Background\): annual amount of phosphorus from background losses (ton/yr).
\(Mineral\): annual amount of phosphorus from mineral fertilisers (ton/yr).
\(Manure\): annual amount of phosphorus in manure fertilisers (ton/yr).
\(Sc. Dwellings\): phosphorus input from scattered dwellings (ton/yr).
\(PointS\): phosphorus input from point sources (ton/yr).
European datasets for 1990-2018 generated to assess historic and current nutrient fluxes (Vigiak et al. 2023) are available upon request. The model and the package are compatible with external datasets and can be customized with local information as long as the data structure is respected.
In both nutrient cases, GREENeR needs additional annual catchment information:
\(ForestFraction\): annual fraction (0-1) of non-agricultural land cover area in the catchment (\(FF\) in Equations (5) and (6), in the Appendix 1).
\(Rain\): annual precipitation (mm, Equation (3)).
\(LakeFrRet\): lake retention fraction (0-1) (\(Lret\) in Equation (1), in the Appendix 1). Note, this differs for TN or TP scenarios (Equation (2)). At European scale, average lake depth and hydraulic residence time can be obtained from HydroLAKES database (https://www.hydrosheds.org/pages/hydrolakes, Messager et al. (2016)).
\(Length\): length (km) of the catchment reach (Equation (4)).
The catchment topology outlines the hydrological network configuration within the region. Each catchment must have a unique numerical identifier (HydroID) to establish the network structure via a table of source and destination HydroIDs. It is important to note that the destination HydroID of any basin outlet must be set to “-1”. Complementing the topology of the hydrologic network, the length of the catchment reach should also be included (Equation (4)).
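A hypothetical fragment of such a table for a four-catchment network could look as follows (column names are illustrative):

# Each catchment drains to exactly one downstream catchment; the basin
# outlet uses the conventional destination identifier -1
topology <- data.frame(
  HydroID    = c(2001, 2002, 2003, 2004),
  To_HydroID = c(2003, 2003, 2004, -1)    # catchment 2004 is the basin outlet
)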
Finally, in order to calibrate the model, it is necessary to have observed nutrient loads from monitoring stations associated with catchments of the region. Ultimately, the quality, size, and spatial distribution of observed loads determine the robustness of the calibration process. In Europe, a large dataset of annual concentrations is publicly available (WISE Waterbase, EEA 2021), but annual flow must be derived from other sources (Grizzetti et al. 2021; Vigiak et al. 2023).
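Where stations report only concentrations, annual loads can be approximated by combining the mean annual concentration with the mean annual discharge; the conversion below is a generic sketch, not a package function:

# Annual load (ton/yr) from mean concentration (mg/L, i.e. g/m^3) and mean
# discharge (m^3/s); one year is ~31.536e6 s and 1e6 g = 1 ton, hence 31.536
conc_mgL    <- 2.4     # mean annual TN concentration at the station
flow_m3s    <- 15      # mean annual discharge
load_ton_yr <- conc_mgL * flow_m3s * 31.536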
The GREENeR package performance and memory usage depend on the size of the region (number of catchments) and the number of years of the simulation. As a reference, the Danube basin (i.e. the largest European basin, with \(138013\) catchments) TN dataset for 1990-2018 requires approximately \(389\) MB of memory to store the annual data. Table 1 shows the memory occupied by the scenarios for 6 European basins. In addition, it shows the computation times required for the execution of some key functions of the package. All executions were conducted on two computer configurations:
CPU1: Desktop running Windows 10 with one Intel(R) Core i5-8259U CPU (4 cores, two threads per core) at 2.30 GHz and 16 GB of RAM
CPU2: Workstation running Linux with one AMD EPYC 7352 CPU (24 cores, two threads per core) at 2.3 GHz and 128 GB of RAM
The time required for the execution of the different functions increases linearly with the number of catchments (or Shreve order) of the region (Table 1). The calib_green()
function is the most computationally demanding. However, its parallel implementation drastically reduces the time required for this process, so that calibration can be carried out in about 6 hours, making its application practical even in large basins.
Basin name | Shreve order | Area km\(^2\) | Number of catchments | Mem MB | CPU1 sec | CPU2 sec | CPU1 sec | CPU2 sec | CPU1 sec | CPU2 sec |
---|---|---|---|---|---|---|---|---|---|---|
Lay | 95 | 1971 | 189 | 0.5 | 6 | 4 | 15 | 12 | 316 | 31 |
Miño | 2803 | 16985 | 5572 | 15.7 | 57 | 45 | 147 | 90 | 4110 | 394 |
Seine | 2902 | 75989 | 5793 | 16.3 | 77 | 55 | 177 | 106 | 6320 | 452 |
Ebro | 9351 | 85611 | 18568 | 52.3 | 160 | 123 | 425 | 224 | 10876 | 995 |
Vistula | 7757 | 193894 | 15465 | 43.6 | 132 | 100 | 314 | 187 | 9004 | 814 |
Danube | 69505 | 802032 | 138013 | 388.6 | 1120 | 766 | 2855 | 1108 | 94682 | 5638 |
The entire procedure is summarized in Figure 1. Once the input data have been uploaded, the region scenario (Nutrient data and Catch data) is generated. The calib_green()
function explores the parameter ranges with the LHS scheme, calculating GoF metrics for the sampled parameter sets. A sensitivity analysis of its results is conducted to determine the best parameter set. Finally, the green_shares()
function is used to estimate the nutrient loads and source apportionment (i.e. contribution to loads by source), per year and per catchment.
Assembling input data for running the GREEN model is time consuming. To facilitate the process, the read_NSdata()
function assembles and organises annual information to generate a list of two objects: Nutrient Time Series Data Object
and Catch Data Object
of the GREEN scenario. It requires several CSV files containing the following data:
Time-series of annual nutrient inputs per catchment.
Time-series of annual observed loads at monitoring stations.
Basin topology and lake properties.
Other: precipitation, forest fraction, reach length.
The spatial identifiers (HydroID) and temporal (year) units must be coherent in all the files. Besides organizing the input data, the read_NSdata()
function also calculates the Shreve order of each catchment, normalises the precipitation (calculating \(NrmInvRain_{i,y}\)) and the reach length (\(NrmLengthKm_i\)).
csv_path <- "data/csv/"
scen <- read_NSdata(csv_path,"nutrients.csv","monitoring.csv", "forestFr.csv",
"precipitation.csv","topology.csv", "lakeProperties.csv",
"length.csv")
nutrient <- scen[[1]]
catch <- scen[[2]]
The shreve()
function can calculate the Shreve order independently based on the topology of the basin:
shreve_order <- shreve(catch)
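For illustration, Shreve’s magnitude can also be derived directly from a source-to-destination table; the sketch below is not the package implementation and assumes a simple two-column topology with illustrative column names:

# Headwater catchments get order 1; a downstream catchment gets the sum of the
# orders of its upstream tributaries, once these have all been resolved
shreve_sketch <- function(topo) {
  ord <- setNames(rep(NA_real_, nrow(topo)), topo$HydroID)
  while (anyNA(ord)) {
    for (id in topo$HydroID[is.na(ord)]) {
      ups <- topo$HydroID[topo$To_HydroID == id]
      if (length(ups) == 0) {
        ord[as.character(id)] <- 1
      } else if (!anyNA(ord[as.character(ups)])) {
        ord[as.character(id)] <- sum(ord[as.character(ups)])
      }
    }
  }
  ord
}

shreve_sketch(data.frame(HydroID = c(1, 2, 3), To_HydroID = c(3, 3, -1)))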
Finally, the geospatial geometry of the catchments should be uploaded to enable the visualisation functionalities in map form. The read_geometry()
function loads the geometry information file, i.e. a geospatial vector with the geometry of the catchment region:
geometry <- read_geometry(file = "data/shapes/Wisla.shp")
The geo-reference information, geometry and attributes of the spatial entities must be in a shapefile format (.shp), editable with ArcGIS or similar software. The identifiers of each catchment (HydroID) must be consistent with those in the CSV files of the scenario dataset.
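A quick consistency check between the two data sources could look as follows (assuming both objects expose a HydroID column):

# Catchments present in the scenario but missing from the geometry (ideally empty)
setdiff(catch$HydroID, geometry$HydroID)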
Once the scenario has been generated, the library includes functions to examine the nutrient sources in a basin, and explore their distribution over time or in space. The input_plot()
function provides annual average nutrient loads for the whole period and density plots of nutrient loads. The input_Tserie()
function allows examining the temporal evolution of the inputs, whereas the input_maps()
function generates maps of nutrient inputs distribution in the basin (Figure 2).
input_maps(nutrient, catch, geometry, "Wisla", "gr1")
The calibration process is essential to find a suitable set of parameters for the model. In the GREENeR package, the core function calib_green()
enables running a parameter sensitivity analysis while concurrently assessing the model performance according to several GoF metrics.
Sensitivity Analysis (SA) investigates how the variation in the output of a numerical model can be attributed to variations of its input factors. SA aims at identifying the most influential inputs or parameters, and quantifying how much they contribute to the variability/uncertainty of the model outputs. SA provides information on how much of the output variance is controlled by each parameter, or combination of parameters. SA is increasingly being used in environmental modelling for a variety of purposes, including uncertainty assessment, model calibration and diagnostic evaluation, dominant control analysis, and robust decision-making (Saltelli et al. 2011; Butler et al. 2014; Norton 2015; Pianosi et al. 2016; Mannina et al. 2018). This is achieved by running the model for many samples of the parameter space to determine their impact on the model outputs. SA allows identification of the parameters and input variables that strongly influence the model response (model output). Conversely, it may be of interest to the modeller to verify that although some model parameters may not be very well established they do not significantly contribute to output uncertainty. Saltelli et al. (2000) single out three main classes of SA methods: screening, local, and global methods. Local sensitivity analysis methods focus on assessing the impact of small variations in the input values of a model on the output results. Global sensitivity methods seek to understand how variations in the full range of input values affect the results. Puy et al. (2022) provide a comprehensive overview of several R packages for performing global sensitivity analysis.
Screening methods are economical, qualitative methods, and only these are included in this implementation of the package, as they provide a quick assessment of the relative importance of variables. This is particularly useful in exploratory studies and in cases where an initial subset of relevant variables needs to be identified.
Adequate sampling of parameter space is very important in model calibration. Several studies of uncertainty analysis in water resources problems (Melching and Bauwens 2001) concluded that Monte Carlo simulation and Latin Hypercube Sampling (LHS) (McKay et al. 1979; Carnell 2012) methods are very powerful, robust, and flexible. Other approaches are possible (Mauricio Zambrano-Bigiarini 2014), but the LHS has the advantage that it is easily parallelisable, it explores the full range of parameter sets, it does not direct the search depending on the values of previous iterations like other optimisation methods (gradients, genetic algorithms, etc.), and is therefore independent of the GoF metric. This allows performing a posteriori sensitivity analysis for several metrics without having to repeat the process.
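For illustration, an LHS design over the three GREEN parameters can be generated with the lhs package and rescaled to the search ranges; calib_green() performs this sampling internally, so the snippet below only sketches the idea:

library(lhs)

set.seed(42)
n_iter <- 200
low <- c(alpha_P = 10, alpha_L = 0,   sd_coeff = 0.1)   # lower bounds of the search ranges
upp <- c(alpha_P = 50, alpha_L = 0.1, sd_coeff = 0.9)   # upper bounds of the search ranges

unit_design <- randomLHS(n_iter, 3)                     # n_iter x 3 design in [0, 1]
param_sets  <- sweep(sweep(unit_design, 2, upp - low, "*"), 2, low, "+")
colnames(param_sets) <- names(low)
head(param_sets)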
The calib_green()
function computes 15 GoF metrics (Mauricio Zambrano-Bigiarini 2014; Althoff and Rodrigues 2021) (listed in Appendix 2), and returns an object with all parameter sets generated by the LHS and the corresponding GoF scores. Running calib_green()
requires the following settings:
The expected ranges for each parameter. The ranges are defined by two vectors of three values, one for the lower limits and one for the upper limits of the three parameters. The values correspond to each model parameter in sequence: \(alpha_P\) (Equation (3)), \(alpha_L\) (Equation (4)), and \(sd_{coeff}\) (Equations (5) and (6)).
The number of iterations to be performed during the calibration process. The higher the number of iterations, the more likely it is to achieve good results, but the longer the computation time. Depending on the basin and parameter range width, it is recommended to run at least 200 iterations to have enough information to continue the calibration process.
The years to be included in the calibration process.
n_iter <- 2000
low <- c(10, 0.000, 0.1)
upp <- c(50, 0.1, 0.9)
years <- c(1990:2018)
calibration <- calib_green(nutrient, catch, n_iter, low, upp, years)
The calibration process is automatically parallelised by the package and uses all computer cores except one. The computation time depends on the computer, the number of catchments in the basin, and the number of iterations (Table 1). The calib_green()
function returns a data frame with one row per iteration, containing the parameter values and the resulting 15 GoF metric scores. An example of four parameter sets is shown below (cut to the first five GoF metrics).
alpha_P alpha_L sd_coeff NSE mNSE rNSE KGE PBIAS %
s1 39.31203 0.07628558 0.6325056 0.5623 0.6452 0.9595 0.2512 -44.1
s2 35.06049 0.05510985 0.1577470 0.6733 0.6948 0.9563 0.3763 -36.3
s3 28.34947 0.04418381 0.7209566 0.8690 0.7737 0.8998 0.7926 -1.7
s4 40.73533 0.01351723 0.6195734 0.8370 0.7734 0.9607 0.6292 -22.6
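Since the result is a plain data frame, standard R operations can be used to inspect it, for instance to list the best-scoring parameter sets for a given metric:

# Five parameter sets with the highest NSE scores
best_nse <- calibration[order(calibration$NSE, decreasing = TRUE), ]
head(best_nse[, c("alpha_P", "alpha_L", "sd_coeff", "NSE")], 5)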
The selection of the appropriate GoF metric(s) for calibration and evaluation of hydrological models can be challenging even for very experienced hydrologists. Choosing the right GoF metric for model calibration largely depends on the overall study scope, which defines the main interest (e.g. high or low load, upper or lower catchment area), and on the available observation dataset (size and quality) (Kumarasamy and Belmont 2018). Automated calibration typically relies, often exclusively, on a single GoF metric, with the Nash-Sutcliffe efficiency (NSE) being the most frequently used metric in hydrological models (Gupta et al. 2009; Westerberg et al. 2011; Wöhling et al. 2013). As described in Singh and Frevert (2002), a single criterion is inherently predisposed to bias towards certain components of the data time series. Automated procedures may be improved by using more than one criterion, as discussed by Gupta et al. (1998). Although automation can help make the calibration process more objective, efficient and practical, it is not a substitute for expert hydrologic intuition and understanding. Whether automated or manual calibration is used, a common approach is to adjust the parameters that display the highest sensitivity (Madsen 2003; Doherty and Skahill 2006).
GREENeR includes several functions to assist in selecting the best parameter set. The calib_boxplot()
function shows (Figure 3) the relationships between the best parameter sets chosen according to one GoF metric (reported in the title of each boxplot) and the six most frequently used metrics. Additionally, in the lower panel, the figure shows the distribution of model parameters in the best-performing subset of parameter sets. The best parameter set according to each GoF metric is marked as a red dot in each boxplot (Figure 3).
calib_boxplot(calibration, rate_bs = 5)
The select_params()
function extracts the parameter set that scored the best GoF metric of choice from the calibration result object.
best_params <- select_params(calibration, param = "NSE")
alpha_P <- best_params$alpha_P
alpha_L <- best_params$alpha_L
sd_coeff <- best_params$sd_coeff
It is not recommended to use the parameters extracted by the select_params()
function without having performed an analysis of all the calibration results. Instead, alternative parameter sets (selected according to different GoF metrics) should be compared before making the final selection.
Screening and SA of model parameters can be done via the scatter_plot()
and calib_dot()
functions of GREENeR package. The scatter_plot()
function shows all parameter realizations in the LHS dataset against a selected GoF metric to visualize the influence of each parameter on the metric scores. The result depends on the GoF metric of choice (any metric calculated with calib_green()
can be selected) (Figure 4).
scatter_plot(calibration, param = "R2")
The example in Figure 4 shows that the highest R2 values are achieved for values of the parameter \(alpha_P\) around \(33.3\) and of the parameter \(alpha_L\) around \(0.025\), whereas the model is insensitive to the \(sd_{coeff}\) parameter. Further, the scatter_plot()
function helps check if the parameter ranges defined in the search were suitable, or if the number of iterations was sufficient. In Figure 4, the best values of the parameters \(alpha_P\) and \(alpha_L\) are in the central part of the plots, thus the search parameter ranges were appropriate, and the distribution of dots is sufficient to visualize the effect of each parameter, indicating that the number of iterations was adequate.
The calib_dot()
function shows the distribution of parameters in relation to each other for a chosen GoF metric, and highlights potential correlations between parameters (Figure 5).
calib_dot(calibration, param = "KGE")
In Figure 5, the Kling-Gupta Efficiency metric (KGE, Althoff and Rodrigues 2021) shows interactions between \(alpha_P\) and \(alpha_L\), i.e. a higher \(alpha_P\) should be associated with a lower \(alpha_L\) to achieve similar KGE values. The best \(alpha_P\) values identified for the R2 and KGE metrics are similar, whereas the \(alpha_L\) values vary considerably (see also Figure 3).
The GREENeR package includes two useful model calibration functions: compare_calib()
and simobs_annual_plot()
. compare_calib()
shows a scatter plot that compares observed data points with corresponding values predicted by two parameter sets. Figure 6 displays a plot generated by the compare_calib()
function for the Vistula (TN) basin scenario when using the best parameter sets based on the NSE and rNSE metrics. The plot reveals that the rNSE parameter set results in an underestimation of loads in the upper range.
metrics <- c("NSE","rNSE")
compare_calib(nutrient, catch, alpha_p1 = 31, alpha_l1 = 0.02, sd_coef1 = 0.6,
alpha_p2 = 50, alpha_l2 = 0.01, sd_coef2 = 0.6, years = c(1990:2018),
name_basin = "Wisla", metrics)
The simobs_annual_plot()
function compares observed and predicted values over the years (Figure 7), which helps identify erroneous model performance or data distribution problems over time.
year_range <- c(1994, 1996:2001, 2006:2009, 2012)
simobs_annual_plot(nutrient, catch, alpha_P, alpha_L, sd_coeff, year_range,
name_basin = "Wisla", max_value = 10000)
Once the most appropriate parameter set has been selected, it is possible to run the model to estimate nutrient loads in the region. The green_shares()
function runs GREEN with the selected parameter set and returns catchment nutrient loads and the contributions by source in the simulation period.
The nutrient_tserie()
function shows the temporal evolution of the total load at the basin outlet in the simulation period with contributions by sources (Figure 8). Other options of the function show nutrient loads in different zones of the basin (upper, central and lower part).
nutrient_load <- green_shares(nutrient, catch, alpha_P, alpha_L, sd_coeff, years)
nutrient_tserie(nutrient_load, geometry, "Wisla basin", "gr3")
The nutrient_maps()
function uses the object returned by green_shares()
to generate maps of the distribution of nutrient loads by source for a given year or as the mean over several years (Figure 9). The results are shown in logarithmic scale to improve the visualization, as nutrient loads in a region may vary over several orders of magnitude. Alternative options of the function show the total load at the catchment outlets or the specific loads (i.e. load per catchment area; in kt/km\(^2\)/y).
map_title <- "Output Loads for the Vistula basin"
nutrient_maps(nutrient_load, geometry, map_title, "gr1", legend_position = 1)
The region_nut_balance()
function runs GREEN with the selected parameter set, and returns the nutrient mass balance and fluxes of the region (mean value for the simulation period). The results of this function can be visualized in a Sankey diagram with the function N4_sankey()
(Figure 10).
nut_bal <- region_nut_balance(nutrient, catch, alpha_P, alpha_L, sd_coeff,
years)
sank <- N4_sankey(nut_bal)