Nutrient pollution affects fresh and coastal waters around the globe. Planning mitigation actions requires tools to assess the fluxes of nutrient emissions to waters and the expected impacts of restoration. Conceptual river basin models take advantage of data on nutrient emissions and concentrations at monitoring stations, providing a physical interpretation of monitored conditions and enabling scenario analysis. The GREENeR package streamlines water quality modelling in a region of interest, considering nutrient pathways and the hydrological structure of the river network. The package merges data sources, analyzes local conditions, calibrates the model, and assesses yearly nutrient levels along the river network, determining the contributions of diffuse and point sources to freshwater loads. The package is enriched with functions to perform thorough parameter sensitivity analyses and to map nutrient sources and fluxes. The functionalities of the package are demonstrated using datasets from the Vistula river basin.

Nitrogen and phosphorus are key nutrients that heavily impact aquatic ecosystems. Detecting primary sources of nutrient pollution and their downstream spread, alongside assessing achievable reductions through restoration policies, are crucial for effective natural resource management. They aid in identifying priority intervention areas and planning actions to restore the ecological balance of receiving waters.

Modelling tools can be useful to assess the impacts of future scenarios, policy measures, and climate changes at the regional and continental scale (Seitzinger et al. 2005; Ludwig et al. 2010; Arheimer et al. 2012; Bouraoui et al. 2014; Bartosova et al. 2019; Beusen et al. 2022), and to check the coherence of different policy targets, for instance between water and agricultural policies. The assessment of policy scenarios requires a flexible and spatially detailed analysis to account for climatic, hydrological, and socio-economic gradients (Grizzetti et al. 2021).

Several types of models can be applied to predict the transport of nutrients in river basins (Fu et al. 2019). Among them, statistical or conceptual models have the advantage of being readily applicable to large watersheds. These models rely on the calibration of a few parameters to establish links between emissions at the sources and their fate in the stream network. They take full advantage of nutrient emission and concentration data at monitoring stations, which are now accessible with increasing spatial and temporal resolution.

A classic example of a river basin conceptual model is SPARROW (Smith et al. 1997; Schwarz et al. 2006), which is widely applied in the U.S. to assess water quality over large regions. SPARROW inspired the Geospatial Regression Equation for European Nutrient (GREEN) losses model (Grizzetti et al. 2005; Grizzetti et al. 2012, 2021), which is adapted to European conditions. GREEN has been used for assessing the nutrient loads to the European seas (Grizzetti et al. 2012, 2021), nitrogen retention in European freshwaters (Grizzetti et al. 2012; La Notte et al. 2017), and for policy scenario analysis (Bouraoui et al. 2014; Leip et al. 2015; La Notte et al. 2017; Malagó et al. 2019; Grizzetti et al. 2021). Despite their usefulness, river basin models of this type are seldom distributed as dedicated R packages; SPARROW, for example, only offers some scripts to prepare model inputs and analyse model results in the R environment.

The GREEN model application comprises several key steps, including data extraction and organization, data harmonization and integration, examination and validation of input data sets, model calibration and parameter selection, model run, and result visualization. All of these features are now integrated into an R package (GREENeR) that comprises functions to streamline the process, evaluate and visualize all the steps, thus strengthening the robustness of model application.

GREEN (Grizzetti et al. 2005, 2008; Grizzetti et al. 2012, 2021) is a conceptual model to assess total nitrogen (TN) and total phosphorus (TP) loads from a region of interest (usually a river basin), accounting for both diffuse and point sources. The package allows the analysis of different scenarios of nutrient input in the region of interest, where “scenario” indicates a combination of annual time series of inputs, such as nitrogen or phosphorus sources, and the topological structure of the region. The model comprises several nutrient sources and pathways:

- agricultural diffuse sources include nutrients from mineral fertilisers and manure application, and nitrogen fixation by crops and soils. These sources undergo retention in the land phase (basin retention, which accounts for e.g. crop uptake and volatilization losses) before reaching the stream;
- other diffuse emissions, such as from scattered dwellings (i.e., isolated houses and small agglomerations that are not connected to sewerage systems), and atmospheric nitrogen deposition (for nitrogen module) or background losses (for phosphorus module), are also reduced, e.g. due to soil processes, before reaching the stream network;
- point sources consist of urban and industrial wastewater discharges that enter surface waters directly.

Once in the river network, all nutrient loads are reduced by in-stream retention in rivers and lakes.

Basin retention of agriculture sources is a decay function proportional to the inverse of the total annual precipitation in the catchment. Conversely, river retention is a decay function proportional to the river length, considered as a proxy for water residence time. Finally, lake retention is simulated as a function of the lakes residence time and average depth.

The basin is divided into spatial subunits (called catchments), with a given area, a river reach, an inlet node, and an outlet node. The catchments are topologically connected from the headwaters to the outlet in a cascading sequence. The sequence of nutrient load accumulation through the stream network is defined by Shreve’s order (Shreve 1966). Nutrient input from the different sources, basin and river retention are simulated in each catchment and routed through the river network. For each catchment \(i\) in the basin, the GREEN nutrient load \(L_{i,y}\) is estimated by the general equation:

\[\begin{equation} L_{i,y} = (1-Lret_i) \cdot (DSA_{i,y} \cdot (1-Bret_{i,y}) + DSB_{i,y} + PS_{i,y} + U_{i,y}) \cdot (1-Rret_i) \tag{1} \end{equation}\]

where \(L_{i,y}\) is the nutrient load at the catchment outlet node (ton/yr) in year \(y\); the other variables represent the different sources and sinks of nutrients.

Sources of nutrients are:

\(DSA_{i,y}\). Annual nutrient diffuse sources on agricultural land in the catchment (ton/yr): mineral and manure fertilization, atmospheric nitrogen deposition, plant and soil fixation for TN; mineral and manure fertilization, and background losses for TP.

\(DSB_{i,y}\). Annual nutrient diffuse sources in the catchment related to scattered dwellings and atmospheric nitrogen deposition on non-agricultural land for TN (\(DSB_{i,y} = 0.38 \cdot FF_{i,y} \cdot AtmN_{i,y} + sd_{coeff} \cdot SdN_{i,y}\), where \(FF_{i,y}\) is the fraction of non-agricultural land cover in the catchment, and \(AtmN_{i,y}\) is the annual atmospheric nitrogen deposition on the catchment (ton/yr)); nutrient diffuse sources in the catchment related to scattered dwellings and background losses on non-agricultural areas for TP (\(DSB_{i,y} = sd_{coeff} \cdot SdP_{i,y}\)).

\(PS_{i,y}\). Nutrient point sources in the catchment (ton/yr).

\(U_{i,y}\). Nutrient load from upstream catchments (ton/yr).

Sinks of nutrients are:

- \(Lret_i\) denotes the lake retention (fraction) of catchment \(i\). \(Lret\) is currently defined according to Kronvang et al. (2004), but limited to a \(10\%\) maximum reduction:

\[\begin{equation} Lret_i = \min\left(0.1,\; \frac{1}{1 + (z/RT)/dref}\right) \tag{2} \end{equation}\]

where \(z\) represents the average lake depth (m); \(RT\) is the hydraulic residence time (yr); and \(dref\) denotes a nutrient-related coefficient (\(dref = 7.3\) for TN and \(dref = 26\) for TP).

- \(Bret_{i,y}\) is the fraction of basin retention of catchment \(i\) in year \(y\):

\[\begin{equation} Bret_{i,y} = 1 - \exp(-alpha_P \cdot NrmInvRain_{i,y}) \tag{3} \end{equation}\]

where \(NrmInvRain_{i,y}\) is the inverse of the annual precipitation (mm) of catchment \(i\) in year \(y\), normalized by its maximum (Frank and Todeschini 1994). To keep basin retention coefficients comparable across regions, the minimum precipitation \(minPrec\) is set to \(50\) mm/yr, and thus \(NrmInvRain_{i,y}\) is defined as

\[\begin{equation*} NrmInvRain_{i,y} = \frac{1 / \max(50, precipitation_{i,y})}{1 / minPrec} \end{equation*}\]

- \(Rret_i\) is the fraction of river retention of catchment \(i\):

\[\begin{equation} Rret_i = 1 - \exp(-alpha_L \cdot NrmLengthKm_i) \tag{4} \end{equation}\]

where \(NrmLengthKm_i\) is the length (km) of the catchment reach, normalized by the maximum in the dataset:

\[\begin{equation*} NrmLengthKm_i = \frac{\textrm{length of catchment } i \textrm{ reach, in km}}{\max(\textrm{Reach length in the region, in km})} \end{equation*}\]

Since the maximum reach length depends on the region and its subdivision into network reaches, the calibrated coefficient \(alpha_L\) cannot be compared across regions that adopt different discretizations. Note that basin retention varies from year to year according to annual precipitation, whereas the fractions of river and lake retention for a given catchment are constant in time.
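The decay-type retention fractions described above can be written as a short base-R sketch (the exponential and residence-time forms here are assumed from the model description, not taken from the package internals; `alpha_P` and `alpha_L` are the calibrated coefficients):

```
# Sketch of the three retention fractions (assumed functional forms:
# exponential decay for basin and river retention, a residence-time
# formula capped at 10% for lakes).
basin_retention <- function(alpha_P, precip_mm, min_prec = 50) {
  nrm_inv_rain <- (1 / pmax(min_prec, precip_mm)) / (1 / min_prec)
  1 - exp(-alpha_P * nrm_inv_rain)
}

river_retention <- function(alpha_L, reach_km, max_reach_km) {
  1 - exp(-alpha_L * reach_km / max_reach_km)
}

lake_retention <- function(z, RT, dref = 7.3) {  # dref: 7.3 for TN, 26 for TP
  pmin(0.1, 1 / (1 + (z / RT) / dref))           # capped at 10% reduction
}

basin_retention(alpha_P = 30, precip_mm = 600)   # ~0.918; drier years retain more
```

Because retention decays with the inverse of precipitation, a wetter year in the same catchment yields a lower `basin_retention` value, consistent with the model description.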

Equation (1) is applied sequentially from the most upstream nodes to the basin outlet. The model parameters are:

- the basin retention coefficient (\(alpha_P\)), which together with annual precipitation regulates nutrient retention of diffuse agricultural sources (Equation (3));
- the river retention coefficient (\(alpha_L\), Equation (4)), which regulates the river retention per river km;
- the fraction of domestic diffuse sources that reaches the stream network (\(sd_{coeff}\)).
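Putting the pieces together, Equation (1) for a single catchment can be sketched in R with toy numbers (an illustration only, not the package's internal routing code):

```
# Equation (1) for one catchment: load at the outlet node (ton/yr)
# given the sources (DSA, DSB, PS, upstream U) and retention fractions.
catchment_load <- function(DSA, DSB, PS, U, Lret, Bret, Rret) {
  (1 - Lret) * (DSA * (1 - Bret) + DSB + PS + U) * (1 - Rret)
}

# Toy numbers: 100 t/yr agricultural diffuse, 10 t/yr other diffuse,
# 5 t/yr point sources, 50 t/yr arriving from upstream catchments
catchment_load(DSA = 100, DSB = 10, PS = 5, U = 50,
               Lret = 0.05, Bret = 0.4, Rret = 0.1)   # 106.875 ton/yr
```

In the full model this computation is applied in Shreve order, with \(U_{i,y}\) accumulating the outputs of the upstream catchments.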

This work presents a new, efficient, and enhanced implementation of the GREEN model, developed as an R package named GREENeR. GREENeR is developed in the R statistical software and provides tools and methods for applying GREEN to an area of interest, assessing annual time series of nutrient loads in a river network and at the basin outlet, as well as the contributions of nutrient sources to these loads. Key features of the package include: a) functions for the creation of scenarios from data sources; b) computational efficiency; c) parallel computation capabilities; d) an extended suite of fine-tuning options to control model calibration; e) built-in parameter sensitivity analysis; and f) functions to perform customised post-processing analyses.

The functions of the GREENeR package are arranged in three groups: i) functions to perform graphical summaries of model inputs; ii) functions to perform model parameter calibration and sensitivity analysis; iii) functions to compute, analyze, and visualize through graphs and maps the model outputs, i.e. total loads and contributions by source. The package supports parallel processing, which helps reduce the computational load when handling large basins.

The time series can represent either historic (current or past) conditions that can be associated with observations for model calibration, or hypothetical conditions foreseen under theoretical changes (e.g. to forecast the results of nutrient management plans, or under climate change).

A scenario contains:

- The geospatial geometry of the catchments of the region, currently defined according to the Catchment Characterisation and Modelling Database for European Rivers v2 (CCM2) (De Jager and Vogt 2007).
- The annual time series (1990-2018) of nutrient inputs per catchment.
- Additional catchment information (annual precipitation, annual forest fraction, lake retention, reach length).
- Observed total loads (TN or TP) from monitoring stations.

Two historical scenarios, one for TN and one for TP, of the Lay basin (France) are provided with the GREENeR package. The Lay basin has an area of \(1971\) km\(^2\), and is divided into \(189\) CCM2 catchments. The mean catchment area is \(10.4\) km\(^2\). Nutrient observations comprised \(22\) TN entries from six monitoring stations and \(58\) TP data entries from eight monitoring stations (data from WISE Waterbase (EEA 2021)).

Further, in this article we present examples drawn from the TN historic scenario of the transboundary Vistula (Wisla) river basin, one of the largest river basins in Europe. The Vistula scenario comprises \(15465\) CCM2 catchments that compose the \(193894\) km\(^2\) river basin, TN inputs for 1990-2018, as well as \(1364\) observed TN loads from \(412\) monitoring stations. The TN input dataset is part of a larger dataset created to assess nutrient concentrations in European freshwaters (Vigiak et al. 2023).

A brief overview of the package and its main functions is given in Figure 1.

The key functions included in the package are:

`read_geometry()`

: imports into R the geospatial vector file (e.g. an ESRI shapefile) with the spatial information of the catchments.

`read_NSdata()`

: imports the annual time series of nutrient inputs per catchment and type of nutrient source (manure, mineral fertilisation, point sources, scattered dwellings), plus the forest fraction and the observed annual nutrient loads from monitoring station data. The original data are stored in several comma-separated tables (CSV files).

`input_maps()`

: creates a map showing the mean nutrient load input by source.

`input_plot()`

: creates either a grouped barplot representing the average input load by source for the whole basin, or three density plots showing the distribution of nutrient sources.

`input_Tserie()`

: creates a time series plot showing basin inputs by source.

`shreve()`

: returns the Shreve order of the catchments, a useful indicator of stream size, discharge, and drainage area (Strahler 1957), calculated as the sum of the orders of the upstream tributaries (Shreve 1966). Commercial GIS software usually provides a Shreve calculation function, but it is harder to find in other software. In GREEN, the Shreve order defines the cascade of upstream-downstream catchments.

`calib_green()`

: conducts a sensitivity analysis of model calibration using a Latin Hypercube Sampling (LHS) scheme (Manache and Melching 2004) for the three model parameters. It evaluates model performance by calculating several “goodness-of-fit” (GoF) metrics (Refsgaard and Henriksen 2004) against available observations during the specified simulation period (years).

`select_params()`

: extracts the best parameter set according to a selected GoF metric from the object generated by the calibration function, `calib_green()`.

`calib_boxplot()`

: creates a figure of boxplots showing the relationships between the best parameter sets determined by the GoF metric in each boxplot title and six other commonly used hydrological metrics (Mauricio Zambrano-Bigiarini 2014; Althoff and Rodrigues 2021). In the lower panel, the figure also shows the distribution of model parameters; the value of the best parameter set is marked as a red dot in each boxplot.

`region_nut_balance()`

: runs the GREEN model with the selected parameter set and returns the mean annual mass balance of nutrient fluxes in the region over the whole simulation period. The results of this function can be visualized as a Sankey diagram via the `N4_sankey()` function.

`green()`

: runs the GREEN model with the selected parameter set and returns nutrient load time series for the simulation period. It generates less information than the `green_shares()` function, but executes faster, so it is used during calibration iterations and is embedded in the `calib_green()` function.

`green_shares()`

: runs the GREEN model with the selected parameter set and returns time series of nutrient loads and the contribution of each nutrient source over the simulation period. The results of the model can be examined with the `nutrient_tserie()` and `nutrient_maps()` functions.

`scatter_plot()`

: generates dot plots correlating parameter realizations in the calibration data with a GoF metric, visualizing the impact of each parameter on model outcomes. The plot varies based on the GoF metric selected from the `calib_green()` output.

`simobs_annual_plot()`

: generates scatter plots comparing model load predictions (PredictLoad) with observations (ObsLoad) for each year of the simulation period.

GREENeR requires information on the nutrient inputs, the topology, and the geospatial geometry of the region of interest. The spatio-temporal input data must include all nutrient source fields, which differ between the two nutrient scenarios (TN or TP). In the case of TN (e.g. Figure 2), the fields are (Equation (5) in Appendix 1):

\(Atmospheric\): Annual amount of atmospheric nitrogen deposition (ton/yr).

\(Mineral\): Annual amount of nitrogen from mineral fertilisers (ton/yr).

\(Manure\): Annual amount of nitrogen in manure fertilisers (ton/yr).

\(Fix\): Annual amount of nitrogen fixation by leguminous crops and fodder (ton/yr).

\(Soil\): Annual amount of nitrogen fixation in soils (ton/yr).

\(Sc. Dwellings\): Nitrogen input from scattered dwellings (ton/yr).

\(PointS\): Nitrogen input from point sources (ton/yr).

In the case of TP, fields are (Equation (6) in Appendix 1):

\(Background\): annual amount of phosphorus from background losses (ton/yr).

\(Mineral\): annual amount of phosphorus from mineral fertilisers (ton/yr).

\(Manure\): annual amount of phosphorus in manure fertilisers (ton/yr).

\(Sc. Dwellings\): phosphorus input from scattered dwellings (ton/yr).

\(PointS\): phosphorus input from point sources (ton/yr).

European datasets for 1990-2018, generated to assess historic and current nutrient fluxes (Vigiak et al. 2023), are available upon request. The model and the package are compatible with external datasets and can be customized with local information, as long as the data structure is respected.

In both nutrient cases, GREENeR needs additional annual catchment information:

\(ForestFraction\): annual fraction (0-1) of non-agricultural land cover area in the catchment (\(FF\) in Equations (5) and (6), in the Appendix 1).

\(Rain\): annual precipitation (mm, Equation (3)).

\(LakeFrRet\): lake retention fraction (0-1) (\(Lret\) in Equation (1), in the Appendix 1). Note, this differs for TN or TP scenarios (Equation (2)). At European scale, average lake depth and hydraulic residence time can be obtained from HydroLAKES database (https://www.hydrosheds.org/pages/hydrolakes, Messager et al. (2016)).

\(Length\): the length (km) of the catchment reach (Equation (4)).

The catchment topology outlines the hydrological network configuration within the region. Each catchment must have a unique numerical identifier (HydroID), used to establish the network structure via a table of source and destination HydroIDs. It is important to note that any outlet of the basin is assigned the destination HydroID “-1”. Complementing the topology of the hydrologic network, the length of the catchment reach should also be included (Equation (4)).
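The upstream-downstream convention can be illustrated with a toy base-R sketch that derives the Shreve order from a source→destination table (illustrative HydroIDs; in the package this is done by `shreve()` on the full scenario):

```
# Toy topology: HydroID of each catchment and the HydroID it drains to
# ("-1" marks the basin outlet). Values are illustrative.
topo <- data.frame(HydroID    = c(1, 2, 3, 4, 5),
                   NextDownID = c(3, 3, 5, 5, -1))

shreve_order <- function(topo) {
  ord <- setNames(rep(NA_real_, nrow(topo)), topo$HydroID)
  assign_order <- function(id) {
    ups <- topo$HydroID[topo$NextDownID == id]
    if (length(ups) == 0) return(1)               # headwater catchment
    sum(vapply(ups, assign_order, numeric(1)))    # sum of upstream orders
  }
  for (id in topo$HydroID) ord[as.character(id)] <- assign_order(id)
  ord
}

shreve_order(topo)
#  1  2  3  4  5
#  1  1  2  1  3
```

Catchments 1, 2, and 4 are headwaters (order 1); catchment 3 collects two headwaters (order 2); catchment 5, the outlet, accumulates orders 2 and 1 (order 3).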

Finally, in order to calibrate the model, it is necessary to have some observed nutrient loads from monitoring stations associated with catchments of the region. Ultimately, the quality, size, and spatial distribution of the observed loads determine the robustness of the calibration process. In Europe, a large dataset of annual concentrations is publicly available (WISE Waterbase, EEA 2021), but annual flow must be derived from other sources (Grizzetti et al. 2021; Vigiak et al. 2023).

The GREENeR package performance and memory usage depend on the size of the region (number of catchments) and the number of years of the simulation. As a reference, the TN dataset of the Danube basin (the largest European basin, with \(138013\) catchments) for 1990-2018 requires approximately \(389\) MB of memory to store the annual data. Table 1 shows the memory occupied by the scenarios of six European basins, along with the computation times required to execute some key functions of the package. All executions were conducted on two computer configurations:

CPU1: Desktop running Windows 10 with one Intel(R) Core i5-8259U (4 cores, two threads per core) at 2.30 GHz and 16 GB of RAM

CPU2: Workstation running Linux with one AMD EPYC 7352 (24 cores, two threads per core) at 2.3 GHz and 128 GB of RAM

The time required for the execution of the different functions increases linearly with the number of catchments (or Shreve order) of the region (Table 1). The `calib_green()` function is the most computationally demanding. However, the parallel implementation of this function drastically reduces the time required for this process. For practical implementation in large basins, calibration can be carried out in about six hours.

Basin name | Shreve order | Area km\(^2\) | Number of catchments | Mem MB | CPU1 sec | CPU2 sec | CPU1 sec | CPU2 sec | CPU1 sec | CPU2 sec |
---|---|---|---|---|---|---|---|---|---|---|
Lay | 95 | 1971 | 189 | 0.5 | 6 | 4 | 15 | 12 | 316 | 31 |
Miño | 2803 | 16985 | 5572 | 15.7 | 57 | 45 | 147 | 90 | 4110 | 394 |
Seine | 2902 | 75989 | 5793 | 16.3 | 77 | 55 | 177 | 106 | 6320 | 452 |
Ebro | 9351 | 85611 | 18568 | 52.3 | 160 | 123 | 425 | 224 | 10876 | 995 |
Vistula | 7757 | 193894 | 15465 | 43.6 | 132 | 100 | 314 | 187 | 9004 | 814 |
Danube | 69505 | 802032 | 138013 | 388.6 | 1120 | 766 | 2855 | 1108 | 94682 | 5638 |

The entire procedure is summarized in Figure 1. Once the input data have been uploaded, the region scenario (Nutrient data and Catch data) is generated. The `calib_green()` function explores the parameter ranges with the LHS scheme, calculating the GoF of each parameter set. A sensitivity analysis of its results is conducted to determine the best parameter set. Finally, the `green_shares()` function is used to estimate the nutrient loads and source apportionment (i.e. the contribution to loads by source), per year and per catchment.

Assembling input data for running the GREEN model is time consuming. To facilitate the process, the `read_NSdata()` function assembles and organises the annual information to generate a list of two objects, the `Nutrient Time Series Data Object` and the `Catch Data Object` of the GREEN scenario. It requires CSV files with four types of data, namely:

- Time series of annual nutrient inputs per catchment.
- Time series of annual observed loads at monitoring stations.
- Basin topology and lake properties.
- Other catchment data: precipitation, forest fraction, reach length.

The spatial (HydroID) and temporal (year) identifiers must be coherent across all the files. Besides organizing the input data, the `read_NSdata()` function also calculates the Shreve order of each catchment and normalises the precipitation (calculating \(NrmInvRain_{i,y}\)) and the reach length (\(NrmLengthKm_i\)).

```
csv_path <- "data/csv/"
scen <- read_NSdata(csv_path, "nutrients.csv", "monitoring.csv", "forestFr.csv",
                    "precipitation.csv", "topology.csv", "lakeProperties.csv",
                    "length.csv")
nutrient <- scen[[1]]
catch <- scen[[2]]
```

The `shreve()` function can calculate the Shreve order independently, based on the topology of the basin:

`shreve_order <- shreve(catch)`

Finally, the geospatial geometry of the catchments should be uploaded to enable the visualisation functionalities in map form. The `read_geometry()` function loads the geometry information file, i.e. a geospatial vector with the geometry of the catchments of the region:

`geometry <- read_geometry(file = "data/shapes/Wisla.shp")`

The geo-reference information, geometry and attributes of the spatial entities must be in a shapefile format (.shp), editable with ArcGIS or similar software. The identifiers of each catchment (HydroID) must be consistent with those in the CSV files of the scenario dataset.
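This consistency requirement can be checked with a quick (hypothetical) base-R sanity test on the HydroID vectors of the two sources, sketched here with toy IDs:

```
# Toy HydroID vectors (illustrative; real IDs come from the scenario
# CSV files and from the shapefile attribute table)
catch_ids    <- c(101, 102, 103)
geometry_ids <- c(101, 102, 104)

setdiff(catch_ids, geometry_ids)  # 103: catchment with no polygon
setdiff(geometry_ids, catch_ids)  # 104: polygon with no scenario data
```

Both `setdiff()` calls should return empty vectors when the scenario tables and the geometry describe the same set of catchments.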

Once the scenario has been generated, the library includes functions to examine the nutrient sources in a basin and explore their distribution over time and in space. The `input_plot()` function provides annual average nutrient loads for the whole period and density plots of nutrient loads. The `input_Tserie()` function allows examining the temporal evolution of the inputs, whereas the `input_maps()` function generates maps of the distribution of nutrient inputs in the basin (Figure 2).

`input_maps(nutrient, catch, geometry, "Wisla", "gr1")`

The calibration process is essential to find a suitable set of parameters for the model. In the GREENeR package, the core function `calib_green()` enables running a parameter sensitivity analysis while concurrently assessing model performance according to several GoF metrics.

Sensitivity Analysis (SA) investigates how the variation in the output of a numerical model can be attributed to variations of its input factors. SA aims at identifying the most influential inputs or parameters and quantifying how much they contribute to the variability and uncertainty of the model outputs. It is increasingly used in environmental modelling for a variety of purposes, including uncertainty assessment, model calibration and diagnostic evaluation, dominant control analysis, and robust decision-making (Saltelli et al. 2011; Butler et al. 2014; Norton 2015; Pianosi et al. 2016; Mannina et al. 2018). This is achieved by running the model for many samples of the parameter space to determine their impact on the model outputs. SA thus identifies the parameters and input variables that strongly influence the model response; conversely, it can reassure the modeller that parameters which are not very well established do not contribute significantly to output uncertainty.

Saltelli et al. (2000) single out three main classes of SA methods: screening, local, and global methods. Local sensitivity analysis methods focus on assessing the impact of small variations in the input values of a model on the output results, whereas global methods seek to understand how variations in the full range of input values affect the results. Puy et al. (2022) provide a comprehensive overview of several R packages for performing global sensitivity analysis.

Screening methods are economical and qualitative methods. Only screening methods are included in this implementation of the library as they provide a quick assessment of the relative importance of variables. This is particularly useful in exploratory studies and in cases where an initial subset of relevant variables needs to be identified.

Adequate sampling of parameter space is very important in model calibration. Several studies of uncertainty analysis in water resources problems (Melching and Bauwens 2001) concluded that Monte Carlo simulation and Latin Hypercube Sampling (LHS) (McKay et al. 1979; Carnell 2012) methods are very powerful, robust, and flexible. Other approaches are possible (Mauricio Zambrano-Bigiarini 2014), but the LHS has the advantage that it is easily parallelisable, it explores the full range of parameter sets, it does not direct the search depending on the values of previous iterations like other optimisation methods (gradients, genetic algorithms, etc.), and is therefore independent of the GoF metric. This allows performing a posteriori sensitivity analysis for several metrics without having to repeat the process.
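The LHS scheme itself can be illustrated with a self-contained base-R sketch (a stand-in for the package's internal sampler, using the three GREEN parameter ranges shown later): each parameter range is divided into `n` strata and one value is drawn per stratum, with strata permuted independently per parameter.

```
# Simple Latin Hypercube Sampling: one draw per stratum and parameter.
lhs_sample <- function(n, low, upp) {
  k <- length(low)
  # (sample(n) - runif(n)) / n gives one value in each of the n strata,
  # in permuted order, within (0, 1)
  u <- sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)
  # rescale each column from (0, 1) to its parameter range
  sweep(sweep(u, 2, upp - low, `*`), 2, low, `+`)
}

set.seed(1)
params <- lhs_sample(200, low = c(10, 0, 0.1), upp = c(50, 0.1, 0.9))
colnames(params) <- c("alpha_P", "alpha_L", "sd_coeff")
head(params, 3)
```

Because each column is an independent permutation of the strata, the full range of every parameter is covered even with a modest number of iterations, which is what makes the a posteriori sensitivity analysis across several GoF metrics possible.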

The `calib_green()` function computes 15 GoF metrics (Mauricio Zambrano-Bigiarini 2014; Althoff and Rodrigues 2021) (listed in Appendix 2), and returns an object with all parameter sets generated by the LHS and the corresponding GoF scores. Running `calib_green()` requires the following settings:

- The expected ranges for each parameter. The ranges are defined by two vectors of three values, one for the lower limits and one for the upper limits of the three parameters. The values correspond to each model parameter in sequence: \(alpha_P\) (Equation (3)), \(alpha_L\) (Equation (4)), and \(sd_{coeff}\) (Equations (5) and (6)).
- The number of iterations to be performed during the calibration process. The higher the number of iterations, the more likely it is to achieve good results, but the longer the computation time. Depending on the basin and the width of the parameter ranges, it is recommended to run at least 200 iterations to have enough information to continue the calibration process.
- The years to be included in the calibration process.

```
n_iter <- 2000
low <- c(10, 0.000, 0.1)
upp <- c(50, 0.1, 0.9)
years <- c(1990:2018)
calibration <- calib_green(nutrient, catch, n_iter, low, upp, years)
```

The calibration process is automatically parallelised by the package and uses all computer cores except one. The computation time depends on the computer, the number of catchments in the basin, and the number of iterations (Table 1). The `calib_green()` function returns a data frame with a row for each iteration, containing the parameter values and the resulting 15 GoF metric scores. An example of four parameter sets is shown below (cut to the first five GoF metrics).

```
alpha_P alpha_L sd_coeff NSE mNSE rNSE KGE PBIAS %
s1 39.31203 0.07628558 0.6325056 0.5623 0.6452 0.9595 0.2512 -44.1
s2 35.06049 0.05510985 0.1577470 0.6733 0.6948 0.9563 0.3763 -36.3
s3 28.34947 0.04418381 0.7209566 0.8690 0.7737 0.8998 0.7926 -1.7
s4 40.73533 0.01351723 0.6195734 0.8370 0.7734 0.9607 0.6292 -22.6
```
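As an example of how one of these metrics is computed, the NSE column is the Nash-Sutcliffe efficiency of predicted versus observed annual loads: one minus the ratio of residual variance to the variance of the observations around their mean (a base-R sketch with illustrative loads):

```
# Nash-Sutcliffe efficiency: 1 = perfect fit, 0 = no better than the
# mean of the observations, negative = worse than the mean.
nse <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

obs  <- c(120, 80, 150, 60)   # illustrative observed loads (ton/yr)
pred <- c(110, 90, 140, 70)   # illustrative predicted loads (ton/yr)
nse(obs, pred)                # ~0.918
```

The other 14 metrics returned by `calib_green()` follow the definitions in Mauricio Zambrano-Bigiarini (2014) and Althoff and Rodrigues (2021), listed in Appendix 2.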

The selection of the appropriate GoF metric(s) for calibration and evaluation of hydrological models can be challenging even for very experienced hydrologists. Choosing the right GoF metric for model calibration largely depends on the overall study scope, which defines the main interest (e.g. high or low load, upper or lower catchment area), and on the available observation dataset (size and quality) (Kumarasamy and Belmont 2018). Automated calibration typically relies, often exclusively, on a single GoF metric, with the Nash-Sutcliffe efficiency (NSE) being the most frequently used metric in hydrological models (Gupta et al. 2009; Westerberg et al. 2011; Wöhling et al. 2013). As described in Singh and Frevert (2002), a single criterion is inherently predisposed to bias towards certain components of the data time series. Automated procedures may be improved by using more than one criterion, as discussed by Gupta et al. (1998). Although automation can help make the calibration process more objective, efficient, and practical, it is not a substitute for expert hydrologic intuition and understanding. Whether automated or manual calibration is used, a common approach is to adjust the parameters that display the highest sensitivity (Madsen 2003; Doherty and Skahill 2006).

GREENeR includes several functions to assist in selecting the best parameter set. The `calib_boxplot()` function shows (Figure 3) the relationships between the best parameter sets chosen according to one GoF metric (title of each boxplot) and the six most frequently used metrics. Additionally, in the lower panel, the figure shows the distribution of model parameters in the best-performing subset of parameter sets. The best parameter set according to each GoF metric is marked as a red dot in each boxplot (Figure 3).

`calib_boxplot(calibration, rate_bs = 5)`

The `select_params()` function extracts the parameter set that scored best on the GoF metric of choice from the calibration result object.

```
best_params <- select_params(calibration, param = "NSE")
alpha_P <- best_params$alpha_P
alpha_L <- best_params$alpha_L
sd_coeff <- best_params$sd_coeff
```

It is not recommended to use the parameters extracted by the `select_params()` function without first analysing all the calibration results. Instead, alternative parameter sets (according to different GoF metrics) should be compared before making the final selection.

Screening and SA of model parameters can be done via the `scatter_plot()` and `calib_dot()` functions of the GREENeR package. The `scatter_plot()` function plots all parameter realizations in the LHS dataset against a selected GoF metric, visualizing the influence of each parameter on the metric scores. The result depends on the GoF metric of choice; any metric calculated with `calib_green()` can be selected (Figure 4).

`scatter_plot(calibration, param = "R2")`