rpostgis : Linking R with a PostGIS Spatial Database

With the proliferation of sensors and the ease of data collection from online sources, large datasets have become the norm in many scientific disciplines, and efficient data storage, management, and retrival is imperative for large research projects. Relational databases provide a solution, but in order to be useful, must be able to be linked to analysis and visualization tools, such as R. Here, we present a package intended to facilitate integration of R with the open-source database software PostgreSQL, with a focus on its spatial extension, PostGIS. The package rpostgis (version 1.4.1) provides methods for spatial data handling (vector and raster) between PostGIS-enabled databases and R, methods for R "data.frame"s storage in PostgreSQL, and a set of convenient wrappers for common database procedures. We thus expect rpostgis to be useful for both (1) existing users of spatial data in R and/or PostGIS, and (2) R users who have yet to adopt relational databases for their projects.


Introduction
R has become an important tool for manipulating, analyzing, and displaying spatial (vector and raster) datasets.R has a growing set of contributed packages (see the CRAN Spatial task view).R can already import and export stand-alone vector (e.g, ESRI shapefiles) with relative ease using rgdal (Bivand et al., 2017) or maptools (Bivand and Lewin-Koh, 2017), and raster datasets using raster (Hijmans, 2016).With the proliferation and availability of geographic data from deployed sensors, open-GIS web data sources, and remotely-sensed environmental datasets (to name just a few), users are increasingly taking advantage of Database Management Systems (DBMS) with spatial database extenders, which store spatial data as an object in a database table column.DBMS are especially useful to researchers and scientists managing project datasets, due to the ability to store large amounts of data efficiently, specify data relationships among tables, maintain data integrity using constraints, ensure long-term storage, manage multiple users, and add, update, and retrieve data efficiently.
PostgreSQL (The PostgreSQL Global Development Group, 2017) is an advanced relational DBMS, and it is also free and open-source, making it popular and accessible to a large base of users.The PostgreSQL extension PostGIS (PostGIS Project Steering Committee, 2017) allows PostgreSQL to handle spatial data by introducing spatial data types (Geometry, Geography, Raster, and Topology), along with a library of functions which operate on objects of these types.For R users, the package RPostgreSQL (Conway et al., 2017), a backend for the generic R database interface package DBI (R Special Interest Group on Databases (R-SIG-DB) et al., 2017), provides the driver and methods to connect and interface with a PostgreSQL database, to do a variety of data import and export tasks, and to query the database.However, not all data types supported by PostgreSQL can be imported into equivalent R data types using RPostgreSQL; among these are the PostGIS spatial data types.In addition, some data types in R possess attributes (e.g., POSIX* time zones), which are not handled by RPostgreSQL reading and writing.Finally, while RPostgreSQL is an essential tool for users of PostgreSQL databases and R, it is not designed to facilitate new database users, who may not be familiar with database techniques, terminology, or Structured Query Language (SQL), the language used to interact with a DBMS.
Noting these gaps, we designed rpostgis as a general-purpose extension package for RPostgreSQL.The primary goal of rpostgis is to provide spatial data handling between R and PostGIS, and as such, it includes methods for bi-directional transfer of both vector and raster spatial datasets between R and PostGIS, spatial projection handling, and other PostGIS-related tools.In addition, the package also features methods for users who wish to work primarily (or exclusively) in R, including saving of R "data.frame"s(including data types and attributes) to PostgreSQL, and "SQL wrapper" functions which execute common database procedures.The latter provide not only convenient, script-able access to these procedures through R, but also can function as learning tools for new users of DBMS and SQL.In this paper, we explain and demonstrate the purpose and usage of rpostgis functions within a general workflow, following the three focal areas described above (PostGIS-specific, "data.frame"handling, and SQL wrappers).

Background PostgreSQL and rpostgis basics
While it is far beyond the scope of this paper to provide a complete introduction to PostgreSQL, it is important to introduce several basic features of the DBMS, and the common terms we use to refer to them, especially concerning their usage in rpostgis.
A PostgreSQL install creates a PostgreSQL server, which is the host of one or more databases.Databases within a server are self contained (data cannot be shared across databases).Users interact with the database server using Structured Query Language (SQL ) to write queries.Within a database, there are one or more named schemas, which are collections of database objects.In PostgreSQL, on new database creation, a default schema is created in the database with the name public.
Schemas can be used to organize object s in a database.Schemas can contain many different objects, including functions, data types, and sequences; the objects which users of rpostgis should be most aware of are tables and views.Tables store data, while standard views only store an SQL query, that when executed, creates a virtual table (in memory), using data from other tables or views.There is no functional difference between how tables and views (which both fall under the umbrella term relation) are referred to within a query, with the convention being 'schema_name.relation_name'.Relations typically contain one or more columns, and each column has a defined data type (e.g., integer, character, timestamp with time zone), and optionally a defined sub-type, that specifies what type of data the column can contain.A column can also store records representing a spatial type, which is described in the next section.Relations have 0 or more record s, or rows, which contain the data.
In rpostgis, most functions perform an action on a relation.Following RPostgreSQL convention, the relation name is supplied to the argument name, which can be given either a length-one or length-two character vector.One-length character elements are interpreted as a relation name, and looked for in the default user schema(s) (by default, 'public').Length-two character vectors are interpreted as schema and relation name, e.g., name = c("schema_name","table_name").It is generally recommended to provide both schema and relation name, since identical relation names can be reused across schemas.For ease of reading, from this point on we refer to relations (tables and views) generally as "tables" throughout this paper.This brief summary should provide the new user with enough terminology to understand this paper, and begin working with databases using rpostgis.New users are encouraged to learn more about PostgreSQL from its excellent online documentation (https://www.postgresql.org/docs/manuals/).

Spatial objects in PostGIS and R
Vector spatial data in PostGIS tables are stored as GEOMETRY (planar) or GEOGRAPHY (spherical) types in a column, and usually have a specified sub-type, such as POINT, LINESTRING, or POLY-GON.Each record of a GEOMETRY/GEOGRAPHY column represents one spatial feature, which represents one or more geometric objects.Records containing multiple geometric objects in one feature are specified as a MULTI sub-type (e.g., MULTIPOLYGON).Geometry and geography columns can also store a spatial reference identifier (SRID) attribute, which is an integer value referring to the spatial reference system (the "projection") of the spatial data.Since spherical data are represented using geographic longitude/latitude coordinates on the sphere (specifically, the spheroid defined by WGS 84), the Geography data type is restricted to 'SRID = 4326'.
Raster spatial data are stored in PostGIS using the RASTER data type.Raster columns also can store an SRID attribute defining the projection of the raster.One raster record can store one or more bands (rasters with identical spatial coverage but different data).In PostGIS, it is common to store one raster dataset in one PostGIS table, but split the raster into multiple tiles.A raster tile is one rectangular spatial subset, or block, of a raster dataset.One tile of a PostGIS raster type corresponds with one record in the corresponding table.Storing rasters in this way allows for more efficient spatial queries on the raster, by working on subsets of the raster.
Geometry, geography, and raster data types can all be exported from PostGIS using a variety of supplied export functions.In addition, the PostGIS library contains a large set of GIS functions to manage, construct, and edit Geometry objects, as well as measure spatial relationships between geometries, making it a full-featured GIS system.However, PostgreSQL/PostGIS is not packaged with a native software to visualize spatial data, so most users employ 3rd-party software to access the database whenever visualization is needed.For spatial data, desktop GIS systems like ArcGIS (ESRI, 2017) and QGIS (QGIS Development Team, 2017) can load spatial data directly as layers in the mapping environment, where users can take advantage of tools available in those software.
The R Journal Vol.10/1, July 2018 ISSN 2073-4859 Because of its advanced statistical, processing, and visualization capabilities, many users employ R as a front-end for their database systems.In R, the long-time standard for handling vector spatial objects are "Spatial*" classes (e.g., "SpatialPoints", "SpatialMultiPoints", "SpatialLines", and "SpatialPolygons" and their "Spatial*DataFrame" variants) provided by the package sp (Pebesma and Bivand, 2005).Likewise the raster package provides standard methods for handling raster datasets in R, in "Raster*" classes (e.g., "RasterLayer" for single-band, "RasterBrick" or "RasterStack" for multi-band rasters).There are several notable packages and utilities that assist in transfer of spatial data between PostGIS and sp "Spatial*"-objects or "Raster*"-objects in R: 1. rgdal : The R package providing bindings to GDAL (Geospatial Data Abstraction Library) provides the functions readOGR and writeOGR to read, and write, respectively, PostGIS tables with 'GEOMETRY' columns and R "Spatial*" 2. rgeos (Bivand and Rundel, 2017): The functions readWKT and writeWKT provide conversion between the WKT format and "Spatial*"-objects.WKT format "well-known text" is a standardized, text-based version of a vector geometry, which can be written and read using PostGIS functions.
3. wkb (TIBCO Software Inc., 2016): The R package offers the functions readWKB and writeWKB functions, which convert vector geometries to and from the WKB format ("well-Known binary").
4. Alternatively, it is possible to use system calls to command-line utilities such as ogr2ogr, shp2pgsql or raster2pgsql, which brings the full power of these utilities at the expense of additional complexity, especially for scripting: This solution mixes two syntaxes (R syntax and the one of the command-line utility) and requires the passing of passwords at every call.
Read and write functions from both rgeos and wkb are utilized in rpostgis import/export functions to provide the translations of geometries from and to R's "Spatial*" objects.The rpostgis functions wrap additional functionality around these low-level functions in the vector data export (pgInsert) and import (pgGetGeom) functions, to ease data transferability and managed data stored alongside vector geometries in R "data.frame"s or PostGIS tables.Also of importance is the recently developed package sf (Pebesma, 2017), which provides Simple Features access in R, for vector spatial data.Simple Features is an Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO) standard for storing and accessing geometry objects, and is used by many RDBMS, including PostGIS.Like sp, sf provides standardized, comprehensive spatial data handling in R, notably using a "data.frame"with a geometry-list column to store Simple Feature objects, instead of the various "Spatial*" classes utilized in sp.The sf package also provides its own functions for reading and writing geometries from PostGIS databases, both through GDAL (using st_read/st_write) and directly (using st_read_db/st_write_db).At the time of writing, rpostgis reads from and writes to sp-class "Spatial*" objects, though we anticipate a migration to sf-class objects as they become the new standard for vector spatial data in R.

A note on permissions and privileges
A proper administration of a database, especially in a multi-user context, requires a consideration of the permissions given to users to access, write, or modify the database and its various objects.While it is beyond the scope of this paper to provide an exhaustive presentation of the issue, it is worth introducing the basic concepts from a PostgreSQL perspective: permissions on a database object are called "privileges," and there are several different ones, allowing to access ('SELECT') or modify them ('INSERT', 'UPDATE'), or even to create or delete them ('CREATE' and 'DELETE', respectively), among others.By default, the owner of an object (the user who created the object) and superusers (generally the DBMS administrator(s)) can do anything with it: they are granted all privileges on this object.PostgreSQL uses the functions 'GRANT' and 'REVOKE' to assign or revoke privileges of a certain user on a database object.We refer the interested reader (and the database administrator) to the official PostgreSQL documentation for more details about privileges.
Database privileges for a given user are not different by accessing the database through a connection from R, although it adds a layer of complexity as there will be R users and PostgreSQL users mixed in a session (as well as system users, which are distinct from database users, and we do not The R Journal Vol.10/1, July 2018 ISSN 2073-4859 consider here purposely).In other words, an R user could have different privileges on the database and its objects depending on the PostgreSQL user they use to connect to the database.In this paper, we will use the default PostgreSQL superuser 'postgres', which thus comes with all privileges granted on all database objects.This approach is fine for many use cases, especially single-user databases, but will be limited on multi-user databases, where privileges are set up at a finer grain.In general, all rpostgis functions that create, modify, or drop an object (e.g., dbIndex, dbAsDate, dbDrop) require ownership of the object being modified (so they require the appropriate 'CREATE' privileges).In addition, we specifically indicate special privileges that are necessary for the different functions of rpostgis when relevant.

PostGIS-specific functions
In the following two sections, we explain the main functionality and usage of the rpostgis package.Note that functions within rpostgis have one of two prefixes: "pg*" (for PostGIS-oriented functions, described in this section) and "db*" (for general PostgreSQL database functions, described in the next section), emulating the pattern established by DBI.The general presentation of the functions consist of (1) a code block presenting the generic version of the function(s) with all arguments and defaults listed, as well as specific privileges required, ( 2) descriptive text about the function(s), and in most cases (3) example calls of the function(s).
In this section, we use the term "PostGIS" generically to refer to the PostgreSQL/PostGIS DBMS; that is, PostgreSQL with the PostGIS extension installed.All usage of PostGIS within R begins with a connection to a database server and a particular database: here, using RPostgreSQL::dbConnect, we connect to the database 'rpostgis' which exists on the local computer/server ('localhost'): > library(rpostgis) > conn <-dbConnect(drv = "PostgreSQL", host = "localhost", dbname = "rpostgis", + user = "postgres", password = "postgres_password") It is not the purpose of this article to detail connection details, so we refer to documentation from DBI and RPostgreSQL for this aspect.
The "starter" function for new users of rpostgis is pgPostGIS.This function installs and/or checks the version of PostGIS currently available on the database.This is the first example of a function in rpostgis which implements the display and exec arguments.These respectively control printing (to the R console) of the constructed SQL query, and execution of the query on the database.When PostGIS is installed and ready to use, pgPostGIS returns TRUE.A standard in rpostgis: is when a function does not return an object, the function returns TRUE if the action was successfully executed in the database.Note that you can also enable the Topology, Tiger Geocoder, and SFCGAL extensions, with topology, tiger, and sfcgal arguments set to TRUE.
List geometries/rasters: pgListGeom and pgListRast > pgListGeom(conn, geog = TRUE) > pgListRast(conn) These pgList* functions return information on GEOMETRY/GEOGRAPHY and RASTER columns stored in any database table.In a PostGIS database, these information are stored in three views ('geometry_columns', 'geography_columns', and 'raster_columns', respectively), all of which are created during the installation of PostGIS in the 'public' schema of the database.On installation of PostGIS, a new table 'spatial_ref_sys' is created in the 'public' schema, which stores a large set of spatial reference systems, each of which has a unique (integer) spatial reference identifier (SRID), along with specifications of the projection in PROJ.4 and WKT.The function pgSRID allows users to check if a projection they are using in R (stored as a "sp::CRS" object, which contains the PROJ.4 representation of the projection) has matching SRID(s) in PostGIS.If there is no match and create.srid= TRUE, then pgSRID adds it to the 'spatial_ref_sys' table.In these cases, the user can also specify a desired SRID with new.srid; otherwise, rpostgis uses the next available value between 880001 and 889999.In the following example, we demonstrate how to find the SRID for the common WGS 1984 latitude/longitude projection: > crs <-sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") > pgSRID(conn, crs) [1] 4326 In some cases, pgSRID may return multiple SRIDs, since there are equivalent projections stored in multiple SRIDs in the 'spatial_ref_sys' table.Note that pgSRID is used in spatial data writing functions in rpostgis, with create.srid= TRUE.

Spatial data transfer
The following section describes the functions in rpostgis that transfer spatial data between R and PostGIS databases (Table 1  In this section we begin to use example datasets.We first load the well-known meuse dataset containing information on environmental observations from sample points along the Meuse river in the Netherlands.After loading the data, we create a "SpatialPointsDataFrame" object, setting its associated projection (oblique stereographic for the Netherlands, i.e.EPSG 28992) as the proj4string attribute, which is a standardized character representation of the projection of class "CRS": > library(sp) > data("meuse") > meuse <-SpatialPointsDataFrame(meuse[, 1:2], data = meuse[, + 3:length(meuse)], proj4string = sp::CRS("+init=epsg:28992")) > class(meuse) [1] "SpatialPointsDataFrame" attr(,"package") [ As evidenced by the large number of possible arguments, pgInsert is a flexible function that aims to provide a variety of methods for PostGIS 'INSERT's, from data originating in R as an object of class "Spatial*", "Spatial*DataFrame", or "data.frame".The most basic usage requires just a PostgreSQL connection, a database table name to insert into, and a data object of one of the three types.In our example, we also utilize the new.idargument, providing a new column name for a sequential ID ("gid"): > pgInsert(conn, "meuse", meuse, new.id= "gid") Creating new table... Using writeWKB from wkb package...

Data inserted into table "public"."meuse" [1] TRUE
The data was inserted into the new database table 'meuse_sp' in the 'public' schema, printing out several informative messages, including an indication that wkb::writeWKB was used to convert spatial objects to well-known binary in order to write to the database (if the optional package wkb is not installed, rgeos::writeWKT is used instead).So that is clear to users how and where data is written to the database, pgInsert is a fairly verbose function-however, these messages can be hidden by wrapping the function call within suppressMessages(...).
The R Journal Vol.10/1, July 2018 ISSN 2073-4859 Since pgInsert can also insert into existing tables, a partial.matchargument allows specifying if all columns in the R object must be present in the database in order to do the insert.The default is partial.match= FALSE, meaning if any column is found in the R object but not in the database table, an error is printed and no data is written in the database.Note that the check is uni-directionaldatabase table columns not found in the R data object do not affect the insert, except if the database column definition requires that it have assigned values (e.g., 'NOT NULL' with no assigned 'DEFAULT' value).
> pgGetGeom(conn, name, geom = "geom", gid = NULL, other.cols= TRUE, + clauses = NULL, boundary = NULL, query = NULL) Special privileges: pgGetGeom requires 'SELECT' on the table, and 'CREATE' in the schema to create a new view when both query and name are not null pgGetGeom returns vector spatial data from a PostGIS table storing a Geometry or Geography-the column name of either type can be supplied to the geom column.We can retrieve the full 'meuse' table which we just wrote to the database, demonstrating the most basic use of the function: > meuse.db<-pgGetGeom(conn, "meuse") Returning Point types in SpatialPoints*-class.
A message prints the type of "Spatial*" class that is created in R. The other.cols argument can be a vector of character names of database table columns, indicating which (if any) columns to return with the spatial data.The default is the full table (all columns), but other.cols= FALSE returns the spatial data only (i.e., a "Spatial*" object).The clauses argument can take additional SQL to modify the data to take from the table.For instance, using a 'WHERE' clause allows filtering results based on specific filter, as long as it is valid SQL (in this case, indexing the filtering column will significantly speed up the query; see the section "Create an index: dbIndex" below).Conversely, a full SQL query can be provided using the query argument; in this case the names argument can be NULL, or set to the name of a (new) view to create in the database using the specified query.This is the method to use when it is necessary to combine multiple database tables and return a geometry, or use PostGIS functions within a query, which highlights the potential of rpostgis for complex GIS operations in the database.
For example, we can use the PostGIS function 'ST_Buffer' to create 100-m buffers around points in our 'meuse' table, and then transform the layer to another projection using 'ST_Transform', with 'SRID = 4326' (WGS 1984, lat-lon).Note that we specified the output geometry in the SQL query 'AS geom', which is the default name for GEOMETRY columns for pgGetGeom.This query returns only the original points ID (gid) and a GEOMETRY column that can be directly loaded into a "SpatialPolygons" object using pgGetGeom.Since we specify gid as the unique ID for meuse.buff(using gid = "gid"), and there are no other columns of data to import, there is no data frame associated with this object.We can then plot the result to highlight the "flat" shape of the buffers after reprojection (Figure 1): > query <-"SELECT gid, ST_Transform(ST_Buffer(geom, 100), 4326) AS geom FROM meuse;" > meuse.buff<-pgGetGeom(conn, name = "meuse_buff", query = query, + gid = "gid") Returning Polygon types in SpatialPolygons*-class.

[1] TRUE
Since pgWriteRast is only for new database tables (or overwrites), the column storing the raster in the database is fixed to "rast".The function automatically splits the raster into tiles (called "blocks" in the raster package), with one tile for each record in the table, and adds automatic raster constraints (when constraints = TRUE, the default).Setting raster constraints in PostGIS allows the raster overview data to be registered correctly in the 'raster_columns' view, and include attributes of the raster such as scale, extent, SRID, and checks on alignment between blocks in the raster table .In this example case, we see that the raster only includes one block (1 × 1), since meuse.grid is relatively small (~8,000 cells); the default target block size in pgWriteRast is 10,000 cells (e.g., a 100 × 100 cell raster).Optionally, the user can specify the exact number of blocks to split the raster into using the blocks argument set to either a length-one or two integer vector (e.g., blocks = 5 splits the raster into 5 × 5 blocks, and blocks = c(2,4) splits the raster into 2 × 4 blocks).Note that specifying a smaller number of blocks will reduce write time using pgWriteRast.Block size should largely be determined by usage within PostGIS.For example, with larger blocks, PostGIS queries that only summarize across blocks will be less precise (since more total raster cells will fall within these blocks), and these queries could also be slower as a result.
The function pgWriteRast uses only the R raster package and SQL queries to create and populate rasters, meaning it is not dependent on external software or command-line utilities.While this makes it particularly useful as a cross-platform solution, and for new users who are not familiar with existing command-line tools for loading rasters into PostGIS (e.g., raster2pgsql), it may be less memoryefficient and take more time to process rasters than these command-line options.Users who frequently work with very large rasters should keep this consideration in mind.
Loading rasters from the database can be done with pgGetRast, where the column to retrieve from defaults to "rast".Since we did not specify any bands, the default is to return the first band: > lc.db <-pgGetRast(conn, "meuse_rast") > summary(lc.db)Multi-band writing and reading of is also possible with pgWriteRast and pgGetRast, as illustrated in this example using the RGB RasterBrick of the R logo (available in the raster package as a raster of 77 × 101 pixels, with three layers for red, green, and blue values).We also demonstrate the boundary argument method of pgGetRast by providing an extent as four numbers in the raster's projection [top, bottom, right, left].We finally plot the output in Figure 2: > rlogo <-raster::brick(system.file("external/rlogo.grd",package = "raster")) > pgWriteRast(conn, "rlogo", rlogo) Splitting 3 band(s) into 1 x 1 blocks...The R data class, proj4string attribute, and band/column names of R raster objects are also saved in the raster table, and are imported when pgGetRast re-creates the raster in R. The boundary argument can also be specified as a "Spatial*" object, where the raster is clipped to the bounding box of the object.This can greatly increase the speed of the import, and is recommended when working with large rasters.
The R Journal Vol.10/1, July 2018 ISSN 2073-4859 Users should note that "Raster*"-class data types are handled natively within the rpostgis raster functions."Spatial*" objects are converted to/from "Raster*"s within the function execution; as such, there may be a performance benefit to working with "Raster*" objects.
The function pgGetBoundary can return the outer bounding box of all spatial data (or a subset using clauses) in a PostGIS table, and the geom argument can actually take the column name of a Geometry, Geography, or Raster column.A single feature in a "SpatialPolygons" object is returned in the original projection of the spatial data, as in this example using the 'meuse_dist' raster table previously created: > (m.bound <-pgGetBoundary(conn, "meuse_rast", "rast"))

+units=m +no_defs
Add a POINT or LINESTRING geometry field: pgMakePts and pgMakeStp > pgMakePts(conn, name, colname = "geom", x = "x", y = "y", srid, + index = TRUE, display = TRUE, exec = TRUE) > > pgMakeStp(conn, name, colname = "geom", x = "x", y = "y", dx = "dx", + dy = "dy", srid, index = TRUE, display = TRUE, exec = TRUE) These are specialized functions that can be used to apply a new Geometry column to an existing database table, and are of particular use for managing sequential location data tables (i.e., trajectories).While pgMakePts requires only columns storing x and y coordinates to create a 'POINT GEOMETRY' column, pgMakeStp also needs increments along the x/y dimensions to the next point, and builds the line segments connecting those points into a 'LINESTRING GEOMETRY' column.
The R Journal Vol.10/1, July 2018 ISSN 2073-4859 Data frame columns have attributes which store important information about the class of data in the column, as well information specific to that class.However, when writing a "data.frame"object to databases (e.g, using RPostgreSQL::dbWriteTable), these attributes are not carried over.For users who want to work in R, store data in a database, and then wish to retrieve it in the same state that it was prior to storing, rpostgis introduces a "data frame mode" for its data writing and reading functions.
The first time it is used in a schema, data frame mode creates a new database table named '.R_df_defs' to store column types and attributes.Within the data table itself, data frame mode adds a column to store the "data.frame"row names (in '.R_rownames'), and another to act as the database table's primary key (in '.df_pkid'), and provide sorting for the "data.frame".Note that since '.R_df_defs' and '.df_pkid' are only intended for internal rpostgis usage, we begin them with the non-standard "." to function as a flag to users of the database.It is not recommended to name tables commonly referred to by users in this way, since non-standard naming of tables and column identifiers will require that they be double-quoted whenever they are used in a query.
When the table is imported into R using rpostgis reading functions (dbReadDataFrame or pgGetGeom), these metadata are applied to the resulting data frame.Currently (rpostgis version 1.4.0), the only data type attributes which are stored and re-created using data frame mode are time zones of "POSIXct" objects, and levels of factors (including ordered factors).However, handling of specific attributes for other data types can be added in a modular way to the main data frame mode functions (dbWriteDataFrame and dbReadDataFrame).
Using data frame mode methods, meuse can be sent to the database, and re-created in R, with no alterations: > dbWriteDataFrame(conn, "meuse_df", meuse) New R data frame definitions table created ("public".".R_df_defs").

Query executed:
DROP  > all.equal(meuse, meuse_df) [1] TRUE Using "data frame mode" with a "Spatial*" class object with pgInsert also saves the projection (as a proj4string attribute) of the spatial object in the database.This allows for equivalent re-creation of the meuse dataset in this example, since pgGetGeom re-applies this proj4string when importing the table 'meuse_df'.

SQL wrappers for database management
A collection of "db"-prefixed functions are available to add, remove, alter, and manage objects in the database, and SQL users will be familiar with the keywords in the function and attribute names.These functions share several common features: • The previously discussed conn, name, display, and exec arguments; • Each returns TRUE when the statement was successfully executed on the database.
In the following section, we present a short discussion of each function's purpose and a usage example.Note that all schema, table, and column names are applied in the database exactly as they are given in the rpostgis functions, including capitalization, special characters, digits, etc.In PostgreSQL, however, object identifiers need to start with a lowercase letter (a-z) or an underscore (_), and contain letters, underscores, and digits (0-9).Any object that does not follow these rules must be quoted (using double quotation marks) when used in SQL queries.For that reason, all object names used in rpostgis (as well as RPostgreSQL) are quoted by default in the SQL queries that it builds.Despite this, it is good practice to use object names that do not require quoting, including column names in any data frames sent to the database; using this approach facilitates SQL query writing.

Check and create
To demonstrate the use of schemas, we insert meuse into the 'rpsostgis_demo' schema, only with a slight modification of the time variable: example_time is converted to character as to avoid incorrect specification of the time zone, and is stored as text type in the database.
To work in the 'rpostgis_demo' schema, we need to specify the schema name as the first argument in name.We can also utilize the new.idargument to add a sequential number column 'meuse_id' to the new table, and alter.names to make sure all column names do not require quoting: > meuse$example_time <-as.character(meuse$example_time)> pgInsert(conn, c("rpostgis_demo", "meuse"), meuse, new.id= "meuse_id", + alter.names= TRUE) Commenting is essential to maintaining a well-documented database, and comments can be applied to any database object.Here we apply comments to both the new schema and table we just created: > dbComment(conn, "rpostgis_demo", + comment = "Schema storing example data for the rpostgis paper.",+ type = "schema") Query executed: COMMENT ON SCHEMA "rpostgis_demo" IS Schema storing example data for the rpostgis paper.; [1] TRUE > dbComment(conn, c("rpostgis_demo", "meuse"), + comment = "Meuse river example dataset from R sp package.",+ type = "table ") Query executed: COMMENT ON TABLE "rpostgis_demo"."meuse"IS Meuse river example dataset from R sp package.; [1] TRUE Add or remove a column: dbColumn > dbColumn(conn, name, colname, action = c("add", "drop"), coltype = "integer", + cascade = FALSE, display = TRUE, exec = TRUE) Adding or dropping a table column can be achieved with the dbColumn function: to demonstrate, we drop the 'dist' column, so as not to confuse it with 'dist_m': The R Journal Vol.10/1, July 2018 ISSN 2073-4859 > dbColumn(conn, c("rpostgis_demo", "meuse"), "dist", action = "drop") Query executed: ALTER TABLE "rpostgis_demo"."meuse"DROP COLUMN "dist" ; [1] TRUE Add a primary or foreign key: dbAddKey > dbAddKey(conn, name, colname, type = c("primary", "foreign"), + reference, colref, display = TRUE, exec = TRUE) Keys are an important element of a database, and are part of a broader group of elements used to maintain integrity of table data called constraints.Using dbAddKey, we can add a primary key (specifying a non-null column acting as the unique identifier of a single table) or a foreign key (specifying a column referencing a column in another "foreign" table, where all values must have a match).Note that dbAddKey can build keys on multiple columns (or refer to foreign keys on multiple columns) using the form colname = c("id1","id2","id3").Since we applied a new sequential ID column when we inserted meuse, we can now designate it as a primary key: > dbAddKey(conn, c("rpostgis_demo", "meuse"), "meuse_id", type = "primary") Query executed: ALTER TABLE "rpostgis_demo"."meuse"ADD PRIMARY KEY ("meuse_id"); [1] TRUE Create an index: dbIndex > dbIndex(conn, name, colname, idxname, unique = FALSE, method = c("btree", + "hash", "rtree", "gist"), display = TRUE, exec = TRUE) Indexes are another important database element, as they can store information which helps queries run efficiently on data which are commonly referenced (e.g., using a foreign key), sorted (e.g., timestamps), or compared to other columns (e.g., geometries).As one would expect, they are especially important for large tables, and geometry types (e.g., polygons representing complex boundaries or lines), and are essential for to efficiently retrieve data with pgGetGeom and a 'WHERE' clause.Note that, similarly to keys, dbIndex can build indexes on multiple columns.We can add an index on our point geometry using the preferred indexing method for 'GEOMETRY' in PostGIS: GIST (generic index structure): > dbIndex(conn, c("rpostgis_demo", "meuse"), "geom", method = "gist") Query executed: CREATE INDEX "meuse_geom_idx" ON "rpostgis_demo"."meuse"USING GIST ("geom"); -- The function dbAsDate allows us to convert a time stored as 'TEXT' column into a 'TIMESTAMP WITH TIME ZONE' type.We can demonstrate this with the 'example_time' column, which we know was added in R to represent times in the "Europe/Amsterdam" time zone: > dbAsDate(conn, c("rpostgis_demo", "meuse"), "example_time", tz = "Europe/Amsterdam") Query executed: The R Journal Vol.

Summary
Linking R with data storage services such as relational databases has become essential for sound data and project management.However, some users may be hesitant to adopt them for their projects, as databases (and SQL) can present a significant learning curve, translating data between R and databases is not always straightforward, or users may simply be unaware of free and open-source software such as PostgreSQL.In this paper, we presented rpostgis, an extension of the package RPostgreSQL, which we hope will provide a gateway to databases for current users of R, especially those working with spatial datasets.To summarize, rpostgis provides methods for transferring spatial data (vector and raster) between R and PostgreSQL/PostGIS, introduces a "data frame mode" for storing attributes of "data.frame"columns, and provides "SQL wrapper" functions for database management and maintenance, which can be also useful as learning tools for users unfamiliar with SQL and databases.
While we consider the base functionalities of rpostgis to be complete, we do foresee further development and evolution in several areas, and welcome collaboration to develop these.Providing a user-friendly R interface to the large library of PostGIS functions would be one area of great usefulness for spatial analysis.The "data frame mode" implementation is another area where progress could be made: for example, handling of R type attributes beyond "factor" and "POSIX*" types could be implemented in a modular fashion in dbWriteDataFrame and dbReadDataFrame, and requests and contributions are welcome.Likewise, more specific translations between R data types and PostgreSQL data types would be helpful improvement on current methods (e.g., translating R "integer" or "numeric" types to any of 10 PostgreSQL numeric data types); note that this work may be better suited for a generic package such as RPostgreSQL.
We also encourage users working with customized data classes for potentially large data sets in R to consider writing translations to a corresponding database model.For example, a related "extension" package of rpostgis is rpostgisLT (Dukai et al., 2017).This package specifically translates animal trajectory data objects stored as "ltraj" from the package adehabitatLT (Calenge, 2006) into a customized data model ("pgtraj") built and managed by rpostgisLT.This not only provides a useful storage option for potentially large animal tracking datasets; it also opens up the world of PostgreSQL/PostGIS to scientists working with trajectory datasets, who may previously have only used R and/or desktop GIS.
These solutions also allow R users to better share data with other users or applications, since databases provide a common data location and format for interoperabilty with other software.While it is beyond the scope of this paper, there are many other attractive aspects of databases (functions/triggers, automated backup and restore, multi-user control, user-friendly data sharing and visualizations through front-end server applications) which not only can make R users more efficient, but help broaden the reach of their projects.Finally, for scientists interested in reproducible science and research, databases provide a framework to script and automate a variety of data cleaning, summary, and analysis processes, as well as a stable, long-term storage solution.
To conclude this paper, we can let our example database speak, to show all GEOMETRY and RASTER columns created in the examples, and finally close the database connection by using RPostgreSQL::dbDisconnect, as it is good practice to discard all pending work and free up database The R Journal Vol.10/1, July 2018 ISSN 2073-4859
schema: dbSchema > dbSchema(conn, name, display = TRUE, exec = TRUE) Special privileges: dbSchema requires 'CREATE' privileges in the database.As previously mentioned, schemas are named storage partitions of a database, containing collections of other database objects.The function dbSchema creates a new schema name, or checks if it exists in the database: > dbSchema(conn, "rpostgis_demo") Query executed: CREATE SCHEMA "rpostgis_demo"; [1] TRUE The R Journal Vol.10/1, July 2018 ISSN 2073-4859 conn, name, date = "date", tz = NULL, display = TRUE, + exec = TRUE) contains a summary of all data transfer functions, including for non-spatial data).Outside of conn and name (discussed previously), there are several other arguments re-used across functions, which have consistent default values:• geom: this is the column name in the PostGIS table containing a Geometry or Geography data type.Defaults to "geom".
• rast: this is the column name in the PostGIS table containing a Raster data type.Defaults to "rast".•overwrite: in writing to database functions, this defaults to FALSE; overwrite = TRUE allows the user to delete ('DROP') the existing table and create a new one.

Table 1 :
Functions for data transfer between R and PostgreSQL/PostGIS in rpostgis.
Buffered points (as polygons) from meuse dataset, created and loaded from a PostGIS query.
Special privileges: pgWriteRast requires 'CREATE' privilege in schema.The pgWriteRast function sends R rasters to a new database table.We can demonstrate this loading the meuse.griddata set into a "SpatialPixelsDataFrame" object: TABLE IF EXISTS "public"."meuse_df"; Vacuuming a table is a maintenance operation that cleans out unused space in the database structure, generally due to recent changes to a table (adding or deleting rows, updating data, etc.).Another operation (analyze) is often performed along with a vacuum, and is used to update statistics about the table, which PostgreSQL uses to optimize queries.Both vacuum and analyze are optional operations, but recommended for tables that are frequently modified, as they can greatly improve query speeds.Note that PostgreSQL automatically vacuums (and analyze) databases with the autovacuum daemon.It is generally preferable to tune the daemon for automatic use, but dbVacuum provides a way to run the operation manually.The default behavior for dbVacuum is to both vacuum and analyze a table:The dbDrop function can be used to remove tables or schemas-just remember that this process is irreversible!Here we drop a table from the 'public' schema, which we created in previous examples: Finally, another "db*" function, dbTableInfo, returns a set of descriptive information about an existing database table, notably column names and types.This may be useful for reference prior to importing a table into R: