With the proliferation of sensors and the ease of data collection from online sources, large datasets have become the norm in many scientific disciplines, and efficient data storage, management, and retrival is imperative for large research projects. Relational databases provide a solution, but in order to be useful, must be able to be linked to analysis and visualization tools, such as R. Here, we present a package intended to facilitate integration of R with the open-source database software PostgreSQL, with a focus on its spatial extension, PostGIS. The package rpostgis (version 1.4.1) provides methods for spatial data handling (vector and raster) between PostGIS-enabled databases and R, methods for R "data.frame"
s storage in PostgreSQL, and a set of convenient wrappers for common database procedures. We thus expect rpostgis to be useful for both (1) existing users of spatial data in R and/or PostGIS, and (2) R users who have yet to adopt relational databases for their projects.
R has become an important tool for manipulating, analyzing, and displaying spatial (vector and raster) datasets. R has a growing set of contributed packages (see the CRAN Spatial task view). R can already import and export stand-alone vector (e.g, ESRI shapefiles) with relative ease using rgdal (Bivand et al. 2017) or maptools (Bivand and Lewin-Koh 2017), and raster datasets using raster (Hijmans 2016). With the proliferation and availability of geographic data from deployed sensors, open-GIS web data sources, and remotely-sensed environmental datasets (to name just a few), users are increasingly taking advantage of Database Management Systems (DBMS) with spatial database extenders, which store spatial data as an object in a database table column. DBMS are especially useful to researchers and scientists managing project datasets, due to the ability to store large amounts of data efficiently, specify data relationships among tables, maintain data integrity using constraints, ensure long-term storage, manage multiple users, and add, update, and retrieve data efficiently.
PostgreSQL (The PostgreSQL Global Development Group 2017) is an advanced relational DBMS, and it is also free and open-source, making it popular and accessible to a large base of users. The PostgreSQL extension PostGIS (PostGIS Project Steering Committee 2017) allows PostgreSQL to handle spatial data by introducing spatial data types (Geometry, Geography, Raster, and Topology), along with a library of functions which operate on objects of these types. For R users, the package RPostgreSQL (Conway et al. 2017), a backend for the generic R database interface package DBI (R Special Interest Group on Databases (R-SIG-DB) et al. 2017), provides the driver and methods to connect and interface with a PostgreSQL database, to do a variety of data import and export tasks, and to query the database. However, not all data types supported by PostgreSQL can be imported into equivalent R data types using RPostgreSQL; among these are the PostGIS spatial data types. In addition, some data types in R possess attributes (e.g., POSIX* time zones), which are not handled by RPostgreSQL reading and writing. Finally, while RPostgreSQL is an essential tool for users of PostgreSQL databases and R, it is not designed to facilitate new database users, who may not be familiar with database techniques, terminology, or Structured Query Language (SQL), the language used to interact with a DBMS.
Noting these gaps, we designed
rpostgis as a
general-purpose extension package for RPostgreSQL. The primary
goal of rpostgis is to provide spatial data handling between R and
PostGIS, and as such, it includes methods for bi-directional transfer of
both vector and raster spatial datasets between R and PostGIS, spatial
projection handling, and other PostGIS-related tools. In addition, the
package also features methods for users who wish to work primarily (or
exclusively) in R, including saving of R "data.frame"
s (including data
types and attributes) to PostgreSQL, and “SQL wrapper” functions which
execute common database procedures. The latter provide not only
convenient, script-able access to these procedures through R, but also
can function as learning tools for new users of DBMS and SQL. In this
paper, we explain and demonstrate the purpose and usage of
rpostgis functions within a general workflow, following the three
focal areas described above (PostGIS-specific, "data.frame"
handling,
and SQL wrappers).
While it is far beyond the scope of this paper to provide a complete introduction to PostgreSQL, it is important to introduce several basic features of the DBMS, and the common terms we use to refer to them, especially concerning their usage in rpostgis.
A PostgreSQL install creates a PostgreSQL server, which is the host of one or more databases. Databases within a server are self contained (data cannot be shared across databases). Users interact with the database server using Structured Query Language (SQL) to write queries. Within a database, there are one or more named schemas, which are collections of database objects. In PostgreSQL, on new database creation, a default schema is created in the database with the name public.
Schemas can be used to organize objects in a database. Schemas can
contain many different objects, including functions, data types, and
sequences; the objects which users of rpostgis should be most
aware of are tables and views. Tables store data, while standard views
only store an SQL query, that when executed, creates a virtual table (in
memory), using data from other tables or views. There is no functional
difference between how tables and views (which both fall under the
umbrella term relation) are referred to within a query, with the
convention being schema_name.relation_name
. Relations typically
contain one or more columns, and each column has a defined data type
(e.g., integer, character, timestamp with time zone), and optionally a
defined sub-type, that specifies what type of data the column can
contain. A column can also store records representing a spatial type,
which is described in the next section. Relations have 0 or more
records, or rows, which contain the data.
In rpostgis, most functions perform an action on a relation.
Following RPostgreSQL convention, the relation name is supplied to
the argument name
, which can be given either a length-one or
length-two character vector. One-length character elements are
interpreted as a relation name, and looked for in the default user
schema(s) (by default, public
). Length-two character vectors are
interpreted as schema and relation name, e.g.,
name = c("schema_name","table_name")
. It is generally recommended to
provide both schema and relation name, since identical relation names
can be re-used across schemas. For ease of reading, from this point on
we refer to relations (tables and views) generally as “tables”
throughout this paper.
This brief summary should provide the new user with enough terminology to understand this paper, and begin working with databases using rpostgis. New users are encouraged to learn more about PostgreSQL from its excellent online documentation (https://www.postgresql.org/docs/manuals/).
Vector spatial data in PostGIS tables are stored as GEOMETRY
(planar) or GEOGRAPHY (spherical) types in a column, and usually
have a specified sub-type, such as POINT, LINESTRING, or
POLYGON. Each record of a GEOMETRY/GEOGRAPHY column represents one
spatial feature, which represents one or more geometric objects. Records
containing multiple geometric objects in one feature are specified as a
MULTI sub-type (e.g., MULTIPOLYGON). Geometry and geography
columns can also store a spatial reference identifier (SRID)
attribute, which is an integer value referring to the spatial reference
system (the “projection”) of the spatial data. Since spherical data are
represented using geographic longitude/latitude coordinates on the
sphere (specifically, the spheroid defined by WGS 84), the Geography
data type is restricted to SRID = 4326
.
Raster spatial data are stored in PostGIS using the RASTER data type. Raster columns also can store an SRID attribute defining the projection of the raster. One raster record can store one or more bands (rasters with identical spatial coverage but different data). In PostGIS, it is common to store one raster dataset in one PostGIS table, but split the raster into multiple tiles. A raster tile is one rectangular spatial subset, or block, of a raster dataset. One tile of a PostGIS raster type corresponds with one record in the corresponding table. Storing rasters in this way allows for more efficient spatial queries on the raster, by working on subsets of the raster.
Geometry, geography, and raster data types can all be exported from PostGIS using a variety of supplied export functions. In addition, the PostGIS library contains a large set of GIS functions to manage, construct, and edit Geometry objects, as well as measure spatial relationships between geometries, making it a full-featured GIS system. However, PostgreSQL/PostGIS is not packaged with a native software to visualize spatial data, so most users employ 3rd-party software to access the database whenever visualization is needed. For spatial data, desktop GIS systems like ArcGIS (ESRI 2017) and QGIS (QGIS Development Team 2017) can load spatial data directly as layers in the mapping environment, where users can take advantage of tools available in those software.
Because of its advanced statistical, processing, and visualization
capabilities, many users employ R as a front-end for their database
systems. In R, the long-time standard for handling vector spatial
objects are "Spatial*"
classes (e.g., "SpatialPoints"
,
"SpatialMultiPoints"
, "SpatialLines"
, and "SpatialPolygons"
and
their "Spatial*DataFrame"
variants) provided by the package
sp (Pebesma and Bivand 2005). Likewise the
raster package provides standard methods for handling raster
datasets in R, in "Raster*"
classes (e.g., "RasterLayer"
for
single-band, "RasterBrick"
or "RasterStack"
for multi-band rasters).
There are several notable packages and utilities that assist in transfer
of spatial data between PostGIS and sp "Spatial*"
-objects or
"Raster*"
-objects in R:
rgdal : The R package providing bindings to GDAL (Geospatial
Data Abstraction Library) provides the functions readOGR
and
writeOGR
to read, and write, respectively, PostGIS tables with
GEOMETRY
columns and R "Spatial*"
objects. A limitation of
readOGR
is that it cannot query the database table to obtain only
a subset of the table; similarly writeOGR
can only write new
tables (or overwrite existing ones); it does not allow writing to
existing database tables. The rgdal package also provides
methods for reading rasters from PostGIS tables with readGDAL
,
though GDAL writing of rasters to PostGIS tables is (as of writing)
not available, but potentially still under development
(https://trac.osgeo.org/gdal/wiki/frmts_wtkraster.html). One major
drawback of this solution is that the PostGIS driver for rgdal
is not included by default on Windows operating systems, meaning
most Windows users cannot use these methods without manually
installing rgdal from source.
rgeos (Bivand and Rundel 2017):
The functions readWKT
and writeWKT
provide conversion between
the WKT format and "Spatial*"
-objects. WKT format “well-known
text” is a standardized, text-based version of a vector geometry,
which can be written and read using PostGIS functions.
wkb (TIBCO Software Inc. 2016): The R
package offers the functions readWKB
and writeWKB
functions,
which convert vector geometries to and from the WKB format
(“well-Known binary”).
Alternatively, it is possible to use system calls to command-line utilities such as ogr2ogr, shp2pgsql or raster2pgsql, which brings the full power of these utilities at the expense of additional complexity, especially for scripting: This solution mixes two syntaxes (R syntax and the one of the command-line utility) and requires the passing of passwords at every call.
Read and write functions from both rgeos and wkb are
utilized in rpostgis import/export functions to provide the
translations of geometries from and to R’s "Spatial*"
objects. The
rpostgis functions wrap additional functionality around these
low-level functions in the vector data export (pgInsert
) and import
(pgGetGeom
) functions, to ease data transferability and managed data
stored alongside vector geometries in R "data.frame"
s or PostGIS
tables.
Also of importance is the recently developed package
sf (Pebesma 2017), which provides
Simple Features access in R, for vector spatial data. Simple Features is
an Open Geospatial Consortium (OGC) and International Organization for
Standardization (ISO) standard for storing and accessing geometry
objects, and is used by many RDBMS, including PostGIS. Like sp,
sf provides standardized, comprehensive spatial data handling in
R, notably using a "data.frame"
with a geometry-list column to store
Simple Feature objects, instead of the various "Spatial*"
classes
utilized in sp. The sf package also provides its own
functions for reading and writing geometries from PostGIS databases,
both through GDAL (using st_read
/st_write
) and directly (using
st_read_db
/st_write_db
). At the time of writing, rpostgis
reads from and writes to sp-class "Spatial*"
objects, though we
anticipate a migration to sf-class objects as they become the new
standard for vector spatial data in R.
A proper administration of a database, especially in a multi-user
context, requires a consideration of the permissions given to users to
access, write, or modify the database and its various objects. While it
is beyond the scope of this paper to provide an exhaustive presentation
of the issue, it is worth introducing the basic concepts from a
PostgreSQL perspective: permissions on a database object are called
“privileges,” and there are several different ones, allowing to access
(SELECT
) or modify them (INSERT
, UPDATE
), or even to create or
delete them (CREATE
and DELETE
, respectively), among others. By
default, the owner of an object (the user who created the object) and
superusers (generally the DBMS administrator(s)) can do anything with
it: they are granted all privileges on this object. PostgreSQL uses the
functions GRANT
and REVOKE
to assign or revoke privileges of a
certain user on a database object. We refer the interested reader (and
the database administrator) to the official PostgreSQL documentation for
more details about privileges.
Database privileges for a given user are not different by accessing the
database through a connection from R, although it adds a layer of
complexity as there will be R users and PostgreSQL users mixed in a
session (as well as system users, which are distinct from database
users, and we do not consider here purposely). In other words, an R user
could have different privileges on the database and its objects
depending on the PostgreSQL user they use to connect to the database. In
this paper, we will use the default PostgreSQL superuser postgres
,
which thus comes with all privileges granted on all database objects.
This approach is fine for many use cases, especially single-user
databases, but will be limited on multi-user databases, where privileges
are set up at a finer grain. In general, all rpostgis functions
that create, modify, or drop an object (e.g., dbIndex
, dbAsDate
,
dbDrop
) require ownership of the object being modified (so they
require the appropriate CREATE
privileges). In addition, we
specifically indicate special privileges that are necessary for the
different functions of rpostgis when relevant.
In the following two sections, we explain the main functionality and
usage of the rpostgis package. Note that functions within
rpostgis have one of two prefixes: “pg*
” (for PostGIS-oriented
functions, described in this section) and “db*
” (for general
PostgreSQL database functions, described in the next section), emulating
the pattern established by DBI. The general presentation of the
functions consist of (1) a code block presenting the generic version of
the function(s) with all arguments and defaults listed, as well as
specific privileges required, (2) descriptive text about the
function(s), and in most cases (3) example calls of the function(s).
In this section, we use the term “PostGIS” generically to refer to the
PostgreSQL/PostGIS DBMS; that is, PostgreSQL with the PostGIS extension
installed. All usage of PostGIS within R begins with a connection to a
database server and a particular database: here, using
RPostgreSQL::dbConnect
, we connect to the database rpostgis
which
exists on the local computer/server (localhost
):
> library(rpostgis)
> conn <- dbConnect(drv = "PostgreSQL", host = "localhost", dbname = "rpostgis",
+ user = "postgres", password = "postgres_password")
It is not the purpose of this article to detail connection details, so we refer to documentation from DBI and RPostgreSQL for this aspect.
pgPostGIS
> pgPostGIS(conn, topology = FALSE, tiger = FALSE, sfcgal = FALSE,
+ display = TRUE, exec = TRUE)
Special privileges:
pgPostGIS
requires ownership of the database (or superuser role) to install the extension (CREATE EXTENSION
) the first time.
The “starter” function for new users of rpostgis is pgPostGIS
.
This function installs and/or checks the version of PostGIS currently
available on the database. This is the first example of a function in
rpostgis which implements the display
and exec
arguments.
These respectively control printing (to the R console) of the
constructed SQL query, and execution of the query on the database. When
PostGIS is installed and ready to use, pgPostGIS
returns TRUE
. A
standard in rpostgis: is when a function does not return an
object, the function returns TRUE
if the action was successfully
executed in the database. Note that you can also enable the Topology,
Tiger Geocoder, and SFCGAL extensions, with topology
, tiger
, and
sfcgal
arguments set to TRUE
.
pgListGeom
and pgListRast
> pgListGeom(conn, geog = TRUE)
> pgListRast(conn)
These pgList*
functions return information on GEOMETRY/GEOGRAPHY and
RASTER columns stored in any database table. In a PostGIS database,
these information are stored in three views (geometry_columns
,
geography_columns
, and raster_columns
, respectively), all of which
are created during the installation of PostGIS in the public
schema of
the database.
pgSRID
> pgSRID(conn, crs, create.srid = FALSE, new.srid = NULL)
Special privileges:
pgSRID
requiresINSERT
privilege on thespatial_ref_sys
table withcreate.srid = TRUE
.
On installation of PostGIS, a new table spatial_ref_sys
is created in
the public
schema, which stores a large set of spatial reference
systems, each of which has a unique (integer) spatial reference
identifier (SRID), along with specifications of the projection in
PROJ.4 and WKT. The function pgSRID
allows users to check if a
projection they are using in R (stored as a "sp::CRS"
object, which
contains the PROJ.4 representation of the projection) has matching
SRID(s) in PostGIS. If there is no match and create.srid = TRUE
, then
pgSRID
adds it to the spatial_ref_sys
table. In these cases, the
user can also specify a desired SRID with new.srid
; otherwise,
rpostgis uses the next available value between 880001 and 889999.
In the following example, we demonstrate how to find the SRID for the
common WGS 1984 latitude/longitude projection:
> crs <- sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
> pgSRID(conn, crs)
1] 4326 [
In some cases, pgSRID
may return multiple SRIDs, since there are
equivalent projections stored in multiple SRIDs in the spatial_ref_sys
table. Note that pgSRID
is used in spatial data writing functions in
rpostgis, with create.srid = TRUE
.
The following section describes the functions in rpostgis that
transfer spatial data between R and PostGIS databases (Table
1 contains a summary of all data transfer
functions, including for non-spatial data). Outside of conn
and name
(discussed previously), there are several other arguments re-used across
functions, which have consistent default values:
geom
: this is the column name in the PostGIS table containing a
Geometry or Geography data type. Defaults to "geom"
.
rast
: this is the column name in the PostGIS table containing a
Raster data type. Defaults to "rast"
.
overwrite
: in writing to database functions, this defaults to
FALSE
; overwrite = TRUE
allows the user to delete (DROP
) the
existing table and create a new one.
Function | Reads from: | Writes to: |
---|---|---|
pgInsert |
"Spatial*" , "Spatial*DataFrame" , "data.frame" |
new or existing database table (with GEOMETRY or GEOGRAPHY column for spatial objects) |
pgWriteRast |
"RasterLayer" , "RasterBrick" , "RasterStack" , "SpatialPixels" , "SpatialPixelsDataFrame" , "SpatialGrid" , "SpatialGridDataFrame" |
new database table |
dbWriteDataFrame |
"data.frame" |
new database table |
pgGetGeom |
database table/view with GEOMETRY or GEOGRAPHY column | "Spatial*" , "Spatial*DataFrame" |
pgGetRast |
database table/view with RASTER column | "RasterLayer" , "RasterBrick" , "SpatialPixels" , "SpatialPixelsDataFrame" , "SpatialGrid" , "SpatialGridDataFrame" |
pgGetBoundary |
database table/view with GEOMETRY, GEOGRAPHY, or RASTER column | "SpatialPolygons" |
dbReadDataFrame |
database table/view | "data.frame" |
In this section we begin to use example datasets. We first load the
well-known meuse
dataset containing information on environmental
observations from sample points along the Meuse river in the
Netherlands. After loading the data, we create a
"SpatialPointsDataFrame"
object, setting its associated projection
(oblique stereographic for the Netherlands, i.e. EPSG 28992) as the
proj4string
attribute, which is a standardized character
representation of the projection of class "CRS"
:
> library(sp)
> data("meuse")
> meuse <- SpatialPointsDataFrame(meuse[, 1:2], data = meuse[,
+ 3:length(meuse)], proj4string = sp::CRS("+init=epsg:28992"))
> class(meuse)
1] "SpatialPointsDataFrame"
[attr(,"package")
1] "sp" [
> head(meuse@data)
cadmium copper lead zinc elev dist om ffreq soil lime landuse1 11.7 85 299 1022 7.909 0.00135803 13.6 1 1 1 Ah
2 8.6 81 277 1141 6.983 0.01222430 14.0 1 1 1 Ah
3 6.5 68 199 640 7.800 0.10302900 13.0 1 1 1 Ah
4 2.6 81 116 257 7.655 0.19009400 8.0 1 2 0 Ga
5 2.8 48 117 269 7.480 0.27709000 8.7 1 2 0 Ah
6 3.0 61 137 281 7.791 0.36406700 7.8 1 2 0 Ga
dist.m1 50
2 30
3 150
4 270
5 380
6 470
pgInsert
and pgGetGeom
> pgInsert(conn, name, data.obj, geom = "geom", df.mode = FALSE,
+ partial.match = FALSE, overwrite = FALSE, new.id = NULL,
+ row.names = FALSE, upsert.using = NULL, alter.names = FALSE,
+ encoding = NULL, return.pgi = FALSE, df.geom = NULL, geog = FALSE)
Special privileges:
pgInsert
requiresCREATE
privilege in schema for new tables,INSERT
privilege on existing tables, andUPDATE
privilege when usingupsert.using
argument.
As evidenced by the large number of possible arguments, pgInsert
is a
flexible function that aims to provide a variety of methods for PostGIS
INSERT
s, from data originating in R as an object of class
"Spatial*"
, "Spatial*DataFrame"
, or "data.frame"
. The most basic
usage requires just a PostgreSQL connection, a database table name to
insert into, and a data object of one of the three types. In our
example, we also utilize the new.id
argument, providing a new column
name for a sequential ID ("gid"
):
> pgInsert(conn, "meuse", meuse, new.id = "gid")
Creating new table...
Using writeWKB from wkb package...
"public"."meuse" Data inserted into table
1] TRUE [
The data was inserted into the new database table meuse_sp
in the
public
schema, printing out several informative messages, including an
indication that wkb::writeWKB
was used to convert spatial objects to
well-known binary in order to write to the database (if the optional
package wkb is not installed, rgeos::writeWKT
is used instead).
So that is clear to users how and where data is written to the database,
pgInsert
is a fairly verbose function—however, these messages can be
hidden by wrapping the function call within suppressMessages(...)
.
Since pgInsert
can also insert into existing tables, a partial.match
argument allows specifying if all columns in the R object must be
present in the database in order to do the insert. The default is
partial.match = FALSE
, meaning if any column is found in the R object
but not in the database table, an error is printed and no data is
written in the database. Note that the check is
uni-directional—database table columns not found in the R data object
do not affect the insert, except if the database column definition
requires that it have assigned values (e.g., NOT NULL
with no assigned
DEFAULT
value).
> pgGetGeom(conn, name, geom = "geom", gid = NULL, other.cols = TRUE,
+ clauses = NULL, boundary = NULL, query = NULL)
Special privileges:
pgGetGeom
requiresSELECT
on the table, andCREATE
in the schema to create a new view when bothquery
andname
are not null
pgGetGeom
returns vector spatial data from a PostGIS table storing a
Geometry or Geography—the column name of either type can be supplied
to the geom
column. We can retrieve the full meuse
table which we
just wrote to the database, demonstrating the most basic use of the
function:
> meuse.db <- pgGetGeom(conn, "meuse")
in SpatialPoints*-class. Returning Point types
A message prints the type of "Spatial*"
class that is created in R.
The other.cols
argument can be a vector of character names of database
table columns, indicating which (if any) columns to return with the
spatial data. The default is the full table (all columns), but
other.cols = FALSE
returns the spatial data only (i.e., a "Spatial*"
object). The clauses
argument can take additional SQL to modify the
data to take from the table. For instance, using a WHERE
clause allows
filtering results based on specific filter, as long as it is valid SQL
(in this case, indexing the filtering column will significantly speed up
the query; see the section “Create an index: dbIndex
” below).
Conversely, a full SQL query can be provided using the query
argument;
in this case the names
argument can be NULL
, or set to the name of a
(new) view to create in the database using the specified query
. This
is the method to use when it is necessary to combine multiple database
tables and return a geometry, or use PostGIS functions within a query,
which highlights the potential of rpostgis for complex GIS
operations in the database.
For example, we can use the PostGIS function ST_Buffer
to create 100-m
buffers around points in our meuse
table, and then transform the layer
to another projection using ST_Transform
, with SRID = 4326
(WGS
1984, lat-lon). Note that we specified the output geometry in the SQL
query AS geom
, which is the default name for GEOMETRY columns for
pgGetGeom
. This query returns only the original points ID (gid
) and
a GEOMETRY column that can be directly loaded into a "SpatialPolygons"
object using pgGetGeom
. Since we specify gid
as the unique ID for
meuse.buff
(using gid = "gid"
), and there are no other columns of
data to import, there is no data frame associated with this object. We
can then plot the result to highlight the “flat” shape of the buffers
after reprojection (Figure 1):
> query <- "SELECT gid, ST_Transform(ST_Buffer(geom, 100), 4326) AS geom FROM meuse;"
> meuse.buff <- pgGetGeom(conn, name = "meuse_buff", query = query,
+ gid = "gid")
in SpatialPolygons*-class. Returning Polygon types
"public"."meuse_buff". Created view
> par(mar = c(2, 2, 2, 2))
> plot(meuse.buff, axes = TRUE, asp = 1)
pgWriteRast
and pgGetRast
> pgWriteRast(conn, name, raster, bit.depth = NULL, blocks = NULL,
+ constraints = TRUE, overwrite = FALSE)
Special privileges:
pgWriteRast
requiresCREATE
privilege in schema.
The pgWriteRast
function sends R rasters to a new database table. We
can demonstrate this loading the meuse.grid
data set into a
"SpatialPixelsDataFrame"
object:
> data("meuse.grid")
> meuse.grid <- SpatialPointsDataFrame(meuse.grid[, 1:2], data = meuse.grid[,
+ 3:length(meuse.grid)], proj4string = sp::CRS("+init=epsg:28992"))
> gridded(meuse.grid) <- TRUE
> class(meuse.grid)
1] "SpatialPixelsDataFrame"
[attr(,"package")
1] "sp" [
The rpostgis package works with "Raster*"
-class or gridded
"Spatial*"
objects (i.e. "SpatialPixels"
, "SpatialGrid"
, and their
"*DataFrame"
counterparts). We can create directly send meuse.grid
,
a "SpatialPixelsDataFrame"
object, to a new raster table in the
database using pgWriteRast
. Its five variables are converted into
raster “bands” in the PostGIS data structure:
> pgWriteRast(conn, "meuse_rast", meuse.grid)
5 band(s) into 1 x 1 blocks... Splitting
1] TRUE [
Since pgWriteRast
is only for new database tables (or overwrites), the
column storing the raster in the database is fixed to "rast"
. The
function automatically splits the raster into tiles (called “blocks” in
the raster package), with one tile for each record in the table,
and adds automatic raster constraints (when constraints = TRUE
, the
default). Setting raster constraints in PostGIS allows the raster
overview data to be registered correctly in the raster_columns
view,
and include attributes of the raster such as scale, extent, SRID, and
checks on alignment between blocks in the raster table.
In this example case, we see that the raster only includes one block
(\(1 \times 1\)), since meuse.grid
is relatively small (~8,000 cells);
the default target block size in pgWriteRast
is 10,000 cells (e.g., a
\(100 \times 100\) cell raster). Optionally, the user can specify the
exact number of blocks to split the raster into using the blocks
argument set to either a length-one or two integer vector (e.g.,
blocks = 5
splits the raster into \(5 \times 5\) blocks, and
blocks = c(2, 4)
splits the raster into \(2 \times 4\) blocks). Note
that specifying a smaller number of blocks will reduce write time using
pgWriteRast
. Block size should largely be determined by usage within
PostGIS. For example, with larger blocks, PostGIS queries that only
summarize across blocks will be less precise (since more total raster
cells will fall within these blocks), and these queries could also be
slower as a result.
The function pgWriteRast
uses only the R raster package and SQL
queries to create and populate rasters, meaning it is not dependent on
external software or command-line utilities. While this makes it
particularly useful as a cross-platform solution, and for new users who
are not familiar with existing command-line tools for loading rasters
into PostGIS (e.g., raster2pgsql
), it may be less memory-efficient and
take more time to process rasters than these command-line options. Users
who frequently work with very large rasters should keep this
consideration in mind.
> pgGetRast(conn, name, rast = "rast", bands = 1, boundary = NULL)
Special privileges:
pgGetRast
requiresSELECT
privilege on the table.
Loading rasters from the database can be done with pgGetRast
, where
the column to retrieve from defaults to "rast"
. Since we did not
specify any bands
, the default is to return the first band:
> lc.db <- pgGetRast(conn, "meuse_rast")
> summary(lc.db)
Object of class SpatialPixelsDataFrame:
Coordinates
min max178440 181560
x 329600 333760
y : TRUE
Is projected:
proj4string +init=epsg:28992 +proj=sterea +lat_0=52.15616055555555
[+lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000
+ellps=bessel
+towgs84=565.4171,50.3319,465.5524,-0.398957,0.343988,-1.87740,4.0725
+units=m +no_defs]
: 3103
Number of points:
Grid attributes
cellcentre.offset cellsize cells.dim178460 40 78
s1 329620 40 104
s2 :
Data attributes
part.a :0.0000
Min. 1st Qu.:0.0000
:0.0000
Median :0.3986
Mean 3rd Qu.:1.0000
:1.0000 Max.
Multi-band writing and reading of is also possible with pgWriteRast
and pgGetRast
, as illustrated in this example using the RGB
RasterBrick
of the R logo (available in the raster
package as a
raster of \(77 \times 101\) pixels, with three layers for red, green, and
blue values). We also demonstrate the boundary
argument method of
pgGetRast
by providing an extent as four numbers in the raster’s
projection [top, bottom, right, left]. We finally plot the output in
Figure 2:
> rlogo <- raster::brick(system.file("external/rlogo.grd", package = "raster"))
> pgWriteRast(conn, "rlogo", rlogo)
3 band(s) into 1 x 1 blocks... Splitting
1] TRUE [
> just.r <- pgGetRast(conn, "rlogo", bands = TRUE, boundary = c(60, 0, 95, 35))
> just.r
: RasterBrick
class : 60, 60, 3600, 3 (nrow, ncol, ncell, nlayers)
dimensions : 1, 1 (x, y)
resolution : 35, 95, 0, 60 (xmin, xmax, ymin, ymax)
extent : +proj=merc +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
coord. ref. : in memory
data source : red, green, blue
names : 0, 0, 1
min values : 255, 255, 255 max values
> raster::plotRGB(just.r)
The R data class, proj4string
attribute, and band/column names of R
raster objects are also saved in the raster table, and are imported when
pgGetRast
re-creates the raster in R. The boundary
argument can also
be specified as a "Spatial*"
object, where the raster is clipped to
the bounding box of the object. This can greatly increase the speed of
the import, and is recommended when working with large rasters.
Users should note that "Raster*"
-class data types are handled natively
within the rpostgis raster functions. "Spatial*"
objects are
converted to/from "Raster*"
s within the function execution; as such,
there may be a performance benefit to working with "Raster*"
objects.
pgGetBoundary
> pgGetBoundary(conn, name, geom = "geom", clauses = NULL)
Special privileges:
pgGetBoundary
requiresSELECT
privilege on the table.
The function pgGetBoundary
can return the outer bounding box of all
spatial data (or a subset using clauses
) in a PostGIS table, and the
geom
argument can actually take the column name of a Geometry,
Geography, or Raster column. A single feature in a "SpatialPolygons"
object is returned in the original projection of the spatial data, as in
this example using the meuse_dist
raster table previously created:
> (m.bound <- pgGetBoundary(conn, "meuse_rast", "rast"))
: SpatialPolygons
class : 1
features : 178440, 181560, 329600, 333760 (xmin, xmax, ymin, ymax)
extent : +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889
coord. ref. +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel
+towgs84=565.2369,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812
+units=m +no_defs
pgMakePts
and pgMakeStp
> pgMakePts(conn, name, colname = "geom", x = "x", y = "y", srid,
+ index = TRUE, display = TRUE, exec = TRUE)
>
> pgMakeStp(conn, name, colname = "geom", x = "x", y = "y", dx = "dx",
+ dy = "dy", srid, index = TRUE, display = TRUE, exec = TRUE)
These are specialized functions that can be used to apply a new Geometry
column to an existing database table, and are of particular use for
managing sequential location data tables (i.e., trajectories). While
pgMakePts
requires only columns storing x
and y
coordinates to
create a POINT GEOMETRY
column, pgMakeStp
also needs increments
along the x/y dimensions to the next point, and builds the line segments
connecting those points into a LINESTRING GEOMETRY
column.
"data.frame"
s in PostgreSQL: dbWriteDataFrame
and dbReadDataFrame
> dbWriteDataFrame(conn, name, df, overwrite = FALSE, only.defs = FALSE)
> dbReadDataFrame(conn, name, df = NULL)
Special privileges:
dbWriteDataFrame
requiresCREATE
privilege in schema.
Data frame columns have attributes which store important information
about the class of data in the column, as well information specific to
that class. However, when writing a "data.frame"
object to databases
(e.g, using RPostgreSQL::dbWriteTable
), these attributes are not
carried over. For users who want to work in R, store data in a database,
and then wish to retrieve it in the same state that it was prior to
storing, rpostgis introduces a “data frame mode” for its data
writing and reading functions.
The first time it is used in a schema, data frame mode creates a new
database table named .R_df_defs
to store column types and attributes.
Within the data table itself, data frame mode adds a column to store the
"data.frame"
row names (in .R_rownames
), and another to act as the
database table’s primary key (in .df_pkid
), and provide sorting for
the "data.frame"
. Note that since .R_df_defs
and .df_pkid
are only
intended for internal rpostgis usage, we begin them with the
non-standard “.” to function as a flag to users of the database. It is
not recommended to name tables commonly referred to by users in this
way, since non-standard naming of tables and column identifiers will
require that they be double-quoted whenever they are used in a query.
When the table is imported into R using rpostgis reading functions
(dbReadDataFrame
or pgGetGeom
), these metadata are applied to the
resulting data frame. Currently (rpostgis version 1.4.0), the only
data type attributes which are stored and re-created using data frame
mode are time zones of "POSIXct"
objects, and levels of factors
(including ordered factors). However, handling of specific attributes
for other data types can be added in a modular way to the main data
frame mode functions (dbWriteDataFrame
and dbReadDataFrame
).
We can demonstrate the utility of data frame mode with a simple example
on the dataset meuse
, this time as a data frame. To further
demonstrate the utility of data frame mode, we first add a date-time
("POSIXct"
) column to meuse
:
> data("meuse")
> meuse$example_time <- seq(as.POSIXct("2010-01-01 12:00", tz = "Europe/Amsterdam"),
+ by = "1 day", length.out = nrow(meuse))
> head(meuse$example_time)
1] "2010-01-01 12:00:00 CET" "2010-01-02 12:00:00 CET"
[3] "2010-01-03 12:00:00 CET" "2010-01-04 12:00:00 CET"
[5] "2010-01-05 12:00:00 CET" "2010-01-06 12:00:00 CET" [
First, we can demonstrate the base writing and reading of this table
using RPostgreSQL, in particular the functions dbWriteTable
and
dbReadTable
:
> RPostgreSQL::dbWriteTable(conn, "meuse_base", meuse)
1] TRUE [
> meuse.base <- RPostgreSQL::dbReadTable(conn, "meuse_base")
> all.equal(meuse, meuse.base)
1] "Component \"ffreq\": 'current' is not a factor"
[2] "Component \"soil\": 'current' is not a factor"
[3] "Component \"lime\": 'current' is not a factor"
[4] "Component \"landuse\": 'current' is not a factor"
[5] "Component \"example_time\": Mean absolute difference: 21274.84" [
The comparison with all.equal
indicates that four of the columns that
were factors in meuse
are no longer factors, and there is a time
difference between the example_time
columns in the two data frames. In
this case, times in meuse
were set to the time zone
“Europe/Amsterdam”, while our database (and R base) time zone is
“America/New_York”. The functions dbWriteTable
and dbReadTable
ignore time zone specifications in a "data.frame"
, assuming all
timestamps are in the local computer’s time zone. In case of mismatch
between the data and the user time zone, this results in incorrect
timestamps in both the database table (meuse_base
), and the recreated
R data frame (meuse.base
), as is the case here.
Using data frame mode methods, meuse
can be sent to the database, and
re-created in R, with no alterations:
> dbWriteDataFrame(conn, "meuse_df", meuse)
created ("public".".R_df_defs"). New R data frame definitions table
1] TRUE [
> meuse.df <- dbReadDataFrame(conn, "meuse_df")
> all.equal(meuse, meuse.df)
1] TRUE [
Since data frame mode writes column attributes (e.g., factor levels)
based on the data exactly as it appears in an R "data.frame"
,
dbWriteDataFrame
can only be used for new database tables (or
overwrite of existing ones). Data frame mode is also available for
"Spatial"
data using pgInsert
, with df.mode = TRUE
(the default is
set to FALSE). pgGetGeom
will automatically use "data.frame"
definitions (when they exist) for importing "Spatial*"
datasets. For
example, we can again remake meuse
as a "SpatialPointsDataFrame"
,
and send it to the database, overwriting the previous meuse_df
table:
> meuse <- SpatialPointsDataFrame(meuse[, 1:2], data = meuse[,
+ 3:length(meuse)], proj4string = sp::CRS("+init=epsg:28992"))
> pgInsert(conn, "meuse_df", meuse, df.mode = TRUE, overwrite = TRUE)
Using writeWKB from wkb package...
: Query executed
"public"."meuse_df"; DROP TABLE IF EXISTS
"public"."meuse_df" Data inserted into table
1] TRUE [
> meuse_df <- pgGetGeom(conn, "meuse_df")
in SpatialPoints*-class. Returning Point types
> all.equal(meuse, meuse_df)
1] TRUE [
Using “data frame mode” with a "Spatial*"
class object with pgInsert
also saves the projection (as a proj4string
attribute) of the spatial
object in the database. This allows for equivalent re-creation of the
meuse
dataset in this example, since pgGetGeom
re-applies this
proj4string
when importing the table meuse_df
.
A collection of “db
”-prefixed functions are available to add, remove,
alter, and manage objects in the database, and SQL users will be
familiar with the keywords in the function and attribute names. These
functions share several common features:
The previously discussed conn
, name
, display
, and exec
arguments;
Each returns TRUE
when the statement was successfully executed on
the database.
In the following section, we present a short discussion of each function’s purpose and a usage example.
dbSchema
> dbSchema(conn, name, display = TRUE, exec = TRUE)
Special privileges:
dbSchema
requiresCREATE
privileges in the database.
As previously mentioned, schemas are named storage partitions of a
database, containing collections of other database objects. The function
dbSchema
creates a new schema name
, or checks if it exists in the
database:
> dbSchema(conn, "rpostgis_demo")
: Query executed
"rpostgis_demo"; CREATE SCHEMA
1] TRUE [
Note that all schema, table, and column names are applied in the database exactly as they are given in the rpostgis functions, including capitalization, special characters, digits, etc. In PostgreSQL, however, object identifiers need to start with a lowercase letter (a-z) or an underscore (_), and contain letters, underscores, and digits (0-9). Any object that does not follow these rules must be quoted (using double quotation marks) when used in SQL queries. For that reason, all object names used in rpostgis (as well as RPostgreSQL) are quoted by default in the SQL queries that it builds. Despite this, it is good practice to use object names that do not require quoting, including column names in any data frames sent to the database; using this approach facilitates SQL query writing.
To demonstrate the use of schemas, we insert meuse
into the
rpsostgis_demo
schema, only with a slight modification of the time
variable: example_time
is converted to character as to avoid incorrect
specification of the time zone, and is stored as text type in the
database.
To work in the rpostgis_demo
schema, we need to specify the schema
name as the first argument in name
. We can also utilize the new.id
argument to add a sequential number column meuse_id
to the new table,
and alter.names
to make sure all column names do not require quoting:
> meuse$example_time <- as.character(meuse$example_time)
> pgInsert(conn, c("rpostgis_demo", "meuse"), meuse, new.id = "meuse_id",
+ alter.names = TRUE)
Creating new table...
-compliant (lowercase and replacing special characters with '_'). Making column names DB
Using writeWKB from wkb package...
"rpostgis_demo"."meuse" Data inserted into table
1] TRUE [
dbComment
> dbComment(conn, name, comment, type = c("table", "view", "schema"),
+ display = TRUE, exec = TRUE)
Commenting is essential to maintaining a well-documented database, and comments can be applied to any database object. Here we apply comments to both the new schema and table we just created:
> dbComment(conn, "rpostgis_demo",
+ comment = "Schema storing example data for the 'rpostgis' paper.",
+ type = "schema")
: Query executed
"rpostgis_demo" IS 'Schema storing example data for the ''rpostgis'' paper.'; COMMENT ON SCHEMA
1] TRUE [
> dbComment(conn, c("rpostgis_demo", "meuse"),
+ comment = "Meuse river example dataset from R 'sp' package.",
+ type = "table")
: Query executed
"rpostgis_demo"."meuse" IS 'Meuse river example dataset from R ''sp'' package.'; COMMENT ON TABLE
1] TRUE [
dbColumn
> dbColumn(conn, name, colname, action = c("add", "drop"), coltype = "integer",
+ cascade = FALSE, display = TRUE, exec = TRUE)
Adding or dropping a table column can be achieved with the dbColumn
function: to demonstrate, we drop the dist
column, so as not to
confuse it with dist_m
:
> dbColumn(conn, c("rpostgis_demo", "meuse"), "dist", action = "drop")
: Query executed
"rpostgis_demo"."meuse" DROP COLUMN "dist" ; ALTER TABLE
1] TRUE [
dbAddKey
> dbAddKey(conn, name, colname, type = c("primary", "foreign"),
+ reference, colref, display = TRUE, exec = TRUE)
Keys are an important element of a database, and are part of a broader
group of elements used to maintain integrity of table data called
constraints. Using dbAddKey
, we can add a primary key (specifying a
non-null column acting as the unique identifier of a single table) or a
foreign key (specifying a column referencing a column in another
“foreign” table, where all values must have a match). Note that
dbAddKey
can build keys on multiple columns (or refer to foreign keys
on multiple columns) using the form colname = c("id1", "id2", "id3")
.
Since we applied a new sequential ID column when we inserted meuse
, we
can now designate it as a primary key:
> dbAddKey(conn, c("rpostgis_demo", "meuse"), "meuse_id", type = "primary")
: Query executed
"rpostgis_demo"."meuse" ADD PRIMARY KEY ("meuse_id"); ALTER TABLE
1] TRUE [
dbIndex
> dbIndex(conn, name, colname, idxname, unique = FALSE, method = c("btree",
+ "hash", "rtree", "gist"), display = TRUE, exec = TRUE)
Indexes are another important database element, as they can store
information which helps queries run efficiently on data which are
commonly referenced (e.g., using a foreign key), sorted (e.g.,
timestamps), or compared to other columns (e.g., geometries). As one
would expect, they are especially important for large tables, and
geometry types (e.g., polygons representing complex boundaries or
lines), and are essential for to efficiently retrieve data with
pgGetGeom
and a WHERE
clause. Note that, similarly to keys,
dbIndex
can build indexes on multiple columns. We can add an index on
our point geometry using the preferred indexing method for GEOMETRY
in
PostGIS: GIST (generic index structure):
> dbIndex(conn, c("rpostgis_demo", "meuse"), "geom", method = "gist")
: Query executed
"meuse_geom_idx" ON "rpostgis_demo"."meuse" USING GIST ("geom"); CREATE INDEX
--
1] TRUE [
dbAsDate
> dbAsDate(conn, name, date = "date", tz = NULL, display = TRUE,
+ exec = TRUE)
The function dbAsDate
allows us to convert a time stored as TEXT
column into a TIMESTAMP WITH TIME ZONE
type. We can demonstrate this
with the example_time
column, which we know was added in R to
represent times in the “Europe/Amsterdam” time zone:
> dbAsDate(conn, c("rpostgis_demo", "meuse"), "example_time", tz = "Europe/Amsterdam")
: Query executed
"rpostgis_demo"."meuse"
ALTER TABLE "example_time" TYPE timestamptz
ALTER COLUMN
USING"example_time"::timestamp AT TIME ZONE 'Europe/Amsterdam';
1] TRUE [
dbVacuum
> dbVacuum(conn, name, full = FALSE, verbose = FALSE, analyze = TRUE,
+ display = TRUE, exec = TRUE)
Vacuuming a table is a maintenance operation that cleans out unused
space in the database structure, generally due to recent changes to a
table (adding or deleting rows, updating data, etc.). Another operation
(analyze) is often performed along with a vacuum, and is used to update
statistics about the table, which PostgreSQL uses to optimize queries.
Both vacuum and analyze are optional operations, but recommended for
tables that are frequently modified, as they can greatly improve query
speeds. Note that PostgreSQL automatically vacuums (and analyze)
databases with the autovacuum daemon. It is generally preferable to tune
the daemon for automatic use, but dbVacuum
provides a way to run the
operation manually. The default behavior for dbVacuum
is to both
vacuum and analyze a table:
> dbVacuum(conn, c("rpostgis_demo", "meuse"))
: Query executed
"rpostgis_demo"."meuse"; VACUUM ANALYZE
1] TRUE [
dbDrop
> dbDrop(conn, name, type = c("table", "schema", "view", "materialized view"),
+ ifexists = FALSE, cascade = FALSE, display = TRUE, exec = TRUE)
The dbDrop
function can be used to remove tables or schemas—just
remember that this process is irreversible! Here we drop a table from
the public
schema, which we created in previous examples:
> dbDrop(conn, "meuse_base")
: Query executed
"public"."meuse_base"; DROP TABLE
1] TRUE [
(
dbTableInfo)> dbTableInfo(conn, name, allinfo = FALSE)
Finally, another “db*
” function, dbTableInfo
, returns a set of
descriptive information about an existing database table, notably column
names and types. This may be useful for reference prior to importing a
table into R:
> dbTableInfo(conn, c("rpostgis_demo", "meuse"))
column_name data_type is_nullable1 meuse_id integer NO
2 cadmium double precision YES
3 copper double precision YES
4 lead double precision YES
5 zinc double precision YES
6 elev double precision YES
7 om double precision YES
8 ffreq text YES
9 soil text YES
10 lime text YES
11 landuse text YES
12 dist_m double precision YES
13 example_time timestamp with time zone YES
14 geom USER-DEFINED YES
character_maximum_length1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 NA
12 NA
13 NA
14 NA
Linking R with data storage services such as relational databases has
become essential for sound data and project management. However, some
users may be hesitant to adopt them for their projects, as databases
(and SQL) can present a significant learning curve, translating data
between R and databases is not always straightforward, or users may
simply be unaware of free and open-source software such as PostgreSQL.
In this paper, we presented rpostgis, an extension of the package
RPostgreSQL, which we hope will provide a gateway to databases for
current users of R, especially those working with spatial datasets. To
summarize, rpostgis provides methods for transferring spatial data
(vector and raster) between R and PostgreSQL/PostGIS, introduces a “data
frame mode” for storing attributes of "data.frame"
columns, and
provides “SQL wrapper” functions for database management and
maintenance, which can be also useful as learning tools for users
unfamiliar with SQL and databases.
While we consider the base functionalities of rpostgis to be
complete, we do foresee further development and evolution in several
areas, and welcome collaboration to develop these. Providing a
user-friendly R interface to the large library of PostGIS functions
would be one area of great usefulness for spatial analysis. The “data
frame mode” implementation is another area where progress could be made:
for example, handling of R type attributes beyond "factor"
and
"POSIX*"
types could be implemented in a modular fashion in
dbWriteDataFrame
and dbReadDataFrame
, and requests and contributions
are welcome. Likewise, more specific translations between R data types
and PostgreSQL data types would be helpful improvement on current
methods (e.g., translating R "integer"
or "numeric"
types to any of
10 PostgreSQL numeric data types); note that this work may be better
suited for a generic package such as RPostgreSQL.
We also encourage users working with customized data classes for
potentially large data sets in R to consider writing translations to a
corresponding database model. For example, a related “extension” package
of rpostgis is
rpostgisLT
(Dukai et al. 2017). This package specifically translates animal trajectory
data objects stored as "ltraj"
from the package
adehabitatLT
(Calenge 2006) into a customized data model ("pgtraj"
) built and
managed by rpostgisLT. This not only provides a useful storage
option for potentially large animal tracking datasets; it also opens up
the world of PostgreSQL/PostGIS to scientists working with trajectory
datasets, who may previously have only used R and/or desktop GIS.
These solutions also allow R users to better share data with other users or applications, since databases provide a common data location and format for interoperabilty with other software. While it is beyond the scope of this paper, there are many other attractive aspects of databases (functions/triggers, automated backup and restore, multi-user control, user-friendly data sharing and visualizations through front-end server applications) which not only can make R users more efficient, but help broaden the reach of their projects. Finally, for scientists interested in reproducible science and research, databases provide a framework to script and automate a variety of data cleaning, summary, and analysis processes, as well as a stable, long-term storage solution.
To conclude this paper, we can let our example database speak, to show
all GEOMETRY and RASTER columns created in the examples, and finally
close the database connection by using RPostgreSQL::dbDisconnect
, as
it is good practice to discard all pending work and free up database
resources for other users:
> pgListGeom(conn)
schema_name table_name geom_column geometry_type type1 public meuse geom POINT GEOMETRY
2 public meuse_buff geom GEOMETRY GEOMETRY
3 public meuse_df geom POINT GEOMETRY
4 rpostgis_demo meuse geom POINT GEOMETRY
> pgListRast(conn)
schema_name table_name raster_column1 public meuse_rast rast
2 public rlogo rast
> RPostgreSQL::dbDisconnect(conn)
1] TRUE [
rgdal, maptools, raster, RPostgreSQL, DBI, rpostgis, sp, rgeos, wkb, sf, rpostgisLT, adehabitatLT
Databases, Spatial, SpatioTemporal, Tracking
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bucklin & Basille, "rpostgis: Linking R with a PostGIS Spatial Database", The R Journal, 2018
BibTeX citation
@article{RJ-2018-025, author = {Bucklin, David and Basille, Mathieu}, title = {rpostgis: Linking R with a PostGIS Spatial Database}, journal = {The R Journal}, year = {2018}, note = {https://rjournal.github.io/}, volume = {10}, issue = {1}, issn = {2073-4859}, pages = {251-268} }