In spatial statistics the ability to visualize data and models superimposed with their basic social landmarks and geographic context is invaluable. ggmap is a new tool which enables such visualization by combining the spatial information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps with the layered grammar of graphics implementation of ggplot2. In addition, several new utility functions are introduced which allow the user to access the Google Geocoding, Distance Matrix, and Directions APIs. The result is an easy, consistent and modular framework for spatial graphics with several convenient tools for spatial data analysis.
Visualizing spatial data in R can be a challenging task. Fortunately, the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand 2006; Bivand et al. 2008; Loecher and Berlin School of Economics and Law 2013). Using those methods, one can plot the basic geographic information of (for instance) a shape file containing polygons for areal data or points for point referenced data. However, compared to specialized geographic information systems (GISs) such as ESRI’s ArcGIS, which can plot points, polygons, etc. on top of maps and satellite imagery with drop-down menus, these visualizations can be disappointing. This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 (Wickham 2009, 2010) in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps. The result is an easy-to-use R package named ggmap. After describing the nuts and bolts of ggmap, we showcase some of its capabilities in a simple case study concerning violent crimes in downtown Houston, Texas and present an overview of a few utility functions.
Areal data is data which corresponds to geographical extents with polygonal boundaries. A typical example is the number of residents per zip code. Considering only the boundaries of the areal units, we are used to seeing areal plots in R which resemble those in Figure 1 (left).
While these kinds of plots are useful, they are not as informative as we would like in many situations. For instance, when plotting zip codes it is helpful to also see major roads and other landmarks which form the boundaries of areal units.
The situation for point referenced spatial data is often much worse. Since we can’t easily contextualize a scatterplot of points without any background information at all, it is common to add points as an overlay of some areal data—whatever areal data is available. The resulting plot looks like Figure 1 (right).
In most cases the plot is understandable to the researcher who has worked on the problem for some time but is of hardly any use to the audience, who must work to associate the data of interest with their location. Moreover, it leaves out many practical details: are most of the events to the east or west of landmark \(x\)? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can’t really be answered using these kinds of graphics because we don’t think in terms of small scale areal boundaries (e.g. zip codes or census tracts).
With a little effort better plots can be made, and tools such as maps, maptools, sp, or RgoogleMaps make the process much easier; in fact, RgoogleMaps was the inspiration for ggmap (Becker et al. 2013; Bivand and Lewin-Koh 2013).
Moreover, there has recently been a deluge of interest in the subject of mapmaking in R; Ian Fellows’ excellent interactive GUI-driven DeducerSpatial package based on Bing Maps comes to mind (Fellows et al. 2013). ggmap takes another step in this direction by situating the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. Such a spatial plot resembles Figure 2. Note that map images and information in this work may appear slightly different due to map provider changes over time.
murder <- subset(crime, offense == "murder")
qmplot(lon, lat, data = murder, colour = I('red'), size = I(3), darken = .3)
One advantage of making the plots with ggplot2 is the layered grammar of graphics on which ggplot2 is based (Wilkinson 2005; Wickham 2010). By definition, the layered grammar demands that every plot consist of five components:
a default dataset with aesthetic mappings,
one or more layers, each with a geometric object (“geom”), a statistical transformation (“stat”), and a dataset with aesthetic mappings (possibly defaulted),
a scale for each aesthetic mapping (which can be automatically generated),
a coordinate system, and
a facet specification.
Since ggplot2 is an implementation of the layered grammar of graphics, every plot made with ggplot2 has each of the above elements. Consequently, ggmap plots also have these elements, but certain elements are fixed to map components: the \(x\) aesthetic is fixed to longitude, the \(y\) aesthetic is fixed to latitude, and the coordinate system is fixed to the Mercator projection.1
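The five components can be made explicit in plain ggplot2. The following is a minimal sketch rather than anything taken from ggmap itself: the data frame pts and its values are made up for illustration, and coord_quickmap() is used only as a lightweight stand-in for the Mercator projection that ggmap fixes.

```r
library(ggplot2)

# Hypothetical toy data: made-up longitude/latitude points near Houston
pts <- data.frame(
  lon = c(-95.40, -95.37, -95.35, -95.33),
  lat = c(29.74, 29.76, 29.77, 29.75),
  grp = c("a", "a", "b", "b")
)

p <- ggplot(pts, aes(x = lon, y = lat)) +             # default dataset and aesthetic mappings
  geom_point(aes(colour = grp), stat = "identity") +  # one layer: geom, stat, extra mapping
  scale_colour_discrete(name = "group") +             # scale for the colour aesthetic
  coord_quickmap() +                                  # coordinate system (stand-in, see above)
  facet_wrap(~ grp)                                   # facet specification
```

Calling print(p) renders the plot. In a ggmap plot the first and fourth components are largely decided by the map, which leaves layers, scales and facets to the user.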
The major theoretical advantage of using the layered grammar in plotting maps is that aesthetic scales are kept consistent. In the typical situation where the map covers the extent of the data, in ggmap the latitude and longitude scales key off the map (by default) and one scale is used for those axes. The same is true of colors, fills, alpha blendings, and other aesthetics which are built on top of the map when other layers are presented—each is allotted one scale which is kept consistent across each layer of the plot. This aspect of the grammar is particularly important for faceted plots in order to make a proper comparison across several plots. Of course, the scales can still be tricked if the user improperly specifies the spatial data, e.g. using more than one projection in the same map, but fixing such errors is beyond any framework.
The practical advantage of using the grammar is even greater. Since the
graphics are done in ggplot2 the user can draw from the full range of
ggplot2’s capabilities to layer elegant visual content (geoms, stats,
scales, etc.) using the usual ggplot2 coding conventions. This was
already seen briefly in Figure 2 where the
arguments of qmplot are identical to those of ggplot2’s qplot; much
more will be seen shortly.
The basic idea driving ggmap is to take a downloaded map image, plot
it as a context layer using ggplot2, and then plot additional content
layers of data, statistics, or models on top of the map. In ggmap this
process is broken into two pieces: (1) downloading the images and
formatting them for plotting, done with get_map, and (2) making the
plot, done with ggmap. qmap marries these two functions for quick
map plotting (cf. ggplot2’s ggplot), and qmplot attempts to wrap
up the entire plotting process into one simple command (cf. ggplot2’s
qplot).
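The two-piece workflow and its one-call shortcut might be wrapped as follows. This is only a sketch: the wrapper names two_step_map and one_step_map are hypothetical, the functions assume the ggmap package is installed, and actually calling them needs network access (current Google services additionally require a registered API key).

```r
# Hypothetical wrappers; defining them does not download anything.

# Two-step workflow: download once, then plot the stored raster.
two_step_map <- function(location, zoom = 14) {
  map <- ggmap::get_map(location = location, zoom = zoom)  # (1) download and format
  ggmap::ggmap(map)                                        # (2) plot as the context layer
}

# One-step shortcut: qmap() performs both steps in a single call.
one_step_map <- function(location, zoom = 14) {
  ggmap::qmap(location, zoom = zoom)
}

# Example call (not run here because it needs network access):
# two_step_map("baylor university")
```

Keeping the two steps separate is useful when the same downloaded map serves as the background for several plots, since the image is fetched only once.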
get_map function
In ggmap, downloading a map as an image and formatting the image for
plotting is done with the get_map function. More specifically,
get_map is a wrapper function for the underlying functions
get_googlemap, get_openstreetmap, get_stamenmap, and
get_cloudmademap which accepts a wide array of arguments and returns a
classed raster object for plotting with ggmap.
As the most important characteristic of any map is location, the most
important argument of get_map is the location argument. Ideally,
location is a longitude/latitude pair specifying the center of the map
and accompanied by a zoom argument, an integer from 3 to 20 specifying
how large the spatial extent should be around the center, with 3 being
the continent level and 20 being roughly the single building level.
location is defaulted to downtown Houston, Texas, and zoom to 10,
roughly a city-scale.
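To get a feel for how zoom translates into spatial extent, the longitudinal width of a map image can be estimated with a few lines of arithmetic. The sketch below is not part of ggmap: the helper name lon_span_degrees is hypothetical, and it assumes Web Mercator tiling with 256-pixel tiles and a 640-pixel-wide image.

```r
# Assumption: the whole world is 256 * 2^zoom pixels wide at a given zoom
# level, so each pixel covers 360 / (256 * 2^zoom) degrees of longitude.
lon_span_degrees <- function(zoom, width_px = 640) {
  stopifnot(zoom >= 0, width_px > 0)
  width_px * 360 / (256 * 2^zoom)
}

# Approximate width in degrees of longitude at three zoom levels
spans <- sapply(c(3, 10, 20), lon_span_degrees)
```

Each additional zoom level halves the span, which is why the range from 3 to 20 covers everything from continents to single buildings.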
While longitude/latitude pairs are ideal for specifying a location, they
are somewhat inconvenient on a practical level. For this reason,
location also accepts a character string. The string, whether
containing an address, zip code, or proper name, is then passed to the
geocode function which then determines the appropriate
longitude/latitude coordinate for the center. In other words, there is
no need to know the exact longitude/latitude coordinates of the center
of the map—get_map can determine them from more colloquial (“lazy”)
specifications so that they can be specified very loosely. For example,
since
> geocode("the white house")
        lon      lat
  -77.03676 38.89784
works, "the white house" is a viable location argument. More details
on geocode and other utility functions are discussed at the end of
this article.
In lieu of a center/zoom specification, some users find a bounding box
specification more convenient. To accommodate this form of
specification, location also accepts numeric vectors of length four
following the left/bottom/right/top convention. This option is not
currently available for Google Maps.
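The three accepted forms of location can be told apart by type and length alone. The helper below is a hypothetical illustration of that dispatch, not the code used inside get_map.

```r
# Hypothetical helper: decide how a location argument would be interpreted.
classify_location <- function(location) {
  if (is.character(location) && length(location) == 1) {
    "address"   # would be passed to geocode() to find the center
  } else if (is.numeric(location) && length(location) == 2) {
    "center"    # c(lon, lat), used together with zoom
  } else if (is.numeric(location) && length(location) == 4) {
    "bbox"      # c(left, bottom, right, top)
  } else {
    stop("location must be a string, c(lon, lat), or c(left, bottom, right, top)")
  }
}

classify_location("the white house")
classify_location(c(-95.36, 29.76))
classify_location(c(-95.60, 29.50, -95.10, 30.00))
```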
While each map source has its own web application programming interface
(API), specification of location/zoom in get_map works for each by
computing the appropriate parameters (if necessary) and passing them to
each of the API specific get_* functions. To ensure that the resulting
maps are the same across the various sources for the same
location/zoom specification, get_map first grabs the appropriate
Google Map, determines its bounding box, and then downloads the other
map as needed. In the case of Stamen Maps and CloudMade Maps, this
involves a stitching process of combining several tiles (small map
images) and then cropping the result to the appropriate bounding box.
The result is a single, consistent specification syntax across the four
map sources as seen for Google Maps and OpenStreetMap in Figure
3.
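The stitching step relies on knowing which tiles cover a bounding box. A minimal sketch of that calculation, using the standard Web Mercator tile-numbering formula, is given below; the function names are hypothetical and the code is an illustration of the idea rather than ggmap's internal implementation.

```r
# Tile column/row containing a longitude/latitude point at a given zoom,
# assuming the usual Web Mercator tiling scheme (2^zoom tiles per side).
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# All tiles needed to cover a left/bottom/right/top bounding box.
tiles_for_bbox <- function(bbox, zoom) {
  top_left     <- lonlat_to_tile(bbox[1], bbox[4], zoom)
  bottom_right <- lonlat_to_tile(bbox[3], bbox[2], zoom)
  expand.grid(
    x = top_left[["x"]]:bottom_right[["x"]],
    y = top_left[["y"]]:bottom_right[["y"]]
  )
}

# A box straddling the equator and prime meridian touches all four zoom-1 tiles
tiles <- tiles_for_bbox(c(-10, -10, 10, 10), zoom = 1)
```

The tiles in the resulting grid would then be downloaded, joined into one image, and cropped to the bounding box.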
baylor <- "baylor university"
qmap(baylor, zoom = 14)
qmap(baylor, zoom = 14, source = "osm")
Before moving into the source and maptype arguments, it is important
to note that the underlying API specific get_* functions for which
get_map is a wrapper provide more extensive mechanisms for downloading
from their respective sources. For example, get_googlemap can access
almost the full range of the Google Static Maps API as seen in Figure
4.
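set.seed(500)  # make the jittered points reproducible
# 50 random points scattered around downtown Houston
df <- round(data.frame(
  x = jitter(rep(-95.36, 50), amount = .3),
  y = jitter(rep( 29.76, 50), amount = .3)
), digits = 2)
# pass the points to the Google Static Maps API as both markers and a path
map <- get_googlemap('houston', markers = df, path = df, scale = 2)
ggmap(map, extent = 'device')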
source and maptype arguments of get_map
The most attractive aspect of using different map sources (Google Maps,
OpenStreetMap, Stamen Maps, and CloudMade Maps) is the different map
styles provided by the producer. These are specified with the maptype
argument of get_map and must agree with the source argument. Some
styles emphasize large roadways, others bodies of water, and still
others political boundaries. Some are better for plotting in a
black-and-white medium; others are simply nice to look at. This section
gives a rundown of the various map styles available in ggmap.
Google provides four different familiar types—terrain (default), satellite (e.g. Figure 13), roadmap, and hybrid (e.g. Figure 12). OpenStreetMap, on the other hand, only provides the default style shown in Figure 3.
Style is where Stamen Maps and CloudMade Maps really shine. Stamen Maps has three available tile sets—terrain (e.g. Figures 2 or 13), watercolor, and toner (for the latter two see Figure 5).
qmap(baylor, zoom = 14, source = "stamen", maptype = "watercolor")
qmap(baylor, zoom = 14, source = "stamen", maptype = "toner")
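Because maptype must agree with source, it can help to check a combination before requesting a map. The lookup table below is a hypothetical sketch built only from the styles named above; CloudMade styles are left out because they are not enumerated here, and the helper check_maptype is not a ggmap function.

```r
# Styles named in the text, keyed by the source argument.
known_maptypes <- list(
  google = c("terrain", "satellite", "roadmap", "hybrid"),
  osm    = character(0),  # OpenStreetMap offers only its default style
  stamen = c("terrain", "watercolor", "toner")
)

# TRUE if the named style is listed for the source, FALSE otherwise.
check_maptype <- function(source, maptype) {
  if (!source %in% names(known_maptypes)) stop("unknown source: ", source)
  maptype %in% known_maptypes[[source]]
}

check_maptype("stamen", "watercolor")
check_maptype("google", "watercolor")
```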