Ten things I learned making maps in R
I made choropleth maps of some New Zealand census data for a client. I needed to produce a dozen variations of the same basic map so I decided use R code to generate them.
It’s reasonably easy to make simple choropleth maps in R and there are lots of tutorials out there (eg this one by Nathan Yau). I also wanted to include a few geographic features on the maps (main roads, lakes, rivers), and some place labels. Something like these maps that I worked on with Figure.NZ, but without the interactivity.
For greatest control, I decided to use base graphics. I know about ggmap, but when I tried that previously I found it to be a bit slow to plot complex maps and tricky to style the maps exactly the way I want.
It turned out to be somewhat challenging to produce a map that looked good and was easily readable. The difficulties I encountered were partly due to my lack of experience with mapping in R, and because mapping functions are scattered across a number of R packages that aren’t all designed to work nicely together.
In case it’s helpful to someone, here are ten things I learned along the way.
- For handling spatial data, the Simple Features (sf) package is really cool because it represents spatial data as standard data frames with geometry data attached. This makes it easy to manipulate the data, using dplyr verbs and other Tidyverse packages. The older sp package uses custom spatial data objects that are a bit harder to work with.
- Plotting complex sf data to a PNG file seems to be much faster than plotting it on screen in RStudio.
- sf makes it really easy to join spatial and non-spatial data to make a choropleth map. It’s simply a matter of using something like dplyr’s
left_joinon the spatial and non-spatial data, assuming you have a common ID number or something in both datasets to use for the join.
- R plot output seems to lack more advanced anti-aliasing methods, which can make small features on maps hard to read and lines look jaggy. I got around this to some extent by outputting PNG files at very high resolution (eg 16,000 x 16,000 pixels) and scaling them down afterwards. But the results still don’t look as crisp and smooth as what you can get with Mapbox.
- Text labels on maps are often easier to read if they have a contrasting “halo” around them. The raster package has a text function for doing exactly this. Unfortunately, this expects a
Spatial*object as input (ie something from the sp package), not an sf data frame. Fortunately, sf provides the
as_Spatial()function for translating sf data frames into
- For some reason,
as_Spatial()expects a spatial features collection (sfc) data frame as its input. In my case I had a standard sf data frame of point locations and labels. To convert this to a
Spatial*object, I needed to use
as_Spatial(st_geometry(<my data frame>)). It took me a long time to figure out the intermediate
- Some, but not all, of the sf functions are prefixed with
sf_. This was a real brain-block for me when trying to remember sf functions.
- The maptools package provides a
pointLabel()function for arranging labels on a map in such a way that they don’t overlap too much. In theory this is very useful but I couldn’t get it to work well — it was pushing my labels so far away from their actual location that they became meaningless. I also couldn’t get
text()to work together so I could have non-overlapping map labels with halos.
- I wanted to put my map’s legend in the plot’s outer margin so that it wouldn’t overlap any important content in the map. That proved to be tricky. You can put a legend in the outer margin, if you set
xpd = NAwhen you call
legend(), but you also have to specify the x, y coordinates of the legend in terms of the coordinates used for the plot itself. In my case, since the different outputs were of varying geographic areas, it was tricky to work out what these coordinates should be for each map and get the legend in the right place automatically. I ended up putting the legend in the top left corner (using
x = "topleft"in
legend()) where it wouldn’t overlap too much map content.
- By default, R plots expand the horizontal and vertical axes 4% beyond the range of your data. This applies to plots of sf data frames too, so if you want to plot a particular geographic region, by default you will actually get something slightly larger. You can prevent this using
par(xaxs = “i”, yaxs = “i”). (Many times I have cursed
parwith its cryptic options).