Let's say you've got a spreadsheet with a metric for every county in a country that you want to map. There's no geo-data in the spreadsheet, so how to visualize it? This post expects some familiarity with QGIS, a free, open source mapping tool.
In my project, I was mapping cantons of the provinces of Ecuador, which are equivalent concepts to counties and states in the US. So below you'll sometimes see me use "counties" or "cantons" interchangeably.
Getting county-level boundaries to use for boundaries
A perfect source of county-level boundaries would appear to be the geoBoundaries project. However, I found two problems with it:
- First, the metadata for the county boundaries didn't include the state name. Or rather, I should say, I looked at the cantons in Ecuador they provided, and the province name was missing. To join with a spreadsheet, the combination of the county name and the state name needs to be used as a key, so this won't work.
- Second, when I compared Ecuador canton boundaries from this source with OpenStreetMap, and they did not always agree where the boundaries were! I was not able to determine which source was most accurate.
So the source of county-level data that I found that worked was OpenStreetMap. To access the data, you can query for the data you want using Overpass Turbo.
To construct your query, a bit of research is required, because the correct query to use can vary by country!
To query in Ecuador, the correct query was:
admin_level=6 and place=county in Ecuador
To find the correct value for
admin_level for your country, consult this this chart. For the United States, the correct value is also "6".
You can also try leaving out
place=county. That data might not be universally there.
After you visually see the result, you can use the "Export" option to save the result in the GeoJSON format. It can then be imported into QGIS under Layer➡️Add Layer➡️Vector Layer
Reality Check Your Data
At this point, I recommend exporting the data back into a CSV file to look at in a spreadsheet app. (I think QGIS has a spreadsheet view for the attribute table, but I need to find it...).
When reviewing the data for the cantons of Ecuador, I found that several lacked the "place=county" tag, and some had the province name missing and at least one had the wrong province given. I was able to correct these issues in OpenStreetMap upstream. That task is beyond the scope here, but you will at least need to resolve any issues in your own copy of the data.
Places in no county?!
There world is an interesting place. There are locations in Ecuador like El Piedrero that are in no "canton" and cantons that that not any province. Depending on where in the world you are mapping, be prepared to handle these cases if they exist there.
A good starting point is a page like the Cantons of Ecuador page that highlights anomalies like that. Look for a similar page for your target country.
The coastlines problem
For counties along a coast, the administrative boundary may extend into the ocean instead of following the coastline. This happened in the map data for Ecuador. That's probably not want you want for mapping.
To solve that, I found a source of coastline data, and downloaded the "Land Polygons" for the related region. I then copied the problematic canton to a new layer, and used "Vector: Geoprocessing Tools: Clip" to clip the canton to only the parts within the coastline, and then restored the result back to the original layer.
Creating a field to join on
Now you've got county-level data in pretty good shape, but want to join it with the spreadsheet data, QGIS needs to join on a single matching field in both data sets.
That field should a combination of the the county name and state name into a unique key. QGIS can help you create this field using the Field Calculator feature which is found within the Attribute Table, accessible via the F6 key.
Use the field calculator to create a new field in OSM data layer using a function like the following. Name it like "county_state" or "canton_province"
replace(title(CONCAT("name", ', ', "is_in:state")), map(' De ', ' de ', ' Del ', ' del ', 'Provincia De ','', 'Provincia Del ',''))
To break that down:
- First, join the canton name and province name with a a comma and space, like "$Canton, $Province"
- "Title case" the result
- Replace any instances of "De" with "de", and "Del" with "del"
- Remove "Province De" and "Provincia Del"
For a US-based project, maybe you need to remove "State of" and lowercase "Of" instead of "of".
Save your changes to the layer.
Opening your spreadsheet data, adding a column
QGIS can open and use spreadsheet data that doesn't have any geospatial data with it by joining on a column like the one we just created above.
- Under the Layer menu, select "Data Source Manager"
- Select "Delimited Text"
- Find the option to specify that the data has no spatial data.
On this layer, you'll also need a column that combines a county and state field. Again, use the "Field Calculator" button found within the Attribute Table for that. In my case, I had canton and province columns, so the formula looked like this:
REPLACE(TITLE(CONCAT("CANTON", ', ', "PROVINCIA")), ' De ', ' de ')
Joining spreadsheet data with map data and fixing data problems
Once you have matching columns in your spreadsheet and your map layer, you can join them. For that:
- Go to "Processing > Toolbox > Join Attributes by Field value"
- Select the map layer as the first layer and select your "county, state" column that you created
- Select the spreadsheet layer as the second layer and select your "county, state" column in that layer.
- There's an option to make a temporary layer for any unmatched counties, which might want to do.
- Press "Run".
Now everything works perfectly. Just kidding! Some of your column names might not have matched. In the data I had from Ecuador, there seemed to be nicknames for cantons, alternate names and at one least typo in the canton name.
Now you might have to enter an iterative process where you find and fix some mismatched names, fix your spreadsheet, try again, find more problems and repeat. Because I'm comfortable with the Linux command line, I saved all the find/replace operations I needed to in a shell script so I could easily repeat them later on new data from the source. If you don't have a way to automate the same find/replace operations again, it could be helpful to at least write them done if you need to repeat them. Here's my script looked like:
On Linux, I could then re-use the fixes later:
cat ecuador-survey-2.csv | ./ecuador-fix-cantons.sh >ecuador-survey-2-fixed-cantons.csv
There's also a more long-winded tutorial for joinining spreadsheet data with geo-spatial data.
I've joined my spreadsheet data with my map data, now what?
If you want to make a choropleth map– to color each county or canton based on a metric in your spreadsheet, then:
- Right click on the Joined map layer and select Properties➡️Symbology.
- For "Value", select the field you want to use. If the field contain values like "5%", you may need to strip out the percent sign with a function like:
- Select the Mode to classify the data you wish, and the number of classes. Consider limiting your classes to five to make the color scheme visually simpler and clearer.
- Press "Classify"
You've done it! For a more visual guide to making choropleth maps see Choropleth map in QGIS based on MS Excel Data. That tutorial also covers how to make a print layout that includes a legend.
- There's a QuickOSM plugin for QGIS that allows you to download OSM data using the Overpass query language directly into QGIS. This would save a step, but I found working with the Overpass Turbo websit efficient, so I haven't tried this. Let me know in a comment below if the plugin works well for you... or doesn't!