Insect Distributions and Biodiversity in the United States
Datasets
For this project, I used insect species occurrence data from the US Department of the Interior and cartographic boundary files from the United States Census Bureau.
The insect species occurrence dataset is tabular, with one identified individual per row. Each row includes the date, coordinates, and scientific name of the individual. The dataset is hosted by the Global Biodiversity Information Facility (GBIF), and the GBIF site was used to filter the occurrences to the 50 states before the data was exported for use in the project. Once cleaned, a shapefile was produced containing only the individuals identified to the species level, their year of occurrence, and the coordinates of the occurrence in WGS84.
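The cleaning step can be sketched roughly as follows. This is a minimal illustration and not the notebook code: it assumes the export is tab-delimited and uses the Darwin Core column names `species`, `taxonRank`, `year`, `decimalLatitude`, and `decimalLongitude`, and the three sample rows are made up.

```python
import csv
import io

# Hypothetical three-row excerpt of a tab-delimited export; the column
# names follow Darwin Core terms and are an assumption about the real file.
SAMPLE_EXPORT = (
    "species\ttaxonRank\tyear\tdecimalLatitude\tdecimalLongitude\n"
    "Apis mellifera\tSPECIES\t2015\t39.15\t-75.52\n"
    "\tGENUS\t2016\t44.98\t-93.26\n"
    "Bombus impatiens\tSPECIES\t\t38.90\t-77.03\n"
)


def clean_occurrences(handle):
    """Keep species-level rows that have a year and both coordinates."""
    cleaned = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["taxonRank"] != "SPECIES" or not row["species"]:
            continue  # identified only to genus, family, etc.
        try:
            record = {
                "species": row["species"],
                "year": int(row["year"]),
                "lat": float(row["decimalLatitude"]),
                "lon": float(row["decimalLongitude"]),
            }
        except ValueError:
            continue  # missing or malformed year or coordinates
        cleaned.append(record)
    return cleaned


occurrences = clean_occurrences(io.StringIO(SAMPLE_EXPORT))
print(occurrences)
```

Only the first sample row survives: the second is identified to genus only, and the third has no year.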
The cartographic boundary files were used to determine the state and county in which each occurrence was reported. The 1:20m resolution files were chosen because they give a good approximation of state and county shapes without causing difficulties when processing thousands of points in each shape.
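To show why boundary resolution matters, here is a plain-Python sketch of a point-in-polygon test using ray casting. It is a stand-in for whatever GIS routine the notebooks actually use, and the two rectangular regions are hypothetical. The test visits every vertex of a boundary for every point, so a coarser boundary with fewer vertices is cheaper to process.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular regions standing in for real boundary polygons.
regions = {
    "West": [(-100.0, 30.0), (-90.0, 30.0), (-90.0, 40.0), (-100.0, 40.0)],
    "East": [(-90.0, 30.0), (-80.0, 30.0), (-80.0, 40.0), (-90.0, 40.0)],
}


def locate(lon, lat):
    """Return the name of the first region containing the point, else None."""
    for name, ring in regions.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


print(locate(-95.0, 35.0))  # West
print(locate(-85.0, 35.0))  # East
print(locate(-70.0, 35.0))  # None
```

Real state and county boundaries are multipolygons, sometimes with holes, which this sketch does not handle.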
Questions
How has biodiversity changed in the United States over time?
For this question, I hypothesized that insect biodiversity had decreased in the United States over time, possibly due to a lowering of species abundance. A biodiversity index such as Simpson’s index or the Shannon-Wiener index quantifies biodiversity from the number of species found in a given region and the abundance of each. Changes in biodiversity over time would be evident in a line chart plotting the values of these indices over the temporal range of the occurrence dataset for all occurrences in the United States. They can also be visualized with choropleth maps at the state, county, and hex cell levels to give a finer-scale representation of insect biodiversity throughout the United States.
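As a minimal sketch of the two indices, the functions below compute them from per-species counts. Simpson’s index has several common variants; this sketch assumes the Gini-Simpson form, 1 minus the sum of squared proportions, because it increases with diversity. That choice is an assumption, and the sample of individuals is hypothetical.

```python
import math
from collections import Counter


def simpson_index(counts):
    """Gini-Simpson index, 1 - sum(p_i^2); an assumed variant, higher = more diverse."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)


def shannon_wiener_index(counts):
    """Shannon-Wiener index, H' = -sum(p_i * ln(p_i))."""
    total = sum(counts)
    return sum(-(n / total) * math.log(n / total) for n in counts if n > 0)


# Hypothetical sample: one species name per recorded individual.
individuals = (
    ["Apis mellifera"] * 6 + ["Bombus impatiens"] * 3 + ["Xylocopa virginica"]
)
counts = list(Counter(individuals).values())

print(round(simpson_index(counts), 3))         # 0.54
print(round(shannon_wiener_index(counts), 3))  # 0.898
```

Both indices are zero when a region contains a single species and grow as individuals are spread more evenly across more species.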
How have pollinator distributions changed over time?
For this question, I hypothesized that pollinator distributions had decreased in size over time. Changes in pollinator distributions would be evident by looking at the ranges of each species and their population densities, which could be visualized using a dot distribution map/bubble map or a choropleth map looking at the density of each species throughout the United States.
Charts
Chart 1 - Aggregate Biodiversity in the United States
This interactive chart plots the Simpson’s and Shannon-Wiener index values calculated from all occurrences in the United States in this dataset. The x-axis range can be narrowed with the slider widget to look at a subset of the years. Hovering the mouse over the viewport shows annotations: on each line, the point vertically aligned with the cursor displays its value and the associated year.
Idiom: Dual Y Axis Line Chart / Mark: Line
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| Year | key, ordered | horizontal position (x-axis) |
| Simpson Biodiversity | value, quantitative | vertical position on a synchronized scale (y-axis) |
| Shannon-Wiener Biodiversity | value, quantitative | vertical position on a synchronized scale (y-axis) |
The chart was produced with the aggregate_biodiversity notebook, and the chart can be viewed from the aggregate_biodiversity HTML file.
Chart 2 - Choropleth Map of USA Biodiversity by State
This interactive visualization shows a map of the USA divided along state boundaries. Each state is an area mark whose blue color saturation represents the biodiversity recorded in the enclosed region. The range of years can be narrowed or expanded, which filters the occurrences used to calculate biodiversity. Two buttons select whether each region is colored according to its Simpson’s index value or its Shannon-Wiener index value. Hovering the cursor over an area shows an annotation with the name of the state and the exact values of the two indices.
Idiom: Choropleth Map / Mark: Area
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| State | key, categorical | enclosed area on the map |
| Simpson’s Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
| Shannon-Wiener Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
The chart was produced with the biodiversity_state notebook, and the chart can be viewed from the biodiversity_state HTML file.
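The year-range filtering described for this map can be sketched as below. This is an illustration of the idea in Python, not the chart's callback code, and the occurrence records are hypothetical. Only the Shannon-Wiener index is shown to keep the example short.

```python
import math
from collections import Counter, defaultdict

# Hypothetical cleaned occurrences: (year, state, species).
occurrences = [
    (2001, "Delaware", "Apis mellifera"),
    (2005, "Delaware", "Bombus impatiens"),
    (2012, "Delaware", "Apis mellifera"),
    (2012, "Minnesota", "Apis mellifera"),
    (2015, "Minnesota", "Apis mellifera"),
]


def shannon_by_region(records, first_year, last_year):
    """Shannon-Wiener index per region for occurrences within the year range."""
    species_counts = defaultdict(Counter)
    for year, region, species in records:
        if first_year <= year <= last_year:
            species_counts[region][species] += 1
    result = {}
    for region, counter in species_counts.items():
        total = sum(counter.values())
        result[region] = sum(
            -(n / total) * math.log(n / total) for n in counter.values()
        )
    return result


print(shannon_by_region(occurrences, 2000, 2010))
print(shannon_by_region(occurrences, 2000, 2020))
```

With the narrower range, Minnesota has no occurrences and so has no value to color. Widening the range adds it with an index of zero, since only one species was recorded there.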
Chart 3 - Choropleth Map of USA Biodiversity by County
This interactive visualization shows a map of the USA divided along county boundaries, which are smaller regions than those in the previous map. Each county is an area mark whose blue color saturation represents the biodiversity recorded in the enclosed region. The range of years can be narrowed or expanded, which filters the occurrences used to calculate biodiversity. Two buttons select whether each region is colored according to its Simpson’s index value or its Shannon-Wiener index value. Hovering the cursor over an area shows an annotation with the names of the state and county and the exact values of the two indices.
Idiom: Choropleth Map / Mark: Area
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| County | key, categorical | enclosed area on the map |
| Simpson’s Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
| Shannon-Wiener Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
The chart was produced with the biodiversity_county notebook, and the chart can be viewed from the biodiversity_county HTML file.
Chart 4 - Heatmap of USA Biodiversity
This interactive visualization shows a map of the USA divided into a generated hex grid, providing even smaller regions than the previous map. Dividing the map this way gives a more realistic representation of animal habitats, irrespective of state or county boundaries.
Each hex cell is an area mark whose red color saturation represents the biodiversity recorded in the enclosed region. The range of years can be narrowed or expanded, which filters the occurrences used to calculate biodiversity. Two buttons select whether each cell is colored according to its Simpson’s index value or its Shannon-Wiener index value. Hovering the cursor over an area shows an annotation with the ID of the hex cell and the exact values of the two indices. The size of the hex cells can be changed with the ‘Resolution’ tabs at the top of the graph; as the resolution increases, the cells become smaller and more numerous.
Idiom: Choropleth Map / Mark: Area
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| Hex Cell | key, categorical | enclosed area on the map |
| Simpson’s Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
| Shannon-Wiener Index | value, quantitative | magnitude, color saturation (saturation increases as biodiversity increases) |
The chart was produced with the biodiversity_hex notebook, and the chart can be viewed from the biodiversity_hex HTML file.
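To illustrate how resolution relates to cell size, the sketch below assumes the hex grid is an H3 grid, as the references suggest. H3 covers the globe with 2 + 120 × 7^r cells at resolution r, so each step up in resolution gives roughly seven times as many cells, each about one seventh the area. The resolutions printed here are arbitrary and are not necessarily the ones offered in the chart.

```python
def h3_cell_count(resolution):
    """Total number of H3 cells covering the globe at a resolution (0-15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** resolution


EARTH_SURFACE_KM2 = 510_065_622  # approximate surface area of the Earth

for res in range(2, 6):
    cells = h3_cell_count(res)
    # Rough mean cell area; ignores that 12 cells per resolution are pentagons.
    print(res, cells, round(EARTH_SURFACE_KM2 / cells, 1))
```

The mean areas printed are rough figures; the H3 documentation lists the exact average hexagon area for each resolution.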
Chart 5 - Species Population Density in the USA
This interactive chart visualizes the population density and range of a selected species. Points on the map increase in size as the number of recorded individuals of the selected species in the area increases. The range of years can be narrowed or expanded, which filters the years over which populations are totaled. The species is selected with the dropdown widget, which updates the map so that the densities and ranges of different insects can be viewed. Hovering the cursor over a point shows an annotation with the number of individuals of that species recorded in the area.
Idiom: Dot Distribution Map / Mark: Point
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| Point | key, categorical | 2D location on the map |
| Population | value, quantitative | magnitude, size of circle glyph (size increases as population increases) |
The chart was produced with the distribution_bubblemap notebook, and the chart can be viewed from the distribution_bubblemap HTML file.
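One way to map population counts onto glyph sizes is linear min-max scaling into a fixed size range, sketched below. The function name, the 5 to 40 pixel range, and the sample counts are all hypothetical, and the notebook may scale sizes differently.

```python
def scale_to_range(values, new_min, new_max):
    """Linearly rescale values so min(values) -> new_min and max(values) -> new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All counts equal: use the midpoint to avoid dividing by zero.
        return [(new_min + new_max) / 2 for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]


# Hypothetical individual counts at four map points, mapped to 5-40 px glyphs.
populations = [1, 10, 50, 200]
sizes = scale_to_range(populations, 5, 40)
print([round(s, 2) for s in sizes])
```

If the scaled value is used as a circle diameter, the circle's area grows with the square of the population. Scaling the square root of the counts instead keeps area proportional to population.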
Chart 6 - Heatmap of USA Species Range and Density
This interactive chart is similar to the choropleth charts plotting biodiversity, but it plots population density across the United States instead. The range of years can be narrowed or expanded, which filters the years over which populations are totaled. The species is selected with the dropdown widget, which updates the map so that the densities and ranges of different insects can be viewed. Hovering the cursor over a hex cell shows an annotation with the ID of the cell and the number of individuals of that species recorded in the area.
Idiom: Choropleth Map / Mark: Area
| Data: Attribute | Data: Attribute Type | Encode: Channel |
|---|---|---|
| Hex Cell | key, categorical | enclosed area on the map |
| Population | value, quantitative | magnitude, color saturation (saturation increases as density increases) |
The chart was produced with the distribution_heatmap notebook, and the chart can be viewed from the distribution_heatmap HTML file.
Answers to Questions
With regard to my questions, charts 1, 2, 3, and 4 address how insect biodiversity in the United States has changed over time. Charts 5 and 6 address how pollinator distributions have changed over time.
Chart 1 shows that biodiversity throughout the United States was low prior to the year 2000. However, this is most likely due to the lack of occurrence data for that period, which can be seen when viewing this year range in the choropleth maps (charts 2, 3, and 4). Biodiversity shows a negative trend as the years progress, though explaining this would require further research. It could be due to there being relatively few insect trapping efforts throughout the United States, or to other factors that affect insect populations.
Chart 5 makes it possible to see how the distributions of pollinators and other insects change over time. For example, the western honey bee (Apis mellifera) population has become denser in Delaware and has begun appearing more often in Minnesota and California. Similar observations can be made for other species, all of which seem to suggest that pollinator ranges are expanding and that pollinators are becoming more prominent in certain areas of the United States. Chart 6 illustrates these areas of high density better, with the areas containing the most individuals of a given species colored dark red.
Final Thoughts
Throughout the development of these charts, I looked for ways to best visually mimic the charts in my sketches using available tools and libraries in Python. I came across the Bokeh library, which appeared to be the best fit for this goal, though it did come with some challenges.
Most of the time in this project was spent familiarizing myself with the basics of geographic information systems (GIS). This was a field I was unfamiliar with, and creating these graphics would have been difficult without knowing how to work with shapefiles, change projections, or generate a hex grid on a map.
Besides working with GIS tools, using Bokeh to create interactive visualizations was rewarding but took a lot of time and work. Part of this is due to the scarcity of up-to-date documentation: many of the available resources and tutorials target outdated versions with many deprecated features.
Much of the interactivity in Bokeh is done through callbacks, which can be written in Python or JavaScript. For these charts, I chose to use JavaScript. Though I was not familiar with the language, one of its advantages is that the generated charts can be exported as standalone HTML files, which can be viewed in any web browser and are therefore accessible to a wider audience. Python callbacks need a running Python process, such as a Bokeh server or a live Jupyter notebook, which would make the charts harder to present to other people such as stakeholders or policymakers.
Once I was familiar with GIS, JavaScript, and Bokeh, each of these visualizations took only a couple of hours to produce.
References
- Insect Species Occurrence Data from Multiple Projects Worldwide with Focus on Bees and Wasps in North America https://doi.org/10.15468/6autvb
- 2023 United States Cartographic Boundary Files https://catalog.data.gov/dataset/insect-species-occurrence-data-from-multiple-projects-worldwide-with-focus-on-bees-and-was-b3123
- LibreTexts - Simpson’s Index and Shannon-Weiner Index https://stats.libretexts.org/Bookshelves/Applied_Statistics/Natural_Resources_Biometrics_(Kiernan)/10%3A_Quantitative_Measures_of_Diversity_Site_Similarity_and_Habitat_Suitability/10.01%3A_Introduction__Simpsons_Index_and_Shannon-Weiner_Index#title
- Bokeh API - Twin Axes https://docs.bokeh.org/en/latest/docs/user_guide/basic/axes.html#twin-axes
- Bokeh API - Range Slider https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_9.html
- H3 Pandas Documentation https://h3-pandas.readthedocs.io/en/latest/
- H3 Global Hex Grid https://h3geo.org/docs/core-library/restable/#cell-areas
- Creating Beautiful Hexagon Maps with Python https://python.plainenglish.io/creating-beautiful-hexagon-maps-with-python-25c9291eeeda
- Glyph Size Legend https://discourse.bokeh.org/t/creating-a-legend-for-glyph-size/9587/2
- Geoviews - Scatter & Bubble Maps https://coderzcolumn.com/tutorials/data-science/geoviews-scatter-and-bubble-maps
- drgxfs - Normalize Value into an Arbitrary Range https://stats.stackexchange.com/q/281165
- Data Science for Everyone - Introduction to Data Visualization Using Bokeh https://www.youtube.com/watch?v=VFpfZz6w9Oo&list=PL8eNk_zTBST-AwsJnOSxJGDf7jCMVIvXT