Skip to content

Geospatial Data

Foreword

Notes and code snippets. Python 3. From DataCamp.


Geospatial Data in Python

Install all the packages.
Some packages are pre-requisites to the others: to install GeoPandas, it requires Shapely, and to install Shapely, RTree, GDAL, and Fiona should be installed first.

  • RTree: a ctypes Python wrapper of libspatialindex that provides a number of advanced spatial indexing features;
  • GDAL: translator library for raster and vector geospatial data formats;
  • Fiona: Fiona reads and writes spatial data files;
  • Shapely: Geometric objects, predicates, and operations;
  • GeoPandas: extends the datatypes used by pandas to allow spatial operations on geometric types;
  • PySAL: a library of spatial analysis functions written in Python intended to support the development of high-level applications;
  • Missingno: Missing data visualization module for Python.

The dataset

We take hurricane Florence‘s trajectory for plotting points on a map of the US States.

# Loading the packages
import geopandas

import numpy as np
import pandas as pd

from shapely.geometry import Point

import missingno as msn

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Let’s look at the geospatial file from the US States (a GeoJSON file).

# Loading the file
country = geopandas.read_file("data/gz_2010_us_040_00_5m.json")
country.head(3)
GEO_ID STATE NAME LSAD CENSUSAREA geometry
0 0400000US01 01 Alabama 50645.326 (POLYGON ((-88.124658 30.28364, -88.0868119999...
1 0400000US02 02 Alaska 570640.950 (POLYGON ((-166.10574 53.988606, -166.075283 5...
2 0400000US04 04 Arizona 113594.084 POLYGON ((-112.538593 37.000674, -112.534545 3...

It is GeoDataFrame, which has all the regular characteristics of a Pandas DataFrame.

# Printing
type(country)
1
geopandas.geodataframe.GeoDataFrame

The column (geometry) containing the coordinates is a GeoSeries.

# Printing
type(country.geometry)
1
geopandas.geoseries.GeoSeries

Each value in the GeoSeries is a Shapely Object: a point, a segment, a polygon (and a multipolygon).
Each object can represent something: a point for a building, a segment for a street, a polygon for acity, and multipolygon for a country with multiple borders inside.
For more information about each Geometric object, consult this article.

# Printing
type(country.geometry[0])
1
shapely.geometry.multipolygon.MultiPolygon

Similar to a Pandas DataFrame, a GeoDataFrame can be plotted.

# Plotting the multipolygon
country.plot();

As we may see, the US map is relatively small compared to the frame.
It’s because the information includes Alaska, Hawaii, and Puerto Rico.
For this tutorial purpose, we can exclude Alaska and Hawaii as the hurricane did not go anywhere near those two states.
We can also add the figure size and colour to customize the plot

# Plotting
# Excluding Alaska and Hawaii with a conditional selection
country[country['NAME'].isin(['Alaska','Hawaii']) == False].plot(figsize=(30,20),
                                                                 color='#3B3C6E');

# Loading the hurricane data
florence = pd.read_csv('data/florence.csv')
florence.head(3)
AdvisoryNumber Date Lat Long Wind Pres Movement Type Name Received Forecaster
0 1 08/30/2018 11:00 12.9 18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six 08/30/2018 10:45 Avila
1 1A 08/30/2018 14:00 12.9 19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six 08/30/2018 13:36 Avila
2 2 08/30/2018 17:00 12.9 19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six 08/30/2018 16:36 Avila

Exploratory Data Analysis

The first thing to do is EDA:

  • Check the information, data type;
  • Find out about missing values;
  • Explore the descriptive statistics.
# Printing
florence.info()
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 11 columns):
AdvisoryNumber    105 non-null object
Date              105 non-null object
Lat               105 non-null float64
Long              105 non-null float64
Wind              105 non-null int64
Pres              105 non-null int64
Movement          105 non-null object
Type              105 non-null object
Name              105 non-null object
Received          105 non-null object
Forecaster        104 non-null object
dtypes: float64(2), int64(2), object(7)
memory usage: 9.1+ KB

Checking missing values using the missingno package.
There is only one missing value in column Forecaster which we don’t need.
So we can ignore it.

# The package alias is msn (missingno)
# Printing
msn.bar(florence,
        color='darkolivegreen');

# Descriptive statistics (numerical columns, Series, fields or features only)
# Printing
florence.describe()
Lat Long Wind Pres
count 105.000000 105.000000 105.000000 105.000000
mean 25.931429 56.938095 74.428571 981.571429
std 7.975917 20.878865 36.560765 22.780667
min 12.900000 18.400000 25.000000 939.000000
25% 18.900000 41.000000 40.000000 956.000000
50% 25.100000 60.000000 70.000000 989.000000
75% 33.600000 76.400000 105.000000 1002.000000
max 42.600000 82.900000 140.000000 1008.000000

We only need the date, the coordinates: lat (latitude) and long (longitude), Wind speed, Pressure, and Name.
Movement and Type are optional, but the rest can be dropped.

# Droping all unused features
florence = florence.drop(['AdvisoryNumber', 'Forecaster', 'Received'],
                         axis=1)

# Printing
florence.head(3)
Date Lat Long Wind Pres Movement Type Name
0 08/30/2018 11:00 12.9 18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six
1 08/30/2018 14:00 12.9 19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six
2 08/30/2018 17:00 12.9 19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six

It is important to check out the longitude and latitude.
Here the longitude is west.
Add - to the Long values to correctly plot the data.
The US are west of the Greenwich meridian (0 degree).
Everything east of the 0 degree is positive, everything west is negative.
The same goes for the northern/southern hemispheres.

# Modifying
florence['Long'] = 0 - florence['Long']

# Printing
florence.head(3)
Date Lat Long Wind Pres Movement Type Name
0 08/30/2018 11:00 12.9 -18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six
1 08/30/2018 14:00 12.9 -19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six
2 08/30/2018 17:00 12.9 -19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six

Let’s combine the latitude and longitude to create coordinates, which will subsequently be turned into a GeoSeries for visualization purpose.

# Combining
florence['coordinates'] = florence[['Long', 'Lat']].values.tolist()

# Printing
florence.head(3)
Date Lat Long Wind Pres Movement Type Name coordinates
0 08/30/2018 11:00 12.9 -18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six [-18.4, 12.9]
1 08/30/2018 14:00 12.9 -19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six [-19.0, 12.9]
2 08/30/2018 17:00 12.9 -19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six [-19.4, 12.9]
# Changing the coordinates into GeoPoint
florence['coordinates'] = florence['coordinates'].apply(Point)

# Printing
florence.head(3)
Date Lat Long Wind Pres Movement Type Name coordinates
0 08/30/2018 11:00 12.9 -18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six POINT (-18.4 12.9)
1 08/30/2018 14:00 12.9 -19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six POINT (-19 12.9)
2 08/30/2018 17:00 12.9 -19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six POINT (-19.4 12.9)

Checking the type of the florence and column coordinates: it is a DataFrame and a Series.

# Printing
type(florence)
1
pandas.core.frame.DataFrame
# Printing
type(florence['coordinates'])
1
pandas.core.series.Series

Convert the DataFrame into a GeoDataFrame and check the types once again: it is a GeoDataFrame and GeoSeries.

# Converting
florence = geopandas.GeoDataFrame(florence, 
                                  geometry='coordinates')

# Printing
florence.head(3)
Date Lat Long Wind Pres Movement Type Name coordinates
0 08/30/2018 11:00 12.9 -18.4 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six POINT (-18.4 12.9)
1 08/30/2018 14:00 12.9 -19.0 30 1007 W at 12 MPH (280 deg) Potential Tropical Cyclone Six POINT (-19 12.9)
2 08/30/2018 17:00 12.9 -19.4 30 1007 W at 9 MPH (280 deg) Potential Tropical Cyclone Six POINT (-19.4 12.9)
# Printing
type(florence)
1
geopandas.geodataframe.GeoDataFrame
# Printing
type(florence['coordinates'])
1
geopandas.geoseries.GeoSeries

Notice that even though it’s now a GeoDataFrame and a GeoSeries, we can still filter the rows.

# Filtering by the Name column
# (a hurricane is of category 6 and lower values are tropical storms)
florence[florence['Name']=='Six']

# Agregating by the Name column
florence.groupby('Name').Type.count()
1
2
3
4
5
6
Name
FLORENCE     6
Florence    85
SIX          4
Six         10
Name: Type, dtype: int64

Let’s find the average wind speed.

# Printing
print("The average wind speed of Hurricane Florence is {} mph, and it can go up to {} mph maximum".\
      format(round(florence.Wind.mean(),4),
             florence.Wind.max()))
1
The average wind speed of Hurricane Florence is 74.4286 mph, and it can go up to 140 mph maximum

So the average wind speed of hurricane Florence is 74.43 miles per hour (119.78 km per hour) and the maximum is 140 miles per hour (225.308 km per hour).
To imagine how scary this wind speed is, the Beaufort Wind Scale, developed by U.K Royal Navy, shows the appearance of wind effects on the water and on land.
With the speed of 48 to 55 miles per hours, it can already break and uproot trees, and cause “considerable structural damage”.

Visualization

A GeoDataFrame also has a plot method.

# Plotting
florence.plot(figsize=(20,10));

Because this GeoDataFrame only have coordinates information (location) of some points in time, we can only plot the position on a blank map.
Let’s plot the hurricane position on a US map to see where Florence transited and how strong the wind was at that time.
We use a US States map (data we loaded at the beginning) as the background and we overlay Florence’s positions on top.

# Plotting the US map
fig, ax = plt.subplots(1, figsize=(30,20))
base = country[country['NAME'].isin(['Alaska','Hawaii']) == False].plot(ax=ax,
                                                                        color='#3B3C6E')

# Plotting the hurricane position
florence.plot(ax=base,
              color='darkred',
              marker="*",
              markersize=20);

# Improving
fig, ax = plt.subplots(1, figsize=(20,20))
base = country[country['NAME'].isin(['Alaska','Hawaii']) == False].plot(ax=ax, color='#3B3C6E')

florence.plot(ax=base,
              column='Wind',
              marker="<",
              markersize=30,
              cmap='Oranges',
              label="Wind speed(mph)")

_ = ax.axis('off')
plt.legend(labels=["Wind Speed (mph)\n\ndarker = stronger"],
           fontsize=20,
           facecolor='orangered',
           markerscale=2,
           loc='best',
           scatterpoints=5,
           fancybox=True,
           edgecolor='#3B3C6E')
ax.set_title("Hurricane Florence",
             fontsize=30)

# Saving
plt.savefig('Hurricane_footage.png',bbox_inches='tight');

So the hurricane was strongest when it is offshore, near the east coast.
As it approached the land, the hurricane started losing its strength, but with wind speeds ranging 60 to 77 miles per hour can still make horrible damages.

For more, consult Python Geospatial Development Essentials.