Four-step Transportation Model

This is an update of a blog post I wrote three years ago. I have since created classes to simplify the process of creating a four-step model. The two classes we will use are Zones and OriginDestinationDataFrame.

[1]:
import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt
import us

from dpd.modeling import Zones, OriginDestinationDataFrame
/usr/local/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.9.1-CAPI-1.14.2) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
WARNING:param.main: pandas could not register all extension types imports failed with the following error: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic' (/usr/local/lib/python3.9/site-packages/pandas/core/dtypes/generic.py)
[2]:
YEAR = "2017"

state = widgets.Select(
    options=list(map(lambda x: x.name, us.STATES)), description="State", rows=10
)
display(state)

The Zones class defines our analysis zones for the four step model. It is based on a Geopandas GeoDataFrame so it includes a geometry column that maps each zone. The other important columns are Production and Attraction which include the production and attraction data for each zone and the index which is used by the OriginDestinationDataFrame. Zones can be automatically created from US Census data with the from_uscensus method. Otherwise you need to create a GeoDataFrame with the aforementioned columns.

[3]:
zones = Zones.from_uscensus(str(us.states.lookup(state.value).fips), YEAR)
/usr/local/lib/python3.9/site-packages/dpd/uscensus/get_uscensus_data.py:31: FutureWarning: Could not cast to int64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.
  dataframe = pandas.DataFrame(
[4]:
zones.head(1)
[4]:
Name Production Attraction state county tract STATEFP COUNTYFP TRACTCE NAME_y ... FUNCSTAT ALAND AWATER INTPTLAT INTPTLON geometry ProductionAttractionSum Production Density Attraction Density ProductionAttractionSum Density
GEOID
35049001002 Census Tract 10.02, Santa Fe County, New Mexico 3612 0 35 049 001002 35 049 001002 10.02 ... S 2086269 0 +35.6654818 -105.9636628 POLYGON ((-105.97726 35.66269, -105.97708 35.6... 3612 1731.320362 0.0 1731.320362

1 rows × 22 columns

One of nice things about this being a GeoDataFrame is that you can easily plot it. In this case we can plot the “Production density” which is really just the population density. We can see each zone (big in rural areas, small in urban areas) and we can see the urban areas have higher densities. This is step one of our four-step model.

[5]:
zones["Production density"] = zones["Production"].map(int) / zones["ALAND"]
fig = plt.figure(figsize=(18, 16))
ax = fig.add_subplot(111)
zones.plot(column="Production density", cmap="RdYlGn", ax=ax)
[5]:
<AxesSubplot:>
../_images/notebooks_four_step_transportation_model_8_1.png

Next we need to create a OriginDestinationDataFrame. We could compute this directly from our Zones object above, but we can also create this by downloading the US Census LODES data using the from_lodes method. This data is already organized as an origin-destination matrix. This is step two of the four-step model.

[6]:
od = TripDataFrame.from_lodes(us.states.lookup(state.value).abbr.lower(), YEAR)
/usr/local/lib/python3.9/site-packages/dpd/modeling/origin_destination_dataframe.py:41: DtypeWarning: Columns (28,29,31,37,38) have mixed types. Specify dtype option on import or set low_memory=False.
  xwalk = pandas.read_csv(download_lodes_xwalk(st))
[7]:
od
[7]:
w_geocode h_geocode S000 SA01 SA02 SA03 SE01 SE02 SE03 SI01 SI02 SI03 createdate
trct_w trct_h
35001000107 35001000107 6300180019282126 6300180019290148 22 0 9 13 7 8 7 2 0 20 363619980
35001000108 1050030003214025 1050030003244008 3 1 1 1 0 3 0 0 0 3 60603330
35001000109 350010001071008 350010001091010 1 0 1 0 1 0 0 0 0 1 20201110
35001000111 350010001071008 350010001112007 1 1 0 0 0 0 1 0 0 1 20201110
35001000112 1750050005356019 1750050005607029 5 0 0 5 3 0 2 1 0 4 101005550
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35061971400 35061970902 7012394280036520 7012394180421028 22 12 4 6 5 11 6 8 8 6 404022200
35061971000 17180365986092544 17180365790119452 62 15 38 9 4 40 18 13 38 11 989854390
35061971100 4207436568023410 4207436532013085 13 3 8 2 1 6 6 2 8 3 242413320
35061971300 3506197140017249 3506197130011257 11 3 7 1 2 6 3 1 2 8 202011100
35061971400 16128506844079148 16128506844079080 66 16 35 15 14 27 25 13 23 30 929251060

88273 rows × 13 columns

Notice we use a multi-index instead of an index and column. This is because most zones do not have trips to other zones. We could unstack this dataframe to create a dataframe with the index as the origin and the columns as the destination. However, this creates many relationships (159,845 or 65% in this case) with no trips (NaN).

[8]:
od["S000"].unstack()
[8]:
trct_h 35001000107 35001000108 35001000109 35001000110 35001000111 35001000112 35001000113 35001000114 35001000115 35001000116 ... 35061970404 35061970405 35061970700 35061970800 35061970901 35061970902 35061971000 35061971100 35061971300 35061971400
trct_w
35001000107 22.0 3.0 1.0 NaN 1.0 5.0 2.0 2.0 1.0 1.0 ... NaN NaN 2.0 NaN NaN NaN NaN NaN NaN NaN
35001000108 8.0 16.0 6.0 2.0 13.0 3.0 6.0 3.0 8.0 1.0 ... NaN NaN NaN NaN 3.0 NaN NaN NaN 1.0 2.0
35001000109 5.0 10.0 31.0 16.0 8.0 8.0 16.0 9.0 14.0 4.0 ... 4.0 1.0 1.0 NaN 2.0 1.0 3.0 2.0 1.0 NaN
35001000110 7.0 8.0 5.0 20.0 15.0 8.0 7.0 4.0 10.0 3.0 ... NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN 4.0
35001000111 4.0 5.0 1.0 9.0 18.0 3.0 6.0 5.0 1.0 2.0 ... 2.0 NaN 1.0 2.0 NaN 4.0 1.0 1.0 NaN 2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35061970902 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 3.0 3.0 4.0 4.0 5.0 5.0 2.0 3.0 2.0 1.0
35061971000 NaN NaN NaN 1.0 NaN 1.0 NaN 1.0 NaN NaN ... 19.0 6.0 30.0 48.0 22.0 5.0 56.0 9.0 12.0 9.0
35061971100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 4.0 11.0 10.0 3.0 7.0 8.0 3.0 NaN
35061971300 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.0 NaN 14.0 10.0 23.0 8.0 9.0 8.0 13.0 1.0
35061971400 2.0 1.0 2.0 5.0 1.0 1.0 3.0 2.0 2.0 2.0 ... 115.0 67.0 59.0 49.0 54.0 22.0 62.0 13.0 11.0 66.0

496 rows × 499 columns

We are going to skip step three for now. That step will probably get its own notebook. In step four, we assign traffic to our transportation network. For this we need to create a network. We start by building a simple network based on our zones by reducing each zone to it’s centroid. We then connect the neighboring centroids.

[9]:
zones.build_graph()
/usr/local/lib/python3.9/site-packages/geopandas/geodataframe.py:199: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  super().__setattr__(attr, val)
[9]:
<networkx.classes.graph.Graph at 0x130914ee0>

We can then assign our OriginDestinationDataFrame traffic to our Zones transportation network.

[10]:
od.route_assignment(zones)
[10]:
<networkx.classes.graph.Graph at 0x130914ee0>

And we can plot the final result. This looks about right at first glance. Please take a look at the underlying methods to see what is going on.

[11]:
fig = plt.figure(1, figsize=(18, 16), dpi=90)
ax = fig.add_subplot(111)
zones.visualize_route_assignment(ax=ax)
../_images/notebooks_four_step_transportation_model_19_0.png
[12]:
zones.head()
[12]:
Name Production Attraction state county tract STATEFP COUNTYFP TRACTCE NAME_y ... INTPTLAT INTPTLON geometry ProductionAttractionSum Production Density Attraction Density ProductionAttractionSum Density Production density aea_centroid centroid
GEOID
35049001002 Census Tract 10.02, Santa Fe County, New Mexico 3612 0 35 049 001002 35 049 001002 10.02 ... +35.6654818 -105.9636628 POLYGON ((-105.97726 35.66269, -105.97708 35.6... 3612 1731.320362 0.0 1731.320362 1.731320e-03 POINT (35.665 -105.964) POINT (35.665 -105.964)
35049010400 Census Tract 104, Santa Fe County, New Mexico 3423 0 35 049 010400 35 049 010400 104 ... +35.6547921 -105.8796126 POLYGON ((-105.93988 35.65773, -105.93959 35.6... 3423 81.175154 0.0 81.175154 8.117515e-05 POINT (35.654 -105.892) POINT (35.654 -105.892)
35049001103 Census Tract 11.03, Santa Fe County, New Mexico 1731 0 35 049 001103 35 049 001103 11.03 ... +35.6560820 -105.9712060 POLYGON ((-105.98489 35.65625, -105.98488 35.6... 1731 777.802741 0.0 777.802741 7.778027e-04 POINT (35.656 -105.971) POINT (35.656 -105.971)
35049000600 Census Tract 6, Santa Fe County, New Mexico 2030 0 35 049 000600 35 049 000600 6 ... +35.6758514 -105.9446226 POLYGON ((-105.95818 35.67497, -105.95643 35.6... 2030 1244.532182 0.0 1244.532182 1.244532e-03 POINT (35.676 -105.945) POINT (35.676 -105.945)
35059950200 Census Tract 9502, Union County, New Mexico 4216 0 35 059 950200 35 059 950200 9502 ... +36.4880853 -103.4757229 POLYGON ((-104.00934 36.69766, -104.00823 36.6... 4216 0.425692 0.0 0.425692 4.256922e-07 POINT (36.482 -103.471) POINT (36.482 -103.471)

5 rows × 25 columns

[ ]: