Four-step Transportation Model¶

This is an update of a blog post I wrote three years ago. I have since created classes to simplify the process of creating a four-step model. The two classes we will use are Zones and OriginDestinationDataFrame.

[1]:

import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt
import us

from dpd.modeling import Zones, OriginDestinationDataFrame

/usr/local/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.9.1-CAPI-1.14.2) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
WARNING:param.main: pandas could not register all extension types imports failed with the following error: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic' (/usr/local/lib/python3.9/site-packages/pandas/core/dtypes/generic.py)

[2]:

YEAR = "2017"

state = widgets.Select(
    options=list(map(lambda x: x.name, us.STATES)), description="State", rows=10
)
display(state)

The Zones class defines our analysis zones for the four step model. It is based on a Geopandas GeoDataFrame so it includes a geometry column that maps each zone. The other important columns are Production and Attraction which include the production and attraction data for each zone and the index which is used by the OriginDestinationDataFrame. Zones can be automatically created from US Census data with the from_uscensus method. Otherwise you need to create a GeoDataFrame with the aforementioned columns.

[3]:

zones = Zones.from_uscensus(str(us.states.lookup(state.value).fips), YEAR)

/usr/local/lib/python3.9/site-packages/dpd/uscensus/get_uscensus_data.py:31: FutureWarning: Could not cast to int64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.
  dataframe = pandas.DataFrame(

[4]:

zones.head(1)

[4]:

	Name	Production	Attraction	state	county	tract	STATEFP	COUNTYFP	TRACTCE	NAME_y	...	FUNCSTAT	ALAND	AWATER	INTPTLAT	INTPTLON	geometry	ProductionAttractionSum	Production Density	Attraction Density	ProductionAttractionSum Density
GEOID
35049001002	Census Tract 10.02, Santa Fe County, New Mexico	3612	0	35	049	001002	35	049	001002	10.02	...	S	2086269	0	+35.6654818	-105.9636628	POLYGON ((-105.97726 35.66269, -105.97708 35.6...	3612	1731.320362	0.0	1731.320362

1 rows × 22 columns

One of nice things about this being a GeoDataFrame is that you can easily plot it. In this case we can plot the “Production density” which is really just the population density. We can see each zone (big in rural areas, small in urban areas) and we can see the urban areas have higher densities. This is step one of our four-step model.

[5]:

zones["Production density"] = zones["Production"].map(int) / zones["ALAND"]
fig = plt.figure(figsize=(18, 16))
ax = fig.add_subplot(111)
zones.plot(column="Production density", cmap="RdYlGn", ax=ax)

[5]:

<AxesSubplot:>

../_images/notebooks_four_step_transportation_model_8_1.png

Next we need to create a OriginDestinationDataFrame. We could compute this directly from our Zones object above, but we can also create this by downloading the US Census LODES data using the from_lodes method. This data is already organized as an origin-destination matrix. This is step two of the four-step model.

[6]:

od = TripDataFrame.from_lodes(us.states.lookup(state.value).abbr.lower(), YEAR)

/usr/local/lib/python3.9/site-packages/dpd/modeling/origin_destination_dataframe.py:41: DtypeWarning: Columns (28,29,31,37,38) have mixed types. Specify dtype option on import or set low_memory=False.
  xwalk = pandas.read_csv(download_lodes_xwalk(st))

[7]:

od

[7]:

		w_geocode	h_geocode	S000	SA01	SA02	SA03	SE01	SE02	SE03	SI01	SI02	SI03	createdate
trct_w	trct_h
35001000107	35001000107	6300180019282126	6300180019290148	22	0	9	13	7	8	7	2	0	20	363619980
	35001000108	1050030003214025	1050030003244008	3	1	1	1	0	3	0	0	0	3	60603330
	35001000109	350010001071008	350010001091010	1	0	1	0	1	0	0	0	0	1	20201110
	35001000111	350010001071008	350010001112007	1	1	0	0	0	0	1	0	0	1	20201110
	35001000112	1750050005356019	1750050005607029	5	0	0	5	3	0	2	1	0	4	101005550
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
35061971400	35061970902	7012394280036520	7012394180421028	22	12	4	6	5	11	6	8	8	6	404022200
	35061971000	17180365986092544	17180365790119452	62	15	38	9	4	40	18	13	38	11	989854390
	35061971100	4207436568023410	4207436532013085	13	3	8	2	1	6	6	2	8	3	242413320
	35061971300	3506197140017249	3506197130011257	11	3	7	1	2	6	3	1	2	8	202011100
	35061971400	16128506844079148	16128506844079080	66	16	35	15	14	27	25	13	23	30	929251060

88273 rows × 13 columns

Notice we use a multi-index instead of an index and column. This is because most zones do not have trips to other zones. We could unstack this dataframe to create a dataframe with the index as the origin and the columns as the destination. However, this creates many relationships (159,845 or 65% in this case) with no trips (NaN).

[8]:

od["S000"].unstack()

[8]:

trct_h	35001000107	35001000108	35001000109	35001000110	35001000111	35001000112	35001000113	35001000114	35001000115	35001000116	...	35061970404	35061970405	35061970700	35061970800	35061970901	35061970902	35061971000	35061971100	35061971300	35061971400
trct_w
35001000107	22.0	3.0	1.0	NaN	1.0	5.0	2.0	2.0	1.0	1.0	...	NaN	NaN	2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
35001000108	8.0	16.0	6.0	2.0	13.0	3.0	6.0	3.0	8.0	1.0	...	NaN	NaN	NaN	NaN	3.0	NaN	NaN	NaN	1.0	2.0
35001000109	5.0	10.0	31.0	16.0	8.0	8.0	16.0	9.0	14.0	4.0	...	4.0	1.0	1.0	NaN	2.0	1.0	3.0	2.0	1.0	NaN
35001000110	7.0	8.0	5.0	20.0	15.0	8.0	7.0	4.0	10.0	3.0	...	NaN	NaN	1.0	NaN	NaN	NaN	1.0	NaN	NaN	4.0
35001000111	4.0	5.0	1.0	9.0	18.0	3.0	6.0	5.0	1.0	2.0	...	2.0	NaN	1.0	2.0	NaN	4.0	1.0	1.0	NaN	2.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
35061970902	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	3.0	3.0	4.0	4.0	5.0	5.0	2.0	3.0	2.0	1.0
35061971000	NaN	NaN	NaN	1.0	NaN	1.0	NaN	1.0	NaN	NaN	...	19.0	6.0	30.0	48.0	22.0	5.0	56.0	9.0	12.0	9.0
35061971100	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	4.0	11.0	10.0	3.0	7.0	8.0	3.0	NaN
35061971300	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	1.0	NaN	14.0	10.0	23.0	8.0	9.0	8.0	13.0	1.0
35061971400	2.0	1.0	2.0	5.0	1.0	1.0	3.0	2.0	2.0	2.0	...	115.0	67.0	59.0	49.0	54.0	22.0	62.0	13.0	11.0	66.0

496 rows × 499 columns

We are going to skip step three for now. That step will probably get its own notebook. In step four, we assign traffic to our transportation network. For this we need to create a network. We start by building a simple network based on our zones by reducing each zone to it’s centroid. We then connect the neighboring centroids.

[9]:

zones.build_graph()

/usr/local/lib/python3.9/site-packages/geopandas/geodataframe.py:199: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  super().__setattr__(attr, val)

[9]:

<networkx.classes.graph.Graph at 0x130914ee0>

We can then assign our OriginDestinationDataFrame traffic to our Zones transportation network.

[10]:

od.route_assignment(zones)

[10]:

<networkx.classes.graph.Graph at 0x130914ee0>

And we can plot the final result. This looks about right at first glance. Please take a look at the underlying methods to see what is going on.

[11]:

fig = plt.figure(1, figsize=(18, 16), dpi=90)
ax = fig.add_subplot(111)
zones.visualize_route_assignment(ax=ax)

../_images/notebooks_four_step_transportation_model_19_0.png

[12]:

zones.head()

[12]:

	Name	Production	Attraction	state	county	tract	STATEFP	COUNTYFP	TRACTCE	NAME_y	...	INTPTLAT	INTPTLON	geometry	ProductionAttractionSum	Production Density	Attraction Density	ProductionAttractionSum Density	Production density	aea_centroid	centroid
GEOID
35049001002	Census Tract 10.02, Santa Fe County, New Mexico	3612	0	35	049	001002	35	049	001002	10.02	...	+35.6654818	-105.9636628	POLYGON ((-105.97726 35.66269, -105.97708 35.6...	3612	1731.320362	0.0	1731.320362	1.731320e-03	POINT (35.665 -105.964)	POINT (35.665 -105.964)
35049010400	Census Tract 104, Santa Fe County, New Mexico	3423	0	35	049	010400	35	049	010400	104	...	+35.6547921	-105.8796126	POLYGON ((-105.93988 35.65773, -105.93959 35.6...	3423	81.175154	0.0	81.175154	8.117515e-05	POINT (35.654 -105.892)	POINT (35.654 -105.892)
35049001103	Census Tract 11.03, Santa Fe County, New Mexico	1731	0	35	049	001103	35	049	001103	11.03	...	+35.6560820	-105.9712060	POLYGON ((-105.98489 35.65625, -105.98488 35.6...	1731	777.802741	0.0	777.802741	7.778027e-04	POINT (35.656 -105.971)	POINT (35.656 -105.971)
35049000600	Census Tract 6, Santa Fe County, New Mexico	2030	0	35	049	000600	35	049	000600	6	...	+35.6758514	-105.9446226	POLYGON ((-105.95818 35.67497, -105.95643 35.6...	2030	1244.532182	0.0	1244.532182	1.244532e-03	POINT (35.676 -105.945)	POINT (35.676 -105.945)
35059950200	Census Tract 9502, Union County, New Mexico	4216	0	35	059	950200	35	059	950200	9502	...	+36.4880853	-103.4757229	POLYGON ((-104.00934 36.69766, -104.00823 36.6...	4216	0.425692	0.0	0.425692	4.256922e-07	POINT (36.482 -103.471)	POINT (36.482 -103.471)

5 rows × 25 columns

[ ]:

Four-step Transportation Model¶

dpd

Navigation

Related Topics