{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Density and public transportation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sufficient population and job density are important for a public transportation system. Therefore plotting densities and public transportation systems on the same map can reveal opportuntities to build more public transportation lines (where density is high, but there are no existing lines) and to build more housing/jobs (where public transportation lines exist, but densities are low).\n", "\n", "Let's create a map of population density. In the United States, we can get this geo-tagged population data from the United States Census Bureau. dpd.modeling has a class Zones to store this data and a method to automatically pull this data from the United States Census Bureau." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we'll define the state that we are interested in mapping. States seems to be the best level to get data for. If we get data by county, there are lots of requests so it takes too long. However, data for the whole country is unnecessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import us\n", "from ipywidgets import Select\n", "from IPython.display import display\n", "\n", "YEAR = \"2017\"\n", "\n", "state = Select(\n", " options=list(map(lambda x: x.name, us.STATES)),\n", " description=\"State\",\n", " value=\"California\",\n", ")\n", "display(state)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can get the census data for California. The government assigns a number to each state so California is 06. B01003_001E is the population in each census tract. This is renamed to Total Population. Also, ALAND is the land area of the zone. We use these two values to compute a density as population per area. Also, we simplify the geometry column to speed up any geometric computations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dpd.modeling import Zones\n", "\n", "zones = Zones.from_uscensus(str(us.states.lookup(state.value).fips), YEAR)\n", "\n", "zones[\"geometry\"] = zones[\"geometry\"].apply(lambda x: x.simplify(0.001))\n", "\n", "zones.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we can plot the population density of California." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zones.explore(\n", " column=\"Total Population + Worker Population Density\", scheme=\"JenksCaspall\", k=20\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the rest of the notebook, we will focus on Los Angeles County. We can query OpenStreetMap for the geometric data that represents Los Angeles County and then combine it with our Zones to reduce the amount of data to process" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from shapely.geometry import Polygon\n", "from shapely.ops import linemerge\n", "\n", "from dpd.osm import OSM\n", "\n", "relation = 396479 # Los Angeles County\n", "osm = OSM()\n", "osm.download_relation(relation)\n", "ways = [\n", " osm.ways[member[\"ref\"]].geo\n", " for member in osm.relations[relation][\"members\"]\n", " if member[\"type\"] == \"way\"\n", "]\n", "ways\n", "\n", "ways_merged = linemerge(ways)\n", "longest_length = 0\n", "for way in ways_merged.geoms:\n", " if way.length > longest_length:\n", " longest_length = way.length\n", " longest_way = way\n", "los_angeles_county = Polygon(longest_way)\n", "\n", "los_angeles_county" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zones.geometry.within(los_angeles_county).value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zones = zones[zones.geometry.within(los_angeles_county)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use h3fy to recompute our population density into hexigons. This normalizes the geometries so they can be used for calculations later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tobler.util import h3fy\n", "from tobler.area_weighted import area_interpolate\n", "\n", "zones.to_crs(epsg=4087, inplace=True)\n", "h3_zones = h3fy(zones, buffer=True)\n", "\n", "interpolated = area_interpolate(\n", " source_df=zones,\n", " target_df=h3_zones,\n", " intensive_variables=[\"Total Population\", \"Worker Population\", \"ALAND\"],\n", ")\n", "zones_interpolated = Zones(interpolated)\n", "zones_interpolated[\"Total Population Density\"] = (\n", " zones_interpolated[\"Total Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Worker Population Density\"] = (\n", " zones_interpolated[\"Worker Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population\"] = (\n", " zones_interpolated[\"Total Population\"] + zones_interpolated[\"Worker Population\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population Density\"] = (\n", " zones_interpolated[\"Total Population + Worker Population\"]\n", " / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can plot the hexigons." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zones_interpolated.explore(\n", " column=\"Total Population + Worker Population Density\", scheme=\"JenksCaspall\", k=20\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These hexigons are pretty big. Let's recompute them to be smaller and plot with lonboard. Lonboard is great for plotting lots of data quickly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy\n", "from lonboard import Map, SolidPolygonLayer\n", "from mapclassify import JenksCaspall\n", "from palettable.matplotlib import Viridis_20\n", "\n", "h3_zones = h3fy(zones, resolution=8, buffer=True)\n", "\n", "interpolated = area_interpolate(\n", " source_df=zones,\n", " target_df=h3_zones,\n", " intensive_variables=[\"Total Population\", \"Worker Population\", \"ALAND\"],\n", ")\n", "zones_interpolated = Zones(interpolated)\n", "zones_interpolated[\"Total Population Density\"] = (\n", " zones_interpolated[\"Total Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Worker Population Density\"] = (\n", " zones_interpolated[\"Worker Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population\"] = (\n", " zones_interpolated[\"Total Population\"] + zones_interpolated[\"Worker Population\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population Density\"] = (\n", " zones_interpolated[\"Total Population + Worker Population\"]\n", " / zones_interpolated[\"ALAND\"]\n", ")\n", "\n", "layer = SolidPolygonLayer.from_geopandas(zones_interpolated, opacity=0.1)\n", "jc = JenksCaspall(\n", " zones_interpolated[\"Total Population + Worker Population Density\"].fillna(0), k=10\n", ")\n", "colors = []\n", "for color in jc.yb:\n", " colors.append(Viridis_20.colors[color])\n", "layer.get_fill_color = numpy.uint8(colors)\n", "\n", "Map(layers=[layer])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, it would be helpful to do the same exercise with job densiites. However, the US Census Bureau does not provide this information (\"Worker Population\": \"B08604_001E\") at the tract level like they do for population. This leaves us with two options which both require large downloads.\n", "\n", "1. We can download zip code worker populations.\n", "2. Or we can download LODES data which includes origin-destination information.\n", "\n", "We'll take a look at option 2 below. The LODES data is divided into three files: residential data, work data, and origin-destination data. There is also a cross-walk file that includes a translation from LODES GEOIDs to census tracts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dpd.modeling import TripDataFrame\n", "\n", "\"\"\"\n", "od = TripDataFrame.from_lodes(us.states.lookup(state.value).abbr.lower(), YEAR)\n", "\n", "od.head()\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then combine the LODES data with our original output DataFrame (which includes the geometry) to add a job_density column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"zones = Zones(zones).production_attraction_from_lodes(od)\n", "h3_zones = h3fy(zones, resolution=9, buffer=True)\n", "\n", "interpolated = area_interpolate(\n", " source_df=zones,\n", " target_df=h3_zones,\n", " intensive_variables=[\"Total Population\", \"Worker Population\", \"ALAND\"],\n", ")\n", "zones_interpolated = Zones(interpolated)\n", "zones_interpolated[\"Total Population Density\"] = (\n", " zones_interpolated[\"Total Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Worker Population Density\"] = (\n", " zones_interpolated[\"Worker Population\"] / zones_interpolated[\"ALAND\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population\"] = (\n", " zones_interpolated[\"Total Population\"] + zones_interpolated[\"Worker Population\"]\n", ")\n", "zones_interpolated[\"Total Population + Worker Population Density\"] = (\n", " zones_interpolated[\"Total Population + Worker Population\"]\n", " / zones_interpolated[\"ALAND\"]\n", ")\n", "\n", "zones_interpolated.head(1)\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can plot the job density like we ploted the population density above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Los Angeles County Metro is planning to extend the C Line south towards Torrance. One of the design decisions is where to put a station: at the South Bay Galleria or Redondo Beach Transit Center? Let's look at the population density around these two locations. We can buffer the stop locations with a radius of 1000km and plot along with the density." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from folium import Map\n", "from geopandas import GeoSeries\n", "from shapely.geometry import Point\n", "\n", "station_locations = GeoSeries(\n", " data=[\n", " Point(-118.35884733230867, 33.86557221753289),\n", " Point(-118.35253437469025, 33.87283256915632),\n", " ],\n", " index=[\"South Bay Galleria\", \"Redondo Beach Transit Center\"],\n", " crs=\"EPSG:4326\",\n", ").to_crs(epsg=4087)\n", "\n", "folium_map = Map(location=(33.87, -118.35), zoom_start=14)\n", "station_locations.buffer(1000).explore(m=folium_map)\n", "zones.explore(\n", " column=\"Total Population + Worker Population Density\",\n", " scheme=\"JenksCaspall\",\n", " k=20,\n", " m=folium_map,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dpd.modeling import contour_dataframe\n", "\n", "folium_map = Map(location=(33.87, -118.35), zoom_start=14)\n", "zones.explore(\n", " column=\"Total Population + Worker Population Density\",\n", " scheme=\"JenksCaspall\",\n", " k=20,\n", " m=folium_map,\n", ")\n", "for station_location in station_locations:\n", " contour_dataframe(station_location, crs=station_locations.crs).explore(m=folium_map)\n", "folium_map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dpd.modeling import DistanceDataFrame\n", "\n", "distance_dataframe = DistanceDataFrame.from_origins_destinations(\n", " zones_interpolated.geometry.centroid,\n", " station_locations.geometry,\n", " method=\"distance\",\n", ")\n", "distance_dataframe.columns = station_locations.index\n", "distance_dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "\n", "fig = plt.figure(figsize=(18, 16))\n", "ax = fig.add_subplot(111)\n", "(distance_dataframe / 1.35).hist(\n", " weights=zones_interpolated[\"Total Population + Worker Population\"],\n", " range=(0, 900),\n", " bins=30,\n", " cumulative=True,\n", " sharey=True,\n", " ax=ax,\n", ")\n", "ax.set_ylabel(\"Population (cumulative)\")\n", "ax.set_xlabel(\"Time (seconds)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we download our public transportation systems so we can plot the lines. Here we can download the Los Angeles Metro Rail network from OpenStreetMap." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from dpd.driving.network import Network\n", "\n", "query = \"\"\"\n", "[out:json][timeout:25];\n", "(\n", " relation[\"network\"=\"Metro Rail\"];\n", "\n", ");\n", "out body;\n", ">;\n", "out skel qt;\n", "\"\"\"\n", "\n", "network = Network.from_osm_query(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can plot the network on the original population density map." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from lonboard import Map, ScatterplotLayer\n", "\n", "layer = SolidPolygonLayer.from_geopandas(zones_interpolated, opacity=0.1)\n", "jc = JenksCaspall(\n", " zones_interpolated[\"Total Population + Worker Population Density\"], k=10\n", ")\n", "colors = []\n", "for color in jc.yb:\n", " colors.append(Viridis_20.colors[color])\n", "layer.get_fill_color = numpy.uint8(colors)\n", "layers = [layer]\n", "for route in network.routes:\n", " layers.append(\n", " ScatterplotLayer.from_geopandas(\n", " network.routes[route].stops, radius_min_pixels=10\n", " )\n", " )\n", "\n", "Map(layers=layers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import folium\n", "\n", "folium_map = folium.Map(location=(34, -118.3), zoom_start=11)\n", "zones.explore(\n", " m=folium_map,\n", " column=\"Total Population + Worker Population Density\",\n", " scheme=\"JenksCaspall\",\n", " k=20,\n", ")\n", "for route in network.routes:\n", " network.routes[route].geometry.explore(m=folium_map)\n", "\n", "folium_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at accessibility. We will download one metro line and compute the distance to each stop from each zone." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from astropy import units\n", "from dpd.driving import Route\n", "\n", "route = Route.from_osm_relation(relation=2351006)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dpd.modeling import DistanceDataFrame, TripDataFrame\n", "\n", "### Here is the \"old way\" of computing distances before we used hexogons\n", "# points = zones.polygons_to_points()\n", "# distance_dataframe = DistanceDataFrame.from_origins_destinations(\n", "# points.geometry, stops.geometry, method=\"distance\"\n", "# )\n", "\n", "stops = route.stops.to_crs(\"EPSG:4087\")\n", "distance_dataframe = DistanceDataFrame.from_origins_destinations(\n", " zones_interpolated.geometry.centroid,\n", " stops.geometry,\n", " method=\"distance\",\n", ")\n", "distance_dataframe.columns = stops.name\n", "distance_dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Distance DataFrame includes the distance to each stop. However, most people will use the closest stop. In this case, the plots below will be over-counting people who are within 15 minutes of multiple stops. The following code removes the distance to all but the closest stop. However, due to the sampling method, some stops may not be the closest stop to anyone (simply because our sampling resolution is low)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "from numpy import inf\n", "\n", "for index, row in distance_dataframe.iterrows():\n", " row_min = row.min()\n", " distance_dataframe.loc[index] = (row == row_min).replace(True, row_min).replace(False, inf)\n", "\n", "distance_dataframe.head()\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here we can plot the number of people within 5, 10, and 15 minutes of each metro stop." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pandas import DataFrame\n", "\n", "times = [5, 10, 15]\n", "data = []\n", "for time in times:\n", " distance_to_time = lambda x: x / 1.35 < time * 60\n", " tdf = TripDataFrame(\n", " (distance_to_time(distance_dataframe) * 1).to_numpy()\n", " * zones_interpolated[\"Total Population + Worker Population\"].to_numpy()[\n", " :, None\n", " ],\n", " index=zones_interpolated.index,\n", " columns=stops.name,\n", " ).astype(int)\n", " data.append(tdf.sum())\n", "\n", "DataFrame(data=data, index=times, columns=distance_dataframe.columns).T.plot(kind=\"bar\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here we can plot where people are within 15 minutes of a metro stop. This could be done much easier if we simply applied .buffer to each stop geometry." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here we will plot the number of potential riders within 15 minutes of each stop." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "\n", "fig = plt.figure(figsize=(18, 16))\n", "ax = fig.add_subplot(111)\n", "(distance_dataframe / 1.35).hist(\n", " weights=zones_interpolated[\"Total Population + Worker Population\"],\n", " range=(0, 900),\n", " bins=30,\n", " cumulative=True,\n", " sharey=True,\n", " ax=ax,\n", ")\n", "ax.set_ylabel(\"Population (cumulative)\")\n", "ax.set_xlabel(\"Time (seconds)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mapclassify import Quantiles\n", "\n", "zones_interpolated[\"Accessibility w/in 15 minutes\"] = tdf.T.sum()\n", "layer = SolidPolygonLayer.from_geopandas(zones_interpolated, opacity=0.1)\n", "jc = Quantiles(zones_interpolated[\"Accessibility w/in 15 minutes\"], k=10)\n", "colors = []\n", "for color in jc.yb:\n", " colors.append(Viridis_20.colors[color])\n", "layer.get_fill_color = numpy.uint8(colors)\n", "layers = [layer]\n", "\n", "Map(layers=layers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last, here are the Voronoi Polygons for the network. These show which is the closest stop for each polygon." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from shapely import voronoi_polygons, MultiPoint\n", "\n", "stops = {}\n", "for route in network.routes:\n", " for index, stop in network.routes[route].stops.iterrows():\n", " stops[stop[\"name\"]] = stop[\"geometry\"]\n", "\n", "v = voronoi_polygons(MultiPoint(list(stops.values())), extend_to=los_angeles_county)\n", "\n", "folium_map = folium.Map(location=(34, -118.3), zoom_start=11)\n", "vs = GeoSeries(v.geoms, crs=\"EPSG:4326\")\n", "vs.intersection(los_angeles_county).explore(m=folium_map)\n", "for route in network.routes:\n", " network.routes[route].stops.explore(m=folium_map)\n", "folium_map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }