{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "rental-chinese",
   "metadata": {},
   "source": [
    "# Gravity Model\n",
    "\n",
    "Everyone wants to see the map first, so here it is. In this article we will automate some of the analysis done in Alon Levy's [Streaming High-Speed Rail Crayoning](https://pedestrianobservations.com/2021/03/20/streaming-high-speed-rail-crayoning/). Mostly we will focus on the Gravity Model which is explained below.\n",
    "\n",
    "Note: this notebook has been updated to use population-weighted centroids (instead of geometric centroids) for the Combined Statistical Areas (CSA) and a multi-index for the CSA DataFrame."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cardiovascular-seattle",
   "metadata": {},
   "source": [
    "![gravity_model.png](gravity_model.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sonic-casino",
   "metadata": {},
   "source": [
    "First we will import the libraries necessary for gathering data, performing our analysis, and plotting the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "exterior-psychiatry",
   "metadata": {},
   "outputs": [],
   "source": [
    "import contextily as ctx\n",
    "import folium\n",
    "import geopandas\n",
    "from numpy import inf\n",
    "import pandas\n",
    "from shapely.geometry import LineString\n",
    "import ssl\n",
    "\n",
    "ssl._create_default_https_context = ssl._create_stdlib_context\n",
    "\n",
    "from dpd.modeling import DistanceDataFrame, TripDataFrame\n",
    "\n",
    "pandas.set_option(\"display.max_columns\", None)\n",
    "pandas.set_option(\"display.max_rows\", None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "parallel-assistant",
   "metadata": {},
   "source": [
    "Thanks to [Kevin Wilson](https://github.com/khwilson) for creating a geojson file of population-weighted Combined Statistical Areas (CSA) and their corresponding populations. We can easily load this file directly as a GeoPandas GeoDataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a924c059-f92d-44a8-9774-fe9be231f21b",
   "metadata": {},
   "outputs": [],
   "source": [
    "csa = geopandas.read_file(\n",
    "    \"https://gist.githubusercontent.com/khwilson/2df2f86bc444d020b59a9ebe61b7944b/raw/4611d5a54dc976a17a6429e1a1d5deb628861069/csa_population_weighted_centroids.geojson\"\n",
    ")\n",
    "csa.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "printable-tours",
   "metadata": {},
   "source": [
    "We can use Folium to plot the CSAs on OpenStreetMap. Some cities like San Diego and Tampa are missing because they are outside a CSA. Note that we now use population-weighted centroids for each CSA."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3231b890-b430-464c-9848-b6270f8ce1a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "style_function = lambda x: {\"radius\": x[\"properties\"][\"population\"] / 1000000}\n",
    "csa.explore(style_kwds={\"style_function\": style_function})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "happy-consultancy",
   "metadata": {},
   "source": [
    "In the next step, we will implenent our Gravity Model and use it to predict the ridership between all CSAs. The Gravity Model for our analysis is\n",
    "\n",
    "$ Annual\\ Traffic\\ Volume = G \\times \\frac{(Origin\\ population)^{a}\\ \\times\\ (Destination\\ population)^{b}}{(Distance\\ between\\ origin\\ and\\ destination)^{d}} $\n",
    "\n",
    "In the case of high speed rail, G=75,000, a=b=0.8, and d=2. Also, if the distance is less than 500km, we set distance to 500km. (These numbers are all from the article [Metcalfe’s Law for High-Speed Rail](https://pedestrianobservations.com/2020/02/13/metcalfes-law-for-high-speed-rail/)). The results of this step is a table with the predicted ridership between CSAs. I've dropped rows/columns where the predicted ridership is below 1 million people/year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f723e204-d666-40f2-9472-2fdac926b0cf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "distance_dataframe = (\n",
    "    DistanceDataFrame.from_origins_destinations(\n",
    "        origins=csa.geometry, destinations=csa.geometry, method=\"haversine\"\n",
    "    )\n",
    "    / 1000\n",
    ")\n",
    "distance_dataframe = distance_dataframe.where(distance_dataframe != 0, inf)\n",
    "distance_dataframe = distance_dataframe.where(distance_dataframe >= 500, 500)\n",
    "\n",
    "trip_dataframe = TripDataFrame.from_gravity_model(\n",
    "    csa.population / 1000000,\n",
    "    csa.population / 1000000,\n",
    "    distance_dataframe,\n",
    "    G=75000,\n",
    "    a=0.8,\n",
    "    b=0.8,\n",
    "    d=2,\n",
    ")\n",
    "trip_dataframe.index = csa.csa_name\n",
    "trip_dataframe.columns = csa.csa_name\n",
    "series = trip_dataframe.stack().astype(int)\n",
    "series.name = \"Millions of Annual Passengers\"\n",
    "series = series[series > 0]\n",
    "pandas.DataFrame(series)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "higher-tradition",
   "metadata": {},
   "source": [
    "Now we can plot these city pairs on a map. We will limit this plot to pairs with ridership potential above 2 million people/year. Line width is based on ridership potential. This map looks very famaliar to Alon Levy's map. California, Texas, Florida, the Midwest, and the Northeast Corridor all have strong potential for high-speed rail. There is not a strong potential for cross-country high-speed rail. Note that this analysis ignores many factors such as geography, cost, cooperation between routes, and cultural/societal factors. It also ignores international routes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "suspended-airline",
   "metadata": {},
   "outputs": [],
   "source": [
    "series = series[series > 1]\n",
    "linestrings = []\n",
    "for origin, destination in series.index:\n",
    "    linestrings.append(\n",
    "        {\n",
    "            \"geometry\": LineString(\n",
    "                [\n",
    "                    csa[csa[\"csa_name\"] == origin][\"geometry\"].iloc[0],\n",
    "                    csa[csa[\"csa_name\"] == destination][\"geometry\"].iloc[0],\n",
    "                ]\n",
    "            ),\n",
    "            \"passengers\": series[origin][destination],\n",
    "        }\n",
    "    )\n",
    "routes = geopandas.GeoDataFrame(linestrings)\n",
    "routes.crs = csa.crs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "exposed-understanding",
   "metadata": {},
   "outputs": [],
   "source": [
    "style_function = lambda x: {\"weight\": x[\"properties\"][\"passengers\"]}\n",
    "m = routes.explore(style_kwds={\"style_function\": style_function})\n",
    "\n",
    "style_function = lambda x: {\"radius\": x[\"properties\"][\"population\"] / 1000000}\n",
    "csa.explore(m=m, style_kwds={\"style_function\": style_function})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hispanic-miracle",
   "metadata": {},
   "source": [
    "Last we can also plot the map as a png."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "irish-healing",
   "metadata": {},
   "outputs": [],
   "source": [
    "routes.to_crs(epsg=3857, inplace=True)\n",
    "ax = routes.plot(figsize=(18, 10), color=\"black\")\n",
    "ctx.add_basemap(ax, zoom=6)\n",
    "ax.set_axis_off()\n",
    "fig = ax.get_figure()\n",
    "fig.savefig(\"gravity_model.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "reported-migration",
   "metadata": {},
   "source": [
    "Below is my previous code for getting the list of Combined Statistical Areas (CSA) with their 2019 populations from Wikipedia. I also computed the geometric centroid of each CSA based on data from the United States Census Bureau. I then combined these two tables to create a table with both population and geometry. This has been replaced by [Kevin Wilson](https://github.com/khwilson)'s file above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "breathing-senator",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dpd.wikipedia import get_wikipedia_table\n",
    "from dpd.utils import download_file\n",
    "\n",
    "url = \"https://en.wikipedia.org/wiki/Combined_statistical_area\"\n",
    "wikipeida_csa = get_wikipedia_table(url, 1)\n",
    "wikipeida_csa[\"Combined\\xa0statistical\\xa0area\"] = wikipeida_csa[\n",
    "    \"Combined\\xa0statistical\\xa0area\"\n",
    "].map(lambda x: x.replace(\"Combined Statistical Area\", \"CSA\"))\n",
    "wikipeida_csa[\"2023 estimate\"] = wikipeida_csa[\"2023 estimate\"].map(\n",
    "    lambda x: int(x.replace(\",\", \"\"))\n",
    ")\n",
    "\n",
    "url = \"https://www2.census.gov/geo/tiger/TIGER2020/CSA/tl_2020_us_csa.zip\"\n",
    "tiger_file = download_file(url)\n",
    "tiger = geopandas.GeoDataFrame.from_file(tiger_file)\n",
    "tiger[\"NAMELSAD\"] = tiger[\"NAMELSAD\"].map(lambda x: x.replace(\"--\", \"–\"))\n",
    "tiger[\"geometry\"] = tiger[\"geometry\"].map(lambda x: x.centroid)\n",
    "\n",
    "csa = pandas.merge(\n",
    "    tiger,\n",
    "    wikipeida_csa,\n",
    "    left_on=\"NAMELSAD\",\n",
    "    right_on=\"Combined\\xa0statistical\\xa0area\",\n",
    "    how=\"inner\",\n",
    ")\n",
    "csa.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "municipal-progress",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}