api

class dpd.analysis.Activity(name, start, end, cost, benefit, *args, **kwargs)

Bases: Series

an activity: something to do

property duration

Returns: duration (datetime.timedelta): the total duration of the Activity

class dpd.analysis.Alternative(name, *args, **kwargs)

Bases: DataFrame

A class to create an Alternative made up of Activities

add_activity(activity)
Parameters:

activity (dpd.analysis.Activity) – an Activity to include in the Alternative

property benefit

Returns: benefit: the sum of all Benefits

property benefit_cost_ratio

Returns: benefit_cost_ratio (float): the Benefit-Cost Ratio
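
As a rough illustration of the quantity this property reports, the sketch below computes a benefit-cost ratio from plain lists of yearly amounts, optionally discounting each amount to present value first. The function names, the yearly granularity, and the discounting convention are assumptions made for illustration; they are not taken from dpd.

```python
def present_value(amounts, discount_rate=0.0):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(a / (1.0 + discount_rate) ** year for year, a in enumerate(amounts))


def benefit_cost_ratio(benefits, costs, discount_rate=0.0):
    """Ratio of discounted benefits to discounted costs."""
    total_cost = present_value(costs, discount_rate)
    if total_cost == 0:
        raise ValueError("total cost is zero; the ratio is undefined")
    return present_value(benefits, discount_rate) / total_cost


# Hypothetical project: costs up front, benefits arriving later.
costs = [100.0, 50.0, 0.0]
benefits = [0.0, 60.0, 120.0]
ratio = benefit_cost_ratio(benefits, costs)
```

Because the benefits in this example arrive later than the costs, any positive discount rate lowers the ratio.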

cash_flow_diagram(ax, freq='Y')
Parameters:
  • ax – the axis for the plot

  • freq (str)

property cost

Returns: cost: the sum of all Costs

property duration

Returns: duration (datetime.timedelta): the total duration of the Alternative

property end

Returns: end (datetime.datetime): the latest end time

name
Parameters:

name (str) – the name of the Alternative

Returns:

an Alternative

Return type:

alternative (dpd.analysis.Alternative)

period_range_pivot(discount_rate=0.0, freq='Y')
Parameters:

freq (str)

Returns:

a Cost Table or Benefit Table

Return type:

period_range_pivot (pandas.DataFrame)

property start

Returns: start (datetime.datetime): the earliest start time

property timeline

Returns: timeline (dpd.analysis.timeline): the timeline for all the Activities in the Alternative

class dpd.analysis.Currency(currency, base_year, discount_rate, base_currency=None)

Bases: object

discount(year=None)
Parameters:

year (int) – e.g. 2023

Returns:

a unit that can be multiplied by a value. e.g. 100 * Currency.discount()

Return type:

currency (astropy.units.quantity.Quantity)

class dpd.analysis.Decision(criteria=None, alternatives=None, *args, **kwargs)

Bases: DataFrame

add_alternative(alternative)
add_criterion(criterion)
multiple_criteria_decision_analysis(method='weighted_sum_model')

See https://en.wikipedia.org/wiki/Multiple-criteria_decision_analysis. Methods:

  • weighted_sum_model: https://en.wikipedia.org/wiki/Weighted_sum_model

  • weighted_product_model: https://en.wikipedia.org/wiki/Weighted_product_model
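
The two linked methods can be sketched in plain Python as follows. This is not dpd's implementation: the dictionary layout, the function names, and the sample data are hypothetical, and the sketch assumes every criterion is already normalized so that higher is better. The weighted product model is shown in its product-score form, which ranks alternatives the same way as the pairwise-ratio form.

```python
import math


def weighted_sum_model(scores, weights):
    """Score each alternative as the weighted sum of its criterion values."""
    return {
        name: sum(weights[c] * value for c, value in values.items())
        for name, values in scores.items()
    }


def weighted_product_model(scores, weights):
    """Score each alternative as the product of criterion values raised to their weights."""
    return {
        name: math.prod(value ** weights[c] for c, value in values.items())
        for name, values in scores.items()
    }


# Hypothetical data: higher is better for every criterion.
weights = {"cost": 0.5, "speed": 0.3, "comfort": 0.2}
scores = {
    "bus": {"cost": 0.9, "speed": 0.4, "comfort": 0.5},
    "rail": {"cost": 0.5, "speed": 0.9, "comfort": 0.8},
}
wsm = weighted_sum_model(scores, weights)
wpm = weighted_product_model(scores, weights)
best_by_wsm = max(wsm, key=wsm.get)
```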

plot(*args, **kwargs)

Make plots of Series or DataFrame.

Uses the backend specified by the option plotting.backend. By default, matplotlib is used.

Parameters:
  • data (Series or DataFrame) – The object for which the method is called.

  • x (label or position, default None) – Only used if data is a DataFrame.

  • y (label, position or list of label, positions, default None) – Allows plotting of one column versus another. Only used if data is a DataFrame.

  • kind (str) –

    The kind of plot to produce:

    • 'line' : line plot (default)

    • 'bar' : vertical bar plot

    • 'barh' : horizontal bar plot

    • 'hist' : histogram

    • 'box' : boxplot

    • 'kde' : Kernel Density Estimation plot

    • 'density' : same as 'kde'

    • 'area' : area plot

    • 'pie' : pie plot

    • 'scatter' : scatter plot (DataFrame only)

    • 'hexbin' : hexbin plot (DataFrame only)

  • ax (matplotlib axes object, default None) – An axes of the current figure.

  • subplots (bool or sequence of iterables, default False) –

    Whether to group columns into subplots:

    • False : No subplots will be used

    • True : Make separate subplots for each column.

    • sequence of iterables of column labels: Create a subplot for each group of columns. For example [(‘a’, ‘c’), (‘b’, ‘d’)] will create 2 subplots: one with columns ‘a’ and ‘c’, and one with columns ‘b’ and ‘d’. Remaining columns that aren’t specified will be plotted in additional subplots (one per column).

      Added in version 1.5.0.

  • sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in; Be aware, that passing in both an ax and sharex=True will alter all x axis labels for all axis in a figure.

  • sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to invisible.

  • layout (tuple, optional) – (rows, columns) for the layout of subplots.

  • figsize (a tuple (width, height) in inches) – Size of a figure object.

  • use_index (bool, default True) – Use index as ticks for x axis.

  • title (str or list) – Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.

  • grid (bool, default None (matlab style default)) – Axis grid lines.

  • legend (bool or {'reverse'}) – Place legend on axis subplots.

  • style (list or dict) – The matplotlib line style per column.

  • logx (bool or 'sym', default False) – Use log scaling or symlog scaling on x axis.

  • logy (bool or 'sym', default False) – Use log scaling or symlog scaling on y axis.

  • loglog (bool or 'sym', default False) – Use log scaling or symlog scaling on both x and y axes.

  • xticks (sequence) – Values to use for the xticks.

  • yticks (sequence) – Values to use for the yticks.

  • xlim (2-tuple/list) – Set the x limits of the current axes.

  • ylim (2-tuple/list) – Set the y limits of the current axes.

  • xlabel (label, optional) –

    Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the x-column name for planar plots.

    Changed in version 2.0.0: Now applicable to histograms.

  • ylabel (label, optional) –

    Name to use for the ylabel on y-axis. Default will show no ylabel, or the y-column name for planar plots.

    Changed in version 2.0.0: Now applicable to histograms.

  • rot (float, default None) – Rotation for ticks (xticks for vertical, yticks for horizontal plots).

  • fontsize (float, default None) – Font size for xticks and yticks.

  • colormap (str or matplotlib colormap object, default None) – Colormap to select colors from. If string, load colormap with that name from matplotlib.

  • colorbar (bool, optional) – If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’ plots).

  • position (float) – Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center).

  • table (bool, Series or DataFrame, default False) – If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.

  • yerr (DataFrame, Series, array-like, dict and str) – See Plotting with Error Bars for detail.

  • xerr (DataFrame, Series, array-like, dict and str) – Equivalent to yerr.

  • stacked (bool, default False in line and bar plots, and True in area plot) – If True, create stacked plot.

  • secondary_y (bool or sequence, default False) – Whether to plot on the secondary y-axis if a list/tuple, which columns to plot on secondary y-axis.

  • mark_right (bool, default True) – When using a secondary_y axis, automatically mark the column labels with “(right)” in the legend.

  • include_bool (bool, default is False) – If True, boolean values can be plotted.

  • backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

  • **kwargs – Options to pass to matplotlib plotting method.

Returns:

If the backend is not the default matplotlib one, the return value will be the object returned by the backend.

Return type:

matplotlib.axes.Axes or numpy.ndarray of them

Notes

  • See matplotlib documentation online for more on this subject

  • If kind = ‘bar’ or ‘barh’, you can specify relative alignments for bar plot layout by position keyword. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center)

property weights
class dpd.analysis.Timeline(*args, **kwargs)

Bases: DataFrame

a timeline

add_activity(activity)
plot_gantt(ax)
class dpd.driving.EdgesDriver(body, edges, initial_driver_position_offset=None, driver_max_velocity=None, driver_final_velocity=None, *args, **kwargs)

Bases: Agent

begin_next_edge(extra_position)
end_current_edge(extra_position)
property geometry
property position
start_drive()
step()

A single step of the agent.

class dpd.driving.EdgesLanesDriver(lane=0, *args, **kwargs)

Bases: EdgesDriver

begin_next_edge(*args, **kwargs)
end_current_edge(*args, **kwargs)
lane_change(direction=1)
step()

    my_index = self.current_edge.lanes[self.lane].index(self)
    if my_index > 0:
        if self.lane < len(self.current_edge.lanes) - 1:
            self.change_lane()
        # there are bodies in front of us, need to check their position
        body_in_front_of_me = self.current_edge.lanes[self.lane][my_index - 1]
        self.max_position = numpy.min([self.max_position, body_in_front_of_me.position])

class dpd.driving.EdgesLanesNodesDriver(nodes, *args, **kwargs)

Bases: EdgesLanesDriver

begin_next_edge(*args, **kwargs)
begin_next_node()
end_current_edge(*args, **kwargs)
end_current_node()
static from_node_ids(edges_dict, nodes_dict, node_ids, *args, **kwargs)

Preplans the nodes and edges for a Driver based on Node IDs and a nodes_dict and edges_dict

Parameters:
  • edges_dict (dict) – a dictionary with a tuple of (node_id[i], node_id[i+1]) as the index for each edge value; this is often a networkx.DiGraph.edges or a pandas.DataFrame

  • nodes_dict (dict) – a dictionary with node_id as the index for each node value; this is often a networkx.DiGraph.nodes or a pandas.DataFrame

  • node_ids ([node_id]) – a list of Node IDs that describes the route a Driver takes
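
The preplanning described above can be pictured as pairing consecutive Node IDs to form edge keys. The sketch below uses plain dictionaries and a hypothetical helper name (plan_route); it only illustrates the lookup pattern and is not the library's code.

```python
def plan_route(edges_dict, nodes_dict, node_ids):
    """Look up the node and edge values along a route given as a list of node IDs."""
    nodes = [nodes_dict[node_id] for node_id in node_ids]
    # Consecutive node IDs form the (from, to) key of each edge on the route.
    edges = [edges_dict[(a, b)] for a, b in zip(node_ids, node_ids[1:])]
    return nodes, edges


# Hypothetical three-node network.
nodes_dict = {1: "node A", 2: "node B", 3: "node C"}
edges_dict = {(1, 2): "edge A-B", (2, 3): "edge B-C"}
nodes, edges = plan_route(edges_dict, nodes_dict, [1, 2, 3])
```

A route with n nodes yields n - 1 edges, and a missing edge key raises KeyError in this sketch.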

start_drive()
step()

    my_index = self.current_edge.lanes[self.lane].index(self)
    if my_index > 0:
        if self.lane < len(self.current_edge.lanes) - 1:
            self.change_lane()
        # there are bodies in front of us, need to check their position
        body_in_front_of_me = self.current_edge.lanes[self.lane][my_index - 1]
        self.max_position = numpy.min([self.max_position, body_in_front_of_me.position])

class dpd.driving.Network(routes=None)

Bases: object

a transportation network with one or more routes

add_route(index, route)
static from_felt_geojson(geodataframe, *args, **kwargs)
from_gtfs(*args, **kwargs)
static from_osm_query(query, osm=<dpd.osm.osm.OSM object>, *args, **kwargs)

Build a network from an OpenStreetMap Overpass API Query

Example Query: [out:json][timeout:25]; ( relation["network"="Metro Rail"]; ); out body; >; out skel qt;

static from_osm_relations(relations, osm=<dpd.osm.osm.OSM object>, *args, **kwargs)
class dpd.driving.Route(data, gague=<Quantity 1.435 m>, max_cant=<Quantity 0.1524 m>, max_cant_deficiency=<Quantity 0.075 m>, *args, **kwargs)

Bases: GeoDataFrame

the route a vehicle takes

add_stop(geometry, name)

Adds a stop at the given geometry named name.

property distance_to_point

Returns a list of distances between every pair of points along the route.

property distances

Returns a list of distances between every pair of points along the route. The length of the list is one less than the number of points on the route.
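
A minimal sketch of the idea, assuming points are (x, y) tuples in a projected, meter-based coordinate system; the helper name and sample points are hypothetical, and the real property works on the route's geometry rather than on tuples.

```python
import math


def consecutive_distances(points):
    """Distances between each pair of neighboring points; len(points) - 1 values."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


# Hypothetical route in a projected (meter-based) coordinate system.
route_points = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
distances = consecutive_distances(route_points)
```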

property edges
static from_gtfs(feed, route_id, service_id=None, shape_id=None, *args, **kwargs)
static from_osm_relation(relation, osm=None, *args, **kwargs)

Build a route from OpenStreetMap data.

Parameters:
  • relation (int) – the relation to build a route for

  • osm (dpd.OSM.osm) – the osm that contains the route as a relation

Returns:

a route to drive

Return type:

dpd.driving.Route

static from_way(way, crs, *args, **kwargs)
static from_ways(ways, crs, *args, **kwargs)

If there are multiple, disconnected ways, pick the longest one. Note: this does a length calculation on an unknown crs, so the result may be inaccurate.

property radius_of_curvature

Returns a list of the radii of curvature between every three points along the route. The length of the list is two less than the number of points on the route.

remove_stop(name)

Removes all stops named name.

property reversed
segments(dwell_time)
speed_limit(radius_of_curvature)

from https://en.wikipedia.org/wiki/Minimum_railway_curve_radius
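
The relationship on the linked page balances the lateral acceleration v²/R against what cant plus cant deficiency can compensate, g·(cant + deficiency)/gauge, using a small-angle approximation. The sketch below solves that for v in SI units. It is an illustration under that assumption, not the library's code; the default values mirror the defaults in the Route signature above, and the function name is hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def curve_speed_limit(radius_of_curvature, gauge=1.435, max_cant=0.1524,
                      max_cant_deficiency=0.075):
    """Maximum speed (m/s) through a curve of the given radius (m)."""
    allowed_lateral_acceleration = (
        STANDARD_GRAVITY * (max_cant + max_cant_deficiency) / gauge
    )
    return math.sqrt(radius_of_curvature * allowed_lateral_acceleration)


speed = curve_speed_limit(1000.0)  # meters per second
```

Because speed scales with the square root of the radius, quadrupling the radius doubles the allowed speed.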

property speed_limits

Returns a list of maximum speeds between every pair of points along the route based on the radius of curvature. The length of the list is one less than the number of points on the route.

property stops
property way
class dpd.driving.Schedule(trips=None, *args, **kwargs)

Bases: object

the schedule for a route

add_trip(index, trip)
property capacity
property cost
static from_gtfs(feed, route_id, direction_id)
static from_trip(trip, start='6 hour', end='24 hour', freq='20Min')
plot_schedule(ax)
reverse_distance()
property schedule
to_trajectory_collection()
class dpd.driving.Section(start_point, end_point, radius_of_curvature=None, right_handed=True, number_of_points=16, elevation='surface', mode='rail')

Bases: object

a section of a route: a curved section (if there is a radius_of_curvature) or a straight section

Parameters:
  • start_point (Point) – the start point of the section

  • end_point (Point) – the end point of the section

  • radius_of_curvature (float) – the Radius of Curvature of the section (in meters)

property speed_limit

max_cant (float): the maximum allowable cant (in meters)

max_cant_deficiency (float): the maximum allowable cant deficiency (in meters)

gague (float): the track gauge (in meters)

class dpd.driving.Trip(data, *args, **kwargs)

Bases: GeoDataFrame

a time-indexed drive along a route

static from_gtfs(feed, trip_id)
static from_model(df, route, include_geometry=True)
in_vehicle_travel_time(origin_stop, destination_stop)
plot_schedule(ax, include_stops=True)
reverse_distance()

Useful for plotting two trips that go in different directions on the same plot.

property stops
to_trajectory(index)
class dpd.driving.Vehicles(url, vehicle=None)

Bases: object

create_vehicle(index)
vehicle_dropdown_observer(value)
dpd.folium.folium_flask_app()
class dpd.geometry.GeometricDict(dict=None, crs=None, *args, **kwargs)

Bases: UserDict

A dictionary of objects with a .geometry

plot(columns=['geometry'], **kwargs)
plot_folium(folium_map, columns=['geometry'], **kwargs)
to_crs(crs)

Could this be faster? Probably. Geopandas is about 2x faster because they vectorize. However, it is too much overhead to convert to Geopandas, transform, and convert back. Creating the transformer takes about 500ms so we could save the transformer and reuse it for subsequent transformations. We could also try multithreading.

to_geodataframe(columns=['geometry'])
to_geoseries()
to_json(columns=['geometry'])
dpd.geometry.circle_center_from_points_and_radius(point1, point2, radius, right_handed)

Finds the center of a circle with a given radius such that point1 and point2 are on the circumference of the circle. Based on https://stackoverflow.com/questions/36211171/finding-center-of-a-circle-given-two-points-and-radius. There are three possible outcomes: no solution, one solution, or two solutions. If there is no solution (e.g. radius is less than half the distance between the points), a "ValueError: math domain error" is raised. If there is one solution, right_handed does not matter. If there are two solutions, right_handed chooses between them. The arc lengths of the two solutions are very close.
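
The geometry can be sketched as follows: the center lies on the perpendicular bisector of the chord between the two points, at a distance sqrt(radius² - (chord/2)²) from the chord's midpoint. The function name is hypothetical, and which side right_handed selects is an assumed convention that may not match dpd's.

```python
import math


def circle_center(point1, point2, radius, right_handed=True):
    """Center of a circle of the given radius passing through both points."""
    (x1, y1), (x2, y2) = point1, point2
    chord = math.dist(point1, point2)
    # math.sqrt raises "ValueError: math domain error" when radius < chord / 2.
    height = math.sqrt(radius ** 2 - (chord / 2) ** 2)
    mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
    # Unit vector perpendicular to the chord.
    perp_x, perp_y = -(y2 - y1) / chord, (x2 - x1) / chord
    sign = 1 if right_handed else -1  # assumed convention, not necessarily dpd's
    return mid_x + sign * height * perp_x, mid_y + sign * height * perp_y


center = circle_center((0.0, 0.0), (2.0, 0.0), radius=math.sqrt(2.0))
```

When the radius equals exactly half the chord length, height is zero and both choices of right_handed give the midpoint, which is the single-solution case.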

dpd.geometry.circle_from_three_points(p0, p1, p2)

Calculates the center and radius of a circle from three points on the circumference of the circle. Based on https://www.geeksforgeeks.org/equation-of-circle-when-three-points-on-the-circle-are-given/

Parameters:
  • p0 (float, float) – the coordinates of a point on the circumference of the circle

  • p1 (float, float) – the coordinates of a point on the circumference of the circle

  • p2 (float, float) – the coordinates of a point on the circumference of the circle

Returns:

(x,y) are the coordinates of the center of the circle; r is the radius of the circle

Return type:

(x,y), r
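
For illustration, one standard way to get this result is the closed-form circumcenter formula, sketched below with a hypothetical function name. It may differ from the derivation the library follows, but it returns the same ((x, y), r) shape described above.

```python
import math


def circumcircle(p0, p1, p2):
    """Return ((x, y), r) for the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = p0, p1, p2
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear; no finite circle passes through them")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    x = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    y = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (x, y), math.dist((x, y), p0)


center, radius = circumcircle((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0))
```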

dpd.geometry.curve_direction(p0, p1, p2)
dpd.geometry.dms_to_dd(coordinate)

coordinate: ‘40°48′31″N’
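
A minimal sketch of this kind of conversion, assuming the input always uses the degree sign, prime, and double-prime characters shown in the example and ends with a hemisphere letter. The function name and the regular expression are hypothetical and may be stricter or looser than the library's parser.

```python
import re

DMS_PATTERN = re.compile(
    r"(?P<deg>\d+)°(?P<min>\d+)′(?P<sec>\d+(?:\.\d+)?)″(?P<hemi>[NSEW])"
)


def dms_to_decimal_degrees(coordinate):
    """Convert a string such as '40°48′31″N' to signed decimal degrees."""
    match = DMS_PATTERN.fullmatch(coordinate.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {coordinate!r}")
    degrees = (
        int(match["deg"]) + int(match["min"]) / 60 + float(match["sec"]) / 3600
    )
    # South and West are negative by convention.
    return -degrees if match["hemi"] in "SW" else degrees


latitude = dms_to_decimal_degrees("40°48′31″N")
```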

dpd.geometry.draw_arc(start_point, end_point, radius, right_handed, number_of_points)

Returns an array of points from start_point to end_point along the circumference of a circle with a given radius

class dpd.mechanics.Bicycle(x, y, theta, velocity, vehicle_length)

Bases: object

A bicycle model. References:

  • https://thomasfermi.github.io/Algorithms-for-Automated-Driving/Control/BicycleModel.html

  • https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1002&context=engschmecart
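
The kinematic bicycle model can be sketched as below, using the rear axle as the reference point and one explicit Euler step per call. The dataclass, its field layout, and the free functions are hypothetical stand-ins chosen to echo the constructor and method names listed here; they are not the library's code.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    x: float
    y: float
    theta: float  # heading in radians
    velocity: float  # meters per second
    vehicle_length: float  # wheelbase in meters


def turning_radius(vehicle_length, steering_angle):
    """Radius of the circle traced by the rear axle; infinite when driving straight."""
    if steering_angle == 0:
        return math.inf
    return vehicle_length / math.tan(steering_angle)


def step(state, delta_time=1.0, steering_angle=0.0):
    """Advance the rear-axle kinematic bicycle model by one explicit Euler step."""
    state.x += state.velocity * math.cos(state.theta) * delta_time
    state.y += state.velocity * math.sin(state.theta) * delta_time
    state.theta += (
        state.velocity / state.vehicle_length * math.tan(steering_angle) * delta_time
    )
    return state


state = step(BicycleState(0.0, 0.0, 0.0, 10.0, 2.5), delta_time=0.1)
```

Smaller values of delta_time track the true circular arc more closely, since Euler integration accumulates error on curves.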

get_radius_from_vehicle_length_and_steering_angle(steering_angle)
step(delta_time=1, steering_angle=0)
class dpd.mechanics.Body(initial_position, *args, **kwargs)

Bases: Agent

A body.

class dpd.mechanics.DynamicBody(power, mass, max_acceleration=None, min_acceleration=None, *args, **kwargs)

Bases: KinematicBodyWithAcceleration

A class to simulate a dynamic body. Provides methods to move the body with constant power.
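
Moving with constant power means the available acceleration falls as speed rises, since P = F·v and F = m·a give a = P / (m·v). The sketch below applies that relation for one time step with an optional acceleration cap. The function name, the cap handling at rest, and the sample vehicle are assumptions for illustration, not the class's actual stepping logic.

```python
def constant_power_step(velocity, power, mass, delta_time=1.0, max_acceleration=None):
    """Return (new_velocity, distance) after one time step at constant power."""
    if velocity > 0:
        acceleration = power / (mass * velocity)  # from P = F * v and F = m * a
    else:
        acceleration = float("inf")  # P = F * v permits unbounded force at rest
    if max_acceleration is not None:
        acceleration = min(acceleration, max_acceleration)
    if acceleration == float("inf"):
        raise ValueError("max_acceleration is required when starting from rest")
    distance = velocity * delta_time + 0.5 * acceleration * delta_time ** 2
    return velocity + acceleration * delta_time, distance


# Hypothetical vehicle: 200 kW, 40 t, starting from rest, capped at 1 m/s^2.
velocity, position = 0.0, 0.0
for _ in range(10):
    velocity, moved = constant_power_step(
        velocity, 200_000.0, 40_000.0, max_acceleration=1.0
    )
    position += moved
```

In this example the cap governs the first few steps, after which the power limit takes over and acceleration tapers off.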

step()

A single step of the agent.

step_acceleration()
class dpd.mechanics.KinematicBody(initial_velocity, max_position=None, min_position=None, *args, **kwargs)

Bases: Body

A class to simulate a kinematic body. Provides methods to move the body with constant velocity.

step()

A single step of the agent.

step_position()
class dpd.mechanics.KinematicBodyWithAcceleration(initial_acceleration, max_velocity=None, min_velocity=None, max_deceleration=None, final_velocity=None, *args, **kwargs)

Bases: KinematicBody

A class to simulate a kinematic body. Provides methods to move the body with constant acceleration and decelerate the body with constant deceleration.

step()

A single step of the agent.

step_velocity()
dpd.mechanics.move(acceleration, initial_velocity, time, max_speed=None)
class dpd.mapping.Cycleway(link, segment_number)

Bases: Segment

class dpd.mapping.Intersection(name, geometry, input_links=None, output_links=None, **kwargs)

Bases: object

Intersection: a place where things intersect.

input_lanes are lanes which feed into the intersection. output_lanes are lanes which feed out of the intersection.

class dpd.mapping.Lane(link, segment_number)

Bases: Segment

Bases: object

Note: the output_intersection of a link means that link is an input_link of that intersection. And the input_intersection of a link means that link is an output_link of that intersection

update_segments_from_osm(number_of_lanes=0, parking=None, cycleway=None, sidewalk=None)
update_segments_from_streetmix(url, timeout=60)

Bases: GeometricDict

A class to hold Links.

plot_folium(folium_map, columns=['geometry', 'segments'], **kwargs)
class dpd.mapping.Map

Bases: object

A class for creating a link network map that includes links (made up of lanes) that go between intersections.

plot(include_intersections=False, include_links=True, filter_box=None, **kwargs)
plot_folium(include_intersections=False, include_links=True, filter_box=None, **kwargs)
to_geodigraph(intersection_attributes=[], link_attributes=[])
class dpd.mapping.Parking(link, type_)

Bases: object

class dpd.mapping.Segment(link, segment_number)

Bases: object

class dpd.mapping.Sidewalk(link, segment_number)

Bases: Segment

dpd.mapping.add_object_to_edges_and_nodes(graph, node_model)

Adds Edge and Node objects to each Edge and Node in a Graph

graph (networkx.Graph): a Graph that describes the transportation network

class dpd.modeling.DistanceDataFrame(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)

Bases: DataFrame

This is a class to compute an Origin-Destination DataFrame. It maps a distance, cost, or time function to origins and destinations. Output is a DataFrame with origins as rows and destinations as columns. The result can be merged with the origins or destinations dataframe, e.g. pandas.merge(origins, accessibility, left_index=True, right_index=True). This is similar to SciPy’s distance matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance_matrix.html.

  • Index is Origins

  • Columns are Destinations

  • Value is distance from origin to destination (Meters, Kilometers, Seconds, Minutes, Dollars)

  • Often built from Zones as a square matrix, but origins and destinations can be the same or different

static from_origins_destinations(origins, destinations, method='distance', distance_unit=1, *args, **kwargs)

Parameters:
  • origins (geopandas.GeoSeries) – GeoSeries of list of origin points

  • destinations (geopandas.GeoSeries) – GeoSeries of list of destination points

  • method ("distance", "haversine", or "OSRM") – method used for calculating distance. "distance" requires Points to be in meters. "haversine" and "OSRM" require Points to be in (lon, lat).

  • distance_unit – unit to apply to all values. Currently only supported with method="distance".

  • mode ("walking", "cycling", or "driving") – mode that is passed to OSRM.
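
To make the origin-by-destination layout concrete, here is a small sketch of a haversine distance matrix built with the standard library only. It uses nested dictionaries in place of a DataFrame, takes points as (lon, lat) tuples in degrees, and assumes a mean Earth radius of 6,371 km; all names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine(origin, destination):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*origin, *destination))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_matrix(origins, destinations):
    """Nested dict: matrix[origin_id][destination_id] -> distance in meters."""
    return {
        o_id: {d_id: haversine(o, d) for d_id, d in destinations.items()}
        for o_id, o in origins.items()
    }


# Hypothetical zones keyed by name, as (lon, lat).
zones = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}
matrix = distance_matrix(zones, zones)
```

Passing the same zones as origins and destinations gives a square, symmetric matrix with zeros on the diagonal.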

class dpd.modeling.ModeChoiceModel(modes, probabilities)

Bases: object

predict(people)
class dpd.modeling.Population(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)

Bases: DataFrame

static from_trip_dataframe(trip_dataframe)
class dpd.modeling.TransportationModel(*args: Any, **kwargs: Any)

Bases: Model

get_dataframe()
step()

A single step. Fill in here.

class dpd.modeling.TripDataFrame(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)

Bases: DataFrame

A class to store an origin-destination matrix for the trip distribution step of a four-step model. This is similar to scikit-mobility’s FlowDataframe: https://scikit-mobility.github.io/scikit-mobility/reference/data_structures.html#module-skmob.core.flowdataframe

  • Index is Origin

  • Columns are Destinations

  • Value is number of trips from origin to destination

  • Often built from DistanceDataFrame using a GravityModel or IPFN
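
A gravity model of the kind mentioned above is commonly written as T_ij = G · P_i^a · P_j^b / d_ij^d. The sketch below evaluates that form with nested dictionaries; the parameter names follow the from_gravity_model signature listed in this section, but the formula for function='inverse', the treatment of zero distances, and the helper name are assumptions rather than confirmed library behavior.

```python
def gravity_model(origin_population, destination_population, distances,
                  G=1.0, a=1.0, b=1.0, d=1.0):
    """Trips[i][j] = G * P_i**a * P_j**b / distance_ij**d; zero-distance pairs get 0."""
    trips = {}
    for i, p_i in origin_population.items():
        trips[i] = {}
        for j, p_j in destination_population.items():
            distance = distances[i][j]
            # Zero distance (e.g. a zone to itself) would divide by zero.
            trips[i][j] = 0.0 if distance == 0 else G * p_i**a * p_j**b / distance**d
    return trips


# Hypothetical two-zone example.
population = {"north": 1000, "south": 500}
distances = {"north": {"north": 0, "south": 10}, "south": {"north": 10, "south": 0}}
trips = gravity_model(population, population, distances)
```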

add_geometry_from_zones(zones, method=<function random_point_in_polygon>)
add_route_hw_from_osrm(url_base, mode)
static from_gravity_model(origin_population, destination_population, distance_dataframe, function='inverse', G=1, a=1, b=1, d=1, *args, **kwargs)
static from_ipfn(zones, cost_dataframe, *args, **kwargs)
static from_lodes(st, year, *args, **kwargs)
route_assignment(zones, column='S000')
class dpd.modeling.Zones(*args, **kwargs)

Bases: GeoDataFrame

A class to store four-step model zones.

  • Index is an identifier

  • Columns are Geometry (Polygons), Production, Attraction

build_graph(centroid_distance_dataframe=None)
calculate_distance_dataframe(method='haversine')

Calculate a dataframe containing the distance between the centroid of all zones.

calculate_trip_dataframe_from_ipfn(distance_dataframe=None)

Calculate a trip dataframe using IPFN.

static from_uscensus(state, year, include_units=False)
h3fy_area_interpolated(h3fy_kwds=None, area_interpolate_kwds=None)
polygons_to_points(num=10)
production_attraction_from_lodes(origin_destination_dataframe, column='S000')
visualize_route_assignment(linewidth=0.0001, ax=None)
dpd.modeling.contour_dataframe(point, crs=None, distance=2000, levels=[500, 1000, 1500, 2000], num=50, distance_dataframe_kwargs=None)
class dpd.osm.OSM

Bases: object

A class to query the Overpass API

add_node(osm)
add_relation(osm)
add_way(osm)
download_node(node_id)
download_relation(relation_id)
download_way(way_id)
execute_query(query, retry_attempts=4, timeout=600)
class dpd.osm.OSMMap(region)

Bases: Map

banned_modes(link)

Determine if any modes are banned on a link. Not used for now: we will trust the routing engine.

build_intersections(intersections)
create_node_tags_lookup()
cycleway_calculator(link)
lane_calculator(link)
look_for_stop_signs()
sidewalk_calculator(link)
static speed_converter(speed)
class dpd.osrm.OSRM(region, profile, profile_directory='/usr/local/Cellar/osrm-backend/5.22.0_1/share/osrm/profiles/')

Bases: object

contract(*args, **kwargs)
extract(*args, **kwargs)
routed(*args, **kwargs)
dpd.osrm.route(origin, destination, url_base, mode, options='?annotations=nodes', timeout=600)
dpd.osrm.table(origins, destinations, url_base, mode, options='', timeout=600)
dpd.shapely.cut(line, distance)

Cuts a line in two at a distance from its starting point

dpd.shapely.find_way_in_ways(ways, points, buffer_distance=None)
dpd.shapely.random_point_in_polygon(polygon)
dpd.shapely.snap_point_to_linestring(linestring, point)

Snaps the point to the closest point along the linestring. Adds that point to the linestring. Returns the new linestring and the closest point on the linestring.

dpd.shapely.uniform_points_in_polygon(polygon, num=50)
dpd.uscensus.download_lodes_data(data, st, part_or_seg, type_, year)

Download LODES OD file. Documentation is available here: https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.4.pdf

Parameters:
  • data (str) – one of “od”, “rac”, or “wac” e.g. “od”

  • st (str) – lowercase, 2-letter postal code for a chosen state e.g. “ca”

  • part_or_seg (str) – If data is od, part of the state file, can have a value of either “main” or “aux”. The two parts are complementary: the main part includes jobs with both workplace and residence in the state, and the aux part includes jobs with the workplace in the state and the residence outside of the state. If data is rac or wac, segment of the workforce, can have the values of “S000”, “SA01”, “SA02”, “SA03”, “SE01”, “SE02”, “SE03”, “SI01”, “SI02”, or “SI03”. These correspond to the same segments of the workforce as are listed in the OD file structure. e.g. “main”

  • type_ (str) – Job Type, can have a value of “JT00” for All Jobs, “JT01” for Primary Jobs, “JT02” for All Private Jobs, “JT03” for Private Primary Jobs, “JT04” for All Federal Jobs, or “JT05” for Federal Primary Jobs. e.g. “JT00”

  • year (str) – Year of job data. Can have the value of 2002-2015 for most states. e.g. “2017”

Returns:

the local filename of the downloaded file

Return type:

str

dpd.uscensus.download_lodes_xwalk(st)

Download LODES Crosswalk file. Documentation is available here: https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.4.pdf

Parameters:

st (str) – lowercase, 2-letter postal code for a chosen state e.g. “ca”

Returns:

the local filename of the downloaded file

Return type:

str

dpd.uscensus.get_uscensus_data(year, state, data=['NAME'], with_geometry=False, timeout=600)

Gets the specified data from the US Census API and returns it as a Pandas DataFrame.

Parameters:
  • year (str) – the year to get the data from

  • state (str) – the name of the state to get the data for

  • data ([str]) – the data to gather from the API

  • with_geometry (bool) – if geometric data should be added to the result

Returns:

A dataframe containing the data. If with_geometry is true, this is a geopandas.GeoDataFrame

Return type:

pandas.DataFrame

dpd.utils.download_file(url, redownload=False, timeout=600)

Download a file locally. Does not download the file if it already exists.

Parameters:
  • url (str) – the url of the file to download

  • redownload (bool) – if the file exists, delete it and redownload it

Returns:

the local filename of the downloaded or existing file

Return type:

local_filename (str)

dpd.utils.timestring_to_timeobject(timestring)

handle the case where hours go past midnight
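
Schedule data such as GTFS stop times can carry hour values of 24 or more for trips that run past midnight, which datetime.time cannot represent. One way to handle this, sketched below, is to parse into a timedelta measured from the start of the service day. The function name and the choice of timedelta as the return type are assumptions; the library's function may return a different kind of object.

```python
from datetime import timedelta


def timestring_to_timedelta(timestring):
    """Parse 'HH:MM:SS' where HH may be 24 or more, as in GTFS stop times."""
    hours, minutes, seconds = (int(part) for part in timestring.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


after_midnight = timestring_to_timedelta("25:30:00")
```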

dpd.wikipedia.get_wikipedia_coordinates(url, timeout=60)

Get the latitude and longitude coordinates from a Wikipedia page and return them as a tuple.

Parameters:

url (str) – the url of the wikipedia page from which to get the coordinates

Returns:

a tuple containing the coordinates

Return type:

(latitude, longitude)

dpd.wikipedia.get_wikipedia_table(url, number=0, styled=False, timeout=60)

Get a table from wikipedia and return it as a Pandas DataFrame.

Parameters:
  • url (str) – the url of the wikipedia page from which to get the table

  • number (int) – which table on the page to return (for pages with multiple tables)

Returns:

A dataframe containing the table

Return type:

pandas.DataFrame