Building a Standard for Open Bikeshare Data
Originally published at Michael Schade’s Mystery Incorporated Blog
March 2nd, 2014

Should the bikeshare industry adopt an open data standard? As
bikesharing spreads to more cities, having a common method for
accessing and analyzing data will become more important. We know
that transit systems work best when agencies concentrate on their
core mission. Transit agencies are not in the information technology
business; all they should do is release their data to let third parties
build apps that let passengers use the systems.
To use open data, programmers need to know: Where is the
data? What are the files called? Which fields are available? What are
the fields called?
Bikesharing systems should adopt the standard of having a “data”
page which can be found by appending “data” immediately after the
main URL. This is what many U.S. government web sites are doing
(like justice.gov/data, dot.gov/data, and state.gov/data). It would be
awesome to have consistent URLs like capitalbikeshare.com/data and
velib.paris.fr/data.
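As a minimal sketch of that convention, a client could derive the data page from a system's main URL. The function name `data_page_url` and the choice to default to `https` when no scheme is given are assumptions made for illustration, not part of any existing standard.

```python
from urllib.parse import urlsplit, urlunsplit


def data_page_url(main_url: str) -> str:
    """Return the conventional "/data" page for a system's main URL."""
    # Assumption: a bare host name such as "capitalbikeshare.com" is served over https.
    if "://" not in main_url:
        main_url = "https://" + main_url
    parts = urlsplit(main_url)
    # Keep the scheme and host, and replace any path with the single "data" segment.
    return urlunsplit((parts.scheme, parts.netloc, "/data", "", ""))


print(data_page_url("capitalbikeshare.com"))
print(data_page_url("http://velib.paris.fr/"))
```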
To standardize what the files are called, we have to decide how many
files are used, and what formats to use. Some systems do not
separate the station information data (which is static) from the station
status data (which is dynamic). The Capital Bikeshare XML file and
the Bixi Montreal XML file are examples of combining both static and
dynamic data in a single file (both use the Bixi public bike system).
This might be more convenient in some cases, but for systems that
frequently update their displays, it wastes a lot of bandwidth. This
process could be made more efficient by using two files. JCDecaux,
which manages many bikesharing systems in Europe, separates
the static data from the dynamic real-time data.
Denver's B-cycle doesn't seem to offer any data at all, though
Denver's Open Data Catalog does offer a variety of formats for data
about B-cycle Stations. I doubt this is the true, live, system data,
because the locations are given as street addresses and not latitude
and longitude coordinates.
In addition to information needed by apps, we also need historic data
in order to analyze how people use the system. The most common
kind is system metrics, such as the type released by Bay Area
Bikeshare. This typically shows ridership and membership totals, and
is good for showing how the system has grown. It would be updated at
the end of each day.
Planners and analysts rely on two other types of historic data: trip
history information shows every trip made within a certain period,
and station history data shows the status of the stations within a
certain period. The best example of the former is the Capital Bikeshare
trip history data page, which releases a new data set every quarter.
The latter is sometimes recorded by enthusiasts on their own initiative,
such as the CaBi Tracker website. In San Francisco, Eric Fisher keeps a
daily log of Bay Area Bikeshare stats at trafficways.org/babs (I used
his data in Probing Data from Bay Area Bikeshare).
The trip history and station history files need a naming convention to
reflect the content's date range. CaBi's largest quarterly file is 72.5MB,
for the 572,919 trips in the 2nd quarter of 2012 (they have now
started zipping the files). A filename format like
trips-2012-3-1-to-2012-5-30.csv would work well.
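A small sketch of how such names could be generated from a date range follows. The helper name `history_filename` and the `stations` prefix for station history files are assumptions; only the `trips-...-to-....csv` pattern comes from the example above.

```python
from datetime import date


def history_filename(kind: str, start: date, end: date) -> str:
    """Build a name like trips-2012-3-1-to-2012-5-30.csv for a date range."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    # Month and day are left unpadded to match the example in the text.
    return (
        f"{kind}-{start.year}-{start.month}-{start.day}"
        f"-to-{end.year}-{end.month}-{end.day}.csv"
    )


print(history_filename("trips", date(2012, 3, 1), date(2012, 5, 30)))
# "stations" is a hypothetical prefix for station history files.
print(history_filename("stations", date(2012, 3, 1), date(2012, 5, 30)))
```

One design note: unpadded months and days do not sort chronologically in a directory listing, so a standard might prefer zero-padded dates (2012-03-01) instead.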
While the systems are expected to protect their customers' privacy by
not including customer IDs, users should be able to download their
own personal trip history files, and those files should use the same
format as the main trip history files.
Finally, there should be a standard way of summarizing general
information about the entire system: who provides the equipment,
who runs the system, which jurisdictions participate, where the system
is located and what its boundaries are, what the hours of operation
are, what the operating season is, and what the URL and other contact
info are. And to really integrate all the various systems, we also could
benefit from having the URL for a standard-size logo image, plus the
system's colors. This system information file should also include data
found in a manifest file, namely, a list of all the associated open-data
files.
The system information should include definitions of available
membership types. This might merit being listed as a separate table.
Each membership type should include the cost and duration. We also
need to know how long rides can be, and what the charges are for
going beyond the time limit. For example, the CaBi pricing rules say
rides are free for the first 30 minutes; going up to 30 minutes longer
costs $2.00 for casual members (those with 1- or 3-day memberships)
and $1.50 for subscribers. In contrast, the Citi Bike pricing rules say
rides are free for the first 45 minutes; going up to 30 minutes longer
costs $4.00 for those with 24-hour & 7-day passes, and $2.50 for
those with annual memberships.
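To illustrate why these rules belong in machine-readable form, here is a rough sketch that encodes the figures quoted above and computes the charge for a ride. The `OverageRule` class and its field names are hypothetical, and since the text only gives the fee for the first 30 minutes past the free period, longer rides are deliberately left undefined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverageRule:
    free_minutes: int         # rides up to this length are free
    first_block_minutes: int  # length of the first paid block after the free period
    first_block_fee: float    # charge for a ride ending within that block


# Figures as quoted in the text; the constant names are illustrative.
CABI_CASUAL = OverageRule(30, 30, 2.00)
CABI_SUBSCRIBER = OverageRule(30, 30, 1.50)
CITI_BIKE_ANNUAL = OverageRule(45, 30, 2.50)


def usage_fee(rule: OverageRule, ride_minutes: int) -> float:
    """Return the overage charge for a ride of the given length."""
    if ride_minutes <= rule.free_minutes:
        return 0.0
    if ride_minutes <= rule.free_minutes + rule.first_block_minutes:
        return rule.first_block_fee
    # Later fee tiers are not given in the text, so they are not guessed here.
    raise ValueError("fee beyond the first overage block is not defined in this sketch")


print(usage_fee(CABI_CASUAL, 25))       # within the free period
print(usage_fee(CABI_SUBSCRIBER, 45))   # within the first paid block
print(usage_fee(CITI_BIKE_ANNUAL, 70))  # within the first paid block
```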
This table summarizes the six types of bikesharing data:
System information: general info
Station information: a mostly-static list of all stations
Station status: the number of available bikes and docks
System metrics: membership and trip totals
Trip history: every trip made during a given period
Station history: a history of the station status list
Here's how I would organize the files. I'll use ▶ to indicate a primary
key (one that must be unique within the system), and ▷ to indicate a
foreign key (one that references another table's primary key, and
which must exist).
The station information data is the information most likely to be
shared by bikeshare systems. At the very least, it includes the latitude
& longitude coordinates for every station, and the name. The file is
fairly static, changing mostly when new stations are added.
Here are the fields I would include, compared with CaBi (DC), Vélib
(Paris), and Denver's B-cycle to see what names they use.
Station information

| proposal | CaBi | Vélib | B-cycle |
|---|---|---|---|
| stationid ▶ | id, terminalName | number | GLOBALID |
| name | name | name | STATION_NAME |
| address | (not used) | address | STATION_ADDRESS, ADDRESS_LINE1, ADDRESS_LINE2 |
| region | (not used) | (not used) | CITY, STATE |
| zip | (not used) | (not used) | ZIP |
| lat | lat | latitude | (not used) |
| lng | long | longitude | (not used) |
| installed | installDate | (not used) | (not used) |
| removed | removalDate | (not used) | (not used) |
| public | public | (not used) | (not used) |
| capacity | (not used) | (not used) | NUM_DOCKS |
| message | (not used) | (not used) | (not used) |

Most systems don't use a region field, but for multi-jurisdictional
systems, it is important to know which jurisdiction manages each
station. For example, Capital Bikeshare operates within DC,
Montgomery County, Arlington, and Alexandria. Bay Area
Bikeshare operates within San Francisco, Redwood City, Palo Alto,
Mountain View, and San Jose. Nice Ride operates within Minneapolis
and St Paul. Other systems could use this field to track which
neighborhood the station is in.
Vélib appends the postal code & city to the address field, but these
would be better as separate fields. For example, the Bastille Richard
Lenoir station has an address of “2 BOULEVARD RICHARD LENOIR –
75011 PARIS”, but this should be just “2 BOULEVARD RICHARD
LENOIR”, with a zip of “75011” and a city of “Paris.” And there is no
reason for Vélib to use all-uppercase letters. The data should be in the
proper mixed-case (using French rules for capitalization), and
programs can easily convert to uppercase if they wish.
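As an illustration of the split described above, the following sketch separates a combined Vélib-style address into address, zip, and city. The pattern assumes the layout of the single example quoted in the text (street, a dash, a five-digit postal code, then the city); real feeds may contain other layouts, and the function name is hypothetical. Case is left untouched, since proper French capitalization needs more than a simple conversion.

```python
import re

# Assumed layout: "<street> <dash> <5-digit postal code> <city>"
_COMBINED_ADDRESS = re.compile(
    r"^(?P<address>.+?)\s*[-–]\s*(?P<zip>\d{5})\s+(?P<city>.+)$"
)


def split_address(raw: str) -> dict:
    """Split a combined address string into address, zip, and city fields."""
    text = raw.strip()
    match = _COMBINED_ADDRESS.match(text)
    if match is None:
        # Unrecognized layout: keep everything in the address field.
        return {"address": text, "zip": "", "city": ""}
    return {
        "address": match["address"],
        "zip": match["zip"],
        "city": match["city"],
    }


print(split_address("2 BOULEVARD RICHARD LENOIR – 75011 PARIS"))
```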
I would suggest a message field so systems can communicate that a
station will be shutting down early, or moved to a new location. Or
during snow storms, the rebalancing van might not be able to service
a station.
Denver has other fields that should be considered for a standard.
“PROPERTY_TYPE” shows whether the station's location
is Private or Public. This could be expanded to show exactly who the
property owner or responsible agency is. “POWER_TYPE” has values
of Solar Only, Wired Only, and Solar with Wire Backup.
Cities often provide temporary stations. The station ID should
correspond to a specific location. If a station returns to the same
location for an annual event, it should re-use the old ID.
The station status file should have the smallest amount of data needed
to describe the current state of each station. This is the file that will be
called most often, potentially thousands of times per minute, so every
byte counts. And many people will be querying this data from mobile
devices, another reason to keep the file size as small as possible.
Here's how I would design the standard for this file, compared with
CaBi (DC) and Denver's B-cycle to see what names they use. Because
I couldn't find Denver's XML feed, I used CityBike's Denver JSON feed.
Station status

| proposal | CaBi | Denver B-cycle |
|---|---|---|
| stationid ▷ | id, terminalName | id, idx |
| bikes | nbBikes | bikes |
| docks | nbEmptyDocks | free |
| open | locked | (not used) |
| time | lastCommWithServer | timestamp |
The bikes and docks numbers will generally add up to
the capacity value in the station information file, but if there are
non-functioning bikes or docks, the total could be smaller. The open field
would be true or false. Sometimes stations are temporarily closed,
perhaps because they have become inaccessible. The time value shows
the last time the station communicated with the server. This is useful
to determine if the data might no longer be accurate, such as during a
power outage.
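A sketch of how an app might interpret these fields follows. It assumes the time field is a Unix timestamp in seconds, and the ten-minute staleness threshold is an arbitrary illustrative value, not something the proposal specifies.

```python
def out_of_service(capacity: int, bikes: int, docks: int) -> int:
    """Number of dock positions not accounted for by available bikes or empty docks."""
    # Clamp at zero in case a feed reports more bikes plus docks than capacity.
    return max(capacity - bikes - docks, 0)


def is_stale(status_time: int, now: int, max_age_seconds: int = 600) -> bool:
    """True when the station last reported more than max_age_seconds ago."""
    return now - status_time > max_age_seconds


# A 15-dock station reporting 6 bikes and 7 empty docks has 2 positions unaccounted for.
print(out_of_service(capacity=15, bikes=6, docks=7))
print(is_stale(status_time=1000, now=1601))
```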
Notice we don't duplicate any of the fields in the station
information file, other than our foreign key, the stationid field.
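Because the two files share only stationid, a client has to join them itself. The sketch below shows one way to do that while enforcing the key rules described earlier (stationid unique in station information, and every status row referring to a known station). Representing records as dictionaries and the sample values are assumptions for illustration.

```python
def join_status(stations: list, statuses: list) -> list:
    """Merge each status row with its station information row via stationid."""
    by_id = {}
    for station in stations:
        if station["stationid"] in by_id:
            # Primary key rule: stationid must be unique within the system.
            raise ValueError(f"duplicate stationid {station['stationid']!r}")
        by_id[station["stationid"]] = station

    joined = []
    for status in statuses:
        info = by_id.get(status["stationid"])
        if info is None:
            # Foreign key rule: the referenced station must exist.
            raise KeyError(f"unknown stationid {status['stationid']!r}")
        joined.append({**info, **status})
    return joined


# Made-up sample rows using the proposed field names.
stations = [{"stationid": 1, "name": "Example St", "lat": 38.9, "lng": -77.0, "capacity": 15}]
statuses = [{"stationid": 1, "bikes": 6, "docks": 9, "open": True, "time": 1393718400}]
print(join_status(stations, statuses))
```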
The trip history file also needs to be as compact as possible, not
because people will be downloading it frequently, but because these
files could be used to store millions of records.
Trip history
startdate
startstation ▷
enddate
endstation ▷
bikeid
usertype
The duration of each trip can be computed on-the-fly and doesn't need
to be included in the file. The startstation and endstation values link up
to the stationid field in the station information file. The usertype field
describes the type of membership the rider has.
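The on-the-fly duration calculation could look like the sketch below. It assumes a CSV file with a header row and startdate and enddate stored as Unix timestamps in seconds; the sample row, including its station and bike identifiers, is invented for illustration.

```python
import csv
import io

# Invented sample data using the proposed trip history field names.
SAMPLE_TRIPS = """startdate,startstation,enddate,endstation,bikeid,usertype
1393761600,31000,1393762500,31005,W00123,subscriber
1393765200,31005,1393767000,31000,W00456,casual
"""


def trip_durations(csv_text: str) -> list:
    """Return each trip's duration in seconds, computed from its start and end times."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["enddate"]) - int(row["startdate"]) for row in reader]


for seconds in trip_durations(SAMPLE_TRIPS):
    print(f"{seconds // 60} minutes")
```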
Though few systems release trip history data on a regular basis, there
have been occasions when systems have released data in support of a
visualization contest. The Hubway Data Visualization Challenge took
place in 2013, and included demographic data about the rider of each
trip: residential zip code, year of birth, and sex. The Divvy Data
Challenge (for Chicago) is currently underway; its data includes riders'
year of birth and sex.
The station history file should be a list of every change in status
(available bikes and docks) for every station, listed in chronological
order. In order to avoid having to repeat the state of the entire system
when only a few stations have new values, the file should start with
every station, and thereafter list a station only when it has changed.
The initial value would be needed in order to compute the state of any
later times recorded in the file.
Station history
stationid ▷
bikes
docks
open
time
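To show how the initial snapshot plus later changes can be used, here is a sketch that replays a station history up to a chosen moment and returns the state of every station at that time. It assumes rows are already in chronological order and that time is a Unix timestamp; the sample rows are invented.

```python
def state_at(history: list, when: int) -> dict:
    """Replay change records and return the latest row per station as of `when`."""
    state = {}
    for row in history:
        if row["time"] > when:
            break  # rows are chronological, so nothing later can apply
        state[row["stationid"]] = row
    return state


# Invented sample: a full snapshot at time 100, then individual changes.
HISTORY = [
    {"stationid": 1, "bikes": 5, "docks": 5, "open": True, "time": 100},
    {"stationid": 2, "bikes": 0, "docks": 10, "open": True, "time": 100},
    {"stationid": 1, "bikes": 4, "docks": 6, "open": True, "time": 160},
    {"stationid": 2, "bikes": 1, "docks": 9, "open": True, "time": 220},
]

snapshot = state_at(HISTORY, 200)
print(snapshot[1]["bikes"], snapshot[2]["bikes"])
```

Station 2 has no change recorded between times 100 and 200, so its state at time 200 comes from the initial snapshot, which is why the file has to start with every station.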
The dominant data formats nowadays are XML and JSON. CSV is
also a good choice, as long as the data fits in a tabular format,
consisting of simple rows and columns. For CSV files, the order of
fields should be consistent.
Field values are one of four types: numeric, string, Boolean, and timestamp.
Boolean is easily expressed as “true” or “false,” and Unix time is a
common way of recording date and time.
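A minimal sketch of reading those two value types from text follows. The helper names are hypothetical, and treating Unix time as whole seconds in UTC is an assumption a real standard would need to state explicitly.

```python
from datetime import datetime, timezone


def parse_bool(text: str) -> bool:
    """Accept only the literal strings "true" and "false" (case-insensitive)."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a Boolean value: {text!r}")


def parse_time(text: str) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(text), tz=timezone.utc)


print(parse_bool("true"))
print(parse_time("1393718400").isoformat())
```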
By publishing and standardizing bikesharing open data, systems can make
it easier for developers and analysts to build tools that help the public
discover and use bikesharing systems across the globe, such as the Bike
Share Map by Oliver O'Brian. The vendors, operators, and managing
jurisdictions should work together to create a standard that can be
used by everyone.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Building a Standard for Open Bikeshare Data

Originally published at Michael Schade’s Mystery Incorporated Blog, March 2nd, 2014

Should the bikeshare industry adopt an open data standard? As bikesharing spreads to more cities, having a common method for accessing and analyzing data will become more important. We know that transit systems work best when agencies concentrate on their core mission. Transit agencies are not in the information technology business; all they should do is release their data to let third parties build apps that let passengers use the systems.

To use open data, programmers need to know: Where is the data? What are the files called? Which fields are available? What are the fields called?

Bikesharing systems should adopt the standard of having a “data” page which can be found by appending “data” immediately after the main URL. This is what many U.S. government web sites are doing (like justice.gov/data, dot.gov/data, state.gov/data, etc.). It would be awesome to have consistent URLs like capitalbikeshare.com/data and velib.paris.fr/data.

To standardize what the files are called, we have to decide how many files are used, and what formats to use. Some systems do not separate the station information data (which is static) from the station status data (which is dynamic). The Capital Bikeshare XML file and the Bixi Montreal XML file are examples of combining both static and dynamic data in a single file (both use the Bixi public bike system). This might be more convenient in some cases, but for systems that frequently update their displays, it wastes a lot of bandwidth. This process could be made more efficient by using two files. JCDecaux, which manages many bikesharing systems in Europe, separates the static data from the dynamic real-time data.

Denver’s B-cycle doesn’t seem to offer any data at all, though Denver’s Open Data Catalog does offer a variety of formats for data about B-cycle Stations. I doubt this is the true, live system data, because the coordinates are given as street addresses and not latitude and longitude coordinates.
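As a rough illustration of the two-file idea, the sketch below splits one combined station record into a static part and a dynamic part. The field names are the CaBi names listed in the comparison tables later in this article; which fields count as static or dynamic, the function name, and the sample values are assumptions made for the example, not part of any existing feed or API.

```python
# Field names follow the CaBi columns in the article's comparison tables.
# The static/dynamic grouping is an illustrative assumption.
STATIC_FIELDS = ("name", "lat", "long", "installDate")
DYNAMIC_FIELDS = ("nbBikes", "nbEmptyDocks", "locked", "lastCommWithServer")


def split_station(record):
    """Split one combined station record into (static info, dynamic status)."""
    station_id = record["id"]
    info = {"stationid": station_id}
    info.update({k: record[k] for k in STATIC_FIELDS if k in record})
    status = {"stationid": station_id}
    status.update({k: record[k] for k in DYNAMIC_FIELDS if k in record})
    return info, status


# Made-up sample record, for illustration only.
combined = {
    "id": 1,
    "name": "Example St & 1st Ave",
    "lat": 38.9,
    "long": -77.0,
    "nbBikes": 7,
    "nbEmptyDocks": 4,
    "locked": False,
    "lastCommWithServer": 1393718400,
}
info, status = split_station(combined)
```

Only the smaller `status` part would need to be fetched on every refresh; the `info` part changes rarely and could be cached by the client.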
In addition to information needed by apps, we also need historic data in order to analyze how people use the system. The most common kind is system metrics, such as the type released by Bay Area Bikeshare. This typically shows ridership and membership totals, and is good for showing how the system has grown. It would be updated at the end of each day.

Planners and analysts rely on two other types of historic data: trip history information shows every trip made within a certain period, and station history data shows the status of the stations within a certain period. The best example of the former is the Capital Bikeshare trip history data page, which releases a new data set every quarter. The latter is sometimes recorded by enthusiasts on their own initiative, such as the CaBi Tracker website. In San Francisco, Eric Fisher keeps a daily log of Bay Area Bikeshare stats at trafficways.org/babs (I used his data in Probing Data from Bay Area Bikeshare).

The trip history and station history files need a naming convention to reflect the content’s date range. CaBi’s largest quarterly file is 72.5MB, for the 572,919 trips in the 2nd quarter of 2012 (they have now started zipping the files). A filename format like trips-2012-3-1-to-2012-5-30.csv would work well.

While the systems are expected to protect their customers’ privacy by not including customer IDs, users should be able to download their own personal trip history files, and those files should use the same format as the main trip history files.

Finally, there should be a standard way of summarizing general information about the entire system: who provides the equipment, who runs the system, which jurisdictions participate, where the system is located and what its boundaries are, what the hours of operation are, what the operating season is, and what the URL and other contact info are.
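The naming convention for trip history files proposed above can be sketched in a few lines. This assumes the article's example is meant to read trips-2012-3-1-to-2012-5-30.csv (year, month, and day without zero padding); the function name is hypothetical.

```python
from datetime import date


def trip_history_filename(start: date, end: date) -> str:
    """Build a name such as trips-2012-3-1-to-2012-5-30.csv (no zero padding)."""
    def part(d: date) -> str:
        return f"{d.year}-{d.month}-{d.day}"

    return f"trips-{part(start)}-to-{part(end)}.csv"


print(trip_history_filename(date(2012, 3, 1), date(2012, 5, 30)))
```

A zero-padded variant (2012-03-01) would sort correctly in a directory listing, which may be worth considering if the convention is ever formalized.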
To integrate the various systems more fully, it would also help to have the URL for a standard-size logo image, plus the system’s colors. This system information file should also include data found in a manifest file, namely, a list of all the associated open-data files.

The system information should include definitions of available membership types. This might merit being listed as a separate table. Each membership type should include the cost and duration. We also need to know how long rides can be, and what the charges are for going beyond the time limit. For example, the CaBi pricing rules say rides are free for the first 30 minutes; going up to 30 minutes longer costs $2.00 for casual members (those with 1- or 3-day memberships) and $1.50 for subscribers. In contrast, the Citi Bike pricing rules say rides are free for the first 45 minutes; going up to 30 minutes longer costs $4.00 for those with 24-hour & 7-day passes, and $2.50 for those with annual memberships.

This table summarizes the six types of bikesharing data:

System information    general info
Station information   a mostly-static list of all stations
Station status        the number of available bikes and docks
System metrics        membership and trip totals
Trip history          every trip made during a given period
Station history       a history of the station status list

Here’s how I would organize the files. I’ll use ▶ to indicate a primary key (one that must be unique within the system), and ▷ to indicate a foreign key (one that references another table’s primary key, and which must exist).

The station information data is the information most likely to be shared by bikeshare systems. At the very least, it includes the latitude & longitude coordinates for every station, and the name. The file is fairly static, changing mostly when new stations are added. Here are the fields I would include, compared with CaBi (DC), Vélib (Paris), and Denver’s B-cycle to see what names they use.

Station information
proposal      CaBi               Vélib        B-cycle
stationid ▶   id, terminalName   number       GLOBALID
name          name               name         STATION_NAME
address       (not used)         address      STATION_ADDRESS, ADDRESS_LINE1, ADDRESS_LINE2
region        (not used)         (not used)   CITY, STATE
zip           (not used)         (not used)   ZIP
lat           lat                latitude     (not used)
lng           long               longitude    (not used)
installed     installDate        (not used)   (not used)
removed       removalDate        (not used)   (not used)
public        public             (not used)   (not used)
capacity      (not used)         (not used)   NUM_DOCKS
message       (not used)         (not used)   (not used)

Most systems don’t use a region field, but for multi-jurisdictional systems, it is important to know which jurisdiction manages each station. For example, Capital Bikeshare operates within DC, Montgomery County, Arlington, and Alexandria. Bay Area Bikeshare operates within San Francisco, Redwood City, Palo Alto, Mountain View, and San Jose. Nice Ride operates within Minneapolis and St Paul. Other systems could use this field to track which neighborhood the station is in.

Vélib appends the postal code & city to the address field, but these would be better as separate fields. For example, the Bastille Richard Lenoir station has an address of “2 BOULEVARD RICHARD LENOIR – 75011 PARIS”, but this should be just “2 BOULEVARD RICHARD LENOIR”, with a zip of “75011” and a city of “Paris.” And there is no reason for Vélib to use all-uppercase letters. The data should be in the proper mixed-case (using French rules for capitalization), and programs can easily convert to uppercase if they wish.

I would suggest a message field so systems can communicate that a station will be shutting down early, or has been moved to a new location. Or, during snow storms, that the rebalancing van might not be able to service a station.

Denver has other fields that should be considered for a standard. “PROPERTY_TYPE” shows whether the station’s location is Private or Public. This could be expanded to show exactly who the property owner or responsible agency is. “POWER_TYPE” has values of Solar Only, Wired Only, and Solar with Wire Backup.

Cities often provide temporary stations. The station ID should correspond to a specific location. If a station returns to the same location for an annual event, it should re-use the old ID.
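As an illustration of separating the Vélib address into the proposed address, zip, and city fields, here is a minimal sketch. The pattern is inferred from the single Bastille Richard Lenoir example above and may not hold for every station; the function name is hypothetical, and `str.title()` is only a naive stand-in for proper French capitalization rules.

```python
import re

# Assumed pattern, based on one example from the article:
# "<street> <dash> <5-digit postal code> <city>", where the dash is an
# en dash (U+2013) or a plain hyphen surrounded by whitespace.
_VELIB_ADDRESS = re.compile(
    r"^(?P<address>.+?)\s+[\u2013-]\s+(?P<zip>\d{5})\s+(?P<city>.+)$"
)


def split_velib_address(raw):
    """Split a combined address string into address, zip, and city."""
    text = raw.strip()
    match = _VELIB_ADDRESS.match(text)
    if match is None:
        # Leave unrecognized values untouched rather than guessing.
        return {"address": text, "zip": None, "city": None}
    return {
        "address": match.group("address"),
        "zip": match.group("zip"),
        # Naive capitalization; real French rules are more involved.
        "city": match.group("city").title(),
    }


parts = split_velib_address("2 BOULEVARD RICHARD LENOIR \u2013 75011 PARIS")
```

The street address is deliberately left in its original case here, since converting it correctly needs language-aware rules that a one-line transformation cannot provide.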
The station status file should have the smallest amount of data needed to describe the current state of each station. This is the file that will be called most often, potentially thousands of times per minute, so every byte counts. And many people will be querying this data from mobile devices, another reason to keep the file size as small as possible. Here’s how I would design the standard for this file, compared with CaBi (DC) and Denver’s B-cycle to see what names they use. Because I couldn’t find Denver’s XML feed, I used CityBike’s Denver JSON feed.

Station status
proposal      CaBi                 Denver B-cycle
stationid ▷   id, terminalName     id, idx
bikes         nbBikes              bikes
docks         nbEmptyDocks         free
open          locked               (not used)
time          lastCommWithServer   timestamp

The bikes and docks numbers will generally add up to the capacity value in the station information file, but if there are nonfunctioning bikes or docks, the total could be smaller. The open field would be true or false. Sometimes stations are temporarily closed, perhaps because they have become inaccessible. The time value shows the last time the station communicated with the server. This is useful to determine if the data might no longer be accurate, such as during a power outage. Notice we don’t duplicate any of the fields in the station information file, other than our foreign key, the stationid field.

The trip history file also needs to be as compact as possible, not because people will be downloading it frequently, but because these files could be used to store millions of records.

Trip history
startdate
startstation ▷
enddate
endstation ▷
bikeid
usertype

The duration of each trip can be computed on-the-fly and doesn’t need to be included in the file. The startstation and endstation values link up to the stationid field in the station information file. The usertype field describes the type of membership the rider has.

Though few systems release trip history data on a regular basis, there have been occasions when systems have released data in support of a visualization contest. The Hubway Data Visualization Challenge took place in 2013, and included demographic data about the rider of each trip: residential zip code, year of birth, and sex. The Divvy Data Challenge (for Chicago) is currently underway; its data includes riders’ year of birth and sex.

The station history file should be a list of every change in status (available bikes and docks) for every station, listed in chronological order. In order to avoid having to repeat the state of the entire system when only a few stations have new values, the file should start with every station, and thereafter list a station only when it has changed. The initial value would be needed in order to compute the state of any later times recorded in the file.

Station history
stationid ▷
bikes
docks
open
time

The dominant data formats nowadays are XML and JSON. CSV is also a good choice, as long as the data fits in a tabular format, consisting of simple rows and columns. For CSV files, the order of fields should be consistent. The values of the fields are numeric, string, Boolean, and timestamp. Boolean is easily expressed as “true” or “false,” and Unix time is a common way of recording date and time.

Publishing and standardizing bikesharing open data lets developers and analysts make it easier for the public to discover and use bikesharing systems across the globe, as the Bike Share Map by Oliver O’Brian does. The vendors, operators, and managing jurisdictions should work together to create a standard that can be used by everyone.
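The station history format described above (a full initial snapshot followed by change-only rows) can be replayed to recover the state of every station at a given moment. The sketch below is one possible way to do that; the row layout follows the proposed field order (stationid, bikes, docks, open, time), while the function name and the sample rows are made up for illustration.

```python
def state_at(history, when):
    """Return {stationid: (bikes, docks, open)} as of Unix time `when`.

    `history` is an iterable of (stationid, bikes, docks, open, time) rows in
    chronological order, beginning with one row for every station.
    """
    state = {}
    for stationid, bikes, docks, is_open, time in history:
        if time > when:
            break  # rows are chronological, so nothing later applies
        state[stationid] = (bikes, docks, is_open)
    return state


# Made-up sample: two stations at time 1000, then two later changes.
history = [
    (1, 5, 6, True, 1000),
    (2, 3, 8, True, 1000),
    (1, 4, 7, True, 1060),
    (2, 0, 11, True, 1120),
]
snapshot = state_at(history, 1100)
```

Because each file starts with every station, a file can be replayed on its own without consulting earlier files, which is the reason the article asks for the initial values.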