Urban big data are increasingly being used in urban GIS research. This presentation, developed for an introductory GIS course, introduces the seven V's of big data: volume, variety, velocity, veracity, value, visualization, and variability. Major sources of urban big data include sensor systems, user-generated content, administrative systems, private sector data, humanities data, and hybrid data. These categories and the seven V's are illustrated with six examples of cutting-edge urban big data research: the Array of Things sensing network, Austin bike data, analysis of London's transport network, scraping Craigslist to analyze rental housing, the Mapping Inequality Project, and the EPA Smart Locations Database. The presentation concludes with a discussion of critiques surrounding the use of big data.
Intro to Big Data in Urban GIS Research
1. Introduction to Big Data in Urban GIS Research
December 13, 2016
Intro to GIS – UP 506 – Fall 2016
Robert Goodspeed
Assistant Professor of Urban Planning
rgoodspe@umich.edu
2. Seven V’s of Big Data
Source: Al Hero and Brian Athey, MIDAS Overview, 6 October 2015,
Future of Data Science Conference
New Additions:
• Value
• Visualization
• Variability
3. Types of Urban Big Data
Urban Big Data | Who? | Example | “V(s)” Illustrated
Sensor systems | Public and private utility and service operators; building/infrastructure managers | Array of Things, Chicago | Velocity, Variability
User-generated content | Various; usually private sector systems | Austin Bike Data Example | Veracity
Administrative systems | Governments & public vendors | Oyster Data, Transport for London | Volume, Visualization
Private sector data | Various | Craigslist Rental Listings Analysis | Value
Arts and Humanities Data | Digital humanities organizations | Mapping Inequality Project | Variety
Hybrid data | Data intermediaries | Smart Locations Database, EPA | Veracity
Categories from: Thakuriah, Piyushimita Vonu, Nebiyou Tilahun, and Moira Zellner. 2015. "Big Data and Urban Informatics: Innovations and Challenges to Urban Planning and Knowledge Discovery." Proc. NSF Workshop on Big Data and Urban Informatics.
4. Sensor Data – “Array of Things” Project, Chicago
Diagram: https://arrayofthings.github.io/
Photo: Computation Institute, UChicago
Velocity, Variability (changing sensors)
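Velocity and variability have practical consequences for how a sensor feed is processed. The following is a minimal sketch, assuming a hypothetical record layout (timestamp, node_id, sensor, value) rather than the Array of Things project's actual data format: it keeps a time-windowed mean for each node/sensor pair, discards readings as they age out of the window, and accepts sensors it has never seen before without any schema change.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical record layout, not the Array of Things schema.
    timestamp: float  # seconds
    node_id: str
    sensor: str
    value: float


class RollingMean:
    """Time-windowed mean for each (node, sensor) pair in a stream of readings."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        # Keys are created on first sight, so sensors added mid-stream
        # (variability) need no change to the code.
        self.buffers = defaultdict(deque)

    def update(self, r: Reading) -> float:
        buf = self.buffers[(r.node_id, r.sensor)]
        buf.append((r.timestamp, r.value))
        # Keep only readings newer than (timestamp - window), so memory stays
        # bounded however fast data arrive (velocity). Assumes in-order timestamps.
        while buf and buf[0][0] <= r.timestamp - self.window:
            buf.popleft()
        return sum(v for _, v in buf) / len(buf)


stream = [
    Reading(0, "node-001", "temperature", 20.0),
    Reading(30, "node-001", "temperature", 22.0),
    Reading(90, "node-001", "temperature", 24.0),
    Reading(95, "node-002", "pm2_5", 8.0),  # a sensor not seen before
]
roll = RollingMean(window_seconds=60)
for r in stream:
    print(r.node_id, r.sensor, roll.update(r))  # means: 20.0, 21.0, 24.0, 8.0
```

A real pipeline would also have to deal with late or out-of-order readings; this sketch assumes they arrive in time order.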
5. User-generated Data – Austin Bike Data
CycleTracks
• Cycling data collection app
• Created by the City/County of San Francisco
• Public sector control of data
• Used for surveys (random) and for self-selected data collection
Strava
• Fitness data tracking app & social network
• Private company; sells data
• Data come from voluntary users of the app
6. User-generated Data – CycleTracks Analysis
Created to allow for the rigorous analysis & planning of bicycle infrastructure:
http://www.sfcta.org/modeling-and-travel-forecasting/cycletracks-iphone-and-android
What were the characteristics of chosen routes vs. other possible routes?
Changes in bike accessibility due to planned bicycle facilities, using a calibrated model
7. User-generated Data – Strava
Note the use of private algorithms to infer trip types
8. User-generated Data – What was learned?
Four data sources: magnetic loop, pneumatic tube, GPS survey, Strava
• Two physical recorders capture all traffic (with error)
• GPS survey: a known sample of an unknown population
• Strava: an unknown sample of an unknown population
Can we see the elephant yet?
Griffin, Greg Phillip, and Junfeng Jiao. 2014. "Crowdsourcing Bicycle Volumes: Exploring the Role of Volunteered Geographic Information and Established Monitoring Methods." URISA Journal 27 (1).
Veracity – Which data provide the best picture?
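One way to make the veracity question concrete is to compare a crowdsourced count against a physical counter at the same sites. The sketch below uses invented counts for five hypothetical sites (these are not data from Griffin and Jiao): it computes the Pearson correlation and a pooled expansion factor, then per-site factors to check whether a single factor is defensible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented daily counts at five hypothetical sites (illustration only).
loop_counts = [410, 250, 980, 120, 600]  # physical counter: all riders, with error
strava_counts = [30, 14, 95, 6, 41]      # app users only: an unknown sample

r = pearson(loop_counts, strava_counts)
# Pooled expansion factor: total riders per Strava rider across all sites.
factor = sum(loop_counts) / sum(strava_counts)
# Per-site factors reveal whether the Strava share is stable from place to place.
site_factors = [round(loop / app, 1) for loop, app in zip(loop_counts, strava_counts)]

print(f"r = {r:.2f}, pooled factor = {factor:.1f}")
print("per-site factors:", site_factors)
```

With these made-up numbers the correlation is high (about 0.99), yet the per-site factors range from about 10 to 20, so one pooled factor would overstate volumes at some sites and understate them at others.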
9. Administrative Data – Oyster Card, London
Photo: Engadget (https://www.engadget.com/2014/09/16/contactless-card-nfc-payments-london-tube/)
Batty, M. and J. Reades, “Dynamics of Urban movements: Changes in the scaling of hubs in the London rail network” http://www.complexcity.info/files/2011/08/BATTY-Strathclyde-Networks-2011.pdf
Analysis of 1 day – 6.24 M swipes
Looks cool! What can you do?
• Analyze network structure
• Look for anomalies
• Other…? (no origins & destinations, or rider details)
Volume, Visualization
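Analyzing network structure from swipe records starts with aggregation. A minimal sketch, assuming hypothetical records of the form (card_id, station, time): count entries per station, then fit the least-squares slope of log(count) against log(rank). A slope near -1 would indicate Zipf-like dominance by a few hubs. This is one simple way to look at hub scaling, in the spirit of Batty and Reades' question but not a reproduction of their method.

```python
from collections import Counter
from math import log


def station_totals(swipes):
    """Aggregate swipe records (card_id, station, time) into per-station counts."""
    return Counter(station for _, station, _ in swipes)


def rank_size_slope(counts):
    """Least-squares slope of log(count) on log(rank); needs at least two stations."""
    sizes = sorted(counts.values(), reverse=True)
    xs = [log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [log(size) for size in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical records; a real day of Oyster data would hold millions of rows.
swipes = [
    ("c1", "Bank", "08:01"),
    ("c2", "Bank", "08:02"),
    ("c3", "Oval", "08:05"),
]
print(station_totals(swipes))  # Counter({'Bank': 2, 'Oval': 1})

# Counts lying exactly on a Zipf curve (120 / rank) give a slope of about -1.
print(rank_size_slope(Counter({"A": 120, "B": 60, "C": 40, "D": 30})))
```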
10. Private Sector Data – Craigslist Rental Listings
Access: Automated analysis of websites (scraping), internal provision,
application programming interface (API)
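Of the three access routes, scraping puts the most work on the analyst: pages must be parsed into structured records. The sketch below uses Python's standard html.parser on a static sample string. The class names result-title and result-price are hypothetical placeholders, not a description of Craigslist's actual markup, which changes over time; fetching pages, rate limiting, and compliance with a site's terms of use are not shown.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect title and price from listing markup.

    The class names used here are hypothetical placeholders.
    """

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # field that the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "result-title" in classes:
            self.listings.append({"title": "", "price": None})
            self._field = "title"
        elif "result-price" in classes and self.listings:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field == "title":
            self.listings[-1]["title"] += data.strip()
        elif self._field == "price":
            digits = data.strip().lstrip("$").replace(",", "")
            if digits.isdigit():
                self.listings[-1]["price"] = int(digits)


sample = """
<ul>
  <li><a class="result-title">Sunny 2BR near campus</a><span class="result-price">$1,450</span></li>
  <li><a class="result-title">Studio, utilities incl.</a><span class="result-price">$925</span></li>
</ul>
"""
parser = ListingParser()
parser.feed(sample)
parser.close()
print(parser.listings)
```

When a site offers an API, structured responses avoid this fragile parsing step entirely, which is one reason an API is usually preferable where available.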
11. Private Sector Data, Cont’d – Need for processing
Boeing, Geoff, and Paul Waddell. 2016. "New Insights into Rental Housing Markets across the United States: Web Scraping and Analyzing Craigslist Rental Listings." Journal of Planning Education and Research. doi: 10.1177/0739456X16664789.
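The need for processing is easy to underestimate. As an illustration only (not Boeing and Waddell's actual pipeline), a cleaning pass over scraped listings might drop reposted duplicates and implausible prices before computing a summary such as median rent per square foot. The field names and the rent thresholds below are assumptions.

```python
from statistics import median


def clean_listings(listings, min_rent=100, max_rent=40000):
    """Drop reposted duplicates and implausible rents (thresholds are illustrative)."""
    seen = set()
    cleaned = []
    for item in listings:
        # Treat listings with the same normalized title, price, and size as reposts.
        key = (item["title"].strip().lower(), item["price"], item["sqft"])
        if key in seen:
            continue
        seen.add(key)
        price = item["price"]
        if price is None or not min_rent <= price <= max_rent:
            continue
        cleaned.append(item)
    return cleaned


def median_rent_per_sqft(listings):
    """Median of price / sqft over listings that report a floor area."""
    ratios = [x["price"] / x["sqft"] for x in listings if x["sqft"]]
    return median(ratios) if ratios else None


raw = [
    {"title": "Sunny 2BR", "price": 1500, "sqft": 1000},
    {"title": "sunny 2br ", "price": 1500, "sqft": 1000},  # repost: dropped
    {"title": "Studio", "price": 900, "sqft": 450},
    {"title": "Parking space", "price": 50, "sqft": None},  # below min_rent: dropped
    {"title": "Loft", "price": 2400, "sqft": 1200},
]
cleaned = clean_listings(raw)
print(len(cleaned), median_rent_per_sqft(cleaned))  # 3 2.0
```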
12. Figure 1. Map of the 1.5 million rental listings in the contiguous United States in our geolocated data set.
Boeing, Geoff, and Paul Waddell. 2016. "New Insights into Rental Housing Markets across the United States: Web Scraping and Analyzing Craigslist Rental Listings." Journal of Planning Education and Research. doi: 10.1177/0739456X16664789.
The value of these data is yet to be illustrated.
13. Arts and Humanities Data – Mapping Inequality Project
Background
• Private mortgage market in America made possible by public guarantees
• The Home Owners’ Loan Corporation created “Residential Security” maps in the 1930s & 1940s, which marked black and integrated areas as most risky; the effect was to limit mortgage lending available in those areas
• Concern over this practice led to the Home Mortgage Disclosure Act of 1975 & Community Reinvestment Act of 1977, laws which reveal where mortgages are given (& to whom) and encourage bank investment in urban areas
Mapping Inequality
• Digitized & georeferenced maps for cities nationwide
• Polygons available!
Access the map: https://dsl.richmond.edu/panorama/redlining/
More on the project: http://www.npr.org/sections/thetwo-way/2016/10/19/498536077/interactive-redlining-map-zooms-in-on-americas-history-of-discrimination
Variety of data forms
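Because the project publishes polygons, historical grades can be joined to modern point data such as housing listings or addresses. A minimal sketch of that spatial join uses a ray-casting point-in-polygon test; the two square zones and their coordinates are hypothetical, and loading the real polygons from a downloaded file is not shown. For city-scale work, a GIS or a geometry library would normally handle projections and edge cases.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray running right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def holc_grade(x, y, zones):
    """Return the grade of the first zone containing the point, or None."""
    for grade, polygon in zones:
        if point_in_polygon(x, y, polygon):
            return grade
    return None


# Hypothetical zones in planar coordinates, labeled with HOLC-style letter grades.
zones = [
    ("A", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("D", [(1, 0), (2, 0), (2, 1), (1, 1)]),
]
print(holc_grade(0.5, 0.5, zones))  # A
print(holc_grade(1.5, 0.5, zones))  # D
print(holc_grade(3.0, 3.0, zones))  # None
```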
14. Hybrid Data – EPA Smart Locations Database
• US Government Creator, Full Data Access & Detailed Documentation
• Hundreds of variables, computed from various public and private datasets
• Spatial variability, e.g., only some regions’ transit systems are available in GTFS format
• Hope they did it right! Illustrates veracity concerns with complex data.
Source: https://www.epa.gov/smartgrowth/smart-location-mapping
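GTFS is a published, text-based feed format, which is what makes it usable for national indicators in the regions where it exists. As a simplified illustration (not the EPA's actual method), the sketch below counts scheduled departures at each stop during a peak window, reading the trip_id, departure_time, and stop_id columns of a GTFS stop_times.txt file. The 4 to 6 p.m. window and the sample rows are assumptions.

```python
import csv
import io
from collections import Counter


def to_seconds(hms):
    """Parse an H:MM:SS time; GTFS allows hours past 24 for after-midnight service."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def departures_per_stop(stop_times_csv, start="16:00:00", end="18:00:00"):
    """Count scheduled departures at each stop within [start, end)."""
    lo, hi = to_seconds(start), to_seconds(end)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        dep = row["departure_time"].strip()
        if dep and lo <= to_seconds(dep) < hi:  # blank times (untimed stops) are skipped
            counts[row["stop_id"]] += 1
    return counts


sample = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,16:05:00,16:05:00,S1,1
T1,16:12:00,16:12:00,S2,2
T2,16:35:00,16:35:00,S1,1
T2,16:42:00,16:42:00,S2,2
T3,18:05:00,18:05:00,S1,1
"""
print(departures_per_stop(sample))  # Counter({'S1': 2, 'S2': 2})
```

Stop-level counts like these would still need to be attached to block groups (for example, by a distance buffer), which is one of the many processing steps whose correctness users of a hybrid database must take on trust.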
15. Critical Voices
Big data cannot replace government censuses: (Shearmur, 2015)
• Big Data typically describes users and markets, not populations
• Most data do not link variety of attributes (e.g., linking individuals to
households, neighborhoods, jobs)
Data alone are insufficient for understanding: (boyd and Crawford, 2012)
• Structures of data systems introduce biases; “the concepts and definitions that structure Big Data are rarely what researchers need” (Shearmur, 2015)
• It’s easy to see patterns where none exist
• Data require context for understanding
Unequal access to big data creates new digital divides (boyd and Crawford, 2012)
boyd, danah, and Kate Crawford. 2012. "Critical Questions For Big Data." Information, Communication
& Society 15 (5):662-679. doi: 10.1080/1369118X.2012.678878.
Shearmur, Richard. 2015. "Dazzled by data: Big Data, the census and urban geography." Urban
Geography 36 (7):965-968. doi: 10.1080/02723638.2015.1050922.
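The warning that it is easy to see patterns where none exist can be demonstrated directly. The sketch below is a generic multiple-comparisons illustration, not an analysis from boyd and Crawford or Shearmur: it generates 200 independent random "indicators" observed for only 10 areas, then searches all 19,900 pairs for the strongest correlation. Some pairs of pure noise turn out to be strongly correlated.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(506)  # fixed seed so the run is repeatable
# 200 independent noise variables, each measured for the same 10 areas.
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]

best = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print(f"Strongest correlation found among pure noise: {best:.2f}")
```

Reporting the strongest of thousands of tested relationships, without accounting for how many were tried, is the trap the slide describes.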
16. Look what we did! Oh, you want the data…
MIT’s Senseable City Lab
projects frequently analyze
proprietary corporate
datasets. (AT&T Calling Data
Shown)
Tech Firms Hire Researchers to Analyze
their Own Data
Trulia (Left); Uber (Right)
https://eng.uber.com/data-viz-intel/
https://www.trulia.com/blog/trends/low-income-housing/
17. … vs. the Emerging Open Science Paradigm
On Open Science: OECD (2015), “Making Open Science a Reality”, OECD Science
Technology and Industry Policy Papers, No. 25, OECD Publishing,
Paris. http://dx.doi.org/10.1787/5jrs2f963zs1-en
Image: http://www.sci-gaia.eu/osp/
18. Be a Force for (Big Data) Good
• Proactively consider ethical issues surrounding data,
including privacy, biases, and the potential for harm
• When appropriate, support open data initiatives and
efforts to “democratize data” especially for public sector
or scientific data
• When working as an analyst, pursue the greatest degree
of professional responsibility for the accuracy and
interpretation of your work
19. Thank You!
This presentation was developed for
the Fall 2016 offering of Intro. to GIS (UP 506)
Robert Goodspeed
rgoodspe@umich.edu
@RGoodspeed