Working With Data and Humans

Daniel X. O'Neil

Me
• Daniel X. O’Neil
• Co-founder of EveryBlock
• 2007 Knight News Challenge
• Executive Director of Smart Chicago Collaborative
• 2012 Knight Community Information Challenge
@juggernautco

The Data Revolution
• I know about some, but not all of it
• Since about 2005
• Working with the Mayor’s Office in Chicago
• ChicagoWorksForYou.com
• Then at EveryBlock, where I was responsible for data acquisition

The Data Revolution
• 8 Principles of Open Government Data
• Independent Government Observers Task Force
• POTUS Executive Orders on Inauguration Day
• Apps contests
• Municipal ordinances
• Socrata
• Code for America

There’s Data and There’s Humans
• Talk to me about your data and your humans in your projects

Data
• Dense
• Sits by itself
• Not social
• Not self-aware
• Unable to contextualize itself
• Does not have any problems, because it doesn’t care about anything

People
• Naturally social
• Soft
• Have problems
• See everything in context
• Prone to mistakes

Value from data
• Know more than anyone
• Surfacing from the hidden Web
• Context, context, context
• Even if it is just one data set mashed against another data set
• Did it rain * Did property crime go up or down
• Foreclosures * Retail stores
• Also: the simple act of aggregation + text

Ten Databases
• Building permits
• Business licenses
• Historic preservation list
• Sanborn maps (1929 and 1950)
• County assessor
• County recorder of deeds
• Original photography
• Google search for news coverage
• New York Times archive
• Walgreens surplus property

We need a machine.
• A generic context engine
• To evenly distribute information
• And tell me what the information means
• I know: that sounds like a “reporter”
• But people used to think that “search engine” sounded a lot like “librarian”, too
• We need humans and machines

It’s easy.
• Find dataset
• Review dataset
• Describe what the data means
• Find another dataset
• Describe what the other dataset means
• Describe what the first dataset means in the context of the second dataset
• Repeat
• Let’s do this thing.

Call any time.
• @juggernautco
• (773) 960-6045

11. People make data

22. Dedicated databases work

Editor's Notes

I’m Dan O’Neil, and I run the Smart Chicago Collaborative, an organization devoted to improving lives in Chicago through technology. Among other things, I work with Chicago city government, developers, and community groups to use civic data in new and useful ways. As a co-founder of EveryBlock, I’m also a previous Knight News Challenge grantee. I certainly wouldn’t be doing any of this today if it weren’t for the vision of the Knight Foundation.
Let’s do a level-set
Explain EveryBlock
Data has certain characteristics
People have certain characteristics
Something to keep in mind: this data is generated and maintained by humans.
And if you use the default search for crime records, you get this screen. It has records going back to 2005. You fill out the form and you get your answers back. Pretty typical experience.
What you wouldn’t be able to tell, unless you searched the Dallas Police Web site more deeply, is this. The Dallas Police publishes an amazing cache of crime data in flat files. All of it, with no search, no letters or emails, going back 12 years. Why anyone would make any FOIA request, or why the Dallas Police would want anyone to do that, is beyond me. And this data has some of the most amazing crime details (the police narrative) that you can find in crime data anywhere. This is hidden in plain sight.
Lastly, I highly recommend the Data Journalism Handbook, which was created, in part, by many people in this room. It’s a really excellent resource.
Data is often more structured than you think. Over the weekend I participated in the Knight-Mozilla-MIT "Story & Algorithm" Hack Day run by Dan Sinker. I met a couple of Boston developers and we executed on a project I’ve had for about 7 years. Like many of you here, I’m not smart enough to actually make things, so I have to rely on the kindness of developers. What we made was “Condition of Anonymity”, a Web site that automatically pulls the reason that anonymity was granted to an anonymous source by a reporter for the New York Times. We often think about data as the stuff inside spreadsheets and published in flat files to FTP servers, but there is a whole world of semi-structured data like this hidden in plain sight, inside plain text. We used the NYT Search API to review every article in the NYT back to January 1, 2000 for the phrase, “condition of anonymity”, then used a natural language processing toolkit to find what I call the “because clauses”. There’s some gold in there. It takes an abundance of data types to tell a story. This story feels like a Walt Whitman poem to me.
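A minimal sketch of the “because clause” idea, for illustration only. It swaps the natural language processing toolkit mentioned above for a plain regular expression, and the sample sentences, `BECAUSE_CLAUSE`, and `extract_because_clauses` are all hypothetical names and data, not anything from the actual project. Fetching articles from the NYT Search API is left out entirely.

```python
import re

# Hypothetical sample sentences standing in for article text.
SAMPLE_SENTENCES = [
    "An official spoke on condition of anonymity because he was not authorized to discuss the talks.",
    "A staff member, who spoke on the condition of anonymity because the report is not yet public, confirmed the figures.",
    "The mayor declined to comment on the budget.",
]

# Find "condition of anonymity because ..." and capture the text after
# "because" up to the next comma, semicolon, or period.
BECAUSE_CLAUSE = re.compile(
    r"condition of anonymity\s+because\s+([^,.;]+)",
    re.IGNORECASE,
)


def extract_because_clauses(sentences):
    """Return the stated reason for anonymity from each sentence that gives one."""
    reasons = []
    for sentence in sentences:
        match = BECAUSE_CLAUSE.search(sentence)
        if match:
            reasons.append(match.group(1).strip())
    return reasons


reasons = extract_because_clauses(SAMPLE_SENTENCES)
for reason in reasons:
    print(reason)
```

A regular expression like this only catches the simplest phrasing; clauses that contain commas or use wording other than “because” would need the fuller language processing the talk describes.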
The analysis is where it’s at. The most amazing insight I can share is that data is boring. I’ve had a long time to consider why that is true, and I think I have the answer. The reason is because people are boring. We forget that data is made by people. And most people are boring most of the time. Every object should have a page on the Internet (so let’s get to work).
Here’s kind of a master example. I live near this building. It had been empty for a very long time. Then construction started. The construction was heralded by a building permit. But, of course, the building permit was boring. So I looked further.
I searched ten different databases and, lo and behold, more data made it less boring. Why? Because almost all people are interesting some of the time. So if you look hard enough, you’ll find those stories. I found a business license for a 3-day pop-up store. So this place has been empty for decades, but was open for three days. And I missed it. It used to be a bank, and I found out from the NYT archive (in PDF format, the hidden Web) that there was a bank run at this location in 1937. Again, not boring.
This machine can be described as a generic context engine, to evenly distribute information and tell me what the information means. I know: that sounds like a “reporter”. But people used to think that “search engine” sounded a lot like “librarian”, too. We need humans and machines.
Find dataset. Review dataset. Describe what the data means. Find another dataset. Describe what the other dataset means. Describe what the first dataset means in the context of the second dataset. Repeat. Let’s do this thing.
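The last step, describing one dataset in the context of another, can be sketched roughly as follows, using the rain and property crime pairing from the slides. The daily figures are made up for illustration, and `rainfall`, `property_crimes`, and `crimes_by_weather` are hypothetical names; real inputs would be two published datasets sharing a date field.

```python
from statistics import mean

# Hypothetical daily records keyed by date (inches of rain, crime counts).
rainfall = {"2012-06-01": 0.0, "2012-06-02": 1.2, "2012-06-03": 0.0, "2012-06-04": 0.4}
property_crimes = {"2012-06-01": 30, "2012-06-02": 22, "2012-06-03": 34, "2012-06-04": 26}


def crimes_by_weather(rain_by_day, crimes_by_day):
    """Join the two datasets on date, then average crime counts for rainy and dry days.

    Assumes at least one rainy day and one dry day appear in both datasets.
    """
    rainy, dry = [], []
    # Only dates present in both datasets can be compared.
    for day in sorted(rain_by_day.keys() & crimes_by_day.keys()):
        (rainy if rain_by_day[day] > 0 else dry).append(crimes_by_day[day])
    return {"rainy": mean(rainy), "dry": mean(dry)}


summary = crimes_by_weather(rainfall, property_crimes)
print(summary)
```

The join itself is the easy part. The description of what the difference means, and whether it means anything, is still the human's job.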
Here’s an example of two things: finding data in unstructured text and finding interesting data. This is an Advanced Search in Google for the word “jimmied” in the Dallas crime data published by EveryBlock. So that site becomes a public, searchable instance of a previously hidden data set. Apparently police have used the word “jimmied” to describe an action taken by suspected criminals 2,430 times. All sorts of things are jimmied, apparently. It’s not boring.
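The same kind of search can be approximated locally once the narratives are in hand. This is a rough sketch under stated assumptions: the narratives below are invented, not Dallas Police records, and `keyword_stats` is a hypothetical helper rather than anything EveryBlock or Google provides.

```python
import re

# Hypothetical police narratives; real ones would come from the published flat files.
NARRATIVES = [
    "Suspect jimmied the rear door and took a television.",
    "Unknown suspect jimmied a window lock. Suspect also jimmied the garage door.",
    "Complainant reported a stolen bicycle.",
]


def keyword_stats(narratives, keyword):
    """Count how many narratives mention the keyword and how often it appears overall."""
    # Whole-word, case-insensitive match so that "Jimmied" counts too.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = [len(pattern.findall(text)) for text in narratives]
    return {"reports": sum(1 for n in hits if n), "mentions": sum(hits)}


stats = keyword_stats(NARRATIVES, "jimmied")
print(stats)
```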
We’ve noticed that custom applications created with dedicated budgets are reliably updated. Getting more connectivity between these established projects and the newer, open data projects is key.
Hi.
