This deck shows updated poster panels from "My Little Data in a Big Data World," presented at PyData NYC 2015. It describes an early process of retrieving personal data, securing and exploring it, and identifying the tools that will allow people to start cultivating and/or preserving their data identities with code.
2. Description
You share on social media. You use email. Big data machines “think” they know you once they have analyzed some of your patterns. Now imagine “writing” a deliberate data story.
This poster describes an early process of
1) retrieving personal data,
2) securing and exploring it, and
3) identifying the tools that will allow individuals to start cultivating and/or preserving their data identities with code.
3. Considerations at Each Step
Does it expose personal data or thought processes to the web? Does it expose them to a platform that does not already have the data?
Friction for beginners: How difficult is it, and how much information is available online for someone with limited knowledge of the jargon?
4. More on Friction
Most documentation describes how to pull information from distributed APIs.
Required precautions might delay beginners and ultimately lose them by seeming (or being) too far from their project goals.
6. Storage and Memory Management
Storage: a 4-gigabyte USB drive.
I did not encrypt the USB drive, but I will consider it in the next iteration to discover whether and how that changes the process.
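Before copying the archive to a small drive such as a 4 GB USB stick, it can help to check that it fits. A minimal sketch using only Python's standard library; the function name and the paths in the usage note are hypothetical, and encryption is not covered here.

```python
import os
import shutil


def fits_on_drive(file_path: str, drive_path: str, headroom_bytes: int = 0) -> bool:
    """Return True if file_path fits in the free space at drive_path,
    while leaving headroom_bytes unused (e.g. room for an unzipped copy)."""
    needed = os.path.getsize(file_path) + headroom_bytes
    free = shutil.disk_usage(drive_path).free
    return needed <= free
```

For example, `fits_on_drive("twitter_archive.zip", "/Volumes/USB", headroom_bytes=500 * 10**6)` would check that the ZIP fits with roughly half a gigabyte to spare (both paths are placeholders).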
7. Getting Your Data
Some social media sites have a page where you can request a download file of your data. I chose to use Twitter and found my request link here: https://twitter.com/settings/account
Timing: Packaging and preparing your data download can take several minutes, several hours, or a few days. This time, the email alert that the data was ready for download took less than two minutes to arrive.
8. Twitter Data Characteristics
Personal information that is, or was, already available to the public.
Delivered as a ZIP file, with Twitter's encryption while in transit.
The Twitter archive file, which had a numerical name, included a tweets.csv file, which matched the data type used in Bokeh's example.
Each word in a tweet had its own column, which made counting easier.
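Splitting each tweet into words, as the spreadsheet columns did, can also be done in a few lines of Python. A minimal sketch, assuming tweets.csv has a header row with a `text` column (check your own file's header, since column names may differ between archive versions); the function names are hypothetical.

```python
import csv
from collections import Counter


def load_tweet_texts(csv_path, text_column="text"):
    """Return the text of every row in a tweets.csv file that has a header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[text_column] for row in csv.DictReader(f)]


def count_words(texts):
    """Count whitespace-separated words across all tweets, ignoring case."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts
```

Then `len(load_tweet_texts("tweets.csv"))` gives the number of tweets, and `count_words(texts).most_common(10)` lists the ten most frequent words. Punctuation stays attached to words here, much as it does in a spreadsheet split on spaces.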
9.
The data I downloaded appeared in a folder once I unzipped the file.
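Unzipping can also be scripted, which makes it easy to repeat for the next download. A minimal sketch using Python's standard `zipfile` module; the function name, archive name, and destination folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries end with "/" in a ZIP listing; keep only files.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        zf.extractall(dest)
    return sorted(dest / n for n in names)
```

For example, `extract_archive("twitter_archive.zip", "my_twitter_data")` unzips into a new folder. The `zipfile` documentation advises inspecting archives from untrusted sources before extracting them, so listing `namelist()` first is a cheap habit to keep.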
10.
Grailbird.data.tweets_2008_12 =
[ {
  "source" : "\u003Ca href=\"http://twitter.com\" rel=\"nofollow\"\u003ETwitter Web Client\u003C/a\u003E",
  "entities" : {
    "user_mentions" : [ ],
    "media" : [ ],
    "hashtags" : [ ],
    "urls" : [ ]
  },
  "geo" : { },
  "id_str" : "1064099455",
  "text" : "I need cookies.",
  "id" : 1064099455,
  "created_at" : "2008-12-18 00:00:00 +0000",
  "user" : {
    "name" : "Dida Lakes",
    "screen_name" : "dihaynes",
    "protected" : false,
    "id_str" : "18206386",
    "profile_image_url_https" : "https://pbs.twimg.com/profile_images/654911288917659648/QKsP0wHR_normal.jpg",
    "id" : 18206386,
This is a JSON file of my first tweet!
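Because the file begins with a JavaScript assignment (`Grailbird.data.tweets_2008_12 =`) rather than plain JSON, Python's `json` module cannot read it directly. A minimal sketch that drops everything up to the first `=` and parses the rest; the function names are hypothetical, and the folder layout assumed by `load_all_tweets` (one `.js` file per month in a single directory) is an assumption, so point it at wherever your unzipped archive keeps these files.

```python
import json
from pathlib import Path


def parse_grailbird(js_text):
    """Parse one archive file of the form `Grailbird.data.<name> = [ ... ]`."""
    # Everything before the first "=" is the JavaScript variable name; the rest is JSON.
    _, _, payload = js_text.partition("=")
    return json.loads(payload)


def load_all_tweets(js_dir):
    """Parse every .js file in js_dir and return one combined list of tweet dicts."""
    tweets = []
    for path in sorted(Path(js_dir).glob("*.js")):
        tweets.extend(parse_grailbird(path.read_text(encoding="utf-8")))
    return tweets
```

Each tweet comes back as a dictionary, so fields such as `tweet["text"]` and `tweet["created_at"]` can be read directly.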
11. Understanding Data: OpenOffice Calc as a Problem-Solving Tool
Handled a CSV file with more than 7k tweets.
Adequately displayed the data.
Sorting and counting tools did not require code.
Problem: Dependencies
Solution: Anaconda
Missing dependencies (other software that the new software relies on) or the wrong configurations cause installation errors. Anaconda resolves many of those problems and is recommended in the Bokeh documentation.
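A quick way to see which dependencies are missing before running an example is to ask Python whether each package can be found. A minimal sketch using the standard `importlib.util.find_spec`; the function name and package list are hypothetical examples, and the suggested fix assumes an Anaconda environment.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list of packages a Bokeh example might need; adjust for your project.
    needed = ["bokeh", "pandas", "numpy"]
    for name in missing_packages(needed):
        print(f"Missing: {name} (with Anaconda, try: conda install {name})")
```

This only checks top-level names such as `bokeh`; it does not verify versions or configuration.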
12. Word Cloud: Visual Alternative to Calc
From @dihaynes on Twitter (sans cleansing) via Jason Davies' word cloud generator.
You can explore more word cloud generators at http://worditout.com/ and http://tagcrowd.com.
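The word cloud on this panel was made without cleansing, so links, @mentions, and very common words compete with the words that carry meaning. A minimal sketch of one possible cleansing pass before counting, assuming the tweets are already loaded as a list of strings; the function name and the stop-word list are hypothetical, and this is not the method any particular word cloud generator uses.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; word cloud tools typically use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "i", "to", "of", "in", "is", "it", "for", "on", "my", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(texts, n=50):
    """Return the n most common words after removing URLs, @mentions, and stop words."""
    counts = Counter()
    for text in texts:
        text = URL_RE.sub(" ", text.lower())
        text = MENTION_RE.sub(" ", text)
        counts.update(w for w in WORD_RE.findall(text) if w not in STOP_WORDS)
    return counts.most_common(n)
```

The result is a list of `(word, count)` pairs, which is a convenient way to check which words dominate before pasting text into a generator.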
13. Data Visualization Tool: Bokeh
Sample data that matched my interest in employment data for future projects.
Learning opportunities via dependencies that I could use to interact with tools that had previously posed too much friction.
A presentation format that would allow for quick interactions.
The “Employment Sample” visualization had colors that resonated with me.
14. Visual Inspiration
Data visualization from Bokeh sample: http://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html
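The unemployment example arranges its values in a year-by-month grid. A minimal sketch of reshaping tweet timestamps into the same kind of grid, so tweet activity could be shown as a similar heatmap; it assumes each timestamp starts with `YYYY-MM-DD HH:MM:SS`, as in the `created_at` field of the archive's JSON, the function name is hypothetical, and it stops short of plotting.

```python
from collections import Counter
from datetime import datetime


def monthly_grid(timestamps):
    """Count tweets per (year, month); return {year: [Jan count, ..., Dec count]}."""
    counts = Counter()
    for ts in timestamps:
        # Keep only "YYYY-MM-DD HH:MM:SS"; a trailing UTC offset such as "+0000" is ignored.
        dt = datetime.strptime(ts[:19], "%Y-%m-%d %H:%M:%S")
        counts[(dt.year, dt.month)] += 1
    years = sorted({year for year, _ in counts})
    return {year: [counts[(year, month)] for month in range(1, 13)] for year in years}
```

Each year maps to twelve monthly counts, with zeros for quiet months, which is the shape a year-by-month heatmap needs.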
15. Questions for Discussion
What is the role of data science in society?
What would you add to the story of this project?
What are some moments when data science and storytelling are at odds? When are they not?
What questions do you still have about the data? About the process?