1. WHAT TO EXPECT WHEN YOU ARE VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last minute coding
and many more…
5. (P.S. These are actually not my robots, but our competitors’.)
Krist Wongsuphasawat / @kristw
Computer Engineer
Bangkok, Thailand
8. Krist Wongsuphasawat / @kristw
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Visualization Scientist
Twitter
28. DATA SOURCES
Open data
Publicly available
Internal data
Private, owned by clients’ organization
Self-collected data
Manual, site scraping, etc.
Combine the above
29. MANY FORMS OF DATA
Standalone files
txt, csv, tsv, json, Google Docs, …, pdf*
APIs
better quality with more overhead
Databases
doesn’t necessarily mean they are organized
Big data
bigger pain
32. CHALLENGES
Get relevant Tweets
hashtag: #oscars
keywords: “spotlight” (movie name)
Too big
Need to aggregate & reduce size
Slow
Long processing time (hours)
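A minimal sketch of the first two challenges, assuming each Tweet has already been reduced to a text and an hour bucket. The record layout, the sample Tweets, and the names `is_relevant` and `count_per_hour` are hypothetical, and on a real cluster the same filter-then-aggregate logic would run in Pig or Scalding rather than on a Python list.

```python
from collections import Counter

# Hypothetical sample records; real Tweets carry many more fields.
tweets = [
    {"text": "Spotlight wins best picture! #Oscars", "hour": "2016-02-29T04"},
    {"text": "What a night #oscars", "hour": "2016-02-29T04"},
    {"text": "Turn off the spotlight please", "hour": "2016-02-29T05"},
    {"text": "Lunch time", "hour": "2016-02-29T05"},
]


def is_relevant(text, hashtags=("#oscars",), keywords=("spotlight",)):
    """Return True if the text contains any hashtag or keyword, ignoring case."""
    lowered = text.lower()
    return any(term in lowered for term in hashtags + keywords)


def count_per_hour(records):
    """Reduce the relevant Tweets to a single count per hour bucket."""
    return Counter(r["hour"] for r in records if is_relevant(r["text"]))


hourly_counts = count_per_hour(tweets)
```

The third sample Tweet matches the keyword without being about the movie, which is the kind of noise that makes "get relevant Tweets" harder than it sounds.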
36. GETTING BIG DATA
Hadoop Cluster: Data Storage → Tool: Pig / Scalding (slow) → Smaller dataset
Your laptop: Smaller dataset
37. GETTING BIG DATA
Hadoop Cluster: Data Storage → Tool: Pig / Scalding (slow) → Smaller dataset
Your laptop: Smaller dataset → Tool: node.js / python / excel (fast) → Final dataset
38. CLEANING
Data come in different formats.
tsv to json
Quality of data collection.
null, missing data, typos, timestamp
Filter
Remove unnecessary data
Conversion
Change country code from 3-letter (USA) to 2-letter (US)
Correct time of day based on users’ timezone
Convert lat/lon to county
etc.
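The cleaning steps above could look roughly like this on a small file. This is only a sketch: the TSV columns, the sample rows, and the function names are made up for illustration, and the country table covers just three ISO 3166-1 codes.

```python
import csv
import io
import json

# Hypothetical TSV input; columns and values are assumptions for illustration.
RAW_TSV = (
    "user\tcountry\tcount\n"
    "alice\tUSA\t12\n"
    "bob\tTHA\t\n"
    "carol\tnull\t7\n"
    "dave\tGBR\t3\n"
)

# Partial 3-letter to 2-letter country code table (ISO 3166-1); extend as needed.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB"}


def parse_tsv(text):
    """Different formats: read TSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def is_complete(row):
    """Filter: treat empty strings, None, and the literal 'null' as missing."""
    return all(value not in ("", "null", None) for value in row.values())


def convert(row):
    """Conversion: 2-letter country code and an integer count."""
    return {
        "user": row["user"],
        "country": ALPHA3_TO_ALPHA2[row["country"]],
        "count": int(row["count"]),
    }


def clean(text):
    return [convert(row) for row in parse_tsv(text) if is_complete(row)]


cleaned = clean(RAW_TSV)
print(json.dumps(cleaned, indent=2))  # tsv to json
```

A country code missing from the table raises `KeyError` here, which is deliberate: failing loudly surfaces a data issue earlier than silently passing the bad value through.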
45. WHY?
Definition of “clean” depends on the task.
e.g. Restaurant reviews
Data issues can present themselves anytime.
in the project timeline
It takes time to process data.
Run. Wait… Oops! Re-run. Wait…
46. RECOMMENDATIONS
Always think that you will have to do it again
document the process, automation
Reusable scripts
break a gigantic do-it-all function into smaller ones
Reusable data
keep for future project
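One possible way to act on "you will have to do it again" is to save each step's output and reuse it on the next run. The helper `cached_step`, the file name, and the stand-in job below are hypothetical; the counts are the made-up numbers used elsewhere in this deck.

```python
import json
import os
import tempfile


def cached_step(path, compute):
    """Load a step's result from `path` if it exists; otherwise compute and save it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result


calls = []


def slow_aggregation():
    """Stand-in for an hours-long job; records every real execution."""
    calls.append(1)
    return {"Hodor": 10000, "Jon Snow": 5000}  # made-up numbers


workdir = tempfile.mkdtemp()
cache_path = os.path.join(workdir, "character_counts.json")

first = cached_step(cache_path, slow_aggregation)   # runs the job and writes the file
second = cached_step(cache_path, slow_aggregation)  # reads the file, job is skipped
```

Deleting the cached file forces a fresh run, so a stale result is one `rm` away from being rebuilt.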
53. WHAT TO EXPECT
timely
Deadlines are strict. Unexpected events can also come up.
wide audience
easy to explain and understand, multi-device support
one-off projects
content screening
62. While humans are busy killing each other,
ice zombies “White walkers” are invading from the North.
The only group who seems to care about this
is a neutral group called the Night’s Watch.
63. HBO’s Game of Thrones
Based on the book series “A Song of Ice and Fire”
Medieval Fantasy. Knights, magic and dragons.
Many characters.
Anybody can die.
6 seasons (60 episodes) so far
Multiple storylines in each episode
72. Sample data
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
Bran Stark 3000
… …
*These numbers are made up for presentation, not real data.
73. When you play the game of vis,
you iterate or you die.
CHAPTER III
75. + episodes
The Guardian & Google Trends
http://www.theguardian.com/news/datablog/ng-interactive/2016/apr/22/game-of-thrones-the-most-googled-characters-episode-by-episode
80. Sample data
INDIVIDUALS (+ top emojis)
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
… …
CONNECTIONS (+ top emojis)
Character Count
Jon Snow+Sansa 1000
Tormund+Brienne 500
Bran Stark+Hodor 300
… …
*These numbers are made up for presentation, not real data.
81. Graph
NODES (+ top emojis)
Character Count
Hodor 1000
Jon Snow 500
Daenerys 400
… …
LINKS (+ top emojis)
Character Count
Jon Snow+Sansa 1000
Tormund+Brienne 500
Bran Stark+Hodor 300
… …
*These numbers are made up for presentation, not real data.
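Node and link tables like these could be derived from per-Tweet character mentions. The sketch below assumes a connection means two characters mentioned in the same Tweet, which the slides do not state; the input list and the function `build_graph` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the characters mentioned in each Tweet.
mentions_per_tweet = [
    ["Jon Snow", "Sansa"],
    ["Hodor"],
    ["Bran Stark", "Hodor"],
    ["Jon Snow", "Sansa", "Hodor"],
]


def build_graph(mentions):
    """Count individual mentions (nodes) and co-mentions (links)."""
    node_counts = Counter()
    link_counts = Counter()
    for characters in mentions:
        # Sort so that each pair always appears in the same order.
        unique = sorted(set(characters))
        node_counts.update(unique)
        link_counts.update(combinations(unique, 2))
    nodes = [{"id": name, "count": n} for name, n in node_counts.most_common()]
    links = [
        {"source": a, "target": b, "count": n}
        for (a, b), n in link_counts.most_common()
    ]
    return nodes, links


nodes, links = build_graph(mentions_per_tweet)
```

The `id`, `source`, and `target` keys are chosen because that shape is commonly fed to force-directed layouts, but any consistent naming would do.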
106. After switching episode
1. Store old positions for existing objects.
2. Assign new initial positions.*
3. Run simulation without updating <svg> for n rounds
4. Animate objects from old to new positions.
5. Resume simulation and update <svg> every tick.
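Steps 1 to 4 can be sketched outside the browser as plain position bookkeeping. Everything here is an assumption for illustration: `relax` is a toy stand-in for a real force simulation, new objects get random starting points because the slide's footnote on step 2 is not part of this transcript, and `plan_transition` and `interpolate` are hypothetical names.

```python
import random


def relax(positions, rounds, step=0.1):
    """Toy stand-in for a force simulation: pull every point toward the centroid."""
    pos = dict(positions)
    for _ in range(rounds):
        cx = sum(x for x, _ in pos.values()) / len(pos)
        cy = sum(y for _, y in pos.values()) / len(pos)
        pos = {
            key: (x + (cx - x) * step, y + (cy - y) * step)
            for key, (x, y) in pos.items()
        }
    return pos


def plan_transition(current, new_ids, rounds=50, seed=0):
    rng = random.Random(seed)
    # 1. Store old positions for objects that exist in both episodes.
    old = {key: current[key] for key in new_ids if key in current}
    # 2. Assign initial positions (assumption: reuse old ones, random for newcomers).
    start = {
        key: old.get(key, (rng.uniform(0, 100), rng.uniform(0, 100)))
        for key in new_ids
    }
    # 3. Run the simulation for n rounds without drawing anything.
    target = relax(start, rounds)
    return start, target


def interpolate(start, target, t):
    """4. Positions at animation progress t, where 0 is start and 1 is target."""
    return {
        key: (
            start[key][0] + (target[key][0] - start[key][0]) * t,
            start[key][1] + (target[key][1] - start[key][1]) * t,
        )
        for key in target
    }


current = {"Jon Snow": (10.0, 10.0), "Hodor": (90.0, 20.0)}
start, target = plan_transition(current, ["Jon Snow", "Hodor", "Sansa"])
halfway = interpolate(start, target, 0.5)
```

Step 5 is not covered: once the animation ends, the real simulation would resume from `target` and redraw on every tick.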
113. Colors
Default: d3.category10()
Distinct but nothing about the context
Custom palette
Colors related to the groups/houses.
Black = Night’s Watch
Blue = North
Red = Daenerys
Gold = Lannister
…
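A custom palette with a categorical fallback might be wired up as below. The ten fallback colors are those of `d3.category10`; the hex values chosen for the houses are assumptions, since the slide names only the hues, and `make_color_scale` plus the groups "Tyrell" and "Martell" are there purely for illustration.

```python
# The ten colors of d3.category10, in order.
CATEGORY10 = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]

# Hex values are assumptions; the slide gives color names only.
HOUSE_COLORS = {
    "Night's Watch": "#000000",  # black
    "North": "#1f77b4",          # blue
    "Daenerys": "#d62728",       # red
    "Lannister": "#ffd700",      # gold
}


def make_color_scale(custom):
    """Return a function mapping a group to its custom color, else a default one."""
    assigned = {}

    def color(group):
        if group in custom:
            return custom[group]
        if group not in assigned:
            # Hand out default colors in order of first appearance, cycling if needed.
            assigned[group] = CATEGORY10[len(assigned) % len(CATEGORY10)]
        return assigned[group]

    return color


color = make_color_scale(HOUSE_COLORS)
lannister = color("Lannister")
tyrell = color("Tyrell")
martell = color("Martell")
```

In this sketch the first fallback color is the same blue assigned to the North, so a real palette would want the defaults filtered against the custom colors.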
135. WHAT TO EXPECT
richer, more features
to support exploration of complex data
more technical audience
product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
167. REFINE & POLISH
UX / UI
Color
Animation
Mobile support
Performance
Loading time, Data file size
“The little of visualisation design” by Andy Kirk
http://www.visualisingdata.com/2016/03/little-visualisation-design/
177. EXPECT
1) potential mismatches
2) different requirements
3) to clean data
4) to clean data a lot
5) to try and break things
6) to iterate until it works
7) deadline
8) to refine and polish
9) to get feedback
10) to improve
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
179. ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios,
Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu,
and many colleagues at Twitter.