Data Design
                                                           2114.409: Creative Research Practice




HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
Status Check



Concerns

 Programming

 What can we build




                     HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Course Outline
1. Foundations                 3. Prototyping
Introduction                   Crawling
Survey Methods / Data Mining   Text Mining
Visualization and Analysis     To be determined (TBD)
Social Mechanics               Project Update




2. Methods                     4. Refinement
Creativity and Brainstorming   TBD x3
Prototyping                    Project Presentations
Project Management             Reflection
Last Week: Building Blocks
   Clustering
   Classification & Regression
   Association Rules
   Outlier Detection
                   HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems




HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
How do I see and communicate answers?      Visualization, Storytelling
What questions should I ask of the data?   Design, Data Exploration
How do I clean and process the data?       Analysis Techniques
How do I gather meaningful data?           Crawling, Surveys, UX Design
Why might we prefer analysis?

LABOR                                ACCURACY
Too many pictures to look at.        Can test for statistical significance, etc.
Don’t know which are interesting.    Some patterns don’t visualize easily.




                                         HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
Clustering
Find natural groupings in the data



Organize data into classes:

‣ high intra-class similarity
‣ low inter-class similarity
Clustering
Input Data:       Points OR Similarities, [ # of clusters ]
Output Clusters:  Hard OR Soft OR Hierarchical
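A minimal sketch of the hard-clustering case, assuming 2-D points as input and a fixed number of clusters. The slides do not name an algorithm; k-means, the function name `kmeans`, the farthest-point initialization, and the sample points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Hard clustering: every point ends up in exactly one of k clusters."""
    # Assumed initialization: start at the first point, then repeatedly add
    # the point farthest from the centers chosen so far (deterministic).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centers


# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points within a group are close to each other (high intra-class similarity) and far from the other group (low inter-class similarity), so they receive the same label.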
Classification               Regression




Learn to map objects to   Learn to map objects to
categories                continuous variables
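To make the regression side concrete, here is a small sketch that fits a straight line by ordinary least squares. The choice of a linear model, the name `fit_line`, and the sample data are assumptions for illustration; the slides only say that regression maps objects to continuous variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```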
Classification
Observations X, Labels Y
Learn f(x) = y
[Figure: Y = gender (Male, Female) plotted against X = height]
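One way to read "learn f(x) = y" for the height example is as a single learned cut-off. The sketch below assumes a one-feature threshold classifier (a decision stump); the function names and the training data are hypothetical, and real height data would overlap far more than this toy set.

```python
def learn_threshold(heights, labels):
    """Learn f(x) = y as a single cut-off on height."""
    xs = sorted(set(heights))
    # Candidate cut-offs: midpoints between neighbouring observed heights.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(cut):
        predicted = ["male" if h > cut else "female" for h in heights]
        return sum(p == y for p, y in zip(predicted, labels)) / len(labels)

    # Keep the cut-off that classifies the most training observations correctly.
    return max(candidates, key=accuracy)


def f(x, cut):
    """The learned mapping from an observation to a label."""
    return "male" if x > cut else "female"


# Hypothetical training set: observations X (heights in cm) and labels Y.
heights = [155, 160, 163, 168, 175, 180, 183, 190]
labels = ["female", "female", "female", "female", "male", "male", "male", "male"]
cut = learn_threshold(heights, labels)
print(cut, f(185, cut), f(158, cut))
```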
The Whole Process
Data Set -> Featurization -> Featurized
Featurized -> Random Split (e.g. 90/10) -> Training Data, Test Data
Training Data -> Training -> Model
Model, Test Data -> Evaluation -> Results
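The stages of this process can be sketched end to end in a few lines. Everything specific here is an assumption for illustration: the toy data set of words, word length as the only feature, the nearest-centroid model, and accuracy as the evaluation measure.

```python
import random


def featurize(word):
    """Featurization: turn a raw record into a single numeric feature."""
    return len(word)


def random_split(rows, test_fraction=0.1, seed=0):
    """Random split, e.g. 90/10, into training data and test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def train(rows):
    """Training: a nearest-centroid model, one mean feature value per label."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Pick the label whose centroid is closest to the feature value."""
    return min(model, key=lambda y: abs(x - model[y]))


def evaluate(model, rows):
    """Evaluation: fraction of held-out rows that are labelled correctly."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Hypothetical data set: 50 short words and 50 long words.
data_set = [("a" * (1 + i % 3), "short") for i in range(50)]
data_set += [("b" * (8 + i % 3), "long") for i in range(50)]

featurized = [(featurize(word), label) for word, label in data_set]
training_data, test_data = random_split(featurized)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

The model never sees the test data during training, which is what makes the evaluation result meaningful.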
Association Rules
Learn interesting relations in the data

Support of X = proportion of events in which X occurs
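The proportion-of-events measure is usually called support, and it is simple to compute directly. The sketch below also computes confidence for a rule X -> Y, which the slide does not mention and is added here as an assumption; the shopping-basket events are hypothetical.

```python
def support(itemset, events):
    """Proportion of events that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(event) for event in events) / len(events)


def confidence(x, y, events):
    """For the rule X -> Y: support of X and Y together, divided by support of X."""
    return support(set(x) | set(y), events) / support(x, events)


# Hypothetical events: each one is the set of items seen together.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread"}, events))               # bread occurs in 3 of 4 events
print(confidence({"bread"}, {"milk"}, events))  # how often milk accompanies bread
```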
Anomaly Detection

Detect strange events in the data


            Simplest measure:
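The measure shown on the original slide is not preserved in this text. As a hedged illustration only, one commonly used simple measure is the z-score: how many standard deviations a value lies from the mean. The threshold of 2.0 and the sample counts below are assumptions.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / stdev for v in values]


def outliers(values, threshold=2.0):
    """Values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical daily event counts with one strange day.
daily_counts = [10, 12, 11, 9, 10, 11, 10, 50]
print(outliers(daily_counts))
```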
What Can We Build?




HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
Clicks,      Likes,       Updates,     Articles,
Scrolls,     Links,       Reviews,     Images,
Time         Checkins     Comments     Video

Collective Intelligence

Community

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
Politics




The Korean elections are coming. How
does the Internet tell us more than
traditional polling ever could?
Politics




What issues are important?
Who are the influencers?
How can we segment/characterize support groups?
How do we spread our opinions more widely?
Who will win the election?
How can we build this?

“Can social media predict election outcomes?”
HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
Tweet: Author, Date, Body, Retweets, Hashtags
Author: Profile, Tweets, Favorites, Following, Followers, Location

Insert Magic Here?
Clustering, Classification & Regression, Association Rules, Outlier Detection

Prediction: Candidate, Location, Score, Confidence
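The inputs and output of this diagram could be modelled as plain records. The field names below follow the slide, but the types, defaults, and sample values are assumptions; in particular, storing Tweets, Favorites, Following, and Followers as counts is a guess.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    profile: str
    tweets: int = 0       # assumed to be counts, not full lists
    favorites: int = 0
    following: int = 0
    followers: int = 0
    location: str = ""


@dataclass
class Tweet:
    author: Author
    date: str
    body: str
    retweets: int = 0
    hashtags: List[str] = field(default_factory=list)


@dataclass
class Prediction:
    candidate: str
    location: str
    score: float
    confidence: float


# Hypothetical records, purely for illustration.
author = Author(profile="example_user", followers=120, location="Seoul")
tweet = Tweet(author=author, date="2012-03-05", body="Go vote! #election",
              hashtags=["election"])
prediction = Prediction(candidate="Candidate A", location="Seoul",
                        score=0.52, confidence=0.7)
```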
Workshop
System Overview
[Diagram components: Tweet Inputs, Author Inputs, Sentiment + Candidate, Refinements, Scoring, Correction based on past elections, RMSE Evaluation]
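RMSE (root mean squared error) compares predicted scores with actual outcomes; lower is better. The sketch below assumes the scores are vote shares per region, which the slide does not specify, and the numbers are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical vote shares for three regions.
predicted_share = [0.52, 0.48, 0.40]
actual_share = [0.50, 0.45, 0.44]
print(rmse(predicted_share, actual_share))
```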
Sentiment Detail
Input Observation -> Feature Extractor -> Classifier -> Output Label -> Confusion Matrix Evaluation
N-Gram Features
Training Process: Tweet + Label
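A compact sketch of this pipeline: an n-gram feature extractor, a classifier trained on labelled tweets, and a confusion matrix for evaluation. The slide does not name a classifier; multinomial naive Bayes with add-one smoothing is an assumption, as are the function names and the four training tweets.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Feature extractor: lowercase word unigrams and bigrams."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


def train(examples):
    """Training process: count n-grams per label from (tweet, label) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        feature_counts[label].update(ngrams(text))
    vocab = {f for counts in feature_counts.values() for f in counts}
    return label_counts, feature_counts, vocab


def classify(model, text):
    """Classifier: return the label with the highest log-probability score."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in label_counts:
        n_label = sum(feature_counts[label].values())
        score = math.log(label_counts[label] / total)
        for f in ngrams(text):
            # Add-one smoothing so unseen n-grams do not zero out a label.
            count = feature_counts[label][f]
            score += math.log((count + 1) / (n_label + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


def confusion_matrix(model, examples):
    """Count (actual label, predicted label) pairs on labelled examples."""
    matrix = Counter()
    for text, label in examples:
        matrix[(label, classify(model, text))] += 1
    return matrix


# Hypothetical labelled tweets.
training = [
    ("I love this candidate", "positive"),
    ("great speech today", "positive"),
    ("I hate this candidate", "negative"),
    ("terrible speech today", "negative"),
]
model = train(training)
held_out = [("love the speech", "positive"), ("hate the speech", "negative")]
matrix = confusion_matrix(model, held_out)
print(dict(matrix))
```

Off-diagonal entries of the matrix, such as ("positive", "negative"), show which kinds of mistakes the classifier makes, which a single accuracy number hides.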
Entertainment                                                              Food                                           Movements



            HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/       HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/         HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/




        Collaboration                                                   Shopping                                                        Travel



                    HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/       HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/      HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/




                Investing                                                Medicine                                                         Trust


HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
           HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/   HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/    HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
Homework: Data Mining
1. Form groups!

2. Choose a Collective Intelligence topic from
   Lecture 1, or propose a similar one.

3. Make a list of data sources that might
   provide insights into that topic.

4. Propose a set of meaningful questions about
   the data based on your intuition.

5. How would you have to clean/process your
   data to start answering those questions?

6. Consider clustering, association rules,
   anomaly detection, classification. For each
   technique, how might you apply it to the
   data and what would it show?

7. Document your work and be prepared to
   present.
                                                 HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
Feedback
