Data Design
                                                           2114.409: Creative Research Practice




HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
Status Check



Concerns

 Programming

 What can we build




                     HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Course Outline
1. Foundations
Introduction
Survey Methods / Data Mining
Visualization and Analysis
Social Mechanics

2. Methods
Creativity and Brainstorming
Prototyping
Project Management

3. Prototyping
Crawling
Text Mining
To be determined (TBD)
Project Update

4. Refinement
TBD x3
Project Presentations
Reflection
Last Week: Building Blocks
Clustering
Classification & Regression
Association Rules
Outlier Detection
                   HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems




HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
How do I see and communicate answers?      Visualization, Storytelling
What questions should I ask of the data?   Design, Data Exploration
How do I clean and process the data?       Analysis Techniques
How do I gather meaningful data?           Crawling, Surveys, UX Design
Why might we prefer analysis?

LABOR
Too many pictures to look at.
Don’t know which are interesting.

ACCURACY
Can test for statistical significance, etc.
Some patterns don’t visualize easily.




                                         HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
Clustering
Find natural groupings in the data



Organize data into classes:

‣ high intra-class similarity
‣ low inter-class similarity
Clustering
Input Data: Points OR Similarities, [ # of clusters ]
Output Clusters: Hard OR Soft OR Hierarchical
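As a rough illustration of the "points plus number of clusters in, hard clusters out" case, here is a minimal k-means sketch. The slides do not name an algorithm, so k-means is an assumption, and the sample points and the simple evenly spaced initialization are made up for the example.

```python
def kmeans(points, k, iterations=20):
    """Hard clustering: assign each 2-D point to exactly one of k clusters."""
    # Simplistic deterministic start: k evenly spaced input points.
    # (Real implementations usually use random or k-means++ initialization.)
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignments, centroids


# Made-up points forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points within a cluster end up close to their shared centroid (high intra-class similarity) and far from the other centroid (low inter-class similarity).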
Classification: Learn to map objects to categories
Regression: Learn to map objects to continuous variables
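To make the regression side concrete, here is a small sketch that learns a mapping from a number to a continuous value by ordinary least squares on a straight line. The line model and the sample numbers are assumptions chosen for illustration; the slides do not prescribe a specific method.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # a continuous output, not a category
```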
Classification
Observations X, Labels Y
Learn f(x) = y
Example plot: X = height, Y = gender (Male, Female)
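A minimal sketch of learning f(x) = y for the height example: pick the single height cutoff that makes the fewest mistakes on labeled training data. The cutoff rule is one simple classifier chosen for illustration, and the heights and labels below are invented, not real measurements.

```python
def train_threshold(heights, labels, low_label, high_label):
    """Return the cutoff that misclassifies the fewest training examples."""
    best_cutoff, best_errors = None, len(heights) + 1
    for cutoff in sorted(set(heights)):
        errors = sum(
            (high_label if h >= cutoff else low_label) != y
            for h, y in zip(heights, labels)
        )
        if errors < best_errors:  # ties keep the lower cutoff
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


def predict(height, cutoff, low_label, high_label):
    """f(x): map a height to a label using the learned cutoff."""
    return high_label if height >= cutoff else low_label


# Invented training data with one overlapping example.
heights = [152, 158, 163, 168, 171, 175, 180, 186]
labels = ["Female", "Female", "Female", "Male", "Female", "Male", "Male", "Male"]
cutoff = train_threshold(heights, labels, "Female", "Male")
print(predict(160, cutoff, "Female", "Male"), predict(182, cutoff, "Female", "Male"))
```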
The Whole Process
Data Set -> Featurization -> Featurized
Featurized -> Random Split (e.g. 90/10) -> Training Data, Test Data
Training Data -> Training -> Model
Model + Test Data -> Evaluation -> Results
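The pipeline above can be sketched end to end as follows. Everything specific here is an assumption for illustration: the synthetic data, the single-number feature, the per-label-mean "model", and the function names.

```python
import random


def featurize(record):
    """Featurization: turn a raw record into a numeric feature vector."""
    return [float(record["height_cm"])]


def random_split(rows, test_fraction=0.1, seed=0):
    """Random split (90/10 by default) into training and test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Training: the 'model' is simply the mean feature value per label."""
    sums, counts = {}, {}
    for features, label in rows:
        sums[label] = sums.get(label, 0.0) + features[0]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, features):
    """Predict the label whose mean is closest to the feature value."""
    return min(model, key=lambda label: abs(features[0] - model[label]))


def evaluate(model, rows):
    """Evaluation: fraction of held-out rows predicted correctly."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


# Synthetic data set: invented numbers, not real measurements.
rng = random.Random(1)
data_set = (
    [{"height_cm": rng.gauss(178, 7), "label": "Male"} for _ in range(50)]
    + [{"height_cm": rng.gauss(165, 7), "label": "Female"} for _ in range(50)]
)
featurized = [(featurize(r), r["label"]) for r in data_set]
training_data, test_data = random_split(featurized)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(f"accuracy on held-out test data: {accuracy:.2f}")
```

Evaluating on the held-out 10% rather than on the training data gives a fairer estimate of how the model behaves on unseen examples.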
Association Rules
Learn interesting relations in the data

supp(X) = proportion of events in which X occurs
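A small sketch of that proportion, usually called support, computed over a list of events. The confidence function is a standard companion measure that is not on the slide, and the shopping-basket events are made up.

```python
def support(itemset, events):
    """Proportion of events that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(event) for event in events) / len(events)


def confidence(x, y, events):
    """For a rule X -> Y: supp(X and Y) / supp(X). Not on the slide."""
    return support(set(x) | set(y), events) / support(x, events)


# Made-up events (each event is the set of items seen together).
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
bread_support = support({"bread"}, events)
rule_confidence = confidence({"bread"}, {"milk"}, events)
```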
Anomaly Detection

Detect strange events in the data


            Simplest measure:
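The measure shown on the slide did not survive in this transcript. As an assumption, one common simple choice is the z-score: how many standard deviations a value lies from the mean. The sketch below uses that measure, with a made-up threshold and made-up data.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]  # all values identical: nothing stands out
    return [(v - mean) / stdev for v in values]


def outliers(values, threshold=2.0):
    """Values whose |z-score| exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Made-up counts with one obviously strange event.
counts = [10, 11, 9, 10, 12, 10, 11, 50]
print(outliers(counts))
```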
What Can We Build?




HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
Clicks, Scrolls, Time
Likes, Links, Checkins
Updates, Reviews, Comments
Articles, Images, Video

Collective Intelligence
Community

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
Politics




The Korean elections are coming. How
does the Internet tell us more than
traditional polling ever could?
Politics




What issues are important?
Who are the influencers?
How can we segment/characterize support groups?
How do we spread our opinions more widely?
Who will win the election?
How can we build this?

“Can social media predict election outcomes?”
HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
Tweet: Author, Date, Body, Retweets, Hashtags
Author: Profile, Tweets, Favorites, Following, Followers, Location
Insert Magic Here? Clustering, Classification & Regression, Association Rules, Outlier Detection
Prediction: Candidate, Location, Score, Confidence
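One way to write down the inputs and output of this diagram as data structures is sketched below. The field names follow the slide; the field types (for example, counts rather than lists for Tweets, Favorites, Following and Followers) and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Author:
    profile: str
    location: str
    tweets: int = 0      # assumed to be counts; the slide does not say
    favorites: int = 0
    following: int = 0
    followers: int = 0


@dataclass
class Tweet:
    author: Author
    date: datetime
    body: str
    retweets: int = 0
    hashtags: List[str] = field(default_factory=list)


@dataclass
class Prediction:
    candidate: str
    location: str
    score: float
    confidence: float


# Hypothetical sample values, for illustration only.
author = Author(profile="example bio", location="Seoul", followers=120)
tweet = Tweet(author=author, date=datetime(2012, 3, 5), body="example text",
              hashtags=["election"])
prediction = Prediction(candidate="Candidate A", location="Seoul",
                        score=0.52, confidence=0.7)
```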
Workshop
System Overview
Tweet Inputs
Author Inputs
Sentiment + Candidate
Refinements
Scoring
Correction based on past elections
RMSE Evaluation
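RMSE (root mean squared error) compares predicted scores with actual outcomes; lower is better. A minimal sketch follows, with hypothetical vote shares as the data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. actual vote shares for three candidates.
predicted_shares = [0.52, 0.30, 0.18]
actual_shares = [0.50, 0.33, 0.17]
error = rmse(predicted_shares, actual_shares)
print(f"RMSE: {error:.4f}")
```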
Sentiment Detail
Input Observation -> Feature Extractor -> Classifier -> Output Label
Feature Extractor: N-Gram Features
Training Process: Tweet + Label
Confusion Matrix Evaluation
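Two pieces of this diagram are easy to sketch: extracting n-gram features from a tweet, and tallying a confusion matrix from true and predicted labels. Whitespace tokenization, the choice of unigrams plus bigrams, and the sample text and labels are all assumptions.

```python
from collections import Counter


def ngram_features(text, n=2):
    """Count all word n-grams of length 1..n in the text."""
    tokens = text.lower().split()
    grams = [
        " ".join(tokens[i:i + size])
        for size in range(1, n + 1)
        for i in range(len(tokens) - size + 1)
    ]
    return Counter(grams)


def confusion_matrix(true_labels, predicted_labels):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(true_labels, predicted_labels))


features = ngram_features("I love this candidate")

# Hypothetical labels for four tweets.
true_labels = ["pos", "pos", "neg", "neg"]
predicted_labels = ["pos", "neg", "neg", "neg"]
matrix = confusion_matrix(true_labels, predicted_labels)
```

The off-diagonal entries, such as ("pos", "neg"), show which kinds of mistakes the classifier makes, which a single accuracy number hides.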
Entertainment: HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
Food: HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/
Movements: HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/
Collaboration: HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/
Shopping: HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/
Travel: HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/
Investing: HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/
Medicine: HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/
Trust: HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
Homework: Data Mining
1. Form groups!

2. Choose a Collective Intelligence topic from
   Lecture 1, or propose a similar one.

3. Make a list of data sources that might
   provide insights into that topic.

4. Propose a set of meaningful questions about
   the data based on your intuition.

5. How would you have to clean/process your
   data to start answering those questions?

6. Consider clustering, association rules,
   anomaly detection, classification. For each
   technique, how might you apply it to the
   data and what would it show?

7. Document your work and be prepared to
   present.
                                                 HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
Feedback
