Decoder Ring
http://decoder-ring.net
Jeff Beeman, jeff.beeman@asu.edu, @doogiemac
GLS Conference 2010
Background
• Fall 2009 semester
  • Seminars with Jim & Betty
  • Wanted to emulate work I had been reading (Gee, Hayes,
    Steinkuehler, Duncan, etc.)
  • The process for doing that kind of work seemed painful
Traditional process
Find content → Copy into Word docs → Take notes / highlight phrases
→ Manually transfer data to Excel → Come up with equations & charts

(At least how I see it)
Traditional process
Find content → Copy into Word docs → Take notes / highlight phrases
→ Manually transfer data to Excel → Come up with equations & charts

Wasting time... and it’s BORING
I’m lazy
• I want to
  • use technology to solve repetitive, boring
    problems for me
  • write something once, use it many times
  • take advantage of work others have
    already done
  • work with a lot of data
Better process
Find content → Create importer → Import content → Analyze content

Get someone else to do this
Initial requirements
• Abstracted, flexible, powerful data model
• Sustainable, low-cost framework
• Web-based to facilitate collaboration
• Facilitate importing and browsing large data sets
• Automated reporting
Overview
Data model
• Collection: Name, Description
• Post: Title, Body, Author, Post date, Parent post (optional),
  External identifier
• User: Username, Avatar, Creation date, Attributes (rank, sex, etc.)
• Taxonomy: Name
• Term: Name, Description

All data normalized into Collections, Posts, Users, Taxonomies
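
The data model above can be sketched as plain Python dataclasses. This is only an illustration: the field names follow the slide, but the types, the defaults, and the idea that a Taxonomy holds a list of Terms are assumptions, not Decoder Ring's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Collection:
    name: str
    description: str = ""


@dataclass
class User:
    username: str
    avatar: Optional[str] = None
    creation_date: Optional[datetime] = None
    # Free-form attributes such as rank (assumed to be a simple mapping).
    attributes: dict = field(default_factory=dict)


@dataclass
class Post:
    title: str
    body: str
    author: User
    post_date: datetime
    external_id: str                 # identifier on the source site
    parent: Optional["Post"] = None  # set for replies, None for top-level posts


@dataclass
class Term:
    name: str
    description: str = ""


@dataclass
class Taxonomy:
    name: str
    # Assumption: a taxonomy groups its terms in a list.
    terms: list = field(default_factory=list)


# Hypothetical sample data showing a reply that points at its parent post.
alice = User(username="alice", attributes={"rank": "veteran"})
root = Post("Build orders", "Opening post", alice, datetime(2010, 6, 1), "thread-1")
reply = Post("Re: Build orders", "A reply", alice, datetime(2010, 6, 2), "thread-1-r1", parent=root)
```

Keeping the optional parent on the post itself is one simple way to represent threads without a separate thread type.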
Database-backed
• Reports can be generated on the fly
• Data can be queried and searched
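
A minimal sketch of what "reports on the fly" and "queried and searched" can look like once posts live in a database. The slides do not say which database or schema Decoder Ring uses, so the in-memory SQLite database, the `post` table, and the sample rows here are all assumptions made for illustration.

```python
import sqlite3

# Assumed schema: one flat table of posts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, author TEXT, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO post (author, title, body) VALUES (?, ?, ?)",
    [
        ("alice", "Build orders", "A tutorial on early build orders."),
        ("bob", "Re: Build orders", "Thanks, this helped."),
        ("alice", "Map control", "Notes on map control."),
    ],
)


def posts_per_author(conn):
    """Report: number of posts per author, computed at query time."""
    rows = conn.execute(
        "SELECT author, COUNT(*) FROM post GROUP BY author ORDER BY author"
    )
    return rows.fetchall()


def search_posts(conn, word):
    """Search: titles of posts whose body contains the given word."""
    rows = conn.execute(
        "SELECT title FROM post WHERE body LIKE ? ORDER BY id", (f"%{word}%",)
    )
    return [title for (title,) in rows]
```

Because the report is just a query, it reflects newly imported posts without any manual transfer step.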
Collaborative
• Multiple projects, multiple contributors
Open source
Getting the content
• Collections
• Posts
• Users

This seems to be by far the most difficult part of doing this work.
Again, I’m lazy

• I have a tool that has a normalized,
  predictable data model.
• I can “scrape” websites or other data sets
  and put them into the data model.
Write once...
Scrapers / importers
Reduced to as little work as possible
• Given a common file format, data is quick
  and easy to import into Decoder Ring
• Bad news: Scrapers need to be written for
  every site
• Good news: They’re very quick to write
  (average 4 - 8 hours each)
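
To make the "one scraper per site, one common format" idea concrete, here is a rough sketch using only Python's standard library. The HTML layout (`div.post`, `span.author`, `p.body`), the class name `ForumScraper`, and the choice of JSON as the common file format are all hypothetical; the slides do not specify Decoder Ring's real import format.

```python
import json
from html.parser import HTMLParser


class ForumScraper(HTMLParser):
    """Scraper for one hypothetical forum layout; each site needs its own."""

    def __init__(self):
        super().__init__()
        self.posts = []      # normalized records collected so far
        self._field = None   # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "post":
            self.posts.append({"author": "", "body": ""})
        elif css_class in ("author", "body") and self.posts:
            self._field = css_class

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a recognized field.
        if self._field and self.posts:
            self.posts[-1][self._field] += data.strip()


def scrape(html):
    scraper = ForumScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.posts


SAMPLE = """
<div class="post"><span class="author">alice</span><p class="body">First post</p></div>
<div class="post"><span class="author">bob</span><p class="body">A reply</p></div>
"""

records = scrape(SAMPLE)
common_format = json.dumps(records)  # assumed common format handed to the importer
```

Only the parser class is site-specific; anything downstream of `common_format` can stay the same for every site.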
Analysis & Reporting
• Content navigation
• Content editing
This is great, but...
•   It’s making things faster, but what does it do
    that’s new?
    •   Collaboration, networking of researchers
    •   Immediate reporting provides insight where
        it may not otherwise be seen
•   Still some difficulties:
    •   How do you effectively communicate how to
        use / apply a taxonomy?
Demo
Todo
•   Per-collection taxonomy visibility
•   Per-collection access control
•   Cross-collection reports
•   Search-based reports (e.g., taxonomy term activity for all
    posts with the word "tutorial")
•   More accurate and faster search (Solr), e.g., all posts with
    "violence" near the words "games OR video games OR
    entertainment"
•   More robust hosting infrastructure (more users,
    collections)
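
As an illustration of the search-based report idea, the sketch below counts taxonomy term activity across posts that contain a given word. The post records, the `terms` field, and the term "strategy" are made up for the example; this is not how Decoder Ring implements reports.

```python
from collections import Counter

# Hypothetical posts, each already tagged with taxonomy terms.
posts = [
    {"body": "A tutorial on build orders", "terms": ["scientific learning", "strategy"]},
    {"body": "Patch notes discussion", "terms": ["strategy"]},
    {"body": "Video tutorial for new players", "terms": ["scientific learning"]},
]


def term_activity(posts, word):
    """Count how often each taxonomy term is applied to posts containing `word`."""
    counts = Counter()
    needle = word.lower()
    for post in posts:
        if needle in post["body"].lower().split():
            counts.update(post["terms"])
    return counts


report = term_activity(posts, "tutorial")
```

A real search backend would handle stemming and proximity; plain whitespace splitting is used here only to keep the sketch short.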
Long-term todo
•   DR could "learn" over time about taxonomies
    and language, e.g., what words commonly
    appear in phrases tagged "scientific learning"?
•   Comparisons with external data, e.g., thread
    activity corresponding to product release
    announcements (StarCraft II thread)
•   Web-based content import: Once a parser is
    written, the ability to queue up import via the
    DR website
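
The first long-term idea, finding words that commonly appear in phrases tagged with a term, could start as simply as a word count. The sketch below assumes tagged phrases are available as (phrase, term) pairs and uses a tiny hand-picked stopword list; both are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "i", "my", "it"}


def common_words(tagged_phrases, term, top_n=3):
    """Return the most frequent non-stopwords in phrases tagged with `term`."""
    counts = Counter()
    for phrase, tag in tagged_phrases:
        if tag == term:
            words = re.findall(r"[a-z']+", phrase.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical tagged phrases.
tagged = [
    ("I tested the hypothesis twice", "scientific learning"),
    ("My hypothesis was wrong", "scientific learning"),
    ("Nice game", "social"),
]

top = common_words(tagged, "scientific learning", top_n=1)
```

Raw counts are only a starting point; "learning" in any stronger sense would need comparison against how often the same words appear under other terms.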

Editor's notes

Why scraping data is difficult but possible
• Many sites use different terminology and structure for what are
  essentially similar data types (post vs. discussion vs. thread;
  user vs. account)
• Unpredictable markup on websites, often BAD markup
• Picture of malformed HTML
• Creating a generic scraper tool would be sloppy, inaccurate, and
  error-prone
• Fortunately, writing site-specific scrapers is a pretty
  straightforward process
• Roughly 4 hours per scraper, getting to be less as I gain more
  experience