1. Decoder Ring
http://decoder-ring.net
Jeff Beeman jeff.beeman@asu.edu @doogiemac
GLS Conference 2010
2. Background
• Fall 2009 semester
• Seminars w/ Jim & Betty
• Wanted to emulate the kind of work I had been reading (Gee, Hayes, Steinkuehler, Duncan, etc.)
• The process for doing that work seemed painful to me
3. Traditional process
Find content → Copy into Word docs → Take notes / highlight phrases → Come up w/ equations & charts → Manually transfer data to Excel
(At least how I see it)
4. Traditional process
Find content → Copy into Word docs → Take notes / highlight phrases → Come up w/ equations & charts → Manually transfer data to Excel
Wasting time... and it’s BORING
5. I’m lazy
• I want to:
• use technology to solve repetitive, boring problems for me
• write something once, use it many times
• take advantage of work others have already done
• work with a lot of data
8. Better process
Find content → Create importer → Import content → Analyze content
Get someone else to do this
9. Initial requirements
• Abstracted, flexible, powerful data model
• Sustainable, low-cost framework
• Web-based to facilitate collaboration
• Facilitate importing and browsing large data sets
• Automated reporting
11. Data model
Collection: Name, Description
Post: Title, Body, Author, Post date, Parent post (optional), External identifier
User: Username, Avatar, Creation date, Attributes (rank, sex, etc.)
Taxonomy: Name
Term: Name, Description
All data normalized into Collections, Posts, Users, and Taxonomies (sketched in code below)
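The talk doesn't show code for the model, but as a rough illustration the normalized entities might look like the Python dataclasses below. The field names come from the slide; the types, defaults, and class structure are my assumptions.

```python
# A minimal sketch of the Decoder Ring data model as Python dataclasses.
# Field names follow the slide; types and defaults are assumed.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Collection:
    name: str
    description: str = ""

@dataclass
class User:
    username: str
    avatar: Optional[str] = None          # URL to an avatar image (assumed)
    creation_date: Optional[datetime] = None
    attributes: dict = field(default_factory=dict)  # rank, sex, etc.

@dataclass
class Term:
    name: str
    description: str = ""

@dataclass
class Taxonomy:
    name: str
    terms: list = field(default_factory=list)  # list of Term

@dataclass
class Post:
    title: str
    body: str
    author: User
    post_date: datetime
    external_identifier: str              # ID of the post on the source site
    parent_post: Optional["Post"] = None  # optional, for threaded replies
```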
16. Getting the content
Collections
Posts
Users
Seems to be by far the most difficult part of doing this work.
17. Again, I’m lazy
• I have a tool that has a normalized, predictable data model.
• I can “scrape” websites or other data sets and put them into the data model.
19. Reduced to as little work as possible
• Given a common file format, data is quick and easy to import into Decoder Ring
• Bad news: Scrapers need to be written for every site
• Good news: They’re very quick to write (average 4 - 8 hours each); see the sketch below
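The talk doesn't include scraper code, but a site-specific scraper in this spirit might look like the sketch below: fetch a thread page, pull posts out of the markup, and write them to a common file format ready for import. The URL, CSS selectors, and CSV layout are all hypothetical.

```python
# Hypothetical site-specific scraper: pulls posts from one forum thread and
# normalizes them into a common format (CSV here; the actual Decoder Ring
# import format isn't specified in the talk). All selectors are made up.
import csv
import requests
from bs4 import BeautifulSoup

FORUM_URL = "http://example-forum.net/thread/123"  # hypothetical
FIELDNAMES = ["external_identifier", "author", "post_date", "body"]

def scrape_thread(url):
    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for node in soup.select("div.post"):  # selector is site-specific
        posts.append({
            "external_identifier": node.get("id", ""),
            "author": node.select_one(".author").get_text(strip=True),
            "post_date": node.select_one(".date").get_text(strip=True),
            "body": node.select_one(".body").get_text(strip=True),
        })
    return posts

def write_common_format(posts, path):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(posts)

if __name__ == "__main__":
    write_common_format(scrape_thread(FORUM_URL), "thread_123.csv")
```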
24. This is great, but...
• It’s making things faster, but what does it do that’s new?
• Collaboration, networking of researchers
• Immediate reporting provides insight where it may not otherwise be seen
• Still some difficulties:
• How do you effectively communicate how to use / apply a taxonomy?
26. Todo
• Per-collection taxonomy visibility
• Per-collection access control
• Cross-collection reports
• Search-based reports (e.g. taxonomy term activity for all posts with the word "tutorial")
• More accurate and faster search (Solr): e.g. all posts with "violence" near the words "games OR video games OR entertainment" (see the query sketch below)
• More robust hosting infrastructure (more users, collections)
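For the Solr item above, here is a hedged sketch of what such a proximity query could look like using standard Lucene phrase-proximity syntax, where "violence games"~10 matches the two words within ten positions of each other; each OR-alternative needs its own clause. The Solr URL, core name, and body field are assumptions.

```python
# Query a Solr index for "violence" near any of the alternative terms,
# using Lucene phrase-proximity syntax. URL, core, and field are assumed.
import requests

SOLR_SELECT = "http://localhost:8983/solr/decoder_ring/select"

query = (
    'body:("violence games"~10 OR '
    '"violence video games"~10 OR '
    '"violence entertainment"~10)'
)

resp = requests.get(SOLR_SELECT, params={"q": query, "wt": "json"})
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"))
```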
27. Long-term todo
• DR could "learn" over time about taxonomies and language: e.g. what words commonly appear in phrases tagged "scientific learning"? (see the sketch below)
• Comparisons with external data: e.g. thread activity corresponding to product release announcements (Starcraft II thread)
• Web-based content import: once a parser is written, the ability to queue up import via the DR website
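The "learning" idea in the first bullet could start as simple word-frequency counting over the phrases coded with a given term. A minimal sketch, with made-up example phrases standing in for real coded data:

```python
# Count which words most often co-occur with a taxonomy tag by tallying
# word frequencies across tagged phrases. The phrases are hypothetical.
from collections import Counter
import re

tagged_phrases = [  # phrases coded "scientific learning" (made-up examples)
    "I tested each build against the boss and recorded the results",
    "we compared damage numbers and formed a hypothesis",
]

counts = Counter(
    word
    for phrase in tagged_phrases
    for word in re.findall(r"[a-z']+", phrase.lower())
)
print(counts.most_common(10))
```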
Editor's Notes
Why scraping data is difficult but possible
- Many sites use different terminology and structure for what are essentially similar data types (post vs. discussion vs. thread; user vs. account)
- Unpredictable markup on websites -- often BAD markup
- Picture of malformed HTML
- Creating a generic scraper tool would be sloppy, inaccurate, and error-prone
- Fortunately, writing site-specific scrapers is a pretty straightforward process
- Roughly 4 hours per scraper, and getting to be less as I gain more experience