Bionic Info Pro:
New Takes on an Old Theme
Machine Learning, Taxonomy Creation, Big Data,
Competitive Intelligence, and the Human Element
Elaine M. Lasda Bergman
Annual Conference
Special Libraries Association
Vancouver, BC, Canada
Monday, June 9, 2014
Overview
• A little bit about Machine Learning
• A little bit about Taxonomies
• A little bit about Big Data
• A little bit about Hybrid Techniques
NOT NEW:
Machine Learning for CI
Mena, Jesus. (1996). Data Mining for
Competitive Intelligence, Competitive
Intelligence Review, 7(4):18-25.
Refinement of Machine Learning
• Decision Trees/Classification
• Clustering
• Anomaly Detection
Refinement of Machine Learning
• Support Vector Machines
– Predictive Classification
• Association Rules
– Marketbasket analysis
• Natural Language Processing
– Sentiment Analysis
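As a rough illustration of the market basket idea behind association rules, the sketch below computes the two standard measures, support and confidence, in plain Python. The transactions and item names are invented for illustration and are not from the presentation.

```python
# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent together) divided by support of antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0


# Rule under test: "customers who buy bread also buy milk".
bread_milk_support = support({"bread", "milk"}, TRANSACTIONS)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

A rule is usually kept only when both measures clear analyst-chosen thresholds, which is one place where the human element enters.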
Getting up to Speed
• http://efytimes.com
• 6 Video Tutorials and Playlists on
Machine Learning (January 2014)
NOT NEW: Taxonomies in
Information Retrieval
http://comsaad.blogspot.com/p/old-computer-photos.html
http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg
Need for Taxonomic Structures
http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg
NOT NEW: Datasets
http://www.conceptdraw.com/solution-park/resource/images/solutions/entity-relationship-diagram-(erd)/Diagramming-Crow's-Foot-ERD-Sample60.png
Enter BIG DATA
http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
Big Data Sources and Analysis
Data Type | Qualities | Analysis Tools | Result
Social Media | Demographics | API integration | More profiles of like-minded users
“Social Influencers” | User Reviews | NLP, Text Analysis | Sentiment readings
“Internet of Things” | Logs/Sensors/Check-Ins | Parsing | Usage and behavior patterns
SaaS | Cloud/Web-based/Subscription software | Dist. data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc.
Public Data | e.g., Amazon Data Market, WorldBank, Wikipedia | All above (depends on data structure) | Depends on Dataset (and there are LOTS of them!)
Hadoop/MapReduce | Volume! | Parallel Processing/Parsing/Reduction | Big patterns, correlations, needles in haystacks
Data Warehouses | Internal transactional data | Likely same as above | Correlations, marketbasket, etc.
NoSQL/Columnar | Volume! | Fills gaps in Parallel processing tools | Real time activity and patterns
In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/Stream usage patterns
Legacy Data | Usually PDFs & Documents/SemiStructured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all)
http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
Why “Concept Hierarchies” in
an Unstructured Environment?
Advantages
• When a term sits too low in the hierarchy to
appear in frequent itemsets or rules
• Create more interesting rules using
more general, aggregated concepts
[DVD, wheat bread, home electronics,
electronics, food]
Kumar, T.S. (2005) Introduction to Data Science
Disadvantages
• How low and how high in the hierarchy
do you set the threshold?
• Increased computation time
• If the threshold is too high, redundant rules
for more specific terms can be
summarized by rules using more
general terms
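The trade-off above can be sketched in a few lines of Python. The hierarchy reuses the slide's example terms (DVD, wheat bread, home electronics, electronics, food); the parent links, the extra items "Blu-ray" and "white bread", and the transactions are assumptions made up for illustration. Each transaction is extended with the ancestors of its items, so a specific pair with low support can roll up into a general rule with high support.

```python
# Child -> parent links. Only the term names DVD, wheat bread, home electronics,
# electronics and food come from the slide; the links and other items are assumed.
PARENT = {
    "DVD": "home electronics",
    "Blu-ray": "home electronics",      # hypothetical item
    "home electronics": "electronics",
    "wheat bread": "food",
    "white bread": "food",              # hypothetical item
}


def ancestors(item):
    """Walk up the hierarchy and return every more general concept."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


def extend(transaction):
    """Add the ancestors of each item so general concepts can be counted too."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Hypothetical purchases, invented for illustration only.
TRANSACTIONS = [
    {"DVD", "wheat bread"},
    {"Blu-ray", "white bread"},
    {"DVD", "white bread"},
    {"Blu-ray", "wheat bread"},
]
EXTENDED = [extend(t) for t in TRANSACTIONS]

specific_support = support({"DVD", "wheat bread"}, EXTENDED)
general_support = support({"electronics", "food"}, EXTENDED)
```

Here the specific pair appears in one of four baskets, while the general pair appears in all four. The extended baskets are also larger than the originals, which is where the increased computation time comes from.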
Hybrid Taxonomic Development
• Understand your auto-classification
model
• Work with domain experts to create
basic taxonomy
• Test Taxonomy in the Model
• Rinse, repeat
Wendy Pohs, ASIS&T Bulletin 12/1/13
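One way to picture the test-and-repeat loop is the minimal sketch below. It assumes a simple keyword matcher as a stand-in for the auto-classification model; the taxonomy categories, terms, and sample documents are hypothetical and not from the cited article. Documents that no category captures are handed back to the domain experts, who revise the taxonomy before the next pass.

```python
import re

# Hypothetical expert-built starter taxonomy: category -> trigger terms.
TAXONOMY = {
    "pricing": {"price", "discount", "margin"},
    "product launch": {"launch", "release", "rollout"},
}

# Hypothetical competitive-intelligence snippets, invented for illustration.
DOCS = [
    "Competitor announces price discount",
    "New product launch next quarter",
    "Rival files patent for sensor",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, taxonomy):
    """Return every category whose terms overlap the document's tokens."""
    tokens = tokenize(text)
    return sorted(cat for cat, terms in taxonomy.items() if tokens & terms)


def coverage_report(docs, taxonomy):
    """Share of documents that received a category, plus the ones that did not."""
    unmatched = [d for d in docs if not classify(d, taxonomy)]
    return 1 - len(unmatched) / len(docs), unmatched


# First pass: test the taxonomy in the model.
first_coverage, needs_review = coverage_report(DOCS, TAXONOMY)

# Experts review the misses and add a category; then rinse, repeat.
revised = dict(TAXONOMY, **{"intellectual property": {"patent", "trademark"}})
second_coverage, still_unmatched = coverage_report(DOCS, revised)
```

The unmatched list is the useful output: it tells the experts where the taxonomy has gaps, rather than leaving the model to guess.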
Domain Knowledge
and Thick Data
• Thick Data analysis primarily relies on human brain power to
process a small “N” while big data analysis requires
computational power (of course with humans writing the
algorithms) to process a large “N”.
• Big Data reveals insights with a particular range of data
points, while Thick Data reveals the social context of and
connections between data points. Big Data delivers numbers;
thick data delivers stories. Big data relies on machine
learning; thick data relies on human learning.
http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)
Data Driven CI is Meaningless
Without Human/Domain
Knowledge
http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
Recap
• Data Mining for CI is not new
• Refinement and Improvement
• Bigger, Weirder Data
Recap
• Where it’s at: Hybrid Schemas
• Thick Data, not just Big Data
• HUMAN ELEMENT IS ESSENTIAL
Questions?
Elaine Lasda Bergman
University at Albany
http://www.slideshare.net/librarian68
elasdabergman@albany.edu
@ElaineLibrarian

A Critique of the Proposed National Education Policy Reform
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

Bionic Info Pro - Taxonomies and Machine Learning SLA 2014

  • 1. Bionic Info Pro: New Takes on an Old Theme Machine Learning, Taxonomy Creation, Big Data, Competitive Intelligence, and the Human Element Elaine M. Lasda Bergman Annual Conference Special Libraries Association Vancouver, BC, Canada Monday, June 9, 2014
  • 2. Overview • A little bit about Machine Learning • A little bit about Taxonomies • A little bit about Big Data • A little bit about Hybrid Techniques
  • 3. NOT NEW: Machine Learning for CI Mena, Jesus. (1996). Data Mining for Competitive Intelligence, Competitive Intelligence Review, 7(4):18-25.
  • 4. Refinement of Machine Learning • Decision Trees/Classification • Clustering • Anomaly Detection
  • 5. Refinement of Machine Learning • Support Vector Machines- – Predictive Classification • Association Rules – Marketbasket analysis • Natural Language Processing – Sentiment Analysis
  • 6. Getting up to Speed • http://efytimes.com • 6 Video Tutorials and Playlists on Machine Learning (January 2014)
  • 7. NOT NEW: Taxonomies in Information Retrieval http://comsaad.blogspot.com/p/old-computer-photos.html http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg
  • 8. Need for Taxonomic Structures http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg
  • 11. Big Data Sources and Analysis
      Data Type | Qualities | Analysis Tools | Result
      Social Media | Demographics | API integration | More profiles of like-minded users
      “Social Influencers” | User Reviews | NLP, Text Analysis | Sentiment readings
      “Internet of Things” | Logs/Sensors/Check-Ins | Parsing | Usage and behavior patterns
      SaaS | Cloud/Web-based/Subscription software | Dist. data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc.
      Public Data | e.g., Amazon Data Market, WorldBank, Wikipedia | All above (depends on data structure) | Depends on Dataset (and there are LOTS of them!)
      Hadoop/MapReduce | Volume! | Parallel Processing/Parsing/Reduction | Big patterns, correlations, needles in haystacks
      Data Warehouses | Internal transactional data | Likely same as above | Correlations, marketbasket, etc.
      NoSQL/Columnar | Volume! | Fills gaps in Parallel processing tools | Real time activity and patterns
      In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/Stream usage patterns
      Legacy Data | Usually PDFs & Documents/SemiStructured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all)
      Source: http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
  • 12. Why “Concept Hierarchies” in an Unstructured Environment?
  • 13. Advantages • When a term occurs too rarely to appear in frequent itemsets/rulesets • Create more interesting rules using more general, aggregated concepts [DVD, wheat bread, home electronics, electronics, food] Kumar, T.S. (2005) Introduction to Data Science
  • 14. Disadvantages • How low and how high in the hierarchy do you set the threshold? • Increased computation time • If threshold is too high, redundant rules for more specific terms can be summarized by rules using more general terms
  • 15. Hybrid Taxonomic Development • Understand your auto-classification model • Work with domain experts to create basic taxonomy • Test Taxonomy in the Model • Rinse, repeat Wendy Pohs, ASIS&T Bulletin 12/1/13
  • 16. Domain Knowledge and Thick Data • Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. • Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning. http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)
  • 17. Data Driven CI is Meaningless Without Human/Domain Knowledge http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
  • 18. Recap • Data Mining for CI is not new • Refinement and Improvement • Bigger, Weirder Data
  • 19. Recap • Where it’s at: Hybrid Schemas • Thick Data, not just Big Data • HUMAN ELEMENT IS ESSENTIAL
  • 20. Questions? Elaine Lasda Bergman University at Albany http://www.slideshare.net/librarian68 elasdabergman@albany.edu @ElaineLibrarian
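
The advantage and the threshold question on slides 13 and 14 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `PARENT` hierarchy, the sample transactions, and the 0.6 support threshold are all hypothetical (only the item names DVD, wheat bread, home electronics, electronics, and food come from slide 13), and the code counts single-item support only rather than mining full association rules.

```python
from collections import Counter

# Hypothetical concept hierarchy: each item maps to its parent concept.
PARENT = {
    "DVD": "home electronics",
    "television": "home electronics",
    "home electronics": "electronics",
    "wheat bread": "food",
    "skim milk": "food",
}


def ancestors(item):
    """Return every more general concept above an item in the hierarchy."""
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found


def extend(transaction):
    """Add the ancestors of each purchased item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def support(transactions):
    """Fraction of transactions containing each item or concept."""
    counts = Counter()
    for transaction in transactions:
        counts.update(extend(transaction))
    total = len(transactions)
    return {item: count / total for item, count in counts.items()}


# Hypothetical market-basket data.
transactions = [
    {"DVD", "wheat bread"},
    {"television", "skim milk"},
    {"DVD"},
    {"wheat bread", "skim milk"},
]

MIN_SUPPORT = 0.6  # assumed threshold; choosing it is the hard part
supports = support(transactions)
frequent = {item for item, s in supports.items() if s >= MIN_SUPPORT}
print(sorted(frequent))
```

In this toy data no specific product reaches the threshold, but the aggregated concepts do. Both "home electronics" and "electronics" pass with the same support, which is the kind of redundancy slide 14 warns about, and every transaction grows once ancestors are added, which is where the extra computation time comes from.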

Editor's Notes

  1. Not an expert, I am a “LEARNER” a student
  2. “automatic discovery of patterns using software to analyze vast amounts of records in a database” What else was going on in tech in 1996?
  3. The 1996 article mentioned transactional data, “all the rage” Marketing, Inventory, Risk mitigation Efficiency and waste allow us to formulate solutions in English
  4. “Library Hand” – we’ve been doing indexing, taxonomies, classification since the beginning of our profession. Machine-created taxonomies are not new; text mining, extraction, and indexing have been automated since the 1960s. The earliest I could find was a paper published by the RAND Corporation in 1961
  5. Wider need for classification- Building Enterprise Taxonomies, Stewart The pendulum – “searching” versus “browsing” paradigms Search = lack of context, precision versus recall, relevancy ranking, choice of terminology Proper syntax for each search tool, where to search? Spelling variants, bad labels Where do we find taxonomies and ontologies today? Here are some of their natural habitats Web sites Discipline/Domain Classification Machine Learning Algorithms Training dataset and a testing dataset. As Heather points out in her book The Accidental Taxonomist, the efficacy of machine-created taxonomies improves dramatically with human quality control
  6. Relational DBs – ENTITY RELATIONSHIP Legacy systems Hierarchical models Network models Diagram for a relational database is in rows and columns, Classes, variables, attributes, qualities, fields observations instances, records, cases
  7. NoSQL Multimedia Unstructured
  8. Andrew Brust “Bigger data means weirder data” <- Jeffrey Stanton in Intro to Data Science book
  9. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Weed out data noise. Algorithms can be programmed with human quality control to account for redundancy and catch inconsistencies, different terms http://it.toolbox.com/blogs/irm-blog/the-benefits-of-a-data-taxonomy-4916 https://www.earley.com/blog/why-taxonomy-critical-master-data-management-mdm
  10. Autoclassification model: Linguistic/lexical: gather and rank representative words and phrases that are associated with the concepts to be classified; Rules Based: no common syntax for developing rules; varies by tool. Rules syntax could range from Boolean to the more complex syntax more commonly used in programming languages. Because of this lack of consistency, the people who create and maintain these rules will have a more specialized skill set and will require more training. Machine Learning/Predictive: these systems rely on iteration to continuously validate. Traditional hierarchical taxonomy may not be needed, reference terms or document sets to model. Maintenance of machine learning systems = repeated training, especially when you add new content. You will also help revise the larger machine-learning model as you learn more about your content.
  11. Examples of Domain Knowledge - Big data revolution book – building inspectors needed to predict which buildings should have priority inspections. Web design for user generated content – automatically categorizes user driven content but taxonomy is refined by humans. As refined, the autoclassifier improves, “gets smarter”. We as knowledge experts fill in the gaps! We can be facilitators with those in the field/analysts and those programming the algorithms
  12. Example of meaningless data: Google Flu trends Scientific controlled experiments limit external sources, domain knowledge fills in the gaps in the real world data analysis http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
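
The linguistic/lexical approach in note 10, combined with the training and testing datasets mentioned in note 5, can be sketched roughly as follows. This is a minimal illustration and not a description of any particular autoclassification product: the function names, the stopword list, the two categories, and the sample documents are all hypothetical, and a real system would rank terms with something more robust than raw counts.

```python
from collections import Counter

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_term_lists(labeled_docs, top_k=10):
    """Gather and rank the most frequent words for each concept."""
    counts = {}
    for label, text in labeled_docs:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {
        label: {word for word, _ in counter.most_common(top_k)}
        for label, counter in counts.items()
    }


def classify(text, term_lists):
    """Pick the concept whose term list overlaps the document most."""
    tokens = set(tokenize(text))
    scores = {label: len(tokens & terms) for label, terms in term_lists.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None means "ask a human"


def accuracy(labeled_docs, term_lists):
    """Share of held-out documents assigned their expert-given label."""
    hits = sum(classify(text, term_lists) == label for label, text in labeled_docs)
    return hits / len(labeled_docs)


# Hypothetical documents labeled by domain experts.
training = [
    ("finance", "quarterly revenue and profit forecast"),
    ("finance", "revenue growth in retail banking"),
    ("health", "flu outbreak and vaccine supply"),
    ("health", "hospital flu admissions"),
]
testing = [
    ("finance", "profit forecast for banking"),
    ("health", "vaccine admissions rise"),
]

term_lists = build_term_lists(training)
print(accuracy(testing, term_lists))
```

Documents that match no term list come back as `None` instead of being forced into a category, which leaves a natural place for the human review these notes argue for: an expert labels the leftovers, they are added to the training set, and the term lists are rebuilt, in the "rinse, repeat" spirit of slide 15.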