BIG DATA
What is it and Why Do We Care?
Elaine M. Lasda Bergman
University at Albany
March 6, 2014
elasdabergman@albany.edu
Webinar Presentation
for the Special Libraries Association
What we’re going to cover today
• What is Big Data
• What is great about Big Data
• What is not so great
• The role of Librarians and Info Pros in the Big Data landscape
• Tools and Resources
How Big is Big?
http://breadboxes.info/files/2012/01/bread-box.jpg
The Three Vs
•Variety
•Velocity
•Volume
Big Data Vs Open Data
Based on http://www.opendatanow.com/2013/11/new-big-data-vs-open-data-mapping-it-out/
[Venn diagram: Big Data, Open Government, Open Data]
Is Big Data a Game Changer?
http://bellwethergames.com/images/stories/blog/salvaged%20bits.jpg
Types of Data Scientists
• Statistics
• Mathematics
• Data Engineering
• Machine Learning
• Business
• Software engineering
• Visualization
• GIS
http://www.datasciencecentral.com/profiles/blogs/six-categories-of-data-scientists
Big Data is FANTASTIC!
http://4206e9.medialib.glogster.com/media/6bde80470b0f0ffe3b59b390fcb54a117c65f2406a167bd2589cabc3e9601461/excited-smiley-face.jpg
Applications of Big Data (in general)
http://analytics-arena.blogspot.com/2012/12/the-famous-beer-diaper-planogram.html
Big Data is TERRIBLE!
http://startupmixology.tech.co/2010-chicago/staff/harper-reed
Caveats and limitations
http://www.guy-sports.com/fun_pictures/no_brain.jpg
False Correlations
http://www.cdc.gov/healthyweight/images/height.jpg
http://www.sbsd.k12.ca.us/cms/lib02/CA01001886/Centricity/Domain/569/kids_reading.jpg
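A small sketch of how a false correlation can arise. It assumes the two images on this slide (height, children reading) point to the familiar classroom example in which taller children appear to read better only because both measures rise with age; the slide text itself does not say so. All numbers below are synthetic and invented for illustration, and the variable names are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic children aged 5 to 12 (assumption: made-up data, not real measurements).
ages = [random.uniform(5, 12) for _ in range(5000)]

# Height and reading score each depend on age plus independent noise.
# Neither one is computed from the other.
heights = [80 + 6 * a + random.gauss(0, 5) for a in ages]
reading = [10 * a + random.gauss(0, 8) for a in ages]

# Across all ages the two look strongly related.
overall = pearson(heights, reading)

# Hold age roughly constant (children aged 8.0 to 8.5) and look again.
band = [(h, r) for a, h, r in zip(ages, heights, reading) if 8.0 <= a < 8.5]
within = pearson([h for h, _ in band], [r for _, r in band])

print(f"all ages:        r = {overall:.2f}")
print(f"ages 8.0 to 8.5: r = {within:.2f}")
```

Across the whole sample the correlation is strong, yet within a narrow age band it largely disappears, because age drives both measures. Checking for a hidden third variable like this is one practical piece of the data literacy discussed on the next slides.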
Competencies for Info Pros/Librarians
Add Data Literacy!
http://remc12.wikispaces.com/file/view/InformationLit.jpg/32256581/InformationLit.jpg
What We Just Talked About
• The Three Vs
• Amazing Capabilities
• The Human Element
• Our Roles as Information Professionals
Now the Fun Stuff!
http://www.whee.com.sg/images/common/logo-whee.png
Read!
• Big Data: A Revolution that Will Transform How We Live, Work, and Think, by Viktor Mayer-Schonberger
http://www.amazon.com/Big-Data-Revolution-Transform-Think/dp/0544002695
• “For Dummies” Books
Read!
• An Introduction to Data Science, by Jeffrey Stanton http://jsresearch.net/
• Frontiers in Massive Data Analysis
http://www.nap.edu/catalog.php?record_id=18374
General Resource Lists/Training
• Syracuse University Library Guide on Data Science
http://researchguides.library.syr.edu/datascience
• ALA ACRL “Keeping Up With Big Data” page
http://www.ala.org/acrl/publications/keeping_up_with/big_data
• Data Information Literacy at Purdue wiki
http://wiki.lib.purdue.edu/display/ste/Home
• MOOCs
Policy/Best Practices
• Council For Big Data, Ethics and Society
http://www.datasociety.net/initiatives/council-for-big-data-ethics-and-society/
• Research Data Management Principles, Practices, and Prospects – CLIR
http://www.clir.org/pubs/reports/pub160
Policy/Best Practices
• Rebuilding the Mosaic
http://www.nsf.gov/pubs/2011/nsf11086/nsf11086.pdf
• GovLab
http://thegovlab.org/
• Terminology issues
Keep Current
Newsletters
• Data Science Weekly http://www.datascienceweekly.org/
• Data Science Central http://www.datasciencecentral.com/
• R-Bloggers http://www.r-bloggers.com/
Keep Current
Blogs
• Hilary Mason http://www.hilarymason.com/
• Mathbabe http://mathbabe.org/
• Bits Blog in NY Times http://bits.blogs.nytimes.com/
• No Free Hunch http://blog.kaggle.com/
• What’s the Big Data http://whatsthebigdata.com/
PLAY!
http://brainysmurf1234.files.wordpress.com/2011/10/sand-castle.png
Big, Open Data Sources
http://lightworkersalliance.com/wp-content/uploads/2011/06/Open-Door1.jpg
Google Data Explorer
https://www.google.com/publicdata/directory
Amazon Web Services
http://aws.amazon.com/
Scale Unlimited
http://www.scaleunlimited.com/datasets/public-datasets/
Database Structure/Data Analysis
• R http://cran.us.r-project.org/
• Hive/Hadoop http://hive.apache.org/
• PostgreSQL http://www.postgresql.org/
• Project Bamboo Dirt http://dirt.projectbamboo.org/
• Mlcomp http://mlcomp.org/
Visualization tools
http://us.123rf.com/400wm/400/400/lucadp/lucadp1204/lucadp120400012/13060060-one-crystal-ball-with-a-bar-chart-inside-it-a-concept-of-financial-and-business-forecasts-3d-render.jpg
Piktochart
http://piktochart.com/
Esri
http://www.esri.com/
Big ML
https://bigml.com/
ManyEyes
http://www-958.ibm.com/software/analytics/manyeyes/
Google Fusion Tables
https://support.google.com/fusiontables/answer/2571232?hl=en
Chartsbin
http://chartsbin.com/
iCharts
http://www.icharts.net/
Just Plain Cool!
http://images5.fanpop.com/image/photos/30600000/The-Fonz-arthur-fonzarelli-30631370-621-362.jpg
CSSeer
http://csseer.ist.psu.edu/
StreetBump
http://streetbump.org/
My Magic Plus
https://disneyworld.disney.go.com/plan/my-disney-experience/my-magic-plus/
Information is Beautiful
http://www.informationisbeautiful.net/
Facebook’s Data Science Page
https://www.facebook.com/data
Google Trends
http://www.google.com/trends/
Flowing Data
http://flowingdata.com/
GapMinder
http://www.gapminder.org/
One Final Note:
Professional Development
SLA Data Caucus initiative!
IASSIST http://www.iassistdata.org/
ASIS&T http://www.asis.org/
LinkedIn Groups, see:
http://researchguides.library.syr.edu/content.php?pid=484454&sid=4078160
Contact Me
Elaine Lasda Bergman
elasdabergman@albany.edu
http://www.slideshare.net/librarian68/
@ElaineLibrarian on Twitter

Mais conteúdo relacionado

Mais procurados

Iris ai and academia.edu.
Iris ai and academia.edu. Iris ai and academia.edu.
Iris ai and academia.edu. Amal Jith
 
Let's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemLet's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemWiLS
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...James Hendler
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data ThingsKatina Toufexis
 
From Biology to Industry. A Blogger’s Journey to Data Science.
From Biology to Industry. A Blogger’s Journey to Data Science.From Biology to Industry. A Blogger’s Journey to Data Science.
From Biology to Industry. A Blogger’s Journey to Data Science.Shirin Elsinghorst
 
Big Data - Introduction and Research Topics - for Dutch Kadaster
Big Data - Introduction and Research Topics - for Dutch KadasterBig Data - Introduction and Research Topics - for Dutch Kadaster
Big Data - Introduction and Research Topics - for Dutch KadasterJust van den Broecke
 
Academic Research over internet
Academic Research over internetAcademic Research over internet
Academic Research over internetAbdul Wahid Uqaily
 
Discovery of IIIF Resources: Intro for Working Group / Vatican
Discovery of IIIF Resources: Intro for Working Group / VaticanDiscovery of IIIF Resources: Intro for Working Group / Vatican
Discovery of IIIF Resources: Intro for Working Group / VaticanRobert Sanderson
 
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...Andrew Bourgeois
 
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...Naveen Agarwal
 

Mais procurados (14)

Iris ai and academia.edu.
Iris ai and academia.edu. Iris ai and academia.edu.
Iris ai and academia.edu.
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012
 
Let's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemLet's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library System
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data Things
 
From Biology to Industry. A Blogger’s Journey to Data Science.
From Biology to Industry. A Blogger’s Journey to Data Science.From Biology to Industry. A Blogger’s Journey to Data Science.
From Biology to Industry. A Blogger’s Journey to Data Science.
 
Just Google It
Just Google ItJust Google It
Just Google It
 
Big Data - Introduction and Research Topics - for Dutch Kadaster
Big Data - Introduction and Research Topics - for Dutch KadasterBig Data - Introduction and Research Topics - for Dutch Kadaster
Big Data - Introduction and Research Topics - for Dutch Kadaster
 
Data Science and its impact on society
Data Science and its impact on societyData Science and its impact on society
Data Science and its impact on society
 
Academic Research over internet
Academic Research over internetAcademic Research over internet
Academic Research over internet
 
Discovery of IIIF Resources: Intro for Working Group / Vatican
Discovery of IIIF Resources: Intro for Working Group / VaticanDiscovery of IIIF Resources: Intro for Working Group / Vatican
Discovery of IIIF Resources: Intro for Working Group / Vatican
 
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
 
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
 

Destaque

Analytics 101 for startups
Analytics 101 for startupsAnalytics 101 for startups
Analytics 101 for startupsGoSquared
 
Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101Mukul Krishna
 
Google Analytics 101 #SMAMI 2017
Google Analytics 101 #SMAMI 2017Google Analytics 101 #SMAMI 2017
Google Analytics 101 #SMAMI 2017Nicole Bullock
 
Google Analytics 101 | 2015
Google Analytics 101 |  2015Google Analytics 101 |  2015
Google Analytics 101 | 2015Insivia
 

Destaque (6)

Big data 101 v1
Big data 101 v1Big data 101 v1
Big data 101 v1
 
Big Data 101
Big Data 101Big Data 101
Big Data 101
 
Analytics 101 for startups
Analytics 101 for startupsAnalytics 101 for startups
Analytics 101 for startups
 
Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101
 
Google Analytics 101 #SMAMI 2017
Google Analytics 101 #SMAMI 2017Google Analytics 101 #SMAMI 2017
Google Analytics 101 #SMAMI 2017
 
Google Analytics 101 | 2015
Google Analytics 101 |  2015Google Analytics 101 |  2015
Google Analytics 101 | 2015
 

Semelhante a Data 101- Big Data: What is it and Why Do We Care?

Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web DataMarieke Guy
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeLiz Lyon
 
Research Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeResearch Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeSpencer Keralis
 
The purpose, practicalities, pitfalls and policies of managing and sharing da...
The purpose, practicalities, pitfalls and policies of managing and sharing da...The purpose, practicalities, pitfalls and policies of managing and sharing da...
The purpose, practicalities, pitfalls and policies of managing and sharing da...Danny Kingsley
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in EducationPhilip Piety
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...LIBER Europe
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementMarieke Guy
 
Winter school in research data science research data management - final
Winter school in research data science research data management - finalWinter school in research data science research data management - final
Winter school in research data science research data management - finalARDC
 
ICPSR Data Services
ICPSR Data ServicesICPSR Data Services
ICPSR Data ServicesICPSR
 
Andrew Cox Research data management
Andrew Cox Research data managementAndrew Cox Research data management
Andrew Cox Research data managementIncisive_Events
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6ARDC
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsNicole Vasilevsky
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementSarah Jones
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing dataSarah Jones
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
RDAP14: Collaboration and tension between institutions and units providing da...
RDAP14: Collaboration and tension between institutions and units providing da...RDAP14: Collaboration and tension between institutions and units providing da...
RDAP14: Collaboration and tension between institutions and units providing da...ASIS&T
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of UtahRebekah Cummings
 

Semelhante a Data 101- Big Data: What is it and Why Do We Care? (20)

Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web Data
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
 
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
 
Research Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeResearch Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the Challenge
 
The purpose, practicalities, pitfalls and policies of managing and sharing da...
The purpose, practicalities, pitfalls and policies of managing and sharing da...The purpose, practicalities, pitfalls and policies of managing and sharing da...
The purpose, practicalities, pitfalls and policies of managing and sharing da...
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in Education
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data Management
 
Winter school in research data science research data management - final
Winter school in research data science research data management - finalWinter school in research data science research data management - final
Winter school in research data science research data management - final
 
ICPSR Data Services
ICPSR Data ServicesICPSR Data Services
ICPSR Data Services
 
Andrew Cox Research data management
Andrew Cox Research data managementAndrew Cox Research data management
Andrew Cox Research data management
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate Students
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing data
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
RDAP14: Collaboration and tension between institutions and units providing da...
RDAP14: Collaboration and tension between institutions and units providing da...RDAP14: Collaboration and tension between institutions and units providing da...
RDAP14: Collaboration and tension between institutions and units providing da...
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of Utah
 

Mais de Elaine Lasda

Your Systematic Review: Getting Started
Your Systematic Review: Getting StartedYour Systematic Review: Getting Started
Your Systematic Review: Getting StartedElaine Lasda
 
Research Impact in Specialized Settings: 3 Case Studies
Research Impact in Specialized Settings: 3 Case StudiesResearch Impact in Specialized Settings: 3 Case Studies
Research Impact in Specialized Settings: 3 Case StudiesElaine Lasda
 
The New Metrics: conference presentation
The New Metrics: conference presentationThe New Metrics: conference presentation
The New Metrics: conference presentationElaine Lasda
 
Maximizing Your Research Impact: 5 Quick Hits!
Maximizing Your Research Impact: 5 Quick Hits!Maximizing Your Research Impact: 5 Quick Hits!
Maximizing Your Research Impact: 5 Quick Hits!Elaine Lasda
 
Scholarly Metrics in Specialized Settings
Scholarly Metrics in Specialized SettingsScholarly Metrics in Specialized Settings
Scholarly Metrics in Specialized SettingsElaine Lasda
 
Personal Time Management
Personal Time ManagementPersonal Time Management
Personal Time ManagementElaine Lasda
 
Early Career Tactics to Increase Scholarly Impact
Early Career Tactics to Increase Scholarly ImpactEarly Career Tactics to Increase Scholarly Impact
Early Career Tactics to Increase Scholarly ImpactElaine Lasda
 
Computers in Libraries 2018 Workshop on Scholarly Metrics
Computers in Libraries 2018 Workshop on Scholarly MetricsComputers in Libraries 2018 Workshop on Scholarly Metrics
Computers in Libraries 2018 Workshop on Scholarly MetricsElaine Lasda
 
Computers in Libraries Scholarly Metrics Freebies
Computers in Libraries Scholarly Metrics FreebiesComputers in Libraries Scholarly Metrics Freebies
Computers in Libraries Scholarly Metrics FreebiesElaine Lasda
 
Data Literacy for Librarians - Day 2
Data Literacy for Librarians - Day 2Data Literacy for Librarians - Day 2
Data Literacy for Librarians - Day 2Elaine Lasda
 
Data Literacy for Librarians
Data Literacy for LibrariansData Literacy for Librarians
Data Literacy for LibrariansElaine Lasda
 
UAlbany Open Access Day Presentation on OER Grant
UAlbany Open Access Day Presentation on OER GrantUAlbany Open Access Day Presentation on OER Grant
UAlbany Open Access Day Presentation on OER GrantElaine Lasda
 
Open Educational Resources Faculty Workshop
Open Educational Resources Faculty WorkshopOpen Educational Resources Faculty Workshop
Open Educational Resources Faculty WorkshopElaine Lasda
 
Data and Libraries: How I learned to stop worrying and love the spreadsheet
Data and Libraries: How I learned to stop worrying and love the spreadsheetData and Libraries: How I learned to stop worrying and love the spreadsheet
Data and Libraries: How I learned to stop worrying and love the spreadsheetElaine Lasda
 
Altmetrics & Scholarly Publishing: the LIbrary Lay of the Land
Altmetrics & Scholarly Publishing: the LIbrary Lay of the LandAltmetrics & Scholarly Publishing: the LIbrary Lay of the Land
Altmetrics & Scholarly Publishing: the LIbrary Lay of the LandElaine Lasda
 
From Reputation to Citation: Varying Roles for Scholarly Metrics
From Reputation to Citation: Varying Roles for Scholarly MetricsFrom Reputation to Citation: Varying Roles for Scholarly Metrics
From Reputation to Citation: Varying Roles for Scholarly MetricsElaine Lasda
 
Open Educational Resources (OERs): A Game Changer For Higher Ed
Open Educational Resources (OERs): A Game Changer For Higher EdOpen Educational Resources (OERs): A Game Changer For Higher Ed
Open Educational Resources (OERs): A Game Changer For Higher EdElaine Lasda
 
Research Impact Roadshow
Research Impact RoadshowResearch Impact Roadshow
Research Impact RoadshowElaine Lasda
 
Gaining Insights Through Bibliometric Analysis
Gaining Insights Through Bibliometric AnalysisGaining Insights Through Bibliometric Analysis
Gaining Insights Through Bibliometric AnalysisElaine Lasda
 
Getting "Fancy" With Your Library Data!
Getting "Fancy" With Your Library Data!Getting "Fancy" With Your Library Data!
Getting "Fancy" With Your Library Data!Elaine Lasda
 

Mais de Elaine Lasda (20)

Your Systematic Review: Getting Started
Your Systematic Review: Getting StartedYour Systematic Review: Getting Started
Your Systematic Review: Getting Started
 
Research Impact in Specialized Settings: 3 Case Studies
Research Impact in Specialized Settings: 3 Case StudiesResearch Impact in Specialized Settings: 3 Case Studies
Research Impact in Specialized Settings: 3 Case Studies
 
The New Metrics: conference presentation
The New Metrics: conference presentationThe New Metrics: conference presentation
The New Metrics: conference presentation
 
Maximizing Your Research Impact: 5 Quick Hits!
Maximizing Your Research Impact: 5 Quick Hits!Maximizing Your Research Impact: 5 Quick Hits!
Maximizing Your Research Impact: 5 Quick Hits!
 
Scholarly Metrics in Specialized Settings
Scholarly Metrics in Specialized SettingsScholarly Metrics in Specialized Settings
Scholarly Metrics in Specialized Settings
 
Personal Time Management
Personal Time ManagementPersonal Time Management
Personal Time Management
 
Early Career Tactics to Increase Scholarly Impact
Early Career Tactics to Increase Scholarly ImpactEarly Career Tactics to Increase Scholarly Impact
Early Career Tactics to Increase Scholarly Impact
 
Computers in Libraries 2018 Workshop on Scholarly Metrics
Computers in Libraries 2018 Workshop on Scholarly MetricsComputers in Libraries 2018 Workshop on Scholarly Metrics
Computers in Libraries 2018 Workshop on Scholarly Metrics
 
Computers in Libraries Scholarly Metrics Freebies
Computers in Libraries Scholarly Metrics FreebiesComputers in Libraries Scholarly Metrics Freebies
Computers in Libraries Scholarly Metrics Freebies
 
Data Literacy for Librarians - Day 2
Data Literacy for Librarians - Day 2Data Literacy for Librarians - Day 2
Data Literacy for Librarians - Day 2
 
Data Literacy for Librarians
Data Literacy for LibrariansData Literacy for Librarians
Data Literacy for Librarians
 
UAlbany Open Access Day Presentation on OER Grant
UAlbany Open Access Day Presentation on OER GrantUAlbany Open Access Day Presentation on OER Grant
UAlbany Open Access Day Presentation on OER Grant
 
Open Educational Resources Faculty Workshop
Open Educational Resources Faculty WorkshopOpen Educational Resources Faculty Workshop
Open Educational Resources Faculty Workshop
 
Data and Libraries: How I learned to stop worrying and love the spreadsheet
Data and Libraries: How I learned to stop worrying and love the spreadsheetData and Libraries: How I learned to stop worrying and love the spreadsheet
Data and Libraries: How I learned to stop worrying and love the spreadsheet
 
Altmetrics & Scholarly Publishing: the LIbrary Lay of the Land
Altmetrics & Scholarly Publishing: the LIbrary Lay of the LandAltmetrics & Scholarly Publishing: the LIbrary Lay of the Land
Altmetrics & Scholarly Publishing: the LIbrary Lay of the Land
 
From Reputation to Citation: Varying Roles for Scholarly Metrics
From Reputation to Citation: Varying Roles for Scholarly MetricsFrom Reputation to Citation: Varying Roles for Scholarly Metrics
From Reputation to Citation: Varying Roles for Scholarly Metrics
 
Open Educational Resources (OERs): A Game Changer For Higher Ed
Open Educational Resources (OERs): A Game Changer For Higher EdOpen Educational Resources (OERs): A Game Changer For Higher Ed
Open Educational Resources (OERs): A Game Changer For Higher Ed
 
Research Impact Roadshow
Research Impact RoadshowResearch Impact Roadshow
Research Impact Roadshow
 
Gaining Insights Through Bibliometric Analysis
Gaining Insights Through Bibliometric AnalysisGaining Insights Through Bibliometric Analysis
Gaining Insights Through Bibliometric Analysis
 
Getting "Fancy" With Your Library Data!
Getting "Fancy" With Your Library Data!Getting "Fancy" With Your Library Data!
Getting "Fancy" With Your Library Data!
 

Último

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 


Data 101- Big Data: What is it and Why Do We Care?

Editor's Notes

  1. Thank you for the introduction, Kendra.
  2. Here’s a slide of a slide…Dan Ariely, a behavioral economist at Duke University, has been posting this analogy all over social media and at presentations. He also has a book called Predictably Irrational, which I have not read yet, but it talks about his work in behavioral predictions. He also has a number of TED talks that are very interesting. So what is Big Data, really, anyway?
  3. We may be wondering just “how big is big data?” If you played 20 questions as a kid you might have asked “is it bigger than a breadbox?” While some are reporting datasets on such unfathomable scales as petabytes, exabytes, and zettabytes, big data is really any data that is too big for traditional technology. In other words, it is too big for our breadbox. A good working definition for most of us is a data file that is too big for Excel to load. In Excel 2013, the maximum size it can handle is 1,048,576 rows by 16,384 columns. But really, there are three features that make Big Data “big.”
  4. A common definition of Big Data relies on what are known as the 3 V’s. These are Variety, Velocity, and Volume, terms first coined by Doug Laney of a firm called Gartner. Variety means that we are not just collecting more of the same data we’ve always collected. Instead we are collecting different types of data. Variety also means that we do not have the type of structured datasets we used to have in relational databases: you know, the ones with nice tables with neat and tidy rows and columns. Now data often is in forms that don’t fit in columns, like video and audio, sensor data, documents, Flash and so forth. You may have heard of “NoSQL” databases, which are an alternative to the traditional relational database models and accommodate this type of dataset. Velocity has to do with the tremendous speed at which we are collecting this data and the rate at which data is being generated. You may have heard stats such as Facebook generating 500 terabytes of data per day. Many businesses use clickstream analysis on their websites, which generates a great deal of data in a hurry. The IDC Digital Universe study indicates that by the year 2020 society will be generating 50 times the amount of information currently being generated. In 2011 this number was 1.8 zettabytes. A zettabyte is a 1 with 21 zeroes after it, so the rate of growth in 20 years will truly be staggering. And this gets us to our final V, volume. As is likely obvious by looking at the first two V’s, the sheer amount of data that can be collected now is really kind of unfathomable. Google, for example, receives over 2 million search queries IN A SINGLE MINUTE. 72 hours of new video are uploaded to YouTube in a minute. 47,000 apps are downloaded from iTunes every minute.
  5. Let’s take a moment before we go any further and discuss the differences between big data and open data. You can see by this Venn diagram that there are big data sets that are not open. These are proprietary datasets in business and other locations where security is an issue, but there are also datasets from scientific and government sources of big data that ARE open. Open Government data, conversely, is not all “big,” but there is a great deal of public access to it on federal, state, and local levels. Furthermore, there are open data sources that are not government sources, such as business and scientific data that are not necessarily “big” but are publicly available. So this should give you an idea of how Big Data and Open Data are related.
  6. Is big data a game changer? First and foremost, big data turns the scientific method on its head. Traditionally, any inquiry or decision starts with a hypothesis. We make an educated guess, and then look for the data to support or contradict this hypothesis. In Big Data analytics, we start with the data, and we look for patterns. This data is unstructured, it can be multidisciplinary, and it can be highly predictive. Also, traditionally an inquiry or decision seeks to find the answer as to WHY the hypothesis is confirmed or rejected. In big data analytics, we identify the patterns without necessarily receiving information as to why those patterns exist.
  7. In his Data Science Central blog, Vincent Granville has identified 9 types of data science specializations. Statistics: this area deals with testing and modeling, theoretical approaches, and developing new techniques for approaching large datasets. Mathematics: slightly different in that these people deal with operations research: optimization, quality control, etc. Data Engineering: those strong in data engineering deal mainly with the structure and architecture of databases, filesystems, and storage. Software Engineering: these people know several programming languages and work on code development. Machine Learning: these experts are the ones that program the algorithms and complex computations. Business: these are subject experts in terms of determining appropriate metrics, ROI, and what to include on a dashboard. Visualization: charts and graphs, making data analysis understandable to the user or decision maker. GIS: focuses more exclusively on the spatial representation of data.
  8. What big data allows us to do: “human insight at machine scale.” Identify patterns, but also outliers and unique instances. Behavioral predictions. Sentiment analysis. Activity “hotspots,” including geographic ones such as the Arab Spring and Google’s flu prediction. For the social sciences, we can get empirical evidence, whereas surveys are subjective, observational studies do not take place in the subject’s “natural habitat,” and so on. Here are some examples of the amazing things that are being done with big data currently:
  9. Market-basket research: diapers and beer! Broccoli cam: sensors determine when the produce department is out of broccoli and send a worker out to refill. Nate Silver and Moneyball: turned the scouting profession on its head. Netflix: highly specific classifications of movie genres to create recommendations. Linguamatics: text mining predicted a prime minister election using tweets. NYC fire inspectors: cataloged 60 pieces of metadata about all inspectable buildings, used to prioritize inspections.
  10. Harper Reed, Obama campaign techie, in an October 2013 article in the Chronicle of Higher Ed Wired Campus blog, says Big Data is “bs.” It is used to generate fear in enterprises to spur equipment upgrades; in other words, to get them to spend money on technology. He says: “you can get a lot of this stuff done just in Excel.” So, just having the capacity for scalability in an enterprise does not mean that you are “doing big data.”
  11. Big data requires more treatment and handling. This includes: Data cleansing: dirty data, missing data, more outliers, removing duplicates. Parsing and treating: extracting data from its original source into something resembling a dataset. Transformation into a usable format is key.
  12. Another issue is false patterns, false correlations. For example, Gene Pease, in his Talent Management blog, notes that the height of an elementary school student is correlated to his or her reading level. In Jeffrey Stanton’s text Introduction to Data Science he says “bigger means weirder.” So we need to be careful with regard to the assumptions and conclusions we derive from the data. Again, big data is not concerned with the “why” of a pattern; it only identifies that the pattern exists. As one author noted, “when looking at the whole haystack, EVERYTHING looks like a needle.”
  13. Big data is first and foremost a decision making tool. This means that for all the technology and fancy processing, storage and tools available, without competent subject matter experts to identify data flow in an organization or enterprise, identify the areas where data is lacking, and determine how the data can be used, it’s all for naught. The human element is what turns data into information. So where do we, as information professionals, fit into the equation?
  14. There are a number of directions we, as librarians and information professionals, can pursue as we move into more data-driven activities in our organizations, mainly as an outgrowth of existing skill sets we possess. For example: metadata extraction, creation, and classification; privacy experts and intellectual freedom; quality experts who identify reliable and authoritative data sources and analysis; policy advisors for our organization; curation and selection; storage and management; access and gatekeepers; assuring data can be turned into information; knowledge management; competitive intelligence; “be the link pulling biz and IT together.” Michelle Hudson of Yale: “Some day we’re all going to be data librarians.”
  15. In its article “Big Data’s Impact in the World,” The New York Times cited a report by the McKinsey Global Institute, the research arm of a well known consulting firm, which projected that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired. All disciplines are becoming increasingly data intensive, whether political science, sociology, transportation, or the traditional sciences and medicine. As information professionals we have the opportunity to flex our Information Literacy muscles and extend them to Data Literacy. Those of us in higher education can add data literacy to our instructional and consultation activities, and librarians in other capacities can bring their own patrons and stakeholders up to speed on key data concepts: how to collect, store, gather, evaluate and interpret data. As my colleague Kim Silk of the University of Toronto has said to me: “much as we teach people information and media literacy; data literacy – understanding what the data is telling us, understanding (significant or misleading) statistics, outliers, sample size, correlations – is critical for 21st century citizens.”
  16. Our vendor partners are already getting in on the action. For example, Thomson Reuters’ Eikon desktop analysis software for financial offices has Twitter and news sentiment analysis tools. These are primarily aimed at the financial sector, but what they do is allow for assessment of news events and prediction of their effect on the financial markets. Many other partners are using big data internally to identify usability of their interfaces, frequency of use of resources, and common search terms. As our vendor partners become more data driven, we will need to be data literate ourselves in order to understand the resources they make available to us, as well as how and why these resources work.
  17. So here, in my opinion, are the major takeaways from the first part of this webinar: We know that velocity, variety, and volume are the hallmarks of big data. Big data isn’t just more of the same data, and it isn’t necessarily tons and tons of data (although often it is). A good rule of thumb is that any dataset that is too big to fit in Excel is “big data” for our purposes. Big data holds the promise of amazing capabilities, through identifying both patterns and outliers in the data we have collected. We can identify behavioral patterns in an empirical way, such as through market-basket analysis, or collect and use new types of metadata to improve safety practices. But this cannot be done without the human element. Technology upgrades are only part of the equation and may not even be necessary: it takes subject matter experts to ask the right questions and to interpret, clean and collect the data. Finally, as information professionals we have the ability to be involved with data and data issues in a variety of capacities, but our main strength may be in Data Literacy initiatives for our patrons and stakeholders. We did not have time for: stats lessons, privacy issues, computer processes, data structure, and so on.
  18. At this point I’d like to move on to recommending some resources for learning more about the topic. I am sure you realize that this presentation has only touched the tip of the iceberg on the topic of Big Data. There are many paths to pursue to learn more, and many specializations to focus on.
  19. Big Data: A Revolution is a best seller. I am in the middle of it right now, and it gives a layman’s understanding of the concepts and impact of big data. There are lots of “for Dummies” books on various aspects of Big Data, many free in PDF form from various web sources: Big Data for Dummies, etc.
  20. An Introduction to Data Science: open source (free!) textbook with lots of good information, an easy read, short chapters (available on iTunes). Frontiers in Massive Data Analysis: a report by the National Academies Press, discusses big data in mainly social science disciplines, free on the web.
  21. I will put the URLs in the SlideShare version of this presentation. SU guide: data sources, programming guide, news, associations, LinkedIn groups, many free sources. ALA list of resources: academic focus, but there are many good articles and a good collection of information. The Data Information Literacy wiki at Purdue is documenting the development of a standardized curriculum for data literacy and data science, and they are doing research as to the level of data literacy and critical instruction. There are a number of schools that are offering Massive Open Online Courses: Syracuse University offers one periodically, and University of Washington has one that can be done online; Caltech and MIT have the more technical, computing-focused programs.
  22. Another area where librarians might be called upon for their expertise is information policy and best practices as they relate to data issues: use, storage, sharing, privacy, and so forth. Many of these practices are still in the process of being developed. For example, the Council for Big Data Ethics and Society hasn’t launched yet, but is supposed to soon. It is a collaboration with the National Science Foundation. Their website says they intend to “address such issues as security, privacy, equality, and access” to “develop frameworks to help researchers, practitioners and the public understand the social, ethical, legal, and policy issues that underpin the big data phenomenon.” They have a newsletter sign-up, but I have yet to receive anything from it. Research Data Management Services: primarily for academic libraries, this report deals with storage, access, repositories and data management in an academic environment, but there may be lessons for other types of libraries as well.
  23. Here are a couple of other resources on big data policy and best practices. Rebuilding the Mosaic: National Science Foundation Social Behavior and Economic Council’s report on data driven research in the social sciences related to world development. They identify population change, disparities, communications, media, and social networking as areas of focus for the future. GovLab: a blog on governance policies of science and technology; search “data” in the search box for some good articles related to big data governance and policy. Terminology: this is another area that is an issue with current Big Data projects. Computer scientists, social scientists, and statisticians all have different language for the same things: case vs. instance vs. observation, as an example, all equal the “rows” in a dataset. There is an argument that this ISO standard for statistical terminology should be amended to create a standardized language for data analytics.
  24. These are some newsletters that can be delivered to your email inbox that I find useful. There are tons of these, though, and there may be others you will find on the web that are also useful. Data Science Weekly: free newsletter, variety of topics, and includes jobs. Data Science Central: nice blog, newsletter with broad focus, professional development for the data scientist (or aspiring data scientist). R-Bloggers: tips and tricks for using the statistical software R. Forgot to mention the O’Reilly mailing lists. O’Reilly, as you may know, is a publisher of IT manuals and provides blogs and other resources related to technology.
  25. Here are my favorite blogs on the topic, in no particular order. Hilary Mason: she’s a data scientist and she posts interesting articles about some data analysis, lots of visualizations, but also professional development topics for data professionals. She was an innovator who had an extensive role in creating bit.ly; among other things, they are well known for a tool that will convert a long URL into something shorter and more manageable. She speaks a lot and hosts a data related conference in NYC called DataGotham. Mathbabe: Cathy O’Neil is a mathematician but not an academic. She has some nice introductory posts for those interested in data science, with less visualization than Hilary; she focuses more on opinion and technique. Bits Blog: technology and business news from the New York Times. No Free Hunch: problem solving bent, “the sport of data science” from Kaggle, a consulting company. They identify fun problems and solve them using data science techniques, and they announce many competitions and challenges where data scientists can strut their stuff. What’s the Big Data: Gil Press, who has a column at Forbes, focuses on the impact of big data in society, business, government, and IT. Right now he’s done a lot about the market for big data and its influence on business.
  26. Next I would like to show you some interesting tools that you can play with if you want to explore big data and its capabilities for yourself. There are a lot of open source resources that are available and user friendly.
  27. The first thing we will cover is finding datasets. There are a surprising number of sources for datasets out there that are free and online. Some are easier to use than others. I am pointing out three well known or interesting resources, but there are many others I could have included. These three that I have chosen will give you an idea of some of the variety of data that is out there.
  28. Google Data Explorer provides many datasets, and Google Trends, which we will talk about later, provides a visual display of data. Most of the public datasets available on Google Data Explorer are governmental in nature, as you can see by the list of data providers on the left.
  29. Amazon Web Services – a wide variety of datasets on many interesting topics, many of these are also government sources, but not all
  30. Scale Unlimited is a big data consulting firm that makes some big datasets freely available for testing and modeling purposes. They have a wide variety of datatypes including media, graphic, geographic. One of the datasets contains all of the Enron emails.
  31. These are some tools for creating databases and analyzing or querying your dataset. I must confess I am just learning about how these work now, so I only have brief explanations of them. R is an open source, command language tool for statistical analysis. I liked the old DIALOG, so I love R. It has many extensible packages that can create a lot of flexibility and precision in data analysis. Hive/Hadoop: both of these tools are projects of the Apache Software Foundation, and Hadoop grew out of ideas Google published about its own distributed systems. Both are open source. Hadoop allows for what is known as parallel processing, or distributed computing. Hive is the language and infrastructure that allows you to query the data in Hadoop and do analysis. It is very similar to SQL. PostgreSQL: provides an object relational database management system, which is used by Etsy and Creative Commons, two organizations I think are very popular with librarians! It is queried with SQL. Project Bamboo DiRT: open source “digital research tools for scholarly use,” a variety of tools for data management, analysis, and visualization as well as other topics. MLcomp: compares and evaluates computer algorithms. Evaluate your algorithm on their existing dataset, or evaluate your dataset to see what is the best algorithm to use for it.
  32. Once you have queried and analyzed your data, you will want to display it in a manner that your patrons or stakeholders will understand and be able to use for making decisions. This is known as data visualization. Here are some cool tools that are free on the web.
  33. PiktoChart – very user friendly data visualization design and editing, as you can see mainly “infographics”
  34. Esri is a geospatial tool, which means it is good at visualizing data that is displayed using maps. For example, here is a map related to commuting times across the US.
  35. BigML: fee for service, but for datasets under 16 MB you can play with their visualization tools
  36. ManyEyes: from IBM – upload your dataset and create a wide variety of visualizations: maps, histograms, graphs, text based analysis
  37. Google Fusion Tables: a way of providing visualization for big or multiple datasets in table format: charts, maps, network graphs, etc.
  38. ChartsBin: with this tool you can create interactive (clickable) visualizations that can be embedded in web pages or exported. They also share their own visualizations from various authoritative sources (government, scholarly journals, technical reports).
  39. iCharts – another nice one that allows for interactive widgets that can be embedded, published on the web, etc.
  40. Maybe you don’t want to get into analysis and you just want to see what others are doing. Here are some cool sites that give you a glimpse as to what various organizations are doing with big data and the results that they are making available to the public:
  41. CSSeer: crosses data from CiteSeer, which is a free bibliometric (citation) analysis tool, with Wikipedia to recommend scholarly experts in a field.
  42. Street Bump: crowd-sourced pothole locator
  43. My Magic Plus: coming from Disney. You get a wristband that tracks your every move around the park: what you spend, where you go, how long you wait, what you buy, everything
  44. Information is Beautiful: independent “data journalist” David McCandless creates just gorgeous visual displays, and then the data is available in Google Docs for anyone to use
  45. Facebook Blog: fascinating articles and visualizations of what is happening with Facebook data
  46. Google Trends: what are people searching, visualizations, and “zeitgeist”: what did the world search for in 2013
  47. Flowing Data: fun visualizations on a variety of topics
  48. GapMinder: Educational bent, describes itself as a “museum” on the internet – focus is on world development: factfinding and needs assessment
  49. Professional development opportunities abound for info pros who wish to get their feet wet in big data and data science. In fact, I am working with a group of SLA members to create a Data Caucus. We are currently working on amending our scope to be compatible with other SLA units, and hope to send out a revised petition soon, so be on the lookout for those emails! IASSIST: the International Association for Social Science Information Services and Technology is an organization for data users in the social sciences. It is a small group but international; the emphasis is on research and teaching, for library and information professionals and others. ASIS&T: the Association for Information Science and Technology is interdisciplinary and focused on technology. LinkedIn: check the SU library guide for some LinkedIn groups that deal with data issues.
  50. Thank you for your time and attention today. In a few days I will have these slides up on SlideShare and they will include hotlinks to the resources I’ve been describing. Don’t forget about the Data Caucus, and I hope you now have some starting points for learning more about Big Data. The term “big data” may be a buzzword, and the practices and principles involved with big data issues are still evolving, but our capacity for ever increasing volume, velocity, and variety of data is not going to disappear any time soon. Do we have time for a few questions?