It is time to learn about
     Big Data and Hadoop!

                    David Strom
                       UMSL
                  November 2012
                david@strom.com
                 Twitter: @dstrom
Download this here: http://slideshare.net/davidstrom
My publications
Editorial management positions:
So Big Data really is everywhere!
•   Planes, trains and automobiles
•   Fun with maps
•   Big and little ovens
•   Lessons learned from P&G
•   Noteworthy scientists
•   And of course sex!
StartupCompass.co
The reason behind
Three skills for big data CEOs
• Strategic data planning. Data is the new raw
  material for any business.
• Analytical skills. CEOs should be incredibly
  smart about asking the right questions.
• Technology skills. Embrace the technology
  and make it a key part of your CEO skill set.
More from Jeff Jonas



         vs.
Mason’s 5-step Big Data process
•   Obtain
•   Scrub
•   Explore
•   Model
•   Interpret
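The five steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the toy records, the function names, and the choice of a straight-line fit for the Model step are all hypothetical and are not taken from the talk.

```python
from statistics import mean

# Hypothetical raw records ("x,y" as text); some rows are deliberately dirty.
RAW = ["10,32", "20,51", "n/a,40", "30,70", "40,", "50,112"]


def obtain():
    """Obtain: in practice this would read from a file, an API, or a database."""
    return list(RAW)


def scrub(rows):
    """Scrub: keep only rows where both fields parse as numbers."""
    clean = []
    for row in rows:
        parts = row.split(",")
        try:
            clean.append((float(parts[0]), float(parts[1])))
        except (ValueError, IndexError):
            continue  # drop rows that cannot be parsed
    return clean


def explore(points):
    """Explore: basic summary statistics before any modeling."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {"count": len(points), "mean_x": mean(xs), "mean_y": mean(ys)}


def model(points):
    """Model: ordinary least-squares line y = slope * x + intercept."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum(
        (x - mx) ** 2 for x, _ in points
    )
    return slope, my - slope * mx


def interpret(slope, intercept):
    """Interpret: turn the fitted numbers into a plain-language statement."""
    return f"Each extra unit of x adds about {slope:.2f} to y (baseline {intercept:.2f})."


if __name__ == "__main__":
    points = scrub(obtain())
    print(explore(points))
    print(interpret(*model(points)))
```

Keeping each step as its own function makes it easy to swap in a real data source or a different model without touching the rest.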
Local Big Data Meetups
Thanks for your ideas!
• Copies of this presentation:
  http://slideshare.net/davidstrom
• My blog: http://strominator.com
• Follow me on Twitter: @dstrom
• Old school: david@strom.com





Editor's Notes

  1. V3. Use older Stampede deck for URL sources. Add Target pregnant teen, Twitter map, and Kibera at end.
  2. So let's talk this morning about how Big Data comes from all corners of the globe. While it may not be evil, there are some fascinating examples of where it is being used by companies today. I'll review some of these case studies, pulled from articles that my colleagues in the IT trade press and I have been writing over the past several months.
  3. Let's start with planes, trains and automobiles. I am sure many of you remember this movie with Steve Martin and John Candy and their various misadventures. Well, when it comes to Big Data the applications are a bit more positive.
  4. As you know, the US Department of Transportation collects monthly on-time statistics for each of the major airlines. But a better method comes from Jeffrey Breen of Cambridge Aviation Research. He put this together to show sentiment analysis using the immediacy and accessibility of Twitter. It provides a real-time glimpse into consumers' frustration, as you can see in this collection of tweets.
  5. Here is his flowchart of how he put this all together, using R and various other data collection tools to score the tweets, summarize the scores for each airline, and compare them with what the federal government provides.
  6. As you can see in this output, airlines such as JetBlue and Southwest do a better job of customer service than the older, more established carriers such as Delta or United. So maybe it is a good thing that Southwest is now our major carrier here at Lambert, even though many of us miss all those nonstop TWA flights to all those cities.
  7. Moving from planes to trucks, here is a story about how FedEx is collaborating with General Electric, which is providing the company with commercial charging stations for its electric vehicles. While FedEx can tell you where a particular package is located in its network, it has other Big Data dilemmas, including whether it makes sense to use electric power for its delivery trucks. FedEx got together with GE, the utility Con Edison and Columbia University researchers. The group is developing artificial intelligence programs to manage when and where the electric trucks charge in a 10-vehicle pilot project. “We’re collecting data on what is the load on the facility, what is the load of each truck, how many miles does that truck drive,” says Sondhi. “The algorithms from Columbia will identify that a truck is going to drive 16 miles tomorrow, so don’t give it 30 amps, give it 8 amps so we minimize the load on the entire facility.”
  8. Your car has become a data hub, with USB ports, an SD card reader, Bluetooth connections to your phone and even a mobile Wi-Fi hotspot. This next picture is a shot of the latest MyFord Touch dashboard that can be found in many of their cars. It provides all sorts of controls for the music you listen to and the climate inside your car, plus a connection to your phone to dial from your address book. Currently, Ford collects and aggregates data from the 4 million vehicles that use in-car sensing and remote app management software to create a virtuous cycle of information. The data allows Ford engineers to glean information on a range of issues, from how drivers are using their vehicles, to the driving environment, to electromagnetic forces affecting the vehicle, and feedback on other road conditions that could help them improve the quality, safety, fuel economy and emissions of the vehicle. Drivers willing to share how many miles they’ve traveled could get discounts between 10 and 40 percent in exchange for providing State Farm with a more accurate picture of their vehicle-use habits, which the insurer obtains by electronically accessing the Sync telematics system in the car.
  9. And finally we have trains, specifically the trains operated in Helsinki for its transit network. The organization uses Big Data tools to provide real-time information on where its trains are located, and you can watch this web page to see where they are and when they will arrive at your location. A number of other transit agencies are doing something similar.
  10. Twitter analyzed all of its tweets and organized the most-retweeted ones by state, split between Obama and Romney.
  11. Speaking of maps, there are thousands of big data mapping apps. Google Maps is certainly popular, but other sites, such as Crowdmap and OpenStreetMap, make it even easier. Here is a crowdsourced map of a neighborhood outside of Nairobi, Kenya, which until this effort was pretty much uncharted territory. Thanks to this citizen effort, the community put together a map that locates all sorts of resources.
  12. Let me put up the next slide showing you something a bit more palatable. David Smith put this map together from about 400 wineries in the Napa Valley area. Not only can you scroll and zoom the map, but clicking on one of the winery markers will tell you its address and whether an appointment is required for tastings. He worked with Barry Rowlingson, who used OpenStreetMap and his own R package to build this. And while 400 data points doesn't sound like a very big collection of data, what these guys did is noteworthy since they used a collection of APIs and open source code to produce the final product.
  13. Big Data is also being used inside IT organizations themselves, as this next example from Nationwide Insurance's IT department shows. My first job out of college was working for the IT department of a major insurance company in New York City, back in the punch card era, so this example is particularly poignant to me. Accurate estimates of IT work effort are critical for deciding where in technology a business should invest. Lacking experience with similar projects, the business is often at a loss for hard data. In their article, the IT researchers from the insurance company describe how they benefited from the power and convenience of R in the elicitation task, that is, in quantifying the uncertainty around IT project lifespans using probability distributions. They show how R's built-in functionality makes the elicitation task painless, while demonstrating how the methodology can be implemented in a user-friendly format. The power of R's probability toolbox allowed them to rapidly prototype an application that carried the basic concepts of elicitation into the IT project management space.
  14. Big Data can be used in a variety of corporate settings, and let's look at two examples of how it can control ovens big and small. This is a steel foundry. Here R is used to build accurate, understandable and automatable models for predicting the desired temperatures. R proved useful both for implementing the calculations and for being controlled externally within a process automation environment. The mathematical approach and the R code and framework program that were developed enable steel plant production engineers and technical staff to plan, carry out and adjust their tasks on the basis of highly stable and precise temperature preset values. Previously, operators added offsets and thresholds to the assumed heat target temperatures to be on the safe side, preferring to deliver the melt above the final casting temperature rather than below it, which added extra processing time and energy at each processing step. The new temperature prediction model instead allows for the optimization of process stability, throughput and material quality in the steel plant, especially in ladle treatment.
  15. But Big Data can be used in the smallest of ovens. Here we are looking at a hospital autoclave, which is used for sterilizing instruments. This is just one type of industrial equipment that Axeda is working with other companies to rig with sensors and cellular connections. Each of these devices has an IP address and an Internet connection, so its use can be monitored remotely and its supply, maintenance and management can all be optimized without having to go and look at the machines themselves. "Typically engineers would find logs through customer tickets and it would take months to find trends based on call center traffic." You can collect data about uptime, need for repairs, machine run completion and detergent levels into a smartphone app that hospital employees can use.
  16. Big Data can be used for all sorts of businesses, including helping startups. The site Startup Compass collects data from tens of thousands of startups around the world. It then creates best practices, recommendations and benchmarks to help entrepreneurs make better product and business decisions. First, startups can learn which key performance indicators actually matter; most startups don't even know which KPIs they should track or why they should track them. Second, they learn how their KPIs compare to other companies' KPIs so they will know if they're on the right track. See, for example, their customer acquisition costs. The third thing they learn is what actions they need to take. "We help businesses take the next steps."
  17. Big Data is also being used in some of the world's largest corporations. We are looking at Procter & Gamble's Business Sphere big data situation room in their Cincinnati HQ. A big data analyst drives these large screens that display data visualizations on sales, market share, ad spending and the like, so everyone in the meeting is seeing the same information based on 4 billion daily transactions of P&G products. P&G isn't after new data types; it still wants to share and analyze point-of-sale, inventory, ad spending, and shipment data. What's new is the higher frequency and speed at which P&G gets that data, and the finer granularity. Even with all this gear, P&G has only about two-thirds of the real-time data it needs.
  18. They are trying to address the reason behind a problem, the "why": was it a bad TV ad, out-of-stock shelves, or a competitor's new product or price cut? Right now, the P&G IT team is working on automating analysis of the why, so employees get alerts when key events like a supply chain snafu or rival product launch happen. Their data visualizations can answer questions such as: Is a sales dip in detergent in France because of one retailer, so that's where to focus? Is that retailer buying less only in France, or across Europe?
  19. Diego Saenz of Data Driven CEO: http://www.readwriteweb.com/cloud/2012/02/strata-2012-3-essential-skills.php
  20. Let's move on to some of the Big Data rock stars that I have interviewed and really enjoy hearing from. Jeff Jonas is a data scientist who now works for IBM. One of his jobs was designing the casino security systems in Las Vegas, where he currently lives. He worked for the surveillance intelligence group of several casinos and automated various manual processes, adding facial recognition software that was key to slowing down the MIT card counting group. "We built [another] system to immediately identify risk in real time so they could get these people out of the casino quickly." This software is still offered by IBM as its InfoSphere Identity Insight event processing and identity tracking technology.
  21. "If someone has three phone numbers - no big deal. On the other hand, if someone has five different dates of birth, that just doesn't seem quite right does it? That would be confusing. Why is this important? Well, if you are looking to analytics to make important decisions, wouldn't you want to know during the decision making process if there was related confusion ... before [any] action is taken."
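The kind of check described here can be sketched in a few lines: group records by identity and flag anyone who shows up with more than one date of birth. This is only a toy illustration of the idea and not how IBM's product works; the record layout, the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def conflicting_birth_dates(records, max_distinct=1):
    """Return {person_id: sorted birth dates} for ids with too many distinct dates.

    records is an iterable of (person_id, birth_date) pairs; this layout is
    a made-up example format, not a real product schema.
    """
    seen = defaultdict(set)
    for person_id, birth_date in records:
        seen[person_id].add(birth_date)
    return {
        pid: sorted(dates)
        for pid, dates in seen.items()
        if len(dates) > max_distinct
    }


records = [
    ("p1", "1970-01-01"),
    ("p1", "1970-01-01"),  # a repeat of the same date is not a conflict
    ("p2", "1980-05-17"),
    ("p2", "1981-05-17"),
    ("p2", "1980-07-15"),
]
flagged = conflicting_birth_dates(records)
print(flagged)
```

Raising max_distinct for an attribute like phone numbers, where several values are normal, keeps the "no big deal" cases from being flagged.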
  22. Another great Big Data scientist is Hilary Mason, who works for Bit.ly. Her analysis found that shortened links posted to Twitter have a mean half-life of 2.8 hours. Facebook boosts that to 3.2 hours, and direct sharing has a half-life of 3.4 hours. YouTube, however, beats them all hands down with a half-life of 7.4 hours. In other words, you might get a slight edge by posting to Facebook versus Twitter (if you don't do both), but the content matters most. Good (or controversial) stuff rises to the top and has a longer life. Uninteresting stuff sinks quickly.
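For anyone who wants to try this on their own links, here is a rough sketch of a half-life calculation. The talk does not spell out how Bit.ly measures half-life, so the definition used here (hours from the first click until half of all observed clicks have arrived) is an assumption, and the click times are invented.

```python
import math


def half_life_hours(click_times):
    """Hours from the first click until half of all observed clicks have arrived.

    click_times holds the time of each click in hours. The definition is an
    assumption for this sketch; Bit.ly's own method may differ.
    """
    if not click_times:
        raise ValueError("need at least one click")
    ordered = sorted(click_times)
    # Index of the click that brings the running total to half of all clicks.
    halfway_index = math.ceil(len(ordered) / 2) - 1
    return ordered[halfway_index] - ordered[0]


# Invented click times for one link, in hours since it was posted.
clicks = [0.0, 0.2, 0.5, 1.0, 1.5, 2.8, 4.0, 7.0, 12.0, 30.0]
print(half_life_hours(clicks))
```

A link with a long tail of late clicks barely moves this number, which is why it is a handy measure of how quickly attention fades.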
  23. You need to start thinking about how to make your data sets smaller. "Big Data usually refers to a data set that is too big to fit into your available memory, or too big to store on your own hard drive, or too big to fit into an Excel spreadsheet," says Mason. This is the "scrub" step. The smaller the dataset, the easier it is to manipulate.
  24. Mason and others have mentioned the now iconic Enron email archive, which has since passed into the public domain and is available from a number of online academic websites. A number of big data researchers use it to test their email algorithms. It is a collection of actual emails that forms the basis of many anti-spam programs these days, which is ironic given that the emails have outlasted the company where everyone once worked.
  25. Here we are looking at a scene along a very famous street in San Francisco, Haight Street. You might remember, if you are old enough, the wild times in the late 1960s and early 70s when the intersection of Haight and Ashbury was the center of counterculture and hippiedom. Today the area is still pretty much out there. Jesper Andersen gave this talk at Strata earlier this year and showed how to integrate basic public data from the city, street and mapping data from OpenStreetMap, real estate and rental listings data, and data from social services like Foursquare, Yelp and Instagram, and how to analyze photographs of streets from mapping services to create a holistic view of the street. Surprisingly, you'll find a lot of Swedish folks on the upper half of Haight Street. Not surprisingly for San Francisco, many people on Haight speak Spanish or Japanese. Tweet stream analysis found more negative sentiment on the lower part of the street, which corresponds with higher crime stats. I like this example because it shows what can be done to combine a variety of data sources to get more insight into where we all choose to live.
  26. MaxPoint Interactive used their technology to find (down to the neighborhood level) which areas of the country are most interested in BBQ foods. They analyzed billions of data points consumed by neighborhoods across the U.S., such as offline point-of-sale data, social media, videos, music, local Web pages, and online magazines and recipes related to barbecue foods. They found that there were two very distinct neighborhood types when it comes to barbecuing: those that prefer chicken and those that prefer pork. Seattle and Portland, Oregon, were the top two rated cities when it comes to BBQ.
  27. Here we are looking at a facsimile of an old newspaper – you remember newspapers, right? Ironically, it was called the New York Mirror. And while this and so many other newspapers have bitten the dust, one operation that is still in business is The Associated Press. If you are looking for large content repositories, you probably can't get much larger than the article archive of the Associated Press. Today they announced they have launched a content analysis tool that is used to search the millions of articles in their archives to create custom archive products for their customers. The project makes use of a solution from MarkLogic, a major Big Data enabler that is used by many different kinds of publishers for this type of purpose, such as Lexis/Nexis. The AP didn't start out by using the MarkLogic solution, but tried to implement a more traditional relational database structure only to run into problems. Their archives are in XML, which made it difficult to design the right kind of data structures. Plus, they didn't have a consistent metadata collection across the archives. The MarkLogic implementation took 16 weeks from start to finish and was the first time that the AP had made use of their services. It enables them to run complex Boolean searches across millions of articles in their content archive and get back precise returns in seconds or minutes instead of days or weeks. This much quicker response time is already transforming their B2B product offerings and helps them to manage searching for unstructured content in near real-time. Users can query for particular keywords, and the AP can use the search query traffic to see trending topics and deliver article collections to particular B2B customers. For example, they could create references on a particular subject or moment in time.
  28. The AP isn't the only journalistic Big Data effort going on. Here we are looking at the site for a company called ScraperWiki.com, which was started by Julian Todd and Aidan McGuire, two U.K.-based analysts who have long been involved in opening up government data to the public.
  29. This is showing data that was mined from the UN peacekeeping troop levels, as one example of what you can do with the ScraperWiki site. They have lots of public data sets that anyone can analyze, and they try to help journalists publish the information.
  30. Moving closer to home, we are looking at a St. Louis-based company called Appistry that is being used by a lot of different people for Big Data applications, including FedEx's logistics apps, Sprint's fraud detection services, and defense contractor Northrop Grumman. San Francisco-based Presidio Health used a variety of products to boost its cloud performance. "Presidio had to handle a 16 times increase in data volume in a year and replace some aging hardware," says its CTO Thomas Gregory. It was able to increase its computing power by 70% without increasing the costs of its IT equipment. "We didn't want a lot of capital expense, and we wanted an environment that was safe and could spread our risk around." Presidio uses a combination of Eclipse and Spring-based open source software and Appistry for handling its cloud services management. "Appistry has integration with Spring, it was easy to use and saved us months of effort to move our software into this environment," he said. "Plus we don't have to expose any of our services externally."
  31. http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
  32. Ok, on to sex. The dating site OkCupid looked through more than 4 million matches that they have made to find patterns in gay and straight sexual preferences. The median number of sexual partners for both men and women is six, exploding the myth that gays are more promiscuous. http://blog.okcupid.com/index.php/gay-sex-vs-straight-sex/
  33. Here are straight people who either have had or would like to have a same-sex experience in the continental U.S. and lower Canada. You can see some sharp geographic divides. Awesomely, the mountain West lives up to its Brokeback reputation, and Canada is orange nearly coast-to-coast. Even in the yellow and blue areas, you can see pockets of gay curiosity in interesting places: Austin, Madison, Asheville. Anywhere soy milk is served, basically. This is based on millions of responses. On average, active users have answered about 3000 questions; they've hidden the profiles of several thousand users they aren't interested in; they've voted for about 4000 profiles.
  34. Here is another example from the OkCupid data set. They asked their members which is bigger, the earth or the sun, and you can see how the results sorted by gender. Guys really are dumber, sad to say.
  35. One of my favorite Big Data hotbeds is Kaggle. They routinely host various big data contests, and this one, which concluded last month, was a way for Facebook to evaluate prospective employees. More than 400 people submitted entries.
  36. Still think big data is a lot of bull? Well, not according to the USDA. Among the 8 million Holstein dairy cows in the United States, there is exactly one bull that has been scientifically calculated to be the very best in the land. He goes by the name of Badger-Bluff Fanny Freddie, and he has 346 daughters who are on the books already. Their equations predicted from his DNA that he would be the best bull. A USDA research geneticist reviewed pedigree records and looked at things such as milk production and fat and protein content to optimize the breed. To give you an idea of how this industry has changed, in 1942 the average dairy cow produced less than 5,000 pounds of milk in its lifetime. Now, the average cow produces over 21,000 pounds of milk.
  37. These are automated milking machines from a Swedish company, DeLaval. They are one of the vendors responsible for this incredible increase in milk production. You can see the small computer control station on the right, and there is even an Internet connection so that farmers can monitor the milk collection remotely and run their herd from a laptop.
  38. Finally, I wanted to end today's presentation on a high note, as this article published earlier this summer in TechRepublic mentions the shift in IT jobs from coding to analysis. You folks are on the leading edge of that trend and so you should be feeling good about yourselves.
  39. Here are some of the local meetups if you want to learn more about Big Data.
  40. Thanks everyone for listening to me and good luck with your own Big Data explorations.