SlideShare a Scribd company logo
1 of 13
Faiz ul haque Zeya
MS CS University of Tulsa,OK,USA
Topics covered











1. Introduction
2.Bigdata: how big it is
3.Bigdata Technology.
4. Few examples of Big Data.
5. Airline reservation system
6. Google Translate.
7.Amazon recommendation.
8. Netflix recommendation.
9. Hadoop, Map reduce.
10. Q&A.
Introduction
 Large set of data. Site of peta byte, exa byte.
 Not stored relational.
 Massive scale computational.
 NO SQL queries.

 New technology like MAP REDUCE,HADOOP.
 Reason: Scalability and poor performance on large

scale.
How large it is

 Peta byte 10^15
 Zetta byte 10^21

Exabyte 10^ 18

 Google processed about 24 petabytes of data per day in

2009.[
 Yahoo stores 2 petabytes of data on behavior.
 eBay.com uses two data warehouses at 7.5 petabytes
and 40PB as well as a 40PB Hadoop cluster for search,
consumer recommendations, and merchandising.
BigData Technologies
 Relational database,SQL queries cannot handle such

amount of data.
 Therefore other technologies are requried
 MAP REDUCE parallel computation.
Few examples of Big Data
 Airplane reservation system.
 Google Translate.
 Netflix Movie recommendation
 Amazon Book recommendation
Airline reservation system
 Oren Etzioni of Washington ‘s venture capital based






startup Farecast.
It predicts based on past data whether airline prices
will go up or down.
Etzioni uses predictive model for that.
Microsoft purchase it for 110 M $
Make it part of BING search engine.
GOOGLE Translate
 Whole internet as training data.Corpus
 Google release Trillion word corpus in 2009.
 They accept messy data.
 Candide uses 3 million translated sentences.

 Google uses billions of pages from intenet.
Netflix Million $ prize
 Netflix announced to award 1M$ prize for the team

who improves the recommendation algorithm by 5%.
 They are movie recommender.
 Most of the sales are due to recommendations from
the site.
 Reason is that so many shows that the user don’t even
know.
Amazon’s recommendation
 Amazon uses item to item recommendation instead of

traditional collaborative recommendation.
 Item to item recommendation search for similar items
rather than similar users.
 This approach is scalable to large data set.
Map Reduce
 Q&A

More Related Content

Similar to Big data introduction

Machine Learning on Big Data with HADOOP
Machine Learning on Big Data with HADOOPMachine Learning on Big Data with HADOOP
Machine Learning on Big Data with HADOOPEPAM Systems
 
Ps Appliance Overview
Ps Appliance OverviewPs Appliance Overview
Ps Appliance Overviewtvstay
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Eli White
 
Improve your Tech Quotient
Improve your Tech QuotientImprove your Tech Quotient
Improve your Tech QuotientTarence DSouza
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data CenterAbe Usher
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15Edureka!
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Edureka!
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryAli Dasdan
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1gauravsc36
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 
Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewRaghu Kashyap
 

Similar to Big data introduction (20)

Not Your Mom's SEO
Not Your Mom's SEONot Your Mom's SEO
Not Your Mom's SEO
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
HadoopWorkshopJuly2014
HadoopWorkshopJuly2014HadoopWorkshopJuly2014
HadoopWorkshopJuly2014
 
Machine Learning on Big Data with HADOOP
Machine Learning on Big Data with HADOOPMachine Learning on Big Data with HADOOP
Machine Learning on Big Data with HADOOP
 
Ps Appliance Overview
Ps Appliance OverviewPs Appliance Overview
Ps Appliance Overview
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Big Data
Big DataBig Data
Big Data
 
Improve your Tech Quotient
Improve your Tech QuotientImprove your Tech Quotient
Improve your Tech Quotient
 
ANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEWANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEW
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data Center
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st century
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
L21 Big Data and Analytics
L21 Big Data and AnalyticsL21 Big Data and Analytics
L21 Big Data and Analytics
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course ppt
 
Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners View
 
Web 3.0 Emerging
Web 3.0 EmergingWeb 3.0 Emerging
Web 3.0 Emerging
 

More from Faiz Zeya

structureformal1.ppt
structureformal1.pptstructureformal1.ppt
structureformal1.pptFaiz Zeya
 
elementsofZ.pptx
elementsofZ.pptxelementsofZ.pptx
elementsofZ.pptxFaiz Zeya
 
lec3forma.pptx
lec3forma.pptxlec3forma.pptx
lec3forma.pptxFaiz Zeya
 
FOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFaiz Zeya
 
Word2vec-08032022-012238pm (1).pptx
Word2vec-08032022-012238pm (1).pptxWord2vec-08032022-012238pm (1).pptx
Word2vec-08032022-012238pm (1).pptxFaiz Zeya
 
Code completion using OpenAI APIs.pptx
Code completion using OpenAI APIs.pptxCode completion using OpenAI APIs.pptx
Code completion using OpenAI APIs.pptxFaiz Zeya
 
Types of machine learning.pptx
Types of machine learning.pptxTypes of machine learning.pptx
Types of machine learning.pptxFaiz Zeya
 
Linear algebraweek2
Linear algebraweek2Linear algebraweek2
Linear algebraweek2Faiz Zeya
 
Query expansion for search improvement by faizulhaque
Query expansion for search improvement by faizulhaque Query expansion for search improvement by faizulhaque
Query expansion for search improvement by faizulhaque Faiz Zeya
 

More from Faiz Zeya (9)

structureformal1.ppt
structureformal1.pptstructureformal1.ppt
structureformal1.ppt
 
elementsofZ.pptx
elementsofZ.pptxelementsofZ.pptx
elementsofZ.pptx
 
lec3forma.pptx
lec3forma.pptxlec3forma.pptx
lec3forma.pptx
 
FOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptx
 
Word2vec-08032022-012238pm (1).pptx
Word2vec-08032022-012238pm (1).pptxWord2vec-08032022-012238pm (1).pptx
Word2vec-08032022-012238pm (1).pptx
 
Code completion using OpenAI APIs.pptx
Code completion using OpenAI APIs.pptxCode completion using OpenAI APIs.pptx
Code completion using OpenAI APIs.pptx
 
Types of machine learning.pptx
Types of machine learning.pptxTypes of machine learning.pptx
Types of machine learning.pptx
 
Linear algebraweek2
Linear algebraweek2Linear algebraweek2
Linear algebraweek2
 
Query expansion for search improvement by faizulhaque
Query expansion for search improvement by faizulhaque Query expansion for search improvement by faizulhaque
Query expansion for search improvement by faizulhaque
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Big data introduction

  • 1. Faiz ul haque Zeya MS CS University of Tulsa,OK,USA
  • 2. Topics covered           1. Introduction 2.Bigdata: how big it is 3.Bigdata Technology. 4. Few examples of Big Data. 5. Airline reservation system 6. Google Translate. 7.Amazon recommendation. 8. Netflix recommendation. 9. Hadoop, Map reduce. 10. Q&A.
  • 3. Introduction  Large set of data. Site of peta byte, exa byte.  Not stored relational.  Massive scale computational.  NO SQL queries.  New technology like MAP REDUCE,HADOOP.  Reason: Scalability and poor performance on large scale.
  • 4. How large it is  Peta byte 10^15  Zetta byte 10^21 Exabyte 10^ 18  Google processed about 24 petabytes of data per day in 2009.[  Yahoo stores 2 petabytes of data on behavior.  eBay.com uses two data warehouses at 7.5 petabytes and 40PB as well as a 40PB Hadoop cluster for search, consumer recommendations, and merchandising.
  • 5.
  • 6. BigData Technologies  Relational database,SQL queries cannot handle such amount of data.  Therefore other technologies are requried  MAP REDUCE parallel computation.
  • 7. Few examples of Big Data  Airplane reservation system.  Google Translate.  Netflix Movie recommendation  Amazon Book recommendation
  • 8. Airline reservation system  Oren Etzioni of Washington ‘s venture capital based     startup Farecast. It predicts based on past data whether airline prices will go up or down. Etzioni uses predictive model for that. Microsoft purchase it for 110 M $ Make it part of BING search engine.
  • 9. GOOGLE Translate  Whole internet as training data.Corpus  Google release Trillion word corpus in 2009.  They accept messy data.  Candide uses 3 million translated sentences.  Google uses billions of pages from intenet.
  • 10. Netflix Million $ prize  Netflix announced to award 1M$ prize for the team who improves the recommendation algorithm by 5%.  They are movie recommender.  Most of the sales are due to recommendations from the site.  Reason is that so many shows that the user don’t even know.
  • 11. Amazon’s recommendation  Amazon uses item to item recommendation instead of traditional collaborative recommendation.  Item to item recommendation search for similar items rather than similar users.  This approach is scalable to large data set.