ECONOMY
BEHIND
BIG DATA
GREGORY CHOI
WWW.MBAPROGRAMMER.COM
BIG DATA VS DATA MINING
• Please don’t confuse them! They are not interchangeable.
• I’ll explain each one in turn.
• Ready to follow along?
BIG DATA
• It is misleading to think that the goal of “Big Data” is simply to handle large-scale data.
• The goal of Big Data is to achieve a “scale-out” structure
– REDUCING COST
SCALE-UP VS SCALE-OUT
[Diagram: four 10-core units inside one machine (scale-up) next to four separate 10-core machines (scale-out)]
Scale-up: increase computing power in one machine. EXPENSIVE
Scale-out: increase computing power by increasing the number of machines. CHEAP
SCALE-UP VS SCALE-OUT
• Think about it this way
• Which one is cheaper?
– Quad-core (4 Core) PC x 2
– Octa-core (8 Core) PC x 1
• Generally, two quad-core PCs are cheaper than one octa-core PC.
– This is because only a limited number of motherboard makers produce boards that support 8 cores
WHY DO WE CHOOSE SCALE-OUT
OVER SCALE-UP STRUCTURE?
THE DIFFICULTY OF SCALE-OUT
STRUCTURE
• How do we balance the CPU usage across the machines?
• If one machine fails, how do we manage it?
• How do we distribute the tasks to each machine?
• What if we add one more machine?
• Conclusion: DIFFICULT
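To make two of these questions concrete (distributing tasks and surviving a machine failure), here is a minimal sketch in plain Python. The task names, costs, and node names are hypothetical, and the greedy "least-loaded machine first" rule is only one simple strategy, not what any particular framework does.

```python
from collections import defaultdict

def assign_tasks(tasks, machines):
    """Greedy balancing: give each task (largest first) to the least-loaded machine."""
    load = {m: 0 for m in machines}
    placement = defaultdict(list)
    for name, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)  # machine with the lowest load so far
        placement[target].append(name)
        load[target] += cost
    return dict(placement), load

def handle_failure(tasks, machines, failed):
    """Simplest possible recovery: redistribute all tasks over the survivors."""
    survivors = [m for m in machines if m != failed]
    return assign_tasks(tasks, survivors)

# Hypothetical workload: task name -> CPU cost in arbitrary units
tasks = {"t1": 4, "t2": 3, "t3": 3, "t4": 2, "t5": 2, "t6": 2}
machines = ["node-a", "node-b", "node-c"]

placement, load = assign_tasks(tasks, machines)
print("balanced:", load)

placement_after, load_after = handle_failure(tasks, machines, "node-c")
print("after node-c fails:", load_after)
```

Even this toy version has to track load, pick a placement rule, and redo the work when a node disappears, which is why the slide calls scale-out difficult.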
CASE 01 – BUSINESS TRANSACTION
IN RDBMS
• Let’s assume that we need to handle a 1 TB database
• 100 million transactions per day
• You want to handle this without any failure
• You are a H/W architect. What would you do?
H/W ARCHITECTURE FOR THAT
[Diagram: a firewall / L2 switch in front of two clustered Unix servers (40 cores each) running a commercial DB, connected through a SAN switch to two mirrored 1 TB storage units]
ESTIMATED COST
[S/W]
DB License $5,000 / Core * 80 = $400,000
Clustering = $50,000
[H/W]
40 Core Unix x 2 = $1,000,000
Storage = $100,000
Switches = $30,000
Total: roughly $2,000,000
Disclaimer: These are not actual prices. They depend on your sales history. I wrote this based on my experience.
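As a quick check of the arithmetic, the line items on this slide can be summed in a few lines of Python. The dictionary keys are illustrative names; the figures are the slide's own estimates. The items listed add up to $1,580,000, which the slide rounds up to roughly $2,000,000.

```python
# Line items from the slide (the author's rough estimates, not real quotes)
rdbms_costs = {
    "db_license": 5_000 * 80,   # $5,000 per core, 80 cores
    "clustering": 50_000,
    "unix_servers": 1_000_000,  # two 40-core Unix servers
    "storage": 100_000,
    "switches": 30_000,
}

rdbms_total = sum(rdbms_costs.values())
print(f"${rdbms_total:,}")  # prints "$1,580,000"
```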
PROBLEM
Your CFO will probably tell you:
“That’s too expensive. Is there any way to reduce the cost?”
CASE 02 – BUSINESS TRANSACTION
IN HADOOP
[Diagram: eight HP DL380 x86 servers (10 cores each) behind a firewall (F/W) and a switch]
Suppose each server has a 500 GB SCSI HDD: 500 GB x 8 = 4 TB of raw capacity.
That is enough to support full mirroring of the 1 TB database.
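The capacity claim is easy to check. The sketch below assumes decimal units (1 TB = 1000 GB) and treats "full mirroring" as keeping 2 copies of every block; it also tries 3 copies, which is the default replication factor in HDFS. The function name is hypothetical.

```python
servers = 8
disk_gb = 500        # per-server disk size from the slide
database_gb = 1000   # the 1 TB database from Case 01 (decimal units assumed)

raw_gb = servers * disk_gb  # total raw capacity across the cluster

def usable_gb(raw, copies):
    """Capacity left for unique data when every block is stored `copies` times."""
    return raw // copies

for copies in (2, 3):
    usable = usable_gb(raw_gb, copies)
    fits = usable >= database_gb
    print(f"{copies} copies: {usable} GB usable, fits the 1 TB database: {fits}")
```

Eight 500 GB disks give 4 TB raw, so either replication setting leaves room for the 1 TB database.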
ESTIMATED COST
[S/W]
Hadoop is open source. It’s free!
[H/W]
10 Core x86 machine x 8 = $80,000
Switches = $30,000
Total: roughly $110,000, vs. $2,000,000 for Unix + commercial DB
Disclaimer: These are not actual prices. They depend on your sales history. I wrote this based on my experience.
SCALABILITY
• Let’s assume that we gain more customers and need more computing power.
[Unix + commercial DB]
We need to buy one more server, one more storage unit, and a 40-core commercial DB license
=> Prohibitively expensive
[Linux + Hadoop]
Just add one more x86 server. It’s not a big deal.
=> Cheap
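A rough sketch of what one expansion step costs in each setup, using the estimates from the two cost slides. The unit prices are an assumption: they are derived by splitting each slide total evenly across the units it covers.

```python
# Unit prices derived from the slide totals (assumption: even split per unit)
unix_server = 1_000_000 // 2   # one 40-core Unix server
storage_unit = 100_000 // 2    # one 1 TB storage unit
db_license_per_core = 5_000
x86_server = 80_000 // 8       # one 10-core x86 server

# One expansion step in each architecture
scale_up_step = unix_server + storage_unit + db_license_per_core * 40
scale_out_step = x86_server

print(f"Unix + commercial DB: ${scale_up_step:,} for 40 more cores")
print(f"Linux + Hadoop:       ${scale_out_step:,} for 10 more cores")
print(f"Per added core: ${scale_up_step // 40:,} vs ${scale_out_step // 10:,}")
```

Under these assumptions the scale-out step is cheaper per core, and it also comes in smaller increments, so capacity can track demand more closely.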
IS HADOOP ALMIGHTY?
• No
– You have to write Java code in lieu of SQL
– You have to code MapReduce jobs to retrieve the data or to transform it into the form that you want
– It doesn’t have sophisticated data management technology for optimized performance
– It is open source. Don’t expect any type of technical support
• It complements a commercial RDBMS; the two support each other.
– RDBMS: real-time transactions
– Big Data: business intelligence
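To illustrate the extra coding effort, here is the map, shuffle, and reduce pattern written in plain Python (not the Hadoop API) for a query that SQL would express as SELECT customer, SUM(amount) ... GROUP BY customer. The transaction rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer, amount)
transactions = [
    ("alice", 30), ("bob", 20), ("alice", 50), ("carol", 10), ("bob", 5),
]

def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for customer, amount in records:
        yield customer, amount

def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}

totals = reduce_phase(shuffle(map_phase(transactions)))
print(totals)
```

In a real cluster the map and reduce steps run on different machines and the framework handles the shuffle, but the programmer still has to express the query in this shape.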
DATA MINING
• Please don’t confuse it with Big Data!
Big Data: Where do we store the data?
Data Mining: How do we use the data?
DATA MINING
Suppose that you are in charge of issuing credit cards.
You want to know who is likely to default…
You already have records of past transactions.
Gender  Zipcode  Age  Education  Income   Default
Male    46637    33   Master     $90,000  No
Female  10001    21   GED        $50,000  Yes
…       …        …    …          …        …
DATA MINING
[Scatter plot of Income and Age, with marks at age 35 and income $30,000]
There is a certain group of people who are likely to default.
ALGORITHMS
• K-nearest neighbors algorithm
• Classification Tree
• Naïve Bayes
• Machine Learning
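As one example from this list, here is a minimal k-nearest neighbors sketch for the credit card default question, in plain Python. Only the first two rows come from the table on the earlier slide; the rest are hypothetical, as are the function name and the choice of k = 3. Features are left unscaled for brevity, which a real analysis would not do.

```python
import math
from collections import Counter

# (age, income in thousands of dollars, defaulted?)
# The first two rows are from the slide; the others are made up for illustration.
history = [
    (33, 90, "No"), (21, 50, "Yes"), (45, 120, "No"), (24, 28, "Yes"),
    (29, 25, "Yes"), (52, 75, "No"), (38, 60, "No"), (23, 22, "Yes"),
]

def knn_predict(age, income, k=3):
    """Predict by majority vote among the k closest past customers."""
    distances = sorted(
        (math.dist((age, income), (a, i)), label) for a, i, label in history
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(25, 27))    # a young, low-income applicant
print(knn_predict(48, 100))   # an older, high-income applicant
```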
DATA MINING
• From existing data, identify the relationship between the Y value and the X values.
– y = f(x1, x2, x3, …)
– It could be y = ax, y = log(x), or y = exp(x). We don’t know in advance, but a machine can try each candidate to find the best-fitting model to account for the Y value.
• AlphaGo, Google’s AI Go player, adopted this technology and advanced it to the ultimate level
– Y value: the probability of winning the game
– X values: the positions of the white and black stones
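The idea of "trying each candidate" can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (generated from y = 2 log(x)), each candidate is given one scale coefficient a, and the best model is the one with the smallest sum of squared errors.

```python
import math

# Synthetic data, generated from y = 2 * log(x) for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.log(x) for x in xs]

# Candidate shapes from the slide, each with one scale coefficient a
candidates = {
    "y = a*x": lambda x: x,
    "y = a*log(x)": math.log,
    "y = a*exp(x)": math.exp,
}

def fit(g):
    """Least-squares fit of y = a * g(x); returns (a, sum of squared errors)."""
    gs = [g(x) for x in xs]
    a = sum(gv * y for gv, y in zip(gs, ys)) / sum(gv * gv for gv in gs)
    sse = sum((y - a * gv) ** 2 for gv, y in zip(gs, ys))
    return a, sse

results = {name: fit(g) for name, g in candidates.items()}
best = min(results, key=lambda name: results[name][1])
print(best, results[best])
```

Real data mining tools search far larger families of models, but the loop is the same: fit each candidate, measure the error, keep the best.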
WHAT CAN WE DO WITH DATA
MINING?
• Combine it with Big Data technology
• Identify marketing opportunities
– Who has purchased our products?
• Detect financial fraud
– Which transactions look fraudulent?
• Artificial Intelligence
– Go, chess, other games
• Etc.
Q&A
• If you have any questions, feel free to ask me.
www.mbaprogrammer.com
