SlideShare a Scribd company logo
1 of 16
Wrestling Large Data Volumes to the Ground Daniel Austin Yahoo! Exceptional Performance March 24, 2011 Large-Scale Production Engineering Meetup
Agenda: A Boy and His Prototype ,[object Object],[object Object],[object Object]
Project X: Hi Performance, Low Budget ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Business Intelligence in Three Acts Data Collection Data Storage Data Analysis (out of scope)
Data Collection: Tools – Gomez Last Mile ,[object Object],[object Object],[object Object],[object Object],[object Object]
Data Collection: Tools – Talend Open Studio ,[object Object],[object Object],[object Object],[object Object],[object Object]
Data Storage: Tools – Infobright MySQL Engine ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Intro to Data Products ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to design ETL chains? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Diagram: ETL for Data Validation  2. Semantic Validation Step 1. Syntactic Validation Step
Simple 3NF Level 1 Data Model for HTTP ,[object Object],[object Object],[object Object],[object Object]
Level 1: The Boss Battle!
Some Best Practices ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Project X: What We Learned ,[object Object],[object Object],[object Object],[object Object],[object Object]
Endgame! Analysis in Near Real-Time
Thank You! Daniel Austin Yahoo! Exceptional Performance @daniel_b_austin [email_address] March 24, 2011 Large-Scale Production Engineering Meetup

More Related Content

What's hot

Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)TAUS - The Language Data Network
 
Matlab-Assignment-Projects
Matlab-Assignment-ProjectsMatlab-Assignment-Projects
Matlab-Assignment-ProjectsPhdtopiccom
 
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016MLconf
 
Space Codesign at TandemLaunch 20150414
Space Codesign at TandemLaunch 20150414Space Codesign at TandemLaunch 20150414
Space Codesign at TandemLaunch 20150414Space Codesign
 
Reproducibility with Unstructured Data in 3 steps
Reproducibility with Unstructured Data in 3 stepsReproducibility with Unstructured Data in 3 steps
Reproducibility with Unstructured Data in 3 stepsGleb Mezhanskiy
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsBigML, Inc
 
Matlab-Assignment-Help-India
Matlab-Assignment-Help-IndiaMatlab-Assignment-Help-India
Matlab-Assignment-Help-IndiaPhdtopiccom
 
Matlab-Master-Thesis-Projects
Matlab-Master-Thesis-ProjectsMatlab-Master-Thesis-Projects
Matlab-Master-Thesis-ProjectsPhdtopiccom
 

What's hot (8)

Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
 
Matlab-Assignment-Projects
Matlab-Assignment-ProjectsMatlab-Assignment-Projects
Matlab-Assignment-Projects
 
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
 
Space Codesign at TandemLaunch 20150414
Space Codesign at TandemLaunch 20150414Space Codesign at TandemLaunch 20150414
Space Codesign at TandemLaunch 20150414
 
Reproducibility with Unstructured Data in 3 steps
Reproducibility with Unstructured Data in 3 stepsReproducibility with Unstructured Data in 3 steps
Reproducibility with Unstructured Data in 3 steps
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
Matlab-Assignment-Help-India
Matlab-Assignment-Help-IndiaMatlab-Assignment-Help-India
Matlab-Assignment-Help-India
 
Matlab-Master-Thesis-Projects
Matlab-Master-Thesis-ProjectsMatlab-Master-Thesis-Projects
Matlab-Master-Thesis-Projects
 

Viewers also liked

Webnews The assistance and salvage tugs draw crowds
Webnews The assistance and salvage tugs draw crowdsWebnews The assistance and salvage tugs draw crowds
Webnews The assistance and salvage tugs draw crowdsBOURBON
 
Loi C-28 Anti-pourriel : Comment se préparer ?
Loi C-28 Anti-pourriel : Comment se préparer ?Loi C-28 Anti-pourriel : Comment se préparer ?
Loi C-28 Anti-pourriel : Comment se préparer ?Cyberimpact
 
I Will!
I Will!I Will!
I Will!pbarto
 
Présentation de la conférence Horizon 2012
Présentation de la conférence Horizon 2012Présentation de la conférence Horizon 2012
Présentation de la conférence Horizon 2012BOURBON
 

Viewers also liked (6)

Big Foot Conferenece. June 5. Opening Slideshow. The Carpathians_Harald Egerer
Big Foot Conferenece. June 5. Opening Slideshow. The Carpathians_Harald EgererBig Foot Conferenece. June 5. Opening Slideshow. The Carpathians_Harald Egerer
Big Foot Conferenece. June 5. Opening Slideshow. The Carpathians_Harald Egerer
 
Webnews The assistance and salvage tugs draw crowds
Webnews The assistance and salvage tugs draw crowdsWebnews The assistance and salvage tugs draw crowds
Webnews The assistance and salvage tugs draw crowds
 
Loi C-28 Anti-pourriel : Comment se préparer ?
Loi C-28 Anti-pourriel : Comment se préparer ?Loi C-28 Anti-pourriel : Comment se préparer ?
Loi C-28 Anti-pourriel : Comment se préparer ?
 
I Will!
I Will!I Will!
I Will!
 
Intergenerational Leaning Handbook_Thomas Fischer_MENON
Intergenerational Leaning Handbook_Thomas Fischer_MENONIntergenerational Leaning Handbook_Thomas Fischer_MENON
Intergenerational Leaning Handbook_Thomas Fischer_MENON
 
Présentation de la conférence Horizon 2012
Présentation de la conférence Horizon 2012Présentation de la conférence Horizon 2012
Présentation de la conférence Horizon 2012
 

Similar to Wrestling Large Data Volumes to the Ground

Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemasterAthemaster Co., Ltd.
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programsgreenwop
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caringsporst
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access LayerTodd Anglin
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Gavin Lin
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010ivan provalov
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...Yann Cluchey
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
ConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsPierre-Luc Maheu
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real ExperienceIhor Bobak
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...Hirofumi Iwasaki
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software PerformanceGibraltar Software
 
PHP Performance: Principles and tools
PHP Performance: Principles and toolsPHP Performance: Principles and tools
PHP Performance: Principles and tools10n Software, LLC
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_DataNAVER D2
 
Why Wordnik went non-relational
Why Wordnik went non-relationalWhy Wordnik went non-relational
Why Wordnik went non-relationalTony Tam
 

Similar to Wrestling Large Data Volumes to the Ground (20)

Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemaster
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programs
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access Layer
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
ConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web apps
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real Experience
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems ...
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software Performance
 
PHP Performance: Principles and tools
PHP Performance: Principles and toolsPHP Performance: Principles and tools
PHP Performance: Principles and tools
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
 
Why Wordnik went non-relational
Why Wordnik went non-relationalWhy Wordnik went non-relational
Why Wordnik went non-relational
 

More from Daniel Austin

Next generation web protocols
Next generation web protocolsNext generation web protocols
Next generation web protocolsDaniel Austin
 
Always Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of ThingsAlways Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of ThingsDaniel Austin
 
Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?Daniel Austin
 
Big Data and the Future of Money 2014
Big Data and the Future of Money 2014Big Data and the Future of Money 2014
Big Data and the Future of Money 2014Daniel Austin
 
Big data comes in small packages v1.2
Big data comes in small packages v1.2Big data comes in small packages v1.2
Big data comes in small packages v1.2Daniel Austin
 
Designing Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of ThingsDesigning Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of ThingsDaniel Austin
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Daniel Austin
 
HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1Daniel Austin
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQLDaniel Austin
 
Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013Daniel Austin
 
Perspectives on the Evolution of HTML
Perspectives on the Evolution of HTMLPerspectives on the Evolution of HTML
Perspectives on the Evolution of HTMLDaniel Austin
 
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...Daniel Austin
 
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...Daniel Austin
 
Reconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data SystemReconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data SystemDaniel Austin
 
Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)Daniel Austin
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Daniel Austin
 
Performance analysisclass
Performance analysisclassPerformance analysisclass
Performance analysisclassDaniel Austin
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydbDaniel Austin
 
The Fastest Possible Search Algorithm
The Fastest Possible Search AlgorithmThe Fastest Possible Search Algorithm
The Fastest Possible Search AlgorithmDaniel Austin
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLDaniel Austin
 

More from Daniel Austin (20)

Next generation web protocols
Next generation web protocolsNext generation web protocols
Next generation web protocols
 
Always Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of ThingsAlways Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of Things
 
Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?
 
Big Data and the Future of Money 2014
Big Data and the Future of Money 2014Big Data and the Future of Money 2014
Big Data and the Future of Money 2014
 
Big data comes in small packages v1.2
Big data comes in small packages v1.2Big data comes in small packages v1.2
Big data comes in small packages v1.2
 
Designing Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of ThingsDesigning Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of Things
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014
 
HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQL
 
Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013
 
Perspectives on the Evolution of HTML
Perspectives on the Evolution of HTMLPerspectives on the Evolution of HTML
Perspectives on the Evolution of HTML
 
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
 
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
 
Reconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data SystemReconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data System
 
Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
 
Performance analysisclass
Performance analysisclassPerformance analysisclass
Performance analysisclass
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
The Fastest Possible Search Algorithm
The Fastest Possible Search AlgorithmThe Fastest Possible Search Algorithm
The Fastest Possible Search Algorithm
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Wrestling Large Data Volumes to the Ground

  • 1. Wrestling Large Data Volumes to the Ground Daniel Austin Yahoo! Exceptional Performance March 24, 2011 Large-Scale Production Engineering Meetup
  • 2.
  • 3.
  • 4. Business Intelligence in Three Acts Data Collection Data Storage Data Analysis (out of scope)
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. Diagram: ETL for Data Validation 2. Semantic Validation Step 1. Syntactic Validation Step
  • 11.
  • 12. Level 1: The Boss Battle!
  • 13.
  • 14.
  • 15. Endgame! Analysis in Near Real-Time
  • 16. Thank You! Daniel Austin Yahoo! Exceptional Performance @daniel_b_austin [email_address] March 24, 2011 Large-Scale Production Engineering Meetup

Editor's Notes

  1. One other thing I forgot to mention: no budget
  2. Some very basic architectural building blocks
  3. Heresy! Normalized Dates and Times!