SlideShare uma empresa Scribd logo
1 de 19
About the BDW Meetup
• Big Data is a complex, rapidly changing
landscape
• We want to share our stories and hear
about yours
• Great networking opportunity for like
minded data nerds
• Opportunities to collaborate on exciting
projects
• Founded by Caserta Concepts, DW, BI &
Big Data Analytics Consulting
• Next BDW Meetup: September 23
About Caserta Concepts
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education
Industries Served
• President: Joe Caserta, industry thought leader,
consultant, educator and co-author, The Data
Warehouse ETL Toolkit (Wiley, 2004)
Founded in 2001
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data
Ecosystems
Focused
Expertise
Implementation Expertise & Offerings
Strategic Roadmap/
Assessment/Consulting
Database
BI/Visualization/
Analytics
Master Data Management
Big Data
Analytics
Storm
WHY NOSQL MATTERS… FOR ANALYTICS
Elliott Cordo
Principal Consultant, Caserta Concepts
NOSQL So what?
• NoSQL is one of the most exciting movements in BIG DATA.
• NoSQL is changing the way a lot of people think about
application development  especially analytic applications
• Not all data is efficiently stored or processed in a relational DB.
• High data volumes
• Data does not fit or require relational model
• We have new tools in our arsenal for processing, storing, and
analyzing data with these new challenges
But we love SQL
• Relational databases still have their place
• Flexible
• Rich Query Syntax
• THEY HAVE JOINS AND AGGREGATION!!
• The relational DB is great at being general purpose!
• You build nice normalized structure, establish the logical
relationships and then you can build any query you need for your
application.
• This has kept us happy in the Data Warehousing (and
app-dev) world for decades!
Scale and Performance
Performance:
• Relational databases have a lot of features, overhead that
we in many cases don’t need.
Scale out:
• Most relational databases scale vertically giving them
limits to how large they can get. Federation and Sharding
is an awkward manual process.
• Most NoSQL scale horizontally on commodity hardware
But what will we sacrifice?
• Query Features:
• NoSQL DB’s have fairly simple query languages. Limited or no support for
the following outside of map reduce:
• Joins
• Aggregation
Why? - NoSQL databases were born to be high performance.
Data is stored as it is to be used (tuned to a query) rather than modeled
around entities. So a sophisticated query language is not needed.
• BI and ETL tool support limited
So what about NoSQL for Analytics?
• NoSQL databases are generally not as flexible as relational databases
for ad-hoc questions.
• Secondary indexes provide some flexibility but lack of Joins generally
requires denormalization
• Materialized views: Joins and aggregates can be implemented via Map
Reduce  However materializing the world has it’s drawbacks!
• A different way of doing things:
• Client-side join: query “dimensions” get key  query “fact” on secondary index.
• Link walking  leverage metadata link between entities  Riak!
• Ad-hoc Map Reduce jobs
• Aggregate Navigation  navigate from aggregate entities of different grain
• Search!
Limited native
BI Tool Support!
NoSQL can be a great fit for analytic
applications!!
• High volumes/Low Latency analytic
environments
• Queries are largely known and can be
precomuted in-stream (via application itself or
Storm) or in batch using Map Reduce
• Sweet spot is very high volumes with relatively
static analytic requirements.
• Common Design Pattern:
• Compute aggregates and events in-line and
store to aggregate entities in NoSQL
• Write enriched detail records to NoSQL or
Hadoop for further processing
RDBMS NoSQL
Volume
QueryFlexibility
BIG ETL
• One of the promises of BIG DATA is being able to enrich,
process, enormous data volumes
• The processing engines:
• Storm  inline, real-time processing
• Hadoop  batch processing
• NoSQL can play an integral part of this architecture:
• Distributed Lookup Cache
• Shared State
• Queueing Mechanism
Storm Topology
Data
Sources
Relational
EDW
Analytic Data-stores
NoSQL databases are like snowflakes or
Smurfs..
They are all special and no two are alike!
Let’s review the main categories and determine their
general fit for analytic applications
• Platforms: Riak, Redis
• Buckets/Collections are equivalent to a table in a RDMS
• Primary unit of storage is a key value pair, the value can be
anything ranging from number to a JSON Document.
• Key value stores are super fast and simple
• Analytic Capabilities:
• Although many have very spartan feature sets some platforms like Riak
have analytic friendly Links, Tags Metadata, and powerful map reduce
capabilities!
• Generally writes and reads are ultra-fast  good candidate for a BIG
ETL component
Key Value
• Platforms: Cassandra, HBase
• Column families are the equivalent to a table in a RDMS
• Primary unit of storage is a column, they are stored contiguously
Skinny Rows: Most like relational database. Except columns are optional and not
stored if omitted:
Wide Rows: Rows can be billions of columns wide, used for time series,
relationships, analytics!
Analytic Capabilities: Widely used in analytic applications Typical analytic
design pattern is to use “Skinny Rows” for detail records, “Wide Rows” for
aggregates.
Columnar
Document
• Platforms: MongoDB, CouchDB
• Collections are the equivalent to a table in a RDMS
• Primary unit of storage is a document
{ “User" : ”Bobby”,
“Email”: bobby@db-lover.com,
“Channel”: “Web”,
“State”: “NJ” }
{ “User" : ”Susie”,
“Email”: “Susie@sql-enthusiast.com”,
“PreferredCategories: [
{ Category: “Fashion”,
CategoryAdded: “2012-01-01” },
{ Category: “Outdoor Equipment”,
CategoryAdded: “2013-01-01” } ],
“Channel”: In-Store }
Analytic Capabilities: most similar to relational in function but requires
denormalization. Secondary index support, map reduce. Mongo has a cool
new aggregation framework.
In the Real World: High Volume Sensor
Analytics
• Ingestion and analytics of Sensor Data
• 6 to 12 BILLION records being ingested daily (average
500k records per second at peek load)!
• Ingested data must be stored to disk and highly available
• Pre-defined aggregates and event monitors must be near
real-time
• Ad-hoc query capabilities required on historical data
One way to do it.. that worked
Storm Cluster
Sensor
Data
d3.js Analytics
Hadoop Cluster
Low Latency
Analytics
Atomic data
Aggregates
Event Monitors
• The Kafka messaging system is used for ingestion
• Storm is used for real-time ETL and outputs atomic data
and derived data needed for analytics
• Redis is used as a reference data lookup cache
• Real time analytics are produced from the aggregated
data.
• Higher latency ad-hoc analytics are done in Hadoop
using Pig and Hive
Kafka
Parting Thought
Polyglot Persistence – “where any decent sized
enterprise will have a variety of different data storage
technologies for different kinds of data. There will still
be large amounts of it managed in relational stores,
but increasingly we'll be first asking how we want to
manipulate the data and only then figuring out what
technology is the best bet for it.”
-- Martin Fowler
Contact
Elliott Cordo
Principal Consultant, Caserta
Concepts
P: (855) 755-2246 x267
E: elliott@casertaconcepts.com
info@casertaconcepts.com
1(855) 755-2246
www.casertaconcepts.com

Mais conteúdo relacionado

Mais procurados

Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...DATAVERSITY
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?DATAVERSITY
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessInfiniteGraph
 
Enabling digital transformation api ecosystems and data virtualization
Enabling digital transformation   api ecosystems and data virtualizationEnabling digital transformation   api ecosystems and data virtualization
Enabling digital transformation api ecosystems and data virtualizationDenodo
 
Top 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaTop 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaEmbarcadero Technologies
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Denodo
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereOvum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereZaloni
 
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerWebinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerZaloni
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsJames Serra
 
JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)plarsen67
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsQuontra Solutions
 
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataWebinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataZaloni
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 

Mais procurados (20)

Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-less
 
Enabling digital transformation api ecosystems and data virtualization
Enabling digital transformation   api ecosystems and data virtualizationEnabling digital transformation   api ecosystems and data virtualization
Enabling digital transformation api ecosystems and data virtualization
 
Top 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaTop 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron Huizenga
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereOvum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
 
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerWebinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI Solutions
 
JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataWebinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 

Destaque

What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkVolker Hirsch
 

Destaque (7)

What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of Work
 

Semelhante a Big Data Warehousing Meetup with Riak

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneStartup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneIdan Tohami
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureArthur Gimpel
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNicholas Goodman
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big dataPeter O'Kelly
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraVishal Puri
 
Business Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesBusiness Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesRadhoueneRouached
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLTushar Shende
 

Semelhante a Big Data Warehousing Meetup with Riak (20)

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
UNIT-2.pptx
UNIT-2.pptxUNIT-2.pptx
UNIT-2.pptx
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneStartup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
 
NoSQL
NoSQLNoSQL
NoSQL
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architectures
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big data
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital era
 
Business Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesBusiness Intelligence & NoSQL Databases
Business Intelligence & NoSQL Databases
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQL
 

Mais de Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

Mais de Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Último

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Big Data Warehousing Meetup with Riak

  • 1. About the BDW Meetup • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like minded data nerds • Opportunities to collaborate on exciting projects • Founded by Caserta Concepts, DW, BI & Big Data Analytics Consulting • Next BDW Meetup: September 23
  • 2. About Caserta Concepts • Financial Services • Healthcare / Insurance • Retail / eCommerce • Digital Media / Marketing • K-12 / Higher Education Industries Served • President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004) Founded in 2001 • Big Data Analytics • Data Warehousing • Business Intelligence • Strategic Data Ecosystems Focused Expertise
  • 3. Implementation Expertise & Offerings Strategic Roadmap/ Assessment/Consulting Database BI/Visualization/ Analytics Master Data Management Big Data Analytics Storm
  • 4. WHY NOSQL MATTERS… FOR ANALYTICS Elliott Cordo Principal Consultant, Caserta Concepts
  • 5. NOSQL So what? • NoSQL is one of the most exciting movements in BIG DATA. • NoSQL is changing the way a lot of people think about application development  especially analytic applications • Not all data is efficiently stored or processed in a relational DB. • High data volumes • Data does not fit or require relational model • We have new tools in our arsenal for processing, storing, and analyzing data with these new challenges
  • 6. But we love SQL • Relational databases still have their place • Flexible • Rich Query Syntax • THEY HAVE JOINS AND AGGREGATION!! • The relational DB is great at being general purpose! • You build nice normalized structure, establish the logical relationships and then you can build any query you need for your application. • This has kept us happy in the Data Warehousing (and app-dev) world for decades!
  • 7. Scale and Performance Performance: • Relational databases have a lot of features, overhead that we in many cases don’t need. Scale out: • Most relational databases scale vertically giving them limits to how large they can get. Federation and Sharding is an awkward manual process. • Most NoSQL scale horizontally on commodity hardware
  • 8. But what will we sacrifice? • Query Features: • NoSQL DB’s have fairly simple query languages. Limited or no support for the following outside of map reduce: • Joins • Aggregation Why? - NoSQL databases were born to be high performance. Data is stored as it is to be used (tuned to a query) rather than modeled around entities. So a sophisticated query language is not needed. • BI and ETL tool support limited
  • 9. So what about NoSQL for Analytics? • NoSQL databases are generally not as flexible as relational databases for ad-hoc questions. • Secondary indexes provide some flexibility but lack of Joins generally requires denormalization • Materialized views: Joins and aggregates can be implemented via Map Reduce  However materializing the world has it’s drawbacks! • A different way of doing things: • Client-side join: query “dimensions” get key  query “fact” on secondary index. • Link walking  leverage metadata link between entities  Riak! • Ad-hoc Map Reduce jobs • Aggregate Navigation  navigate from aggregate entities of different grain • Search! Limited native BI Tool Support!
  • 10. NoSQL can be a great fit for analytic applications!! • High volumes/Low Latency analytic environments • Queries are largely known and can be precomuted in-stream (via application itself or Storm) or in batch using Map Reduce • Sweet spot is very high volumes with relatively static analytic requirements. • Common Design Pattern: • Compute aggregates and events in-line and store to aggregate entities in NoSQL • Write enriched detail records to NoSQL or Hadoop for further processing RDBMS NoSQL Volume QueryFlexibility
  • 11. BIG ETL • One of the promises of BIG DATA is being able to enrich, process, enormous data volumes • The processing engines: • Storm  inline, real-time processing • Hadoop  batch processing • NoSQL can play an integral part of this architecture: • Distributed Lookup Cache • Shared State • Queueing Mechanism Storm Topology Data Sources Relational EDW Analytic Data-stores
  • 12. NoSQL databases are like snowflakes or Smurfs.. They are all special and no two are alike! Let’s review the main categories and determine their general fit for analytic applications
  • 13. • Platforms: Riak, Redis • Buckets/Collections are equivalent to a table in a RDMS • Primary unit of storage is a key value pair, the value can be anything ranging from number to a JSON Document. • Key value stores are super fast and simple • Analytic Capabilities: • Although many have very spartan feature sets some platforms like Riak have analytic friendly Links, Tags Metadata, and powerful map reduce capabilities! • Generally writes and reads are ultra-fast  good candidate for a BIG ETL component Key Value
  • 14. • Platforms: Cassandra, HBase • Column families are the equivalent to a table in a RDMS • Primary unit of storage is a column, they are stored contiguously Skinny Rows: Most like relational database. Except columns are optional and not stored if omitted: Wide Rows: Rows can be billions of columns wide, used for time series, relationships, analytics! Analytic Capabilities: Widely used in analytic applications Typical analytic design pattern is to use “Skinny Rows” for detail records, “Wide Rows” for aggregates. Columnar
  • 15. Document • Platforms: MongoDB, CouchDB • Collections are the equivalent to a table in a RDMS • Primary unit of storage is a document { “User" : ”Bobby”, “Email”: bobby@db-lover.com, “Channel”: “Web”, “State”: “NJ” } { “User" : ”Susie”, “Email”: “Susie@sql-enthusiast.com”, “PreferredCategories: [ { Category: “Fashion”, CategoryAdded: “2012-01-01” }, { Category: “Outdoor Equipment”, CategoryAdded: “2013-01-01” } ], “Channel”: In-Store } Analytic Capabilities: most similar to relational in function but requires denormalization. Secondary index support, map reduce. Mongo has a cool new aggregation framework.
  • 16. In the Real World: High Volume Sensor Analytics • Ingestion and analytics of Sensor Data • 6 to 12 BILLION records being ingested daily (average 500k records per second at peek load)! • Ingested data must be stored to disk and highly available • Pre-defined aggregates and event monitors must be near real-time • Ad-hoc query capabilities required on historical data
  • 17. One way to do it.. that worked Storm Cluster Sensor Data d3.js Analytics Hadoop Cluster Low Latency Analytics Atomic data Aggregates Event Monitors • The Kafka messaging system is used for ingestion • Storm is used for real-time ETL and outputs atomic data and derived data needed for analytics • Redis is used as a reference data lookup cache • Real time analytics are produced from the aggregated data. • Higher latency ad-hoc analytics are done in Hadoop using Pig and Hive Kafka
  • 18. Parting Thought Polyglot Persistence – “where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” -- Martin Fowler
  • 19. Contact Elliott Cordo Principal Consultant, Caserta Concepts P: (855) 755-2246 x267 E: elliott@casertaconcepts.com info@casertaconcepts.com 1(855) 755-2246 www.casertaconcepts.com