Database Shootout:
what's best for BI?
2
The New Data Warehousing
Source: Timo Elliott, SAP
3
Let's just buy Teradata...
Forrester Wave, as of April 2011
Gartner Group MQ, as of February 2012
4
Or not...
5
But...
⇨ Back to basics: BI & DWH
⇨ The Need for Speed
⇨ Database Architectures
⇨ #BigData & the Hadoop Hoopla
⇨ The forgotten power of OLAP & MDX
⇨ A Cloudy future?
⇨ Shootout: Evaluating alternatives
7
Business Intelligence (BI)
“Process of identifying, collecting, combining, analyzing, interpreting and communicating internal and external information to support decision making processes”
“Concepts and methods to improve business decision making by using fact-based support systems”
First definition of BI: 1958!
8
Business Intelligence is....
Doing useful stuff with data, in order to…
Support the
Decision Making
process
So why not simply use
Decision Support Systems?
9
How it all started in 1958...
⇨ Hans Peter Luhn (IBM) → A Business Intelligence System
The notion of intelligence is also defined here, in a
more general sense, as the “ability to apprehend
the interrelationships of presented facts in such a
way as to guide action towards a desired goal.”
Full text on Timo Elliott's blog:
http://timoelliott.com/blog/2007/11/the_real_pioneer_of_business_i.html
10
Luhn's Vision (1958!)
A Business Intelligence System
Abstract: An automatic system is being developed to disseminate
information to the various sections of any industrial, scientific or
government organization. This intelligence system will utilize data-
processing machines for auto-abstracting and auto-encoding of documents
and for creating interest profiles for each of the “action points” in an
organization. Both incoming and internally generated documents are
automatically abstracted, characterized by a word pattern, and sent
automatically to appropriate action points. This paper shows the flexibility
of such a system in identifying known information, in finding who needs to
know it and in disseminating it efficiently either in abstract form or as a
complete document.
11
BI is dead; long live Analytics?
12
Business Analytics
Evolution
13
The Evolution of Enterprise Business Intelligence
[Chart: business value (vertical axis) against time, 2000 to 2015 (horizontal axis)]
Enterprise Decision Management
Embracing all relevant data sources
BI injected into everyday business processes
Master Data Management
Advanced Data Mining / Analytics
Business Activity Monitoring
Common Information & Processes
Disconnected Silos of Information
Query/Reporting/Online Analytical Processing
Content Management/Data Warehousing
Search
Image courtesy Jim Fitzgerald, IBM research
14
Evolving Business Intelligence Platform Requirements
Image courtesy Teradata
15
Passive Monitoring: BI Starting Point
Source: Mark Madsen, Third Nature
16
Supporting Better Analysis: Common Next Step
Source: Mark Madsen, Third Nature
17
Active Monitoring
Source: Mark Madsen, Third Nature
18
Active Monitoring and Feedback
Source: Mark Madsen, Third Nature
19
Analysis
Source: Mark Madsen, Third Nature
20
Prediction (passive)
Source: Mark Madsen, Third Nature
21
Active Prediction
Source: Mark Madsen, Third Nature
22
Prescription and Enterprise Decision Management
Source: Mark Madsen, Third Nature
24
Watch out, the world is changing
750TB per week (compressed)
⇨ Sensor data
⇨ RFID
⇨ Sentiment Analysis
⇨ Text mining
⇨ Location data
25
Machines generate most data
26
Remember the Origins
• The general conception of a separate architecture for BI has been around longer, but this is the first formal relational architecture and definition published.
• One thing left out of most designs: the box labeled business process definitions.
“An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
27
2012: we're still doing this!
[Diagram: Sources (ERP, DBMS, CSV, files) → ETL process (staging area) → Data Warehouse (central DWH & data marts) → EUL, with ETL between each stage]
EUL = End User Layer, in case you were wondering ;-)
28
We’ve (also) accumulated over 20 years of changes
Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications
Data Consumers: Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, ETL, Data Mining, Applications
[Diagram components between sources and consumers: Warehouse Database, ETL, Marts, ODS, EDR, EII, Content Store, EAI, Stream processing, SQL, Service, API]
29
The assumption of the warehouse as a database is gone
Traditional tabular or structured data
Data at rest
Non-traditional data (logs, audio, documents)
Parallel programming platforms
Databases
Streaming DBs/engines
Message streams
Data in motion
Copyright Third Nature, Inc.
30
The Need for Speed
31
Why BI Projects Fail?
1. Query Performance Too Slow (BI Survey 9)
70% of DWHs experience performance-constrained issues of various types (Gartner DWH MQ 2010)
Poor Query Performance: No. 1 reason for replacing DWH (TDWI Best Practices)
32
Two dimensions of Speed
Companies wishing to maximize BI benefits should focus on
1) support quality
2) implementation time
3) query response time and
4) breadth of deployment,
in that order.
33
Minimize Implementation Time
Use 'RTF': vs
Off the shelf: Etc.
+ use Agile methods like Scrum or DSDM
+ look at Data Vault model & methodology
34
Minimize Query Response Time
Source: TDWI Next generation Data Warehouse Platforms,
By Philip Russom
Why replace a data warehouse solution?
35
Solving Performance Problems
Replace every single thing before the database?
Migrating to an analytic database is twice as likely as to another row-store database.
36
Applying “Laborware”: think twice...
⇨ Apply traditional optimization techniques:
⇨ Redesign solution
⇨ Add/optimize indexes
⇨ Horizontal partitioning
⇨ Add materialized views
⇨ Rewrite queries
⇨ Reorganize data
⇨ Offload old data
⇨ …
⇨ Costs will increase & recur!
“Hardware will change the basic assumptions of BI professionals about what they can do”
Richard Hackathorn
38
Numbers everyone should know
⇨ L1 cache reference 0.5 ns
⇨ Branch mispredict 5 ns
⇨ L2 cache reference 7 ns
⇨ Mutex lock/unlock 100 ns
⇨ Main memory reference 100 ns
⇨ Compress 1K bytes with Zippy 10,000 ns
⇨ Send 2K bytes over 1 Gbps network 20,000 ns
⇨ Read 1 MB sequentially from memory 250,000 ns
⇨ Round trip within same datacenter 500,000 ns
⇨ Disk seek 10,000,000 ns
⇨ Read 1 MB sequentially from network 10,000,000 ns
⇨ Read 1 MB sequentially from disk 30,000,000 ns
⇨ Send packet CA->Netherlands->CA 150,000,000 ns
Source: Jeff Dean, Google
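To see what these numbers mean for a warehouse query, the sketch below turns three of them into rough scan times. It is a back-of-the-envelope calculation only: the latencies are copied from the list above, while the function names and the 1 GB / 1,000-seek workloads are illustrative assumptions, not measurements.

```python
# Latencies in nanoseconds, copied from the list above (Jeff Dean, Google).
LATENCY_NS = {
    "memory_read_1mb": 250_000,
    "disk_seek": 10_000_000,
    "disk_read_1mb": 30_000_000,
}


def scan_seconds(megabytes: int, medium: str) -> float:
    """Time to read `megabytes` sequentially from 'memory' or 'disk'."""
    return megabytes * LATENCY_NS[f"{medium}_read_1mb"] / 1e9


def random_read_seconds(seeks: int) -> float:
    """Time spent purely on disk seeks for `seeks` random reads."""
    return seeks * LATENCY_NS["disk_seek"] / 1e9


if __name__ == "__main__":
    # Hypothetical workload: scan 1 GB (1024 MB), or do 1,000 random lookups.
    print("1 GB from memory:", scan_seconds(1024, "memory"), "s")
    print("1 GB from disk:  ", scan_seconds(1024, "disk"), "s")
    print("1,000 disk seeks:", random_read_seconds(1000), "s")
```

With these figures a sequential 1 GB scan is about 120 times slower from disk than from memory, and a thousand random disk reads cost about ten seconds in seeks alone.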
39
Trend: decreasing latency times
source:90’s versus 2010's
40
The Good News: HW Cost Decline
⇨ < 2000:
⇨ tune software
⇨ 2012
⇨ hardware cheap
⇨ Mustang Index ~0.7
⇨ Cost per Gigaflop:
⇨ 1984: $ 15,000,000
⇨ 1997: $ 30,000
⇨ 2003: $ 82
⇨ 2011: $ 1.80
41
Memory Cost Decline
We're still waiting!
⇨ 32 GB, May 2010:
⇨ 8 * 4 GB = $1,200
⇨ 4 * 8 GB = $2,000
⇨ 2 * 16 GB = $2,400
⇨ 32 GB, May 2012:
⇨ 8 * 4 GB = $280
⇨ 4 * 8 GB = $350
⇨ 2 * 16 GB = $500
42
Intel keeps pushing the limits
43
CPU: Moore's Law in action?
Same price!
Intel Xeon
E5-2680
635
44
Storage costs keep going down
Year  Size in GB  US $/GB
1955  0.012       6,382,933.00
1960  0.01        3,686,400.00
1970  0.1         265,933.00
1980  2.5         16,000.00
1990  0.34        5,406.00
2000  40          7.17
2010  2,000       0.05
2012  3,000       0.07
“By the end of 2012, drives will have 100 times more capacity at 1/100 of the cost per GB compared to 2000”
45
Your next data warehouse?
The next-generation SDXC memory card specification,
released to members in April, 2009, dramatically
improves consumers' digital lifestyles by increasing
storage capacity from more than 32 GB up to 2 TB and
increasing bus interface speed up to 104 MB per second
in 2009 with a road map to 300 MB per second.
46
Architecture basics
47
Choosing the right architecture is a trade off
BI Architecture
Flexibility
Agility
Real-Time
Complexity
Integration
Auditability
Data Volume
Advanced Analysis
Performance
Low cost
Skills & Standards
48
What this means…
• No ‘one size fits all’ solution
• Easy to over or under provision
• There are always exceptions
• Clueless analysts
• Tech savvy managers (even C-level)
• Excel Junkies
49
Comparing Solutions
⇨ By Technology?
⇨ Columns, MPP, In-Memory, etc.
⇨ By Storage Type?
⇨ Files, tables, OLAP
⇨ By Deployment type?
⇨ Appliance, Cloud, SaaS
⇨ By Features/API?
⇨ SQL, MapReduce, R, etc.
⇨ By Speed?
⇨ TPC-H, Airline DB, Custom
⇨ By Licence type/price?
⇨ CPU, data size, memory usage
50
SQL DB's for BI/Analytics
51
Major BI Vendors have SQL DB's
⇨ IBM: DB2, Netezza
⇨ Microsoft: SQL Server
⇨ Oracle: MySQL, Oracle DB, Exalytics (TimesTen)
⇨ SAP: Sybase (IQ) & SAP Hana
..and all others are DB agnostic: Microstrategy, SAS, Tableau,
Tibco Spotfire, LogiXML, Pentaho, Jaspersoft, etc.
52
Analytical DB's: What’s Different?
⇨ MPP: Massively Parallel Processing
⇨ Column based data organization
⇨ Data compression
⇨ Read optimization
⇨ In memory operation
⇨ Different disk configuration options
⇨ In DB analytics
⇨ Data mining
⇨ Statistics
53
Architecture: SMP vs MPP
Different storage approaches:
● Shared Disk (clustering)
● Shared Nothing
Most DWH appliance & new software vendors
use Shared Nothing, MPP, Scale Out architecture
54
Scaling Up and Out
Typical Workloads
55
Blurring lines
⇨ 1 machine, up to:
⇨ 8 CPU/80 cores
⇨ 2 TB Ram
⇨ 24 SAS/SSD
⇨24*512 GB SSD = 12 TB
⇨24*900 GB SAS = 21 TB
“In terms of raw speed, nothing beats DASD”
Supermicro SuperServer 5086B-TRF
56
Beware of SPOF's (or: why clustering?)
57
Rows vs Columns
⇨ Nothing new about column storage: Taxir, 1969
⇨ Conceptual (and simplified) view:
EmpID  Lastname  Firstname  Salary
1      Smith     Joe        40000
2      Jones     Mary       50000
3      Johnson   Cathy      44000
Rows: 1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
Columns: 1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000
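The two layouts of the employee table can be reproduced in a few lines. This is a conceptual sketch, as simplified as the slide itself: the function names are made up for illustration, and real engines store binary blocks rather than comma-separated strings.

```python
# The employee table from the slide.
HEADER = ("EmpID", "Lastname", "Firstname", "Salary")
ROWS = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def row_layout(rows):
    """Serialize record by record: all fields of row 1, then row 2, ..."""
    return ";".join(",".join(str(v) for v in row) for row in rows)


def column_layout(rows):
    """Serialize attribute by attribute: all EmpIDs, then all Lastnames, ..."""
    return ";".join(",".join(str(v) for v in col) for col in zip(*rows))


def to_columns(rows):
    """Keep each attribute in its own list, as a column store would."""
    return {name: list(col) for name, col in zip(HEADER, zip(*rows))}


if __name__ == "__main__":
    print(row_layout(ROWS))
    print(column_layout(ROWS))
    # A typical BI aggregate touches one column, not whole records.
    print(sum(to_columns(ROWS)["Salary"]))
```

The last line is the point of the column layout: summing salaries reads one list and never touches names or IDs, which is the access pattern of most BI queries.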
58
Rows vs Columns (2)
Source: Paraccel®
⇨ Columnar Challenges:
⇨ (fast) Loading
⇨ Updates
59
Rows AND Columns!
⇨ Many vendors offer hybrid row/column options
⇨ Beware of differences between storage & indexing
⇨ Examples:
⇨ Teradata Aster
⇨ Greenplum
⇨ HP Vertica
⇨ Vectorwise
⇨ Microsoft
⇨ Oracle
60
Data compression
⇨ Compression 50-90%
⇨ Some vendors claim > 95%
⇨ DB size < raw data size
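One reason column stores compress this well is that a single column often holds long runs of identical values, especially after sorting. The sketch below uses run-length encoding as an example. The slide does not name a technique, so RLE is an assumption here, chosen because it is one commonly used column encoding; the sample column is made up.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical sorted 'country' column of a fact table.
    column = ["NL"] * 5 + ["US"] * 3 + ["DE"] * 2
    encoded = rle_encode(column)
    print(encoded)
    print(len(column), "values stored as", len(encoded), "runs")
```

In a row layout the same values would be interleaved with other fields, so the runs, and with them most of the saving, disappear.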
61
Read Optimization
⇨ OLTP: 90% write, 10% read
⇨ DWH: 10% write, 90% read
⇨ Common solution:
⇨ 'buffer' area (row oriented)
⇨ background process updates/inserts to columns
⇨ Bulk loading = direct
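The buffer approach can be sketched as a small class: single-row inserts land in a row-oriented buffer, a merge step (which a real engine would run in the background) moves them into the columns, and bulk loads bypass the buffer. The class and method names are hypothetical and do not correspond to any vendor's API.

```python
class BufferedColumnStore:
    """Toy model of a read-optimized store with a row-oriented write buffer."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}  # read-optimized
        self.write_buffer = []                               # row-oriented

    def insert(self, row):
        """Trickle insert: cheap append to the row buffer."""
        self.write_buffer.append(row)

    def merge(self):
        """What the background process does: move buffered rows to columns."""
        for row in self.write_buffer:
            for name, column in self.columns.items():
                column.append(row[name])
        self.write_buffer.clear()

    def bulk_load(self, rows):
        """Bulk loads write straight to the columns, skipping the buffer."""
        for row in rows:
            for name, column in self.columns.items():
                column.append(row[name])

    def scan(self, name):
        """Read one column, including rows still waiting in the buffer."""
        return self.columns[name] + [row[name] for row in self.write_buffer]


if __name__ == "__main__":
    store = BufferedColumnStore(["id", "amount"])
    store.bulk_load([{"id": 1, "amount": 10}])
    store.insert({"id": 2, "amount": 20})
    print(store.scan("amount"))  # buffered row is visible before the merge
    store.merge()
    print(store.columns["amount"])
```

The design choice is the usual one for read-mostly systems: writes stay cheap because they never reorganize column data directly, and reads pay only a small cost for checking the buffer.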
62
Memory Usage
⇨ Different approaches
⇨ Query (result) caching
⇨ Dynamic allocation
⇨ Explicit loading (e.g. dim)
⇨ Some products still disk focused! (e.g. GP)
⇨ VectorWise: RAM as secondary (!) storage
63
Disk Usage/Configuration
⇨ 1. Disk/partition per CPU (core), e.g. Greenplum
⇨ 2. Software 'Raid' by DBMS, e.g. Paraccel
ADB
64
Disk Usage/Configuration (2)
⇨ 3. Use standard devices, e.g. VectorWise, Vertica
ADB
⇨ 3 is easiest to set up (but some ADB's auto config)
⇨ Speed depends on other things too
65
RAIS instead of RAID
⇨ 1. Failover Node (Hot Standby) ⇨ 2. Data Distribution
A
B
B
A
C
C
etc.
Hot Standby
66
Mixed Storage Solution
⇨ SAN = SOR
⇨ Nodes = Persistent subset
⇨ 'Blended Scan'
⇨ Patent Pending
Source: Paraccel®
67
ILM: Software meets Hardware
⇨ Different approaches
⇨ Usage (e.g. TeraData)
⇨ Age (e.g. Oracle)
⇨ Partitions (e.g. Sybase IQ)
Burning
Hot
Warm
Cool
Cold
SAS
SATA
www.etre.com
68
Beware of (Interconnect) Bottlenecks
[Diagram: Undersized Virtual DWH. A fast & expensive SAN and fast & expensive server(s), linked at 1Gb/s; the 1Gb/s link is shared by the DWH, ERP, MAIL and CRM VMs]
You want:
* Dedicated hardware
* Infiniband QDR 12x: 96 Gb/s, or
* 100 Gb Ethernet: 100Gb/s
OR: Local storage (MPP w DASD)
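A quick calculation shows why the interconnect matters. The sketch assumes ideal throughput (no protocol overhead), decimal units (1 TB = 10^12 bytes) and an equal split of a shared link between VMs; the 1 TB scan size and the function name are illustrative, while the link speeds are the ones on the slide.

```python
def transfer_seconds(terabytes, gigabits_per_second, sharing_vms=1):
    """Ideal time to move `terabytes` over a link shared equally by VMs."""
    bits = terabytes * 1e12 * 8
    effective_bits_per_second = gigabits_per_second * 1e9 / sharing_vms
    return bits / effective_bits_per_second


if __name__ == "__main__":
    for label, gbps, vms in [
        ("1 Gb/s dedicated", 1, 1),
        ("1 Gb/s shared by 4 VMs", 1, 4),
        ("Infiniband QDR 12x (96 Gb/s)", 96, 1),
        ("100 Gb Ethernet", 100, 1),
    ]:
        seconds = transfer_seconds(1, gbps, vms)
        print(f"{label}: {seconds:,.0f} s to move 1 TB")
```

Under these assumptions a 1 TB scan takes over two hours on a dedicated 1 Gb/s link and close to nine hours when four VMs share it, against roughly a minute and a half on the faster interconnects.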
69
In Database Analytics
Source: Fuzzy Logix
70
Everybody Loves R
71
Inevitable In DB analytics
⇨ Fuzzy Logix
⇨ IBM/Netezza
⇨ IBM/Informix
⇨ SAP/Sybase
⇨ Paraccel
⇨ Microsoft
⇨ Asterdata/Teradata
⇨ SAS
⇨ IBM/Netezza
⇨ EMC Greenplum
⇨ TeraData
⇨ R
⇨ IBM/Netezza
⇨ AsterData/TeraData
⇨ Oracle
⇨ Greenplum
⇨ SAS
72
The Hadoop Hoopla
73
#BigData, the new frontier
Yes, these (and more) are all Open Source!
74
#BigData?
Largest data set analyzed
KDNuggets poll 2012
75
Putting #BigData into perspective*
Median
DWH size
*Idea by Glen Rabie, YellowFin BI
76
*THIS* is Hadoop:
a Distributed File System
Data Distribution Data Retrieval using M/R
77
#BigData & NoSQL: No Standards
“Each NoSQL DB has its own strengths/weaknesses;
most are not (directly) suited for typical BI workloads”
78
The Great Divide(s)
⇨ Pure SQL DB's
⇨ All OS Column Stores
⇨ Paraccel, Kognitio
⇨ In Database Analytics
⇨ Map/Reduce (many)
⇨ R (GreenPlum)
⇨ SAS (TeraData)
⇨ Everything (Netezza iClass)
⇨ NoSQL Databases
⇨ Hive (Hadoop)
⇨ MongoDB
⇨ CouchDB
⇨ etc.
Worlds Colliding
⇨ MapReduce (NoSQL)
⇨ Programming model
⇨ No DBMS/SQL required
⇨ Schema free
⇨ Exclusively <key,value>
⇨ Java, Python, C++, C, etc.
⇨ Text/data mining
⇨ Eventually Consistent
⇨ SQL (RDBMS)
⇨ Query language
⇨ DBMS required
⇨ Fixed schema
⇨ Complex structure
⇨ SQL
⇨ Not good at Text
⇨ ACID compliant
80
What is MapReduce?
⇨ M/R is now patented by Google (Patent #7,650,331)
⇨ Used in many ADB's
⇨ Hadoop, CouchDB
⇨ AsterData
⇨ GreenPlum
⇨ Vertica
⇨ ...
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key
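The definition above maps directly onto the classic word-count example. The sketch below runs everything in a single process to show the programming model only. It is not how Hadoop or any of the listed products distribute the work, and the driver function `map_reduce` is a hypothetical stand-in for the framework.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: document name, value: text -> intermediate (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Merge all intermediate values that share the same key."""
    return key, sum(values)


def map_reduce(inputs, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for intermediate_key, intermediate_value in mapper(key, value):
            groups[intermediate_key].append(intermediate_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    documents = [("d1", "big data big hype"), ("d2", "big warehouse")]
    print(map_reduce(documents, map_fn, reduce_fn))
```

In a real deployment the map calls run in parallel on the nodes that hold the data, and the grouping step becomes a network shuffle; the user still writes only the two functions.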
81
MapReduce Explained
Source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/
MR info: http://www.mapreduce.org (by Aster Data)
82
M/R & SQL: How to get there
⇨ SQL on top of M/R
⇨ e.g. Hive-Hadoop
⇨ M/R invoking SQL
⇨ e.g. Greenplum
⇨ SQL invoking M/R
⇨ e.g. TeraData/Aster Data
⇨ Most ADB vendors implementing/investigating M/R
⇨ e.g. Vertica (Hadoop integration), Oracle, Netezza, etc.
83
84
85
86
87
88
89
90
91
92
(R/H/M)OLAP
⇨ OnLine Analytical Processing
⇨ Analyse multidimensional data
⇨ Basic architecture:
Data Warehouse
MDX
OLAP engine/server
Analysis front end
93
Stars and Cubes
⇨ Star schema
⇨ Dimension & fact tables
⇨ Best foundation for cubes
⇨ Cubes (logical/physical)
⇨ Dimensions
⇨Hierarchies
⇨ Levels
⇨Attributes
⇨ Measures
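A minimal star schema makes these terms concrete: one fact table holding the measure, surrounded by dimension tables whose columns act as hierarchy levels. The sketch uses Python's built-in sqlite3 module; the table names, column names and sample rows are all invented for illustration.

```python
import sqlite3


def build_star(conn):
    """Create two dimension tables and one fact table with sample rows."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                                  family TEXT, name TEXT);
        CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY,
                                  year INTEGER, month INTEGER);
        CREATE TABLE fact_sales  (product_id INTEGER, time_id INTEGER,
                                  unit_sales INTEGER);
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Drink", "Cola"), (2, "Food", "Bread"),
                      (3, "Food", "Cheese")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 1997, 1), (2, 1997, 2), (3, 1998, 1)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10), (2, 1, 5), (3, 2, 7), (1, 3, 4)])


def unit_sales_by_family(conn, year):
    """Aggregate the measure at the 'family' level for one year."""
    return conn.execute("""
        SELECT p.family, SUM(f.unit_sales)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_time t    ON t.time_id = f.time_id
        WHERE t.year = ?
        GROUP BY p.family
        ORDER BY p.family
    """, (year,)).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star(connection)
    print(unit_sales_by_family(connection, 1997))
```

Here `family` and `name` are two levels of a product hierarchy, `year` and `month` two levels of a time hierarchy, and `unit_sales` is the measure; a cube definition layers those names over exactly this kind of schema.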
94
The power of OLAP
Aggregates, positional calculations (prior vs current), range
calculations (ytd, mtd), level calculations (child to parent
contribution)
95
MDX
⇨ Short for 'Multi Dimensional Expressions'
⇨ ~ SQL for OLAP:
⇨ SELECT
    {set for column headers} ON COLUMNS,
    {set for row headers} ON ROWS
  FROM [Cube Name]
  WHERE {set for filtering}
⇨ SELECT
    {[Measures].[Unit Sales]} ON COLUMNS,
    {[Product].[Drink], [Product].[Food]} ON ROWS
  FROM [Sales]
  WHERE [Time].[1997]
96
The Power of MDX
Positional: ([Measures].[Profit], [Time].PrevMember)
Range: Aggregate(YTD(), [Measures].[Profit])
“MDX is far more powerful than SQL for the typical BI questions”
97
Adding OLAP to the mix
⇨ Virtual Cubes, e.g.
⇨ Kognitio Pablo
⇨ Pentaho Mondrian
⇨ Microstrategy
⇨ Physical Cubes, e.g.
⇨ Microsoft Analysis Services
⇨ Oracle Essbase
⇨ Jedox Palo
Physical cubes allow 'write back': what-if, forecasting, budgeting & planning
'New' kid on the block: SAP HANA
99
The promises of the Cloud
⇨ “Utility computing”
⇨ Unlimited capacity
⇨ Pay as you go/by the sip
⇨ Lower costs
⇨ Always up to date
⇨ Invisible OS
⇨ Security
⇨ Safety
100
Cloud still getting Hotter
Source: IBM CIO Survey 2011
101
Types of Cloud Solutions
Virtualization
IaaS (Infrastructure)
PaaS (Platform)
SaaS (Software)
ValueAdded
102
Cloud Cost Components
Storage
Bandwidth
SLA/Service
CPU power
Memory
Data transfer
Requests
103
BI&DWH aaS Scenarios
104
The trouble with Cloud DWH
⇨ DWH aaS vendors:
⇨ e.g. 1010Data, Kognitio, Vertica, EMC/Greenplum
⇨ more will follow
105
What about No Database at all?
Rick F. van der Lans
Key element: abstraction (de-coupling)
106
Data Virtualization concept
Virtual DB
SQL SOAP REST FILE WS-*
Information Consumers
107
© 2011 Composite Software, Inc. / Composite Proprietary
Example: Composite 6
[Architecture diagram of Composite Information Server. Labels: Development Environment, Runtime Server Environment, Management Environment; Studio, Manager, Monitor, Discovery, Active Cluster, Adapters, Performance Plus; Query Engine (Cost-based Optimizer, Rules-based Optimizer, Federation Engine); Metadata Repository; Security; Advanced Functions (Quality, Governance, Caching); Views, SQLScript (Database Centric); XQuery, Java, WSDL, SCA (Services Centric); Front-end Applications; interfaces: Web Services (HTTP, REST, SOAP, JSON, XQuery), SQL (ODBC, JDBC, ADO.NET), Messaging (JMS), Java (POJO), Application APIs, MF Adapter, URI]
Sources: Applications, Big Data Stores, Excel, Flat Files, Mainframes, Messages, OLAP Cubes, RDBMS, Web Services, XML Documents
108
Meet
109
A Unified Data Hub
110
Virtual vs Physical trade offs
Source: Mark Madsen
111
The Shootout!
⇨ Things to ask your (potential) vendor
⇨ References
⇨ Assist in a paid POC
⇨ License model & unit of cost: CPU, Core, Server, (raw) Data volume, Memory used
⇨ Free dev/test editions (only pay for production use)
⇨ Support options (updates only, mail/phone support, etc.)
⇨ If migrating: trade-in discount
⇨ Opt out/de-integration options
112
Does your DB cover the Basics?
⇨ Full SQL 2003 support?
⇨ Easy backup/restore features?
⇨ Scaling up or out?
⇨ Failover & persistency?
⇨ External (management) Tool integration?
113
Which deployment types?
⇨ On Premise ⇨ SaaS/Cloud
Software only
Appliance
Vendor/ISP
Customer
114
Size/Workload/Complexity?
Source: Bloor Group
Source: Third Nature
Most organizations are here
115
What's the question?
Source:
116
Analytical Power?
Source: SAP (Sybase IQ 15.4)
117
Beware of Benchmarks!
⇨ Differences in
⇨ # threads
⇨ # cores
⇨ # disks
⇨ # nodes
⇨ CPU generation/speed
1. Always use P.O.C. on your own data & query workload
2. Don't trust the MQ's
⇨ Ongoing Market Consolidation
⇨ More additional/alternative storage engines
⇨ Hybrid Row/Column solutions
⇨ Every db will get In DB analytics
⇨ Every db will get Hadoop/MR extensions
⇨ Everything in-memory
119
So what's the best database for BI?
120
Web: www.tholis.com
Email: jos<at>tholis.com
Phone: +31-(0)6-51169606
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
IRC: _grumpy
Jos van Dongen
In BI since 1991
Principal Consultant
Author/Speaker/Analyst
Proud member of #BBBT

Mais conteúdo relacionado

Mais procurados

Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsNeo4j
 
Data strategy in a Big Data world
Data strategy in a Big Data worldData strategy in a Big Data world
Data strategy in a Big Data worldCraig Milroy
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSEYandex
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessDatabricks
 
Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining techniquePawneshwar Datt Rai
 
The Non-Invasive Data Governance Framework
The Non-Invasive Data Governance FrameworkThe Non-Invasive Data Governance Framework
The Non-Invasive Data Governance FrameworkDATAVERSITY
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisVishwas N
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 
Data Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step ApproachData Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step ApproachFindWhitePapers
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData Blueprint
 
Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some BasicsShalin Hai-Jew
 
The data quality challenge
The data quality challengeThe data quality challenge
The data quality challengeLenia Miltiadous
 
Data Maturity - A Balanced Approach
Data Maturity - A Balanced ApproachData Maturity - A Balanced Approach
Data Maturity - A Balanced ApproachDATAVERSITY
 
What is Data mining? Data mining Presentation
What is Data mining? Data mining Presentation What is Data mining? Data mining Presentation
What is Data mining? Data mining Presentation Pralhad Rijal
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data GovernanceChristopher Bradley
 
101 Lessons Learned for Startups
101 Lessons Learned for Startups101 Lessons Learned for Startups
101 Lessons Learned for StartupsAndy Harjanto
 

Mais procurados (20)

Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML Ops
 
Big Data Profiling
Big Data Profiling Big Data Profiling
Big Data Profiling
 
Data strategy in a Big Data world
Data strategy in a Big Data worldData strategy in a Big Data world
Data strategy in a Big Data world
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for Success
 
Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
 
The Non-Invasive Data Governance Framework
The Non-Invasive Data Governance FrameworkThe Non-Invasive Data Governance Framework
The Non-Invasive Data Governance Framework
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
Data science 101
Data science 101Data science 101
Data science 101
 
Data Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step ApproachData Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step Approach
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies
 
Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some Basics
 
The data quality challenge
The data quality challengeThe data quality challenge
The data quality challenge
 
Lecture #02
Lecture #02 Lecture #02
Lecture #02
 
Data Maturity - A Balanced Approach
Data Maturity - A Balanced ApproachData Maturity - A Balanced Approach
Data Maturity - A Balanced Approach
 
What is Data mining? Data mining Presentation
What is Data mining? Data mining Presentation What is Data mining? Data mining Presentation
What is Data mining? Data mining Presentation
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data Governance
 
101 Lessons Learned for Startups
101 Lessons Learned for Startups101 Lessons Learned for Startups
101 Lessons Learned for Startups
 

Destaque

Visualization 101 BA4All
Visualization 101 BA4AllVisualization 101 BA4All
Visualization 101 BA4AllJos van Dongen
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed DatawarehousingJos van Dongen
 
PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012Jos van Dongen
 
Estado del arte del BI | Jornada Madrid 2014 | UOC
Estado del arte del BI | Jornada Madrid 2014 | UOCEstado del arte del BI | Jornada Madrid 2014 | UOC
Estado del arte del BI | Jornada Madrid 2014 | UOCJosep Curto
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
5 Signs You Need to Re-Think Your Data Integration Strategy
5 Signs You Need to Re-Think Your Data Integration Strategy5 Signs You Need to Re-Think Your Data Integration Strategy
5 Signs You Need to Re-Think Your Data Integration StrategyDarren Cunningham
 
Open Source Business Intelligence
Open Source Business IntelligenceOpen Source Business Intelligence
Open Source Business IntelligenceJos van Dongen
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Jos van Dongen
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataEdward Hsu
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Cambridge Semantics
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleCambridge Semantics
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosDataWorks Summit
 

Destaque (20)

Visualization 101 BA4All
Visualization 101 BA4AllVisualization 101 BA4All
Visualization 101 BA4All
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed Datawarehousing
 
PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012
 
Estado del arte del BI | Jornada Madrid 2014 | UOC
Database Shootout: What's best for BI?

  • 2. 2 The New Data Warehousing Source: Timo Elliott, SAP
  • 3. 3 Let's just buy Teradata... Forrester Wave, as of April 2011 Gartner Group MQ, as of February 2012
  • 6. ⇨ Back to basics: BI & DWH ⇨ The Need for Speed ⇨ Database Architectures ⇨ #BigData & the Hadoop Hoopla ⇨ The forgotten power of OLAP & MDX ⇨ A Cloudy future? ⇨ Shootout: Evaluating alternatives
  • 7. 7 Business Intelligence (BI) “Process of identifying, collecting, combining, analyzing, interpreting and communicating internal and external information to support decision making processes” “Concepts and methods to improve business decision making by using fact-based support systems” First definition of BI: 1958!
  • 8. 8 Business Intelligence is.... Doing useful stuff with data, in order to… Support the Decision Making process So why not simply use Decision Support Systems?
  • 9. 9 How it all started in 1958... ⇨ Hans Peter Luhn (IBM) → A Business Intelligence System The notion of intelligence is also defined here, in a more general sense, as the “ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” Full text on Timo Elliott's blog: http://timoelliott.com/blog/2007/11/the_real_pioneer_of_business_i.html
  • 10. 10 Luhn's Vision (1958!) A Business Intelligence System Abstract: An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.
  • 11. 11 BI is dead; long live Analytics?
  • 13. 13 The Evolution of Enterprise Business Intelligence [Chart: Business Value over time (2000, 2005, 2010, 2015). Labels: Enterprise Decision Management; Embracing all relevant data sources; BI injected into everyday business processes; Master Data Management; Advanced Data Mining / Analytics; Business Activity Monitoring; Common Information & Processes; Disconnected Silos of Information; Query/Reporting/Online Analytical Processing; Content Management/Data Warehousing; Search] Image courtesy Jim Fitzgerald, IBM Research
  • 14. 14 Evolving Business Intelligence Platform Requirements Image courtesy Teradata
  • 15. 15 Passive Monitoring: BI Starting Point Source: Mark Madsen, Third Nature
  • 16. 16 Supporting Better Analysis: Common Next Step Source: Mark Madsen, Third Nature
  • 17. 17 Active Monitoring Source: Mark Madsen, Third Nature
  • 18. 18 Active Monitoring and Feedback Source: Mark Madsen, Third Nature
  • 20. 20 Prediction (passive) Source: Mark Madsen, Third Nature
  • 21. 21 Active Prediction Source: Mark Madsen, Third Nature
  • 22. 22 Prescription and Enterprise Decision Management Source: Mark Madsen, Third Nature
  • 24. 24 Watch out, the world is changing 750TB per week (compressed) ⇨ Sensor data ⇨ RFID ⇨ Sentiment Analysis ⇨ Text mining ⇨ Location data
  • 26. 26 Remember the Origins • The general conception of a separate architecture for BI has been around longer, but this is the first formal relational architecture and definition published. • One thing left out of most designs: the box labeled business process definitions. “An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
  • 27. 27 2012: we're still doing this! [Diagram labels: Sources (CSV, Files, ERP, DBMS); ETL Process; Staging Area; Data Warehouse (Central DWH & Data Marts); EUL] EUL = End User Layer, in case you were wondering ;-)
  • 28. 28 We’ve (also) accumulated over 20 years of changes Databases Documents Flat Files XML Queues ERP Applications Source Environments Data Consumers Databases Dashboards OLAP Productivity BAM/BPM Reporting ETL Data Mining Applications Warehouse Database ETL Marts ODS EDR EII Content Store EAI Stream processing SQL Service API
  • 29. 29 The assumption of the warehouse as a database is gone [Diagram labels: Traditional tabular or structured data; Data at rest; Non-traditional data (logs, audio, documents); Parallel programming platforms; Databases; Streaming DBs/engines; Message streams; Data in motion] Copyright Third Nature, Inc.
  • 31. 31 Why BI Projects Fail? 1. Query Performance Too Slow (BI Survey 9). 70% of DWHs experience performance-constrained issues of various types (Gartner DWH MQ 2010). Poor Query Performance is the No. 1 reason for replacing a DWH (TDWI Best Practices).
  • 32. 32 Two dimensions of Speed Companies wishing to maximize BI benefits should focus on 1) support quality, 2) implementation time, 3) query response time and 4) breadth of deployment, in that order.
  • 33. 33 Minimize Implementation Time Use 'RTF': vs Off the shelf: Etc. + use Agile methods like Scrum or DSDM + look at Data Vault model & methodology
  • 34. 34 Minimize Query Response Time Source: TDWI Next generation Data Warehouse Platforms, By Philip Russom Why replace a data warehouse solution?
  • 35. 35 Solving Performance Problems Replace every single thing before the database? Migrating to an analytic database is twice as likely as to another row-store database.
  • 36. 36 Applying “Laborware”: think twice... ⇨ Apply traditional optimization techniques: ⇨ Redesign solution ⇨ Add/optimize indexes ⇨ Horizontal partitioning ⇨ Add materialized views ⇨ Rewrite queries ⇨ Reorganize data ⇨ Offload old data ⇨ … ⇨ Costs will increase & recur!
  • 38. 38 Numbers everyone should know ⇨ L1 cache reference 0.5 ns ⇨ Branch mispredict 5 ns ⇨ L2 cache reference 7 ns ⇨ Mutex lock/unlock 100 ns ⇨ Main memory reference 100 ns ⇨ Compress 1K bytes with Zippy 10,000 ns ⇨ Send 2K bytes over 1 Gbps network 20,000 ns ⇨ Read 1 MB sequentially from memory 250,000 ns ⇨ Round trip within same datacenter 500,000 ns ⇨ Disk seek 10,000,000 ns ⇨ Read 1 MB sequentially from network 10,000,000 ns ⇨ Read 1 MB sequentially from disk 30,000,000 ns ⇨ Send packet CA->Netherlands->CA 150,000,000 ns Source: Jeff Dean, Google
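To make the latency table above concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the per-MB and seek figures quoted on the slide; the 1,000 MB table size and the function name `scan_seconds` are assumptions chosen for illustration, and real systems will deviate because of caching, parallelism and compression.

```python
# Latency figures copied from the slide (nanoseconds).
NS_PER_MB_MEMORY = 250_000      # read 1 MB sequentially from memory
NS_PER_MB_DISK = 30_000_000     # read 1 MB sequentially from disk
NS_DISK_SEEK = 10_000_000       # one disk seek


def scan_seconds(megabytes: int, ns_per_mb: int, seeks: int = 0) -> float:
    """Estimated time to read `megabytes` sequentially, plus optional seeks."""
    total_ns = megabytes * ns_per_mb + seeks * NS_DISK_SEEK
    return total_ns / 1e9


table_mb = 1_000  # hypothetical fact table of 1,000 MB
memory_s = scan_seconds(table_mb, NS_PER_MB_MEMORY)
disk_s = scan_seconds(table_mb, NS_PER_MB_DISK, seeks=1)
print(f"memory: {memory_s:.2f} s, disk: {disk_s:.2f} s, "
      f"ratio: {disk_s / memory_s:.0f}x")
# prints "memory: 0.25 s, disk: 30.01 s, ratio: 120x"
```

With these numbers a full sequential scan is roughly two orders of magnitude slower from disk than from memory, which is the gap the following hardware slides address.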
  • 39. 39 Trend: decreasing latency times source:90’s versus 2010's
  • 40. 40 The Good News: HW Cost Decline ⇨ < 2000: ⇨ tune software ⇨ 2012 ⇨ hardware cheap ⇨ Mustang Index ~0.7 ⇨ Cost per Gigaflop: ⇨ 1984: $ 15,000,000 ⇨ 1997: $ 30,000 ⇨ 2003: $ 82 ⇨ 2011: $ 1.80
  • 41. 41 Memory Cost Decline We're still waiting! ⇨ 32 GB, May 2010: ⇨ 8 × 4 GB = $1,200 ⇨ 4 × 8 GB = $2,000 ⇨ 2 × 16 GB = $2,400 ⇨ 32 GB, May 2012: ⇨ 8 × 4 GB = $280 ⇨ 4 × 8 GB = $350 ⇨ 2 × 16 GB = $500
  • 43. 43 CPU: Moore's Law in action? Same price! Intel Xeon E5-2680 635
  • 44. 44 Storage costs keep going down
      Year   Size in GB   US $/GB
      1955   0.012        6,382,933.00
      1960   0.01         3,686,400.00
      1970   0.1          265,933.00
      1980   2.5          16,000.00
      1990   0.34         5,406.00
      2000   40           7.17
      2010   2,000        0.05
      2012   3,000        0.07
      “By the end of 2012, drives will have 100 times more capacity at 1/100 of the cost per GB compared to 2000”
  • 45. 45 Your next data warehouse? The next-generation SDXC memory card specification, released to members in April, 2009, dramatically improves consumers' digital lifestyles by increasing storage capacity from more than 32 GB up to 2 TB and increasing bus interface speed up to 104 MB per second in 2009 with a road map to 300 MB per second.
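The closing quote on the storage slide can be checked against the slide's own 2000 and 2012 rows. The short Python sketch below does only that arithmetic; the variable names are illustrative and no figures beyond those in the table are used.

```python
# (capacity in GB, US $ per GB) copied from the slide's table.
drive_2000 = (40, 7.17)
drive_2012 = (3_000, 0.07)

capacity_factor = drive_2012[0] / drive_2000[0]  # growth in drive size
cost_factor = drive_2000[1] / drive_2012[1]      # drop in price per GB

print(f"capacity: {capacity_factor:.0f}x, cost per GB: 1/{cost_factor:.0f}")
# prints "capacity: 75x, cost per GB: 1/102"
```

On the table's figures the cost claim holds almost exactly, while capacity grew about 75 times rather than 100.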
  • 47. 47 Choosing the right architecture is a trade-off. BI Architecture: Flexibility, Agility, Real-Time, Complexity, Integration, Auditability, Data Volume, Advanced Analysis, Performance, Low cost, Skills & Standards
  • 48. 48 What this means… • No ‘one size fits all’ solution • Easy to over- or under-provision • There are always exceptions • Clueless analysts • Tech savvy managers (even C-level) • Excel Junkies
  • 49. 49 Comparing Solutions ⇨ By Technology? ⇨ Columns, MPP, In-Memory, etc. ⇨ By Storage Type? ⇨ Files, tables, OLAP ⇨ By Deployment type? ⇨ Appliance, Cloud, SaaS ⇨ By Features/API? ⇨ SQL, MapReduce, R, etc. ⇨ By Speed? ⇨ TPC-H, Airline DB, Custom ⇨ By Licence type/price? ⇨ CPU, data size, memory usage
  • 50. 50 SQL DB's for BI/Analytics
  • 51. 51 Major BI Vendors have SQL DB's ⇨ IBM: DB2, Netezza ⇨ Microsoft: SQL Server ⇨ Oracle: MySQL, Oracle DB, Exalytics (TimesTen) ⇨ SAP: Sybase (IQ) & SAP HANA ..and all others are DB agnostic: MicroStrategy, SAS, Tableau, Tibco Spotfire, LogiXML, Pentaho, Jaspersoft, etc.
  • 52. 52 Analytical DB's: What’s Different? ⇨ MPP: Massively Parallel Processing ⇨ Column based data organization ⇨ Data compression ⇨ Read optimization ⇨ In memory operation ⇨ Different disk configuration options ⇨ In DB analytics ⇨ Data mining ⇨ Statistics
  • 53. 53 Architecture: SMP vs MPP Different storage approaches: ● Shared Disk (clustering) ● Shared Nothing Most DWH appliance & new software vendors use Shared Nothing, MPP, Scale Out architecture
  • 54. 54 Scaling Up and Out Typical Workloads
  • 55. 55 Blurring lines ⇨ 1 machine, up to: ⇨ 8 CPU/80 cores ⇨ 2 TB RAM ⇨ 24 SAS/SSD ⇨ 24 × 512 GB SSD = 12 TB ⇨ 24 × 900 GB SAS = 21 TB “In terms of raw speed, nothing beats DASD” Supermicro SuperServer 5086B-TRF
  • 56. 56 Beware of SPOF's (or: why clustering?)
  • 57. 57 Rows vs Columns ⇨ Nothing new about column storage: Taxir, 1969 ⇨ Conceptual (and simplified) view:
      EmpID   Lastname   Firstname   Salary
      1       Smith      Joe         40000
      2       Jones      Mary        50000
      3       Johnson    Cathy       44000
      Rows: 1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
      Columns: 1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000
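The two layouts on the slide above can be reproduced in a few lines of Python. This is a conceptual sketch only, using the slide's three employee records; real column stores add compression, encoding and block structures that are not shown here, and the variable names are illustrative.

```python
names = ("EmpID", "Lastname", "Firstname", "Salary")
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

# Column layout: one list per attribute instead of one tuple per record.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

# Serialized forms, matching the slide's simplified view.
row_layout = ";".join(",".join(str(v) for v in row) for row in rows)
col_layout = ";".join(",".join(str(v) for v in columns[n]) for n in names)

# An aggregate over one attribute touches every record in the row layout,
# but only a single contiguous list in the column layout.
total_from_rows = sum(row[3] for row in rows)
total_from_columns = sum(columns["Salary"])

print(row_layout)
print(col_layout)
print(total_from_rows, total_from_columns)
```

The typical BI query (aggregate a few measures over many records) reads far less data in the second layout, which is why most analytical databases in this comparison are column oriented.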
  • 58. 58 Rows vs Columns (2) Source: Paraccel® ⇨ Columnar Challenges: ⇨ (fast) Loading ⇨ Updates
  • 59. 59 Rows AND Columns! ⇨ Many vendors offer hybrid row/column options ⇨ Beware of differences between storage & indexing ⇨ Examples: ⇨ Teradata Aster ⇨ Greenplum ⇨ HP Vertica ⇨ Vectorwise ⇨ Microsoft ⇨ Oracle
  • 60. 60 Data compression ⇨ Compression 50-90% ⇨ Some vendors claim > 95% ⇨ DB size < raw data size
  • 61. 61 Read Optimization ⇨ OLTP: 90% write, 10% read ⇨ DWH: 10% write, 90% read ⇨ Common solution: ⇨ 'buffer' area (row oriented) ⇨ background process updates/inserts to columns ⇨ Bulk loading = direct
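One reason column stores reach the compression ratios quoted above is that values within a column repeat. The slide does not name a specific technique; run-length encoding is used below purely as an assumed, simple illustration, and the sample `country` column is hypothetical. Savings are counted in stored entries, not bytes.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical sorted dimension column with few distinct values.
country = ["NL"] * 6 + ["DE"] * 3 + ["US"] * 1

runs = rle_encode(country)
saving = 1 - len(runs) / len(country)

print(runs)                 # [('NL', 6), ('DE', 3), ('US', 1)]
print(f"{saving:.0%} fewer entries")
```

The encoding only pays off when equal values sit next to each other, which is why sort order matters so much for columnar compression.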
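The "common solution" on the slide above (a row-oriented buffer that a background process merges into columns, with bulk loads going direct) might be sketched as follows. The class `BufferedColumnTable` and its method names are hypothetical and not taken from any product; the background process is reduced to an explicit `flush()` call to keep the example self-contained.

```python
class BufferedColumnTable:
    """Toy column table with a row-oriented write buffer."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.columns = {name: [] for name in self.column_names}
        self.buffer = []  # recent single-row inserts, stored row by row

    def _append_to_columns(self, row):
        for name, value in zip(self.column_names, row):
            self.columns[name].append(value)

    def insert(self, row):
        """Cheap write path: append the whole row to the buffer."""
        self.buffer.append(tuple(row))

    def flush(self):
        """Stand-in for the background process: move buffer rows to columns."""
        for row in self.buffer:
            self._append_to_columns(row)
        self.buffer.clear()

    def bulk_load(self, rows):
        """Bulk loads bypass the buffer and write columns directly."""
        for row in rows:
            self._append_to_columns(row)

    def scan(self, name):
        """Read one column, including rows still waiting in the buffer."""
        index = self.column_names.index(name)
        return self.columns[name] + [row[index] for row in self.buffer]


table = BufferedColumnTable(["EmpID", "Salary"])
table.bulk_load([(1, 40000), (2, 50000)])
table.insert((3, 44000))
before_flush = table.scan("Salary")
table.flush()
after_flush = table.scan("Salary")
print(before_flush, after_flush, len(table.buffer))
```

Reads see the same data before and after the merge; only the physical location of the newest rows changes.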
  • 62. 62 Memory Usage ⇨ Different approaches ⇨ Query (result) caching ⇨ Dynamic allocation ⇨ Explicit loading (e.g. dim) ⇨ Some products still disk focused! (e.g. GP) ⇨ VectorWise: RAM as secondary (!) storage
  • 63. 63 Disk Usage/Configuration ⇨ 1. Disk/partition per CPU (core), e.g. Greenplum ⇨ 2. Software 'Raid' by DBMS, e.g. Paraccel ADB
  • 64. 64 Disk Usage/Configuration (2) ⇨ 3. Use standard devices, e.g. VectorWise, Vertica ADB ⇨ 3 is easiest to set up (but some ADB's auto config) ⇨ Speed depends on other things too
  • 65. 65 RAIS instead of RAID ⇨ 1. Failover Node (Hot Standby) ⇨ 2. Data Distribution A B B A C C etc. Hot Standby
  • 66. 66 Mixed Storage Solution ⇨ SAN = SOR ⇨ Nodes = Persistent subset ⇨ 'Blended Scan' ⇨ Patent Pending Source: Paraccel®
  • 67. 67 ILM: Software meets Hardware ⇨ Different approaches ⇨ Usage (e.g. TeraData) ⇨ Age (e.g. Oracle) ⇨ Partitions (e.g. Sybase IQ) Burning Hot Warm Cool Cold Sas Sata www.etre.com
  • 68. 68 Beware of (Interconnect) Bottlenecks ⇨ Diagram: Undersized Virtual DWH: Fast & Expensive SAN, Fast & Expensive Server(s), 1 Gb/s links (shared), VMs: DWH, ERP, MAIL, CRM ⇨ You want: ⇨ Dedicated hardware ⇨ Infiniband QDR 12x: 96 Gb/s, or ⇨ 100 Gb Ethernet: 100 Gb/s ⇨ OR: Local storage (MPP w DASD)
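A back-of-the-envelope calculation shows why the link speed matters. The sketch below assumes a hypothetical 1 TB table scan, decimal units, and the theoretical line rates from the slide; protocol overhead and link sharing would make the real numbers worse.

```python
def transfer_seconds(data_terabytes, link_gigabits_per_second):
    """Idealized time to move the data over the link (no overhead, no sharing)."""
    bits = data_terabytes * 1e12 * 8           # terabytes -> bits
    return bits / (link_gigabits_per_second * 1e9)

scan_tb = 1  # hypothetical size of a full table scan
seconds_1gbe = transfer_seconds(scan_tb, 1)         # 1 Gb/s link
seconds_infiniband = transfer_seconds(scan_tb, 96)  # Infiniband QDR 12x
seconds_100gbe = transfer_seconds(scan_tb, 100)     # 100 Gb Ethernet
```

Under these assumptions the 1 Gb/s link needs 8,000 seconds (more than two hours) for the scan, against roughly 83 seconds at 96 Gb/s and 80 seconds at 100 Gb/s.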
  • 71. 71 Inevitable In DB analytics ⇨ Fuzzy Logix ⇨ IBM/Netezza ⇨ IBM/Informix ⇨ SAP/Sybase ⇨ Paraccel ⇨ Microsoft ⇨ Asterdata/Teradata ⇨ SAS ⇨ IBM/Netezza ⇨ EMC Greenplum ⇨ TeraData ⇨ R ⇨ IBM/Netezza ⇨ AsterData/TeraData ⇨ Oracle ⇨ Greenplum ⇨ SAS
  • 73. 73 #BigData, the new frontier Yes, these (and more) are all Open Source!
  • 74. 74 #BigData? Largest data set analyzed KDNuggets poll 2012
  • 75. 75 Putting #BigData into perspective* Median DWH size *Idea by Glen Rabie, YellowFin BI
  • 76. 76 *THIS* is Hadoop: a Distributed File System Data Distribution Data Retrieval using M/R
  • 77. 77 #BigData & NoSQL: No Standards “Each NoSQL DB has its own strengths/weaknesses; most are not (directly) suited for typical BI workloads”
  • 78. 78 The Great Divide(s) ⇨ Pure SQL DB's ⇨ All OS Column Stores ⇨ Paraccel, Kognitio ⇨ In Database Analytics ⇨ Map/Reduce (many) ⇨ R (GreenPlum) ⇨ SAS (TeraData) ⇨ Everything (Netezza iClass) ⇨ NoSQL Databases ⇨ Hive (Hadoop) ⇨ MongoDB ⇨ CouchDB ⇨ etc.
  • 79. Worlds Colliding ⇨MapReduce (NoSQL) ⇨ Programming model ⇨ No DBMS/SQL required ⇨ Schema free ⇨ Exclusively <key,value> ⇨ Java, Python, C++, C, etc. ⇨ Text/data mining ⇨ Eventually Consistent ⇨SQL (RDBMS) ⇨ Query language ⇨ DBMS required ⇨ Fixed schema ⇨ Complex structure ⇨ SQL ⇨ Not good at Text ⇨ ACID compliant
  • 80. 80 What is MapReduce? ⇨ M/R is now patented by Google (Patent #7,650,331) ⇨ Used in many ADB's ⇨Hadoop, CouchDB ⇨AsterData ⇨GreenPlum ⇨Vertica ⇨... MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key
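The definition quoted on this slide maps directly onto a small, single-process sketch. The word-count example and all names below are illustrative assumptions; frameworks such as Hadoop distribute the same three phases (map, group by key, reduce) across many nodes.

```python
from collections import defaultdict

def map_words(document_id, text):
    """Map: turn one (key, value) pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)

def run_map_reduce(documents, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key, value in documents.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in intermediate.items())

# Hypothetical input documents.
documents = {"doc1": "big data big hype", "doc2": "big warehouse"}
word_counts = run_map_reduce(documents, map_words, reduce_counts)
```

The user only supplies `map_words` and `reduce_counts`; the grouping step in between is what a framework takes care of.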
  • 82. 82 M/R & SQL: How to get there ⇨ SQL on top of M/R ⇨ e.g. Hive-Hadoop ⇨ M/R invoking SQL ⇨ e.g. Greenplum ⇨ SQL invoking M/R ⇨ e.g. TeraData/Aster Data ⇨ Most ADB vendors implementing/investigating M/R ⇨ e.g. Vertica (Hadoop integration), Oracle, Netezza, etc.
  • 92. 92 (R/H/M)OLAP ⇨ OnLine Analytical Processing ⇨ Analyse multidimensional data ⇨ Basic architecture: Data Warehouse MDX OLAP engine/server Analysis front end
  • 93. 93 Stars and Cubes ⇨ Star schema ⇨ Dimension & fact tables ⇨ Best foundation for cubes ⇨ Cubes (logical/physical) ⇨ Dimensions ⇨Hierarchies ⇨ Levels ⇨Attributes ⇨ Measures
  • 94. 94 The power of OLAP Aggregates, positional calculations (prior vs current), range calculations (ytd, mtd), level calculations (child to parent contribution)
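The calculation types listed on this slide can be illustrated on a tiny in-memory data set. This is a plain-Python sketch with hypothetical figures and function names, not how an OLAP engine is implemented; an OLAP server evaluates such expressions against cube hierarchies.

```python
# Hypothetical data: profit per month number, and sales per product family.
monthly_profit = {1: 100, 2: 120, 3: 90}
sales_by_family = {"Drink": 25, "Food": 75}

def prior_period(values, period):
    """Positional calculation: value of the previous member (None if absent)."""
    return values.get(period - 1)

def year_to_date(values, period):
    """Range calculation: sum of all periods up to and including `period`."""
    return sum(v for p, v in values.items() if p <= period)

def contribution_to_parent(children, child):
    """Level calculation: a child's share of its parent's total."""
    return children[child] / sum(children.values())

growth_vs_prior = monthly_profit[2] - prior_period(monthly_profit, 2)
ytd_march = year_to_date(monthly_profit, 3)
drink_share = contribution_to_parent(sales_by_family, "Drink")
```

In SQL each of these typically needs a self-join or window function, which is part of why the deck argues for a dedicated multidimensional language.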
  • 95. 95 MDX ⇨ Short for 'Multi Dimensional Expressions' ⇨ ~ SQL for OLAP: ⇨ SELECT {set for column headers} ON COLUMNS, {set for row headers} ON ROWS FROM [Cube Name] WHERE {set for filtering} ⇨ Example: SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Product].[Drink], [Product].[Food]} ON ROWS FROM [Sales] WHERE [Time].[1997]
  • 96. 96 The Power of MDX Positional: [Measures].[Profit], [Time].PrevMember Range: Aggregate(YTD(), [Measures].[Profit]) “MDX is far more powerful than SQL for the typical BI questions”
  • 97. 97 Adding OLAP to the mix ⇨ Virtual Cubes, e.g. ⇨ Kognitio Pablo ⇨ Pentaho Mondrian ⇨ Microstrategy ⇨ Physical Cubes, e.g. ⇨ Microsoft Analysis Services ⇨ Oracle Essbase ⇨ Jedox Palo Physical cubes allow 'write back': what if, forecasting, budgeting & planning
  • 98. 'New' kid on the block: SAP HANA
  • 99. 99 The promises of the Cloud ⇨ “Utility computing” ⇨ Unlimited capacity ⇨ Pay as you go/by the sip ⇨ Lower costs ⇨ Always up to date ⇨ Invisible OS ⇨ Security ⇨ Safety
  • 100. 100 Cloud still getting Hotter Source: IBM CIO Survey 2011
  • 101. 101 Types of Cloud Solutions Virtualization IaaS (Infrastructure) PaaS (Platform) SaaS (Software) ValueAdded
  • 102. 102 Cloud Cost Components: Storage, Bandwidth, SLA/Service, CPU power, Memory, Data transfer, Requests
  • 104. 104 The trouble with Cloud DWH ⇨ DWH aaS vendors: ⇨ e.g. 1010Data, Kognitio, Vertica, EMC/Greenplum ⇨ more will follow
  • 105. 105 What about No Database at all? Rick F. van der Lans Key element: abstraction (de-coupling)
  • 106. 106 Data Virtualization concept Virtual DB SQL SOAP REST FILE WS-* Information Consumers
  • 107. 107 © 2011 Composite Software, Inc. / Composite Proprietary Example: Composite 6 Discovery Active Cluster Composite Information Server XQuery, Java, WSDL, SCA (Services Centric) Front-end Applications Security Metadata Repository Views, SQLScript (Database Centric) Security Query Engine Cost-based Optimizer Rules-based Optimizer Federation Engine Web Services (HTTP, REST, SOAP, JSON, XQuery) SQL (ODBC, JDBC, ADO.NET) Messaging (JMS) Java (POJO) Web Services (HTTP, SOAP, JSON) Messaging (JMS) Application APIs MF Adapter Java (POJO) Advanced Functions Quality GovernanceCaching SQL (ODBC, JDBC) URI Monitor Manager Studio Performance Plus Adapters Development Environment Runtime Server Environment Management Environment Applications, Big Data Stores, Excel, Flat Files, Mainframes, Messages, OLAP Cubes, RDBMS, Web Services, XML Documents
  • 110. 110 Virtual vs Physical trade offs Source: Mark Madsen
  • 111. 111 The Shootout! ⇨ Things to ask your (potential) vendor ⇨ References ⇨ Assist in a paid POC ⇨ License model & unit of cost: CPU, Core, Server, (raw) Data volume, Memory used ⇨ Free dev/test editions (only pay for production use) ⇨ Support options (updates only, mail/phone support, etc) ⇨ If migrating: trade in discount ⇨ Opt out/de-integration options
  • 112. 112 Does your DB cover the Basics? ⇨ Full SQL 2003 support? ⇨ Easy backup/restore features? ⇨ Scaling up or out? ⇨ Failover & persistency? ⇨ External (management) Tool integration?
  • 113. 113 Which deployment types? ⇨ On Premise ⇨ SaaS/Cloud Software only Appliance Vendor/ISP Customer
  • 117. 117 Beware of Benchmarks ! ⇨ Differences in ⇨# threads ⇨# cores ⇨# disks ⇨# nodes ⇨CPU generation/speed 1. Always use P.O.C. on your own data & query workload 2. Don't trust the MQ's
  • 118. ⇨ Ongoing Market Consolidation ⇨ More additional/alternative storage engines ⇨ Hybrid Row/Column solutions ⇨ Every db will get In DB analytics ⇨ Every db will get Hadoop/MR extensions ⇨ Everything in-memory
  • 119. 119 So what's the best database for BI?
  • 122. Web: www.tholis.com Email: jos<at>tholis.com Phone: +31-(0)6-51169606 Skype: tholis.jos LinkedIn: jvdongen Twitter: josvandongen IRC: _grumpy Jos van Dongen In BI since 1991 Principal Consultant Author/Speaker/Analyst Proud member of #BBBT

Editor's Notes

  1. The original definition of business intelligence
  2. What most people in the BI/DWH department tend to forget is that BI is not about technology, cool dashboards or the fastest analytical database. Nor is it about building ETL flows and publishing 100’s of reports. It is about helping the business user and manager to make more insightful decisions. If a simple Excel spreadsheet gets you there: great! Unfortunately, things are usually more complex than that... Seminar Open Source BI November 2008 Tholis Consulting
  3. In order to deliver full business impact, business intelligence must shift from retrospective analysis by experts to mechanisms that make it fully operational in a business context e.g. automatically triggered by external events as well as driven by people making decisions. The former is action within processes, while the latter is more often action on processes. As core business processes become more service oriented, there is increased scope for injecting decision-driven services. The technology evolution of software architecture means we can mix BI and decision services with application services. This allows us to maintain both application-oriented and data-oriented architectures. If business intelligence is going to directly impact business processes then we need a closed loop system to evaluate and improve results on an ongoing basis. This is where the combination of performance management concepts, business process models and data all come together. Current waterfall methods of design and construction are inadequate because they don’t allow evolution in different areas at different speeds, nor do they take into account the service model architecture over the application function-centric architecture.
  4. Data warehouses usually follow a predictable evolution. After over 25 years, we have seen the “stages” companies go through on their path to enterprise data warehousing. Moving from Stage 1 (What Happened?) into Stage 2 (Why Did It Happen?) requires new capabilities for ad hoc analysis. Then as you evolve to Stage 3 (Predicting What Will Happen) you again grow in your platform and database requirements. As you cross the chasm into Stages 4 and 5 (Operational Intelligence) you require a platform capable of “active” analysis.
  5. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  9. Step one is ad-hoc analysis, most frequently done manually and not to a strict schedule. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis, model building. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  10. Prediction implies systematic automation of processes. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis, model building. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  11. The process requirement, and its lack in our environments, is coming back in BI, model and tool requirements.
  12. The warehouse concept is no longer a simple database-oriented model. It’s grown up into a large collection of data management, storage, processing and delivery components that must all work together. There have been many changes from the once-per-night, batch-oriented design, with a single data model capturing the entire enterprise, and 100% of the organization’s data readily available through a single user interface. We now have operational data stores and other staging areas to address mixed data latencies, different data types, the requirement to manage master data and clean up problems in operational data. In larger environments we’ve created warehouse-mart architectures and offloaded some of the processing or even refined the data further. Data, particularly in the case of planning, scenario modeling / what-if analysis, or scorecards, has writeback requirements. Data types are more varied and complex than SQL standard types. The new view: Data warehouse as a platform. This means meeting application needs as well as traditional BI workloads. We have to think in terms of data and decision services, as well as traditional query-response models. Access to both historical and current data. Multiple storage methods, possibly distributed. Multiple access methods. Data usage decoupled from the underlying platform. More fluid management of data, regardless of location.
  13. Any architecture now will have multiple repositories for data, multiple technologies to cope with the different needs. The primary technology classes line up like this. For most BI programs, the low hanging fruit has been picked. The BI market is changing and BI programs, skills and architectures need to change with it. That means learning about the storage and processing technologies and architectures, and how they can be put together.
  14. Lots of time is wasted on evaluating different solutions; by just taking what you already have (MySQL, SQL Server, Oracle, whatever), lots of time can be saved in your first (pilot) project. For bigger scale efforts: use 'Ready To Fly' solutions, either off-premise (Cloud based stuff) or on-premise (Appliances).
  19. Selecting any solution is a trade off between conflicting goals; often, high performance and low cost don’t go well together; requiring full auditability and real time data access at the same time can also cause problems. For any combination of factors, a decision has to be made about which factor carries more weight in the selection process.