The way we make decisions has changed. The data we use has changed. The techniques we can apply to data and decisions have changed. Yet what we build and how we build it have barely changed in 20 years.
The definition of madness is doing more of what you already do and expecting different results. The threat to the data warehouse is not from new technology that will replace the data warehouse. It is from destabilization caused by new technology as it changes the architecture, and from failure to adapt to those changes.
The technology that we use is problematic because it constrains and sometimes prevents necessary activities. We don’t need more technology and bigger machines. We need different technology that does different things. More product features from the same vendors won’t solve the problem.
The data we want to use is challenging. We can’t model and clean and maintain it fast enough. We don’t need more data modeling to solve this problem. We need less modeling and more metadata.
And lastly, a change in scale has occurred. It isn’t a simple problem of “big”. The problem of scale for current workloads has been solved, despite the performance problems that many people still have today. Scale has many dimensions. Important among them are the number of discrete sources and structures, the rate of change of individual structures, the rate of change in data use, the variety of uses and the concurrency of those uses.
In short, we need a new architecture, one focused not on creating stability in data but on adapting to continuous and rapidly changing uses of data.
9. Copyright Third Nature, Inc.
It’s not the number of genes that determines complexity, it’s the interactions between them.
Source: M. Pertea and S. Salzberg/Genome Biology 2010
32. Copyright Third Nature, Inc.
Which is best, 3NF or dimensional?
Answer: neither.
The core assumption that there can be just one big schema model on one big platform is flawed.
We think we can model all the data before use, but that’s a bottleneck. Current techniques for modeling and managing data are too rigid and incapable of describing all the possible relationships.
39. Copyright Third Nature, Inc.
Workloads
                 OLTP                  BI               Analytics
Access           Read-Write            Read-only        Read-mostly
Predictability   Predictable           Unpredictable    Fixed path
Selectivity      High                  Low              Low
Retrieval        Low                   Low              High
Latency          Milliseconds          < seconds        msecs to days
Concurrency      Huge                  Moderate         1 to huge
Model            3NF, nested object    Dim, denorm      BWT
Task size        Small                 Large            Small to huge
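The workload comparison above can also be held as plain data, which makes it easy to see where two workloads diverge. The following is only an illustrative sketch: the values are transcribed from the slide, while the class name `WorkloadProfile`, its field names and the helper `differing_attributes` are hypothetical and do not come from the original deck.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkloadProfile:
    """One column of the workload comparison (field names are illustrative)."""
    access: str
    predictability: str
    selectivity: str
    retrieval: str
    latency: str
    concurrency: str
    model: str
    task_size: str


# Values copied from the slide; abbreviations such as "BWT" are kept as written.
WORKLOADS = {
    "OLTP": WorkloadProfile(
        "Read-Write", "Predictable", "High", "Low",
        "Milliseconds", "Huge", "3NF, nested object", "Small",
    ),
    "BI": WorkloadProfile(
        "Read-only", "Unpredictable", "Low", "Low",
        "< seconds", "Moderate", "Dim, denorm", "Large",
    ),
    "Analytics": WorkloadProfile(
        "Read-mostly", "Fixed path", "Low", "High",
        "msecs to days", "1 to huge", "BWT", "Small to huge",
    ),
}


def differing_attributes(a: str, b: str) -> list[str]:
    """Return the attribute names on which workloads a and b differ."""
    left, right = WORKLOADS[a], WORKLOADS[b]
    return [
        f.name
        for f in fields(left)
        if getattr(left, f.name) != getattr(right, f.name)
    ]


if __name__ == "__main__":
    for pair in (("OLTP", "BI"), ("BI", "Analytics")):
        print(pair, differing_attributes(*pair))
```

Comparing any two columns this way shows that they differ on most attributes, which is consistent with the point that a single schema model on a single platform is a poor fit for all three.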
56. About Third Nature
Third Nature is a research and consulting firm focused on new and
emerging technology and practices in analytics, business intelligence,
information strategy and data management. If your question is related to
data, analytics, information strategy and technology infrastructure then
you’re in the right place.
Our goal is to help organizations solve problems using data. We offer
education, consulting and research services to support business and IT
organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and how it is
applied rather than vendor market positions.