Martin Willcox - What is a Data Lake, Anyway?
1. DEBUNKING THE MYTHS
Speaker 10 of 17
Martin Willcox
@Willcoxmnk
What is a Data Lake, Anyway?
Followed by
Anthony Miller
2. One of the Big Data labels that we risk over-loading to
complete abstraction is the idea of a “Data Lake”…
2 © 2014 Teradata
“…store all data present
and future and create a
centralised data archive
location.”
“A large
object-based
repository that
holds data in
its native
format”
“Sometimes
called the bit
bucket or the
landing zone”
“All Water
and Little
Substance”
“As more and more applications
are created that derive value
from… new types of data… the
Data Lake forms”
3. “Data lakes can
help resolve the
nagging problem of
accessibility and
data integration”
…and some of the discussions sound eerily familiar
Data accessibility
and integration?
Isn’t that what the
Data Warehouse is
for?
4. So is the Data Lake a new architectural construct?
Or are we just re-platforming Data Marts?
Simple, single subject area Dimensional
Data Marts – with all of the dimensions
pre-joined to the fact table? One-per-workload
/ application?
Is this really the future of Enterprise
Analytics? Or circa 1995 silo,
departmental Decision Support Systems
warmed-over?
5. Take the merits of the different technologies out of the
equation – and this is what some of us are thinking…
6. …but there are no free lunches in Information
Management – merely more and different options
Explicit, or implicit, there
is always, always, always
(at least one) schema
Agile application
development, versus
agile data acquisition
None of the information
management
strategies / technologies
are magic - “pay me
now, or pay me later”
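The "always at least one schema" point can be made concrete with a minimal Python sketch. It assumes a few hypothetical raw JSON records (the field names `customer_id`, `amount` and `ts` are illustrative only, not from the talk) that were landed without any schema being enforced on write. The reader still has to apply an implicit schema, and that is where the deferred cost shows up:

```python
import json

# Hypothetical raw records, landed as-is: no schema was enforced on write.
RAW_LINES = [
    '{"customer_id": "42", "amount": "19.99", "ts": "2014-03-01"}',
    '{"customer_id": "43", "amount": "n/a"}',
    '{"cust": "44", "amount": "5.00", "ts": "2014-03-02"}',
]


def read_with_schema(lines):
    """Apply an implicit schema at read time.

    Returns (good_rows, rejected_lines). The schema assumed here
    (integer customer_id, float amount) lives only in this function.
    """
    good, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            row = {
                "customer_id": int(record["customer_id"]),
                "amount": float(record["amount"]),
            }
        except (KeyError, ValueError):
            # Missing field or unparseable value: the cost that was
            # avoided at load time is paid here instead.
            rejected.append(line)
            continue
        good.append(row)
    return good, rejected


good_rows, rejected_lines = read_with_schema(RAW_LINES)
print(len(good_rows), "usable rows;", len(rejected_lines), "rejected at read time")
```

Acquisition was agile because nothing was validated on the way in. Every application that reads the data must now repeat, or share, this kind of parsing logic, which is one reading of "pay me now, or pay me later".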
7. Big Data Are Plural
For the foreseeable future, we will need multiple Information
Management strategies - and multiple Information
Management technologies
DATA WAREHOUSE
DISCOVERY PLATFORM
Integration
becomes a
critical concern
DATA
PLATFORM
– Gartner –
Logical Data Warehouse
– Forrester –
Enterprise Data Hub
– Teradata –
Unified Data Architecture
8. A definition of the Data Lake (Data Reservoir)
A centralised, consolidated, persistent store of raw, un-modelled and un-transformed data from
multiple sources / silos (without an explicit, pre-defined schema, without externally defined metadata –
and without guarantees about the quality, provenance and security of the data)
Agile data acquisition –
a haystack to go looking
for needles…
…with a natural storage
model for complex,
multi-structured data…
…support for efficient
non-relational
computation…
…and provision for cost-effective
storage of large
and noisy data-sets.
Now that is new, interesting and (potentially) very, very useful…
10. Left to its own devices, does nature tend to give us a single, beautiful lake? Or a messy patchwork of lakes, plural?
STOP PRESS: Laws of Physics* Unchanged!
(* More specifically, the 2nd Law of Thermodynamics)
None of the new information management strategies and technologies is by itself a cure
for information entropy – data silos form naturally, just like lakes form naturally
11. Summary and conclusions