Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Data Lake Survival Guide
Exploratory Webcast | October 26, 2016
SPONSORED BY
Presenting
Robin Bloor
Chief Analyst, The Bloor Group
@robinbloor robin.bloor@bloorgroup.com
Host: Eric Kavanagh
CEO, The Bloor Group
@eric_kavanagh eric.kavanagh@bloorgroup.com
Dez Blanchfield
Data Scientist, The Bloor Group
@dez_blanchfield dez.blanchfield@bloorgroup.com
Data Lake Survival Guide
Exploratory Webcast: October 26, 2016
Roundtable Webcast: December 8, 2016
Findings Webcast: January 12, 2017
Data Lake Survival
Robin Bloor, PhD
The Sequence of Topics
1  Disturbance in the Force
2  What is a Data Lake, exactly?
3  Streams and Events
1  Disturbance in the Force
The Generic Dimensions of IT
• All IT involves four components (only):
  - Users
  - Software
  - Data
  - Hardware
• They all relate to each other
• Change any one of these and the other three components have to adjust
• Aggregate these and you get a process
• Time will impose change anyway
• We can also consider:
  - Staff
  - Business Processes
  - Business Information
  - Facility
• And also:
  - People
  - Information
  - Human Activity
  - Civilization (Stuff)
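Four Fundamental (IT) Factors
[Diagram: the four IT factors (Users, Software, Data, Hardware) paired with their business counterparts (Staff, Business Process, Business Information, Facility) and their broader counterparts (People, Human Activity, All Information, Civilization), all subject to TIME.]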
The Technology Layers
• The buying impulse descends through the stack
• The impact of technology change rises up the stack
• This ensures the eventual “legacification” of all technology
[Diagram: The Technology Layers. The buying impulse goes down; technology change rises up.]
Disruption in the Technology Layers
• Disruption (as innovation) can happen in any layer
• Where it occurs, it will impact all layers above it
• It may also impact the layers below it (but less quickly)
• There is no such thing as future-proof, but some technologies definitely live longer
Tech Revolutions
• Mainframe Computer (Batch architecture)
• Online Interaction (Centralized architecture)
• PC (Client-Server)
• Internet (Multi-tier architecture)
• Mobile (Service-Oriented architecture)
• Internet of Things (Event-Driven architecture)
Note that all of these disruptive changes were driven by hardware innovation.
[Diagram: three generations of systems]
Centralized Computer Systems: limited processing power; terminals only; few applications; no external data sources
PC Based Systems: moderate processing power; PCs; spreadsheets & email; many applications; few external data sources
Integrated Systems (Cloud): extensive processing power; PCs & apps; analytics capability; wealth of applications; many external data sources
Parallelism: The Imp Out of the Bottle
• Multicore chips enabled parallelism
• It has changed the whole performance equation
• It enabled Big Data
• Big Data is really Big Processing
The Impact of Parallelism
We used to see a 10x performance improvement every 6 years; now we see roughly 1000x over the same period (and that’s just an approximation).
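To make the jump concrete, the two rates can be converted into per-year multipliers. This is a small arithmetic sketch that assumes the 1000x figure refers to the same six-year span as the 10x figure; the function name `annual_factor` is just an illustrative helper.

```python
def annual_factor(total_speedup: float, years: float) -> float:
    """Per-year multiplier that compounds to total_speedup over the given number of years."""
    return total_speedup ** (1.0 / years)

old = annual_factor(10, 6)    # 10x every 6 years
new = annual_factor(1000, 6)  # 1000x over the same 6 years (assumed)

for label, factor in (("old", old), ("new", new)):
    print(f"{label}: {factor:.3f}x per year ({(factor - 1) * 100:.1f}% annual gain)")
```

Under that assumption, the older rate works out to about 1.47x per year, while the newer one is about 3.16x per year (the square root of 10).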
Hardware Factors
• CPUs, GPUs & FPGAs
• Cross-breeding
• SoCs
• 3D XPoint and PCM (and memristor?)
• SSDs & parallel access
• Parallel hardware architectures
Performance is accelerating and costs continue to fall.
The Perfect Storm (Software)
• The triumph of Open Source as a business model
• The dominance of Apache:
  - Hadoop, the platform for data
  - Spark, for speed
  - Kafka, for connectivity
• The triumph of the cloud and its dominance
• Little data is also big data
• Cost challenges
Then the Data Lake evaporated into the Cloud
2  What is a Data Lake?
Everything in Flux
• Hardware (network, storage, servers)
• Data Sources
• Data Staging
• Data Volumes
• Data Flow
• Data Governance
• Data Usage
• Data Structures
• Schema Definition
• Ingest Speeds
• Data Workloads
Hadoop Applications
The Scale-Out Applications
• Data Ingest & Staging
• Data Governance
• Software development platform
• Analytics environment
• Database/Data Warehouse
• Data Archiving
• Video rendering & other niche apps
The Data Lake involves just the first two and does not necessarily involve Hadoop.
Data Lake, Refinery and Hub: An Overview
Think Logical, Implement Physical
The Data Lake Analytics Picture
[Diagram: Data Sources and Data Streams; a Staging Area (Hadoop) labeled Round-up/Wrangling; ETL paths to a Data Warehouse or other location; Analytics; supporting functions: Service Management, Life Cycle Management, Metadata Discovery, MDM, Metadata Management, Data Cleansing and Data Lineage.]
How Data Gets to Be Wrong
• Accidentally born wrong
• Deliberately born wrong
• Defective sensor/data source
• Murdered (truncated, overwritten)
• Corrupted in flight (rare)
• Corrupted by bad code (surely not!)
• Corrupted by a bad DBA
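Two of these failure modes can be caught mechanically at ingest. The sketch below is a minimal illustration, assuming a hypothetical reading format in which the sender attaches a CRC-32 checksum to each payload; the `Reading` type, the field names and the range limits are all made up for the example. Data that is "born wrong" but still plausible will pass both checks, which is why governance cannot rely on validation alone.

```python
import zlib
from dataclasses import dataclass
from typing import List

@dataclass
class Reading:
    sensor_id: str
    value: float
    payload: bytes  # raw bytes as received over the wire
    crc: int        # CRC-32 the sender computed over payload (assumed protocol field)

def check_reading(r: Reading, lo: float, hi: float) -> List[str]:
    """Return suspected problems; an empty list means no check fired."""
    problems = []
    # Corrupted in flight: recompute the checksum and compare it with the sender's.
    if zlib.crc32(r.payload) != r.crc:
        problems.append("corrupted in flight: CRC mismatch")
    # Defective sensor: a physically implausible value for this kind of device.
    if not lo <= r.value <= hi:
        problems.append("out of range: possible defective sensor")
    return problems

payload = b"t-17,21.5"
good = Reading("t-17", 21.5, payload, zlib.crc32(payload))
bad = Reading("t-17", 999.0, payload, (zlib.crc32(payload) + 1) & 0xFFFFFFFF)
print(check_reading(good, lo=-40.0, hi=85.0))
print(check_reading(bad, lo=-40.0, hi=85.0))
```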
Data Governance
If data governance was important before Big Data (and it was), it is far more important in the era of Data Lakes.
What Needs to Be Governed
Data governance covers:
• Data flows and data storage
• Security & access
• Data cleansing and transformation
• Data meaning
• Data provenance and lineage
• Data archive and disposal
• Availability and performance
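Provenance and lineage become tractable once every derived dataset records what it was built from. The following is a minimal sketch of that idea, not a description of any particular catalog product; the `Dataset` type, the `lineage` helper and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Dataset:
    name: str
    source: Optional[str] = None     # external origin, for raw ingested data
    inputs: List["Dataset"] = field(default_factory=list)  # datasets this one was derived from
    transform: Optional[str] = None  # how this dataset was produced from its inputs

def lineage(ds: Dataset, depth: int = 0) -> List[str]:
    """Walk back through a dataset's inputs and return an indented provenance trail."""
    label = ds.name
    if ds.transform:
        label += f"  <- {ds.transform}"
    if ds.source:
        label += f"  [from {ds.source}]"
    trail = ["  " * depth + label]
    for parent in ds.inputs:
        trail.extend(lineage(parent, depth + 1))
    return trail

raw = Dataset("sensor_raw", source="plant-A gateway")
clean = Dataset("sensor_clean", inputs=[raw], transform="drop out-of-range readings")
daily = Dataset("sensor_daily_avg", inputs=[clean], transform="daily mean per sensor")
print("\n".join(lineage(daily)))
```

Walking the trail from a report back to its raw source is what lets an analyst ask whether a number was cleansed, by which rule, and from which feed.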
Analytics Is a Process, Not an Activity
• Data analytics is a multidisciplinary, end-to-end process
• Until recently it was a walled garden, but the walls were torn down by:
  - Data availability
  - Scalable technology
  - Open source tools
• It is now becoming an integrated process
Data Governance is a process, not an activity!
The Global Map and Data Options
• Move the data to the processing
• Move the processing to the data
• Move the processing and the data
• Shard
All network nodes can be data creators, data stores and processing points.
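Sharding and "move the processing to the data" combine naturally: partition records by key, reduce each shard where it lives, and ship only the small partial results. The sketch below illustrates that pattern in a single process; the record layout and function names are hypothetical. It uses SHA-256 rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would not give a stable key-to-shard mapping across nodes.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float]  # (key, value), e.g. (sensor ID, reading)

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash partitioning: a given key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(records: Iterable[Record], num_shards: int) -> Dict[int, List[Record]]:
    """Distribute records across shards by key."""
    shards: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards

def local_totals(shard: List[Record]) -> Dict[str, float]:
    """Runs where the shard lives: reduce raw records to a small partial result."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in shard:
        totals[key] += value
    return dict(totals)

def combine(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Only the partial results travel over the network to be merged."""
    merged: Dict[str, float] = {}
    for partial in partials:
        for key, total in partial.items():
            merged[key] = merged.get(key, 0.0) + total
    return merged

records = [("sensor-1", 2.0), ("sensor-2", 5.0), ("sensor-1", 3.0)]
shards = partition(records, num_shards=4)
totals = combine(local_totals(s) for s in shards.values())
print(totals)
```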
Logical Data Lakes
Soon we will be speaking of a logical data lake and multiple physical data lakes.
3  Events and Streams
Big Data, Event Data – The Data of Everything
[Diagram: What is Big Data? Business data, traditional data, log file data, operational data, mobile data, location data, social network data, public data, commercial databases, streaming data and the Internet of Things.]
Events: Atoms and Molecules
A TRANSACTION is a MOLECULE of ATOMIC EVENTS
The ATOM of data has become the EVENT
It’s Become an Event-Based World
Events
Think of events as drops of water. They can live in streams, and they can also live in data pools and data lakes.
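The "transaction as a molecule of atomic events" idea can be sketched by storing only the events and deriving a transaction's state by replaying them, a technique usually called event sourcing (named here as an illustration, not as the presenter's method). The event kinds, field names and amounts below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Event:
    txn_id: str    # the transaction ("molecule") this atomic event belongs to
    seq: int       # order of the event within its transaction
    kind: str      # illustrative kinds: "item_added", "payment_received"
    amount: float

def group_transactions(events: List[Event]) -> Dict[str, List[Event]]:
    """Assemble molecules: collect atomic events by transaction, in sequence order."""
    txns: Dict[str, List[Event]] = {}
    for e in sorted(events, key=lambda e: (e.txn_id, e.seq)):
        txns.setdefault(e.txn_id, []).append(e)
    return txns

def balance_due(txn: List[Event]) -> float:
    """Derive the transaction's current state by replaying its events."""
    due = 0.0
    for e in txn:
        if e.kind == "item_added":
            due += e.amount
        elif e.kind == "payment_received":
            due -= e.amount
    return due

events = [
    Event("t1", 2, "payment_received", 30.0),  # events may arrive out of order
    Event("t1", 1, "item_added", 30.0),
    Event("t2", 1, "item_added", 12.5),
]
txns = group_transactions(events)
print({txn_id: balance_due(evs) for txn_id, evs in txns.items()})
```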
Two Data Flows
The Traffic Cop (Events)
Event Types
• Instantiation Event
• A State Report
• A Trigger Event
• A Correction Event
We also need to consider:
• Data Refinement
• Aggregations
• Homogeneous Collections
• Derived Data
Event-Based IoT Architecture
• The pulse and the threshold alert
• Some of this involves distributed processing
• There are known apps and unknown apps, so analytical exploration needs to be enabled
• Only aggregations will migrate
[Diagram: sensors, controllers and CPUs produce data that passes through source processing, depot processing (several depots) and central processing at a central hub.]
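The "pulse and threshold alert" and "only aggregations will migrate" points can be sketched as a depot node that keeps raw state reports locally, raises a trigger event immediately when a reading crosses a threshold, and forwards only per-sensor summaries upstream. This is a hypothetical single-process illustration; the class names, the threshold value and the choice of count/mean/max as the aggregate are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

@dataclass
class StateReport:   # the periodic "pulse" from a sensor
    sensor_id: str
    value: float

@dataclass
class TriggerEvent:  # raised as soon as a reading crosses the threshold
    sensor_id: str
    value: float

@dataclass
class Aggregate:     # the only summary that migrates to the central hub
    sensor_id: str
    count: int
    mean_value: float
    max_value: float

class Depot:
    """Hypothetical depot node: processes pulses close to the sensors."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.readings: Dict[str, List[float]] = {}  # raw values stay at the depot

    def ingest(self, report: StateReport) -> Optional[TriggerEvent]:
        self.readings.setdefault(report.sensor_id, []).append(report.value)
        if report.value > self.threshold:
            return TriggerEvent(report.sensor_id, report.value)
        return None

    def flush(self) -> List[Aggregate]:
        """Summarize and discard the raw readings; send only these upstream."""
        aggs = [Aggregate(s, len(v), mean(v), max(v)) for s, v in self.readings.items()]
        self.readings.clear()
        return aggs

depot = Depot(threshold=80.0)
alerts = []
for r in [StateReport("p1", 70.0), StateReport("p1", 85.0), StateReport("p2", 60.0)]:
    alert = depot.ingest(r)
    if alert is not None:
        alerts.append(alert)
aggregates = depot.flush()
print(alerts)
print(aggregates)
```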
Events and Event Data
• Time
• Geographic location
• Virtual/logical location
• Source device
• Device ID
• Actors
• Ownership/Provenance
• Values
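One way to carry these attributes is a single event record type, tagged with one of the four event types listed earlier. The sketch below is illustrative only: the field names, types and sample values are assumptions, not a schema from the presentation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Optional, Tuple

class EventType(Enum):
    # The four event types named on the "Event Types" slide.
    INSTANTIATION = "instantiation"
    STATE_REPORT = "state_report"
    TRIGGER = "trigger"
    CORRECTION = "correction"

@dataclass
class EventRecord:
    event_type: EventType
    time: datetime                                      # when the event happened
    source_device: str                                  # kind of device, e.g. "thermostat"
    device_id: str                                      # identity of the specific device
    geo_location: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    logical_location: Optional[str] = None              # virtual location, e.g. a network segment
    actors: Tuple[str, ...] = ()                        # people or systems involved
    provenance: Optional[str] = None                    # owner or originating system
    values: Dict[str, float] = field(default_factory=dict)  # the measured payload

reading = EventRecord(
    event_type=EventType.STATE_REPORT,
    time=datetime(2016, 10, 26, 14, 0, tzinfo=timezone.utc),
    source_device="thermostat",
    device_id="th-0042",
    geo_location=(51.5072, -0.1276),
    logical_location="building-3/floor-2",
    provenance="facilities-team",
    values={"temp_c": 21.5},
)
print(reading.event_type.value, reading.values)
```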
Spark, Storm, Flink & Kafka
• Spark has dethroned Hadoop as a platform and has momentum, both for micro-batch and streaming
• Storm provides stream (event) processing and can serve as the real-time (speed) layer alongside a batch layer in the Lambda architecture
• Flink was purpose-built for streaming
• Kafka is the pipe
• Lambda and Zeta Architectures…
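The core of the Lambda architecture is that every event goes down two paths: an append-only master dataset that a batch layer periodically recomputes from scratch, and a speed layer that updates incrementally; queries merge the two views. The toy page-view counter below shows only that merge logic in one process. The class and method names are hypothetical, and in practice the layers would typically run on separate systems (for example a batch engine and a stream processor fed by Kafka).

```python
from collections import Counter
from typing import List, Tuple

ClickEvent = Tuple[int, str]  # (timestamp, page), hypothetical event shape

class LambdaCounter:
    """Toy Lambda-style page-view counter (illustrative, single process)."""

    def __init__(self) -> None:
        self.master: List[ClickEvent] = []  # immutable, append-only master dataset
        self.batch_view: Counter = Counter()  # recomputed from master by the batch layer
        self.speed_view: Counter = Counter()  # incremental counts since the last batch run

    def ingest(self, event: ClickEvent) -> None:
        self.master.append(event)       # batch path: keep every event
        self.speed_view[event[1]] += 1  # speed path: update immediately

    def run_batch(self) -> None:
        # Recompute the batch view over all events, then reset the speed view,
        # since every event it counted is now covered by the batch view.
        self.batch_view = Counter(page for _, page in self.master)
        self.speed_view = Counter()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[page] + self.speed_view[page]

lc = LambdaCounter()
lc.ingest((1, "home"))
lc.ingest((2, "home"))
lc.run_batch()
lc.ingest((3, "home"))
lc.ingest((4, "about"))
print(lc.query("home"), lc.query("about"))
```

Recomputing from the immutable master dataset is what lets the batch layer correct mistakes made by the faster, approximate path.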
In Summary
1  Disturbance in the Force
2  What is a Data Lake, exactly?
3  Streams and Events
Questions?
THANK YOU!
FIND OUT MORE at InsideAnalysis.com

The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
Expediting the Path to Discovery with Multi-Source Analysis
Expediting the Path to Discovery with Multi-Source AnalysisExpediting the Path to Discovery with Multi-Source Analysis
Expediting the Path to Discovery with Multi-Source Analysis
 
Will AI Eliminate Reports and Dashboards
Will AI Eliminate Reports and DashboardsWill AI Eliminate Reports and Dashboards
Will AI Eliminate Reports and Dashboards
 
Metadata Mastery: A Big Step for BI Modernization
Metadata Mastery: A Big Step for BI ModernizationMetadata Mastery: A Big Step for BI Modernization
Metadata Mastery: A Big Step for BI Modernization
 
Better to Ask Permission? Best Practices for Privacy and Security
Better to Ask Permission? Best Practices for Privacy and SecurityBetter to Ask Permission? Best Practices for Privacy and Security
Better to Ask Permission? Best Practices for Privacy and Security
 
The Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data GovernanceThe Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data Governance
 
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingBest Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
 
A Winning Strategy for the Digital Economy
A Winning Strategy for the Digital EconomyA Winning Strategy for the Digital Economy
A Winning Strategy for the Digital Economy
 
Discovering Big Data in the Fog: Why Catalogs Matter
 Discovering Big Data in the Fog: Why Catalogs Matter Discovering Big Data in the Fog: Why Catalogs Matter
Discovering Big Data in the Fog: Why Catalogs Matter
 
Health Check: Maintaining Enterprise BI
Health Check: Maintaining Enterprise BIHealth Check: Maintaining Enterprise BI
Health Check: Maintaining Enterprise BI
 
Rapid Response: Debugging and Profiling to the Rescue
Rapid Response: Debugging and Profiling to the RescueRapid Response: Debugging and Profiling to the Rescue
Rapid Response: Debugging and Profiling to the Rescue
 
Beyond the Platform: Enabling Fluid Analysis
Beyond the Platform: Enabling Fluid AnalysisBeyond the Platform: Enabling Fluid Analysis
Beyond the Platform: Enabling Fluid Analysis
 
Protect Your Database: High Availability for High Demand Data
 Protect Your Database: High Availability for High Demand Data Protect Your Database: High Availability for High Demand Data
Protect Your Database: High Availability for High Demand Data
 
Application Acceleration: Faster Performance for End Users
Application Acceleration: Faster Performance for End Users	Application Acceleration: Faster Performance for End Users
Application Acceleration: Faster Performance for End Users
 
Time's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data NowTime's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data Now
 
A Bigger Magnifying Glass: Analyzing the Internet of Things
A Bigger Magnifying Glass: Analyzing the Internet of Things	A Bigger Magnifying Glass: Analyzing the Internet of Things
A Bigger Magnifying Glass: Analyzing the Internet of Things
 
A Real-Time Version of the Truth
 A Real-Time Version of the Truth A Real-Time Version of the Truth
A Real-Time Version of the Truth
 

Último

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

The Central Hub: Defining the Data Lake

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. The Data Lake Survival Guide Exploratory Webcast | October 26, 2016 SPONSORED BY
  • 3. Presenting: Robin Bloor, Chief Analyst, The Bloor Group (@robinbloor, robin.bloor@bloorgroup.com). Host: Eric Kavanagh, CEO, The Bloor Group (@eric_kavanagh, eric.kavanagh@bloorgroup.com). Dez Blanchfield, Data Scientist, The Bloor Group (@dez_blanchfield, dez.blanchfield@bloorgroup.com).
  • 4. Data Lake Survival Guide series: Exploratory Webcast, October 26, 2016; Roundtable Webcast, December 8, 2016; Findings Webcast, January 12, 2017.
  • 6. The Sequence of Topics: 1. Disturbance in the Force. 2. What is a Data Lake, exactly? 3. Streams and Events.
  • 8. The Generic Dimensions of IT. All IT involves four components (only): Users, Software, Data, and Hardware. They all relate to each other: change any one of these and the other three components have to adjust. Aggregate these and you get a process. Time will impose change anyway. We can also consider Staff, Business Processes, Business Information, and Facility, and also People, Information, Human Activity, and Civilization (Stuff). (Figure: Four Fundamental (IT) Factors; labels: Hardware, Users, Software, Data; Business Information, Business Process, Staff, Facility; Human Activity, All Information, People, Civilization; TIME.)
  • 9. The Technology Layers. The buying impulse descends through the stack. The impact of technology change rises up the stack. This ensures the eventual “legacification” of all technology. (Figure: The Technology Layers, with arrows labeled “The Buying Impulse Goes Down” and “Technology Change Rises Up”.)
  • 10. Disruption in the Technology Layers. Disruption (as innovation) can happen in any layer. Where it occurs, it will impact all layers above it, and it may also impact the layers below it (but less quickly). There is no such thing as future-proof, but some technologies definitely live longer. (Figure: The Technology Layers, with arrows labeled “The Buying Impulse Goes Down” and “Technology Change Rises Up”.)
  • 11. Tech Revolutions: Mainframe Computer (Batch architecture); On-line Interaction (Centralized architecture); PC (Client Server); Internet (Multi-tier architecture); Mobile (Service Oriented architecture); Internet of Things (Event Driven Architecture). Note that all of these disruptive changes were driven by hardware innovation. (Figure, which also carries a “Cloud” label. Centralized Computer Systems: limited process power, terminals only, few applications, no external data sources. PC Based Systems: moderate process power, PCs, spreadsheets & email, many applications, few external data sources. Integrated Systems: extensive process power, PCs & apps, analytics capability, wealth of applications, many external data sources.)
  • 12. Parallelism: The Imp Out of the Bottle. Multicore chips enabled parallelism. This has changed the whole performance equation. It enabled Big Data. Big Data is really Big Processing.
  • 13. The Impact of Parallelism. We used to see a 10x performance improvement every six years; now we see 1000x (and that’s just an approximation).
  • 14. Hardware Factors: CPUs, GPUs & FPGAs; cross-breeding; SoCs; 3D XPoint and PCM (and memristor?); SSDs & parallel access; parallel hardware architectures. Performance is accelerating and costs continue to fall.
  • 15. The Perfect Storm (Software). The triumph of Open Source as a business model. The dominance of Apache: Hadoop, the platform for data; Spark, for speed; Kafka, for connectivity. The triumph of the cloud and its dominance. Little data is also big data. Cost challenges.
  • 16. Then the Data Lake evaporated into the Cloud. Section 2: What is a Data Lake?
  • 17. Everything in flux: Hardware (network, storage, servers), Data Sources, Data Staging, Data Volumes, Data Flow, Data Governance, Data Usage, Data Structures, Schema definition, Ingest Speeds, and Data Workloads.
  • 19. The Scale-Out Applications: Data Ingest & Staging; Data Governance; Software development platform; Analytics environment; Database/Data Warehouse; Data Archiving; Video rendering & other niche apps. The Data Lake involves just the first two and does not necessarily involve Hadoop.
  • 20. Data Lake, Refinery, Hub, in Overview: Think Logical, Implement Physical.
  • 21. The Data Lake Analytics Picture. (Figure; labels: Data Sources, Data Streams, ETL, Staging Area (Hadoop), Round Up, Wrangling, Data Warehouse or other location, Analytics, Service Mgt, Life Cycle Mgt, Metadata Discovery, MDM, Metadata Mgt, Data Cleansing, Data Lineage.)
  • 22. How Data Gets to Be Wrong: accidentally born wrong; deliberately born wrong; defective sensor/data source; murdered (truncated, overwritten); corrupted in flight (rare); corrupted by bad code (surely not!); corrupted by a bad DBA.
  • 23. Data Governance. If data governance was important before Big Data (and it was), it is far more important in the era of Data Lakes.
  • 24. What Needs To Be Governed
  • 25. Data Governance: Data Flows and Data Storage; Security & Access; Data Cleansing and Transformation; Data Meaning; Data Provenance and Lineage; Data Archive and Disposal; Availability and Performance.
  • 26. Analytics Is a Process, Not an Activity. Data analytics is a multidisciplinary, end-to-end process. Until recently it was a walled garden, but the walls were torn down by data availability, scalable technology, and open source tools. It is now becoming an integrated process. Data Governance is a process, not an activity!
  • 27. The Global Map and Data Options: move the data to the processing; move the processing to the data; move the processing and the data; shard. All network nodes can be data creators, data stores, and processing points.
  • 28. Logical Data Lakes Soon we will be speaking of a logical data lake and multiple physical data lakes
  • 30. Big Data, Event Data: The Data of Everything. What is Big Data? Business data, traditional data, log file data, operational data, mobile data, location data, social network data, public data, commercial databases, streaming data, and the Internet of Things.
  • 31. Events: Atoms and Molecules. The ATOM of data has become the EVENT. A TRANSACTION is a MOLECULE of ATOMIC EVENTS.
  • 32. It’s Become an Event-Based World
  • 33. Events. Think of events as drops of water. They can live in streams, and they can also live in data pools and data lakes.
  • 35. The Traffic Cop (Events)
  • 36. Event Types: Instantiation Event; State Report; Trigger Event; Correction Event. We also need to consider Data Refinement, Aggregations, Homogeneous Collections, and Derived Data.
  • 37. Event-Based IoT Architecture. The pulse and the threshold alert. Some of this involves distributed processing. There are known apps and unknown apps, so analytical exploration needs to be enabled. Only aggregations will migrate. (Figure: three processing tiers, Source Proc. at the sensors, controllers, and CPUs; Depot Proc. at the depots; and Central Proc. at the Central Hub, with data passing between them.)
  • 38. Events and Event Data: Time; Geographic location; Virtual/logical location; Source device; Device ID; Actors; Ownership/Provenance; Values.
  • 39. Spark, Storm, Flink & Kafka. Spark has dethroned Hadoop as a platform and has momentum for both micro-batch and streaming. Storm provides streaming (event processing capabilities) and, paired with a batch layer in the lambda architecture, supports batch and streaming concurrently. Flink was purpose-built for streaming. Kafka is the pipe. Lambda and Zeta Architectures…
  • 40. In Summary: 1. Disturbance in the Force. 2. What is a Data Lake, exactly? 3. Streams and Events.
  • 43. THANK YOU! FIND OUT MORE at InsideAnalysis.com
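Slide 13 contrasts a 10x improvement every six years with 1000x today. To put both on a common footing, the sketch below converts a total speedup over a period into an equivalent compound yearly rate. It assumes, since the slide does not say, that the 1000x figure applies over the same six-year window; the function name `annual_factor` is hypothetical.

```python
def annual_factor(total_speedup: float, years: float) -> float:
    """Return the constant yearly multiplier that compounds to total_speedup over `years`."""
    if total_speedup <= 0 or years <= 0:
        raise ValueError("total_speedup and years must be positive")
    return total_speedup ** (1.0 / years)


if __name__ == "__main__":
    # Older trend stated on the slide: 10x every 6 years.
    print(f"10x over 6 years:   about {annual_factor(10, 6):.2f}x per year")
    # Newer trend: 1000x, ASSUMED (not stated on the slide) to span the same 6 years.
    print(f"1000x over 6 years: about {annual_factor(1000, 6):.2f}x per year")
```

Under that assumption the yearly improvement rises from about 1.47x to about 3.16x, since the sixth root of 1000 is the square root of 10.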
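Slide 22 lists "corrupted in flight" among the ways data goes wrong. One common safeguard is to send a checksum with the payload and verify it on arrival. The sketch below uses SHA-256 from Python's standard `hashlib`; the function names `package` and `verify` are hypothetical and not part of any product discussed in the webcast.

```python
import hashlib


def package(payload: bytes) -> tuple[bytes, str]:
    """Sender side: pair the payload with its SHA-256 hex digest."""
    return payload, hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected_digest: str) -> bool:
    """Receiver side: recompute the digest and compare it with the one sent."""
    return hashlib.sha256(payload).hexdigest() == expected_digest


if __name__ == "__main__":
    data, digest = package(b"sensor=42,temp=21.5")
    print(verify(data, digest))                    # True: payload intact
    print(verify(b"sensor=42,temp=21.9", digest))  # False: changed in transit
```

A checksum only catches changes between sender and receiver; it cannot detect data that was born wrong or corrupted by bad code before it was sent.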
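Slide 25 names data provenance and lineage as governance concerns. One way to picture them is a record that travels with each dataset: provenance says where the data came from, and lineage grows with every transformation. The sketch below is a minimal illustration under that assumption; the `LineageRecord` class, its fields, and the example source and step names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Provenance (where data came from) plus lineage (what was done to it)."""
    source: str
    steps: list[str] = field(default_factory=list)

    def derive(self, step: str) -> "LineageRecord":
        # Return a NEW record so the parent dataset's history stays unchanged.
        return LineageRecord(self.source, self.steps + [step])


if __name__ == "__main__":
    raw = LineageRecord(source="crm_export")  # hypothetical source name
    loaded = raw.derive("dedupe customers").derive("load to warehouse")
    print(raw.steps)                  # []
    print(loaded.source, loaded.steps)  # crm_export ['dedupe customers', 'load to warehouse']
```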
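Slide 27 lists sharding and moving the processing to the data as options, and slide 37 says only aggregations will migrate. The sketch below combines those ideas in a single process: records are partitioned by key, each shard is reduced locally, and only the small partial results are merged. The shard count, the use of CRC32 as a stable hash, and all function names are illustrative assumptions.

```python
import zlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    # CRC32 is stable across runs, unlike Python's salted built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_records(records, num_shards):
    """Partition (key, value) records into num_shards buckets by key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


def local_sums(shard):
    """Runs 'at the data': each node reduces its own shard to per-key totals."""
    totals = defaultdict(float)
    for key, value in shard:
        totals[key] += value
    return dict(totals)


def merge(partials):
    """Only the partial results are moved and combined centrally."""
    combined = defaultdict(float)
    for partial in partials:
        for key, total in partial.items():
            combined[key] += total
    return dict(combined)


if __name__ == "__main__":
    readings = [("pump-1", 2.0), ("pump-2", 1.5), ("pump-1", 3.0), ("fan-7", 0.5)]
    shards = shard_records(readings, num_shards=3)
    print(merge(local_sums(s) for s in shards))
```

Because every record for a given key lands on the same shard, the merged totals match what a single central pass would compute.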
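Slide 31's metaphor, a transaction as a molecule of atomic events, can be made concrete: each event records one indivisible change, and a transaction groups the events that belong together. The sketch below assumes an order-placement example; the class names, fields, and event kinds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An atomic event: one thing that happened."""
    kind: str
    amount: float


@dataclass(frozen=True)
class Transaction:
    """A 'molecule': a named group of atomic events that belong together."""
    txn_id: str
    events: tuple[Event, ...]

    def net_amount(self) -> float:
        return sum(e.amount for e in self.events)


if __name__ == "__main__":
    order = Transaction(
        txn_id="T-1001",
        events=(
            Event("item_added", 25.0),
            Event("item_added", 10.0),
            Event("discount_applied", -5.0),
        ),
    )
    print(len(order.events), order.net_amount())  # 3 30.0
```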
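The four event types on slide 36 can be sketched as an enumeration, with a function showing one possible reading of how each acts on stored data: an instantiation creates an entity, a state report appends a new reading, a correction replaces the latest reading, and a trigger leaves state alone but requests an action. These semantics are an interpretation for illustration, not definitions from the webcast; all names are hypothetical.

```python
from enum import Enum, auto


class EventType(Enum):
    INSTANTIATION = auto()  # a new entity comes into existence
    STATE_REPORT = auto()   # a fresh reading of an existing entity's state
    TRIGGER = auto()        # a signal that some action should follow
    CORRECTION = auto()     # replaces the most recent (wrong) reading


def apply_event(history: dict, etype: EventType, entity: str, value=None) -> list:
    """Apply one event to `history` (entity -> list of readings) in place.

    Returns a list of actions to take; only TRIGGER events produce actions here.
    """
    if etype is EventType.INSTANTIATION:
        if entity in history:
            raise ValueError(f"{entity} already exists")
        history[entity] = [] if value is None else [value]
    elif etype is EventType.STATE_REPORT:
        history[entity].append(value)  # KeyError if never instantiated
    elif etype is EventType.CORRECTION:
        if not history.get(entity):
            raise ValueError(f"nothing to correct for {entity}")
        history[entity][-1] = value
    elif etype is EventType.TRIGGER:
        return [f"act on {entity}"]
    return []


if __name__ == "__main__":
    history = {}
    apply_event(history, EventType.INSTANTIATION, "valve-3", 10)
    apply_event(history, EventType.STATE_REPORT, "valve-3", 12)
    apply_event(history, EventType.CORRECTION, "valve-3", 11)
    print(apply_event(history, EventType.TRIGGER, "valve-3"))  # ['act on valve-3']
    print(history)                                             # {'valve-3': [10, 11]}
```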
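Slide 37 pairs "the pulse and the threshold alert" with the rule that only aggregations migrate. A depot-level sketch of that idea: the depot receives a pulse of sensor readings, raises an alert locally when a reading crosses a threshold, and forwards only a compact summary to the central hub. The threshold value, the count/average/peak summary, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """The aggregate that migrates to the central hub in place of raw readings."""
    count: int
    average: float
    peak: float
    alerts: int


def process_pulse(readings: list[float], threshold: float) -> tuple[list[str], Summary]:
    """Depot-side processing: raise alerts locally, forward only a summary."""
    if not readings:
        raise ValueError("empty pulse")
    alerts = [f"ALERT: reading {r} exceeds {threshold}" for r in readings if r > threshold]
    summary = Summary(
        count=len(readings),
        average=mean(readings),
        peak=max(readings),
        alerts=len(alerts),
    )
    return alerts, summary


if __name__ == "__main__":
    pulse = [20.1, 20.4, 35.2, 20.3]  # hypothetical temperature readings
    alerts, summary = process_pulse(pulse, threshold=30.0)
    for alert in alerts:
        print(alert)  # ALERT: reading 35.2 exceeds 30.0
    print(summary)    # only this small record would travel to the hub
```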
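The attributes on slide 38 map naturally onto a record type. The sketch below is one way to model them; the field names, types, and sample values are assumptions, and a real system would define its own schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EventRecord:
    time: datetime                               # when the event occurred
    geo_location: Optional[tuple[float, float]]  # (latitude, longitude), if known
    logical_location: str                        # virtual/logical location
    source_device: str                           # kind of device that emitted it
    device_id: str                               # identifier of that device
    actors: tuple[str, ...]                      # people or systems involved
    owner: str                                   # ownership/provenance
    values: dict[str, float] = field(default_factory=dict)  # the measured values


if __name__ == "__main__":
    evt = EventRecord(
        time=datetime(2016, 10, 26, 17, 0, tzinfo=timezone.utc),
        geo_location=(51.5, -0.12),
        logical_location="plant-A/line-2",
        source_device="temperature sensor",
        device_id="TS-0042",
        actors=("line-2 controller",),
        owner="Plant A operations",
        values={"celsius": 21.5},
    )
    print(evt.device_id, evt.values["celsius"])  # TS-0042 21.5
```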
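Slide 39 mentions the lambda architecture. In that design a batch layer periodically recomputes views from the complete master dataset, a speed layer incrementally covers events that arrived since the last batch run, and queries merge the two. The sketch below shows that merge for simple per-key counts. It runs in one process and does not use Spark, Storm, Flink, or Kafka; all names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Per-key event counts served by merging a batch view with a real-time view."""

    def __init__(self):
        self.master: list[str] = []         # append-only master dataset
        self.batch_view: Counter = Counter()
        self.realtime_view: Counter = Counter()

    def ingest(self, key: str) -> None:
        self.master.append(key)             # every event lands in the master dataset
        self.realtime_view[key] += 1        # speed layer: incremental update

    def run_batch(self) -> None:
        # Batch layer: recompute from scratch over ALL data, then reset the
        # real-time view, whose events the fresh batch view now covers.
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        # Serving layer: merge both views at query time.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    lc = LambdaCounter()
    for k in ["click", "view", "click"]:
        lc.ingest(k)
    lc.run_batch()
    lc.ingest("click")
    print(lc.query("click"), lc.query("view"))  # 3 1
```

Because this sketch is single-threaded, no events arrive while the batch runs; a real deployment has to handle that overlap explicitly.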