Information Management
Reference Architecture
3rd Evolution
EMEA Enterprise Architecture
Contents
 Introduction
 Conceptual view
 Design Patterns
 IM Logical view and component outline
 Discovery Lab
 R/T Event Engine logical view
 Mapping to previous Reference Architecture release
Introduction
Introduction
 This PPT documents the main architectural components of Oracle's
Information Management Reference Architecture.
 The architecture is intended to be practical and pragmatic; many of the
ideas and experiences that inform the approach date back almost 20 years in
Oracle and are based on real-world customer experiences.
 We define Information Management as follows. Note that this definition
embraces all types and forms of data, as well as aspects such as
Information Discovery and Governance:
“Information Management is the means by which an organisation maximises the efficiency
with which it plans, collects, organises, uses, controls, stores, disseminates, and disposes
of its Information, and through which it ensures that the value of that information is
identified and exploited to the maximum extent possible”
3rd Evolution of Oracle's Information Management Reference Architecture
Oracle’s Information Management Reference Architecture (3rd Edition)
 More relevant to Big Data oriented audience
 Better representation of pragmatic customer projects
 Includes Raw data store as part of the architecture
 Shows the effort / cost to store and interpret data that separates
schema-on-read and schema-on-write approaches
 Aligned to Analytics 3.0
 Consistent with Oracle's engineering efforts
What's changed?
Aligning analytical requirements and IM architecture
Enabling Analytics 3.0 with a pragmatic architecture
Analytics 2.0
Analytics 3.0
Analytics 1.0
• Reporting with limited use of
descriptive analytics
• Limited range of tabular data
• Batch oriented analysis
• Analysis bolted onto limited
set of business processes
• Firms “Competing on Analytics”
• Extended analytics to larger
and less structured datasets
• Emergence of Big Data into the
commercial world
• Recognition of Data Science
role in commercial orgs.
• Platform for monetisation
• Deeper analysis & more data
• Faster test-do-learn iterations
• Different types of data & wider
business process coverage
• Analysts focus on discovery and
driving business value
• “Agile” with operational elements
incorporated into design patterns
Adapted from Tom Davenport material
Oracle’s Information Management Reference Architecture (3rd Edition)
“All those layers and definitions in your
Reference Architecture, I just don’t get
it… and it looks complicated !”
Hadoop developer knee-deep in complex MapReduce code
What's changed?
Business
Trends
Technology
Trends
Data
Trends
Conceptual View
Actionable
Events
Event Engine Data
Reservoir
Data Factory Enterprise
Information Store
Reporting
Discovery Lab
Actionable
Information
Actionable
Insights
Input
Events
Execution
Innovation
Discovery
Output
Events
& Data
Conceptual View
Structured
Enterprise
Data
Other
Data
Component Outline
Data Engine – Respond to R/T events in appropriate and/or optimised fashion
Data Reservoir – Raw data Reservoir – typically event data at lowest grain
Data Factory – Managed ETL onto, within and between platforms
Enterprise Data – Data stores for Information Management
Reporting – BI tools and infrastructure components
Discovery Lab – Platform, data and tools to support discovery process
Execution – things you do every day
Innovation – innovation to drive tomorrow's business
Line of Governance!
Discovery
Output
– Possible outputs include new knowledge, mining models / parameters, scored data…
Design Patterns
Design Pattern: Discovery Lab
 Specific focus on identifying commercial value for exploitation
 Small group of highly skilled individuals (aka Data Scientists)
 Iterative development approach – data oriented NOT development oriented
 Wide range of tools and techniques applied
 Data provisioned through
Data Factory or own ETL
 Typically separate infrastructure
but could also be unified Reservoir
if resource managed effectively
Design Pattern : Information Platform
 Build the next generation Information Management platform
 Either Business Strategy driven or IT cost / capability driven initiative
 Initial project may be specifically linked to lower data grain or retention
BUT it is the platform as a whole that forms the solution required
 Platform for consolidating other IM assets onto
 Key issues related to differences in
procurement, development process,
governance and skills differences
 Discovery Lab may be implemented
as a pragmatic initial POV.
Design Pattern : Data Application
 Big Data technologies applied to a specific business problem
e.g. Genome sequence analysis using BLAST or log data from
pharmaceutical production plant and machinery required for traceability
 Limited or no integration to broader Information Management estate
 Specific solution so Non-functional requirements have less impact
on solution quality or long term costs
 Platform costs and scalability are
important considerations
Design Pattern: Information Solution
 Specific solution based on Big Data technologies requiring broader
integration to the wider Information Management estate
e.g. ETL pre-processor for the DW or affordably store a lower level of grain
 Non-functional requirements more critical in this solution
 Scalable integration to IM estate
an important factor for success
 Analysis may take place in Reservoir
or Reservoir only used as an aggregator
Design Pattern: Real-Time Events
 May take place at multiple locations between place of data origination and the
Data Centre – requiring careful design and implementation
 May include Next-Best-Activity, declarative rules and Data Mining technologies
to optimise decisions. i.e. optimise across declarative, data mining, customer
preference & business-defined rules
 May include considerations for
personal preferences and privacy
(e.g. opt-out) for customer related
events
 Common component seen across
many industries & markets
e.g. connected vehicle
Real-Time optimisation of events
Design Pattern against component usage map
Design pattern Discovery Lab
Information
Platform
Data Application Information Solution R/T Events
Outline
Data science lab
Assess the value of
the data
Next Generation
information platform to
align IM capability with
business strategy
Addressing a specific data
problem in Hadoop with no
broader integration required.
Addressing a specific data
problem but requires broader
enterprise wide integrations. e.g.
ETL pre-processing, Event Store
at lower grain than existing DW
Execution platform to
respond to R/T events
Examples
Gov. Healthcare
Mobile operator
Spanish Bank (Business led)
UK Gov. Dept. (Tech. led)
Pharma Genome project
Pharma production archive
Investment Bank – trade risk
Mobile Operator – ETL processing
Mobile operator –
location based offers
Data Engine Possible Yes
Data Reservoir Yes Yes Yes
Data Factory Yes Yes Yes
Enterprise Data Yes
Reporting Yes
Discovery Lab Yes Implied Alternative approach
to Reservoir + Factory above
IM Logical View and
Components
Information Management – Logical View
Data Sources
Data Ingestion
Methods and process
to load data into our
managed data store
and manage data
quality
• Contemporary Information Management solutions must be able to ingest any type of data from any source in any format and
mechanism and at any frequency. e.g. Flat file loads, streaming…
• The data may be highly unstructured, mono-structured or highly poly-structured.
• Data will vary in volume and in Data Quality.
• Operational isolation should be considered to ensure operational applications will continue in the event of the loss of the
Information Management system
Data Engines &
Poly-structured
sources
Content
Docs Web & Social Media
SMS
Structured
Data
Sources
• Operational Data
• COTS Data
• Master & Ref. Data
• Streaming & BAM
Information Management – Logical View
Information Ingestion
Data Ingestion
Information Interpretation
Methods and process
to load data and
manage Data
Quality
Methods and
process needed to
access information
Managed Data
Load
All data under management
Query
• Data structure and processing required to load data into managed data stores
• Shape represents the work done to load the data and/or process it between layers
• Layer may include file mechanism where required to facilitate loading
(e.g. Fuse fs or ZFS for operational isolation and file concatenation)
• Normal rules apply: micro-batch loading, taking all the data, and KISS principles are recommended
• DQ and loading stats presented through BI dashboards as a non-judgemental mechanism to improve DQ.
• Data may be landed in the Ingestion layer to facilitate loading but not typically stored for any length of time. e.g. Raw data loaded from web
logs but sessionised data then loaded to Raw. Another example is that data used to manage CDC may be stored in this layer.
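The web-log example above can be sketched in miniature: raw click events land first, and a sessionised form is derived before loading to Raw. This is an illustrative sketch only; the field names, tuple layout and the 30-minute inactivity timeout are assumptions, not part of the architecture.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionise(events):
    """Group raw click events into sessions per user.

    events: iterable of (user_id, timestamp) tuples.
    Yields (user_id, session_number, timestamp) tuples.
    """
    last_seen = {}  # user_id -> (last timestamp, session counter)
    for user, ts in events:
        prev_ts, session = last_seen.get(user, (None, 0))
        # Start a new session on first sight or after a long gap.
        if prev_ts is None or ts - prev_ts > SESSION_TIMEOUT:
            session += 1
        last_seen[user] = (ts, session)
        yield user, session, ts

raw = [
    ("u1", datetime(2013, 5, 1, 9, 0)),
    ("u1", datetime(2013, 5, 1, 9, 10)),
    ("u1", datetime(2013, 5, 1, 11, 0)),  # >30 min gap: new session
    ("u2", datetime(2013, 5, 1, 9, 5)),
]
for row in sessionise(raw):
    print(row)
```

In practice this step would run in the ingestion layer (e.g. as a MapReduce or ETL job) with the un-sessionised landing data discarded afterwards, as the bullet above describes.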
Information Management – Logical View
Data Interpretation
Data Ingestion
Information Interpretation
Methods and process
to load data and
manage Data
Quality
Methods and
process needed to
access information
Managed Data
Load
All data under management
Query
• Methods and processes required to access information in each of the stores
• Shape represents the cost of interpreting the data under management
• For schema-on-read the cost may include the Avro schema, SerDe or reader class as well as the associated processing code to
select, filter and process the data.
• For schema-on-write the cost is represented only by the complexity of the SQL required to access the data – typically more
complex for 3NF than for a dimensional query.
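As a toy illustration of that cost difference: under schema-on-read the interpretation lives in code that every consumer must carry. The reader function below is a hypothetical stand-in for an Avro/SerDe reader class; the record layout is invented for the example.

```python
import csv
import io

# Schema-on-read: the "schema" lives in the reader code. Every program
# that touches the raw file must carry (and keep consistent) this logic.
RAW = (
    "2013-05-01T09:00:00,u1,page_view,/home\n"
    "2013-05-01T09:10:00,u1,click,/buy\n"
)

def read_events(raw_text):
    """Hypothetical reader: interpret raw text as typed event records."""
    for ts, user, action, path in csv.reader(io.StringIO(raw_text)):
        yield {"ts": ts, "user": user, "action": action, "path": path}

clicks = [e for e in read_events(RAW) if e["action"] == "click"]

# Schema-on-write: the interpretation was paid once at load time, so
# access reduces to a query against an agreed structure, e.g.
#   SELECT * FROM events WHERE action = 'click'
```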
Information Management – Logical View
Data Layers – cost, quality and concurrency trade off
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
• Increasing enrichment
• Increasing data quality
• Reducing concurrency costs
• Data under management includes 3 key layers – Raw, Foundation and Access and Performance layers.
• Data normally loaded into Raw and Foundation layers BUT BI Apps loads data directly into APL and federated warehouses may
well also load data at aggregate level from federated operating companies.
• Data Factory is responsible for loading and then managing data between layers.
• Work is done to elevate the data between layers – typically further enriching and improving data quality.
• Work done in processing the data between the layers significantly reduces query costs. i.e. higher levels of concurrency can be
sustained for the same processing power.
• Increasing formalisation of definition
Information Management – Logical View
Data Layers – Analytical processing
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
• Analytical processing capabilities of Hadoop and RDBMS used to elevate data between layers as previously described.
• These analytical capabilities can also be leveraged by tools that access the data directly.
Typically this would be by a Data Scientist for Discovery Lab operations or BI Tools and Services that are processing data using
a model previously defined by the Data Scientist.
OLAP
Data Mining
Statistics
OLAP
Text Mining
Other
Analytical
Processing
Data Mining
Text Mining
Image
Processing
• Increasing enrichment
• Increasing data quality
• Reducing concurrency costs
• Increasing formalisation of definition
Information Management – Logical View
Data Layers – Raw Data Reservoir
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
• Immutable data store with data at lowest level of grain.
• Typically implemented in Hadoop or NoSQL for cost reasons but not always.
• May be:
• Queried directly,
• Used to derive base level data for Foundation Layer. Data may be represented logically in Foundation or physically as the
store is immutable BUT this affects ILM policy.
• or used to derive values or aggregates for Access and Performance layer. (e.g. propensity score or total monthly SMSs)
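The monthly-SMS aggregate mentioned above can be sketched as a simple derivation from raw events. A minimal sketch; the field names and tuple layout are illustrative:

```python
from collections import Counter
from datetime import datetime

def monthly_sms_totals(events):
    """Roll raw SMS events (customer_id, timestamp) up to a
    per-customer, per-month total suitable for the Access and
    Performance Layer."""
    totals = Counter()
    for customer, ts in events:
        totals[(customer, ts.year, ts.month)] += 1
    return totals

raw_events = [
    ("c1", datetime(2013, 5, 1, 9, 0)),
    ("c1", datetime(2013, 5, 2, 9, 0)),
    ("c1", datetime(2013, 6, 1, 9, 0)),
    ("c2", datetime(2013, 5, 3, 9, 0)),
]
totals = monthly_sms_totals(raw_events)
```

Because the Raw store is immutable, such aggregates can always be rebuilt from scratch, which is why the layer above can treat them as disposable (subject to the ILM caveat noted).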
Information Management – Logical View
Data Layers – Foundation Data Layer
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
• Immutable integrated and standardised store of enterprise class data. Stuff the business has agreed and organises around.
• Data at lowest level of grain of value for Enterprise data.
• Stored in business process neutral fashion to avoid data maintenance tasks to keep in step with current business interpretations.
• Typically close to 3NF. Special attention to modelling hierarchy, flexible entity attributions, customer / supplier etc.
• ONLY implemented in relational technology BUT this could be logical as previously noted in Raw Data Reservoir.
• May be queried directly by a select few individuals. Wider access to detail data provided through views in APL, often with VPD
implemented to prevent queries to antecedent data.
• Data in the Foundation Layer should be retained for as long as possible.
• Consideration should be given to retaining data in Raw Data Reservoir rather than archiving.
Information Management – Logical View
Data Layers – Access and Performance Layer
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
• Layer facilitates access, navigation and performance of queries.
• Allows for multiple interpretations of data from Foundation or Raw data Reservoir.
• Most structures can be thrown away and re-built from scratch based on Foundation and Raw Reservoir.
• The exception is derived and aggregate data which may have to be retained if the underlying data/mechanism is archived.
• Most users presenting information in a standardised fashion on dashboards and reports will access this layer only.
Access and Performance Layer
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Data Engines &
Poly-structured
sources
Content
Docs Web & Social Media
SMS
Structured
Data
Sources
• Operational Data
• COTS Data
• Master & Ref. Data
• BAM Data
• Data destined for Raw Data Reservoir may be loaded directly (e.g. through Flume) or may be stored temporarily in fs prior to
loading (e.g. Fuse fs)
• Relational data ingested via the most appropriate mechanism before persisting in Foundation Data Layer (usual rules apply…)
• Ideally micro batch using simplest mechanism possible
• Only data of agreed quality loaded in FDL
• For efficient relational loading, data may be pre-staged in fs so a large number of small files can be concatenated
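A minimal sketch of the small-file pre-staging idea, assuming sequence-named extract files in a landing directory (the paths and naming are invented for the example):

```python
import os
import tempfile

def concatenate_small_files(src_dir, out_path):
    """Concatenate many small landed files into one large file so the
    relational loader can make a single efficient pass over the data."""
    with open(out_path, "w") as out:
        # Sort by name so sequence-numbered extracts load in order.
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name)) as part:
                out.write(part.read())

# Simulate a landing directory full of small extract files.
staging = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(staging, f"extract_{i:04d}.csv"), "w") as f:
        f.write(f"row{i}\n")

merged = os.path.join(tempfile.mkdtemp(), "merged.dat")
concatenate_small_files(staging, merged)
```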
Information Management – Logical View
Data Factory Ingestion flow
Data Ingestion
Batch & Real-Time
ETL / ELT
CDC
Stream
File Ops.
Access and Performance Layer
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Flow shown:
1. Data to be formalised from HDFS store extracted and loaded into Foundation Data Layer.
e.g. where Flume/HDFS is being used as an ETL pre-processor for Enterprise Data
or where HDFS data is being logically modelled in the foundation layer
2. Data is re-structured and/or aggregated to facilitate access by users and business processes
3. Data may also be re-structured and/or aggregated from HDFS store where there are no specific
requirements to manage Enterprise Data in a more formal data store over time
Information Management – Logical View
Data Factory intra data processing flow
Access and Performance Layer
Information Management – Logical View
Information Provisioning – BI & Data Science Components
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc BI Assets
Information
Services
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Virtualisation & Query Federation
• Data Virtualisation and the various components to access the data are as per our previous view on BI tools.
• Data Virtualisation is a key component that helps to deliver tool independence, services integration and a future state roadmap
• Big Data has focused considerable attention on Data Science
• Analytical capabilities delivered through analytical processing in the data layers and Advanced Analytical Tools used to drive capabilities
• Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and model results
are typically written to a project based sandbox.
• Agile discovery is often best served through a separate Discovery Lab infrastructure (see later details)
Data Science
Access and Performance Layer
Information Management – Logical View
Information Provisioning BI Flows
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc BI Assets
Information
Services
Data Science
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
1. Typical access mechanism for Enterprise data via Access and Performance layer structures
2. Access to Foundation Layer Data to specific functions, processes and users only
3. Data interpretation & DQ assured through encoded logic, Avro, SerDe, FileReader, HCat etc.
4. Diagonal flows show how data can be joined between layers as well as accessed directly. e.g. Raw Data
can be queried directly through HIVE connector or joined to the RDBMS data and queried.
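Flow 4 can be illustrated in miniature, with sqlite standing in for the RDBMS side and a plain list standing in for counts surfaced from the Raw layer (in practice via a Hive connector or external table). All table and column names here are hypothetical:

```python
import sqlite3

# Stand-in RDBMS (sqlite) holding the modelled customer dimension.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, segment TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [("c1", "gold"), ("c2", "bronze")])

# Raw-layer event counts, as a Hive query would surface them.
raw_counts = [("c1", 42), ("c2", 7)]
db.execute("CREATE TEMP TABLE raw_counts (id TEXT, events INTEGER)")
db.executemany("INSERT INTO raw_counts VALUES (?, ?)", raw_counts)

# The diagonal flow: join raw grain to enterprise dimension data.
rows = db.execute("""
    SELECT c.segment, SUM(r.events)
    FROM raw_counts r JOIN customers c ON c.id = r.id
    GROUP BY c.segment
""").fetchall()
```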
Information Management – Logical View
Data / Information Quality
Access and Performance Layer
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc BI Assets
Information
Services
Data Science
 Quality of data at rest assured by a number of factors in addition to the underlying quality of data at source
– File and event handling to ensure data is not missed (e.g. missing log files assured by log file sequence numbering)
– The processing of data between Raw and FDL / APL layers. This can be seen as a DQ firewall to ensure only data of known and
acceptable quality is loaded. Typically this involves an element of synchronisation as some data will need to be held off until required
reference data is available due to the micro-batch incremental loading approach.
 Quality of information presented to downstream tools and services determined by
– Model quality, understanding and performance of provisioning from modelled layers
– Consistency of definition, code quality and query performance when accessing Hadoop data (e.g. MR code, Avro definition…)
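The log-file sequence-numbering check mentioned above amounts to gap detection: any missing number means a file was never received and the load should be held or the file re-requested. A minimal sketch, assuming an unbroken ascending numbering scheme:

```python
def missing_sequences(received):
    """Return the sequence numbers missing from the files actually
    received, so the micro-batch load can be held or files re-requested."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(received), max(received) + 1)
            if n not in seen]

# e.g. files 101..105 expected but 103 never arrived:
gaps = missing_sequences([101, 102, 104, 105])
```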
Information Management – Logical View
Data Reservoir & Enterprise Information Store
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc
BI Assets
Information
Services
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Data
Science
Data Engines &
Poly-structured
sources
Content
Docs Web & Social Media
SMS
Structured
Data
Sources
• Operational Data
• COTS Data
• Master & Ref. Data
• Streaming & BAM
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
Discovery Lab
Analysis Processing & Delivery
Discovery Lab & Data Science Tooling
Data Reservoir & Enterprise Data
Data
Science
(Primary
Toolset)
Statistics Tools
Data & Text Mining Tools
Faceted Query Tools
Programming & Scripting
Data Modeling Tools
Query & Search Tools
Pre-Built
Intelligence
Assets
Intelligence
Analysis
Tools
Ad Hoc Query
& Analysis Tools
OLAP Tools
Forecasting &
Simulation Tools
Reporting Tools
Data
Scientist
Virtualisation & Information Services
Data Factory
flow
1. Data Factory responsible for
access provisioning to data
or replication (all or sample)
to Sandbox in Discovery Lab.
2. Direct connection from Data
Science tools and analysis
sandbox. Data Science tools
read and write data from/to
project sandboxes.
3. Data Scientist can also
access standard dashboards,
reports and KPIs through
Data Virtualisation layer
Data Quality & Profiling
Graphical rendering tools
Dashboards & Reports
Scorecards
Charts & Graphs
Sandbox – Project 3
Sandbox – Project 2
Sandbox – Project 1
Data store
Analytical
Processing
Information Management – Logical View
Discovery Lab data flow
R/T event Engine – Logical
View and Components
Real-time
Data Engine
To Event Subscribers
(Events / Data)
Privacy Filter
Data Transform
Rules & Models
Mediation
Next Best Action
Real-Time
Data Store
From Input Events
Reference
Data
Models
& Rules
Privacy
Data
Analytics
Real-Time Data Engine – Logical View
Business Activity Monitoring
Real-Time event
monitoring
Real-Time Data Engine
 Message mediation service
 Privacy filter for event data. i.e. apply customer specified privacy
and preference filters to the data stream
 Transformation of the message data to outbound form
 Apply declarative rules and models to the data stream to detect
events for further downstream processing
 Next Best Activity (NBA) event detection and processing. NBA
typically also includes control group management and global
optimisation of rules
 Business Activity Monitoring
 Local data store – local persistence of rules and metadata
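A minimal sketch of how the privacy filter, transform and rules components might chain together over an event stream. The event fields, the opt-out set and the rule threshold are all illustrative, not part of the reference architecture:

```python
def privacy_filter(events, opted_out):
    """Drop events for customers who have opted out."""
    return (e for e in events if e["customer"] not in opted_out)

def transform(events):
    """Shape each message to its outbound form."""
    for e in events:
        yield {"customer": e["customer"], "amount": e["amount"]}

def apply_rules(events, threshold=100):
    """Declarative rule: flag high-value events for downstream action."""
    for e in events:
        e["action"] = "offer" if e["amount"] >= threshold else "ignore"
        yield e

incoming = [
    {"customer": "c1", "amount": 250, "channel": "web"},
    {"customer": "c2", "amount": 30, "channel": "sms"},  # opted out
    {"customer": "c3", "amount": 120, "channel": "web"},
]
out = list(apply_rules(transform(privacy_filter(incoming, {"c2"}))))
```

In a real engine these stages would sit behind the mediation service and run against a continuous stream, with NBA, control groups and BAM layered on top, as the bullets above describe.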
Components
Privacy Filter
Data Transform
Rules & Models
Mediation
Next Best Action
Real-Time Data
Store
BAM
Real-Time data engine flows
 Describe each of the data flows
Reference
Data
Models
& Rules
Privacy
Data
Event Analytics
From Input Events
To Event Subscribers
(Events / Data)
R/T Event Monitoring
To Do
Mapping from the previous
release of the architecture
Information Management Reference Architecture
Version 2.0 of the Architecture
Information Management Reference Architecture
Interpretation layer
shows the relative cost
of reading data
depending on its
location
Previous staging layer
now split into Data
Ingestion and Raw
store.
Ingestion layer
includes methods and
processes to load data
and manage Data
Quality. Shape
represents the relative
cost of these
processes. i.e. from
none for HDFS to lots
in APL.
Raw Reservoir is
typically at the lowest
level of grain. Often
lower than the
enterprise cares about
and so may not have
been included in
previous
representation.
Renamed from
Knowledge Discovery
to Discovery Lab but
otherwise unchanged.
The role of Discovery
Labs is becoming
more central though so
additional operational
guidance will be
added.
Discovery Lab
Still an immutable
store but may be
physically
implemented in
relational or non-
relational technologies
Key differences from 2.0 to 3.0 of the Architecture
Discovery Lab and
Governance considerations
Data discovery for the Enterprise
 Discovery phase
– Unbounded discovery
– Self-Service sandbox
– Wide toolset
– Agile methods
 Promotion to Exploitation
– Commercial exploitation
– Narrower toolset
– Integration to operations
– Non-functional requirements
– Code standardisation &
governance
Discovery and monetising steps have different requirements
Business
Value
Commercial
Exploitation
Time / Effort
Discovery phase
Understanding
of the data
Governance
To monetise fully you need to standardise
It's smart to standardise as part of Governance
 Discovery process requires
a broad toolset
 Standardisation is essential
for Commercial exploitation
 Sustainability depends on
standardisation / rationalisation
– Reduced training burden
– Reduced support costs
– Reduced license costs
– Ongoing agility & alignment
Data Discovery Toolset Data Exploitation Toolset
Rationalised
Components
• Cloudera CDH, Oracle, NoSQL
• Mammoth, Yarn, EM-plugin
• MR, Hive, Pig, Impala, Accum.
• Flume NG, Oozie
• …
• …
• …
Optional additions
• Oracle Connectors
• Additional corporate standard
components
Oracle standard deployment
Corporate
standard
Standardised Hadoop Zoo
Standardised deployment
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
The kind of things we are looking to Discover
 Data science skills required
vary by the type of analysis
 Data Management skills vary
by the amount of data and its
structure
 So making data movement
and manipulation easy will
deliver a better result and
deliver it faster
Descriptive
Diagnostic
Predictive
Prescriptive
Business Impact
Analytical Skills
Insight
Foresight
Discovery is a Data process not a Development Process
Three Versions of the BI Development Process
What IT thinks it should be:
Requirement Analysis → High Level Design → Low Level Design → Coding → Testing → Acceptance Testing
What normally happens:
Excel Spreadsheet → Shared linked spreadsheets → Local Access Database → Shared Access Server → SQL Server Database → Oracle Data Warehouse
What Big Data is trying to achieve:
Discovery & Profile → Model → Exploit
Sandbox delivery options
• Separate Data Lab environment
• Delivered as part of Information
Management architecture
Self Service Sandboxes
• Self service provisioning of new
sandboxes for Discovery phase
• Automation of data access rights,
resources and tools provisioning
Data provision
• Quickly take on new data to
rapidly make available to Analysts
• Tools such as “Data Factory” can
fully automate data flows
Sandboxes facilitate “Agile”
Providing the technology platform for agile discovery
Monetise and Optimise steps are different
 New insights deployed into business process in some form
– Technical: e.g. Business rules, new customer segments
– Non-technical: e.g. Observations about behaviours
 Business Intelligence systems adapted to provide
monitoring, feedback and control optimisation
 The faster you iterate this cycle the greater the benefit BUT
 Big Data does not change the fundamental need
for accurate, consistent and integrated information
What happens when we want to exploit insights?
New
insights
Business Process
Rules of thumb for data
Organised information leads to better analyses
Information needs to be organised in order to analyse it
RDBMS are great when information is organised
Hadoop minimises the penalty for disorganisation
The closer you are to insight, the more complete and
organised information needs to be
Data needs to be organised to monetise it effectively
What that really means is…
 We need to apply structure
to data in order to analyse it
 Schema on read works well
for us in Discovery as we can
be agile about interpretation
 As we move into Exploitation
schema on read can cause
Governance & quality issues
 Key lesson: The cost to store & manage is distinct from structural
considerations between Big Data and RDBMS technologies
De-mystifying schema on read
DQ
Bus. Rules
Mapping
ETL
Data Reservoirs
 Traditional “Schema on Write”
– Data quality managed by formalised ETL
process
– Data persisted in tabular, agreed and
consistent form
– Data integration happens in ETL
– Structure must be decided before writing
 Big Data “Schema on Read”
– Interpretation of data captured in code for
each program accessing the data
– Data quality dependent on code quality
– Data integration happens in code
Underlying storage capabilities are different
[Radar chart, scale 0–5, comparing Hadoop, Relational and "My Application" across:
Tooling maturity, Stringent Non-Functionals, ACID transactional requirement, Security,
Variety of data formats, Data sparsity, ETL simplicity, Cost-effective storage of
low-value data, Ingestion rate, Straight Through Processing (STP)]
Analytics 3.0 platforms include
both relational and
non-relational technologies
Ken Rudin* refers to this
as the genius of AND vs the
tyranny of OR
(see his TDWI '13 presentation)
Unified Reservoir simplifies
access to all data regardless of
characteristics & analysis
requirements
It’s smart to unify your data into a single Reservoir
Fully expose your data for discovery and monetisation
Ken Rudin is Director of Analytics at Facebook*
All
Data
Access and Performance Layer
Information Management – Logical View
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc BI Assets
Information
Services
Advanced
Analytical
Tools
Information Provisioning Analysis Processing & Delivery
Data Ingestion
Information Access
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted
from business process changes
Past, current and future interpretation of
enterprise data. Structured to support
agile access & navigation
Methods and process
to load data and
manage Data
Quality
Methods and
process needed to
access information
Access and Performance Layer
Information Management – Logical View
Analytical processing and delivery
Virtualisation & Query Federation
Enterprise
Performance
Management
Pre-built &
Ad-hoc BI Assets
Information
Services
Advanced
Analytical
Tools
Data Ingestion
Information Access
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Structures and
processing required
to load data (batch
and Real-Time)
and manage
Data Quality
Structures required
to interpret the data
under management.
i.e. logical interpretation
• Data Virtualisation and the various components to access the data are as per our previous view on BI tools.
• Data Virtualisation is a key component that helps to deliver tool independence, services integration and a future state roadmap
• What has changed is the focus on Analytics
• Analytical capabilities are delivered through analytical processing in the data layers and Advanced Analytical Tools used to drive capabilities
• Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and model results
are typically written to a project based sandbox.
• Agile discovery is often best served through a separate Discovery Lab infrastructure (described later)
Analytical processing in the data layers includes OLAP, Data Mining, Text Mining, Statistics and other analytical processing.
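The flattening step noted for Data Mining can be sketched like this. A hypothetical example with invented field names: event-level records are pivoted into one longitudinal row per customer, and the derived table is written to a project-based sandbox area (here, just a dict).

```python
from collections import defaultdict

# Event-level source records (one row per event, many rows per customer).
events = [
    {"customer": "C1", "event": "visit"},
    {"customer": "C1", "event": "purchase"},
    {"customer": "C2", "event": "visit"},
]

# Flatten to longitudinal form: one feature row per customer.
flat = defaultdict(lambda: {"visits": 0, "purchases": 0})
for e in events:
    key = "visits" if e["event"] == "visit" else "purchases"
    flat[e["customer"]][key] += 1

# Derived data and model inputs land in a project sandbox,
# keeping experimental artefacts out of the governed layers.
sandbox = {"mining_input": dict(flat)}
```

Keeping such derived tables in a sandbox rather than the Foundation layer is what preserves the Foundation's business-process-neutral, immutable character.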
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

The Evolution of Big Data and Information Management. Reference Architecture.

  • 1. Information Management Reference Architecture 3rd Evolution EMEA Enterprise Architecture
  • 2. Contents  Introduction  Conceptual view  Design Patterns  IM Logical view and component outline  Discovery Lab  R/T Event Engine logical view  Mapping to previous Reference Architecture release
  • 4. Introduction  This PPT documents the main architectural components of Oracle's Information Management Reference Architecture.  The architecture is intended to be practical and pragmatic; many of the ideas and experiences that inform the approach date back almost 20 years within Oracle and are based on real-world customer experiences.  We define Information Management as follows. Note that this definition embraces all types and forms of data, as well as aspects such as Information Discovery and Governance: “Information Management is the means by which an organisation maximises the efficiency with which it plans, collects, organises, uses, controls, stores, disseminates, and disposes of its Information, and through which it ensures that the value of that information is identified and exploited to the maximum extent possible” 3rd Evolution of Oracle's Information Management Reference Architecture
  • 5. Oracle’s Information Management Reference Architecture (3rd Edition)  More relevant to a Big Data oriented audience  Better representation of pragmatic customer projects  Includes the Raw data store as part of the architecture  Shows the effort / cost to store and interpret data that separates schema-on-read and schema-on-write approaches  Aligned to Analytics 3.0  Consistent with Oracle’s engineering efforts What’s changed?
  • 6. Aligning analytical requirements and IM architecture Enabling Analytics 3.0 with a pragmatic architecture Analytics 2.0 Analytics 3.0 Analytics 1.0 • Reporting with limited use of descriptive analytics • Limited range of tabular data • Batch oriented analysis • Analysis bolted onto limited set of business processes • Firms “Competing on Analytics” • Extended analytics to larger and less structured datasets • Emergence of Big Data into the commercial world • Recognition of Data Science role in commercial orgs. • Platform for monetisation • Deeper analysis & more data • Faster test-do-learn iterations • Different types of data & wider business process coverage • Analysts focus on discovery and driving business value • “Agile” with operational elements incorporated into design patterns Adapted from Tom Davenport material
  • 7. Oracle’s Information Management Reference Architecture (3rd Edition) “All those layers and definitions in your Reference Architecture, I just don’t get it… and it looks complicated!” – Hadoop developer knee-deep in complex MapReduce code What’s changed? Business Trends Technology Trends Data Trends
  • 9. Actionable Events Event Engine Data Reservoir Data Factory Enterprise Information Store Reporting Discovery Lab Actionable Information Actionable Insights Input Events Execution Innovation Discovery Output Events & Data Conceptual View Structured Enterprise Data Other Data
  • 10. Component Outline  Data Engine – Responds to R/T events in an appropriate and/or optimised fashion  Data Reservoir – Raw data reservoir; typically event data at the lowest grain  Data Factory – Managed ETL onto, within and between platforms  Enterprise Data – Data stores for Information Management  Reporting – BI tools and infrastructure components  Discovery Lab – Platform, data and tools to support the discovery process  Execution – things you do every day  Innovation – innovation to drive tomorrow's business  Line of Governance!  Discovery Output – Possible outputs include new knowledge, mining models / parameters, scored data…
  • 12. Design Pattern: Discovery Lab  Specific focus on identifying commercial value for exploitation  Small group of highly skilled individuals (aka Data Scientists)  Iterative development approach – data oriented, NOT development oriented  Wide range of tools and techniques applied  Data provisioned through the Data Factory or the lab's own ETL  Typically a separate infrastructure, but could also be unified with the Reservoir if resources are managed effectively
  • 13. Design Pattern: Information Platform  Build the next generation Information Management platform  Either a Business Strategy driven or an IT cost / capability driven initiative  The initial project may be specifically linked to lower data grain or retention, BUT it is the platform as a whole that forms the required solution  A platform onto which other IM assets can be consolidated  Key issues relate to differences in procurement, development process, governance and skills  A Discovery Lab may be implemented as a pragmatic initial POV.
  • 14. Design Pattern: Data Application  Big Data technologies applied to a specific business problem, e.g. genome sequence analysis using BLAST, or log data from pharmaceutical production plant and machinery required for traceability  Limited or no integration to the broader Information Management estate  As a specific solution, non-functional requirements have less impact on solution quality or long-term costs  Platform costs and scalability are important considerations
  • 15. Design Pattern: Information Solution  A specific solution based on Big Data technologies requiring broader integration to the wider Information Management estate, e.g. an ETL pre-processor for the DW, or affordably storing data at a lower level of grain  Non-functional requirements are more critical in this solution  Scalable integration to the IM estate is an important factor for success  Analysis may take place in the Reservoir, or the Reservoir may be used only as an aggregator
  • 16. Design Pattern: Real-Time Events  May take place at multiple locations between place of data origination and the Data Centre – requiring careful design and implementation  May include Next-Best-Activity, declarative rules and Data Mining technologies to optimise decisions. i.e. optimise across declarative, data mining, customer preference & business-defined rules  May include considerations for personal preferences and privacy (e.g. opt-out) for customer related events  Common component seen across many industries & markets e.g. connected vehicle Real-Time optimisation of events
  • 17. Design Pattern against component usage map  Patterns covered: Discovery Lab, Information Platform, Data Application, Information Solution, R/T Events.  Outline – Discovery Lab: data science lab; assess the value of the data. Information Platform: next generation information platform to align IM capability with business strategy. Data Application: addressing a specific data problem in Hadoop with no broader integration required. Information Solution: addressing a specific data problem that requires broader enterprise-wide integrations, e.g. ETL pre-processing, or an event store at lower grain than the existing DW. R/T Events: execution platform to respond to R/T events.  Examples – Discovery Lab: Gov. Healthcare; Mobile operator. Information Platform: Spanish Bank (business led); UK Gov. Dept. (tech. led). Data Application: Pharma Genome project; Pharma production archive. Information Solution: Investment Bank – trade risk; Mobile Operator – ETL processing. R/T Events: Mobile operator – location based offers.  Component usage – Data Engine: Possible / Yes. Data Reservoir: Yes / Yes / Yes. Data Factory: Yes / Yes / Yes. Enterprise Data: Yes. Reporting: Yes. Discovery Lab: Yes / Implied. An alternative approach to the Reservoir + Factory above.
  • 18. IM Logical View and Components
  • 19. Information Management – Logical View Data Sources Data Ingestion Methods and process to load data into our managed data store and manage data quality • Contemporary Information Management solutions must be able to ingest any type of data, from any source, in any format, via any mechanism, and at any frequency, e.g. flat file loads, streaming… • The data may be highly unstructured, mono-structured or highly poly-structured. • Data will vary in volume and in Data Quality. • Operational isolation should be considered to ensure operational applications will continue in the event of the loss of the Information Management system Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Master & Ref. Data • Streaming & BAM
  • 20. Information Management – Logical View Information Ingestion Data Ingestion Information Interpretation Methods and process to load data and manage Data Quality Methods and process needed to access information Managed Data Load All data under management Query • Data structure and processing required to load data into managed data stores • Shape represents the work done on the data to load it and/or process it between layers • Layer may include a file mechanism where required to facilitate loading (e.g. Fuse fs or ZFS for operational isolation and file concatenation) • The normal rules apply: micro-batch loading, taking all the data, and KISS principles • DQ and loading stats presented through BI dashboards as a non-judgemental mechanism to improve DQ • Data may be landed in the Ingestion layer to facilitate loading but is not typically stored there for any length of time, e.g. raw data loaded from web logs with the sessionised data then loaded to Raw; another example is data used to manage CDC, which may be stored in this layer.
  • 21. Information Management – Logical View Data Interpretation Data Ingestion Information Interpretation Methods and process to load data and manage Data Quality Methods and process needed to access information Managed Data Load All data under management Query • Methods and processes required to access information in each of the stores • Shape represents the cost of interpreting the data under management • For schema-on-read the cost may include the Avro schema, SerDe or reader class, as well as the associated processing code to select, filter and process the data • For schema-on-write the cost is represented only by the complexity of the SQL required to access the data – typically more complex for 3NF than for a dimensional query.
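The difference in where the interpretation cost is paid can be illustrated with a small Python analogy (purely illustrative — `parse_event` stands in for the Avro schema, SerDe or reader class, and the raw line layout is hypothetical):

```python
# Illustrative analogy only: parse_event stands in for the Avro schema,
# SerDe or reader class that interprets raw data.

RAW_LINES = [
    "2014-03-01T10:00:00|447700900001|SMS",
    "2014-03-01T10:05:00|447700900002|VOICE",
]

def parse_event(line):
    """Interpretation logic for this (hypothetical) raw line layout."""
    ts, msisdn, kind = line.split("|")
    return {"ts": ts, "msisdn": msisdn, "kind": kind}

# Schema-on-write: interpretation cost paid once, at load time.
foundation_table = [parse_event(l) for l in RAW_LINES]

def query_on_write(kind):
    # The query sees already-structured rows; only filter logic remains.
    return [r for r in foundation_table if r["kind"] == kind]

# Schema-on-read: raw lines are stored as-is; the interpretation cost
# is paid on every query, alongside the select/filter logic.
def query_on_read(kind):
    return [r for r in map(parse_event, RAW_LINES) if r["kind"] == kind]
```

Both paths return the same rows; the sketch only makes visible where the interpretation work sits.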
  • 22. Information Management – Logical View Data Layers – cost, quality and concurrency trade off Managed Data: Access & Performance Layer, Foundation Data Layer, Raw Data Reservoir Immutable raw data reservoir – raw data at rest is not interpreted Immutable modelled data – Business Process Neutral form, abstracted from business process changes Past, current and future interpretation of enterprise data – structured to support agile access & navigation • Increasing enrichment • Increasing data quality • Reducing concurrency costs • Data under management includes 3 key layers – the Raw, Foundation, and Access and Performance layers. • Data is normally loaded into the Raw and Foundation layers, BUT BI Apps loads data directly into the APL, and federated warehouses may well also load data at aggregate level from federated operating companies. • The Data Factory is responsible for loading and then managing data between layers. • Work is done to elevate the data between layers – typically further enriching it and improving data quality. • The work done in processing data between the layers significantly reduces query costs, i.e. higher levels of concurrency can be sustained for the same processing power. • Increasing formalisation of definition
  • 23. Information Management – Logical View Data Layers – Analytical processing Managed Data: Access & Performance Layer, Foundation Data Layer, Raw Data Reservoir • Analytical processing capabilities of Hadoop and the RDBMS are used to elevate data between layers as previously described. • These analytical capabilities can also be leveraged by tools that access the data directly. Typically this would be a Data Scientist performing Discovery Lab operations, or BI Tools and Services processing data using a model previously defined by the Data Scientist. OLAP Data Mining Statistics OLAP Text Mining Other Analytical Processing Data Mining Text Mining Image Processing • Increasing enrichment • Increasing data quality • Reducing concurrency costs • Increasing formalisation of definition
  • 24. Information Management – Logical View Data Layers – Raw Data Reservoir Managed Data: Access & Performance Layer, Foundation Data Layer, Raw Data Reservoir Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation • Immutable data store with data at the lowest level of grain. • Typically implemented in Hadoop or NoSQL for cost reasons, but not always. • May be: • Queried directly, • Used to derive base-level data for the Foundation Layer. Data may be represented logically in Foundation or physically, as the store is immutable, BUT this affects ILM policy. • Or used to derive values or aggregates for the Access and Performance layer (e.g. a propensity score or total monthly SMSs)
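Deriving an aggregate such as the total monthly SMSs mentioned above — from immutable raw events up to the Access and Performance Layer — might look like this sketch (the event shape and field names are hypothetical):

```python
from collections import defaultdict

# Hypothetical raw events at the lowest grain: one record per SMS.
raw_events = [
    {"msisdn": "447700900001", "ts": "2014-03-05T09:12:00", "type": "SMS"},
    {"msisdn": "447700900001", "ts": "2014-03-07T18:40:00", "type": "SMS"},
    {"msisdn": "447700900002", "ts": "2014-04-01T08:00:00", "type": "SMS"},
]

def monthly_sms_totals(events):
    """Aggregate immutable raw events to (msisdn, month) -> count,
    the kind of derived value promoted to the Access & Performance Layer."""
    totals = defaultdict(int)
    for e in events:
        if e["type"] == "SMS":
            month = e["ts"][:7]          # 'YYYY-MM' prefix of the timestamp
            totals[(e["msisdn"], month)] += 1
    return dict(totals)

aggregates = monthly_sms_totals(raw_events)
```

The raw events stay untouched in the reservoir; the aggregate can be thrown away and rebuilt at any time, which is exactly the property the Access and Performance Layer relies on.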
  • 25. Information Management – Logical View Data Layers – Foundation Data Layer Managed Data: Access & Performance Layer, Foundation Data Layer, Raw Data Reservoir Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation • Immutable, integrated and standardised store of enterprise-class data – the data the business has agreed on and organises around. • Data at the lowest level of grain of value for Enterprise data. • Stored in a business process neutral fashion to avoid the data maintenance tasks needed to keep in step with current business interpretations. • Typically close to 3NF, with special attention to modelling hierarchies, flexible entity attributions, customer / supplier etc. • ONLY implemented in relational technology, BUT this could be logical as previously noted in the Raw Data Reservoir. • May be queried directly by a select few individuals. Wider access to detail data is provided through views in the APL, often with VPD implemented to prevent queries to antecedent data. • Data in the Foundation Layer should be retained for as long as possible. • Consideration should be given to retaining data in the Raw Data Reservoir rather than archiving.
  • 26. Information Management – Logical View Data Layers – Access and Performance Layer Managed DataAccess & Performance Layer Foundation Data Layer Raw Data Reservoir Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation • Layer facilitates access, navigation and performance of queries. • Allows for multiple interpretations of data from Foundation or Raw data Reservoir. • Most structures can be thrown away and re-built from scratch based on Foundation and Raw Reservoir. • The exception is derived and aggregate data which may have to be retained if the underlying data/mechanism is archived. • Most users presenting information in a standardised fashion on dashboards and reports will access this layer only.
  • 27. Access and Performance Layer Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Master & Ref. Data • BAM Data • Data destined for the Raw Data Reservoir may be loaded directly (e.g. through Flume) or may be stored temporarily in the fs prior to loading (e.g. Fuse fs) • Relational data is ingested via the most appropriate mechanism before persisting in the Foundation Data Layer (the usual rules apply…) • Ideally micro-batch using the simplest mechanism possible • Only data of agreed quality is loaded into the FDL • For efficient relational loading, data may be pre-staged in the fs so a large number of small files can be concatenated Information Management – Logical View Data Factory Ingestion flow Data Ingestion Batch & Real-Time ETL / ELT CDC Stream File Ops.
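The pre-staging step described above — concatenating a large number of small files before the load — can be sketched as follows (the directory layout and file naming are hypothetical):

```python
import os
import tempfile

def concatenate_small_files(staging_dir, out_path):
    """Concatenate pre-staged small files (in name order) into one
    large file, so the downstream load deals with fewer, bigger inputs."""
    names = sorted(os.listdir(staging_dir))
    with open(out_path, "w") as out:
        for name in names:
            with open(os.path.join(staging_dir, name)) as f:
                out.write(f.read())
    return len(names)

# Usage sketch: three small staged files become one load file.
staging = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(staging, "part-%04d.csv" % i), "w") as f:
        f.write("row-%d\n" % i)

# Write the merged file outside the staging dir so it is not re-read.
out_file = os.path.join(tempfile.mkdtemp(), "merged.csv")
count = concatenate_small_files(staging, out_file)
```

Sorting by name keeps the concatenation deterministic, which matters if file names carry sequence numbers used later for completeness checks.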
  • 28. Access and Performance Layer Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Flow shown: 1. Data to be formalised from HDFS store extracted and loaded into Foundation Data Layer. e.g. where Flume/HDFS is being used as an ETL pre-processor for Enterprise Data or where HDFS data is being logically modelled in the foundation layer 2. Data is re-structured and/or aggregated to facilitate access by users and business processes 3. Data may also be re-structured and/or aggregated from HDFS store where there are no specific requirements to manage Enterprise Data in a more formal data store over time 1 2 3 Information Management – Logical View Data Factory intra data processing flow
  • 29. Access and Performance Layer Information Management – Logical View Information Provisioning – BI & Data Science Components Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir • Data Virtualisation and the various components to access the data are as per our previous view on BI tools. • Data Virtualisation is a key component that helps to deliver tools independence, services integration and a future-state roadmap • Big Data has focused considerable attention on Data Science • Analytical capabilities are delivered through analytical processing in the data layers, with Advanced Analytical Tools used to drive capabilities • Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and the model results are typically written to a project-based sandbox. • Agile discovery is often best served through a separate Discovery Lab infrastructure (see later details) Data Science
  • 30. Access and Performance Layer Information Management – Logical View Information Provisioning BI Flows Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir 1. Typical access mechanism for Enterprise data via Access and Performance layer structures 2. Access to Foundation Layer data for specific functions, processes and users only 3. Data interpretation & DQ assured through encoded logic, Avro, SerDe, FileReader, HCat etc. 4. Diagonal flows show how data can be joined between layers as well as accessed directly, e.g. Raw Data can be queried directly through the Hive connector or joined to the RDBMS data and queried.
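Flow 4 above — joining raw-layer data to relational data — can be illustrated with an in-memory Python analogy of a federated Hive-to-RDBMS join (the table contents and field names are hypothetical):

```python
# Raw reservoir rows (already parsed at read) joined to reference data
# held in the relational Foundation layer -- an in-memory analogy for a
# federated Hive/RDBMS query.

raw_usage = [
    {"msisdn": "447700900001", "bytes": 1024},
    {"msisdn": "447700900002", "bytes": 2048},
]
foundation_customers = {
    "447700900001": {"segment": "consumer"},
    "447700900002": {"segment": "business"},
}

def federated_join(usage, customers):
    """Inner-join usage events to customer reference data on msisdn."""
    return [dict(u, **customers[u["msisdn"]])
            for u in usage if u["msisdn"] in customers]

joined = federated_join(raw_usage, foundation_customers)
```

In a real deployment the join is pushed into the query layer rather than materialised in application code; the sketch only shows the shape of the result.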
  • 31. Information Management – Logical View Data / Information Quality Access and Performance Layer Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science  Quality of data at rest is assured by a number of factors in addition to the underlying quality of data at source – File and event handling to ensure data is not missed (e.g. missing log files detected by log file sequence numbering) – The processing of data between the Raw and FDL / APL layers. This can be seen as a DQ firewall ensuring only data of known and acceptable quality is loaded. Typically this involves an element of synchronisation, as some data will need to be held off until required reference data is available, due to the micro-batch incremental loading approach.  Quality of information presented to downstream tools and services is determined by – Model quality, understanding and performance of provisioning from modelled layers – Consistency of definition, code quality and query performance when accessing Hadoop data (e.g. HR code, Avro definition…)
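The log file sequence numbering check mentioned above — ensuring no files were missed before a micro-batch is allowed through the DQ firewall — can be sketched as:

```python
def missing_sequence_numbers(received):
    """Given the sequence numbers of log files actually received,
    return any gaps: files that must be re-requested before the
    micro-batch passes the DQ firewall."""
    if not received:
        return []
    expected = set(range(min(received), max(received) + 1))
    return sorted(expected - set(received))

# Files 1003 and 1005 never arrived:
gaps = missing_sequence_numbers([1001, 1002, 1004, 1006])
```

A production check would also track the last sequence number of the previous batch so a gap at a batch boundary is not silently skipped; this sketch only covers gaps within one batch.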
  • 32. Information Management – Logical View Data Reservoir & Enterprise Information Store Virtualisation& QueryFederation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Data Science Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Master & Ref. Data • Streaming & BAM Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
  • 34. Analysis Processing & Delivery Discovery Lab & Data Science Tooling Data Reservoir & Enterprise Data Data Science (Primary Toolset) Statistics Tools Data & Text Mining Tools Faceted Query Tools Programming & Scripting Data Modeling Tools Query & Search Tools Pre-Built Intelligence Assets Intelligence Analysis Tools Ad Hoc Query & Analysis Tools OLAP Tools Forecasting & Simulation Tools Reporting Tools Data Scientist Virtualisation & Information Services Data Factory flow 1. The Data Factory is responsible for provisioning access to data, or for replication (all or a sample) to a Sandbox in the Discovery Lab. 2. Direct connection from Data Science tools to the analysis sandbox. Data Science tools read and write data from/to project sandboxes. 3. The Data Scientist can also access standard dashboards, reports and KPIs through the Data Virtualisation layer Data Quality & Profiling Graphical rendering tools Dashboards & Reports Scorecards Charts & Graphs Sandbox – Project 3 Sandbox – Project 2 Sandbox – Project 1 Data store Analytical Processing Information Management – Logical View Discovery Lab data flow
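Flow 1 above — replicating all or a sample of the data into a project sandbox — might be sketched as follows (the sampling approach, ratio and seeding are illustrative assumptions, not a description of any Data Factory product):

```python
import random

def provision_sandbox(source_rows, sample_ratio=1.0, seed=42):
    """Copy all rows (ratio 1.0) or a reproducible random sample into a
    project sandbox, as the Data Factory flow above would on provisioning."""
    rng = random.Random(seed)   # fixed seed: the sample is repeatable
    if sample_ratio >= 1.0:
        return list(source_rows)
    return [r for r in source_rows if rng.random() < sample_ratio]

# Usage sketch: full copy vs ~10% sample of a 1000-row reservoir extract.
reservoir = [{"id": i} for i in range(1000)]
full = provision_sandbox(reservoir)
sample = provision_sandbox(reservoir, sample_ratio=0.1)
```

Seeding the sampler means a Data Scientist can re-provision the same sandbox and get the same subset, which keeps discovery experiments reproducible.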
  • 35. R/T event Engine – Logical View and Components
  • 36. Real-time Data Engine To Event Subscribers (Events / Data) Privacy Filter Data Transform Rules & Models Mediation Next Best Action Real-Time Data Store From Input Events Reference Data Models & Rules Privacy Data Analytics Real-Time Data Engine – Logical View Business Activity Monitoring Real-Time event monitoring
  • 37. Real-Time Data Engine  Message mediation service  Privacy filter for event data. i.e. apply customer specified privacy and preference filters to the data stream  Transformation of the message data to outbound form  Apply declarative rules and models to the data stream to detect events for further downstream processing  Next Best Activity (NBA) event detection and processing. NBA typically also includes control group management and global optimisation of rules  Business Activity Monitoring  Local data store – local persistence of rules and metadata Components Privacy Filter Data Transform Rules & Models Mediation Next Best Action Real-Time Data Store BAM
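Two of the components listed above — the privacy filter and the declarative rules stage — can be illustrated with a minimal Python sketch (the event fields, opt-out flag and spend rule are hypothetical, not part of any Oracle product API):

```python
# Hypothetical event stream passing through a privacy filter and then a
# declarative rule, two of the Real-Time Data Engine components above.

events = [
    {"customer": "C1", "opt_out": False, "spend": 120},
    {"customer": "C2", "opt_out": True,  "spend": 300},
    {"customer": "C3", "opt_out": False, "spend": 40},
]

def privacy_filter(stream):
    """Drop events for customers who have opted out of processing."""
    return [e for e in stream if not e["opt_out"]]

def apply_rules(stream, threshold=100):
    """Declarative rule: flag high-spend events for downstream action."""
    return [dict(e, action="high_spend_offer") for e in stream
            if e["spend"] >= threshold]

# The privacy filter runs first, so opted-out customers never reach
# the rules or Next Best Action stages.
actionable = apply_rules(privacy_filter(events))
```

C2 has the highest spend but is filtered out before the rule runs — ordering the privacy filter ahead of rules and NBA is the point of the component layout above.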
  • 38. Real-Time data engine flows  Describe each of the data flows Reference Data Models & Rules Privacy Data Event Analytics From Input Events To Event Subscribers (Events / Data) R/T Event Monitoring To Do
  • 39. Mapping from the previous release of the architecture
  • 40. Information Management Reference Architecture Version 2.0 of the Architecture
  • 41. Information Management Reference Architecture Interpretation layer shows the relative cost of reading data depending on its location Previous staging layer now split into Data Ingestion and Raw store. Ingestion layer includes methods and processes to load data and manage Data Quality. Shape represents the relative cost of these processes, i.e. from none for HDFS to lots in the APL. Raw Reservoir is typically at the lowest level of grain – often lower than the enterprise cares about, and so may not have been included in previous representations. Renamed from Knowledge Discovery to Discovery Lab but otherwise unchanged. The role of Discovery Labs is becoming more central though, so additional operational guidance will be added. Discovery Lab Still an immutable store but may be physically implemented in relational or non-relational technologies Key differences from 2.0 to 3.0 of the Architecture
  • 45. Data discovery for the Enterprise  Discovery phase – Unbounded discovery – Self-Service sandbox – Wide toolset – Agile methods  Promotion to Exploitation – Commercial exploitation – Narrower toolset – Integration to operations – Non-functional requirements – Code standardisation & governance Discovery and monetising steps have different requirements Business Value Commercial Exploitation Time / Effort Discovery phase Understanding of the data Governance
  • 46. To monetise fully you need to standardise It's smart to standardise as part of Governance  Discovery process requires a broad toolset  Standardisation is essential for Commercial exploitation  Sustainability depends on standardisation / rationalisation – Reduced training burden – Reduced support costs – Reduced license costs – Ongoing agility & alignment Data Discovery Toolset Data Exploitation Toolset Rationalised Components • Cloudera CDH, Oracle, No-SQL • Mammoth, Yarn, EM-plugin • MR, Hive, Pig, Impala, Accum. • Flume NG, Oozie • … • … • … Optional additions • Oracle Connectors • Additional corporate standard components Oracle standard deployment Corporate standard Standardised Hadoop Zoo Standardised deployment
  • 47. Copyright © 2013, Oracle and/or its affiliates. All rights reserved. The kind of things we are looking to Discover  Data science skills required vary by the type of analysis  Data Management skills vary by the amount of data and its structure  So making data movement and manipulation easy will deliver a better result, and deliver it faster Descriptive Diagnostic Predictive Prescriptive Business Impact Analytical Skills Insight Foresight
  • 48. Discovery is a Data process not a Development Process Requirement Analysis High Level Design Low Level Design Coding Testing Acceptance Testing Three Versions of the BI Development Process Excel Spreadsheet Shared linked spreadsheets Local Access Database Shared Access Server SQL Server Database Oracle Data Warehouse Discovery & Profile Model Exploit What IT thinks it should be What normally happens What Big Data is trying to achieve
  • 49. Sandboxes facilitate "Agile" – providing the technology platform for agile discovery. Sandbox delivery options: a separate Data Lab environment, or delivered as part of the Information Management architecture. Self-service sandboxes: self-service provisioning of new sandboxes for the Discovery phase; automation of data access rights, resources and tools provisioning. Data provision: quickly take on new data to rapidly make it available to Analysts; tools such as "Data Factory" can fully automate data flows.
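The self-service provisioning described on this slide can be sketched in a few lines of code. This is a minimal, hypothetical illustration (not an Oracle product API): the `Sandbox` class, its fields and the `provision_sandbox` function are all invented names standing in for the three automated steps the slide lists – data access rights, resource quotas and tool provisioning.

```python
from dataclasses import dataclass, field

@dataclass
class Sandbox:
    # All fields are illustrative stand-ins for real provisioning metadata.
    owner: str
    datasets: list = field(default_factory=list)   # data access rights granted
    cpu_quota: int = 4                             # resource provisioning
    tools: list = field(default_factory=lambda: ["sql", "r", "python"])

SANDBOXES = {}  # registry of provisioned sandboxes, keyed by owner

def provision_sandbox(owner: str, datasets: list) -> Sandbox:
    """Automate the steps the slide lists: access rights, resources, tools."""
    sb = Sandbox(owner=owner, datasets=list(datasets))
    SANDBOXES[owner] = sb
    return sb

sb = provision_sandbox("analyst1", ["web_logs", "crm_extract"])
print(sb.datasets)  # ['web_logs', 'crm_extract']
```

The point of the sketch is that a single request creates the whole workspace, which is what makes the Discovery phase agile rather than ticket-driven.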
  • 50. Monetise and Optimise steps are different – what happens when we want to exploit insights?  New insights are deployed into business processes in some form – Technical: e.g. business rules, new customer segments – Non-technical: e.g. observations about behaviours  Business Intelligence systems are adapted to provide monitoring, feedback and control optimisation  The faster you iterate this cycle, the greater the benefit, BUT Big Data does not change the fundamental need for accurate, consistent and integrated information. [Diagram: cycle between New insights and Business Process]
  • 51. Rules of thumb for data – data needs to be organised to monetise it effectively. Organised information leads to better analyses. Information needs to be organised in order to analyse it. RDBMS are great when information is organised. Hadoop minimises the penalty for disorganisation. The closer you are to insight, the more complete and organised information needs to be.
  • 52. What that really means is…  We need to apply structure to data in order to analyse it  Schema on read works well for us in Discovery, as we can be agile about interpretation  As we move into Exploitation, schema on read can cause Governance & quality issues  Key lesson: the cost to store & manage data is distinct from the structural considerations that separate Big Data and RDBMS technologies
  • 53. De-mystifying schema on read.  Traditional "Schema on Write" – Data quality managed by a formalised ETL process – Data persisted in a tabular, agreed and consistent form – Data integration happens in ETL – Structure must be decided before writing  Big Data "Schema on Read" – Interpretation of data captured in code for each program accessing the data – Data quality dependent on code quality – Data integration happens in code. [Diagram: DQ, Business Rules, Mapping and ETL feeding Data Reservoirs]
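The contrast on this slide can be made concrete with a small sketch. Assuming a hypothetical raw click-log feed, the first function plays the role of schema-on-write ETL (quality rules applied once, before persisting), while the second shows schema-on-read, where each consuming program carries its own interpretation and quality handling of the raw bytes. All data and function names are illustrative.

```python
import csv
import io

# Hypothetical raw log lines, exactly as they might land in a data reservoir.
RAW = "2013-07-01,alice,42\n2013-07-02,bob,notanumber\n"

def etl_load(raw: str) -> list:
    """Schema on write: validate and shape records BEFORE persisting.
    Bad rows are rejected once, so every downstream consumer sees clean data."""
    table = []
    for date, user, clicks in csv.reader(io.StringIO(raw)):
        try:
            table.append({"date": date, "user": user, "clicks": int(clicks)})
        except ValueError:
            pass  # rejected at load time by the formalised DQ rule
    return table

def report_clicks(raw: str) -> int:
    """Schema on read: THIS program interprets the raw data at query time.
    Another consumer may parse the same bytes differently, or less carefully."""
    total = 0
    for row in csv.reader(io.StringIO(raw)):
        try:
            total += int(row[2])  # quality now depends on this code
        except ValueError:
            continue
    return total

print(len(etl_load(RAW)))   # 1 row survives the write-time schema
print(report_clicks(RAW))   # 42, but only because this reader handles bad rows
```

The duplication of the `try/except` in both functions is the slide's point: under schema on read the cost of interpretation and data quality is paid again by every program that touches the data.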
  • 54. Underlying storage capabilities are different. [Radar chart, scale 0–5, comparing Hadoop, Relational and "My Application" across: Tooling maturity, Stringent non-functionals, ACID transactional requirement, Security, Variety of data formats, Data sparsity, ETL simplicity, Cost-effectively store low-value data, Ingestion rate, Straight Through Processing (STP)]
  • 55. It's smart to unify your data into a single Reservoir – fully expose your data for discovery and monetisation. Analytics 3.0 platforms include both relational and non-relational technologies. Ken Rudin* refers to this as the genius of AND vs the tyranny of OR (see his TDWI '13 presentation). A unified Reservoir simplifies access to all data, regardless of its characteristics & analysis requirements. *Ken Rudin is Director of Analytics at Facebook
  • 56. Information Management – Logical View. Layers: Data Ingestion – methods and processes to load data and manage Data Quality. Raw Data Reservoir – immutable raw data; raw data at rest is not interpreted. Foundation Data Layer – immutable modelled data in a business-process-neutral form, abstracted from business process changes. Access & Performance Layer – past, current and future interpretations of enterprise data, structured to support agile access & navigation. Information Access – methods and processes needed to access information, spanning Virtualisation & Query Federation, Enterprise Performance Management, Pre-built & Ad-hoc BI Assets, Information Services and Advanced Analytical Tools. [Diagram groups these under Information Provisioning and Analysis Processing & Delivery]
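The flow of a record through the three data layers can be sketched as follows. This is an illustrative toy (not Oracle's implementation); the layer names come from the slide, but the record format, functions and the customer-count aggregate are all invented for the example.

```python
# Three stores standing in for the logical layers on the slide.
RAW_RESERVOIR = []   # Raw Data Reservoir: immutable, stored as received
FOUNDATION = {}      # Foundation Data Layer: modelled, process-neutral form
ACCESS_LAYER = {}    # Access & Performance Layer: shaped for consumption

def ingest(raw_record: str) -> None:
    """Data Ingestion into the reservoir: append-only, never interpreted."""
    RAW_RESERVOIR.append(raw_record)

def model_to_foundation() -> None:
    """Interpret raw records into a neutral keyed model (hypothetical format)."""
    for rec in RAW_RESERVOIR:
        cust_id, country = rec.split("|")
        FOUNDATION[cust_id] = {"country": country}

def build_access_layer() -> None:
    """Pre-aggregate the foundation model for agile access by BI tools."""
    counts = {}
    for attrs in FOUNDATION.values():
        counts[attrs["country"]] = counts.get(attrs["country"], 0) + 1
    ACCESS_LAYER["customers_by_country"] = counts

ingest("c001|UK")
ingest("c002|FR")
ingest("c003|UK")
model_to_foundation()
build_access_layer()
print(ACCESS_LAYER["customers_by_country"])  # {'UK': 2, 'FR': 1}
```

Note the design point the layers encode: interpretation only happens when moving raw data into the foundation, and the access layer can be rebuilt at any time from the layer below without re-ingesting anything.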
  • 57. Information Management – Logical View: analytical processing and delivery. Structures and processing are required to load data (batch and real-time) and manage Data Quality; structures are also required to interpret the data under management, i.e. its logical interpretation. • Data Virtualisation and the various components to access the data are as per our previous view on BI tools. • Data Virtualisation is a key component that helps to deliver tools independence, services integration and a future-state roadmap. • What has changed is the focus on Analytics. • Analytical capability is delivered through analytical processing in the data layers and through the Advanced Analytical Tools used to drive it. • Data Mining in particular often involves complex data processing to flatten data into a longitudinal form; this derived data and the model results are typically written to a project-based sandbox. • Agile discovery is often best served through a separate Discovery Lab infrastructure (described later). [Diagram: OLAP, Data Mining, Statistics, Text Mining and other analytical processing running within the data layers]

Editor's Notes

  1. 2010 Tom Davenport in HBR
  2. The closer you are to monetising data, the more organised the data should be. Hadoop minimises the penalty for not being organised, i.e. for not understanding your data. Data Management; Data Profiling; Descriptive statistics; Graphical Analysis.
  3. Many of our customers have already developed Hadoop-based solutions in a pre-production setting by downloading Hadoop from the internet and running it on a virtualised Linux server, often on a laptop.
  4. If the audience is very pro Big Data, lay on the first explanation thick – talk about TRADITIONAL systems and how ETL can be very slow to put in place because of the need to agree the process with the business, build a common understanding of the data and how it must be integrated, etc. Schema on read is the opposite – it is very fast to value, BUT the cost of ETL is carried by each system that accesses the data, and data quality is a function of the program that accesses the data. Time also has a bearing here: use the example of the recent changes to Hadoop and the deprecation of large numbers of Java classes.
  5. Ken was also at Zynga, and is also ex-Siebel. His point about the way they included Analysts in their product teams is a key one as regards Analytics 3.0. Also, at Zynga more than 50% of the data was held in flex fields – it's a shame nobody told them how to model this kind of system! The closer you are to monetising data, the more organised the data should be. Hadoop minimises the penalty for not being organised, i.e. for not understanding your data.