Information Management Reference Architecture 
EMEA Enterprise Architecture
Contents 
• Introduction
• Conceptual view
• Design Patterns
• IM Logical view and component outline
• Discovery Lab
• R/T Event Engine logical view
• Mapping to previous Reference Architecture release
Introduction
• This presentation documents the main architectural components of Oracle’s Information Management Reference Architecture.
• The architecture is intended to be practical and pragmatic; many of the ideas and experiences that inform the approach date back almost 20 years in Oracle.
• These ideas and concepts have been continually refined through the work of our Enterprise Architecture team on real-world customer engagements.
3rd Evolution of Oracle’s Information Management Reference Architecture
What is Information Management?
We define Information Management to mean:
“Information Management is the means by which an organisation maximises the efficiency with which it plans, collects, organises, uses, controls, stores, disseminates, and disposes of its Information, and through which it ensures that the value of that information is identified and exploited to the maximum extent possible”
Aligning analytical requirements and IM architecture
Enabling Analytics 3.0 with a pragmatic architecture
Analytics 1.0
• Reporting with limited use of descriptive analytics
• Limited range of tabular data
• Batch oriented analysis
• Analysis bolted onto limited set of business processes
Analytics 2.0
• Firms “Competing on Analytics”
• Extended analytics to larger and less structured datasets
• Emergence of Big Data into the commercial world
• Recognition of Data Science role in commercial orgs.
Analytics 3.0
• Platform for monetisation
• Deeper analysis & more data
• Faster test-do-learn iterations
• Different types of data & wider business process coverage
• Analysts focus on discovery and driving business value
• “Agile” with operational elements incorporated into design patterns
Adapted from Tom Davenport material
Conceptual View
[Figure: Conceptual View. Inputs: Data Streams, Structured Enterprise Data, Other Data. Components: Event Engine, Data Reservoir, Data Factory, Enterprise Information Store, Reporting, Discovery Lab. Outputs: Actionable Events, Actionable Information, Actionable Insights. The diagram is divided into Execution and Innovation, with flows labelled Discovery Output and Events & Data.]
Component Outline
Event Engine: Respond to R/T events in appropriate and/or optimised fashion
Data Reservoir: Raw data Reservoir, typically event data at lowest grain
Data Factory: Managed ETL onto, within and between platforms
Enterprise Data: Data stores for Information Management
Reporting: BI tools and infrastructure components
Discovery Lab: Platform, data and tools to support discovery process
Execution: things you do every day
Innovation: innovation to drive tomorrow’s business
Line of Governance!
Discovery Output: possible outputs include new knowledge, mining models / parameters, scored data…
Design Patterns
Design Pattern: Discovery Lab 
• Specific focus on identifying commercial value for exploitation
• Small group of highly skilled individuals (aka Data Scientists)
• Iterative development approach: data oriented NOT development oriented
• Wide range of tools and techniques applied
• Data provisioned through Data Factory or own ETL
• Typically separate infrastructure but could also be a unified Reservoir if resources are managed effectively
Design Pattern : Information Platform 
• Build the next generation Information Management platform
• Either a Business Strategy driven or an IT cost / capability driven initiative
• Initial project may be specifically linked to lower data grain or retention BUT it is the platform as a whole that forms the solution required
• Platform for consolidating other IM assets onto
• Key issues relate to differences in procurement, development process, governance and skills
• Discovery Lab may be implemented as a pragmatic initial POV.
Design Pattern : Data Application 
• Big Data technologies applied to a specific business problem, e.g. Genome sequence analysis using BLAST or log data from pharmaceutical production plant and machinery required for traceability
• Limited or no integration to broader Information Management estate
• Specific solution so non-functional requirements have less impact on solution quality or long term costs
• Platform costs and scalability are important considerations
Design Pattern: Information Solution 
• Specific solution based on Big Data technologies requiring broader integration to the wider Information Management estate, e.g. ETL pre-processor for the DW or affordably store a lower level of grain
• Non-functional requirements more critical in this solution
• Scalable integration to IM estate an important factor for success
• Analysis may take place in Reservoir or Reservoir only used as an aggregator
Design Pattern: Real-Time Events 
• May take place at multiple locations between the place of data origination and the Data Centre, requiring careful design and implementation
• May include Next-Best-Activity, declarative rules and Data Mining technologies to optimise decisions, i.e. optimise across declarative, data mining, customer preference & business-defined rules
• May include considerations for personal preferences and privacy (e.g. opt-out) for customer related events
• Common component seen across many industries & markets, e.g. connected vehicle
Real-Time optimisation of events
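A minimal sketch of how a single real-time decision might be optimised across declarative rules, a data mining score, customer preference and business-defined value. Everything named here (`Event`, `Offer`, `choose_action`, the `opted_out` flag, the offers and the fixed scores) is a hypothetical illustration and not part of the reference architecture.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Event:
    customer_id: str
    kind: str          # e.g. "location_update"
    opted_out: bool    # customer privacy preference (assumed field)


@dataclass(frozen=True)
class Offer:
    name: str
    value: float                        # business-defined value of the offer
    eligible: Callable[[Event], bool]   # declarative eligibility rule


def choose_action(event: Event, offers: Sequence[Offer],
                  propensity: Callable[[Event, Offer], float]) -> Optional[Offer]:
    """Pick the eligible offer with the highest expected value, or None."""
    if event.opted_out:            # preference and privacy rules come first
        return None
    candidates = [o for o in offers if o.eligible(event)]
    if not candidates:
        return None
    # Expected value = model score (propensity to accept) * business value.
    return max(candidates, key=lambda o: propensity(event, o) * o.value)


offers = [
    Offer("coffee_voucher", 2.0, lambda e: e.kind == "location_update"),
    Offer("data_bundle", 10.0, lambda e: e.kind == "location_update"),
    Offer("roaming_pack", 15.0, lambda e: e.kind == "roaming_start"),
]
# Stand-in for a data mining model: fixed acceptance probabilities.
scores = {"coffee_voucher": 0.6, "data_bundle": 0.2, "roaming_pack": 0.5}


def propensity(event: Event, offer: Offer) -> float:
    return scores[offer.name]


best = choose_action(Event("c1", "location_update", False), offers, propensity)
blocked = choose_action(Event("c2", "location_update", True), offers, propensity)
```

Checking the opt-out before any scoring keeps the privacy preference as a hard constraint rather than one more weighted factor; that ordering is a design choice of this sketch.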
Design Pattern against component usage map
Design pattern: Discovery Lab
Outline: Data science lab. Assess the value of the data.
Examples: Gov. Healthcare; Mobile operator
Design pattern: Information Platform
Outline: Next Generation information platform to align IM capability with business strategy.
Examples: Spanish Bank (Business led); UK Gov. Dept. (Tech. led)
Design pattern: Data Application
Outline: Addressing a specific data problem in Hadoop with no broader integration required.
Examples: Pharma Genome project; Pharma production archive
Design pattern: Information Solution
Outline: Addressing a specific data problem but requires broader enterprise wide integrations, e.g. ETL pre-processing, Event Store at lower grain than existing DW.
Examples: Investment Bank – trade risk; Mobile Operator – ETL processing
Design pattern: R/T Events
Outline: Execution platform to respond to R/T events.
Examples: Mobile operator – location based offers
Data Engine Possible Yes 
Data Reservoir Yes Yes Yes 
Data Factory Yes Yes Yes 
Enterprise Data Yes 
Reporting Yes 
Discovery Lab Yes Implied Alternative approach 
to Reservoir + Factory above
IM Logical View and Components
Information Management – Logical View 
Data Sources
[Figure: data sources feeding the Data Ingestion layer. Source groups: Data Engines & Poly-structured sources (Content, Docs, Web & Social Media, SMS); Structured Data Sources (Operational Data, COTS Data, Streaming & BAM); Master & Reference Data Sources.]
Data Ingestion: methods and process to load data into our managed data store and manage data quality.
• Contemporary Information Management solutions must be able to ingest any type of data from any source, in any format, by any mechanism and at any frequency, e.g. flat file loads, streaming…
• The data may be highly unstructured, mono-structured or highly poly-structured.
• Data will vary in volume and in Data Quality.
• Operational isolation should be considered so that operational applications continue to run if the Information Management system is lost.
Information Management – Logical View 
Information Ingestion 
[Figure: logical view of Managed Data (all data under management). Load passes through Data Ingestion: methods and process to load data and manage Data Quality. Query passes through Information Interpretation: methods and process needed to access information.]
• Data structure and processing required to load data into managed data stores
• Shape represents the work done on the data to load it and/or process it between layers
• Layer may include a file mechanism where required to facilitate loading (e.g. FUSE fs or ZFS for operational isolation and file concatenation)
• The usual rules are recommended: micro-batch loading, taking all the data, and KISS principles
• DQ and loading stats presented through BI dashboards as a non-judgemental mechanism to improve DQ.
• Data may be landed in the Ingestion layer to facilitate loading but is not typically stored there for any length of time, e.g. raw data is loaded from web logs, but it is the sessionised data that is then loaded to Raw. Another example: data used to manage CDC may be stored in this layer.
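The web log case can be sketched as follows: raw hits are grouped into sessions before being passed on. This is an illustrative sketch only; the 30-minute inactivity timeout, the `(user, timestamp, url)` record shape and the function name `sessionise` are assumptions, not something the architecture prescribes.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def sessionise(hits):
    """Group (user, timestamp, url) hits into per-user sessions.

    A new session starts when the gap since the user's previous hit
    exceeds SESSION_GAP. Returns a list of dicts, one per session.
    """
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))

    sessions = []
    for user, rows in by_user.items():
        rows.sort()  # order each user's hits by timestamp
        current = None
        for ts, url in rows:
            if current is None or ts - current["end"] > SESSION_GAP:
                current = {"user": user, "start": ts, "end": ts, "pages": []}
                sessions.append(current)
            current["end"] = ts
            current["pages"].append(url)
    return sessions


raw_hits = [
    ("u1", datetime(2014, 1, 1, 9, 0), "/home"),
    ("u1", datetime(2014, 1, 1, 9, 10), "/offers"),
    ("u1", datetime(2014, 1, 1, 11, 0), "/home"),   # more than 30 min later
    ("u2", datetime(2014, 1, 1, 9, 5), "/help"),
]
sessions = sessionise(raw_hits)
```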
Information Management – Logical View 
Data Interpretation 
• Methods and processes required to access information in each of the stores
• Shape represents the cost of interpreting the data under management
• For schema-on-read the cost may include the Avro schema, SerDe or reader class as well as the associated processing code to select, filter and process the data.
• For schema-on-write the cost is represented only by the complexity of the SQL required to access the data, which is typically more complex for 3NF than for a dimensional query.
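The difference in interpretation cost can be illustrated with a small sketch using only the Python standard library. JSON lines and an in-memory SQLite table stand in for a Hadoop file with a reader class and for a relational store; the record fields and function names are hypothetical.

```python
import json
import sqlite3

# Schema-on-read: raw lines are stored untouched; structure is imposed by
# reader code every time the data is queried.
raw_lines = [
    '{"cust": "c1", "type": "sms", "ts": "2014-01-03"}',
    '{"cust": "c1", "type": "call", "ts": "2014-01-04"}',
    '{"cust": "c2", "type": "sms", "ts": "2014-01-04"}',
]


def sms_count_on_read(lines, cust):
    """Parse, filter and count at query time (the interpretation cost)."""
    count = 0
    for line in lines:
        rec = json.loads(line)              # reader / deserialiser step
        if rec["cust"] == cust and rec["type"] == "sms":
            count += 1
    return count


# Schema-on-write: the parsing work is paid once at load time, after which
# access is a short SQL statement.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (cust TEXT, type TEXT, ts TEXT)")
db.executemany(
    "INSERT INTO events VALUES (:cust, :type, :ts)",
    [json.loads(line) for line in raw_lines],
)


def sms_count_on_write(conn, cust):
    """Count with SQL only; no parsing code is needed at query time."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE cust = ? AND type = 'sms'", (cust,)
    ).fetchone()
    return row[0]


on_read = sms_count_on_read(raw_lines, "c1")
on_write = sms_count_on_write(db, "c1")
```

Both functions return the same answer; what differs is where the parsing and filtering logic lives and how often it runs.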
Information Management – Logical View 
Data Layers – cost, quality and concurrency trade off 
[Figure: the three layers of Managed Data.]
Raw Data Reservoir: Immutable raw data reservoir. Raw data at rest is not interpreted.
Foundation Data Layer: Immutable modelled data. Business Process Neutral form. Abstracted from business process changes.
Access & Performance Layer: Past, current and future interpretation of enterprise data. Structured to support agile access & navigation.
Moving from Raw towards Access & Performance: increasing enrichment, increasing data quality, reducing concurrency costs, increasing formalisation of definition.
• Data under management includes 3 key layers: the Raw, Foundation, and Access and Performance layers.
• Data is normally loaded into the Raw and Foundation layers BUT BI Apps load data directly into APL, and federated warehouses may well also load data at aggregate level from federated operating companies.
• Data Factory is responsible for loading and then managing data between layers.
• Work is done to elevate the data between layers, typically further enriching it and improving data quality.
• Work done in processing the data between the layers significantly reduces query costs, i.e. higher levels of concurrency can be sustained for the same processing power.
Information Management – Logical View 
Data Layers – Analytical processing 
[Figure: analytical processing available across the Access & Performance Layer, Foundation Data Layer and Raw Data Reservoir: OLAP, Data Mining, Statistics, Text Mining, Image Processing and other analytical processing.]
• Analytical processing capabilities of Hadoop and RDBMS are used to elevate data between layers as previously described.
• These analytical capabilities can also be leveraged by tools that access the data directly. Typically this would be by a Data Scientist for Discovery Lab operations, or by BI Tools and Services that are processing data using a model previously defined by the Data Scientist.
Information Management – Logical View 
Data Layers – Raw Data Reservoir 
• Immutable data store with data at lowest level of grain.
• Typically implemented in Hadoop or NoSQL for cost reasons but not always.
• May be:
• Queried directly,
• Used to derive base level data for Foundation Layer. Data may be represented logically in Foundation or physically, as the store is immutable, BUT this affects ILM policy.
• or used to derive values or aggregates for Access and Performance layer (e.g. propensity score or total monthly SMSs).
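As an illustration of deriving an aggregate from the reservoir, the sketch below computes total monthly SMSs per customer from raw events without modifying them. The event tuples and the function name are hypothetical; a real reservoir would typically hold this data in Hadoop or NoSQL rather than in a Python list.

```python
from collections import Counter

# Hypothetical raw events at the lowest grain: (customer, event type, ISO date).
raw_events = [
    ("c1", "sms", "2014-01-03"),
    ("c1", "sms", "2014-01-19"),
    ("c1", "call", "2014-01-20"),
    ("c1", "sms", "2014-02-02"),
    ("c2", "sms", "2014-01-07"),
]


def monthly_sms_totals(events):
    """Aggregate raw events into {(customer, 'YYYY-MM'): SMS count}."""
    totals = Counter()
    for customer, kind, day in events:
        if kind == "sms":
            totals[(customer, day[:7])] += 1   # 'YYYY-MM' prefix of the date
    return dict(totals)


totals = monthly_sms_totals(raw_events)
```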
Information Management – Logical View 
Data Layers – Foundation Data Layer 
• Immutable, integrated and standardised store of enterprise class data: the data the business has agreed on and organises around.
• Data at lowest level of grain of value for Enterprise data.
• Stored in business process neutral fashion to avoid data maintenance tasks to keep in step with current business interpretations.
• Typically close to 3NF. Special attention to modelling hierarchy, flexible entity attributions, customer / supplier etc.
• ONLY implemented in relational technology BUT this could be logical as previously noted in Raw Data Reservoir.
• May be queried directly by a select few individuals. Wider access to detail data is provided through views in APL, often with VPD implemented to prevent queries to antecedent data.
• Data in the Foundation Layer should be retained for as long as possible. 
• Consideration should be given to retaining data in Raw Data Reservoir rather than archiving.
Information Management – Logical View 
Data Layers – Access and Performance Layer 
• Layer facilitates access, navigation and performance of queries. 
• Allows for multiple interpretations of data from Foundation or Raw data Reservoir. 
• Most structures can be thrown away and re-built from scratch based on Foundation and Raw Reservoir. 
• The exception is derived and aggregate data which may have to be retained if the underlying data/mechanism is archived. 
• Most users presenting information in a standardised fashion on dashboards and reports will access this layer only.
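The idea that most Access and Performance Layer structures are disposable can be sketched with an in-memory SQLite database: an aggregate table is dropped and re-derived from Foundation data at any time. The table names `fdl_sales` and `apl_sales_by_region` and the rebuild function are hypothetical examples, not names used by the architecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Foundation Data Layer: detailed rows (hypothetical table).
db.execute("CREATE TABLE fdl_sales (sale_id INTEGER, region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO fdl_sales VALUES (?, ?, ?)",
    [(1, "north", 10.0), (2, "north", 5.0), (3, "south", 7.5)],
)


def rebuild_access_layer(conn):
    """Drop and re-derive an Access and Performance Layer aggregate."""
    conn.execute("DROP TABLE IF EXISTS apl_sales_by_region")
    conn.execute(
        "CREATE TABLE apl_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM fdl_sales GROUP BY region"
    )


rebuild_access_layer(db)
db.execute("INSERT INTO fdl_sales VALUES (4, 'south', 2.5)")
rebuild_access_layer(db)     # safe to repeat: the structure is disposable
totals = dict(db.execute("SELECT region, total FROM apl_sales_by_region"))
```

This only works while the underlying detail is still available, which is why derived and aggregate data may have to be retained once the source has been archived.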
Information Management – Logical View
Data Factory Ingestion flow
[Figure: Data Ingestion (Batch & Real-Time: ETL / ELT, CDC, Stream, File Ops.) loading from the data sources into the Raw Data Reservoir, Foundation Data Layer and Access & Performance Layer, with Information Interpretation alongside.]
• Data destined for Raw Data Reservoir may be loaded directly (e.g. through Flume) or may be stored temporarily in a file system prior to loading (e.g. FUSE fs)
• Relational data is ingested using the most appropriate mechanism before persisting in Foundation Data Layer (usual rules apply…)
• Ideally micro-batch using the simplest mechanism possible
• Only data of agreed quality is loaded into FDL
• For efficient relational loading, data may be pre-staged in the file system so that a large number of small files can be concatenated
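Pre-staging small files so they can be concatenated before a relational load might look like the sketch below. It is an assumption-laden illustration: the directory layout, the `*.log` pattern and the function name `concat_small_files` are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def concat_small_files(staging_dir: Path, target: Path, pattern: str = "*.log") -> int:
    """Concatenate staged files, in name order, into one load file.

    Returns the number of source files combined.
    """
    sources = sorted(staging_dir.glob(pattern))
    with target.open("wb") as out:
        for src in sources:
            with src.open("rb") as f:
                shutil.copyfileobj(f, out)
    return len(sources)


# Usage with a throwaway staging directory.
workdir = Path(tempfile.mkdtemp())
staging = workdir / "staging"
staging.mkdir()
for i in range(3):
    (staging / f"part-{i:03d}.log").write_text(f"record {i}\n")

load_file = workdir / "batch.dat"
n_files = concat_small_files(staging, load_file)
combined = load_file.read_text()
```

Writing the combined file outside the staging directory keeps it from being picked up as an input on a later run.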
Information Management – Logical View
Data Factory intra data processing flow
Flow shown:
1. Data to be formalised is extracted from the HDFS store and loaded into the Foundation Data Layer, e.g. where Flume/HDFS is being used as an ETL pre-processor for Enterprise Data or where HDFS data is being logically modelled in the foundation layer
2. Data is re-structured and/or aggregated to facilitate access by users and business processes
3. Data may also be re-structured and/or aggregated from the HDFS store where there are no specific requirements to manage Enterprise Data in a more formal data store over time
Information Management – Logical View
Information Provisioning – BI & Data Science Components
[Figure: provisioning components above the data layers: Virtualisation & Query Federation, Enterprise Performance Management, Pre-built & Ad-hoc BI Assets, Information Services, Data Science. Data layers: Data Ingestion, Information Interpretation, Access & Performance Layer, Foundation Data Layer, Raw Data Reservoir.]
• Data Virtualisation and the various components to access the data are as per our previous view on BI tools.
• By far the majority of users will access data via the Access and Performance Layer, although data may come from the Raw Store or Foundation
• Data Virtualisation is a key component that helps to deliver tools independence, services integration and a future state roadmap
• Big Data has focused considerable attention on Data Science
• Analytical capabilities are delivered through analytical processing in the data layers and Advanced Analytical Tools used to drive capabilities
• Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and the model results are typically written to a project based sandbox.
• Agile discovery is often best served through a separate Discovery Lab infrastructure (see later details)
Information Management – Logical View
Information Provisioning Typical BI Flows
1. Typical access mechanism for Enterprise data via Access and Performance layer structures
2. Access to Foundation Layer Data for specific functions, processes and users only
3. Data interpretation & DQ assured through encoded logic, Avro, SerDe, FileReader, HCat etc.
4. Diagonal flows show how data can be joined between layers as well as accessed directly, e.g. Raw Data can be queried directly through the Hive connector or joined to the RDBMS data and queried.
Information Management – Logical View 
Data / Information Quality 
• Quality of data at rest is assured by a number of factors in addition to the underlying quality of data at source
– File and event handling to ensure data is not missed (e.g. missing log files assured by log file sequence numbering)
– The processing of data between Raw and FDL / APL layers. This can be seen as a DQ firewall to ensure only data of known and acceptable quality is loaded. Typically this involves an element of synchronisation, as some data will need to be held off until required reference data is available due to the micro-batch incremental loading approach.
• Quality of information presented to downstream tools and services is determined by
– Model quality, understanding and performance of provisioning from modelled layers
– Consistency of definition, code quality and query performance when accessing Hadoop data (e.g. HR code, Avro definition…)
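Two of the mechanisms above, sequence numbering to detect missing log files and holding rows back until their reference data arrives, can be sketched as follows. This is a simplified illustration; the row shape (a `product` key checked against a set of known reference keys) and both function names are hypothetical.

```python
def missing_sequences(received):
    """Return sequence numbers absent between the lowest and highest received."""
    if not received:
        return []
    present = set(received)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def dq_firewall(batch, reference_keys, held):
    """Split a micro-batch into loadable rows and rows held for later.

    Rows whose reference key is not yet known are held back and retried
    with the next batch, so only rows of known quality are loaded.
    """
    loadable, still_held = [], []
    for row in held + batch:
        if row["product"] in reference_keys:
            loadable.append(row)
        else:
            still_held.append(row)
    return loadable, still_held


gaps = missing_sequences([101, 102, 104, 106])

reference = {"P1"}
load1, held = dq_firewall(
    [{"id": 1, "product": "P1"}, {"id": 2, "product": "P2"}], reference, []
)
reference.add("P2")                      # reference data arrives later
load2, held = dq_firewall([{"id": 3, "product": "P1"}], reference, held)
```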
Information Management – Logical View
Information Provisioning Direct Flow from Source Systems
• Direct access from source systems to BI and Discovery, or through the Data Virtualisation layer, is also possible
• This is a fairly typical requirement for EPM and Data Science. It is much less common for general BI other than as a temporary expedient.
Information Management – Logical View 
Information Provisioning Direct Flow from Source Systems 
• Another view showing how the quality of data is altered between stores 
Information Management – Logical View 
Data Reservoir & Enterprise Information Store – complete view
[Figure: complete logical view combining the data sources, Data Ingestion, Information Interpretation, the Raw Data Reservoir, Foundation Data Layer and Access & Performance Layer, the provisioning components (Virtualisation & Query Federation, Enterprise Performance Management, Pre-built & Ad-hoc BI Assets, Information Services, Data Science) and two kinds of sandbox.]
Discovery Lab Sandboxes: project based data stores to support specific discovery objectives
Rapid Development Sandboxes: project based data stores to facilitate rapid content / presentation delivery
Discovery Lab Sandboxes
Data Mining Method – Conceptual Map 
[Figure: data mining method cycle. Labels: Business Goals, Understand Data, Prepare Data, Model, Evaluate, Deploy, Monitor, Discovery.]
• Data scientist led discovery 
• Domain expertise also critical 
• Wide range of tools & data 
• Data preparation is a significant challenge 
• Able to quickly mashup & transform data
Data Mining Method – Conceptual Map 
• Choice of deployment options 
• Organisational learning 
• Automated event and/or response 
(e.g. inbound call and CSR support) 
• Manual list generation based on detected risk events 
• Tools support depending on deployment option 
• Visualisations, numerical presentation…etc 
• Provision for Marketing Analyst data mashup
Data Mining Method – Conceptual Map 
• Agile incorporation into standard reporting framework 
• Expose new risk indicators and interventions 
• Track model lift and trigger perturbation or rebuild, as an automatic or Data Science led activity
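Tracking model lift and triggering a rebuild could be automated along the lines of the sketch below. It assumes lift is measured as the response rate in the top-scored fraction divided by the overall response rate; the 10% fraction, the threshold of 2.0 and the function names are illustrative assumptions, not values taken from the slides.

```python
def top_fraction_lift(scored, fraction=0.1):
    """Lift = response rate in the top-scored fraction / overall response rate.

    `scored` is a non-empty list of (score, responded) pairs, where
    responded is 0 or 1.
    """
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    overall = sum(r for _, r in ranked) / len(ranked)
    if overall == 0:
        return 0.0
    top_rate = sum(r for _, r in ranked[:top_n]) / top_n
    return top_rate / overall


def needs_rebuild(scored, min_lift=2.0, fraction=0.1):
    """Flag the model for rebuild when observed lift falls below a threshold."""
    return top_fraction_lift(scored, fraction) < min_lift


# 20 customers, 4 responders in each set. In `healthy` the model ranks the
# responders at the top; in `decayed` they are spread through the ranking.
healthy = [(1.0 - i / 20, 1 if i < 4 else 0) for i in range(20)]
decayed = [(1.0 - i / 20, 1 if i % 5 == 4 else 0) for i in range(20)]

healthy_flag = needs_rebuild(healthy)
decayed_flag = needs_rebuild(decayed)
```

The flag could feed either an automatic rebuild job or a work item for the Data Science team.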
Information Management – Logical View
Discovery Lab data flow
[Figure: Discovery Lab data flow. Labels: Data Reservoir & Enterprise Data; Data Factory flow; General BI flow; Discovery Lab & Data Science Tooling with project sandboxes (Sandbox – Project 1, 2, 3; data store and analytical processing); Data Science (Primary Toolset): Statistics Tools, Data & Text Mining Tools, Faceted Query Tools, Programming & Scripting, Data Modeling Tools, Query & Search Tools, Data Quality & Profiling, Graphical rendering tools; Virtualisation & Information Services; Analysis Processing & Delivery: Pre-Built Intelligence Assets, Intelligence Analysis Tools (Ad Hoc Query & Analysis Tools, OLAP Tools, Forecasting & Simulation Tools, Reporting Tools), Dashboards & Reports, Scorecards, Charts & Graphs; Data Scientist.]
1. Data Factory is responsible for provisioning access to data, or for replicating it (all or a sample) to a Sandbox in the Discovery Lab.
2. Direct connection between Data Science tools and the analysis sandbox. Data Science tools read and write data from/to project sandboxes.
3. The Data Scientist can also access standard dashboards, reports and KPIs through the Data Virtualisation layer
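Replicating all of the data, or only a sample, into a project sandbox can be sketched as below. The function name `provision_sandbox`, the in-memory list standing in for the reservoir and the fixed random seed are assumptions made for illustration.

```python
import random


def provision_sandbox(source_rows, sample_fraction=1.0, seed=0):
    """Copy all rows, or a reproducible random sample, into a new sandbox list.

    The source is never modified; the sandbox is an independent copy that
    data science tools can read from and write to.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if sample_fraction == 1.0:
        return [dict(r) for r in source_rows]
    k = max(1, round(len(source_rows) * sample_fraction))
    rng = random.Random(seed)            # fixed seed keeps the sample repeatable
    return [dict(r) for r in rng.sample(source_rows, k)]


reservoir = [{"id": i, "value": i * i} for i in range(100)]
sandbox_full = provision_sandbox(reservoir)
sandbox_sample = provision_sandbox(reservoir, sample_fraction=0.1, seed=42)
sandbox_sample[0]["score"] = 0.9         # writes stay inside the sandbox
```

Copying each row means that model scores or derived columns written by the tools never leak back into the source data.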
Rapid Development 
Sandboxes
Information Management – Logical View
Discovery Lab data flow
[Figure: Development Environment Tooling. Labels: Data Reservoir & Enterprise Data; Data Factory flow; General BI flow; project sandboxes (Dev Sandbox – Project 1, Sandbox – Project 2, Sandbox – Project 3); Virtualisation & Information Services; Analysis Processing & Delivery: Pre-Built Intelligence Assets, Intelligence Analysis Tools (Ad Hoc Query & Analysis Tools, OLAP Tools, Forecasting & Simulation Tools, Reporting Tools), Dashboards & Reports, Scorecards, Charts & Graphs; BICC.]
1. The majority of BI development activity will be from existing sources: developing new reports for existing or new channels.
2. BICC or other expert users may quickly develop new reporting through mashups from any available sources. Careful governance is required once the report is completed to ensure data and report are professionally managed.
R/T Event Engine – Logical View and Components
Real-Time Data Engine – Logical View
[Figure: Real-Time Data Engine. Flow from Input Events to Event Subscribers (Events / Data). Components: Mediation, Privacy Filter, Data Transform, Rules & Models, Next Best Action, Real-Time Data Store. Supporting data: Reference Data, Models & Rules, Privacy Data, Analytics. Business Activity Monitoring: Real-Time event monitoring.]
Real-Time Data Engine
Components
• Message mediation service
• Privacy filter for event data, i.e. apply customer specified privacy and preference filters to the data stream
• Transformation of the message data to outbound form
• Apply declarative rules and models to the data stream to detect events for further downstream processing
• Next Best Activity (NBA) event detection and processing. NBA typically also includes control group management and global optimisation of rules
• Business Activity Monitoring
• Local data store: local persistence of rules and metadata
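The components listed above can be sketched as a chain of small functions applied to each inbound message. This is an illustrative sketch only: the message fields, the preference structure, the two example rules and the stage ordering (mediation, privacy filter, rules, then outbound transform) are all assumptions, and Next Best Activity, BAM and the local data store are left out for brevity.

```python
def mediate(message):
    """Mediation: normalise an inbound message into a common event shape."""
    return {"customer": message["cust_id"], "type": message["evt"],
            "payload": dict(message.get("data", {}))}


def privacy_filter(event, preferences):
    """Drop events for opted-out customers; strip fields they have withheld."""
    prefs = preferences.get(event["customer"], {})
    if prefs.get("opt_out"):
        return None
    for field in prefs.get("withheld", []):
        event["payload"].pop(field, None)
    return event


def detect(event, rules):
    """Apply declarative rules; return the names of the rules that fire."""
    return [name for name, predicate in rules.items() if predicate(event)]


def transform(event, fired):
    """Shape the outbound message for event subscribers."""
    return {"customer": event["customer"], "alerts": fired}


def process(message, preferences, rules):
    """Run one message through the stages; None means nothing is emitted."""
    event = privacy_filter(mediate(message), preferences)
    if event is None:
        return None
    fired = detect(event, rules)
    return transform(event, fired) if fired else None


preferences = {"c2": {"opt_out": True}, "c3": {"withheld": ["location"]}}
rules = {
    "near_store": lambda e: e["payload"].get("location") == "store_12",
    "low_balance": lambda e: e["payload"].get("balance", 100) < 5,
}

out1 = process({"cust_id": "c1", "evt": "ping",
                "data": {"location": "store_12", "balance": 3}}, preferences, rules)
out2 = process({"cust_id": "c2", "evt": "ping",
                "data": {"location": "store_12"}}, preferences, rules)
out3 = process({"cust_id": "c3", "evt": "ping",
                "data": {"location": "store_12", "balance": 50}}, preferences, rules)
```

Running the privacy filter before the rules means withheld fields can never influence a decision, at the price of some rules not firing for those customers.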
Mapping from the previous release of the architecture
Oracle’s Information Management Reference Architecture (3rd Edition)
What’s changed?
• More relevant to a Big Data oriented audience
• Better representation of pragmatic customer projects
• Includes the Raw data store as part of the architecture
• Shows the effort / cost to store and interpret data, which separates schema-on-read and schema-on-write approaches
• Aligned to Analytics 3.0
• Consistent with Oracle’s engineering efforts
Oracle’s Information Management Reference Architecture (3rd Edition)
What’s changed?
“All those layers and definitions in your Reference Architecture, I just don’t get it… and it looks complicated!”
(Hadoop developer knee deep in complex MapReduce code)
Business Trends, Technology Trends, Data Trends
Information Management Reference Architecture 
Version 2.0 of the Architecture
Information Management Reference Architecture
Key differences from 2.0 to 3.0 of the Architecture
• Interpretation layer shows the relative cost of reading data depending on its location
• Previous staging layer now split into Data Ingestion and Raw store.
• Ingestion layer includes methods and processes to load data and manage Data Quality. Shape represents the relative cost of these processes, i.e. from none for HDFS to lots in APL.
• Raw Reservoir is typically at the lowest level of grain. Often lower than the enterprise cares about and so may not have been included in previous representation.
• Discovery Lab: renamed from Knowledge Discovery to Discovery Lab but otherwise unchanged. The role of Discovery Labs is becoming more central though, so additional operational guidance will be added.
• Still an immutable store but may be physically implemented in relational or non-relational technologies
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)

Mais conteúdo relacionado

Mais procurados

Bi presentation Designing and Implementing Business Intelligence Systems
Bi presentation   Designing and Implementing Business Intelligence SystemsBi presentation   Designing and Implementing Business Intelligence Systems
Bi presentation Designing and Implementing Business Intelligence SystemsVispi Munshi
 
Asug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAPAsug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAPBrendan Kane
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligencePrithwis Mukerjee
 
Business intelligence in the real time economy
Business intelligence in the real time economyBusiness intelligence in the real time economy
Business intelligence in the real time economyJohan Blomme
 
Types of business intelligence tools
Types of business intelligence toolsTypes of business intelligence tools
Types of business intelligence toolsgreenliondigital
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Bernardo Najlis
 
Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Dr. Sunil Kr. Pandey
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introductionguest7b34c2
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligenceAhsan Kabir
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business IntelligenceSukirti Garg
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRyan Andhavarapu
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-AshishGuleria
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouseganblues
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse componentsganblues
 
Business intelligence overview
Business intelligence overviewBusiness intelligence overview
Business intelligence overviewCanara bank
 
BUSINESS INTELLIGENCE OVERVIEW & APPLICATIONS
BUSINESS INTELLIGENCE OVERVIEW & APPLICATIONSBUSINESS INTELLIGENCE OVERVIEW & APPLICATIONS
BUSINESS INTELLIGENCE OVERVIEW & APPLICATIONSGeorge Krasadakis
 

BI Masterclass slides (Reference Architecture v3)

  • 1. Information Management Reference Architecture EMEA Enterprise Architecture
  • 2. Contents • Introduction • Conceptual view • Design Patterns • IM Logical view and component outline • Discovery Lab • R/T Event Engine logical view • Mapping to previous Reference Architecture release
  • 4. Introduction (3rd Evolution of Oracle’s Information Management Reference Architecture) • This PPT documents the main architectural components of Oracle’s Information Management Reference Architecture. • The architecture is intended to be practical and pragmatic, with many of the ideas and experiences that inform the approach dating back almost 20 years in Oracle. • These ideas and concepts have been continually refined through the engagement of our Enterprise Architecture team on real-world customer engagements.
  • 5. What is Information Management? We define Information Management to mean… “Information Management is the means by which an organisation maximises the efficiency with which it plans, collects, organises, uses, controls, stores, disseminates, and disposes of its Information, and through which it ensures that the value of that information is identified and exploited to the maximum extent possible”
  • 6. Aligning analytical requirements and IM architecture: Enabling Analytics 3.0 with a pragmatic architecture. Analytics 1.0: • Reporting with limited use of descriptive analytics • Limited range of tabular data • Batch oriented analysis • Analysis bolted onto limited set of business processes. Analytics 2.0: • Firms “Competing on Analytics” • Extended analytics to larger and less structured datasets • Emergence of Big Data into the commercial world • Recognition of Data Science role in commercial orgs. Analytics 3.0: • Platform for monetisation • Deeper analysis & more data • Faster test-do-learn iterations • Different types of data & wider business process coverage • Analysts focus on discovery and driving business value • “Agile” with operational elements incorporated into design patterns. (Adapted from Tom Davenport material)
  • 8. Conceptual View (diagram). Inputs: Data Streams; Structured Enterprise Data; Other Data. Components: Event Engine; Data Reservoir; Data Factory; Enterprise Information Store; Reporting; Discovery Lab. Outputs: Actionable Events; Actionable Information; Actionable Insights. Other labels: Execution; Innovation; Discovery Output; Events & Data.
  • 9. Component Outline • Event Engine: Respond to R/T events in appropriate and/or optimised fashion • Data Reservoir: Raw data Reservoir – typically event data at lowest grain • Data Factory: Managed ETL onto, within and between platforms • Enterprise Data: Data stores for Information Management • Reporting: BI tools and infrastructure components • Discovery Lab: Platform, data and tools to support discovery process • Execution – things you do every day • Innovation – innovation to drive tomorrow’s business • Line of Governance! • Discovery Output – Possible outputs include new knowledge, mining models / parameters, scored data…
  • 11. Design Pattern: Discovery Lab • Specific focus on identifying commercial value for exploitation • Small group of highly skilled individuals (aka Data Scientists) • Iterative development approach – data oriented NOT development oriented • Wide range of tools and techniques applied • Data provisioned through Data Factory or own ETL • Typically separate infrastructure but could also be a unified Reservoir if resources are managed effectively
  • 12. Design Pattern: Information Platform • Build the next generation Information Management platform • Either Business Strategy driven or IT cost / capability driven initiative • Initial project may be specifically linked to lower data grain or retention BUT it is the platform as a whole that forms the solution required • Platform for consolidating other IM assets onto • Key issues relate to differences in procurement, development process, governance and skills • Discovery Lab may be implemented as a pragmatic initial POV.
  • 13. Design Pattern: Data Application • Big Data technologies applied to a specific business problem, e.g. Genome sequence analysis using BLAST or log data from pharmaceutical production plant and machinery required for traceability • Limited or no integration to broader Information Management estate • Specific solution so Non-functional requirements have less impact on solution quality or long term costs • Platform costs and scalability are important considerations
  • 14. Design Pattern: Information Solution • Specific solution based on Big Data technologies requiring broader integration to the wider Information Management estate, e.g. ETL pre-processor for the DW or affordably store a lower level of grain • Non-functional requirements more critical in this solution • Scalable integration to IM estate an important factor for success • Analysis may take place in Reservoir or Reservoir only used as an aggregator
  • 15. Design Pattern: Real-Time Events • May take place at multiple locations between place of data origination and the Data Centre – requiring careful design and implementation • May include Next-Best-Activity, declarative rules and Data Mining technologies to optimise decisions, i.e. optimise across declarative, data mining, customer preference & business-defined rules • May include considerations for personal preferences and privacy (e.g. opt-out) for customer related events • Common component seen across many industries & markets, e.g. connected vehicle Real-Time optimisation of events
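The precedence described on slide 15 (privacy preferences first, then business rules combined with a data mining score) can be sketched as below. This is only an illustration: the `Customer` fields, the event name, the 0.7 threshold and the action labels are all hypothetical and do not come from the deck.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    opted_out: bool     # privacy preference (hypothetical field)
    propensity: float   # data mining score in [0, 1] (assumed to be precomputed)


def next_best_action(customer: Customer, event_type: str) -> str:
    """Pick an action for one real-time event (illustrative rules only)."""
    # Personal preference / privacy is checked before any optimisation.
    if customer.opted_out:
        return "no_action"
    # Declarative business rule combined with the model score.
    if event_type == "entered_store_zone" and customer.propensity >= 0.7:
        return "send_offer"
    # Everything else is simply recorded for later analysis.
    return "log_only"
```

Checking the opt-out first keeps the privacy rule independent of how the scoring or business rules evolve.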
  • 16. Design Pattern against component usage map. Outline • Discovery Lab: Data science lab; assess the value of the data • Information Platform: Next Generation information platform to align IM capability with business strategy • Data Application: Addressing a specific data problem in Hadoop with no broader integration required • Information Solution: Addressing a specific data problem but requires broader enterprise wide integrations, e.g. ETL pre-processing, Event Store at lower grain than existing DW • R/T Events: Execution platform to respond to R/T events. Examples • Discovery Lab: Gov. Healthcare; Mobile operator • Information Platform: Spanish Bank (Business led); UK Gov. Dept. (Tech. led) • Data Application: Pharma Genome project; Pharma production archive • Information Solution: Investment Bank – trade risk; Mobile Operator – ETL processing • R/T Events: Mobile operator – location based offers. Component usage • Data Engine: Possible, Yes • Data Reservoir: Yes, Yes, Yes • Data Factory: Yes, Yes, Yes • Enterprise Data: Yes • Reporting: Yes • Discovery Lab: Yes, Implied, Alternative approach to Reservoir + Factory above
  • 17. IM Logical View and Components
  • 18. Information Management – Logical View: Data Sources. Data Ingestion: methods and process to load data into our managed data store and manage data quality. • Contemporary Information Management solutions must be able to ingest any type of data from any source in any format and mechanism and at any frequency, e.g. flat file loads, streaming… • The data may be highly unstructured, mono-structured or highly poly-structured. • Data will vary in volume and in Data Quality. • Operational isolation should be considered to ensure operational applications will continue in the event of the loss of the Information Management system. Diagram labels: Data Engines & Poly-structured sources (Content, Docs, Web & Social Media, SMS); Structured Data Sources (Operational Data, COTS Data, Streaming & BAM); Master & Reference Data Sources
  • 19. Information Management – Logical View: Information Ingestion. Diagram labels: Data Ingestion (methods and process to load data and manage Data Quality); Managed Data (all data under management); Information Interpretation (methods and process needed to access information); Load; Query. • Data structure and processing required to load data into managed data stores • Shape represents the work done on the data to load data and/or process between layers • Layer may include file mechanism where required to facilitate loading (e.g. Fuse fs or ZFS for operational isolation and file concat) • Normal rules of micro-batch, taking all the data and KISS principles recommended • DQ and loading stats presented through BI dashboards as a non-judgemental mechanism to improve DQ. • Data may be landed in the Ingestion layer to facilitate loading but not typically stored for any length of time, e.g. raw data loaded from web logs but sessionised data then loaded to Raw. Another example is data used to manage CDC, which may be stored in this layer.
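As an illustration of the web-log example on slide 19, the sketch below assigns session numbers to raw hits before they are loaded onward. The 30-minute inactivity gap and the `(user_id, timestamp)` record shape are assumptions chosen for the example, not values from the deck.

```python
from datetime import datetime, timedelta


def sessionise(hits, gap=timedelta(minutes=30)):
    """Assign a per-user session number to each (user_id, timestamp) hit.

    A new session starts when a user's hit arrives more than `gap`
    after that user's previous hit (assumed rule).
    """
    out = []
    last_seen = {}    # user_id -> timestamp of previous hit
    session_no = {}   # user_id -> current session number
    for user, ts in sorted(hits, key=lambda h: (h[0], h[1])):
        if user not in last_seen or ts - last_seen[user] > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        out.append((user, session_no[user], ts))
    return out
```

Only the sessionised rows would be persisted; the landed raw hits can be discarded once the load has completed.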
  • 20. Information Management – Logical View: Data Interpretation. Diagram labels: Data Ingestion (methods and process to load data and manage Data Quality); Managed Data (all data under management); Information Interpretation (methods and process needed to access information); Load; Query. • Methods and processes required to access information in each of the stores • Shape represents the cost of interpreting the data under management • For schema-on-read the cost may include the Avro, SerDe or reader class as well as the associated processing code to select, filter and process the data. • For schema-on-write the cost is represented by the complexity of the SQL required to access the data only – more complex typically for 3NF than for a dimensional query.
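The contrast on slide 20 can be shown with a small sketch: in the schema-on-read path the reader code parses each raw record at query time, while in the schema-on-write path the structure is imposed at load time and the query is plain SQL. JSON lines and an in-memory SQLite table stand in for the Avro/SerDe readers and the relational store; the sample records and column names are hypothetical.

```python
import json
import sqlite3

# Raw records as they might sit in a reservoir: uninterpreted text (sample data).
RAW_LINES = [
    '{"msisdn": "447700900001", "sms_count": 3}',
    '{"msisdn": "447700900002", "sms_count": 5}',
]


def total_schema_on_read(lines):
    # Interpretation cost is paid on every query: parse, select, aggregate.
    return sum(json.loads(line)["sms_count"] for line in lines)


def total_schema_on_write(lines):
    # Interpretation cost is paid once at load time; the query is simple SQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sms (msisdn TEXT NOT NULL, sms_count INTEGER NOT NULL)"
    )
    rows = [(r["msisdn"], r["sms_count"]) for r in map(json.loads, lines)]
    conn.executemany("INSERT INTO sms VALUES (?, ?)", rows)
    (total,) = conn.execute("SELECT SUM(sms_count) FROM sms").fetchone()
    conn.close()
    return total
```

Both functions return the same total; what differs is where the parsing work sits and how often it is repeated.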
  • 21. Information Management – Logical View: Data Layers – cost, quality and concurrency trade off. Managed Data layers: Raw Data Reservoir (immutable raw data reservoir; raw data at rest is not interpreted); Foundation Data Layer (immutable modelled data; business process neutral form; abstracted from business process changes); Access & Performance Layer (past, current and future interpretation of enterprise data; structured to support agile access & navigation). Diagram labels: increasing enrichment; increasing data quality; reducing concurrency costs; increasing formalisation of definition. • Data under management includes 3 key layers – Raw, Foundation and Access and Performance layers. • Data normally loaded into Raw and Foundation layers BUT BI Apps loads data directly into APL and federated warehouses may well also load data at aggregate level from federated operating companies. • Data Factory is responsible for loading and then managing data between layers. • Work is done to elevate the data between layers – typically further enriching and improving data quality. • Work done in processing the data between the layers significantly reduces query costs, i.e. higher levels of concurrency can be sustained for the same processing power.
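A minimal sketch of the elevation between layers described on slide 21: raw records are parsed and standardised into a foundation form, then aggregated into an access structure that is cheap to query. The pipe-delimited record layout, the field names and the monthly SMS aggregate are assumptions made for illustration.

```python
from collections import defaultdict

# Immutable raw events, stored uninterpreted (sample data).
RAW = (
    "2014-03-01|447700900001|sms",
    "2014-03-02|447700900001|sms",
    "2014-03-02|447700900002|SMS ",
    "bad record",
)


def to_foundation(raw):
    """Parse and standardise raw lines; skip records that fail the quality check."""
    rows = []
    for line in raw:
        parts = line.split("|")
        if len(parts) != 3:
            continue  # only data of agreed quality is elevated
        day, msisdn, kind = parts
        rows.append({"day": day, "msisdn": msisdn, "kind": kind.strip().lower()})
    return rows


def to_access(foundation):
    """Aggregate to total SMS per subscriber per month for fast, repeated queries."""
    totals = defaultdict(int)
    for row in foundation:
        if row["kind"] == "sms":
            totals[(row["msisdn"], row["day"][:7])] += 1
    return dict(totals)
```

Because the access structure is derived entirely from the lower layers, it can be dropped and rebuilt by calling `to_access(to_foundation(RAW))` again.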
  • 22. Information Management – Logical View: Data Layers – Analytical processing. Managed Data layers: Access & Performance Layer; Foundation Data Layer; Raw Data Reservoir. • Analytical processing capabilities of Hadoop and RDBMS used to elevate data between layers as previously described. • These analytical capabilities can also be leveraged by tools that access the data directly. Typically this would be by a Data Scientist for Discovery Lab operations or BI Tools and Services that are processing data using a model previously defined by the Data Scientist. Diagram labels: OLAP; Data Mining; Statistics; OLAP; Text Mining; Other Analytical Processing; Data Mining; Text Mining; Image Processing; increasing enrichment; increasing data quality; reducing concurrency costs; increasing formalisation of definition.
  • 23. Information Management – Logical View: Data Layers – Raw Data Reservoir. Managed Data layers: Raw Data Reservoir (immutable raw data reservoir; raw data at rest is not interpreted); Foundation Data Layer (immutable modelled data; business process neutral form; abstracted from business process changes); Access & Performance Layer (past, current and future interpretation of enterprise data; structured to support agile access & navigation). • Immutable data store with data at lowest level of grain. • Typically implemented in Hadoop or NoSQL for cost reasons but not always. • May be: queried directly; used to derive base level data for Foundation Layer (data may be represented logically in Foundation or physically as the store is immutable BUT this affects ILM policy); or used to derive values or aggregates for Access and Performance layer (e.g. propensity score or total monthly SMSs).
  • 24. Information Management – Logical View: Data Layers – Foundation Data Layer. Managed Data layers: Raw Data Reservoir (immutable raw data reservoir; raw data at rest is not interpreted); Foundation Data Layer (immutable modelled data; business process neutral form; abstracted from business process changes); Access & Performance Layer (past, current and future interpretation of enterprise data; structured to support agile access & navigation). • Immutable integrated and standardised store of enterprise class data. Stuff the business has agreed and organises around. • Data at lowest level of grain of value for Enterprise data. • Stored in business process neutral fashion to avoid data maintenance tasks to keep in step with current business interpretations. • Typically close to 3NF. Special attention to modelling hierarchy, flexible entity attributions, customer / supplier etc. • ONLY implemented in relational technology BUT this could be logical as previously noted in Raw Data Reservoir. • May be queried directly by a select few individuals. Wider access to detail data provided through views in APL, often with VPD implemented to prevent queries to antecedent data. • Data in the Foundation Layer should be retained for as long as possible. • Consideration should be given to retaining data in Raw Data Reservoir rather than archiving.
  • 25. Information Management – Logical View: Data Layers – Access and Performance Layer. Managed Data layers: Raw Data Reservoir (immutable raw data reservoir; raw data at rest is not interpreted); Foundation Data Layer (immutable modelled data; business process neutral form; abstracted from business process changes); Access & Performance Layer (past, current and future interpretation of enterprise data; structured to support agile access & navigation). • Layer facilitates access, navigation and performance of queries. • Allows for multiple interpretations of data from Foundation or Raw Data Reservoir. • Most structures can be thrown away and re-built from scratch based on Foundation and Raw Reservoir. • The exception is derived and aggregate data which may have to be retained if the underlying data/mechanism is archived. • Most users presenting information in a standardised fashion on dashboards and reports will access this layer only.
  • 26. Access and Performance Layer Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir • Data destined for Raw Data Reservoir may be loaded directly (e.g. through Flume) or may be stored temporarily in fs prior to loading (e.g. Fuse fs) • Relational data ingested using the most appropriate mechanism before persisting in Foundation Data Layer (usual rules apply…) • Ideally micro batch using simplest mechanism possible • Only data of agreed quality loaded in FDL • For efficient relational loading, data may be pre-staged in fs so that a large number of small files can be concatenated Information Management – Logical View Data Factory Ingestion flow Data Ingestion Batch & Real-Time ETL / ELT CDC Stream File Ops. Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Streaming & BAM Master & Reference Data Sources
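The pre-staging point above (concatenating many small files before a relational load) might look like the following sketch. The directory layout, the `*.csv` pattern and the function name `concatenate_staged_files` are assumptions made for illustration; the deck does not prescribe a mechanism.

```python
import tempfile
from pathlib import Path


def concatenate_staged_files(staging_dir: Path, target: Path, pattern: str = "*.csv") -> int:
    """Merge small pre-staged files into one load file; return the number merged.

    Files are merged in name order. The target itself is excluded in case
    it lives in the staging directory and matches the pattern.
    """
    parts = sorted(
        p for p in staging_dir.glob(pattern) if p.resolve() != target.resolve()
    )
    merged = 0
    with target.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            if not text:
                continue  # skip empty files rather than emit blank lines
            # Ensure each part ends with a newline so records do not run together.
            out.write(text if text.endswith("\n") else text + "\n")
            merged += 1
    return merged


if __name__ == "__main__":
    staging = Path(tempfile.mkdtemp())
    (staging / "part-001.csv").write_text("1,x\n", encoding="utf-8")
    (staging / "part-002.csv").write_text("2,y", encoding="utf-8")
    load_file = staging / "load.dat"
    print(concatenate_staged_files(staging, load_file), "files merged")
```

A single larger file generally suits bulk loaders better than thousands of tiny ones, which is the efficiency the slide alludes to.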
  • 27. Access and Performance Layer Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Flow shown: 1. Data to be formalised from HDFS store extracted and loaded into Foundation Data Layer, e.g. where Flume/HDFS is being used as an ETL pre-processor for Enterprise Data or where HDFS data is being logically modelled in the foundation layer 2. Data is re-structured and/or aggregated to facilitate access by users and business processes 3. Data may also be re-structured and/or aggregated from HDFS store where there are no specific requirements to manage Enterprise Data in a more formal data store over time Information Management – Logical View Data Factory intra data processing flow
  • 28. Access and Performance Layer Information Management – Logical View Information Provisioning – BI & Data Science Components Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Virtualisation & Query • Data Virtualisation and the various components to access the data are as per our previous view on BI tools. • By far the majority of users will access data via Access and Performance Layer although data may come from Raw Store or Foundation • Data Virtualisation is a key component that helps to deliver tools independence, services integration and a future state roadmap • Big Data has focused considerable attention on Data Science • Analytical capabilities delivered through analytical processing in the data layers and Advanced Analytical Tools used to drive capabilities • Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and model results are typically written to a project based sandbox. • Agile discovery is often best served through a separate Discovery Lab infrastructure (see later details) Data Science
  • 29. Access and Performance Layer Information Management – Logical View Information Provisioning Typical BI Flows Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir 1. Typical access mechanism for Enterprise data via Access and Performance layer structures 2. Access to Foundation Layer Data to specific functions, processes and users only 3. Data interpretation & DQ assured through encoded logic, Avro, SerDe, FileReader, HCat etc. 4. Diagonal flows show how data can be joined between layers as well as accessed directly, e.g. Raw Data can be queried directly through Hive connector or joined to the RDBMS data and queried.
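Flows 3 and 4 above (interpretation applied when raw data is read, then joined to relational data) can be sketched in plain Python. This is an illustration of the schema-on-read idea only: it does not use Hive, Avro or any SerDe, and the record fields (`customer_id`, `page`, `segment`) and function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def read_raw(lines: Iterable[str]) -> Iterator[dict]:
    """Interpret raw lines at read time (schema-on-read).

    Nothing was validated when the data landed, so the reader encodes the
    interpretation and quality rules: unparseable or incomplete records
    are skipped rather than rejected at load time.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "customer_id" in record:
            yield {"customer_id": str(record["customer_id"]), "page": record.get("page")}


def join_to_customers(raw_lines: Iterable[str], customers: dict) -> list:
    """Join interpreted raw records to a relational-style customer lookup."""
    return [
        {**record, "segment": customers[record["customer_id"]]}
        for record in read_raw(raw_lines)
        if record["customer_id"] in customers
    ]


if __name__ == "__main__":
    raw = [
        '{"customer_id": 1, "page": "/home"}',
        "not json at all",
        '{"customer_id": 2, "page": "/buy"}',
    ]
    print(join_to_customers(raw, {"1": "gold"}))
```

The same logic placed in the reader is what the deck means by quality being "assured through encoded logic": every consumer that shares the reader shares the interpretation.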
  • 30. Information Management – Logical View Data / Information Quality Access and Performance Layer Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science • Quality of data at rest assured by a number of factors in addition to the underlying quality of data at source – File and event handling to ensure data is not missed (e.g. missing log files assured by log file sequence numbering) – The processing of data between Raw and FDL / APL layers. This can be seen as a DQ firewall to ensure only data of known and acceptable quality is loaded. Typically this involves an element of synchronisation as some data will need to be held off until required reference data is available due to the micro-batch incremental loading approach. • Quality of information presented to downstream tools and services determined by – Model quality, understanding and performance of provisioning from modelled layers – Consistency of definition, code quality and query performance when accessing Hadoop data (e.g. HR code, Avro definition…)
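The two mechanisms on this slide, sequence-number checks for missed log files and a "DQ firewall" that holds records back until their reference data arrives, can be sketched as follows. The class `DQFirewall`, the `product_id` key and the in-memory hold queue are assumptions for illustration; a real implementation would persist held records between micro-batches.

```python
from dataclasses import dataclass, field


def missing_sequence_numbers(seen: list) -> list:
    """Return the gaps in a run of log-file sequence numbers."""
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


@dataclass
class DQFirewall:
    """Load only records whose reference key is known; hold the rest."""

    reference_keys: set = field(default_factory=set)
    held: list = field(default_factory=list)

    def add_reference(self, key: str) -> None:
        """Register newly arrived reference data."""
        self.reference_keys.add(key)

    def load_batch(self, records: list) -> list:
        """Process one micro-batch, retrying previously held records first."""
        candidates = self.held + list(records)
        self.held = [r for r in candidates if r["product_id"] not in self.reference_keys]
        return [r for r in candidates if r["product_id"] in self.reference_keys]


if __name__ == "__main__":
    print(missing_sequence_numbers([1, 2, 4, 7]))
    firewall = DQFirewall(reference_keys={"P1"})
    print(firewall.load_batch([{"product_id": "P1"}, {"product_id": "P2"}]))
    firewall.add_reference("P2")  # reference data arrives in a later batch
    print(firewall.load_batch([]))
```

Held records are retried at the start of every batch, which is the synchronisation element the slide describes.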
  • 31. Access and Performance Layer Information Management – Logical View Information Provisioning Direct Flow from Source Systems Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir • Direct access from source systems to BI and Discovery or through the Data Virtualisation layer is also possible • This is a fairly typical requirement for EPM and Data Science. Much less common for general BI other than as part of a temporary expedient. Data Sources Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Streaming & BAM Master & Reference Data Sources Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
  • 32. Information Management – Logical View Information Provisioning Direct Flow from Source Systems Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Science Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir • Another view showing how the quality of data is altered between stores Data Sources Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Streaming & BAM Master & Reference Data Sources
  • 33. Information Management – Logical View Virtualisation & Query Federation Enterprise Performance Management Pre-built & Ad-hoc BI Assets Information Services Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Data Science Data Engines & Poly-structured sources Content Docs Web & Social Media SMS Structured Data Sources • Operational Data • COTS Data • Streaming & BAM Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation Discovery Lab Sandboxes Rapid Development Sandboxes Project based data stores to support specific discovery objectives Project based data stored to facilitate rapid content / presentation delivery Data Sources Data Reservoir & Enterprise Information Store – complete view Master & Reference Data Sources
  • 35. Data Mining Method – Conceptual Map Data Understand Prepare Data Model Evaluate Deploy Monitor Discovery Business Goals • Data scientist led discovery • Domain expertise also critical • Wide range of tools & data • Data preparation is a significant challenge • Able to quickly mashup & transform data
  • 36. Data Mining Method – Conceptual Map Data Understand Prepare Data Model Evaluate Deploy Monitor Discovery Business Goals • Choice of deployment options • Organisational learning • Automated event and/or response (e.g. inbound call and CSR support) • Manual list generation based on detected risk events • Tools support depending on deployment option • Visualisations, numerical presentation…etc • Provision for Marketing Analyst data mashup
  • 37. Data Mining Method – Conceptual Map Data Understand Prepare Data Model Evaluate Deploy Monitor Discovery Business Goals • Agile incorporation into standard reporting framework • Expose new risk indicators and interventions • Track model lift and trigger perturbation or rebuild, as either an automatic or a Data Science led activity
  • 38. Analysis Processing & Delivery Discovery Lab & Data Science Tooling Data Reservoir & Enterprise Data Data Science (Primary Toolset) Statistics Tools Data & Text Mining Tools Faceted Query Tools Programming & Scripting Data Modeling Tools Query & Search Tools Pre-Built Intelligence Assets Intelligence Analysis Tools Ad Hoc Query & Analysis Tools OLAP Tools Forecasting & Simulation Tools Reporting Tools Data Scientist Virtualisation & Information Services Data Factory flow 1. Data Factory responsible for access provisioning to data or replication (all or sample) to Sandbox in Discovery Lab. 2. Direct connection from Data Science tools and analysis sandbox. Data Science tools read and write data from/to project sandboxes. 3. Data Scientist can also access standard dashboards, reports and KPIs through Data Virtualisation layer Data Quality & Profiling Graphical rendering tools Dashboards & Reports Scorecards Charts & Graphs Sandbox – Project 3 Sandbox – Project 2 Sandbox – Project 1 Data store Analytical Processing Information Management – Logical View Discovery Lab data flow General BI flow
  • 40. Analysis Processing & Delivery Development Environment Tooling Pre-Built Intelligence Assets Intelligence Analysis Tools Ad Hoc Query & Analysis Tools OLAP Tools Forecasting & Simulation Tools Reporting Tools BICC Virtualisation & Information Services Data Factory flow 1. The majority of BI development activity will be from existing sources – developing new reports to existing or new channels. 2. BICC or other expert users may quickly develop new reporting through mashups from any available sources. Careful governance is required once the report is completed to ensure data and report are professionally managed. Dashboards & Reports Scorecards Charts & Graphs Sandbox – Project 3 Sandbox – Project 2 Dev Sandbox – Project 1 Information Management – Logical View Discovery Lab data flow Data Reservoir & Enterprise Data General BI flow
  • 41. R/T event Engine – Logical View and Components
  • 42. Real-time Data Engine To Event Subscribers (Events / Data) Privacy Filter Data Transform Rules & Models Mediation Next Best Action Real-Time Data Store From Input Events Reference Data Models & Rules Privacy Data Analytics Real-Time Data Engine – Logical View Business Activity Monitoring Real-Time event monitoring
  • 43. Real-Time Data Engine • Message mediation service • Privacy filter for event data, i.e. apply customer specified privacy and preference filters to the data stream • Transformation of the message data to outbound form • Apply declarative rules and models to the data stream to detect events for further downstream processing • Next Best Action (NBA) event detection and processing. NBA typically also includes control group management and global optimisation of rules • Business Activity Monitoring • Local data store – local persistence of rules and metadata Components Privacy Filter Data Transform Rules & Models Mediation Next Best Action Real-Time Data Store BAM
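The component chain on this slide (privacy filter, transformation, rules) can be read as a pipeline of stages applied to each input event. The sketch below is one possible reading, not the deck's design: the stage functions, event fields (`customer_id`, `type`, `value`) and the `dropped_call` rule are all hypothetical, and mediation, Next Best Action and BAM are omitted for brevity.

```python
from typing import Callable, Iterable, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]  # a stage returns None to drop the event


def privacy_filter(opted_out: set) -> Stage:
    """Drop events for customers who opted out (customer-specified preference)."""
    def stage(event: Event) -> Optional[Event]:
        return None if event.get("customer_id") in opted_out else event
    return stage


def transform(event: Event) -> Event:
    """Reshape the inbound message into a hypothetical outbound form."""
    return {
        "customer": event["customer_id"],
        "type": event["type"],
        "value": event.get("value", 0),
    }


def rules(threshold: int) -> Stage:
    """Apply a declarative-style rule that flags events for downstream processing."""
    def stage(event: Event) -> Optional[Event]:
        detected = event["type"] == "dropped_call" and event["value"] >= threshold
        return {**event, "detected": detected}
    return stage


def run_pipeline(events: Iterable[Event], stages: list) -> list:
    """Pass each event through every stage in order, stopping if one drops it."""
    delivered = []
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            delivered.append(current)
    return delivered


if __name__ == "__main__":
    inbound = [
        {"customer_id": "c1", "type": "dropped_call", "value": 3},
        {"customer_id": "c2", "type": "dropped_call", "value": 5},
        {"customer_id": "c3", "type": "top_up", "value": 10},
    ]
    for out in run_pipeline(inbound, [privacy_filter({"c2"}), transform, rules(3)]):
        print(out)
```

Placing the privacy filter first means opted-out events never reach the rules or any subscriber, which matches the order in which the slide lists the components.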
  • 44. Mapping from the previous release of the architecture
  • 45. Oracle’s Information Management Reference Architecture (3rd Edition) • More relevant to Big Data oriented audience • Better representation of pragmatic customer projects • Includes Raw data store as part of the architecture • Shows effort / cost to store and interpret data that separates schema-on-read and schema-on-write approaches • Aligned to Analytics 3.0 • Consistent with Oracle’s engineering efforts What’s changed?
  • 46. Oracle’s Information Management Reference Architecture (3rd Edition) “All those layers and definitions in your Reference Architecture, I just don’t get it… and it looks complicated!” Hadoop developer knee deep in complex MapReduce code What’s changed? Business Trends Technology Trends Data Trends
  • 47. Information Management Reference Architecture Version 2.0 of the Architecture
  • 48. Information Management Reference Architecture Interpretation layer shows the relative cost of reading data depending on its location Previous staging layer now split into Data Ingestion and Raw store. Ingestion layer includes methods and processes to load data and manage Data Quality. Shape represents the relative cost of these processes, i.e. from none for HDFS to lots in APL. Raw Reservoir is typically at the lowest level of grain. Often lower than the enterprise cares about and so may not have been included in previous representation. Renamed from Knowledge Discovery to Discovery Lab but otherwise unchanged. The role of Discovery Labs is becoming more central, though, so additional operational guidance will be added. Discovery Lab Still an immutable store but may be physically implemented in relational or non-relational technologies Key differences from 2.0 to 3.0 of the Architecture