SlideShare uma empresa Scribd logo
1 de 60
Baixar para ler offline
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
What is Big Data Discovery,

and how it complements 

traditional Business Analytics

Mark Rittman, CTO, Rittman Mead
November 2015
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
About the Speaker
•Mark Rittman, Co-Founder of Rittman Mead
•Oracle ACE Director, specialising in Oracle BI&DW
•14 Years Experience with Oracle Technology
•Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
•Oracle Business Intelligence Developers Guide
•Oracle Exalytics Revealed
•Writer for Rittman Mead Blog :

http://www.rittmanmead.com/blog
•Email : mark.rittman@rittmanmead.com
•Twitter : @markrittman
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
About Rittman Mead
•Oracle BI and DW Gold partner
•Winner of five UKOUG Partner of the Year awards in 2013 - including BI
•World leading specialist partner for technical excellence, 

solutions delivery and innovation in Oracle BI
•Approximately 80 consultants worldwide
•All expert in Oracle BI and DW
•Offices in US (Atlanta), Europe, Australia and India
•Skills in broad range of supporting Oracle tools:
‣OBIEE, OBIA, ODIEE
‣Big Data, Hadoop, NoSQL & Big Data Discovery
‣Essbase, Oracle OLAP
‣GoldenGate
‣Endeca
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Many Organisations are Running Big Data Initiatives
•Many customers and organisations are running initiatives around “big data”
•Some are IT-led and are looking for cost-savings around data warehouse storage + ETL
•Others are “skunkworks” projects in the marketing department that are now scaling-up
•Projects now emerging from pilot exercises
•And design patterns starting to emerge
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Common Big Data Design Pattern : “Data Reservoir”
•Typical implementation of Hadoop and big data in an analytic context is the “data lake”
•Additional data storage platform with cheap storage, flexible schema support + compute
•Data lands in the data lake or reservoir in raw form, then minimally processed
•Data then accessed directly by “data scientists”, or processed further into DW
MartsData Warehouse
Σ Σ
Business
Intelligence
• Online
• Scalable
• Flexible
• Cost
Effective
Hadoop
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Key Differentiator from DW Storage : Flexible Cheap Storage
•HDFS storage and “schema-on-read” technology
gives you ability to store data in “raw” form
‣Cheap expandable storage makes it feasible to
retain all incoming data at detail level
‣Schema-on-read technology gives you ability to
apply formats and schema on-demand
•A useful complement to the curated, structured and
focused datasets in the DW
•Underlying Hadoop platform supports compute and
analysis on these raw datasets
•But how do you make sense of what’s in there?
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Data Integration and BI Tools can Access Data Reservoir
•Data integration tools such as Oracle Data Integrator can now load and process Hadoop data
•BI tools such as Oracle Business Intelligence 12c can report on Hadoop data
•Great for answering questions you know about
•Fantastic for navigating hierarchies and drilling
•Superb when you understand the data

and the value that’s within it
‣But what if you don’t yet?
Access direct Hive or extract using ODI12c
for structured OBIEE dashboard analysis
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Different Skills and Techniques Required for Raw Datasets
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
But … Getting Value and Information Out of Hadoop is Hard
•Specialist skills typically needed to ingest and understand data coming into Hadoop
•Data loaded into the reservoir needs preparation and curation before presenting to users
•But we’ve heard a similar story before, from the world of BI…
6
Tool	Complexity
• Early	Hadoop	tools	only	for	experts
• Existing	BI	tools	not	designed	for	Hadoop
• Emerging	solutions	lack	broad	capabilities
80%	effort	typically	
spent	on	evaluating	
and	preparing	data
Data	Uncertainty
• Not	familiar	and	overwhelming
• Potential	value	not	obvious
• Requires	significant	manipulation
Overly	dependent	on	
scarce	and	highly	
skilled	resources
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Back to 2012…
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
What Was Oracle Endeca Information Discovery?
•Part of the acquisition of Endeca back in 2012
by Oracle Corporation
•Based on search technology and concept of
“faceted search”
•Data stored in flexible NoSQL-style in-memory
database called “Endeca Server”
•Added aggregation, text analytics and text
enrichment features for “data discovery”
‣Explore data in raw form, loose connections,
navigate via search rather than hierarchies
‣Useful to find out what is relevant and valuable
in a dataset before formal modeling
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Endeca Server Technology Combined Search + Analytics
•Proprietary database engine focused on search and analytics
•Data organized as records, made up of attributes stored as key/value pairs
•No over-arching schema, 

no tables, self-describing attributes
•Endeca Server hallmarks:
‣Minimal upfront design
‣Support for “jagged” data
‣Administered via web service calls
‣“No data left behind”
‣“Load and Go”
•But … limited in scale (>1m records)
‣… what if it could be rebuilt on Hadoop?
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Oracle Big Data Discovery
•A visual front-end to the Hadoop data reservoir, providing end-user access to datasets
•Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster
•Visualize and search datasets to gain insights, potentially load in summary form into DW
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Typical Big Data Discovery Use-Cases
•Provide a visual catalog and search function across
data in the data reservoir
•Profile and understand data, relationships, data
quality issues
•Apply simple changes, enrichment to incoming data
•Visualize datasets including combinations (joins)
‣Bring in additional data from JDBC + file sources
•Prepare more structured Hadoop datasets for use
with OBIEE
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
BDD for Reporting and Cataloging the Data Reservoir
•Very good tool for analysing and cataloging the data in the production data reservoir
•Basic BI-type reporting against datasets, joined together, filtered, transformed etc
•Cross-functional and cross-subject area analysis and visualization
•Great complement to OBIEE
‣Able to easily access semi-structured data
‣Familiar BI-type graph components
‣Ideal way to give analysts access to 

semi-structured customer datasets
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Comparing BDD to Oracle Visual Analyzer
•Visual Analyzer also provides a form of
“data discovery” for BI users
‣Similar to Tableau, Qlikview etc
‣Inspired by BI elements of OEID
•Uses OBIEE RPD as the primary
datasource, so data needs to be
curated + structured
•Probably a better option for users who
aren’t concerned its “big data”
•But can still connect to Hadoop via

Hive, Impala and Oracle Big Data SQL
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Managed vs. Free-Form Data Discovery
•Data in the data reservoir typically is raw, hasn’t been organised into facts, dimensions yet
•In this initial phase, you don’t want to it to be - too much up-front work with unknown data
•Later on though, users will benefit from structure and hierarchies being added to data
•But this takes work, and you need to understand cost/benefit of doing it now vs. later
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Supporting the Transition From Data Lab to Data Factory
•Most data science and big data projects start small, using R and single data scientist
•Big Data Discovery help opens up data science and the data reservoir to the enterprise
6
D a t a
L a b
Invent Commercialize
D a t a
F a c t o r y
Research Develop
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Enabling Hadoop Datasets for Access by Big Data Discovery
•Relies on datasets in Hadoop being registered with Hive Catalog
•Presents semi-structured and other datasets as tables, columns
•Hive SerDe and Storage Handler technologies allow Hive to run over most datasets
•Hive tables need to be defined before dataset can be used by BDD
CREATE external TABLE apachelog_parsed(
host STRING,
identity STRING,
…
agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^]*) ([^]*) ([^]*) (-|[^]*]) 

([^ ”]*|"[^"]*")(-|[0-9]*) (-|[0-9]*)(?: ([^ "]

*|".*") ([^ "]*|".*"))?"
)
STORED AS TEXTFILE
LOCATION '/user/flume/rm_website_logs;
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Today’s Layered Data Warehouse Architecture
Virtualization&

QueryFederation
Enterprise
Performance
Management
Pre-built & 

Ad-hoc 

BI Assets
Information

Services
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Data 

Science
Data Engines & 

Poly-structured 

sources
Content
Docs Web & Social Media
SMS
Structured
Data

Sources
•Operational Data
•COTS Data
•Master & Ref. Data
•Streaming & BAM
Immutable raw data reservoir
Raw data at rest is not interpreted
Immutable modelled data. Business
Process Neutral form. Abstracted from
business process changes
Past, current and future interpretation of
enterprise data. Structured to support agile
access & navigation
Discovery Lab Sandboxes Rapid Development Sandboxes
Project based data stores to
support specific discovery
objectives
Project based data stored to
facilitate rapid content /
presentation delivery
Data Sources
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Data Reservoir within the Big Data Management Platform
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Oracle Big Data Discovery Architecture
•Adds additional nodes into the CDH5.3 cluster, for running DGraph and Studio
•DGraph engine based on Endeca Server technology, can also be clustered
•Hive (HCatalog) used for reading table metadata,

mapping back to underlying HDFS files
•Apache Spark then used to upload (ingest)

data into DGraph, typically 1m row sample
•Data then held for online analysis in DGraph
•Option to write-back transformations to

underlying Hive/HDFS files using Apache Spark
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Ingesting & Sampling Datasets for the DGraph Engine
•Datasets in Hive have to be ingested into DGraph engine before analysis, transformation
•Can either define an automatic Hive table detector process, or manually upload
•Typically ingests 1m row random sample
‣1m row sample provides > 99% confidence that answer is within 2% of value shown

no matter how big the full dataset (1m, 1b, 1q+)
‣Makes interactivity cheap - representative dataset
Amount'of'data'queried
The'100%'premium
Cost
Accuracy
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Search, Catalog, Join and Analyse Data Using BDD Studio
•Big Data Discovery Studio then provides web-based tools for organising, searching datasets
•Visualize Data, Transform and Export back to Hadoop
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Example Scenario : Social Media Analysis
•Rittman Mead want to understand drivers and audience for their website
‣What is our most popular content? Who are the most in-demand blog authors?
‣Who are the influencers? What do they read?
•Three data sources in scope:
RM Website Logs Twitter Stream Website Posts, Comments etc
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Objective of Exercises
•Enrich and transform incoming schema-on-read social media data to add value
•Do some initial profiling, analysis to understand datasets
•Provide a user interface for the Big Data team
•Provide a means to identify candidate dimensions 

and facts for a more structured (managed) 

data discovery tool like Visual Analyzer
Combine with site content, semantics, text enrichment
Catalog and explore using Oracle Big Data Discovery
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Data Sources used for Data Discovery Exercise
Spark
Hive
HDFS
Spark
Hive
HDFS
Spark
Hive
HDFS
Cloudera CDH5.3 BDA Hadoop Cluster
Hive Client
HDFS Client
BDD

DGraph

Gateway
Hive Client
BDD
Studio

Web UI
BDD Node
BDD Data
Processing
BDD Data
Processing
BDD Data
Processing
Ingest semi-

process logs

(1m rows)
Ingest processed

Twitter activity
Write-back

Transformations

to full

datasets
Upload

Site page and

comment
contents
Persist uploaded DGraph

content in Hive / HDFS
Data Discovery
using Studio
web-based app
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Ingesting Site Activity and Tweet Data into DGraph
•Tweets and Website Log Activity stored already in data reservoir as Hive tables
•Upload triggered by manual call to BDD Data Processing CLI
‣Runs Oozie job in the background to profile,

enrich and then ingest data into DGraph
[oracle@bddnode1 ~]$ cd /home/oracle/Middleware/BDD1.0/dataprocessing/edp_cli
[oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author
[oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t rm_linked_tweets
Hive
Apache Spark
pageviews
X rows
pageviews
>1m rows
Profiling pageviews
>1m rows
Enrichment pageviews
>1m rows
BDD
pageviews
>1m rows
{
"@class" : "com.oracle.endeca.pdi.client.config.workflow.

ProvisionDataSetFromHiveConfig",
"hiveTableName" : "rm_linked_tweets",
"hiveDatabaseName" : "default",
"newCollectionName" : “edp_cli_edp_a5dbdb38-b065…”,
"runEnrichment" : true,
"maxRecordsForNewDataSet" : 1000000,
"languageOverride" : "unknown"
}
1
2
3
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Ingesting Site Activity and Tweet Data into DGraph
•Two output datasets from ODI process have to be ingested into DGraph engine
•Upload triggered by manual call to BDD Data Processing CLI
‣Runs Oozie job in the background to profile,

enrich and then ingest data into DGraph
[oracle@bddnode1 ~]$ cd /home/oracle/Middleware/BDD1.0/dataprocessing/edp_cli
[oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author
[oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t rm_linked_tweets
Hive
Apache Spark
pageviews
X rows
pageviews
>1m rows
Profiling pageviews
>1m rows
Enrichment pageviews
>1m rows
BDD
pageviews
>1m rows
{
"@class" : "com.oracle.endeca.pdi.client.config.workflow.

ProvisionDataSetFromHiveConfig",
"hiveTableName" : "rm_linked_tweets",
"hiveDatabaseName" : "default",
"newCollectionName" : “edp_cli_edp_a5dbdb38-b065…”,
"runEnrichment" : true,
"maxRecordsForNewDataSet" : 1000000,
"languageOverride" : "unknown"
}
1
2
3
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Ingesting and Sampling Hive Data into Big Data Discovery
[oracle@bigdatalite ~]$ cd /home/oracle/movie/Middleware/BDD1.0/dataprocessing/edp_cli
[oracle@bigdatalite edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author
[oracle@bigdatalite edp_cli]$ ./data_processing_CLI -t rm_linked_tweets
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
View Ingested Datasets, Create New Project
•Ingested datasets are now visible in Big Data Discovery Studio
•Create new project from first dataset, then add second
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Automatic Enrichment of Ingested Datasets
•Ingestion process has automatically geo-coded host IP addresses
•Other automatic enrichments run after initial discovery step, based on datatypes, content
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Initial Data Exploration On Uploaded Dataset Attributes
•For the ACCESS_PER_POST_CAT_AUTHORS dataset, 18 attributes now available
•Combination of original attributes, and derived attributes added by enrichment process
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Explore Attribute Values, Distribution using Scratchpad
•Click on individual attributes to view more details about them
•Add to scratchpad, automatically selects most relevant data visualisation
1
2
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Filter (Refine) Visualizations in Scratchpad
•Click on the Filter button to display a refinement list
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Display Refined Data Visualization
•Select refinement (filter) values from refinement pane
•Visualization in scratchpad now filtered by that attribute
‣Repeat to filter by multiple attribute values
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Save Scratchpad Visualization to Discovery Page
•For visualisations you want to keep, you can add them to Discovery page
•Dashboard / faceted search part of BDD Studio - we’ll see more later
1
2
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Select Multiple Attributes for Same Visualization
•Select AUTHOR attribute, see

initial ordered values, distribution
•Add attribute POST_DATE
‣choose between multiple 

instances of first attribute 

split by second
‣or one visualisation with 

multiple series
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Data Transformation & Enrichment
•Data ingest process automatically applies some enrichments - geocoding etc
•Can apply others from Transformation page - simple transformations & Groovy expressions
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Standard Transformations - Simple & Using Editor
•Group and bin attribute values; filter on attribute values, etc
•Use Transformation Editor for custom transformations (Groovy, incl. enrichment functions)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Datatype Conversion Example : String to Date / Time
•Datatypes can be converted into other datatypes, with data transformed if required
•Example : convert Apache Combined Format Log date/time to Java date/time
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Transformations using Text Enrichment / Parsing
•Uses Salience text engine under the covers
•Extract terms, sentiment, noun groups, positive / negative words etc
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Create New Attribute using Derived (Transformed) Values
•Choose option to Create New Attribute, to add derived attribute to dataset
•Preview changes, then save to transformation script
12
3
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Commit Transforms to DGraph, or Create New Hive Table
•Transformation changes have to be committed to DGraph sample of dataset
‣Project transformations kept separate from other project copies of dataset
•Transformations can also be applied to full dataset, using Apache Spark
‣Creates new Hive table of complete dataset
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Upload Additional Datasets
•Users can upload their own datasets into BDD, from MS Excel or CSV file
•Uploaded data is first loaded into Hive table, then sampled/ingested as normal
1
2
3
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Join Datasets On Common Attributes
•Used to create a dataset based on the intersection (typically) of two datasets
•Not required to just view two or more datasets together - think of this as a JOIN and SELECT
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Join Example : Add Post + Author Details to Tweet URL
•Tweets ingested into data reservoir can reference a page URL
•Site Content dataset contains title, content, keywords etc for RM website pages
•We would like to add these details to the tweets where an RM web page was mentioned
‣And also add page author details missing from the site contents upload
Main “driving” dataset
Contains tweet user details,

tweet text, hashtags, URL referenced,
location of tweeter etc
Contains full details of each site page,

including URL, title, content, category
Join on URL 

referenced in tweet
Contains the post author details

missing from the Site Content 

dataset
Join on internal

Page ID
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Select From Multiple Visualisation Types
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Multi-Dataset Join Step 1 : Join Site Contents to Posts
•Site contents dataset needs to gain access to the page author attribute only found in Posts
•Create join in the Dataset Relationships panel, using Post ID as the common attribute
•Join from Site contents to Posts, to create left-outer join from first to second table
1
2
3
Previews rows from the join, based on

post_id = a (post_id column)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Multi-Dataset Join Step 2 : Standardise URL Formats
•URLs in Twitter dataset have trailing ‘/‘, whereas URLs in RM site data do not
•Use the Transformation feature in Studio to add trailing ‘/‘ to RM site URLs
•Select option to replace the current URL values and overwrite within project dataset
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Multi-Dataset Join Step 3 : Join Tweets to Site Content
•Join on the standardised-format URL attributes in the two datasets
•Data view will now contain the page content and author for each tweet mentioning RM
1
2
3
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Key BDD Studio Differentiator : Faceted Search Across Hadoop
•BDD Studio dashboards support faceted search across all attributes, refinements
•Auto-filter dashboard contents on selected attribute values - for data discovery
•Fast analysis and summarisation through Endeca Server technology
Further refinement on

“OBIEE” in post keywords
3
Results now filtered

on two refinements
4
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Create Discovery Pages for Dataset Analysis
•Select from palette of visualisation components
•Select measures, attributes for display
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
•Search on attribute values, text in attributes across all datasets
•Extracted keywords, free text field search
Faceted Search Across Project
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Key BDD Studio Differentiator : Faceted Search Across Hadoop
•BDD Studio dashboards support faceted search across all attributes, refinements
•Auto-filter dashboard contents on selected attribute values
•Fast analysis and summarisation through Endeca Server technology
“Mark Rittman” selected
from Post Authors
Results filtered on 

selected refinement
1
2
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Export Prepared Datasets Back to Hive, for OBIEE + VA
•Transformations within BDD Studio can then be used to create curated fact + dim Hive tables
•Can be used then as a more suitable dataset for use with OBIEE RPD + Visual Analyzer
•Or exported then in to Exadata or Exalytics to combine with main DW datasets
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Further Analyse in Visual Analyzer for Managed Dataset
•Users in Visual Analyzer then have

a more structured dataset to use
•Data organised into dimensions, 

facts, hierarchies and attributes
•Can still access Hadoop directly

through Impala or Big Data SQL
•Big Data Discovery though was 

key to initial understanding of data
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
… And Finally
Additional Resources
How to Contact Us
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Additional Resources
•Articles on the Rittman Mead Blog
‣http://www.rittmanmead.com/category/oracle-big-data-appliance/
‣http://www.rittmanmead.com/category/big-data/
‣http://www.rittmanmead.com/category/oracle-big-data-discovery/
•Slides will be on the BI Forum USB sticks
•Rittman Mead offer consulting, training and managed services for Oracle Big Data
‣Oracle & Cloudera partners
‣http://www.rittmanmead.com/bigdata
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 

+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Rittman Mead and Big Data on Oracle
Case Study - Rittman Mead

Mark Rittman, CTO, Rittman Mead
March 2015

Mais conteúdo relacionado

Mais procurados

Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gMark Rittman
 
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13Mark Rittman
 
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsBig Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsMark Rittman
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsDeep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsMark Rittman
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudMark Rittman
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Mark Rittman
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...Mark Rittman
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudMark Rittman
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondDataWorks Summit/Hadoop Summit
 
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12c
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12cGoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12c
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12cMichael Rainey
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016StampedeCon
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Mark Rittman
 

Mais procurados (20)

Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
 
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
 
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsBig Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsDeep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
 
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12c
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12cGoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12c
GoldenGate and Oracle Data Integrator - A Perfect Match- Upgrade to 12c
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
 

Semelhante a Big Data Discovery Complements Traditional Business Analytics

Deploying OBIEE in the Cloud - Oracle Openworld 2014
Deploying OBIEE in the Cloud - Oracle Openworld 2014Deploying OBIEE in the Cloud - Oracle Openworld 2014
Deploying OBIEE in the Cloud - Oracle Openworld 2014Mark Rittman
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
 
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)Mark Rittman
 
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...Mark Rittman
 
Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsReal-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsMichael Rainey
 
2-in-1 : RPD Magic and Hyperion Planning "Adapter"
2-in-1 : RPD Magic and Hyperion Planning "Adapter"2-in-1 : RPD Magic and Hyperion Planning "Adapter"
2-in-1 : RPD Magic and Hyperion Planning "Adapter"Gianni Ceresa
 
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...Jérôme Françoisse
 
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)Mark Rittman
 
Ougn2013 high speed, in-memory big data analysis with oracle exalytics
Ougn2013   high speed, in-memory big data analysis with oracle exalyticsOugn2013   high speed, in-memory big data analysis with oracle exalytics
Ougn2013 high speed, in-memory big data analysis with oracle exalyticsMark Rittman
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
 
IBANK - Oracle developers-guide
IBANK - Oracle developers-guide IBANK - Oracle developers-guide
IBANK - Oracle developers-guide ibankuk
 
ODI 11g in the Enterprise - BIWA 2013
ODI 11g in the Enterprise - BIWA 2013ODI 11g in the Enterprise - BIWA 2013
ODI 11g in the Enterprise - BIWA 2013Mark Rittman
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)Mark Rittman
 
oracleadvancedanalyticsv2otn-2859525.pptx
oracleadvancedanalyticsv2otn-2859525.pptxoracleadvancedanalyticsv2otn-2859525.pptx
oracleadvancedanalyticsv2otn-2859525.pptxAdityaDas899782
 

Semelhante a Big Data Discovery Complements Traditional Business Analytics (20)

Deploying OBIEE in the Cloud - Oracle Openworld 2014
Deploying OBIEE in the Cloud - Oracle Openworld 2014Deploying OBIEE in the Cloud - Oracle Openworld 2014
Deploying OBIEE in the Cloud - Oracle Openworld 2014
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
 
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)
 
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...
 
Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsReal-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors
 
2-in-1 : RPD Magic and Hyperion Planning "Adapter"
2-in-1 : RPD Magic and Hyperion Planning "Adapter"2-in-1 : RPD Magic and Hyperion Planning "Adapter"
2-in-1 : RPD Magic and Hyperion Planning "Adapter"
 
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...
 
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
 
Ougn2013 high speed, in-memory big data analysis with oracle exalytics
Ougn2013   high speed, in-memory big data analysis with oracle exalyticsOugn2013   high speed, in-memory big data analysis with oracle exalytics
Ougn2013 high speed, in-memory big data analysis with oracle exalytics
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
 
IBANK - Oracle developers-guide
IBANK - Oracle developers-guide IBANK - Oracle developers-guide
IBANK - Oracle developers-guide
 
Rittman endeca
Rittman endecaRittman endeca
Rittman endeca
 
ODI 11g in the Enterprise - BIWA 2013
ODI 11g in the Enterprise - BIWA 2013ODI 11g in the Enterprise - BIWA 2013
ODI 11g in the Enterprise - BIWA 2013
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Seed endeca
Seed endecaSeed endeca
Seed endeca
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)
 
Hadoop and SAP BI
Hadoop and SAP BI   Hadoop and SAP BI
Hadoop and SAP BI
 
oracleadvancedanalyticsv2otn-2859525.pptx
oracleadvancedanalyticsv2otn-2859525.pptxoracleadvancedanalyticsv2otn-2859525.pptx
oracleadvancedanalyticsv2otn-2859525.pptx
 

Mais de Mark Rittman

The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
 
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitMark Rittman
 
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?Mark Rittman
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...Mark Rittman
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
 
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...Mark Rittman
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
 

Mais de Mark Rittman (10)

The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
 
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Último

Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 

Último (20)

Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 

Big Data Discovery Complements Traditional Business Analytics

  • 1. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com What is Big Data Discovery,
 and how it complements 
 traditional Business Analytics
 Mark Rittman, CTO, Rittman Mead November 2015
  • 2. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com About the Speaker •Mark Rittman, Co-Founder of Rittman Mead •Oracle ACE Director, specialising in Oracle BI&DW •14 Years Experience with Oracle Technology •Regular columnist for Oracle Magazine •Author of two Oracle Press Oracle BI books •Oracle Business Intelligence Developers Guide •Oracle Exalytics Revealed •Writer for Rittman Mead Blog :
 http://www.rittmanmead.com/blog •Email : mark.rittman@rittmanmead.com •Twitter : @markrittman
  • 3. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com About Rittman Mead •Oracle BI and DW Gold partner •Winner of five UKOUG Partner of the Year awards in 2013 - including BI •World leading specialist partner for technical excellence, 
 solutions delivery and innovation in Oracle BI •Approximately 80 consultants worldwide •All expert in Oracle BI and DW •Offices in US (Atlanta), Europe, Australia and India •Skills in broad range of supporting Oracle tools: ‣OBIEE, OBIA, ODIEE ‣Big Data, Hadoop, NoSQL & Big Data Discovery ‣Essbase, Oracle OLAP ‣GoldenGate ‣Endeca
  • 4. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Many Organisations are Running Big Data Initiatives •Many customers and organisations are running initiatives around “big data” •Some are IT-led and are looking for cost-savings around data warehouse storage + ETL •Others are “skunkworks” projects in the marketing department that are now scaling-up •Projects now emerging from pilot exercises •And design patterns starting to emerge
  • 5. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Common Big Data Design Pattern : “Data Reservoir” •Typical implementation of Hadoop and big data in an analytic context is the “data lake” •Additional data storage platform with cheap storage, flexible schema support + compute •Data lands in the data lake or reservoir in raw form, then minimally processed •Data then accessed directly by “data scientists”, or processed further into DW MartsData Warehouse Σ Σ Business Intelligence • Online • Scalable • Flexible • Cost Effective Hadoop
  • 6. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Key Differentiator from DW Storage : Flexible Cheap Storage •HDFS storage and “schema-on-read” technology gives you ability to store data in “raw” form ‣Cheap expandable storage makes it feasible to retain all incoming data at detail level ‣Schema-on-read technology gives you ability to apply formats and schema on-demand •A useful complement to the curated, structured and focused datasets in the DW •Underlying Hadoop platform supports compute and analysis on these raw datasets •But how do you make sense of what’s in there?
  • 7. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Data Integration and BI Tools can Access Data Reservoir •Data integration tools such as Oracle Data Integrator can now load and process Hadoop data •BI tools such as Oracle Business Intelligence 12c can report on Hadoop data •Great for answering questions you know about •Fantastic for navigating hierarchies and drilling •Superb when you understand the data
 and the value that’s within it ‣But what if you don’t yet? Access direct Hive or extract using ODI12c for structured OBIEE dashboard analysis What pages are people visiting? Who is referring to us on Twitter? What content has the most reach?
  • 8. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Different Skills and Techniques Required for Raw Datasets
  • 9. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com But … Getting Value and Information Out of Hadoop is Hard •Specialist skills typically needed to ingest and understand data coming into Hadoop •Data loaded into the reservoir needs preparation and curation before presenting to users •But we’ve heard a similar story before, from the world of BI… 6 Tool Complexity • Early Hadoop tools only for experts • Existing BI tools not designed for Hadoop • Emerging solutions lack broad capabilities 80% effort typically spent on evaluating and preparing data Data Uncertainty • Not familiar and overwhelming • Potential value not obvious • Requires significant manipulation Overly dependent on scarce and highly skilled resources
  • 10. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Back to 2012…
  • 11. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com What Was Oracle Endeca Information Discovery? •Part of the acquisition of Endeca back in 2012 by Oracle Corporation •Based on search technology and concept of “faceted search” •Data stored in flexible NoSQL-style in-memory database called “Endeca Server” •Added aggregation, text analytics and text enrichment features for “data discovery” ‣Explore data in raw form, loose connections, navigate via search rather than hierarchies ‣Useful to find out what is relevant and valuable in a dataset before formal modeling
  • 12. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Endeca Server Technology Combined Search + Analytics •Proprietary database engine focused on search and analytics •Data organized as records, made up of attributes stored as key/value pairs •No over-arching schema, 
 no tables, self-describing attributes •Endeca Server hallmarks: ‣Minimal upfront design ‣Support for “jagged” data ‣Administered via web service calls ‣“No data left behind” ‣“Load and Go” •But … limited in scale (>1m records) ‣… what if it could be rebuilt on Hadoop?
  • 13. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Oracle Big Data Discovery •A visual front-end to the Hadoop data reservoir, providing end-user access to datasets •Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster •Visualize and search datasets to gain insights, potentially load in summary form into DW
  • 14. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Typical Big Data Discovery Use-Cases •Provide a visual catalog and search function across data in the data reservoir •Profile and understand data, relationships, data quality issues •Apply simple changes, enrichment to incoming data •Visualize datasets including combinations (joins) ‣Bring in additional data from JDBC + file sources •Prepare more structured Hadoop datasets for use with OBIEE
  • 15. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com BDD for Reporting and Cataloging the Data Reservoir •Very good tool for analysing and cataloging the data in the production data reservoir •Basic BI-type reporting against datasets, joined together, filtered, transformed etc •Cross-functional and cross-subject area analysis and visualization •Great complement to OBIEE ‣Able to easily access semi-structured data ‣Familiar BI-type graph components ‣Ideal way to give analysts access to 
 semi-structured customer datasets
  • 16. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Comparing BDD to Oracle Visual Analyzer •Visual Analyzer also provides a form of “data discovery” for BI users ‣Similar to Tableau, Qlikview etc ‣Inspired by BI elements of OEID •Uses OBIEE RPD as the primary datasource, so data needs to be curated + structured •Probably a better option for users who aren’t concerned its “big data” •But can still connect to Hadoop via
 Hive, Impala and Oracle Big Data SQL
  • 17. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Managed vs. Free-Form Data Discovery •Data in the data reservoir typically is raw, hasn’t been organised into facts, dimensions yet •In this initial phase, you don’t want to it to be - too much up-front work with unknown data •Later on though, users will benefit from structure and hierarchies being added to data •But this takes work, and you need to understand cost/benefit of doing it now vs. later
  • 18. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Supporting the Transition From Data Lab to Data Factory •Most data science and big data projects start small, using R and single data scientist •Big Data Discovery help opens up data science and the data reservoir to the enterprise 6 D a t a L a b Invent Commercialize D a t a F a c t o r y Research Develop
  • 19. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Enabling Hadoop Datasets for Access by Big Data Discovery •Relies on datasets in Hadoop being registered with Hive Catalog •Presents semi-structured and other datasets as tables, columns •Hive SerDe and Storage Handler technologies allow Hive to run over most datasets •Hive tables need to be defined before dataset can be used by BDD CREATE external TABLE apachelog_parsed( host STRING, identity STRING, … agent STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^]*) ([^]*) ([^]*) (-|[^]*]) 
 ([^ ”]*|"[^"]*")(-|[0-9]*) (-|[0-9]*)(?: ([^ "]
 *|".*") ([^ "]*|".*"))?" ) STORED AS TEXTFILE LOCATION '/user/flume/rm_website_logs;
  • 20. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Today’s Layered Data Warehouse Architecture Virtualization&
 QueryFederation Enterprise Performance Management Pre-built & 
 Ad-hoc 
 BI Assets Information
 Services Data Ingestion Information Interpretation Access & Performance Layer Foundation Data Layer Raw Data Reservoir Data 
 Science Data Engines & 
 Poly-structured 
 sources Content Docs Web & Social Media SMS Structured Data
 Sources •Operational Data •COTS Data •Master & Ref. Data •Streaming & BAM Immutable raw data reservoir Raw data at rest is not interpreted Immutable modelled data. Business Process Neutral form. Abstracted from business process changes Past, current and future interpretation of enterprise data. Structured to support agile access & navigation Discovery Lab Sandboxes Rapid Development Sandboxes Project based data stores to support specific discovery objectives Project based data stored to facilitate rapid content / presentation delivery Data Sources
  • 21. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Data Reservoir within the Big Data Management Platform
  • 22. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Oracle Big Data Discovery Architecture •Adds additional nodes into the CDH5.3 cluster, for running DGraph and Studio •DGraph engine based on Endeca Server technology, can also be clustered •Hive (HCatalog) used for reading table metadata,
 mapping back to underlying HDFS files •Apache Spark then used to upload (ingest)
 data into DGraph, typically 1m row sample •Data then held for online analysis in DGraph •Option to write-back transformations to
 underlying Hive/HDFS files using Apache Spark
  • 23. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Ingesting & Sampling Datasets for the DGraph Engine •Datasets in Hive have to be ingested into DGraph engine before analysis, transformation •Can either define an automatic Hive table detector process, or manually upload •Typically ingests 1m row random sample ‣1m row sample provides > 99% confidence that answer is within 2% of value shown
 no matter how big the full dataset (1m, 1b, 1q+) ‣Makes interactivity cheap - representative dataset Amount'of'data'queried The'100%'premium Cost Accuracy
  • 24. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Search, Catalog, Join and Analyse Data Using BDD Studio •Big Data Discovery Studio then provides web-based tools for organising, searching datasets •Visualize Data, Transform and Export back to Hadoop
  • 25. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Example Scenario : Social Media Analysis •Rittman Mead want to understand drivers and audience for their website ‣What is our most popular content? Who are the most in-demand blog authors? ‣Who are the influencers? What do they read? •Three data sources in scope: RM Website Logs Twitter Stream Website Posts, Comments etc
  • 26. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Objective of Exercises •Enrich and transform incoming schema-on-read social media data to add value •Do some initial profiling, analysis to understand datasets •Provide a user interface for the Big Data team •Provide a means to identify candidate dimensions 
 and facts for a more structured (managed) 
 data discovery tool like Visual Analyzer Combine with site content, semantics, text enrichment Catalog and explore using Oracle Big Data Discovery
  • 27. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Data Sources used for Data Discovery Exercise Spark Hive HDFS Spark Hive HDFS Spark Hive HDFS Cloudera CDH5.3 BDA Hadoop Cluster Hive Client HDFS Client BDD
 DGraph
 Gateway Hive Client BDD Studio
 Web UI BDD Node BDD Data Processing BDD Data Processing BDD Data Processing Ingest semi-
 process logs
 (1m rows) Ingest processed
 Twitter activity Write-back
 Transformations
 to full
 datasets Upload
 Site page and
 comment contents Persist uploaded DGraph
 content in Hive / HDFS Data Discovery using Studio web-based app
  • 28. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Ingesting Site Activity and Tweet Data into DGraph •Tweets and Website Log Activity stored already in data reservoir as Hive tables •Upload triggered by manual call to BDD Data Processing CLI ‣Runs Oozie job in the background to profile,
 enrich and then ingest data into DGraph [oracle@bddnode1 ~]$ cd /home/oracle/Middleware/BDD1.0/dataprocessing/edp_cli [oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author [oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t rm_linked_tweets Hive Apache Spark pageviews X rows pageviews >1m rows Profiling pageviews >1m rows Enrichment pageviews >1m rows BDD pageviews >1m rows { "@class" : "com.oracle.endeca.pdi.client.config.workflow.
 ProvisionDataSetFromHiveConfig", "hiveTableName" : "rm_linked_tweets", "hiveDatabaseName" : "default", "newCollectionName" : “edp_cli_edp_a5dbdb38-b065…”, "runEnrichment" : true, "maxRecordsForNewDataSet" : 1000000, "languageOverride" : "unknown" } 1 2 3
  • 29. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Ingesting Site Activity and Tweet Data into DGraph •Two output datasets from ODI process have to be ingested into DGraph engine •Upload triggered by manual call to BDD Data Processing CLI ‣Runs Oozie job in the background to profile,
 enrich and then ingest data into DGraph [oracle@bddnode1 ~]$ cd /home/oracle/Middleware/BDD1.0/dataprocessing/edp_cli [oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author [oracle@bddnode1 edp_cli]$ ./data_processing_CLI -t rm_linked_tweets Hive Apache Spark pageviews X rows pageviews >1m rows Profiling pageviews >1m rows Enrichment pageviews >1m rows BDD pageviews >1m rows { "@class" : "com.oracle.endeca.pdi.client.config.workflow.
 ProvisionDataSetFromHiveConfig", "hiveTableName" : "rm_linked_tweets", "hiveDatabaseName" : "default", "newCollectionName" : “edp_cli_edp_a5dbdb38-b065…”, "runEnrichment" : true, "maxRecordsForNewDataSet" : 1000000, "languageOverride" : "unknown" } 1 2 3
  • 30. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Ingesting and Sampling Hive Data into Big Data Discovery [oracle@bigdatalite ~]$ cd /home/oracle/movie/Middleware/BDD1.0/dataprocessing/edp_cli [oracle@bigdatalite edp_cli]$ ./data_processing_CLI -t access_per_post_cat_author [oracle@bigdatalite edp_cli]$ ./data_processing_CLI -t rm_linked_tweets
  • 31. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com View Ingested Datasets, Create New Project •Ingested datasets are now visible in Big Data Discovery Studio •Create new project from first dataset, then add second
  • 32. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Automatic Enrichment of Ingested Datasets •Ingestion process has automatically geo-coded host IP addresses •Other automatic enrichments run after initial discovery step, based on datatypes, content
  • 33. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Initial Data Exploration On Uploaded Dataset Attributes •For the ACCESS_PER_POST_CAT_AUTHORS dataset, 18 attributes now available •Combination of original attributes, and derived attributes added by enrichment process
  • 34. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Explore Attribute Values, Distribution using Scratchpad •Click on individual attributes to view more details about them •Add to scratchpad, automatically selects most relevant data visualisation 1 2
  • 35. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Filter (Refine) Visualizations in Scratchpad •Click on the Filter button to display a refinement list
  • 36. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Display Refined Data Visualization •Select refinement (filter) values from refinement pane •Visualization in scratchpad now filtered by that attribute ‣Repeat to filter by multiple attribute values
  • 37. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Save Scratchpad Visualization to Discovery Page •For visualisations you want to keep, you can add them to Discovery page •Dashboard / faceted search part of BDD Studio - we’ll see more later 1 2
  • 38. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Select Multiple Attributes for Same Visualization •Select AUTHOR attribute, see
 initial ordered values, distribution •Add attribute POST_DATE ‣choose between multiple 
 instances of first attribute 
 split by second ‣or one visualisation with 
 multiple series
  • 39. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Data Transformation & Enrichment •Data ingest process automatically applies some enrichments - geocoding etc •Can apply others from Transformation page - simple transformations & Groovy expressions
  • 40. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Standard Transformations - Simple & Using Editor •Group and bin attribute values; filter on attribute values, etc •Use Transformation Editor for custom transformations (Groovy, incl. enrichment functions)
  • 41. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Datatype Conversion Example : String to Date / Time •Datatypes can be converted into other datatypes, with data transformed if required •Example : convert Apache Combined Format Log date/time to Java date/time
  • 42. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Transformations using Text Enrichment / Parsing •Uses Salience text engine under the covers •Extract terms, sentiment, noun groups, positive / negative words etc
  • 43. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Create New Attribute using Derived (Transformed) Values •Choose option to Create New Attribute, to add derived attribute to dataset •Preview changes, then save to transformation script 12 3
  • 44. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Commit Transforms to DGraph, or Create New Hive Table •Transformation changes have to be committed to DGraph sample of dataset ‣Project transformations kept separate from other project copies of dataset •Transformations can also be applied to full dataset, using Apache Spark ‣Creates new Hive table of complete dataset
  • 45. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Upload Additional Datasets •Users can upload their own datasets into BDD, from MS Excel or CSV file •Uploaded data is first loaded into Hive table, then sampled/ingested as normal 1 2 3
  • 46. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Join Datasets On Common Attributes •Used to create a dataset based on the intersection (typically) of two datasets •Not required to just view two or more datasets together - think of this as a JOIN and SELECT
  • 47. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Join Example : Add Post + Author Details to Tweet URL •Tweets ingested into data reservoir can reference a page URL •Site Content dataset contains title, content, keywords etc for RM website pages •We would like to add these details to the tweets where an RM web page was mentioned ‣And also add page author details missing from the site contents upload Main “driving” dataset Contains tweet user details,
 tweet text, hashtags, URL referenced, location of tweeter etc Contains full details of each site page,
 including URL, title, content, category Join on URL 
 referenced in tweet Contains the post author details
 missing from the Site Content 
 dataset Join on internal
 Page ID
  • 48. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Select From Multiple Visualisation Types
  • 49. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Multi-Dataset Join Step 1 : Join Site Contents to Posts •Site contents dataset needs to gain access to the page author attribute only found in Posts •Create join in the Dataset Relationships panel, using Post ID as the common attribute •Join from Site contents to Posts, to create left-outer join from first to second table 1 2 3 Previews rows from the join, based on
 post_id = a (post_id column)
  • 50. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Multi-Dataset Join Step 2 : Standardise URL Formats •URLs in Twitter dataset have trailing ‘/‘, whereas URLs in RM site data do not •Use the Transformation feature in Studio to add trailing ‘/‘ to RM site URLs •Select option to replace the current URL values and overwrite within project dataset
  • 51. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Multi-Dataset Join Step 3 : Join Tweets to Site Content •Join on the standardised-format URL attributes in the two datasets •Data view will now contain the page content and author for each tweet mentioning RM 1 2 3
  • 52. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Key BDD Studio Differentiator : Faceted Search Across Hadoop •BDD Studio dashboards support faceted search across all attributes, refinements •Auto-filter dashboard contents on selected attribute values - for data discovery •Fast analysis and summarisation through Endeca Server technology Further refinement on
 “OBIEE” in post keywords 3 Results now filtered
 on two refinements 4
  • 53. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Create Discovery Pages for Dataset Analysis •Select from palette of visualisation components •Select measures, attributes for display
  • 54. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com •Search on attribute values, text in attributes across all datasets •Extracted keywords, free text field search Faceted Search Across Project
  • 55. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Key BDD Studio Differentiator : Faceted Search Across Hadoop •BDD Studio dashboards support faceted search across all attributes, refinements •Auto-filter dashboard contents on selected attribute values •Fast analysis and summarisation through Endeca Server technology “Mark Rittman” selected from Post Authors Results filtered on 
 selected refinement 1 2
  • 56. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Export Prepared Datasets Back to Hive, for OBIEE + VA •Transformations within BDD Studio can then be used to create curated fact + dim Hive tables •Can be used then as a more suitable dataset for use with OBIEE RPD + Visual Analyzer •Or exported then in to Exadata or Exalytics to combine with main DW datasets
  • 57. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Further Analyse in Visual Analyzer for Managed Dataset •Users in Visual Analyzer then have
 a more structured dataset to use •Data organised into dimensions, 
 facts, hierarchies and attributes •Can still access Hadoop directly
 through Impala or Big Data SQL •Big Data Discovery though was 
 key to initial understanding of data
  • 58. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com … And Finally Additional Resources How to Contact Us
  • 59. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Additional Resources •Articles on the Rittman Mead Blog ‣http://www.rittmanmead.com/category/oracle-big-data-appliance/ ‣http://www.rittmanmead.com/category/big-data/ ‣http://www.rittmanmead.com/category/oracle-big-data-discovery/ •Slides will be on the BI Forum USB sticks •Rittman Mead offer consulting, training and managed services for Oracle Big Data ‣Oracle & Cloudera partners ‣http://www.rittmanmead.com/bigdata
  • 60. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or 
 +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India) E : info@rittmanmead.com W : www.rittmanmead.com Rittman Mead and Big Data on Oracle Case Study - Rittman Mead
 Mark Rittman, CTO, Rittman Mead March 2015