Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior, and identify market trends early. But this influx of new data can create challenges for IT departments. To derive real business value from Big Data, you need the right tools to capture and organize a wide variety of data types from different sources, and to analyze that data easily within the context of all your enterprise data. Attend this session to learn how Oracle’s end-to-end value chain for Big Data can help you unlock the value of Big Data.
2. The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions.
The development, release, and timing of any features or
functionality described for Oracle’s products remain at the
sole discretion of Oracle.
3. Case: On-line Ads and Content
[Diagram: data flow for on-line ad serving]
• Real-time: Determine best ad to place on page for this user
• Low Latency: Lookup user profile in NoSQL DB; add user if not present
• Expert System (input)
• HDFS: Actual ads served, Web logs, Predictions on browsing
• Batch: High scale data reductions
• BI and Analytics, Billing, NoSQL DB Profiles
4. Agenda
• Big Data Technology
• Oracle Big Data Appliance
• Big Data Applications
• Summary
• Q&A
6. Big Data: Infrastructure Requirements
Acquire
• Low, predictable Latency
• High Transaction Volume
• Flexible Data Structures
Organize
• High Throughput
• In-Place Preparation
• All Data Sources/Structures
Analyze
• Deep Analytics
• Agile Development
• Massive Scalability
• Real Time Results
12. Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 Servers
– 48 GB memory per node = 864 GB memory
– 12 Intel cores per node = 216 cores
– 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet
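The rack totals above are simple products of the per-node figures. A minimal sketch of that arithmetic follows; the function and constant names are illustrative and not part of any Oracle tool.

```python
# Per-node figures are taken from the slide; the totals are plain products.
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12
STORAGE_TB_PER_NODE = 24


def rack_totals(nodes=NODES):
    """Return aggregate memory (GB), cores, and raw storage (TB) for one rack."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "storage_tb": nodes * STORAGE_TB_PER_NODE,
    }


if __name__ == "__main__":
    print(rack_totals())
```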
13. Big Data Appliance
Cluster of industry standard servers for Hadoop and NoSQL Database
• Focus on Scalability and Availability at low cost
Compute and Storage
• 18 High-performance low-cost servers acting as Hadoop nodes
• 24 TB Capacity per node
• 2 6-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple replication
InfiniBand Network
• Redundant 40Gb/s switches
• IB connectivity to Exadata
10GigE Network
• 8 10GigE ports
• Datacenter connectivity
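Triple replication means each block is stored three times, so the capacity available for distinct data is roughly one third of the raw figure. The sketch below is a back-of-the-envelope estimate only. It assumes all raw storage is given to replicated data and ignores operating system space, temporary MapReduce space, and how storage is split between HDFS and NoSQL Database. The names are illustrative.

```python
RAW_TB_PER_NODE = 24      # from the slide
NODES_PER_RACK = 18       # from the slide
REPLICATION_FACTOR = 3    # "triple replication"


def usable_tb(racks=1, replication=REPLICATION_FACTOR):
    """Rough upper bound on distinct data (TB) that fits after replication.

    Assumption: every byte of raw storage holds replicated data, which
    overstates what a real cluster can hold.
    """
    raw_tb = racks * NODES_PER_RACK * RAW_TB_PER_NODE
    return raw_tb / replication


if __name__ == "__main__":
    print(usable_tb())          # one rack
    print(usable_tb(racks=8))   # eight connected racks
```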
14. Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch
15. Oracle Big Data Appliance Software
• Oracle Linux 5.6
• Java Hotspot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
16. Why Open-Source Apache Hadoop?
• Fast evolution in critical features
• Built by the Hadoop experts in the community
• Practical instead of esoteric
• Focus on what is needed for large clusters
• Proven at very large scale
• In production at all the large consumers of Hadoop
• Extremely stable in those environments
• Well-understood by practitioners
17. Software Layout
• Node 1:
  • M: Name Node, Balancer & HBase Master
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 2:
  • M: Secondary Name Node, Management, Zookeeper, MySQL Slave
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 3:
  • M: JobTracker, MySQL Master, ODI Agent, Hive Server
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 4 – 18:
  • S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  • Your MapReduce runs here!
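The layout above can be written down as a small lookup table, which makes it easy to answer questions such as "which nodes run Task Trackers?". This is only a sketch of the slide's content as a data structure. The names are hypothetical and do not correspond to any appliance configuration file.

```python
# Roles listed under "M:" on the slide, keyed by node number.
MASTER_ROLES = {
    1: ["Name Node", "Balancer", "HBase Master"],
    2: ["Secondary Name Node", "Management", "Zookeeper", "MySQL Slave"],
    3: ["JobTracker", "MySQL Master", "ODI Agent", "Hive Server"],
}
# Roles listed under "S:" for nodes 1-3.
STORAGE_ROLES = ["HDFS Data Node", "NoSQL DB Storage Node"]
# Roles listed under "S:" for nodes 4-18.
WORKER_ROLES = STORAGE_ROLES + ["Task Tracker", "HBase Region Server"]

FIRST_NODE, LAST_NODE = 1, 18


def roles_for(node):
    """Return every role the slide assigns to the given node number."""
    if not FIRST_NODE <= node <= LAST_NODE:
        raise ValueError(f"node must be in {FIRST_NODE}..{LAST_NODE}")
    if node in MASTER_ROLES:
        return MASTER_ROLES[node] + STORAGE_ROLES
    return list(WORKER_ROLES)


def nodes_running(role):
    """Return the node numbers that run the given role."""
    return [n for n in range(FIRST_NODE, LAST_NODE + 1) if role in roles_for(n)]


if __name__ == "__main__":
    print(nodes_running("Task Tracker"))
    print(len(nodes_running("HDFS Data Node")))
```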
18. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  • Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  • Analyze all your data
• Easy to Deploy
  • Risk Free, Quick Installation and Setup
• Single Vendor Support
  • Full Oracle support for the entire system and software set
20. Key-Value Store Workloads
• Large dynamic schema based data repositories
• Data capture
• Web applications
• Online retail
• Sensor/statistics/network capture/Mobile Devices
• Data services
• Scalable authentication
• Real-time communication (MMS, SMS, routing)
• Personalization / Localization
• Social Networks
21. Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
  • Key-value pair with major+sub-key paradigm
  • Read/insert/update/delete operations
• Scalability
  • Dynamic data partitioning and distribution
  • Optimized data access via intelligent driver
• High availability
  • One or more replicas
  • Disaster recovery through location of replicas
  • Resilient to partition master failures
  • No single point of failure
• Transparent load balancing
  • Reads from master or replicas
  • Driver is network topology & latency aware
[Diagram: Applications with NoSQLDB Drivers connecting to Storage Nodes in Data Center A and Data Center B]
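To make the major+sub-key paradigm concrete, here is a toy in-memory model: records that share a major key are grouped together, and the sub-key selects one value within the group. This is an illustration only, not the Oracle NoSQL Database API. The class and method names are hypothetical.

```python
from collections import defaultdict


class TinyKVStore:
    """Toy key-value store keyed by (major key path, sub-key path)."""

    def __init__(self):
        # major key -> {sub-key: value}
        self._data = defaultdict(dict)

    def put(self, major, minor, value):
        """Insert or update the value stored under (major, minor)."""
        self._data[tuple(major)][tuple(minor)] = value

    def get(self, major, minor):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(tuple(major), {}).get(tuple(minor))

    def delete(self, major, minor):
        """Remove the key; return True if something was deleted."""
        group = self._data.get(tuple(major), {})
        if tuple(minor) in group:
            del group[tuple(minor)]
            return True
        return False

    def multi_get(self, major):
        """Return all sub-key/value pairs that share one major key."""
        return dict(self._data.get(tuple(major), {}))


if __name__ == "__main__":
    store = TinyKVStore()
    store.put(["user", "42"], ["profile", "name"], "Ada")
    store.put(["user", "42"], ["profile", "city"], "London")
    print(store.multi_get(["user", "42"]))
```

Grouping by major key is what lets a store keep related records on the same partition, since only the major key is hashed when a request is routed.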
22. Resolving a Request
Client request: Operation + Key[M,m] + Value + Transaction Policy
• Hash Major Key to determine Partition id
• Use Partition Map to map Partition id to a Rep Group
• Use State Table to determine eligible Storage Node(s) within Rep Group
• Use Load Balancer to select best eligible Rep Node
• Contact Rep Node directly
Response to client:
• Operation result
• New Partition Map
• RepNodeStorageTable information
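The resolution steps can be sketched in a few lines. The version below is a simplified model under stated assumptions: MD5 stands in for whatever hash the driver really uses, the partition map is a plain dict, the state table is a list of node records per rep group, and "best" means lowest observed latency. All names and data shapes are hypothetical.

```python
import hashlib


def partition_id(major_key, n_partitions):
    """Hash the major key to a partition id (MD5 is an assumed stand-in)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_partitions


def build_partition_map(n_partitions, n_groups):
    """Hypothetical partition map: spread partitions round-robin over groups."""
    return {p: p % n_groups for p in range(n_partitions)}


def resolve(major_key, partition_map, state_table, write):
    """Return the name of the rep node that should serve the request.

    state_table maps rep group id -> list of node dicts with the keys
    'name', 'is_master', 'online', and 'latency_ms'.
    """
    pid = partition_id(major_key, len(partition_map))
    group = partition_map[pid]
    # Writes must go to the master; reads may go to any online node.
    eligible = [
        node for node in state_table[group]
        if node["online"] and (node["is_master"] or not write)
    ]
    if not eligible:
        raise RuntimeError(f"no eligible rep node in group {group}")
    # "Load balancer": pick the eligible node with the lowest latency.
    return min(eligible, key=lambda node: node["latency_ms"])["name"]


if __name__ == "__main__":
    pmap = build_partition_map(n_partitions=12, n_groups=1)
    table = {0: [
        {"name": "rn1", "is_master": True, "online": True, "latency_ms": 5.0},
        {"name": "rn2", "is_master": False, "online": True, "latency_ms": 1.0},
        {"name": "rn3", "is_master": False, "online": False, "latency_ms": 0.5},
    ]}
    print(resolve("user42", pmap, table, write=True))
    print(resolve("user42", pmap, table, write=False))
```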
23. ACID Transactions
Transaction Policy: Write Durability
• Configurable per-operation, application can set defaults
• Write Transaction Durability consists of both
  a) Sync policy (on Master and Replica)
    • Sync – force to disk
    • Write No Sync – force to OS buffer
    • No Sync – write to local log buffer, flush when convenient
  b) Replica Acknowledgement Policy
    • All
    • Simple Majority
    • None
Transaction Policy: Read Consistency
• Configurable per-operation, application can set defaults
• Read Consistency specified as Absolute, Time-based, Version or None
  • Absolute: Read from the master
  • Time-based: Read from any replica that is within <time-interval> of master or better
  • Version: Read from any replica that is current with <transaction-token> or higher
  • None: Read from any replica
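The two policies can be modeled as small data types. The sketch below is illustrative and is not the product's API. It assumes that a "simple majority" counts the master as one of the acknowledging nodes, and it represents a replica's freshness with a hypothetical lag in seconds and an integer version.

```python
from dataclasses import dataclass
from enum import Enum


class SyncPolicy(Enum):
    SYNC = "force to disk"
    WRITE_NO_SYNC = "force to OS buffer"
    NO_SYNC = "write to local log buffer, flush when convenient"


class AckPolicy(Enum):
    ALL = "all"
    SIMPLE_MAJORITY = "simple majority"
    NONE = "none"


@dataclass(frozen=True)
class Durability:
    """Write durability = sync policy on master and replica + ack policy."""
    master_sync: SyncPolicy
    replica_sync: SyncPolicy
    replica_ack: AckPolicy


def replica_acks_required(policy, group_size):
    """Replica acknowledgements needed before a write is reported committed.

    group_size counts the master plus its replicas. Assumption: the master
    counts toward the majority, so a group of 3 needs 1 replica ack.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if policy is AckPolicy.ALL:
        return group_size - 1
    if policy is AckPolicy.SIMPLE_MAJORITY:
        return group_size // 2
    return 0


def eligible_for_read(consistency, node, max_lag_s=None, min_version=None):
    """Decide whether a node may serve a read under the given consistency.

    node is a dict with the keys 'is_master', 'lag_s', and 'version'.
    """
    if consistency == "absolute":
        return node["is_master"]
    if consistency == "time":
        return node["is_master"] or node["lag_s"] <= max_lag_s
    if consistency == "version":
        return node["version"] >= min_version
    if consistency == "none":
        return True
    raise ValueError(f"unknown consistency: {consistency}")


if __name__ == "__main__":
    d = Durability(SyncPolicy.SYNC, SyncPolicy.NO_SYNC, AckPolicy.SIMPLE_MAJORITY)
    print(replica_acks_required(d.replica_ack, group_size=3))
```

Stronger settings trade latency for safety: requiring all replicas to acknowledge, or reading only from the master, narrows the set of nodes that can complete an operation.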
24. Oracle NoSQL DB Differentiation
• Commercial Grade Software and Support
• General-purpose
• Reliable – Based on proven Berkeley DB JE HA
• Easy to install and configure
• Scalable throughput, bounded latency
• Simple Programming and Operational Model
• Simple Major + Sub key and Value data structure
• ACID transactions
• Configurable consistency & durability
• Easy Management
• Web-based console, API accessible
• Manages and Monitors: Topology; Load; Performance; Events; Alerts
• Completes Oracle large scale data storage offerings
25. Try NoSQL Database on OTN
Oracle NoSQL Database:
• Community Edition is available as a software only distribution
• Enterprise Edition is available as a separately licensable product or as part of Big Data Appliance
39. Big Data Appliance and Exadata
Big Data for the Enterprise
NoSQL DB
HDFS
Hadoop
RDBMS
Benefits for Online Mode:
• No need to write to disk after Hadoop job
• Simpler management for use cases with lots of nodes generating output files
Benefits for Offline Mode (DP Files):
• Import operation can be parallelized in the database
• Fastest option for external tables
Direct HDFS: Access data on HDFS through the external table mechanism
Benefits:
• Data on HDFS can be queried from the database
• Import into the database as needed