DBA to Data Scientist with Oracle Big Data Appliance

November 09, 2013

© Copyright 2013. Apps Associates LLC.

About Me
Satyendra Kumar Pasalapudi
Practice Manager – Apps Associates
Co-Founder & Vice President – All India Oracle Users Group
14+ Years of Experience in Oracle Technologies
Exadata Certified Professional

@pasalapudi

Content courtesy: oracle.com, Hortonworks, Couchbase, Apache

Agenda
• What is Big Data
• Big Data Growth
• 4 Phases of Big Data
• NoSQL Databases
• Hadoop Basics
• Big Data Appliance
• Skills Required for a DBA to Become a Data Scientist

Big Data Growth

3 Macro Trends Driving Disruption

Gen X Stats

Big Data – High Data Variety & Velocity

Database Market Disruption
$30B Database Market Being Disrupted

How Did Big Data Evolve?
• More people interacting with data
• Smartphones
• Internet
• Greater volumes of data being generated (machine-to-machine
generation)
• Sensors
• General Packet Radio Services (GPRS)

What Is Big Data?
Big data is defined as voluminous unstructured data from many different
sources, such as:
• Social networks
• Banking and financial services
• E-commerce services
• Web-centric services
• Internet search indexes
• Scientific searches
• Document searches
• Medical records
• Weblogs

Big Data
• Extremely large datasets that are hard to deal with using Relational
Databases
– Storage/Cost
– Search/Performance
– Analytics and Visualization

• Need for parallel processing on hundreds of machines
– ETL cannot complete within a reasonable time
– Beyond 24hrs – never catch up

Characteristics of Big Data

Example sources: Social Networks, Micro Blogs, RSS Feeds

• Volume
• Variety
• Velocity
• Value

The Four Phases of Data Conversion

1. Acquire
2. Organize
3. Analyze
4. Decide

Operational vs. Analytical Databases

Growth is the New Reality
Instagram gained nearly 1 million users overnight when they expanded to
Android

Draw Something Viral Growth

How Do You Take This Growth?

Scaling Out RDBMS

RDBMS are Not Enough?

NoSQL Technology Scales Out

A New Technology

Use Cases

Relational vs. Document Data Model

JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable
data interchange. It is derived from the JavaScript scripting language for representing simple data
structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.
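
As a rough illustration of the relational vs. document contrast, the sketch below folds rows from two hypothetical tables (`users` and `orders`, invented for this example) into one nested JSON document using Python's standard `json` module. It is only a sketch of the idea, not how any particular document database stores data.

```python
import json

# Hypothetical relational layout: one row per table, linked by user_id.
users = [{"user_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 10, "user_id": 1, "item": "keyboard"},
    {"order_id": 11, "user_id": 1, "item": "monitor"},
]


def to_document(user, all_orders):
    """Fold a user row and its order rows into one nested document."""
    return {
        "name": user["name"],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"]}
            for o in all_orders
            if o["user_id"] == user["user_id"]
        ],
    }


doc = to_document(users[0], orders)
text = json.dumps(doc, sort_keys=True)   # serialize to JSON text
roundtrip = json.loads(text)             # parse it back into Python objects
```

Reading the user together with the orders then needs no join: the whole aggregate travels as one piece of text.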

Brewer's CAP Theorem

Brewer's CAP Theorem

NoSQL Technology Spectrum

Operational vs. Analytical Databases

Hadoop Design Principles
• System shall manage and heal itself
– Automatically and transparently route around failure
– Speculatively execute redundant tasks if certain nodes are detected to be
slow

• Performance shall scale linearly
– Proportional change in capacity with resource change

• Compute should move to data
– Lower latency, lower bandwidth

• Simple core, modular and extensible
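
The "speculatively execute redundant tasks" principle can be sketched in a few lines. This is a toy model using Python threads, not Hadoop's scheduler; the node names and delays are made up for illustration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task(node, delay_seconds, value):
    """Stand-in for a task attempt; the delay models a slow or healthy node."""
    time.sleep(delay_seconds)
    return node, value * 2


def run_speculatively(value, attempts):
    """Launch the same task on several nodes and keep the first result."""
    with ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [
            pool.submit(run_task, node, delay, value) for node, delay in attempts
        ]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


# Hypothetical nodes: one slow attempt plus one redundant attempt elsewhere.
node, result = run_speculatively(21, [("slow-node", 0.2), ("backup-node", 0.01)])
```

Both attempts compute the same answer, so whichever finishes first can be used. In this sketch the thread pool still waits for the slow attempt on exit, whereas a real scheduler would typically cancel it.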

Hadoop Intro
• At Google, MapReduce operations are run on a special file system called
Google File System (GFS) that is highly optimized for this purpose.
• GFS is not open source.
• Doug Cutting, Mike Cafarella, and others built an open source file system
modeled on Google's published GFS design (Cutting later joined Yahoo!); it
became the Hadoop Distributed File System (HDFS).
• The software framework that supports HDFS, MapReduce, and other
related entities is called the project Hadoop or simply Hadoop.
• Projects Nutch and Lucene were started with “search” as the application
in mind.

Hadoop Intro
• The Hadoop Distributed File System and MapReduce were found to have
applications beyond search.
• HDFS and MapReduce were moved out of Nutch into a sub-project of
Lucene and later promoted into an Apache project, Hadoop.
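
For readers new to the MapReduce model named above, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to word counting. The function names are illustrative only; Hadoop's Java API looks different and runs these steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data moves to compute"])
```

Because each map call sees one line and each reduce call sees one key, both phases can be spread over many nodes without coordination beyond the shuffle.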

Hadoop History
• Dec 2004 – Google MapReduce paper published (the GFS paper appeared in 2003)
• July 2005 – Nutch uses MapReduce
• Feb 2006 – Starts as a Lucene subproject
• Apr 2007 – Yahoo! on 1000-node cluster
• Jan 2008 – An Apache Top Level Project
• Jul 2008 – A 4000 node test cluster
• May 2009 – Hadoop sorts Petabyte in 17 hours

What & Where is Hadoop Used For?
Search
• Yahoo, Amazon, Zvents

Log Processing
• Facebook, Yahoo, ContextWeb, Joost, Last.fm

Recommendation Systems
• Facebook

Data Warehouse
• Facebook, AOL

Video and Image Analysis
• New York Times, Eyealike

What & Where is Hadoop Used For?
Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, AVG,
eBay, Electronic Arts, Hortonworks, Federal Reserve Board of Governors,
Foursquare, Fox Interactive Media, Google, Hewlett-Packard, IBM,
ImageShack, ISI, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp,
Netflix, Ooyala, Riot Games, Spotify, Qualtrics, The New York Times, SAP
AG, SAS Institute, StumbleUpon, Twitter, Yodlee

Hadoop Ecosystem
Layers: Client Access, Data Access, Data Mining, Orchestration
• Hue, Hive (SQL), Pig (PL/SQL)
• Sqoop, Flume
• Mahout
• Oozie
• MapReduce (Job Scheduling/Execution System), Streaming/Pipes APIs
• HBase (key-value store)
• HDFS (Hadoop Distributed File System)
• Chukwa (Monitoring)
• ZooKeeper (Coordination)
• Java Virtual Machine
• OS – Red Hat, SUSE, Ubuntu, Windows
• Commodity Hardware
• Networking

HBase
• HBase is an open source, non-relational, distributed database modeled after
Google's BigTable and is written in Java. It is developed as part of Apache
Software Foundation's Apache Hadoop project and runs on top of HDFS
(Hadoop Distributed File system), providing BigTable-like capabilities for
Hadoop. That is, it provides a fault-tolerant way of storing large quantities of
sparse data.
• HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper. Tables in HBase can
serve as the input and output for MapReduce jobs run in Hadoop, and may be
accessed through the Java API but also through REST, Avro or Thrift gateway
APIs.
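
Since Bloom filters are mentioned without explanation, here is a small sketch of the data structure itself. It is a generic, simplified Bloom filter written for illustration (the sizes, the salted SHA-256 hashing, and the `row-00x` keys are assumptions), not HBase's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for row_key in ["row-001", "row-002", "row-003"]:
    bf.add(row_key)
```

A "no" answer from `might_contain` is definitive, which lets a store skip reading files that cannot hold the requested key; a "yes" only means the key may be present.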

HBase
• HBase is not a direct replacement for a classic SQL database, although recently
its performance has improved, and it is now serving several data-driven
websites including Facebook's Messaging Platform.
• “Project's goal is the hosting of very large tables - billions of rows X millions of
columns - atop clusters of commodity hardware”
• Column-oriented and Random access, real time read/write
• “Random access performance on par with open source relational databases
such as MySQL”

PIG
• Compiled into a series of
MapReduce jobs
– Easier to program
– Optimization opportunities

• grunt> A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
• grunt> B = FOREACH A GENERATE name;
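
To make the two Pig statements concrete, the following sketch performs the same load-and-project steps in plain Python on a single machine. The file contents are invented for the example, and tab-delimited input is assumed because that is PigStorage's default delimiter.

```python
import csv
import io

# Hypothetical contents of the 'student' file (tab-delimited fields).
student_file = io.StringIO("John\t18\t4.0\nMary\t19\t3.8\nBill\t20\t3.9\n")

# A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
A = [
    {"name": name, "age": int(age), "gpa": float(gpa)}
    for name, age, gpa in csv.reader(student_file, delimiter="\t")
]

# B = FOREACH A GENERATE name;
B = [row["name"] for row in A]
```

Pig expresses the same intent declaratively and compiles it into MapReduce jobs, so the projection runs in parallel over the cluster rather than in one loop.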

Hive
Managing and querying structured data
• MapReduce for execution
• SQL-like syntax
• Extensible with types, functions, scripts
• Metadata stored in an RDBMS (MySQL)
• Joins, Group By, Nesting
• Optimizer for the number of MapReduce jobs required

hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';

Sqoop
• Sqoop supports incremental loads of a single table or a free-form SQL
query, as well as saved jobs that can be run multiple times to import
updates made to a database since the last import
• Imports can also be used to populate tables in Hive or HBase
• Exports can be used to put data from Hadoop into a relational database
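
The incremental-load idea described above amounts to remembering a high-water mark and fetching only rows beyond it. The sketch below shows that idea with Python's built-in `sqlite3`; the `orders` table and `incremental_import` helper are hypothetical and this is not Sqoop's code or command line.

```python
import sqlite3

# Hypothetical source table standing in for the relational database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
src.executemany(
    "INSERT INTO orders (id, item) VALUES (?, ?)",
    [(1, "keyboard"), (2, "monitor")],
)


def incremental_import(conn, last_value):
    """Fetch only rows whose id is greater than the saved high-water mark."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


first_batch, last_value = incremental_import(src, 0)            # initial load
src.execute("INSERT INTO orders (id, item) VALUES (3, 'mouse')")
second_batch, last_value = incremental_import(src, last_value)  # only new rows
```

A saved job is essentially this function plus the stored `last_value`, so each rerun picks up where the previous one stopped.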

Flume

HDFS Architecture

NameNode and DataNodes
• Master/slave architecture
• An HDFS cluster consists of a single NameNode, a master server that manages
the file system namespace and regulates access to files by clients
• There are a number of DataNodes, usually one per node in the cluster
• The DataNodes manage storage attached to the nodes that they run on
• HDFS exposes a file system namespace and allows user data to be stored in
files
• A file is split into one or more blocks, and these blocks are stored on a set
of DataNodes
• DataNodes serve read and write requests and perform block creation,
deletion, and replication upon instruction from the NameNode
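
The split-and-replicate behaviour in the bullets above can be sketched as follows. The tiny block size, the DataNode names, and the round-robin placement are assumptions chosen for illustration; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, datanodes, replication):
    """Toy NameNode metadata: block index -> DataNodes holding a replica.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [
            datanodes[(start + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(b"x" * 10, block_size=4)
block_map = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
```

The NameNode keeps only a mapping like `block_map`; the block bytes themselves live on the DataNodes, which is why losing one DataNode does not lose the file.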

HDFS Architecture
[Figure: HDFS architecture. Clients send metadata ops to the Namenode, which holds metadata (name, replicas, ...; e.g. /home/foo/data, 6, ...) and issues block ops to the Datanodes. Clients read and write blocks on the Datanodes, and blocks are replicated between Datanodes in Rack1 and Rack2.]

Architecture Overview

HDFS Distributions

Oracle Big Data Appliance: Introduction
• Oracle Big Data Appliance is an engineered system containing both hardware
and software components. Oracle Big Data Appliance delivers:
‒ A complete and optimized solution for big data
‒ Single-vendor support for both hardware and software
‒ An easy-to-deploy solution

‒ Tight integration with Oracle Database

Hadoop 2.0

Oracle Big Data Appliance: Where It Stands?
[Figure: Big Data Appliance positioned by data variety (unstructured, schema-less, schema) across the Acquire, Organize, and Analyze phases; a further label reads "Information".]

Oracle Big Data: Software Components
• Oracle Big Data Connectors
• Oracle NoSQL Database
• Open Source R Distribution
• Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop
• Oracle Linux 5.6 and Java Hotspot VM
• Oracle Big Data Appliance

Oracle Big Data with Oracle Exadata

Mapping the Phases with Software
Acquire Phase
– Hadoop Distributed File System
– Oracle NoSQL Database

Organize Phase
– Hadoop Software Framework
– Oracle Data Integrator

Analyze Phase
– R Statistical Programming Environment
– Oracle Data Warehouse

What Is a Key-Value Store?
• A KV Store is essentially a two-column table consisting of a key and a
value associated with the key
• The key acts as the index, and the value is retrieved by looking up its key
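
A minimal in-memory sketch of the key-value idea follows. The `KVStore` class and the `user/...` keys are hypothetical and are not the Oracle NoSQL Database API; they only show that every operation goes through the key.

```python
class KVStore:
    """Minimal in-memory key-value store: the key is the only index."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value          # insert or overwrite

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KVStore()
store.put("user/1001", b'{"name": "Asha"}')   # values are opaque to the store
store.put("user/1002", b'{"name": "Ravi"}')
```

There is no query over the value's contents here; anything beyond lookup by key is left to the application, which is the trade-off that lets such stores partition data easily.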

What Is Oracle Direct Connector for HDFS?
Oracle Direct Connector for HDFS (ODCH) is a connector that gives Oracle
Database read access to data stored in HDFS through external tables.
• It uses the ORACLE_LOADER access driver

• It enables you to:
‒ Access big data without loading the data
‒ Access the data stored in HDFS files

‒ Access CSV (comma-separated values) files and Data Pump files generated by Oracle
Loader for Hadoop
‒ Load data extracted and transformed by Oracle Data Integrator

Analyze Phase
[Figure: Analyze phase. Database + Oracle R Enterprise, providing statistical functions, data mining algorithms, and query capabilities.]

What Is R?
R is an open source statistical programming language and environment, which
provides:
• An easy-to-use language
• A powerful graphical environment for visualization
• Several out-of-the-box statistical techniques
• R packages
• Several GUI front ends for analyzing data interactively

It was started in 1994 as an alternative to SAS, SPSS, and other
statistical environments.
R’s widespread use, breadth of functionality, and quality of implementation have
enabled it to establish itself as a new statistical software standard.

Data Science

Source: http://wikibon.org/blog/role-of-the-data-scientist/

Data Scientist

Source: http://wikibon.org/blog/role-of-the-data-scientist/

DBA to Data Scientist
Hadoop
HDFS
MapReduce
NoSQL Database
Hive
Pig
OR
All the above with
Big Data Appliance

Oracle Big Data Solution
[Figure: Oracle Big Data Solution, arranged by phase (Stream, Acquire – Organize – Analyze, Decide). Components shown: Oracle Real-Time Decisions, Oracle Event Processing, Apache Flume, Oracle GoldenGate, Endeca Information Discovery, Cloudera Hadoop, Oracle BI Foundation Suite, Oracle Database, Oracle Big Data Connectors, Oracle Advanced Analytics, Oracle NoSQL Database, Oracle R Distribution, Oracle Data Integrator, Oracle Spatial & Graph.]

Intelligence By Variety

Connect with Us
Web: www.appsassociates.com
Email: satyendra.pasalapudi@appsassociates.com | satyendra.kumar@aioug.org

YouTube: www.youtube.com/user/AppsAssociates
LinkedIn: www.us.linkedin.com/company/apps-associates
Twitter: @AppsAssociates
Facebook: www.facebook.com/AppsAssociatesGlobal

Thank You!

Mais conteúdo relacionado

Mais procurados

project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?J Langley
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
The Big Data Ecosystem at LinkedIn
The Big Data Ecosystem at LinkedInThe Big Data Ecosystem at LinkedIn
The Big Data Ecosystem at LinkedInOSCON Byrum
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsJongwook Woo
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopHortonworks
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 

Mais procurados (20)

project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
The Big Data Ecosystem at LinkedIn
The Big Data Ecosystem at LinkedInThe Big Data Ecosystem at LinkedIn
The Big Data Ecosystem at LinkedIn
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Hadoop and other animals
Hadoop and other animalsHadoop and other animals
Hadoop and other animals
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Big data with java
Big data with javaBig data with java
Big data with java
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 

Destaque

SAAS vs PAAS: Cloud Telephony
SAAS vs PAAS: Cloud TelephonySAAS vs PAAS: Cloud Telephony
SAAS vs PAAS: Cloud TelephonyAnkit Jain
 
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...Amazon Web Services
 
AWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAmazon Web Services
 
oracle-PAAS
oracle-PAASoracle-PAAS
oracle-PAASAsha BG
 
Migrating and Running DBs on Amazon RDS for Oracle
Migrating and Running DBs on Amazon RDS for OracleMigrating and Running DBs on Amazon RDS for Oracle
Migrating and Running DBs on Amazon RDS for OracleMaris Elsins
 
Database as a service con Oracle Cloud platform
Database as a service con Oracle Cloud platformDatabase as a service con Oracle Cloud platform
Database as a service con Oracle Cloud platformErick Vidal Bazini
 
AWS Webcast - Introduction to RDS Low Admin High Perf DBS
AWS Webcast - Introduction to RDS Low Admin High Perf DBSAWS Webcast - Introduction to RDS Low Admin High Perf DBS
AWS Webcast - Introduction to RDS Low Admin High Perf DBSAmazon Web Services
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015Ivan Glushkov
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)Amazon Web Services
 
Introduction Pentaho 5.0
Introduction Pentaho 5.0 Introduction Pentaho 5.0
Introduction Pentaho 5.0 Xpand IT
 
Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21ulrichatz
 
Migrating to git
Migrating to gitMigrating to git
Migrating to gitXpand IT
 
онлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенствонлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенствAdrian Parker
 
Review: Leadership Frameworks
Review: Leadership FrameworksReview: Leadership Frameworks
Review: Leadership FrameworksMariam Nazarudin
 
Samanage-Website-Redesign-Jan2017
Samanage-Website-Redesign-Jan2017Samanage-Website-Redesign-Jan2017
Samanage-Website-Redesign-Jan2017WhatConts
 

Destaque (17)

SAAS vs PAAS: Cloud Telephony
SAAS vs PAAS: Cloud TelephonySAAS vs PAAS: Cloud Telephony
SAAS vs PAAS: Cloud Telephony
 
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...
AWS Webcast - Introduction to Amazon RDS: Low Admin, High Performance Databas...
 
AWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAWS Webcast - Understanding database options
AWS Webcast - Understanding database options
 
Oracle PaaS Cloud Preview Event
Oracle PaaS Cloud Preview EventOracle PaaS Cloud Preview Event
Oracle PaaS Cloud Preview Event
 
oracle-PAAS
oracle-PAASoracle-PAAS
oracle-PAAS
 
Migrating and Running DBs on Amazon RDS for Oracle
Migrating and Running DBs on Amazon RDS for OracleMigrating and Running DBs on Amazon RDS for Oracle
Migrating and Running DBs on Amazon RDS for Oracle
 
Database as a service con Oracle Cloud platform
Database as a service con Oracle Cloud platformDatabase as a service con Oracle Cloud platform
Database as a service con Oracle Cloud platform
 
DbOps, DevOps and Ops
DbOps, DevOps and OpsDbOps, DevOps and Ops
DbOps, DevOps and Ops
 
AWS Webcast - Introduction to RDS Low Admin High Perf DBS
AWS Webcast - Introduction to RDS Low Admin High Perf DBSAWS Webcast - Introduction to RDS Low Admin High Perf DBS
AWS Webcast - Introduction to RDS Low Admin High Perf DBS
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
 
Introduction Pentaho 5.0
Introduction Pentaho 5.0 Introduction Pentaho 5.0
Introduction Pentaho 5.0
 
Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21
 
Migrating to git
Migrating to gitMigrating to git
Migrating to git
 
онлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенствонлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенств
 
Review: Leadership Frameworks
Review: Leadership FrameworksReview: Leadership Frameworks
Review: Leadership Frameworks
 
Samanage-Website-Redesign-Jan2017
Samanage-Website-Redesign-Jan2017Samanage-Website-Redesign-Jan2017
Samanage-Website-Redesign-Jan2017
 

Semelhante a DBA to Data Scientist

Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 
Data infrastructure at Facebook
Data infrastructure at Facebook Data infrastructure at Facebook
Data infrastructure at Facebook AhmedDoukh
 
Analyst Report : The Enterprise Use of Hadoop
Analyst Report : The Enterprise Use of Hadoop Analyst Report : The Enterprise Use of Hadoop
Analyst Report : The Enterprise Use of Hadoop EMC
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 

Semelhante a DBA to Data Scientist (20)

Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Data infrastructure at Facebook
Data infrastructure at Facebook Data infrastructure at Facebook
Data infrastructure at Facebook
 
Analyst Report : The Enterprise Use of Hadoop
Analyst Report : The Enterprise Use of Hadoop Analyst Report : The Enterprise Use of Hadoop
Analyst Report : The Enterprise Use of Hadoop
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
HDFS
HDFSHDFS
HDFS
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 

DBA to Data Scientist

  • 1. DBA to Data Scientist with Oracle Big Data Appliance, November 09, 2013
  • 2. About Me: Satyendra Kumar Pasalapudi • Practice Manager – Apps Associates • Co-Founder & Vice President – All India Oracle Users Group • 14+ Years of Experience in Oracle Technologies • Exadata Certified Professional • @pasalapudi • Content courtesy: oracle.com, Hortonworks, Couchbase, Apache
  • 3. Agenda • What is Big Data • Big Data Growth • 4 Phases of Big Data • NoSQL Databases • Hadoop Basics • Big Data Appliance • Skills Required for DBA Scientist
  • 4. Big Data Growth
  • 5. 3 Macro Trends Driving Disruption
  • 6. Gen X Stats
  • 7. Big Data – High Data Variety & Velocity
  • 8. Database Market Disruption: $30B Database Market Being Disrupted
  • 9. How Did Big Data Evolve? • More people interacting with data • Smartphones • Internet • Greater volumes of data being generated (machine-to-machine generation) • Sensors • General Packet Radio Services (GPRS)
  • 10. What Is Big Data? Big data is defined as voluminous unstructured data from many different sources, such as: • Social networks • Banking and financial services • E-commerce services • Web-centric services • Internet search indexes • Scientific searches • Document searches • Medical records • Weblogs
  • 11. Big Data • Extremely large datasets that are hard to deal with using Relational Databases – Storage/Cost – Search/Performance – Analytics and Visualization • Need for parallel processing on hundreds of machines – ETL cannot complete within a reasonable time – Beyond 24 hrs – never catch up
  • 12. Characteristics of Big Data: Volume • Variety • Velocity • Value (sources shown: Social Networks, Micro Blogs, RSS Feeds)
  • 13. The Four Phases of Data Conversion: 1 Acquire • 2 Organize • 3 Analyze • 4 Decide
  • 14. Operational vs. Analytical Databases
  • 15. Growth is the New Reality: Instagram gained nearly 1 million users overnight when they expanded to Android
  • 16. Draw Something Viral Growth
  • 17. How Do You Take This Growth?
  • 18. Scaling Out RDBMS
  • 19. RDBMS are Not Enough?
  • 20. NoSQL Technology Scales Out
  • 21. A New Technology
  • 22. Use Cases
  • 23. Relational vs. Document Data Model: JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.
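To make the document model on this slide concrete, here is a minimal sketch in Python using the standard `json` module. The record and its field names (`user_id`, `emails`, `address`) are hypothetical and chosen only for illustration; they do not come from the deck.

```python
import json

# Hypothetical user record: nested fields and a list live in ONE document,
# where a relational design would typically use several joined tables.
user_doc = {
    "user_id": 42,
    "name": "Asha",
    "emails": ["asha@example.com"],
    "address": {"city": "Hyderabad", "country": "IN"},
}


def to_json(doc):
    """Serialize a Python dict to JSON text (keys sorted for stable output)."""
    return json.dumps(doc, sort_keys=True)


def from_json(text):
    """Parse JSON text back into Python dicts and lists."""
    return json.loads(text)


encoded = to_json(user_doc)
decoded = from_json(encoded)
```

Because the same text can be parsed by JSON libraries in other languages, a document store can hand such a record to any client without a shared schema definition.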
  • 24. Brewer's CAP Theorem
  • 25. Brewer's CAP Theorem
  • 26. NoSQL Technology Spectrum
  • 27. Operational vs. Analytical Databases
  • 28. Hadoop Design Principles • System shall manage and heal itself – Automatically and transparently route around failure – Speculatively execute redundant tasks if certain nodes are detected to be slow • Performance shall scale linearly – Proportional change in capacity with resource change • Compute should move to data – Lower latency, lower bandwidth • Simple core, modular and extensible
  • 29. Hadoop Intro • At Google, MapReduce operations are run on a special file system called Google File System (GFS) that is highly optimized for this purpose. • GFS is not open source. • Doug Cutting and others at Yahoo! re-implemented the design described in Google's GFS paper and called it Hadoop Distributed File System (HDFS). • The software framework that supports HDFS, MapReduce and other related entities is called the project Hadoop, or simply Hadoop. • Projects Nutch and Lucene were started with “search” as the application in mind.
  • 30. Hadoop Intro • The Hadoop Distributed File System and MapReduce were found to have applications beyond search. • HDFS and MapReduce were moved out of Nutch as a sub-project of Lucene and later promoted into an Apache project, Hadoop.
  • 31. Hadoop History • Dec 2004 – Google MapReduce paper published • July 2005 – Nutch uses MapReduce • Feb 2006 – Starts as a Lucene subproject • Apr 2007 – Yahoo! on 1000-node cluster • Jan 2008 – An Apache Top Level Project • Jul 2008 – A 4000-node test cluster • May 2009 – Hadoop sorts a petabyte in 17 hours
  • 32. What & Where is Hadoop Used For? • Search: Yahoo, Amazon, Zvents • Log Processing: Facebook, Yahoo, ContextWeb, Joost, Last.fm • Recommendation Systems: Facebook • Data Warehouse: Facebook, AOL • Video and Image Analysis: New York Times, Eyealike
  • 33. What & Where is Hadoop Used For? Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, AVG, eBay, Electronic Arts, Hortonworks, Federal Reserve Board of Governors, Foursquare, Fox Interactive Media, Google, Hewlett-Packard, IBM, ImageShack, ISI, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, Riot Games, Spotify, Qualtrics, The New York Times, SAP AG, SAS Institute, StumbleUpon, Twitter, Yodlee
  • 34. Hadoop Ecosystem (layered diagram) • Top layer: Client Access, Data Access, Data Mining, Orchestration (Hue, Hive (SQL), Pig (PL/SQL), Sqoop, Flume, Mahout, Oozie) • MapReduce (Job Scheduling/Execution System; Streaming/Pipes APIs) • HBase (key-value store) • HDFS (Hadoop Distributed File System) • Java Virtual Machine • OS – Red Hat, SUSE, Ubuntu, Windows • Commodity Hardware • Alongside the stack: Chukwa (Monitoring), ZooKeeper (Coordination), Networking
  • 35. HBase • HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data. • HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs.
  • 36. HBase • HBase is not a direct replacement for a classic SQL database, although recently its performance has improved, and it is now serving several data-driven websites including Facebook's Messaging Platform. • “Project's goal is the hosting of very large tables - billions of rows X millions of columns - atop clusters of commodity hardware” • Column-oriented and random access, real-time read/write • “Random access performance on par with open source relational databases such as MySQL”
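The phrase "sparse data" on the HBase slides can be illustrated with a small in-memory sketch: only cells that actually hold a value are stored, so a row with two populated columns costs two entries no matter how many columns exist elsewhere in the table. This is a toy model in plain Python, not the HBase API; the class name `SparseTable` and the `family:qualifier` column strings are assumptions made for the example.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: row key -> {column name -> value}.

    Illustrative only; this is NOT the HBase client API.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store a single cell; absent cells take no space at all."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Random-access read of one cell by row key and column name."""
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Number of cells physically stored across all rows."""
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("user1", "info:name", "Asha")
table.put("user2", "info:email", "ravi@example.com")
```

Here `user1` has no `info:email` cell and `user2` has no `info:name` cell, yet nothing is stored for the missing values.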
  • 37. Pig • Compiled into a series of MapReduce jobs – Easier to program – Optimization opportunities • grunt> A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); • grunt> B = FOREACH A GENERATE name;
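To show what "compiled into a series of MapReduce jobs" means in practice, the sketch below walks through the map, shuffle, and reduce phases in plain Python for a count of students per age. It is a conceptual illustration only and does not reproduce the plan that Pig actually generates; the sample rows and function names are hypothetical. A pure projection such as `FOREACH A GENERATE name` would need only the map phase; grouping is what introduces the shuffle and reduce.

```python
from collections import defaultdict

# Hypothetical rows matching the slide's schema: (name, age, gpa).
students = [("john", 18, 3.5), ("mary", 19, 3.8), ("bill", 18, 2.9)]


def map_phase(record):
    """Emit one (key, value) pair per input record: (age, 1)."""
    _name, age, _gpa = record
    yield age, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(students)
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data across the network; the structure of the computation is the same.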
  • 38. Hive: Managing and querying structured data • MapReduce for execution • SQL-like syntax • Extensible with types, functions, scripts • Metadata stored in an RDBMS (MySQL) • Joins, Group By, Nesting • Optimizer for number of MapReduce jobs required • hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';
  • 39. Sqoop • It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs which can be run multiple times to import updates made to a database since the last import • Imports can also be used to populate tables in Hive or HBase • Exports can be used to put data from Hadoop into a relational database
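The idea behind an incremental load can be sketched without Sqoop itself: remember the highest key already imported and fetch only rows beyond it on the next run. The example below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the source system. The `orders` table, its columns, and the `import_new_rows` helper are hypothetical, and the code illustrates the pattern rather than Sqoop's implementation.

```python
import sqlite3


def import_new_rows(conn, last_value):
    """Fetch rows whose id is greater than the last imported id.

    Returns the new rows and the updated high-water mark.
    """
    rows = conn.execute(
        "SELECT id, name FROM orders WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


# In-memory stand-in for the source relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, "b")])

# First run imports everything; the second picks up only the new row.
first_batch, last_seen = import_new_rows(conn, 0)
conn.execute("INSERT INTO orders VALUES (3, 'c')")
second_batch, last_seen = import_new_rows(conn, last_seen)
```

A saved job plays the role of `last_seen` here: it keeps the high-water mark between runs so that each execution continues where the previous one stopped.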
  • 40. Flume
  • 41. HDFS Architecture
  • 42. NameNode and DataNodes • Master/slave architecture • An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients • There are a number of DataNodes, usually one per node in a cluster • The DataNodes manage storage attached to the nodes that they run on • HDFS exposes a file system namespace and allows user data to be stored in files • A file is split into one or more blocks, and sets of blocks are stored in DataNodes • DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the NameNode
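The block and replica bookkeeping described on this slide can be sketched in a few lines of Python. The functions below split a byte string into fixed-size blocks and assign each block to several DataNodes in round-robin order. Round-robin placement is a simplifying assumption made for the example (HDFS uses its own rack-aware placement policy), and the function names, node names, and tiny block size are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Split file contents into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication=3):
    """Map each block id to `replication` distinct DataNodes (round-robin).

    Simplified stand-in for the placement decision made by the NameNode.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return placement


blocks = split_into_blocks(b"abcdefghij", 4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

The `placement` dictionary corresponds to the metadata the NameNode keeps, while the `blocks` themselves would live on the DataNodes. Losing any single node still leaves two copies of every block it held.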
  • 43. HDFS Architecture (diagram): a Client sends metadata ops to the Namenode, which holds metadata (name, replicas, ...; e.g. /home/foo/data, 6, ...) and issues block ops to the Datanodes; Clients read and write blocks on Datanodes in Rack 1 and Rack 2; blocks are replicated between Datanodes
  • 44. Architecture Overview
  • 45. HDFS Distributions
  • 46. Oracle Big Data Appliance: Introduction • Oracle Big Data Appliance is an engineered system containing both hardware and software components. Oracle Big Data Appliance delivers: ‒ A complete and optimized solution for big data ‒ Single-vendor support for both hardware and software ‒ An easy-to-deploy solution ‒ Tight integration with Oracle Database
  • 47. Hadoop 2.0
  • 48. Oracle Big Data Appliance: Where It Stands? (diagram; labels: Data Variety, Unstructured, Schema-less, Schema, Information; phases: Acquire, Organize, Analyze; Big Data Appliance)
  • 49. Oracle Big Data: Software Components • Oracle Big Data Connectors • Oracle NoSQL Database • Open Source R Distribution • Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop • Oracle Linux 5.6 and Java Hotspot VM • Oracle Big Data Appliance
  • 50. Oracle Big Data with Oracle Exadata
  • 51. Mapping the Phases with Software • Acquire Phase – Hadoop Distributed File System – Oracle NoSQL Database • Organize Phase – Hadoop Software Framework – Oracle Data Integrator • Analyze Phase – R Statistical Programming Environment – Oracle Data Warehouse
  • 52. What Is a Key-Value Store? • A KV store is essentially a two-column table consisting of a key and a value associated with the key • The key acts as the index, and the value can be retrieved by looking up the key
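The two-column picture of a key-value store maps directly onto a hash map. The sketch below is a minimal in-memory version in Python; the class `KVStore`, its method names, and the `user:1001` style keys are assumptions for illustration and are not the Oracle NoSQL Database API.

```python
class KVStore:
    """Minimal in-memory key-value store backed by a dict.

    Illustrative only; not the Oracle NoSQL Database API.
    """

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert or overwrite the value stored under `key`."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look up a value by key; the key is the only index."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KVStore()
store.put("user:1001", {"name": "Asha", "plan": "gold"})
profile = store.get("user:1001")
```

The store never inspects the value, which is why reads and writes by key stay cheap and why any query other than "give me the value for this key" has to be handled by the application.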
  • 53. What Is Oracle Direct Connector for HDFS? Oracle Direct Connector for HDFS (ODCH) is a connector which facilitates read access from HDFS to Oracle Database using external tables. • It uses the ORACLE_LOADER access driver • It enables you to: ‒ Access big data without loading the data ‒ Access the data stored in HDFS files ‒ Access CSV (comma-separated values) files and Data Pump files generated by Oracle Loader for Hadoop ‒ Load data extracted and transformed by Oracle Data Integrator
  • 54. Analyze Phase (diagram; labels: Analyze, Database + Oracle R Enterprise, Statistical Functions, Data Mining Algorithms, Query Capabilities)
  • 55. What Is R? R is an open source statistical programming language and environment, which provides: • An easy-to-use language • A powerful graphical environment for visualization • Several out-of-the-box statistical techniques • R packages • Several GUI front ends for analyzing data interactively. It was started in 1994 as an alternative to SAS, SPSS, and other statistical environments. R’s widespread use, breadth of functionality, and quality of implementation have enabled it to establish itself as a new statistical software standard.
  • 56. Oracle Big Data: Software Components • Oracle Big Data Connectors • Oracle NoSQL Database • Open Source R Distribution • Cloudera Manager & Cloudera’s Distribution Including Apache Hadoop • Oracle Linux 5.6 and Java Hotspot VM • Oracle Big Data Appliance
  • 59. DBA to Data Scientist: Hadoop • HDFS • MapReduce • NoSQL Database • Hive • Pig, OR all the above with Big Data Appliance
  • 60. Oracle Big Data Solution (diagram; phases: Stream, Acquire – Organize – Analyze, Decide). Components shown: Oracle Real-Time Decisions, Oracle Event Processing, Apache Flume, Oracle GoldenGate, Endeca Information Discovery, Cloudera Hadoop, Oracle BI Foundation Suite, Oracle Database, Oracle Big Data Connectors, Oracle Advanced Analytics, Oracle NoSQL Database, Oracle R Distribution, Oracle Data Integrator, Oracle Spatial & Graph
  • 61. Intelligence By Variety
  • 62. Connect with Us • Web: www.appsassociates.com • Email: satyendra.pasalapudi@appsassociates.com | satyendra.kumar@aioug.org • YouTube: www.youtube.com/user/AppsAssociates • LinkedIn: www.us.linkedin.com/company/apps-associates • Twitter: @AppsAssociates • Facebook: www.facebook.com/AppsAssociatesGlobal