In this slidecast, Jim Kaskade from Infochimps presents: Cloud for Big Data.
"Infochimps was founded by data scientists and cloud computing experts. Our solutions make it faster, easier and far less complex to build and manage Big Data systems behind applications to quickly deliver actionable insights. With Infochimps Cloud, enterprises benefit from the fastest way to deploy Big Data applications in complex, hybrid cloud environments."
Learn more at:
http://infochimps.com
View the presentation video:
http://inside-bigdata.com/slidecast-cloud-for-big-data/
2. Agenda
• History of Infochimps
• Infochimps Cloud Services & Architecture
• Use Cases
• Cloud Deployment Models
• How to build Analytic Applications
6/7/13 Infochimps Confidential 2
3. Big Data Is Not the Created Content, nor Is It Even Its Consumption
It Is the Analysis of All the Data Surrounding or Swirling Around It
• More Devices
• More Content
• More Applications
• More Access
4. Building Data-Driven Applications
Non-Traditional Data Sources + Enterprise Data Sources = Data-Driven Applications (Your Application)
5. What is the Infochimps Cloud?
The Infochimps Cloud for Big Data is the fastest, easiest solution for performing data analytics at unlimited scale for enterprise companies.
• Run in real time with Cloud::Streams
• Run in batch with Cloud::Hadoop
• Interact with Cloud::Queries
6. Infochimps Company Milestones and Big Data Industry Milestones
Timeline axis: 2004 to 2013
• Univ TX Research
• Distributed Computing & Data Analytics
• Google Releases Paper on MapReduce
• Yahoo! Creates Hadoop (Nutch)
• Apache Hadoop 0.10
• Infochimps Data Marketplace is Born
• Infochimps.org Launched (Hadoop-based)
• Very Large Network Research
• Complete Big Data Stack, Real-Time to Batch
• Infochimps Inc. Data Platform (Hadoop + NoSQL)
• Infochimps Big Data Cloud V1.0 (Public)
• OpenStack Support
• Storm Support
• Infochimps Enterprise Cloud for Big Data V2.0 (Virt Private & Private)
• vSphere Support
• Unified Data Processing Framework (Data PaaS)
• NoSQL Generation (e.g. CouchDB)
• Hadoop 1.0 Apache Release
• Cloudera Founded
• Cassandra
• Storm Born (Backtype)
• MapR Founded
• Amazon Web Services (IaaS)
• AWS EMR
• Gartner Defines Big Data 3 'V's
• Hortonworks Founded
• Digital Universe Study: Data 2x / 2 yrs
• VMware Serengeti (uses Ironfan)
• Ironfan Released
• Impala 1.0 Released
In Big Data since 2005
9. Variety, Velocity, & Volume
Input Data: LOG, TXT, CSV, XML, HTTP, JSON
Cloud::Streams
Your Application
Command Center
A complete managed service for custom analytics in the public, private, or hybrid cloud.
Cloud::Queries
Cloud::Hadoop
15. Standard Reference Platform
HBase
Elasticsearch
Hadoop
Command Center
Platform API
Zabbix
Zookeepers, Chef, MySQL, NFS
Backup
Scheduler
Deploy Pack
(Code Repository
+ Deploy Scripts)
Listener Queue
Storm
HTTP(S)
Syslog
Archive
Storage
Storm API / Trident
Wukong
Archive Viewer
HadoopCL
Pig, Wukong
HBase API
Elasticsearch API
Command Center
Platform API
Push to Storm
Push to Hadoop
Code Editor
You only worry about a tiny part of the overall platform.
19. The Cloud Value Proposition?
Infochimps has created an analytic
infrastructure that completely abstracts how
and where data analysis executes.
Your data scientists focus on analytics instead of infrastructure.
23. Public Cloud, Virtual Private Cloud, Private Cloud
IaaS
Develop & Test Locally with Wukong DSL & Application Deploy Packs
Abstract to any cloud with Ironfan Orchestration
SaaS
• Real-time with Cloud::Streams
• Batch with Cloud::Hadoop
• Interact with Cloud::Queries
PaaS
24. Deployment
Scope Big Data Application
Steps: Business Discovery; Information Discovery; Big Data Architecture
✔ Identify Use-Cases
✔ Rank by Revenue Impact
✔ Data Sources
✔ Real-Time vs. Batch
✔ Stream Processing
✔ Ad-Hoc NoSQL
✔ Batch Analytics
Deploy Big Data Application
Steps: End-to-End Data Flow; Complete Build-out; QA/Test PerfTune
✔ Customer Repo Established
✔ Reference Design Launched
✔ Configure data pipeline
✔ All Data Sources
✔ Deploy / Iterate Analytics
✔ Final “Deploy Pack”
✔ Load Any Historic Data
✔ SLA Monitoring System
✔ Stage to Production
Drive New Revenue
Steps: Manage; Update; Expand & Iterate
✔ Assigned Cust Srvc Rep
✔ 24x7x365 ‘Virtual NOC’
✔ Receive Deploy Pack Updates
✔ Stage to Production
✔ Existing Application Expansion
✔ Next Application Use-Case
✔ Self-Sufficient Customers
25. Broad Industry Application
Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management
Retail
• 360° View of the Customer
• Click-stream analysis
• Real-time promotions
Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection
Transportation
• Weather and traffic impact on logistics and fuel consumption
Financial Services
• Fraud detection
• Risk management
• 360° View of the Customer
IT
• Transition log analysis for multiple transactional systems
• Cybersecurity
Health & Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring
Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring