Silicon Valley NOSQL Meetup, April 2012
1. Maximize your Data with
Real-time Big Data Analytics
using NOSQL Technologies.
Silicon Valley NOSQL Meetup Group
Thursday, April 26, 2012 – Brian Clark
5/4/2012 © Objectivity Inc 2012 1
2. Agenda
• About me!
• Objectivity, Inc.
• NOSQL
• Big Data
• Use Cases
• InfiniteGraph and Objectivity/DB Overview
• Demo
• Q&A
3. School - The 3 R’s
•Reading
•wRiting
•aRithmetic
•I knew I was in trouble!
4. University - The 3 B’s
•Bands (Friday night Hop)
•Booze
•Birds
•I knew I was in trouble!
• = a job as a mainframe computer operator
5. A Brief History of Computing
Copyright © 2008 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
Make One Big Computer
• 1970s: Network Operating Systems
• 1980s: Distributed Operating Systems
• 1990s: NOWs & Clusters
• 2000s: Grid Computing
• 2010s: Cloud Computing
8. A Brief History of Databases
• Hierarchical Model (1960's): physical pointers
• Network Model (1960's): many-to-many relationships, but still too rigid
• Relational Model (1970's): physical independence, SQL
• Object-Oriented (1990's): performance with complexity and scalability
9. Objectivity, Inc.
• The world today is about big data, distributed objects and
connections between them.
• Objectivity/DB™
Distributed big data and object management.
• InfiniteGraph™
Connects the dots on a global scale.
12. The Right Tool for the Right Job (1 of 2)
First, a truism:
• The closer the data model matches the data store structure, the faster queries can be executed, the higher the scalability, and the easier it is to write applications.
• One size doesn’t fit all, and multiple tools might join forces to fully solve a problem.
Relational Databases
• Data represented by rows (records) and columns (attributes); a schema defines the columns and their distribution amongst tables.
• Versatile, can solve most data storage and access problems; can solve all if scale is limited.
• Good for producing lists of data based on a value in that data, such as a list of customers with unfilled orders.
Hadoop/MapReduce
• General purpose parallel processing and storing facility for massive amounts of data.
• Data store is a file system, not a database.
• Good for problems that can be broken into many small parts and processed independently, and done so offline, such as the ETL (extract, transform, load) process for preparing and moving captured data into a data warehouse.
Object Databases
• Data represented by objects, which are groups of attributes; schema defines the attributes, which may include pointers (relationships) to other objects.
• Ability to store and retrieve whole objects makes access to a set of data very fast; tighter connection to the object-oriented programming application reduces complexity.
• Good for accessing massive amounts of data about related items, such as a user’s account history.
13. The Right Tool for the Right Job (2 of 2)
Key-Value Databases
• Rows and columns like a relational database, but only 2 columns, making it an indexing system (find a value based on the key).
• No schema required, so the value could be anything, such as an object or a pointer to data in another data store.
• Very fast for indexing, such as looking up a user’s shopping cart on an ecommerce site.
Column Family Databases
• Rows and columns like a relational database, but storage on disk is organized so as to make attributes (columns) highly accessible without accessing the whole of the associated record (row).
• Results in very fast actions regarding attributes, such as calculating average age.
Document Databases
• Similar to object database, but without the need to predefine an object’s attributes (i.e., no schema required).
• Provides flexibility to store new types or unanticipated sizes of data/objects during operation, on the fly, such as event logging where the data format is unpredictable and not just simple text (e.g., video).
Graph Databases
• Similar to object database, but the objects and relationships between them are all objects with their own respective sets of attributes.
• Enables very fast queries when the value of the data is in the relationships, i.e. relationships between people/items:
– Are two people/items related (even if separated by several levels of relationship)?
– Where the relationships represent costs, what is the optimal combination of groups of people/items?
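The column family point about "calculating average age" can be illustrated with a small sketch. It assumes a toy in-memory layout; the names `rows` and `columns` and the sample users are invented for illustration, and it does not reflect how any particular product stores data on disk.

```python
# Hypothetical sample data; names, ages and cities are invented for illustration.
rows = [
    {"name": "Alice", "age": 30, "city": "Denver"},
    {"name": "Bob", "age": 40, "city": "Boston"},
    {"name": "Carlos", "age": 50, "city": "Austin"},
]


def average_age_row_store(records):
    """Row layout: every whole record is visited to read one attribute."""
    return sum(record["age"] for record in records) / len(records)


# Column layout: the values of each attribute are kept together.
columns = {key: [record[key] for record in rows] for key in rows[0]}


def average_age_column_store(cols):
    """Column layout: only the 'age' column is read."""
    ages = cols["age"]
    return sum(ages) / len(ages)
```

Both functions return the same answer; the difference is how much unrelated data has to be touched to get it, which is the property the slide attributes to column family stores.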
15. Big Data
• Volume
• Velocity
• Variety
= VALUE!
Requires new ways of thinking – distributed data and processing
16. Parallel Processing and Storage
Apache HADOOP
• Map/Reduce
– Distributed processing.
• HDFS
– Distributed file system.
• HBase
– Distributed storage for large tables.
• Cassandra
– Multi-master database with no single point of failure.
InfiniteGraph
• Distributed processing
– Peer-to-peer servers and clients anywhere in the network.
• Distributed data
– Federation of databases anywhere in the network.
• Standard filesystem
– Random I/O for fast navigational queries.
• Single logical view of all data in the federation
– Any client anywhere can access server anywhere.
17. Common Big Data Architecture
Data Aggregation & Application Analytics
Commodity Linux Clusters or High Performance Compute platforms
Data stores: Data Warehouse, RDBMS, Column Stores, Graph DB, Object DB, Hadoop BigTable, Key-Value Stores, Document DB
Data types: Structured, Semi-structured, Un-structured
18. Common Big Data Architecture
[Diagram: pipeline of Raw Data, Front End Processing, Hadoop, RDBMS, Other stores, and Visualization and Analytics tools, aligned with the stages Observe, Orient, Decide, Act]
The strategic competitors are all moving in this direction for Big Data
19. Big Data Analytics Solutions
[Diagram: one solution stack per vendor, each starting from Raw Data]
• EMC: Data Analytics Applications, Greenplum, Greenplum Hadoop, Greenplum Data Integration Accelerator, Raw Data
• IBM: Infosphere BigInsights, Infosphere Warehouse, IBM DB2, Hadoop, Front End Processing, Raw Data
• Oracle: Oracle In-Database Analytics, Oracle 11g, Oracle NoSQL, Cloudera Hadoop, Oracle Data Integrator, Raw Data
• HP: Autonomy, Vertica Database, Front End Processing, Raw Data
20. Big Data Landscape
• All current solutions have the same basic architecture model.
• None of the current solutions have a way to store connections
between entities in the different silos.
– Analytics today focuses on the nodes of data (quantifiable occurrences)
rather than the relevant connections or edges between the nodes
(qualitative occurrences).
• Objectivity has a proven way to efficiently store, manage and
query the relationships and connections between data.
21. Disruptive Big Data New Architecture
The Proven Connection Store: Objectivity/DB and/or InfiniteGraph
[Diagram: the connection store is shown alongside the existing pipeline of Raw Data, Front End Processing, Hadoop, RDBMS, Other stores, and Visualization and Analytics tools. Legend: one symbol represents data nodes; the other represents bidirectional relationships/connections between data.]
22. Why We’re Different
• Relational databases are not optimized to understand
objects or connections.
• Objectivity/DB™ is all about objects and relationships.
• InfiniteGraph™ is all about the connections as first class
citizens.
24. Relationships are everywhere
• CRM, Sales & Marketing
• Network Mgmt, Telecom
• Intelligence (Government & Business)
• PLM (Product Lifecycle Mgmt)
• Finance
• Healthcare
• Social Networks
• Logistics
• Master Data Management
• Research: Genomics
25. Financial Services
Fraud Detection
– Problem: Detect patterns of
fraudulent activities before damage is
done
– Solution: Real-time identification of
inconsistencies enables
instantaneous notification to security
systems
– Results:
• Improved banking security and
client confidence
• Reduction of lost revenues
• Improved efficiency allows fraud-
detection teams to develop and
deploy additional services
26. Application Development
The “Facebook” For Education
– Problem: Develop a system capable
of handling exponential user-base
growth
– Solution: Leverage InfiniteGraph’s
scalability and performance to
support real-time relationship
information between all members
and to act as primary DB for all
topics and users
– Results: Complete social
networking site allowing global
users to access courses from
leading institutions & to collaborate
effectively with other students and
teachers
27. Use Case – Confidential Ad Placement Network
• Ad placement on smart phone based on user profile and
location data generated by opt-in application (e.g., a free
game).
• Location data captured and distilled by Cassandra (key-
value/column family hybrid database).
• Locations matched with geospatial data to refine user interests.
• As ad placement orders arrive, InfiniteGraph matches groups
of users with ads, maximizing relevance for the user, value for
the advertiser and revenue for the ad placement company.
28. Government
Broad Area Maritime
Surveillance UAS
– Problem: Monitor potential threats
across open oceans and remote
areas on a 24/7 basis
– Solution: Use Objectivity/DB to
develop a system for unmanned
aircraft to capture and transmit real-
time data of any type for analysis and
sharing
– Results: A federated view of
maritime surveillance and continuous
reconnaissance capability for
mission, reconnaissance, and
communications assessments
29. Healthcare
Bring together doctors, patients, and their
records
– Problem: As patients move between doctors,
manage their records globally to better
capture and understand symptoms, causes,
and interdependencies and to improve
diagnoses
– Solution: Create a database using
Objectivity/DB and InfiniteGraph capable of
managing real-time entries of patient visits,
symptoms, diagnoses, reactions to
medications, and progress
– Results:
• Improved times to more accurate
diagnoses
• Creation of a knowledge base of similar
medical cases
• Increased success rates of initial
prescriptions based on historical
recommendations
30. Network Centric Collaborative Targeting
Team: Objectivity, L-3, and Lockheed
U.S. Air Force’s Network Centric Collaborative Targeting (NCCT)
U.S. Navy’s Cooperative Engagement Capability (CEC) system.
31. NCCT - Customer Challenge
Silo’d systems with
individual reports
did not provide
solutions
Time sensitive targets were hard to find
Sensors operated as independent systems
The performance of each individual sensor is very good (great
ears and eyes), but collectively the sensors lack a central nervous system
Mountains of Data are coming from sensors
Existing sensors alone cannot reliably find highly mobile, moving
and/or spoofing targets
32. NCCT - Technical Solution Architecture
1. Build a distributed
system that could
support multi-agency
platform requirements
2. Collect data from any
number of high volume
sources
3. Provide a data
architecture that
supported the need to
correlate and fuse data
collection for a single
view of the targets
4. Support a near real-time
data reporting C4ISR
system
33. Intelligence - Customer Need
Collect 400,000,000 phone
calls, plus addresses, emails,
meetings…
Finding the links between callers
Deliver all the possible connections
between them in seconds
34. Intelligence Problem - Performance
With a relational product:
Initial attempts to traverse links across the database literally shut
down the server.
After much server and database optimization a process could be run
on a single query and would produce a result over a 48 hour period.
Results were unacceptable…..
With Objectivity:
The many-to-many data application was an excellent fit for Objectivity.
We then developed a proof-of-concept that showed 5-6
degrees of separation within about 1 minute, running on a laptop
computer.
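A "degrees of separation" query of the kind described above can be sketched as a bounded breadth-first search. This is only an illustration in plain Python, not the implementation used in the proof-of-concept; the function name, the `max_degrees` parameter and the sample `calls` data are all assumptions.

```python
from collections import deque


def degrees_of_separation(links, start, target, max_degrees=6):
    """Return the fewest hops between start and target, or None if the
    target is not reachable within max_degrees hops."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(person, ()):
            if neighbor == target:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None


# Hypothetical call records, stored as an undirected adjacency map.
calls = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
```

Each hop is a direct lookup of a caller's neighbors, whereas the same question against relational tables typically needs one self-join per degree of separation.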
36. What is a graph database?
• Optimized around data relationships
– Relationships as first class citizens
– Super fast traversal between entities
– Rich/flexible annotation of connections
• Small focused API (typically not SQL)
– Natively work with concepts of Vertex/Edge
– SQL has no concept of “navigation”
• Graphs grow quickly e.g.
– Billions of phone calls / day in US
– Emails, social media events, IP Traffic
– Financial transactions
• Some analytics require navigation of large sections of the graph
• Each step (often) depends on the last
• Must distribute data and go parallel
37. Database Data Representation
• Traditional databases are good at recording things, not events
or relationships
Rows/Columns/Tables
Meetings
P1      P2       Place    Time
Alice   Bob      Denver   5-27-10
Calls
From    To       Time     Duration
Bob     Carlos   13:20    25
Bob     Charlie  17:10    15
Payments
From    To       Date     Amount
Carlos  Charlie  5-12-10  100000
Relationship/Graph Optimized
• Alice --Met (5-27-10)-- Bob
• Bob --Called (13:20)--> Carlos
• Bob --Called (17:10)--> Charlie
• Carlos --Paid (100000)--> Charlie
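The contrast on this slide can be made concrete with a short sketch that turns the table rows into labelled edges and indexes them by person. The row values come from the slide; the function names and the edge tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the three tables on the slide.
meetings = [("Alice", "Bob", "Denver", "5-27-10")]
calls = [("Bob", "Carlos", "13:20", 25), ("Bob", "Charlie", "17:10", 15)]
payments = [("Carlos", "Charlie", "5-12-10", 100000)]


def build_graph(meetings, calls, payments):
    """Turn table rows into edges of the form (source, label, target, properties)."""
    edges = []
    for p1, p2, place, time in meetings:
        edges.append((p1, "Met", p2, {"place": place, "time": time}))
    for src, dst, time, duration in calls:
        edges.append((src, "Called", dst, {"time": time, "duration": duration}))
    for src, dst, date, amount in payments:
        edges.append((src, "Paid", dst, {"date": date, "amount": amount}))
    return edges


def edges_by_person(edges):
    """Index edges by each endpoint so a person's connections are one lookup."""
    index = defaultdict(list)
    for edge in edges:
        source, _label, target, _props = edge
        index[source].append(edge)
        index[target].append(edge)
    return index


graph = build_graph(meetings, calls, payments)
index = edges_by_person(graph)
```

With the index in place, everything connected to Bob (one meeting and two calls) is a single lookup rather than a query against three separate tables.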
38. Viewing the Data
The InfiniteGraph Visualizer will need this name to display the contents of the
graph database.
39. InfiniteGraph™
• Connects the dots on a global scale.
• InfiniteGraph™ finds connections in big data.
40. Find Answers Faster with InfiniteGraph™
Distributed Graph Database
41. InfiniteGraph’s Unique Advantages
• Supports large scale and distributed systems.
• Proven technology and deployments.
• Flexible and Easy:
• Distributed and cloud ready, Java on interoperable platforms, integrates
with most other data stores, supports ACID to flexible modes.
42. InfiniteGraph Basic Architecture
[Diagram: layered architecture, top to bottom]
• User Apps
• Blueprints
• InfiniteGraph - Core/API: Management Extensions, Placement, Navigation Execution, Configuration, Session / TX Management
• Distributed Object and Relationship Persistence Layer
43. InfiniteGraph Features
• Distributed parallel ingest.
• Flexible distributed storage management.
• Node naming and indexing for fast lookup.
• User controlled navigational queries – using node and edge
filters.
• Navigator plug-in architecture for sharing plug-ins with the
visualizer.
• InfiniteGraph Visualizer.
• Blueprints support via Gremlin
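The idea of a navigational query steered by node and edge filters can be sketched generically as follows. This is plain Python and deliberately does not use the InfiniteGraph API; `navigate`, its parameters and the sample edges are hypothetical names chosen for the example.

```python
def navigate(edges, start, edge_filter, node_filter, max_depth=3):
    """Depth-first navigation returning each qualifying path as a list of nodes.

    edges: list of (source, label, target) tuples, treated as directed.
    edge_filter(label) and node_filter(node) decide what may be followed.
    """
    paths = []

    def walk(node, path):
        if len(path) > 1:
            paths.append(list(path))
        if len(path) - 1 == max_depth:
            return
        for source, label, target in edges:
            if source != node or target in path:
                continue  # wrong origin, or would revisit a node
            if edge_filter(label) and node_filter(target):
                path.append(target)
                walk(target, path)
                path.pop()

    walk(start, [start])
    return paths


# Hypothetical sample edges for illustration.
sample_edges = [
    ("Bob", "Called", "Carlos"),
    ("Bob", "Called", "Charlie"),
    ("Carlos", "Paid", "Charlie"),
    ("Bob", "Met", "Alice"),
]
```

For example, `navigate(sample_edges, "Bob", lambda label: label in {"Called", "Paid"}, lambda node: True)` follows only call and payment edges and ignores the meeting. Because the filters prune the search at every step, the caller controls how much of the graph is visited.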
44. Objectivity/DB Basic Architecture
[Diagram: layered architecture, top to bottom]
• User Application
• Java API, C#/.NET, Python API, ULB
• C++ Public API
• Objy Kernel
• I/O Manager
• Lock Server, Page Server (AMS), Query Server
45. Distributed Data /Processing
Distributed Federated Persistent Store
Network
Scale Out
Scale Out
SAN
Distributed Data Management
Federated Data Management
Single Logical View
All clients and servers see all data.
46. Distributed Data Architecture
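64 Bit OID (Object ID): #21538 - 1874 - 9638 - 164
OID fields, in order: Database - Container - Page - Slot
[Diagram: a Federation (schema & catalog) holds Databases, each Database holds Containers, each Container holds Pages, and each Page holds Slots; the figure 64K appears twice in the diagram]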
• 1,000’s trillions of unique objects
• 1,000’s petabytes storage
• Logical/physical indirection at every segment
• Resolving ID fast regardless of number of objects
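A 64-bit identifier made of Database, Container, Page and Slot fields can be packed and unpacked with simple bit shifts. The sketch below assumes four equal 16-bit fields (4 x 16 = 64, and 2^16 = 64K), which is consistent with the slide but is not confirmed by it; the function names are invented for illustration.

```python
FIELD_BITS = 16  # assumption: four equal 16-bit fields (4 x 16 = 64)
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_oid(database, container, page, slot):
    """Pack four field values into one 64-bit integer."""
    for value in (database, container, page, slot):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"field value {value} does not fit in {FIELD_BITS} bits")
    return (database << 48) | (container << 32) | (page << 16) | slot


def unpack_oid(oid):
    """Split a 64-bit integer back into (database, container, page, slot)."""
    return tuple((oid >> shift) & FIELD_MASK for shift in (48, 32, 16, 0))


def format_oid(oid):
    """Render an OID in the '#db - container - page - slot' style of the slide."""
    return "#" + " - ".join(str(part) for part in unpack_oid(oid))


# The example identifier shown on the slide.
oid = pack_oid(21538, 1874, 9638, 164)
```

Because each field is a fixed-width slice of the identifier, locating an object is a sequence of direct lookups (database, then container, then page, then slot) whose cost does not grow with the total number of objects.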
47. Distributed Processing Architecture
Simple, Distributed Servers
[Diagram: a Client (Application, Objectivity/DB, Cache) connected to distributed Lock Servers, Data Servers and Query Agents]
Put the data and processing where it’s needed
48. Flexibility – language interoperability
[Diagram: a Java App, a C++ App, a C# App and a Python App, each running on Objectivity/DB and sharing the same objects A to F]
51. Comprehensive Online Resources
• InfiniteGraph Developer Wiki
• Product Documentation
• Google Group for Developers
• Download InfiniteGraph
• Our Blog
• InfiniteGraph.com (main site, content and messaging)
52. Company Snapshot
Corporate
• Established in 1988
• Headquartered in Sunnyvale, California
• NOSQL platform for managing and discovering relationships between complex data
Products
• Objectivity/DB™: Object-oriented data management system that manages localized, centralized or distributed databases
• InfiniteGraph™: New massively scalable graph database that enables organizations to find, store, and exploit the relationships hidden in their data
Market Opportunity
• Big Data Market forecasted to be $11.6B in 2012, with CAGR of 28.0% over the next 5 years
• 40% per year data growth, cloud adoption, mobile usage and improved real-time, predictive analytics underpin Objectivity’s growth opportunities
• Strategically positioned as key Big Data enabler that pulls through servers, DBs and file stores
Customers
• Deeply embedded in nearly 90 enterprises and government organizations
• Competitive advantages in Big Data with strong IP and patent position
• Growing pipeline of near-term opportunities across expanding use cases
Financials & Ownership
• Generating increased revenues in last twelve months
• Profitable and cash flow positive; no debt
• Ownership: Privately held by employees and venture investors
53. Brian Clark
VP Product Marketing, Objectivity Inc.
http://www.infinitegraph.com
http://www.objectivity.com
Editor's Notes
Please see the additional presentation giving an overview of the Big Data Analytics Landscape for references (separate attachment).
CUNA Mutual: social CRM application to help sell financial products.
Broad Area Maritime Surveillance Unmanned Aircraft System.
Key points on what relationship analytics is:
• Discovering a relationship between two data nodes
• Generally via several degrees of separation
• Endpoints typically represent “targets”
• Links or associations between targets form paths
• Links may be phone calls, transactions, meetings etc.
Objectivity/DB supports major object languages such as Java, C++, C# .NET, Python and SQL++. Objects created in any supported language can be accessed by any other supported language. For instance, a high performance data ingest application can be written in C++, and these objects can then be accessed by a GUI application written in Java. Python can be used to quickly develop new tools and utilities and prototype new algorithms. SQL++ supports access via ODBC compliant tools such as Microsoft Access.
Objectivity/DB runs on many different platforms including Windows, Linux, other major Unix platforms and real time operating systems. Data written on any platform can be accessed from any other supported platform. Objectivity/DB transparently handles any necessary data conversions. You can preserve your investment in older languages and platforms while upgrading to new languages and platforms.
Dynamic Schema Evolution supports changing the language class definitions and recompiling the application with transparent migration of the objects, or the developer can use an Objectivity product, Active Schema, to dynamically create and modify class definitions and object instances, or the developer can even implement a meta-schema (sometimes called a schema of schema). All this allows a system to change to keep up with the dynamically changing distributed real world.
Big Data Market forecast from JMP Securities Industry Overview (11-15-11) page 1 of 6.
Objectivity, Inc.: last 12 month growth, Jan-Dec 2011, increased by 45% from 2010. Profitable in 7 of the last 10 years.