SlideShare uma empresa Scribd logo
1 de 5
Baixar para ler offline
Discovery cannot know all the data relationships in advance.
The essence of discovery is to identify and create relationships dynamically as new data
sources are added. Traditional analytics tools fail because they are schema based, forcing
time-consuming and error-prone schema extension for each new data source.
The Urika-GD appliance addresses this challenge with a schema-free graph database. New
sources of structured, semistructured and unstructured data can be incrementally fused without
upfront modeling. The system’s graph database also provides a range of powerful capabilities for
querying and reasoning about relationships that are unavailable in traditional tools.
Discovery cannot know all the questions to ask in advance.
Data discovery is an iterative, real-time process, where the answers to the first set of questions
determine the next questions to ask. Traditional analytics tools fail because their performance
depends upon optimizing the data model for specific questions.
The Urika-GD appliance addresses this challenge with a purpose-built hardware accelerator
leveraging massive multithreading technology, capable of returning real-time results to
the most complex, ad-hoc questions as dataset sizes grow. Furthermore, this technology
supports the breadth of analytic and visualization approaches required for rapid exploration
of big data.
Discovery doesn’t access data in a predictable pattern.
Data access during discovery follows no predictable pattern, accessing data virtually
randomly from anywhere in the massive trove of data. Traditional analytics tools fail
because they depend upon effective partitioning of the data across a computing cluster,
which requires advance knowledge of data access patterns.
www.cray.com
The Cray®
Urika-GD™
appliance is built for the challenges of the discovery process, transforming
massive amounts of seemingly unrelated data into relevant insights. With its scalable-shared
memory architecture, the Urika-GD appliance can surface hidden relationships and non-obvious
patterns in big data with unmatched speed and simplicity, facilitating breakthroughs that can
give your enterprise a measurable competitive advantage. Why is such an appliance necessary?
Because traditional data analysis tools don’t have the capability to effectively perform real-time
data discovery:
“[The] Urika appliance
offers organizations an
extremely fast, effective
means of implementing
discovery analytics through
a graph solution that
includes a graph database.
(“Discovering Big Data’s Value with Graph
Analytics,” Enterprise Strategy Group)
Cray:
Enabling Real-Time
Discovery in Big Data
Discovery is the process of gaining valuable insights into the world around us
by recognizing previously unknown relationships between occurrences, objects
and facts. This was traditionally a manual process, but big data has accelerated
discovery by enabling the formulation and validation of theories electronically,
through a collaboration between person and machine in areas as diverse as
fraud and risk analysis, drug discovery, personalized healthcare, micro-targeted
marketing campaigns, cybersecurity, law enforcement and counterterrorism
operations. Data discovery enables organizations to identify connections
between disparate pieces of data: to find not just a single “needle in a needle
stack,” but hundreds — and the invisible threads that link them together.
Product Brief | Urika-GD
The Urika-GD appliance addresses this challenge with a data model held entirely in a large
memory of up to 512 TB, shared by up to 8,192 multithreaded processors with 128 hardware
threads apiece. The large shared memory avoids the need for the data partitioning required
by distributed memory systems. Furthermore, hardware multithreading enables the processors to
tolerate memory access latency, delivering excellent scaling and real-time performance as
dataset sizes and processor counts increase.
The system complements existing data warehouses and Hadoop®
clusters by offloading
discovery analytics, while still interoperating with the existing analytics workflow. Subscription
pricing for on-premise deployment eases the adoption of the Urika-GD platform into existing
IT environments.
The Urika-GD platform can fulfill the true promise of big data in your organization. What will
you discover?
The Power of Graphs for Discovery Analytics
Graphs are the ideal data model for discovery analytics, and The Urika-GD system’s unique
hardware architecture enables graphs to scale to handle big data.
A graph is a data structure capable of representing any kind of data in an accessible way.
Fundamentally, a graph is an abstract representation of a set of objects where some pairs
of the objects are connected by links. A graph consists of “nodes” and “edges” and is typically
depicted in diagrammatic form as a set of dots for the nodes, joined by lines or curves
for the edges. A node represents an entity (a person or thing) and an edge a relationship.
Graphs provide a holistic view of the relationships in which an entity participates.
Graphs represent the relationships between entities directly, allowing data from multiple
structured, semistructured and unstructured sources to be fused without complex,
time-consuming and error prone modeling or schema definition. Furthermore, graph
databases support a variety of sophisticated, ad-hoc analytic techniques, specifically
designed to surface previously unknown patterns of relationships.
Achieving the real-time performance required for collaborative data discovery imposes
certain architectural requirements:
Graphs must not be partitioned.
Discovery analytics involve following the edges (the relationships) in the graph. It is an
in-memory problem: The iterative nature of discovery dictates that the edges to be traversed
will not be known in advance, so strategies based on pre-fetching from disk will not work.
Similarly, partitioning the graph across a distributed memory compute cluster will result a
large number of edges crossing cluster node boundaries, requiring time-consuming network
transfers. Compared to local memory, even a fast commodity network such as 10 gigabit
Ethernet is at least 100 times slower at transferring data. Users gain a significant processing
advantage on discovery problems if the entire graph is held in a large shared memory.
“IT organizations faced
with previously infeasible
graph-style discovery
problems may succeed
using a focused solution
like Urika.”
(“Urika Shows Big Data is More than
Hadoop and Data Warehouses,”
Carl Claunch, Gartner)
www.cray.com
Figure 1. High cost to follow relationships that
span cluster nodes.
1
Product Brief | Urika-GD
Graphs are not predictable.
Analyzing relationships in large datasets requires the examination of multiple, competing
alternatives. These memory accesses are very data dependent and eliminate the ability to
apply traditional performance improvement techniques such as pre-fetching and caching,
with the result that the processor sits idle most of the time waiting for delivery of data.
Using multithreading technology can help alleviate this problem. Threads can explore
different alternatives and each thread can have its own memory access. As long as the
processor supports a sufficient number of hardware threads, it can be kept busy. Given
the highly nondeterministic nature of graphs, a massively multithreaded architecture enables
a tremendous performance advantage through the concurrent investigation of multiple,
changing hypotheses.
Graphs are highly dynamic.
Graph analytics for discovery involves examining the relationships and correlations between
multiple datasets and, consequently, requires loading many large, constantly changing datasets
into memory. The sluggish speed of I/O systems — 1,000 times slower compared to the
CPU — translates into graph load and modification times that can stretch into hours or days.
This is longer than the time required for running analytics. In a dynamic, real-time enterprise
with constantly changing data, a scalable processing infrastructure enables a tremendous
performance advantage for discovery.
The Urika-GD System: Purpose Built
1. Large, global shared memory whose architecture can scale up to 512 TB, enables uniform,
low-latency access to all of the data in the graph, with no need to consider partitioning,
layout in memory or memory access patterns, eliminating the delays associated with
100 times slower network access.
2. Threadstorm™
massively multithreaded hardware accelerator supports 128 hardware
threads in a single processor (65,000 threads in a 512 processor system and over 1 million
threads at the maximum system size of 8,192 processors), which enables global access
of multiple, random dynamic memory references simultaneously without pre-fetching or
caching, and eliminates the waits caused by memory speed lagging the processor.
3. Highly scalable I/O gets data into and out of the Urika-GD appliance with transfer rates
of up to 350 TB/hr, allowing diverse data sets to be dynamically added and alleviating the
problem associated with storage I/O being 1,000 times slower than memory I/O.
www.cray.com
Figure 2. High cost to follow multiple competing
alternatives which cannot be pre-fetched/cached.
Figure 3. High cost to load multiple, constantly
changing datasets into in-memory graph models.
2
3
The Urika-GD
appliance is built
for data discovery,
integrating hardware,
software and storage.
Its hardware delivers
three key innovations
to analyze the largest
datasets in real time:
Product Brief | Urika-GD
The Urika-GD system’s software stack consists of the Graph Analytics Database
and the Graph Analytics Tools
The Graph Analytics Database is a high performance, in-memory, W3C standards-compliant
implementation of an RDF (resource description framework) quad store. It can be queried
using SPARQL 1.1 (SPARQL Protocol and RDF Query Language), providing sophisticated
pattern matching and dynamic data update capabilities. The database has been carefully
tuned to the hardware and can deliver performance that is orders of magnitude better than
alternatives. The expressivity of SPARQL has been further extended with support for a range
of whole graph algorithms, opening the door to new discovery approaches. For example,
consider how “shortest path” analysis aids in understanding how two entities are related,
while “community detection” and “between-ness centrality” enable identification of cliques
and influencers for highly targeted marketing applications.
The Graph Analytics Tools provide a comprehensive, simple and familiar set of management
tools for the appliance and database, security and the data pipeline. Appliance management
is provided through a comprehensive set of Linux-based tools, giving the Urika-GD appliance
the appearance of a Linux server. The Graph Analytics Manager performs database
management and is designed to provide a familiar environment to database administrators.
Easy to Deploy
The Urika-GD system hardware consists of a standard blade configuration, low-power profile
and air-cooled cabinet, making it easy to deploy in enterprise datacenters. It connects easily to
standard networking and storage solutions. Installation is quick, and users can load data and
perform graph analytics immediately.
The Graph Analytics Tools provide a familiar environment to systems and database
administrators. Existing skill sets are sufficient to provision, manage, secure and monitor
the Urika-GD appliance.
www.cray.com
Figure 4.
Graph Analytics Database
RDF datastore and SPARQL engine
Graph Analytics Platform
Shared-memory, Multithreaded, scalable I/O graph-optimized hardware
LINUX
RDF and SPARQL
Graph Analytics Application Services
Management, security, data pipeline
4
Product Brief | Urika-GD
Easy to Integrate
The Urika-GD appliance augments existing analytics environments and easily integrates with
data warehouses, Hadoop®
, other big data appliances and visualization solutions for a rapid
return on investment (Figure 5). A connector model extracts data and returns results, enabling
enterprises to offload graph workloads to an appliance specifically designed for the task.
The Urika-GD system also provides a graph database endpoint accessible through an open,
industry-standard interface, easing integration for in-house application developers.
Surface unknown linkages or non-obvious patterns without advance knowledge
of relationships in the data
Shared memory model enables uniform, low-latency access to all the data regardless of data
partitioning, layout or access pattern
Investigate multiple, changing hypotheses in real time simultaneously
Hardware accelerator enables global access of multiple, random, dynamic memory references
in parallel without pre-fetching/caching
Gain new insights by easily fusing diverse data sets without upfront modeling
and independent of linkage
In-memory graph analytics database enables merging of structured/semistructured/
unstructured data without schemas/layouts and querying data without prespecifying the
connections between the data
www.cray.com
Application
User
Visualization
Load
Graph
Datasets
Share Analytic/
Relationship
Results
Data
Warehouse
Other
Big Data
Appliances
EXISTING ANALYTIC
ENVIRONMENTS
D A TA S O U R C E S
Appl
A
Figure 5. The Urika-GD appliance complements
existing data warehouse/Hadoop environments.
5
The Urika-GD platform’s support
for industry standards, including
Linux, Java/J2EE, JDBC, RDF
and SPARQL, enables enterprises
to leverage existing IT skill sets
and expertise to solve big data
discovery problems while
avoiding vendor lock-in.
Product Brief | Urika-GD
About Urika-GD The Urika-GD big data appliance for graph analytics helps enterprises gain key insights by discovering
relationships in big data. Its highly scalable, real-time graph analytics warehouse supports ad hoc queries, pattern-based
searches, inferencing and deduction. The Urika-GD appliance complements an existing data warehouse or Hadoop®
cluster
by offloading graph workloads and interoperating within the existing analytics workflow.
About Cray Global supercomputing leader Cray Inc. provides innovative systems and solutions enabling scientists and
engineers in industry, academia and government to meet existing and future simulation and analytics challenges. Leveraging
more than 40 years of experience in developing and servicing the world’s most advanced supercomputers, Cray offers a
comprehensive portfolio of supercomputers and big data storage and analytics solutions delivering unrivaled performance,
efficiency and scalability. Go to www.cray.com for more information.
©2014 Cray Inc. All rights reserved. Specifications subject to change without notice. Cray is a registered trademark and
Urika-GD is a trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners.
20140915

Mais conteúdo relacionado

Mais procurados

BigData Behind-the-Scenes~20150827
BigData Behind-the-Scenes~20150827BigData Behind-the-Scenes~20150827
BigData Behind-the-Scenes~20150827
Anthony Potappel
 
Bigdata
BigdataBigdata

Mais procurados (19)

Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
BigData Behind-the-Scenes~20150827
BigData Behind-the-Scenes~20150827BigData Behind-the-Scenes~20150827
BigData Behind-the-Scenes~20150827
 
Data science
Data scienceData science
Data science
 
Kurukshetra - Big Data
Kurukshetra - Big DataKurukshetra - Big Data
Kurukshetra - Big Data
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 
U0 vqmtq3m tc=
U0 vqmtq3m tc=U0 vqmtq3m tc=
U0 vqmtq3m tc=
 
Big Data
Big DataBig Data
Big Data
 
Bigdata
BigdataBigdata
Bigdata
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Data mining
Data miningData mining
Data mining
 
DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCE
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Bigdata
Bigdata Bigdata
Bigdata
 

Destaque

esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399
Nafas Esmaeili
 
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
John Hyde
 
Corporate brochure 2015 web
Corporate brochure 2015 webCorporate brochure 2015 web
Corporate brochure 2015 web
Lynda Perry
 
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- HydeASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
John Hyde
 
ở đâu làm video quảng cáo giá tốt
ở đâu làm video quảng cáo giá tốtở đâu làm video quảng cáo giá tốt
ở đâu làm video quảng cáo giá tốt
gary438
 
Final assignment Phase3
Final assignment Phase3Final assignment Phase3
Final assignment Phase3
Arlo Duncan
 
bảng giá thiết kế clip quảng cáo uy tín
bảng giá thiết kế clip quảng cáo uy tínbảng giá thiết kế clip quảng cáo uy tín
bảng giá thiết kế clip quảng cáo uy tín
zackary636
 
Product_Brochure_13 (1)
Product_Brochure_13 (1)Product_Brochure_13 (1)
Product_Brochure_13 (1)
Lynda Perry
 
SPC 11 John Hyde_Presentation_Seoul.10.13
SPC 11 John Hyde_Presentation_Seoul.10.13SPC 11 John Hyde_Presentation_Seoul.10.13
SPC 11 John Hyde_Presentation_Seoul.10.13
John Hyde
 
The Journal of Head Trauma Rehabilitation 2008 Niedzwecki
The Journal of Head Trauma Rehabilitation 2008 NiedzweckiThe Journal of Head Trauma Rehabilitation 2008 Niedzwecki
The Journal of Head Trauma Rehabilitation 2008 Niedzwecki
Christian Niedzwecki
 

Destaque (17)

esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399
 
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
Empowering Parliamentarians to break the Cycle of Corruption-opinion piece He...
 
Corporate brochure 2015 web
Corporate brochure 2015 webCorporate brochure 2015 web
Corporate brochure 2015 web
 
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- HydeASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
ASPG 2005 07 -Parliamentary Oversight from the Parliament's Perspective- Hyde
 
Western Copper and Gold - Corporate Presentation April 2015
Western Copper and Gold - Corporate Presentation April 2015 Western Copper and Gold - Corporate Presentation April 2015
Western Copper and Gold - Corporate Presentation April 2015
 
ở đâu làm video quảng cáo giá tốt
ở đâu làm video quảng cáo giá tốtở đâu làm video quảng cáo giá tốt
ở đâu làm video quảng cáo giá tốt
 
Final assignment Phase3
Final assignment Phase3Final assignment Phase3
Final assignment Phase3
 
portfolio
portfolioportfolio
portfolio
 
Estacion 4
Estacion 4Estacion 4
Estacion 4
 
Western Copper and Gold Corporate Presentation July 2015
Western Copper and Gold Corporate Presentation July 2015Western Copper and Gold Corporate Presentation July 2015
Western Copper and Gold Corporate Presentation July 2015
 
Managing Your Data/ Information
Managing Your Data/ InformationManaging Your Data/ Information
Managing Your Data/ Information
 
bảng giá thiết kế clip quảng cáo uy tín
bảng giá thiết kế clip quảng cáo uy tínbảng giá thiết kế clip quảng cáo uy tín
bảng giá thiết kế clip quảng cáo uy tín
 
Mining_booklet_2015
Mining_booklet_2015Mining_booklet_2015
Mining_booklet_2015
 
Product_Brochure_13 (1)
Product_Brochure_13 (1)Product_Brochure_13 (1)
Product_Brochure_13 (1)
 
Western Copper and Gold Corporate Presentation - June 2015
Western Copper and Gold Corporate Presentation - June 2015Western Copper and Gold Corporate Presentation - June 2015
Western Copper and Gold Corporate Presentation - June 2015
 
SPC 11 John Hyde_Presentation_Seoul.10.13
SPC 11 John Hyde_Presentation_Seoul.10.13SPC 11 John Hyde_Presentation_Seoul.10.13
SPC 11 John Hyde_Presentation_Seoul.10.13
 
The Journal of Head Trauma Rehabilitation 2008 Niedzwecki
The Journal of Head Trauma Rehabilitation 2008 NiedzweckiThe Journal of Head Trauma Rehabilitation 2008 Niedzwecki
The Journal of Head Trauma Rehabilitation 2008 Niedzwecki
 

Semelhante a Urika-GD Product Brief Online 5-page

Unit i introduction to grid computing
Unit i   introduction to grid computingUnit i   introduction to grid computing
Unit i introduction to grid computing
sudha kar
 
MasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
MasterClass Series: Unlocking Data Sharing Velocity with Data VirtualizationMasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
MasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
Denodo
 
using big-data methods analyse the Cross platform aviation
 using big-data methods analyse the Cross platform aviation using big-data methods analyse the Cross platform aviation
using big-data methods analyse the Cross platform aviation
ranjit banshpal
 
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
YogeshIJTSRD
 

Semelhante a Urika-GD Product Brief Online 5-page (20)

An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
Grid computing 12
Grid computing 12Grid computing 12
Grid computing 12
 
Unit i introduction to grid computing
Unit i   introduction to grid computingUnit i   introduction to grid computing
Unit i introduction to grid computing
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
How do data analysts work with big data and distributed computing frameworks.pdf
How do data analysts work with big data and distributed computing frameworks.pdfHow do data analysts work with big data and distributed computing frameworks.pdf
How do data analysts work with big data and distributed computing frameworks.pdf
 
Enterprise Data Lake
Enterprise Data LakeEnterprise Data Lake
Enterprise Data Lake
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
MasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
MasterClass Series: Unlocking Data Sharing Velocity with Data VirtualizationMasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
MasterClass Series: Unlocking Data Sharing Velocity with Data Virtualization
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
 
Big Data Companies and Apache Software
Big Data Companies and Apache SoftwareBig Data Companies and Apache Software
Big Data Companies and Apache Software
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
 
Grid computing assiment
Grid computing assimentGrid computing assiment
Grid computing assiment
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
PAACDA Comprehensive Data Corruption Detection Algorithm.docx
PAACDA Comprehensive Data Corruption Detection Algorithm.docxPAACDA Comprehensive Data Corruption Detection Algorithm.docx
PAACDA Comprehensive Data Corruption Detection Algorithm.docx
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
using big-data methods analyse the Cross platform aviation
 using big-data methods analyse the Cross platform aviation using big-data methods analyse the Cross platform aviation
using big-data methods analyse the Cross platform aviation
 
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
 

Urika-GD Product Brief Online 5-page

  • 1. Discovery cannot know all the data relationships in advance. The essence of discovery is to identify and create relationships dynamically as new data sources are added. Traditional analytics tools fail because they are schema based, forcing time-consuming and error-prone schema extension for each new data source. The Urika-GD appliance addresses this challenge with a schema-free graph database. New sources of structured, semistructured and unstructured data can be incrementally fused without upfront modeling. The system’s graph database also provides a range of powerful capabilities for querying and reasoning about relationships that are unavailable in traditional tools. Discovery cannot know all the questions to ask in advance. Data discovery is an iterative, real-time process, where the answers to the first set of questions determine the next questions to ask. Traditional analytics tools fail because their performance depends upon optimizing the data model for specific questions. The Urika-GD appliance addresses this challenge with a purpose-built hardware accelerator leveraging massive multithreading technology, capable of returning real-time results to the most complex, ad-hoc questions as dataset sizes grow. Furthermore, this technology supports the breadth of analytic and visualization approaches required for rapid exploration of big data. Discovery doesn’t access data in a predictable pattern. Data access during discovery follows no predictable pattern, accessing data virtually randomly from anywhere in the massive trove of data. Traditional analytics tools fail because they depend upon effective partitioning of the data across a computing cluster, which requires advance knowledge of data access patterns. www.cray.com The Cray® Urika-GD™ appliance is built for the challenges of the discovery process, transforming massive amounts of seemingly unrelated data into relevant insights. With its scalable-shared memory architecture, the Urika-GD appliance can surface hidden relationships and non-obvious patterns in big data with unmatched speed and simplicity, facilitating breakthroughs that can give your enterprise a measurable competitive advantage. Why is such an appliance necessary? Because traditional data analysis tools don’t have the capability to effectively perform real-time data discovery: “[The] Urika appliance offers organizations an extremely fast, effective means of implementing discovery analytics through a graph solution that includes a graph database. (“Discovering Big Data’s Value with Graph Analytics,” Enterprise Strategy Group) Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects and facts. This was traditionally a manual process, but big data has accelerated discovery by enabling the formulation and validation of theories electronically, through a collaboration between person and machine in areas as diverse as fraud and risk analysis, drug discovery, personalized healthcare, micro-targeted marketing campaigns, cybersecurity, law enforcement and counterterrorism operations. Data discovery enables organizations to identify connections between disparate pieces of data: to find not just a single “needle in a needle stack,” but hundreds — and the invisible threads that link them together. Product Brief | Urika-GD
  • 2. The Urika-GD appliance addresses this challenge with a data model held entirely in a large memory of up to 512 TB, shared by up to 8,192 multithreaded processors with 128 hardware threads apiece. The large shared memory avoids the need for the data partitioning required by distributed memory systems. Furthermore, hardware multithreading enables the processors to tolerate memory access latency, delivering excellent scaling and real-time performance as dataset sizes and processor counts increase. The system complements existing data warehouses and Hadoop® clusters by offloading discovery analytics, while still interoperating with the existing analytics workflow. Subscription pricing for on-premise deployment eases the adoption of the Urika-GD platform into existing IT environments. The Urika-GD platform can fulfill the true promise of big data in your organization. What will you discover? The Power of Graphs for Discovery Analytics Graphs are the ideal data model for discovery analytics, and The Urika-GD system’s unique hardware architecture enables graphs to scale to handle big data. A graph is a data structure capable of representing any kind of data in an accessible way. Fundamentally, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. A graph consists of “nodes” and “edges” and is typically depicted in diagrammatic form as a set of dots for the nodes, joined by lines or curves for the edges. A node represents an entity (a person or thing) and an edge a relationship. Graphs provide a holistic view of the relationships in which an entity participates. Graphs represent the relationships between entities directly, allowing data from multiple structured, semistructured and unstructured sources to be fused without complex, time-consuming and error prone modeling or schema definition. Furthermore, graph databases support a variety of sophisticated, ad-hoc analytic techniques, specifically designed to surface previously unknown patterns of relationships. Achieving the real-time performance required for collaborative data discovery imposes certain architectural requirements: Graphs must not be partitioned. Discovery analytics involve following the edges (the relationships) in the graph. It is an in-memory problem: The iterative nature of discovery dictates that the edges to be traversed will not be known in advance, so strategies based on pre-fetching from disk will not work. Similarly, partitioning the graph across a distributed memory compute cluster will result a large number of edges crossing cluster node boundaries, requiring time-consuming network transfers. Compared to local memory, even a fast commodity network such as 10 gigabit Ethernet is at least 100 times slower at transferring data. Users gain a significant processing advantage on discovery problems if the entire graph is held in a large shared memory. “IT organizations faced with previously infeasible graph-style discovery problems may succeed using a focused solution like Urika.” (“Urika Shows Big Data is More than Hadoop and Data Warehouses,” Carl Claunch, Gartner) www.cray.com Figure 1. High cost to follow relationships that span cluster nodes. 1 Product Brief | Urika-GD
  • 3. Graphs are not predictable. Analyzing relationships in large datasets requires the examination of multiple, competing alternatives. These memory accesses are very data dependent and eliminate the ability to apply traditional performance improvement techniques such as pre-fetching and caching, with the result that the processor sits idle most of the time waiting for delivery of data. Using multithreading technology can help alleviate this problem. Threads can explore different alternatives and each thread can have its own memory access. As long as the processor supports a sufficient number of hardware threads, it can be kept busy. Given the highly nondeterministic nature of graphs, a massively multithreaded architecture enables a tremendous performance advantage through the concurrent investigation of multiple, changing hypotheses. Graphs are highly dynamic. Graph analytics for discovery involves examining the relationships and correlations between multiple datasets and, consequently, requires loading many large, constantly changing datasets into memory. The sluggish speed of I/O systems — 1,000 times slower compared to the CPU — translates into graph load and modification times that can stretch into hours or days. This is longer than the time required for running analytics. In a dynamic, real-time enterprise with constantly changing data, a scalable processing infrastructure enables a tremendous performance advantage for discovery. The Urika-GD System: Purpose Built 1. Large, global shared memory whose architecture can scale up to 512 TB, enables uniform, low-latency access to all of the data in the graph, with no need to consider partitioning, layout in memory or memory access patterns, eliminating the delays associated with 100 times slower network access. 2. Threadstorm™ massively multithreaded hardware accelerator supports 128 hardware threads in a single processor (65,000 threads in a 512 processor system and over 1 million threads at the maximum system size of 8,192 processors), which enables global access of multiple, random dynamic memory references simultaneously without pre-fetching or caching, and eliminates the waits caused by memory speed lagging the processor. 3. Highly scalable I/O gets data into and out of the Urika-GD appliance with transfer rates of up to 350 TB/hr, allowing diverse data sets to be dynamically added and alleviating the problem associated with storage I/O being 1,000 times slower than memory I/O. www.cray.com Figure 2. High cost to follow multiple competing alternatives which cannot be pre-fetched/cached. Figure 3. High cost to load multiple, constantly changing datasets into in-memory graph models. 2 3 The Urika-GD appliance is built for data discovery, integrating hardware, software and storage. Its hardware delivers three key innovations to analyze the largest datasets in real time: Product Brief | Urika-GD
  • 4. The Urika-GD system’s software stack consists of the Graph Analytics Database and the Graph Analytics Tools The Graph Analytics Database is a high performance, in-memory, W3C standards-compliant implementation of an RDF (resource description framework) quad store. It can be queried using SPARQL 1.1 (SPARQL Protocol and RDF Query Language), providing sophisticated pattern matching and dynamic data update capabilities. The database has been carefully tuned to the hardware and can deliver performance that is orders of magnitude better than alternatives. The expressivity of SPARQL has been further extended with support for a range of whole graph algorithms, opening the door to new discovery approaches. For example, consider how “shortest path” analysis aids in understanding how two entities are related, while “community detection” and “between-ness centrality” enable identification of cliques and influencers for highly targeted marketing applications. The Graph Analytics Tools provide a comprehensive, simple and familiar set of management tools for the appliance and database, security and the data pipeline. Appliance management is provided through a comprehensive set of Linux-based tools, giving the Urika-GD appliance the appearance of a Linux server. The Graph Analytics Manager performs database management and is designed to provide a familiar environment to database administrators. Easy to Deploy The Urika-GD system hardware consists of a standard blade configuration, low-power profile and air-cooled cabinet, making it easy to deploy in enterprise datacenters. It connects easily to standard networking and storage solutions. Installation is quick, and users can load data and perform graph analytics immediately. The Graph Analytics Tools provide a familiar environment to systems and database administrators. Existing skill sets are sufficient to provision, manage, secure and monitor the Urika-GD appliance. www.cray.com Figure 4. Graph Analytics Database RDF datastore and SPARQL engine Graph Analytics Platform Shared-memory, Multithreaded, scalable I/O graph-optimized hardware LINUX RDF and SPARQL Graph Analytics Application Services Management, security, data pipeline 4 Product Brief | Urika-GD
  • 5. Easy to Integrate The Urika-GD appliance augments existing analytics environments and easily integrates with data warehouses, Hadoop® , other big data appliances and visualization solutions for a rapid return on investment (Figure 5). A connector model extracts data and returns results, enabling enterprises to offload graph workloads to an appliance specifically designed for the task. The Urika-GD system also provides a graph database endpoint accessible through an open, industry-standard interface, easing integration for in-house application developers. Surface unknown linkages or non-obvious patterns without advance knowledge of relationships in the data Shared memory model enables uniform, low-latency access to all the data regardless of data partitioning, layout or access pattern Investigate multiple, changing hypotheses in real time simultaneously Hardware accelerator enables global access of multiple, random, dynamic memory references in parallel without pre-fetching/caching Gain new insights by easily fusing diverse data sets without upfront modeling and independent of linkage In-memory graph analytics database enables merging of structured/semistructured/ unstructured data without schemas/layouts and querying data without prespecifying the connections between the data www.cray.com Application User Visualization Load Graph Datasets Share Analytic/ Relationship Results Data Warehouse Other Big Data Appliances EXISTING ANALYTIC ENVIRONMENTS D A TA S O U R C E S Appl A Figure 5. The Urika-GD appliance complements existing data warehouse/Hadoop environments. 5 The Urika-GD platform’s support for industry standards, including Linux, Java/J2EE, JDBC, RDF and SPARQL, enables enterprises to leverage existing IT skill sets and expertise to solve big data discovery problems while avoiding vendor lock-in. Product Brief | Urika-GD About Urika-GD The Urika-GD big data appliance for graph analytics helps enterprises gain key insights by discovering relationships in big data. Its highly scalable, real-time graph analytics warehouse supports ad hoc queries, pattern-based searches, inferencing and deduction. The Urika-GD appliance complements an existing data warehouse or Hadoop® cluster by offloading graph workloads and interoperating within the existing analytics workflow. About Cray Global supercomputing leader Cray Inc. provides innovative systems and solutions enabling scientists and engineers in industry, academia and government to meet existing and future simulation and analytics challenges. Leveraging more than 40 years of experience in developing and servicing the world’s most advanced supercomputers, Cray offers a comprehensive portfolio of supercomputers and big data storage and analytics solutions delivering unrivaled performance, efficiency and scalability. Go to www.cray.com for more information. ©2014 Cray Inc. All rights reserved. Specifications subject to change without notice. Cray is a registered trademark and Urika-GD is a trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 20140915