3.
Here’s what the world is saying about the impact of
Cognitive systems on the future of how we work:
“IBM Crafts a Role for Artificial
Intelligence in Medicine.”
“IBM Watson represents a bold
technological and visionary step toward a future
in which every aspect of our lives will be
enhanced by the utility of cognitive
computing as it is harnessed into myriad
new applications.”
“What is distinctive about IBM is the
breadth of its effort to create Watson
tools and services as plug-in offerings for
a wide range of developers.”
“At Wayin, former Sun CEO Scott McNealy is
using Watson's image recognition capabilities
to trawl photos on social media and make
them searchable, even when they don't have
tags describing their content. ‘You can't do this
without Watson,’ he said.”
“IDC predicts the worldwide cognitive
software platforms market will grow to $3.7
billion in 2019, at a CAGR of 35% over 5
years.”
IDC: Worldwide Cognitive Software Platforms
Forecast, 2015-2019: The Emergence of a
New Market (#258781, September 2015,
David Schubmehl)
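The IDC forecast above can be sanity-checked with the standard compound-growth formula. The sketch below backs out the starting market size implied by $3.7 billion at a 35% CAGR over 5 years. The function names are illustrative, and treating the 5 years as five full compounding periods is an assumption, not something the IDC citation states.

```python
def implied_base(final_value, cagr, years):
    """Back out the starting value implied by a final value and a CAGR."""
    return final_value / (1.0 + cagr) ** years


def project(base, cagr, years):
    """Return the value at the end of each year, compounding annually."""
    return [base * (1.0 + cagr) ** year for year in range(1, years + 1)]


# Figures quoted on the slide: $3.7 billion in 2019, 35% CAGR over 5 years.
FINAL_BILLIONS = 3.7
CAGR = 0.35
YEARS = 5

base = implied_base(FINAL_BILLIONS, CAGR, YEARS)
print(f"Implied starting market size: ${base:.2f} billion")
for offset, value in enumerate(project(base, CAGR, YEARS), start=1):
    print(f"Year {offset}: ${value:.2f} billion")
```

Under these assumptions the forecast implies a starting market of a little over $0.8 billion.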
“IBM is the only company marketing a
cognitive computing platform that’s
specifically designed to support the
development of a broad range of
enterprise solutions.”
“No doubt, Watson has the means to radically
change the industry. In fact, its potential as an
‘innovation lake/incubator’ should be highly
valued.”
IDC: IBM’s Go-to-Market Transformation –
Deeper, Wider, Newer (#AP257527, April 2015,
Chris Zhang, Sabharinath Balasubramanian,
Mayur Sahni)
“IBM’s famous cognitive computer can
help banks with complex financial
operations and attack important health
care problems. Now you can add seeing
to its skill set.”
“These days, it’s not just AI algorithms
themselves that have improved, but the
ability to deliver them…that has made so
many new applications possible.”
4. There are three capabilities that
differentiate cognitive systems from
traditional programmed computing
systems.
Reasoning
They reason. They can understand
information but also the underlying ideas and
concepts. This reasoning ability can become
more advanced over time. It’s the difference
between the reasoning strategies we used
as children to solve mathematical problems,
and then the strategies we developed when
we got into advanced math like geometry,
algebra and calculus.
Learning
They never stop learning. As a technology,
this means the system actually gets more
valuable with time. They develop
“expertise”. Think about what it means to
be an expert: it’s not about executing a
mathematical model. We don’t consider
our doctors to be experts in their fields
because they answer every question
correctly. We expect them to be able to
reason and be transparent about their
reasoning, and expose the rationale for
why they came to a conclusion.
Understanding
Cognitive systems understand like
humans do, whether that’s through
natural language or the written word;
vocal or visual.
16. Power Systems – Designed for Big Data and Analytics solutions
Analytics 1.0
17. Power Systems – Designed for Big Data and Analytics solutions
Analytics 1.5
18. Power Systems – Designed for Big Data and Analytics solutions
Analytics 2.0 - support a variety of data and a range of analytics
19. Classic Hadoop infrastructures can be inefficient and inflexible,
leading to server and cluster sprawl, unnecessary software licenses,
and infrastructure management challenges
IBM Data Engine for Hadoop and Spark
IBM Data Engine for Analytics
20. Avoid Server Sprawl as Big Data and Analytics environments grow
Estimated server growth as a sample configuration with 500 TB of user space
grows 5x and 10x, to 2.5 PB and 5 PB:
Platform                                     5x (2.5 PB)    10x (5 PB)
Intel reference architecture                 75 Servers     143 Servers
POWER8 with IBM Data Engine for Analytics    17 Servers     28 Servers
Sizing is based on assumptions regarding general configurations and use cases for BigInsights and BigSQL with actual client. The comparison reflects
the number of servers required to deliver relative performance and equivalent user space on Intel reference architecture using Hadoop triple
replication with 4 TB drives versus POWER8 IDEA reference architecture with 4 TB drives using the Elastic Storage Server.
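The server counts on this slide can be turned into a simple sprawl ratio. The sketch below assumes the four counts map to POWER8 (17 and 28 servers) and Intel (75 and 143 servers) at the 5x and 10x growth steps. That mapping is inferred from the slide layout, and the dictionary and function names are illustrative only.

```python
# Assumed mapping of the slide's four server counts to platform and growth step.
SERVERS = {
    "Intel": {"5x (2.5 PB)": 75, "10x (5 PB)": 143},
    "POWER8": {"5x (2.5 PB)": 17, "10x (5 PB)": 28},
}


def sprawl_ratio(step):
    """Intel servers needed per POWER8 server at a given growth step."""
    return SERVERS["Intel"][step] / SERVERS["POWER8"][step]


for step in SERVERS["Intel"]:
    saved = SERVERS["Intel"][step] - SERVERS["POWER8"][step]
    print(f"{step}: {sprawl_ratio(step):.1f}x fewer servers, {saved} servers avoided")
```

Under this mapping the gap widens as the environment grows, which is the point the slide title makes.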
21. Many new solution workloads
in addition to existing apps
Leads to costly, complex, siloed, under-utilized infrastructure
and replicated data
Workload silos shown: Development, Test, Distributed ETL, Sensitivity Analysis, Hadoop-based Sentiment Analysis
Low Utilization = Higher cost
Infrastructure Silos aka Cluster Sprawl is Inefficient
22. IBM Data Engine for Analytics (IDEA) – Client Example
Client: Multinational Telecommunications Company
A multinational telecommunications company with
over 6M subscribers.
Challenges
Expectations of a Real Time Marketing (RTM)
based solution to run event-based campaigns
Enable event-based marketing, analyzing various
sources of input data containing information
regarding subscribers’ actions
Dispatch the triggered events to downstream
applications such as campaign management, for
associated campaign execution.
Architecture- Solution Components
• BigInsights, Streams, SPSS Modeler, SPSS
Analytics Server
• IBM Data Engine for Analytics: 20 X S822L, 2 X ESS
GL4, Spectrum Scale, PCM
Solution Approach
Solution provided a Hadoop-based Big Data platform,
integrated to the RTM decision engine, to enable data
monetization opportunities, including location based
analytics
The customer was not comfortable with the large number of
x86 data nodes that a typical Hadoop
architecture requires
The IBM team designed the Power solution and
conducted a technical workshop on newly redefined
Hadoop architecture based on IDEA.
Key Client Benefits
Optimized Big Data deployment architecture with IDEA
Architecture with Linux on Power, Elastic Storage Server
and Spectrum Scale
Lower TCO with 4 racks on Power vs. 12 racks on x86
More I/O bandwidth with 40GbE Power network vs.
10GbE on x86-based solution
3x fewer racks for 2 PB
Big Data solution: 4 vs. 12
23. IBM Data Engine for Analytics – Solution Highlights
Actionable Insights with IBM BigInsights preloaded
+ Increase business value by consolidating multiple
analytic capabilities and data as needed
Up to 2.5x* faster insights
Smart Infrastructure Services with IBM Platform Computing
+ Designed to handle multiple analytic workloads in a multi-tenant
environment with dynamic resources
Designed for Data with IBM POWER8 Systems (S822L)
+ Outstanding memory and IO bandwidth design for the demands of Big Data
2x* better performance
Scalable Networking with IBM and partner networks
+ High-bandwidth, low-latency networking: Ethernet
RoCE (10 Gbit or 40 Gbit), InfiniBand RDMA (FDR)
10x to 100x network performance growth since Hadoop inception
Flexible Storage with IBM Elastic Storage Server
+ Combines Servers, Storage Enclosures, Disks and Elastic Storage Software
Over 2x** reduction in storage disk count
*Based on internal testing and cost analysis
**Based on client example vs a triple replica Hadoop configuration.
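The disk-count claim above follows from how much raw capacity each protection scheme needs per usable terabyte. The sketch below compares Hadoop-style triple replication with an 8+2 parity layout. The 8+2 layout, the 1,000 TB usable target and the function names are assumptions chosen for illustration; the slide states the result but not which protection scheme the client used.

```python
import math


def raw_tb_replication(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_tb_erasure(usable_tb, data_strips=8, parity_strips=2):
    """Raw capacity needed for a data+parity erasure-coded layout."""
    return usable_tb * (data_strips + parity_strips) / data_strips


def drives_needed(raw_tb, drive_tb=4):
    """Whole drives required to provide the given raw capacity."""
    return math.ceil(raw_tb / drive_tb)


USABLE_TB = 1000  # hypothetical usable capacity target

replica_drives = drives_needed(raw_tb_replication(USABLE_TB))
erasure_drives = drives_needed(raw_tb_erasure(USABLE_TB))
print(f"Triple replication: {replica_drives} drives")
print(f"8+2 parity layout:  {erasure_drives} drives")
print(f"Reduction: {replica_drives / erasure_drives:.2f}x")
```

With these assumed inputs the parity layout needs well under half the drives, in line with the "over 2x" figure on the slide.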
Big Data & Analytics
Software
Infrastructure Services
POWER8 Servers
Scalable Networking
Scale Out Cluster File System
Elastic Storage Server
Appliance-Like but much more Versatile!
24. Compute Plane = Power8 Systems, Designed for Big Data
4X
Threads per core*
4X
Memory Bandwidth*
5X
More cache*
Simultaneous Multi-Threading
On-Line Transaction Processing
High Performance Computing
These design decisions result in best
performance for all types of workloads
such as: Java, OLTP, Analytics, Big Data, HPC
* POWER8 compared to Intel Haswell EX
Sources: Haswell EX:
http://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz
POWER8:
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BR&infotype=PM&appname=STGE_PO_PO_USEN&
[Figure: POWER8 (SMT8) vs. x86 (SMT2); POWER8 pipe vs. x86 pipe data flow; clock frequency 1.4 – 2.3X]
Components of IDEA
25. Data Plane = Elastic Storage Server, A Data Lake for many
applications
Cinder, Swift, GPFS, NFS
Linear capacity & performance
scale out
POSIX
Enterprise storage on
standard hardware
Technical Computing Big Data & Analytics Cloud
File
77.5 Percent of
organizations are
already investing in a
Data Lake*
Elastic Storage Server
Single Name Space
Hadoop
Block Object
Data Lake
Structured Data
Unstructured Data
Traditional
Analytics
BigSQL
Components of IDEA
26. Production - IBM Data Engine for Analytics
                                 Without HMC/TFT/SMN/MN    With HMC/TFT/SMN
Available Filesystem Capacity    0 PB                      1 PB
Number of Data Nodes             14                        4
Hadoop Mgmt LPARs                0                         6
System Mgmt Node                 0                         1
Data Node – S822L
24x 3.02GHz cores
256GB DRAM
12x 1.8TB 10K SAS HDD
2x40GbE (2 port)(data+mgmt)
2x 4-port 1GbE NIC (mgmt)
Hadoop Management Node – S822L
2 LPARs with split backplane
24x 3.02GHz cores
256GB DRAM
8x 1.8TB 10K SAS HDD (OS + data)
2x40GbE (2 port)(data+mgmt)
2x 4-port 1GbE NIC (mgmt)
System Management Node – S812L
10x 3.425GHz cores
32GB DRAM
2x 300GB 10K SAS HDD (OS)
1x40GbE (2 port)(data+mgmt)
1x 4-port 1GbE NIC (mgmt)
Initial Rack / Scale-out Racks
27. Classic Hadoop infrastructures can be inefficient and inflexible,
leading to server and cluster sprawl, unnecessary software licenses,
and infrastructure management challenges
IBM Data Engine for Hadoop and Spark
IBM Data Engine for Analytics
29. Single vendor support
Up to 2x better price performance
for Spark workloads*
Delivered as a fully integrated
cluster ready to run
OpenPOWER innovation with
IBM S812LC servers
Optimized configurations for
Hadoop or Spark workloads
Based on S812LC servers with
up to 14 x 6 TB disk drives per
server
Optionally preloaded with
IBM BigInsights and IBM Open
Platform
Simplify operations – easy to
deploy and manage
Adapt and scale to your
changing analytics needs
IBM Data Engine for Hadoop and Spark
OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high
performance, storage dense and fully integrated cluster offering.
* All results are based on IBM internal testing of 3 SparkBench benchmarks: SQL
RDD Relation, Logistic Regression, and SVM
Announce: Feb 9, 2016
GA: Mar 18, 2016
30. IBM Data Engine for Hadoop and Spark (IDE-HS)
Cluster Performance
Designed for the Cognitive Era to Make Better Decisions even Faster
IBM Data Engine for Hadoop and Spark infrastructure
delivers Spark workload scaling to minimize execution
times and reduce batch windows
- 2.1X more performance per dollar spent for Spark
Logistic Regression-based Machine Learning, used in
model training by a wide variety of lines of business
- 1.4X more performance per dollar spent for Support
Vector Machine (SVM), a Machine Learning algorithm
used in product Recommender Systems
- 1.7X more performance per dollar spent for Spark SQL
query processing, used widely in Big Data clusters
• All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
• 6 Data Nodes and 1 Management Node. Each node is IBM Power System S812LC 10 cores / 80 threads, POWER8; 2.92GHz, 256 GB
memory, RedHat 7.2, Spark 1.5.1, OpenJDK 1.8
• 6 Data Nodes and 1 Management Node. Each node is x86 E5-2620V3 12 cores / 24 threads, E5-2620 V3; 2.4GHz, 256 GB memory,
RedHat 7.1, Spark 1.5.1, OpenJDK 1.8
• Pricing is based on web prices of HP DL380 and list prices of IBM Power S812LC
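The two test clusters in the footnotes differ sharply in hardware threads, which is worth keeping in mind when reading the per-dollar results. The sketch below derives threads per core and total data-node threads from the stated node configurations. The class and function names are illustrative, and thread counts alone do not explain the benchmark results, which also depend on the pricing noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    name: str
    cores: int
    threads: int

    @property
    def threads_per_core(self) -> int:
        """Hardware threads per core (the SMT level)."""
        return self.threads // self.cores


DATA_NODES = 6  # both test clusters used 6 data nodes and 1 management node

# Node configurations as stated in the slide footnotes.
power_node = NodeSpec("IBM Power S812LC (POWER8)", cores=10, threads=80)
x86_node = NodeSpec("x86 E5-2620 v3", cores=12, threads=24)


def cluster_threads(node, data_nodes=DATA_NODES):
    """Total hardware threads across the data nodes of one cluster."""
    return node.threads * data_nodes


for node in (power_node, x86_node):
    print(f"{node.name}: SMT{node.threads_per_core}, "
          f"{cluster_threads(node)} threads across {DATA_NODES} data nodes")
```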
[Chart: price-performance for the SVM, LogRes, and SQL benchmarks]
31. • Apache Spark is an open-source in-memory distributed compute engine
– It speeds up iterative analysis on large-scale data by up to 100x
compared with current technologies
– Enables more people to collaborate to access data,
apply analytics and deploy deep intelligence into every application
including IoT, web, mobile, social, business process and more
– IBM/Spark commitment: 3500 employees working on Spark
• Included in the IBM Open Platform (IOP) that runs on
Linux on Power
• Power Systems - key contributor to Spark
• Offering over 2x the performance per core for Spark workloads
compared to x86 Haswell * (SQL, ML, Graph, Streaming)
Open Platform
with Apache Hadoop
Open innovation to put data to work across the enterprise
* Based on Sparkbench on POWER8 P822L vs x86 E5-2690 V3; each 24 core and 256 GB RAM
Spark on Power
Editor's Notes
1) What is the relationship between Cognitive Business, Watson and outthink?
“Cognitive Business” is a branded POV for the entire IBM Company, while “Watson” is the lead brand for IBM’s cognitive offerings. “Outthink” is our marketing campaign that supports Cognitive Business and features Watson.
ANALYTICS 2.0
We have the most sophisticated cognitive technology.
Cognitive systems are making us rethink the way business gets done.
They’re becoming integral to the way we work and make decisions – and the market is validating this.
Understand: Understands data–structured and unstructured, text-based or sensory–in context and meaning, at astonishing speeds and volumes.
Reason: Has the ability to form hypotheses, make considered arguments and prioritize recommendations to help humans make better decisions.
Learn: Ingests and accumulates data and insight from every interaction continuously. Is trained, not programmed, by experts who enhance, scale and accelerate their expertise. Therefore, it gets better over time.
What are the forces driving the idea of cognitive business?
It’s information; it’s data.
Cognitive systems are knowledge systems, fueled by the magnitudes of information available to us.
For the first time in history, the volume of data and information we’re producing has outpaced our ability to make use of it.
And the sources and types of data that inform the work we do and the decisions we make are broader and more diverse than ever before.
This isn’t news, because most of us spend a lot of time figuring out how to surface actionable insights from massive amounts of data and information.
Since 2014, the number of businesses that have implemented data-driven projects has increased by 125%, with executives citing improved speed and quality of decision making as their top priority.
Even with these advanced analytics solutions, businesses estimate that they’re only reaching 12% of the data they have, leaving 88% of it to waste.
That’s because this 88% of data is “invisible” to computers. It’s the type of data that humanity encodes in language and unstructured information, in the form of text – books, emails, journals, blogs, articles, tweets, as well as images, sound and motion.
We need a better way to take command of the knowledge and information that matters most to us.
We need to be able to discover new connections, patterns, and insights from within it.
And we need a new way to think about expertise, in order to draw new conclusions and make decisions with more confidence and speed than ever before.
Today, businesses and organizations in 36 countries, across 29 industries and 5 languages (Arabic, English, Japanese, Brazilian Portuguese, Spanish) are using Watson to build cognitive abilities into their products, applications, processes, and offerings:
The 50,000 students at Deakin University in Australia using Watson as a student advisor to answer their questions as they arrive on campus;
The 1.1 million patients in Bumrungrad Hospital’s network who now have access to personalized cancer treatment recommendations with help from a system trained by the doctors at the world’s leading cancer centers;
The 5.5 million citizens in Singapore who have access to government services with help from Watson;
80,000 developers, VCs, and start-ups using Watson APIs.
More than 350 Watson ecosystem partner companies, with 100 of their applications already in market.
And countless chief marketing officers, analysts, researchers, and many more who are making connections and discoveries with apps powered by Watson.
Let’s take a closer look at a few examples:
1. 96% of unhappy customers don’t complain, but 91% never come back
2. Cognitive learning makes expertise accessible on a new scale by making it easy for any professional to keep pace with knowledge from the entire field and learn from the best in the world. Scaling the greatest mind to every mind.
3. Cognitive products and services can sense, reason and learn so they can adapt and develop new capabilities not previously imaginable. Apps with advanced and predictive analytics are growing 65% faster than apps without this functionality.
4. Cognitive systems bring more certainty to business by extracting real-time information from workflows, context and environment to enhance forecasting and decision-making.
5. Cognitive discovery changes the odds for high-stakes research by enabling companies to mine insights from vast amounts of data, and uncover patterns and opportunities that would be virtually impossible to find through traditional methods.
Watson is a cloud-based, open platform of expanding cognitive capabilities. With Watson, you can build cognition into digital applications, products and operations.
Next, you can leverage Watson APIs – cognitive building blocks - to apply Watson’s capabilities.
Watson APIs are delivered on a cloud-based, open platform, and with Watson, you can build cognition into your digital applications, products, and operations, using any one or combination of 28 available APIs.
For example, Natural Language Classifier API enables developers without a background in machine learning or statistical algorithms to create machine-learning, natural language interfaces for their applications.
Tone Analyzer helps individuals understand the linguistic tone of their writing. This API uses linguistic analysis to detect and interpret emotional, social, and writing cues that are located within the text, and also offers rhetorical suggestions for an author to improve the intended tone.
Retrieve and Rank helps users find the most relevant information for their query by using a combination of search and machine learning algorithms to detect “signals” in the data. Other APIs, also cognitive building blocks, offer capabilities including relationship extraction, personality analysis, tone analysis, concept expansion, and trade-off analytics, among others.
Each API is capable of performing a different task, and in combination, they can be adapted to solve any number of business problems or create deeply engaging experiences.
And we continue to add new and expanded cognitive capabilities to the platform.
Becoming a Cognitive Business is a journey. Leaders can capitalize on all the foundational work they’ve done to deploy cloud, analytics, mobile, social, security. P8 for Jeopardy
Build a platform that is fluent in all forms of data and analytics. It’s really about realizing it; it’s about investing in this big data and analytics platform, to build out against a master plan that will eventually accommodate all types of data, any type of analytics and really drive towards a full range of business outcomes. After all data requires analytics to make sense of it and analytics requires data in order to fuel it, so these things are really quite tightly tied together.
IBM’s strategy around big data and analytics is to make sure that we have a portfolio of hardware, software, storage, and services to address the customer’s needs. Clients don’t want to talk just about storage or just about servers; they want to talk about a business challenge they have, and they often want to solve it with a single company that understands the entire end-to-end solution and can deliver it in a way that helps them execute within their cost constraints. That is our goal.
If you read this chart from left to right, we start with all of this data, both structured and unstructured. As we bring it in, some of it is stored in unstructured format: it can be either data in motion, which is Streams, or data at rest, which would be Hadoop. Or we can put it into a structured database, and that’s where IBM DB2 with BLU Acceleration plays. Then there is a whole line of innovative analytics solutions, everything from our cognitive solutions around Watson to Cognos, SPSS, and our industry solutions. We want to be able to take these solutions and drive them into key business processes. The really nice thing is that IBM can offer all of this on a single platform, IBM Power Systems.
Deliver insights quickly:
Access all data and make better decisions with a unified view of information across all sources
Optimized for the unique demands of Big Data applications built with Spark and Hadoop
Deploy big data technologies with confidence:
An economical entry point
Superior price/performance, with 2.3X BETTER performance per dollar spent
A platform that can scale with your needs
UNSTRUCTURED: Hadoop, CAPI-enabled Flash, NoSQL - Derive actionable insight using industry cost-efficient solutions.
Solutions such as IBM Data Engine for Analytics, IBM Solution for Hadoop, IBM Data Engine for NoSQL, InfoSphere
Up to 3X lower TCA
IN-MEMORY: Get faster in-memory performance with leading database providers. Two of the three support Linux on Power Little Endian (Oracle does not)
DB2 BLU Acceleration, Oracle Database 12, SAP HANA*
56% more query results per hour
STRUCTURED: Compute tremendous amounts of data rapidly and support multiple databases
DB2, Oracle, EnterpriseDB, MariaDB and other industry databases, Cognos, SPSS
82X faster insights
Many are now discovering that there are advantages in taking a shared storage approach for Hadoop, and leading companies like IBM are challenging the common assumptions of how best to house data in the big data space.
Key Message: The IDEA solution has all the key elements needed to support BDA deployments with outstanding performance, all in a pre-integrated package, and it can grow as clients’ needs change. 1. Innovative design 2. SDI – Spectrum Scale and PS.
The traditional databases and grid computing technologies they had in-house would not scale.
Hadoop stores everything in a schema-less structure.
A centralized Hadoop implementation spans every system across the entire company to break down data silos and provide a single, comprehensive view of all its data.
4x Threads = 8 threads /core vs 2 threads / core = 4x
4X Memory Bandwidth = 410 GB/s vs 102 GB/s = 4x
5X Cache =
For POWER8: L1 = 96 KB, L2 = 512 KB, L3 = 96 MB and L4 = 128 MB
For Haswell-EX: L1 = 64 KB, L2 = 256 KB, L3 = 45 MB
Total in MB:
For POWER8 = .096 + .512 + 96 + 128 = 224.608 MB
For Haswell-EX = 0.064 + 0.256 + 45 = 45.32 MB
Ratio = 4.96
Clock Rates (09/29/15)
Haswell EX slowest rate = 1.9 GHz
Haswell EX fastest rate = 3.2 GHz
POWER8 (E880) fastest rate = 4.35GHz
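The ratios in these notes can be reproduced directly from the quoted figures. The sketch below recomputes the thread, memory bandwidth, cache and clock-rate ratios. All input values come from the notes above, and the variable and function names are illustrative.

```python
def ratio(power8, haswell_ex):
    """How many times larger the POWER8 figure is than the Haswell-EX figure."""
    return power8 / haswell_ex


# Threads per core and memory bandwidth (GB/s), as quoted in the notes.
threads = ratio(8, 2)
bandwidth = ratio(410, 102)

# Cache totals in MB: L1 + L2 + L3 (+ L4 on POWER8), as quoted in the notes.
power8_cache_mb = 0.096 + 0.512 + 96 + 128
haswell_cache_mb = 0.064 + 0.256 + 45
cache = ratio(power8_cache_mb, haswell_cache_mb)

# Clock rates in GHz: POWER8 (E880) fastest vs. Haswell-EX fastest and slowest.
clock_low = ratio(4.35, 3.2)
clock_high = ratio(4.35, 1.9)

print(f"Threads per core: {threads:.1f}x")
print(f"Memory bandwidth: {bandwidth:.2f}x")
print(f"Cache: {power8_cache_mb:.3f} MB vs {haswell_cache_mb:.2f} MB = {cache:.2f}x")
print(f"Clock frequency: {clock_low:.1f}x to {clock_high:.1f}x")
```

The clock-rate range matches the 1.4 – 2.3X figure shown on the slide.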
Key Message: The Elastic Storage Server provides a consolidated storage solution that can store a wide variety of data types supporting a range of applications with standard access methods. One shared Data Lake allows for the same data to be shared across different application domains and global locations while reducing the need for data movement and copies, thus saving significant costs for storage, floor space and administration.
Data Lake: A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
Extremely competitive $/TB with starter configurations over 200TB raw for around $100K
Key Message: Leading performance! Enables faster insights on less infrastructure.