Enabling Analytics as a Service (AaaS): The Key Analytical Platforms and Workloads on IBM SoftLayer Cloud
Abstract
In the recent past, two prominent and dominant trends have gripped the IT industry. The first is accelerated IT infrastructure optimization, sponsored and supported primarily by proven and promising cloud technologies. The second is the challengingly massive amount of data being generated, collected, and subjected to a variety of investigations in order to extract actionable insights in time, to enable correct and timely decision-making by business executives with confidence and clarity, and to empower knowledge workers to be greatly efficient in their tasks. Established product vendors and researchers from academic institutions across the world are working in fast track and in grand unison to collaboratively conceive and concretize a bevy of service assemblage and delivery platforms (SDPs); data virtualization, ingestion, analytics, and visualization platforms; and application enablement platforms (AEPs) in order to speed up and simplify knowledge extraction and engineering from a variety of data heaps via real-time as well as batch processing. Towards knowledge discovery and dissemination, design and architectural patterns, highly synchronized processes, evaluation metrics, key guidelines, best practices, etc. are being unearthed and sustained by data management professionals. We have performed a variety of proofs of concept and pilots in the fast-growing big and fast data analytics domains, and based on the experience and expertise gained, we have produced a repository of reusable assets to be shared. In this paper, we illustrate how trendy and transformative data analytics is being exposed and delivered as a service via IBM SoftLayer Cloud for worldwide users in an affordable, amenable, and accelerated fashion.
Introduction
Several disruptive things are happening in parallel in the IT field. The device ecosystem is seeing unprecedented growth towards billions of connected devices; the number of implantables, wearables, portables, cyber-physical systems (CPS), etc. is zooming ahead; business-critical operational, transactional, and analytical systems are becoming pervasive; social sites are being embraced with greater alacrity by people across the world; the digitization idea is being pursued vigorously as never before, resulting in trillions of digitized entities / smart objects / sentient materials; and scores of powerful scientific and technical experiments are being accomplished.
Traditionally, business data has been the main source for analytics to squeeze out business insights. Today the data size is massive; data scope, speed, and structure are varying sharply; and the resulting data value for any individual, innovator, and institution will be decisive if all kinds of collected data are crunched cognitively. Having understood the strategic significance of data-driven insights, two grand disciplines (big and fast data analytics) have emerged for deeper research and study. These days there are plenty of enabling technologies, platforms, and tools from worldwide product vendors for accelerating big and fast data analytics in a simplified and streamlined fashion. Precisely speaking, there is an insistence on crafting and composing insights-filled knowledge services towards enhanced care, choice, comfort, and convenience for people. In the ensuing sections, we would like to throw some light on the two principal technologies enabling the smooth and sagacious realization of analytics as a service (AaaS). We write about the analytics platforms and workloads that were modernized, migrated, and deployed in IBM SoftLayer Cloud to envisage and enable the ultimate aim of accomplishing analytics as a service.
The World of Big and Fast Data Analytics
Big data analytics is now moving beyond the realm of intellectual curiosity to make tangible and trendsetting impacts on business operations, offerings, and outlooks. It is no longer a hype or a buzzword and is all set to become a core and central tenet for every sort of business enterprise seeking to be extremely relevant and rightful to its stakeholders and end-users. Big data analytics is a generic and horizontally applicable idea to be leveraged across all kinds of business domains, and hence it is poised to become a trendsetter for worldwide businesses to march ahead with clarity and confidence. Real-time analytics is the hot requirement today, and everyone is working on fulfilling this critical need. The emerging use cases include the use of real-time data, such as sensor data, to detect any abnormalities in plant and machinery, and batch processing of sensor data collected over a period to conduct root cause and failure analysis of plant and machinery.
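As a minimal illustration of the real-time half of that use case, a rolling-statistics check can flag a sensor reading that strays too far from the recent baseline. This is a hedged stdlib sketch; the window size and the three-sigma threshold are arbitrary assumptions, not values from any platform discussed here.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag readings deviating from the rolling mean by more than
    `threshold` standard deviations; flagged values are kept out
    of the baseline window."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) >= 3:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((i, value))
                continue  # do not let the spike pollute the baseline
        history.append(value)
    return anomalies

# Steady vibration readings with one abnormal spike at index 6.
readings = [10.1, 10.2, 9.9, 10.0, 10.1, 9.8, 25.0, 10.0]
print(detect_anomalies(readings))  # [(6, 25.0)]
```

The same readings, archived and re-processed in batch later, would feed the root cause and failure analysis mentioned above.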
Public Clouds for Big and Real-time Data Analytics - Most traditional data warehousing and business intelligence (BI) projects to date have involved collecting, cleansing, and analyzing data extracted from on-premises business-critical systems. However, this age-old practice is about to change forever, even though for the foreseeable future it is unlikely that many organizations will move their mission-critical systems or data (customer, confidential, and corporate) to public cloud environments for analysis. Businesses are steadily adopting the cloud idea for operational and transactional purposes. Packaged and cloud-native applications are primarily found fit for clouds, and they are performing exceedingly well in their new residences. The biggest potential for cloud computing is the affordable and adept processing of data that already exists in cloud centers. All sorts of functional web sites, applications, and services are bound to be cloud-based sooner rather than later. The positioning of clouds as the converged, heavily optimized and automated, dedicated and shared, virtualized and software-defined environment for IT infrastructures (servers, storage, and networking), business infrastructure, and management software solutions and applications is strengthening fast. Therefore, every kind of physical asset is being seamlessly integrated with cloud-based services in order to be smart in its behavioral aspects. That is, ground-level sensors and actuators are increasingly tied up with cloud-based software to be distinct in their operations and outputs. All these developments clearly foretell that future data analytics is to flourish fluently in clouds.
These days, public clouds natively provide all kinds of big data analytics tools and platforms on their infrastructures in order to speed up the most promising data analytics at a blazing speed and an affordable cost. WAN optimization technologies are maturing fast to substantially reduce network latency while transmitting huge amounts of data from one system to another among geographically distributed clouds. Federated, open, connected, and interoperable cloud schemes are fast capturing the attention of the concerned, and hence we can see the concept of the inter-cloud getting realized soon through open, industry-strength standards and deeper automation. With the continued adoption and articulation of new capabilities and competencies such as software-defined compute, storage, and networking, cloud-based data analytics is set to grow immensely. In short, clouds are being positioned as the core, central, and cognitive environment for all kinds of complex tasks.
Hybrid Clouds for Specific Cases - It is anticipated that in the years to unfold, the value of hybrid clouds will climb sharply, as a mixed and multi-site IT environment is more appropriate for most emerging scenarios. For the analytics space, a viable and venerable hybrid cloud use case is to filter out sensitive information from data sets shortly after capture and then leverage the public cloud to perform any complex analytics on them. For example, when analyzing terabytes' worth of medical data to identify reliable healthcare patterns that predict susceptibility towards a particular disease, the identity details of patients are not relevant. In this case, a simple filter can scrape names, addresses, social security numbers, etc. before pushing the anonymized set to secure cloud data storage.
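A minimal filtering step of this kind can be sketched in plain Python. The field names and the SSN regex below are illustrative assumptions; a production scrubber would need far more robust identification of personal data before anything is pushed to the public cloud.

```python
import re

# Fields treated as identifying. Which fields qualify is a policy
# decision; this set is only an illustrative assumption.
IDENTITY_FIELDS = {"name", "address", "ssn"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def anonymize(record):
    """Drop identity fields and mask SSN-shaped strings in the
    remaining values, before the record leaves the premises."""
    clean = {}
    for field, value in record.items():
        if field in IDENTITY_FIELDS:
            continue
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED]", value)
        clean[field] = value
    return clean

patient = {"name": "John Doe", "address": "12 Main St",
           "ssn": "123-45-6789", "age": 54,
           "notes": "SSN 123-45-6789 on file; BP elevated"}
print(anonymize(patient))
# {'age': 54, 'notes': 'SSN [REDACTED] on file; BP elevated'}
```

Only the anonymized dictionary would then be transmitted to cloud storage for the complex analytics.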
All kinds of software systems are steadily being modernized and moved to cloud environments, especially public clouds, to be subscribed to and used as a service over the public web. The other noteworthy factor is that a variety of social sites capturing and captivating different segments of people across the world are emerging and joining mainstream computing. We therefore hear about, read about, and even use the social media, networking, and computing aspects. One statistic says that the widely used Facebook pours out at least 8 terabytes of data every day. Similarly, other social sites produce large amounts of personal, social, and professional data, apart from musings, blogs, opinions, feedback, reviews, multimedia files, comments, compliments, complaints, advertisements, and other articulations. These poly-structured data play a bigger role in shaping up the data analytics domain.
The other valuable trends include the movement of enterprise-class operational, transactional, commercial, and analytical systems to public clouds. We all know that www.salesforce.com is the founding public cloud providing CRM as a service. Thus a large share of enterprise data originates in public clouds. With public clouds projected to grow fast, cloud data is being presented as another viable and venerable opportunity for cloud-based data analytics.
The Contemporary Analytics in Hybrid Clouds
Apart from traditional business analytics, the above-mentioned trends call for newer kinds of analytics leveraging big and real-time data. There are both domain-specific and domain-agnostic analytics categories. For example, the justifications for predictive and prescriptive analytics, and for operational, security, and performance analytics, are increasingly being expounded with the purposeful emergence of different and distributed data sources. Every industry vertical has its own big data analytics. With different data velocities, real-time / streaming analytics is bound to be mandatory. There are a few vital parameters to determine the appropriateness of cloud environments for powerful data analytics:
The Data Volume and Velocity
The Impacts on Compute, Storage and Network Resources
The Sensitivity of Data and Regulatory / Compliance Requirements
The Scope of Analytics
The Types of the Environments
Why the Next-Generation Data Analytics Applications and Platforms in Cloud Environments?
Cloud-based data analytics has been picking up fast in order to reap all the originally envisaged benefits of the cloud paradigm. Here is a list of the key benefits to be accrued from the cloud embarkation strategy and journey.
Agility & Affordability - No capital investment in large-scale IT infrastructure; just use and pay
Big & Fast Data Platforms - Deploying and using any kind of big data platform (generic or specific, open source or commercial-grade, etc.) for analytics is quick and easy
End-to-end Hadoop Platforms - Data virtualization, ingestion, processing, mining, analytics, and information visualization tasks are performed by these platforms
Data Management Systems - Parallel, clustered, and distributed SQL databases, as well as NoSQL and NewSQL databases, are made available in clouds
Data Warehouse Systems - Recently, data warehouse as a service (DWaaS) capabilities are being realized
Social Sites, Mobile Application Stores, etc. - The popular social media and network applications are being run on public clouds
WAN Optimization Technologies - There are WAN optimization products and platforms for efficiently transmitting data over the Internet infrastructure
Business Applications in Clouds - Along with enterprise information systems (EISs), business-critical packaged applications such as ERP, CRM, SCM, KM, etc. are also getting deployed in clouds
Cloud Integrators, Brokers & Orchestrators - There are products and platforms for seamless interoperability among different and distributed systems, services, and data
Operational, Transactional and Analytical Systems - These are modernized, migrated, and hosted in clouds
Device / Sensor / Machine Integration - Ground-level devices are integrated with cloud-native as well as cloud-enabled applications, services, and data
Cloud-based Analytical Platforms
We have performed a number of proof of concepts (PoCs) in order to gain the deeper understanding of cloud-
based big and fast data analytics. The following sections are to depict the various platforms, databases, and
tools which are made to run in IBM SoftLayer Cloud for simplifying and streamlining the provision of analytics
as a service to worldwide clients and customers.
Big Data Analytics Platforms in IBM SoftLayer Cloud
Increasingly, individuals, innovators, and institutions are taking advantage of the agility and cost efficiencies that cloud infrastructures provide. Several other advantages are carefully associated with the cloudification of enterprise IT infrastructures. As we all know, Hadoop is the prime method to proceed with confidence. The maturity and stability of Hadoop-compliant data analytics platforms are pushing companies towards big data analytics. As enunciated earlier, the cloud infrastructure is being positioned as the most appropriate one for big data analytics. There are also several open source as well as commercial-grade implementations of Hadoop in the market, notably Cloudera, Hortonworks, and MapR. IBM InfoSphere BigInsights is the most favored and full-fledged commercial implementation with Apache Hadoop as the base.
Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH (Cloudera's Distribution including Apache Hadoop), the world's most popular open source Hadoop-based platform, as well as advanced system management and data management tools. Cloudera Enterprise includes Cloudera Manager to help you easily deploy, manage, monitor, and diagnose issues with your cluster; Cloudera Manager is critical for operating clusters at scale. Cloud environments are becoming increasingly popular for critical Apache Hadoop workloads, given their flexibility and elasticity. With Cloudera Director, you can unlock the full potential of Hadoop in the cloud without compromise. The CDH reference architecture is given below.
SoftLayer Cloud not only provides potentially unlimited resources for your high-performance computing cluster, but also makes it easy to manage with Cloudera Managed Hadoop. Similarly, we have deployed the Hortonworks and MapR Hadoop platforms in SoftLayer Cloud. A typical cloud-based solution comprises storage, processing, and management components deployed on SoftLayer Cloud, an extensible, elegant, efficient, and elastic environment for processing your data. The other benefits include extreme flexibility, high performance, agility, and pay-per-usage pricing that obliterates upfront costs.
IBM InfoSphere BigInsights is also made available on SoftLayer Cloud, and this move brings the following benefits to the table.
Accelerates and simplifies cluster deployment – Take advantage of big data analytics without the need for an on-premises infrastructure.
Scales as your business demands – Keep infrastructure costs in line with the changing needs of the
business.
Provides advanced tools to reduce time to value – Gain value from Big SQL, Big Sheets, text
analytics and more.
Optimizes performance and enhances security – Experience speed and reliability with a dedicated
bare-metal infrastructure.
Offers expertise and best practices – Benefit from a dedicated cloud operations team that deploys
clusters based on best practices.
Thus Hadoop-based platforms are being steadily taken to cloud environments in order to deliver big data
analytics with nimbleness and suppleness.
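The batch side of such Hadoop platforms is typified by MapReduce. As a hedged, self-contained sketch, the canonical word count can be written as plain Python functions emulating the mapper, shuffle, and reducer roles that Hadoop Streaming would normally run as separate scripts across the cluster:

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum counts per word. Hadoop delivers pairs
    grouped by key; sorting here emulates the shuffle step."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

lines = ["big data analytics", "fast data analytics"]
print(dict(reducer(mapper(lines))))
# {'analytics': 2, 'big': 1, 'data': 2, 'fast': 1}
```

On a real cluster, the framework distributes the map and reduce phases across nodes; the per-record logic stays this simple.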
Real-time Analytics Platforms in IBM SoftLayer Cloud
Not only big data analytics but also real-time analytics on fast and streaming data is also comfortably
accomplished in cloud environments. In this section, we would like to explain how a couple of platforms that
were methodically modernized and migrated to IBM SoftLayer cloud center in order to understand the
concerns, challenges and changes associated with cloud-based real-time analytics.
Delivering Real-time Applications via SoftLayer Cloud-based VoltDB - With the data being generated and captured growing to unprecedented volumes, traditional data analytics platforms and infrastructures are bound to face a variety of constraints. That means we need robust and resilient algorithms and IT solutions for big and fast data. Several product vendors, having realized the brewing challenges, are proactively bringing forth a bevy of big data analytics systems that facilitate the smooth and methodical transition of captured and consolidated data to information and then to knowledge.
Data virtualization, databases, warehouses, data marts and cubes, and business intelligence (BI) and visualization solutions are very critical for powering the goals of knowledge extraction and engineering, so as to realize a growing family of smarter systems and services fulfilling the ingenious ideas and ideals of the smarter planet vision. VoltDB is a high-performance and scalable relational database management system (RDBMS) for big data, high-velocity OLTP, and real-time analytics. VoltDB, proclaimed as a kind of NewSQL database, is a blazingly fast DB designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale.
VoltDB provides
Database throughput reaching millions of operations per second
On-demand scaling
High availability, fault tolerance, and database durability
Real-time data analytics
VoltDB is deployed in SoftLayer Cloud in order to showcase its real-time and real-world capabilities of
producing actionable insights.
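VoltDB's way of delivering ACID without locking is to partition the data and execute each partition's transactions serially on a dedicated thread. The toy sketch below illustrates only that routing-and-serial-execution idea; the partition count, routing function, and API are our own illustrative assumptions, not VoltDB's.

```python
# Toy sketch of partitioned, serial transaction execution:
# each partition owns a slice of the data and applies its queued
# transactions one at a time, so no locks are needed.
PARTITIONS = 4

class Partition:
    def __init__(self):
        self.store = {}
        self.queue = []

    def run_all(self):
        for txn in self.queue:  # serial execution per partition
            txn(self.store)
        self.queue.clear()

def route(key):
    """Hash a key to the partition that owns it."""
    return hash(key) % PARTITIONS

partitions = [Partition() for _ in range(PARTITIONS)]

def submit(key, txn):
    partitions[route(key)].queue.append(txn)

def increment(key):
    def txn(store):
        store[key] = store.get(key, 0) + 1
    return txn

for _ in range(3):
    submit("clicks", increment("clicks"))
for p in partitions:
    p.run_all()
print(partitions[route("clicks")].store)  # {'clicks': 3}
```

Because every key maps to exactly one partition and that partition runs single-threaded, the increments above are serialized without any locking.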
Apache Storm on IBM SoftLayer Cloud for Real-time Analytics
Not only the data size and structure but also the data speed matters much these days. Specific use cases insisting on fast data analytics are emerging across industry verticals. Data are being massaged, encapsulated, and delivered as messages. Data and event messages are emerging as the formalized building blocks to be received, opened up, parsed, and used for a variety of deeper and decisive analyses. There are data streams (multimedia) and events from newer data sources such as sensors, machines, operational systems, and platforms, and they need to be systematically captured and analyzed immediately in order to extract both tactically and strategically sound insights that empower decision-makers, and even systems, to ponder the next course of action with confidence and clarity. While clouds are being positioned as the core and optimized IT infrastructure, there are several open source as well as commercial-grade platforms for accomplishing and automating the process of real-time and streaming analytics and its associated tasks.
Apache Storm, one such real-time analytics platform, is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queuing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. We have deployed an instance of Apache Storm in the IBM SoftLayer cloud and chosen a small use case in order to understand and enunciate how cloud-based Storm functions and delivers its originally envisaged goals.
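The dataflow of such a topology can be mimicked with chained Python generators: a spout stage emitting sentences and bolt stages splitting and counting words. This is only a stdlib sketch of the spout/bolt dataflow, not the Storm API; in Storm proper these stages would be spout and bolt classes wired together with a TopologyBuilder and run across the cluster.

```python
def sentence_spout():
    """Spout: the stream source, emitting sentence tuples
    (a finite sample here; a real spout reads a queue forever)."""
    yield from ["the cow jumped", "the moon rose"]

def split_bolt(stream):
    """Bolt: splits each sentence tuple into word tuples."""
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    """Bolt: keeps running counts, emitting an updated
    (word, count) tuple for every word that arrives."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

# Wire the topology: spout -> split bolt -> count bolt.
for word, count in count_bolt(split_bolt(sentence_spout())):
    print(word, count)
```

Note the contrast with batch processing: the count bolt emits an updated result per incoming tuple rather than a final tally at the end.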
High-Performance Big Data Analytics in SoftLayer Cloud
Everyone agrees that the high-performance characteristic is being insisted upon everywhere these days. Valid concerns are expressed in different quarters that cloud environments do not guarantee high performance. Therefore, hosting high-performing platforms on clouds is being touted as one of the viable mechanisms for ensuring the high performance of cloud-hosted services and workloads.
Big data analytics (BDA) is emerging as a data-intensive activity mandating high-end IT infrastructures and integrated platforms to simplify and streamline the tasks typically associated with any data analytics. There are several viable options these days, ranging from mainframes, clusters, grids, and appliances to supercomputers, to accelerate and accomplish data analytics efficiently. Hadoop platforms are the most sought-after for enabling cost-effective analysis of multi-structured data mountains. In short, high-performance computing (HPC) is the most appropriate computing model with which to approach the infrastructural challenges thrown up by BDA. In this paper, we have described how the Netezza software solution can be systematically moved to IBM SoftLayer Cloud, the leading public cloud offering, configured there, and used for accomplishing next-generation real-time analytics with a low total cost of ownership (TCO) and a high return on investment (RoI). In our PoC-induced asset document, we have given all the right and relevant details of a sample application in order to accentuate the power of cloud-based Netezza in fulfilling the various requirements of high-performance data analytics.
Streaming Analytics in IBM SoftLayer Cloud
Stream computing continuously integrates and analyzes data in motion to deliver real-time analytics. It further enables organizations to detect insights (risks and opportunities) in high-velocity data that can only be detected and acted on at a moment's notice. High-velocity flows of data from real-time sources such as market data, machines, smartphones, sensors and actuators, clickstreams, and even transactions remain largely unnavigated. IBM Cloud Analytics Application Services delivers high-performance clusters for running enterprise-grade big data and analytics workloads on a dedicated bare-metal infrastructure, pre-installed with industry-leading big data software for real-time analytic processing: store less, analyze more, and make better decisions faster. IBM InfoSphere Streams is the supported software for this Cloud Analytics service. IBM InfoSphere Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second.
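A core primitive behind such analysis of data in motion is continuous windowed aggregation: each arriving event immediately updates a statistic over the most recent events. The sketch below is a plain-Python illustration of a sliding-window average, not InfoSphere Streams code (which is written in its SPL language); the window size is an arbitrary assumption.

```python
from collections import deque

class WindowAverage:
    """Sliding-window average over the last `size` events, the kind
    of continuous aggregation a streaming engine applies to data
    in motion (the window size here is an illustrative choice)."""
    def __init__(self, size=3):
        self.window = deque(maxlen=size)

    def update(self, value):
        """Ingest one event and emit the current window average."""
        self.window.append(value)
        return sum(self.window) / len(self.window)

ticker = WindowAverage(size=3)
for price in [100.0, 102.0, 104.0, 98.0]:
    print(round(ticker.update(price), 2))
```

Each event produces an output immediately, which is the defining contrast with the store-then-analyze batch model.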
Many organizations need to process a large amount of data in real-time for real-time analytics, real-time ETL, or to respond to events instantaneously. Analyzing big data streams on the fly is emerging as a distinct need for many industry verticals these days. We have deployed DataTorrent in IBM SoftLayer Cloud and verified how it delivers on its promises for big data streaming analytics. DataTorrent is an enterprise-grade software platform that enables businesses to perform any sort of data processing or transformation on structured or unstructured data, all in real-time as the data is streamed into a data center. Leveraging Hadoop 2.0, DataTorrent is a YARN-native application platform. It can be installed directly onto an existing Hadoop cluster, connect directly to all incoming data sources live, and perform any type of processing or transformation of your data in-memory, as it comes streaming in. DataTorrent handles all of the scaling and fault tolerance of the system, leaving enterprises to focus on just their business logic.
DataTorrent supports today's most demanding, mission-critical, big-data streaming applications. It enables you to quickly develop applications that ingest massive amounts of data from various sources and perform highly scalable computations, all in real-time. With DataTorrent, you can leverage your existing Hadoop environment for real-time stream processing. We employed a sample application in order to educate readers on how cloud-based real-time analytics applications can be implemented in a streamlined manner.
End-to-end Big Data Analytics Platform in IBM SoftLayer Cloud
In general, Hadoop platforms do pre-processing, processing, and analytics for knowledge discovery. But an end-to-end big data analytics platform involves data collection, virtualization, ingestion, analytics, and visualization modules, and with just a single click, everything gets accomplished quickly and securely. Datameer is one such platform.
Datameer is an end-to-end big data analytics platform purpose-built for Hadoop that enables the fastest time from raw data to new insights. Its mission is to eliminate the complexity of the tasks associated with big data analytics and to empower everyone to make data-driven decisions in minutes, not months. There is no need for a data scientist or for multiple technical tools to model, integrate, cleanse, prepare, analyze, and visualize your data. Datameer is the one-stop shop for getting all your data into Hadoop, analyzing that data, discovering the knowledge, and visualizing the insights squeezed out in a preferred form and format. Datameer can handle all kinds of data from multiple sources, as illustrated in the picture below. Datameer has been successfully installed in the IBM SoftLayer cloud environment and tested with a sample application in order to demonstrate its unique capability.
HBase, a NoSQL Database in IBM SoftLayer Cloud
HBase is a column-oriented database management system that runs on top of the Hadoop distributed file system (HDFS). HBase is a NoSQL database, is well suited for sparse data sets, and does not support a structured query language like SQL. An HBase system comprises a set of tables; each table must have an element defined as a primary key, and all access attempts to HBase tables must use this primary key. An HBase column represents an attribute of an object and allows many attributes to be grouped together into what are known as column families. With HBase, you must predefine the table schema and specify the column families. However, it is very flexible in that new columns can be added to families at any time, making the schema able to adapt to changing application requirements.
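The row-key / column-family / qualifier layout described above can be modeled as nested dictionaries. The sketch below is only a toy model of HBase's storage layout, not the HBase client API; the table and family names are made up for illustration.

```python
class SparseTable:
    """Dict-based model of HBase's layout: row key -> column
    family -> qualifier -> value. Families are fixed up front;
    qualifiers inside a family can appear at any time."""
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"undeclared column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        # Absent cells cost nothing, which is why sparse data fits well.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

# The schema declares only the families; columns are added freely later.
t = SparseTable(families=["info", "readings"])
t.put("sensor-42", "info", "site", "plant-A")
t.put("sensor-42", "readings", "temp", 71.3)
print(t.get("sensor-42", "readings", "temp"))  # 71.3
```

All access goes through the row key, mirroring the primary-key-only access path noted above.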
HBase is part and parcel of every standard Hadoop distribution and was installed in IBM SoftLayer Cloud. There are certain usage scenarios wherein big data analytics (BDA) is well accomplished with the help of a cloud-based HBase database. We developed a small application to test how productive HBase is in faraway clouds.
There are several other competent and high-end NoSQL databases in the marketplace. Facebook's Cassandra, Google's BigTable, etc. are some of the highly popular database management systems getting into cloud environments in order to tackle the data explosion and data variety, viscosity, and variability.
The Apache Cassandra database is the correct choice when you need scalability and high availability without
compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud
infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across
multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing
that you can survive regional outages.
Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching. Cassandra is also deployed in IBM SoftLayer Cloud. Basho Riak is another NoSQL database made available in the SoftLayer cloud. Similarly, other renowned databases such as MongoDB are also being taken to the cloud to reap its infrastructural innovations and inventions.
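Cassandra's multi-datacenter replication rests on a simple placement rule: each row is hashed by its partition key onto a ring of nodes, and replicas go to the following nodes on the ring. The stdlib sketch below illustrates that placement logic only; the node list, replication factor, and modulo ring are simplifying assumptions (real Cassandra uses token ranges and pluggable replication strategies).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 2  # illustrative assumption

def token(key):
    """Hash a partition key onto the ring (MD5, as Cassandra's
    RandomPartitioner historically did)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def replicas(key):
    """Primary node by token, plus the next ring neighbours."""
    start = token(key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas("user:1001"))
```

Because placement is a pure function of the key, any node can compute where a row lives without a central directory, which is what makes the design shared-nothing and fault-tolerant.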
ScaleBase Distributed Database Management System
ScaleBase brings elasticity, scalability, and continuous high availability to MySQL databases and applications in public, private, and hybrid cloud environments. ScaleBase enables instant and transparent MySQL scale-out, leveraging the power of smaller, less expensive servers working together. Policy-based data distribution (automated sharding), powered by the ScaleBase Analysis Genie, together with intelligent load balancing and replication-aware read/write splitting, enables growth of the operational load and throughput, increases application performance, and protects against varying usage peaks and load spikes.
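The combination of automated sharding and replication-aware read/write splitting can be pictured as a routing layer in front of MySQL. The following is a hypothetical sketch of such a router, not ScaleBase's actual policy engine; the shard names and the hash-based distribution policy are illustrative assumptions.

```python
import itertools

class ShardRouter:
    """Route by key hash to a shard; writes go to the shard's
    master, reads are round-robined over its read replicas."""
    def __init__(self, shards):
        # shards: list of dicts like {"master": ..., "replicas": [...]}
        self.shards = shards
        self.read_cycles = [itertools.cycle(s["replicas"]) for s in shards]

    def _shard_id(self, key):
        return hash(key) % len(self.shards)

    def route(self, key, is_write):
        i = self._shard_id(key)
        return self.shards[i]["master"] if is_write else next(self.read_cycles[i])

shards = [
    {"master": "db0-master", "replicas": ["db0-r1", "db0-r2"]},
    {"master": "db1-master", "replicas": ["db1-r1", "db1-r2"]},
]
router = ShardRouter(shards)
print(router.route("order:77", is_write=True))   # always a master
print(router.route("order:77", is_write=False))  # a replica of the same shard
```

The application keeps issuing ordinary queries; the routing layer decides which of the smaller servers actually serves each one, which is the transparency claimed above.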
ScaleBase's automated failover and failback ensure business continuity and protection from both unexpected and expected outages, and also simplify ongoing maintenance tasks, such as software and hardware upgrades, without impacting application or database availability. The ability to migrate an application from a hosted environment with a single growing database to a virtualized environment with smaller, more manageable data nodes gives companies agility, flexibility, and competitiveness. ScaleBase was purpose-built for cloud deployment; it can be run on private clouds and is available on public clouds. We have done the initial formalities in order to prepare and migrate the ScaleBase solution to the IBM SoftLayer public cloud, made the necessary configuration changes, and run a small sample application in order to check how ScaleBase functions in an online, off-premise, and on-demand cloud environment. This forms a major part of our strategy of empowering public cloud offerings to be high-performing, elastic, and exotic for data- and process-intensive applications.
Aerospike In-Memory NoSQL Database in IBM SoftLayer Cloud
Versatile in-memory computing, NoSQL and NewSQL databases, parallel file systems, etc. are the prominent IT solutions to be hosted and run elegantly in elastic clouds for fulfilling the varying needs of the big data world. Aerospike is an open source distributed NoSQL database optimized for in-memory and SSD-based indexing and data storage. Aerospike is a modern database built from the ground up to push the limits of flash storage, processors, and networks. It was designed to operate with predictable low latency at high throughput and with uncompromising reliability: both high availability and ACID guarantees. It greatly simplifies developers' workloads, as there is no need to incorporate logic for sharding or for cluster changes. The perpetual need to stop worrying about data loss or downtime is fulfilled by this game-changing database solution. Aerospike is ideal for real-time big data or context-driven applications that must sense and respond right now. Aerospike operates at in-memory speed and global scale with enterprise-grade reliability. Identical Aerospike servers scale out to form a shared-nothing cluster that transparently partitions data and parallelizes processing across nodes. Since nodes in the cluster are identical, you can start with two and just add more hardware; the cluster scales linearly.
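Aerospike's transparent partitioning works by hashing every record key into a fixed set of partitions (4096 in the actual product) and spreading those partitions across the identical nodes, so a key's partition never changes even as nodes join. A simplified stdlib sketch of that idea, with a smaller partition count and a round-robin assignment rule as illustrative assumptions:

```python
import hashlib

N_PARTITIONS = 16  # Aerospike itself uses 4096; smaller here for clarity

def partition_id(key):
    """Hash a record key to a fixed partition; this mapping is
    stable no matter how the cluster membership changes."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % N_PARTITIONS

def partition_map(nodes):
    """Spread the fixed partitions evenly across identical nodes."""
    return {p: nodes[p % len(nodes)] for p in range(N_PARTITIONS)}

two_nodes = partition_map(["A", "B"])
three_nodes = partition_map(["A", "B", "C"])
key = "device:9"
pid = partition_id(key)
# Same partition either way; only the owning node may change.
print(pid, two_nodes[pid], three_nodes[pid])
```

Adding a node only reassigns some partitions to the newcomer, which is why no application-side sharding logic is needed.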
We have migrated an instance of the Aerospike database to the IBM SoftLayer cloud environment and
configured it to deliver on its promises. We have worked on a sample application in order to gain a deeper
understanding of the distinct capabilities of Aerospike in meeting the goals of new-generation data-intensive
workloads.
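The transparent partitioning described above can be illustrated with a small sketch. This is not Aerospike's actual algorithm (which uses its own hash into a fixed set of 4,096 partitions plus a cluster-maintained partition map); it is a minimal, self-contained illustration of why clients need no sharding logic of their own: every client derives the same key-to-node mapping independently.

```python
import hashlib

# Minimal sketch of hash-based partitioning in a shared-nothing
# cluster. Aerospike's real scheme differs in detail; the point is
# that every client computes the same key-to-node mapping, so no
# external sharding logic or coordinator is needed.

NUM_PARTITIONS = 4096

def partition_for(key: str) -> int:
    """Map a record key to one of a fixed set of partitions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % NUM_PARTITIONS

def node_for(key: str, nodes: list) -> str:
    """Route a key to a node by assigning partitions round-robin."""
    return nodes[partition_for(key) % len(nodes)]

nodes = ["node-a", "node-b"]
# Two independent clients compute identical routing.
assert node_for("user:42", nodes) == node_for("user:42", nodes)

# Adding a node re-routes only some keys; the cluster itself
# rebalances the affected partitions.
nodes.append("node-c")
print(node_for("user:42", nodes))
```

Because the mapping is a pure function of the key and the cluster membership, scaling out is a matter of adding identical nodes, exactly the linear-scaling behaviour claimed above.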
NewSQL Databases in IBM SoftLayer Cloud
Essentially, NewSQL combines the best features from both worlds – maintaining the transactional integrity of
traditional database systems while providing high-end scalable performance of NoSQL systems. This
combination of performance and scale is crucial in transaction-intensive environments. NoSQL-based data
systems are riding a seismic wave of success with the promise of scalability. NewSQL databases seek to overtake
NoSQL with the added bonus of high-speed transactional integrity.
VoltDB is a NewSQL database; we have successfully deployed it in the IBM SoftLayer Cloud and subjected it
to a variety of small-scale tests to verify that it delivers on its stated capabilities. Other popular NewSQL
databases, such as Clustrix and NuoDB, are fast gaining market and mind share, and these too can be
conveniently hosted and delivered as a service via cloud environments.
Database as a Service (DBaaS)
Today’s applications are expected to manage a variety of structured and unstructured data, accessed by massive
networks of users, devices, and business locations, or even sensors, vehicles and Internet-enabled goods.
Companies of all sizes, from startups to mega-users like Samsung, Hothead Games, and Fidelity Investments,
use Cloudant to manage data for large or fast-growing web and mobile applications in ecommerce, online
education, gaming, financial services, and other industries.
Cloudant is best suited for applications that need a database to handle a massively concurrent mix of low-
latency reads and writes. Its data replication & synchronization technology also enables continuous data
availability, as well as off-line application usage for mobile or remote users. In a large organization, it can take
several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation
and agility. DBaaS enables instant provisioning of your data layer, so that you can begin new development
whenever you need to.
Unlike Do-It-Yourself (DIY) databases, DBaaS solutions like Cloudant provide, and guarantee, a specific
level of data layer performance and uptime. This eliminates the risk of service delivery failure for you and your
project. The Cloudant database as a service (DBaaS) is the first data management platform to leverage the
availability, elasticity, and reach of the cloud to create a global data delivery network (DDN) that enables
applications to scale larger and remain available to users wherever they are.
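The replication and synchronization capability mentioned above rests on CouchDB-style document revisions, which Cloudant inherits. The toy store below is an illustration, not the Cloudant API (real access goes over its CouchDB-compatible HTTP interface); it sketches how a `_rev` token lets concurrent or offline writers detect conflicts before overwriting each other's changes:

```python
import uuid

class DocStore:
    """Toy in-memory document store sketching CouchDB/Cloudant-style
    revision tracking (_rev), the mechanism behind safe replication
    and offline synchronization."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        if current and current["_rev"] != rev:
            # A writer holding a stale revision must fetch and merge first.
            raise ValueError("conflict: document was updated concurrently")
        new_rev = uuid.uuid4().hex
        self.docs[doc_id] = {"_rev": new_rev, **body}
        return new_rev

    def get(self, doc_id):
        return self.docs[doc_id]

store = DocStore()
rev1 = store.put("order:1", {"status": "new"})
rev2 = store.put("order:1", {"status": "paid"}, rev=rev1)
try:
    # An offline client reconnecting with the old revision is rejected
    # instead of silently clobbering the newer write.
    store.put("order:1", {"status": "shipped"}, rev=rev1)
except ValueError as e:
    print(e)  # prints: conflict: document was updated concurrently
```

In a real Cloudant deployment the same check happens on every replica, which is what allows mobile or remote users to work offline and reconcile later.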
Data Warehouse as a Service (DWaaS)
IBM dashDB is a fully managed data warehousing service in the cloud: a powerful, agile warehousing
solution that puts an analytics powerhouse at your fingertips. It frees you from the bonds of infrastructure
when your business demands it, extending your existing infrastructure into the cloud or helping you start
new self-service data warehousing capabilities. It is powered by high-performance in-memory and
in-database technology that delivers answers as fast as you can think. IBM dashDB provides the simplicity
of an appliance with the elasticity and agility of the cloud for an organization of any size, and it is designed
to meet your expectations of enterprise security. You can gain instant access to critical business insights
without a hefty upfront infrastructure investment: simply load, analyze, and visualize your data in minutes.
The days of providing the data warehouse as a service have truly arrived.
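The load-analyze cycle described above follows the ordinary SQL/DB-API pattern. The sketch below uses sqlite3 as a local stand-in, since connecting to an actual dashDB instance requires account-specific credentials and IBM's DB2-family driver; the shape of the workflow (load rows, then aggregate in-database rather than in the application tier) is the same:

```python
import sqlite3

# Local stand-in for a warehouse connection; against dashDB the same
# DB-API pattern applies through IBM's driver (connection details are
# account-specific and assumed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Load: bulk-insert the raw rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)

# Analyze: aggregate close to the data instead of pulling raw rows out.
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"):
    print(region, total)
# prints:
# east 150.0
# west 80.0
```

The in-database aggregation is the point: the warehouse does the heavy lifting, and only the small result set travels back to the user.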
IBM Watson Analytics in SoftLayer Cloud
As most of us know, Watson Analytics is a natural language-based cognitive service that can provide instant
access to predictive and visual analytics tools for businesses. It is designed to make advanced and predictive
analytics easy to acquire and use for anyone. Watson Analytics offers self-service analytics, including access to
easy-to-use data refinement and data warehousing services that make it easier for business users to acquire and
prepare data, going beyond simple spreadsheets for analysis and visualization. IBM Watson Analytics automates
steps like data preparation, predictive analysis, and visual storytelling for business professionals across
data-intensive disciplines like marketing, sales, operations, finance and human resources. SoftLayer is
integrating the latest IBM Power Systems into its cloud infrastructure in order to fulfill the infrastructural
needs of cost-effective high-performance computing. The IBM Watson system runs efficiently on IBM Power
Systems, and hence Watson Analytics as a service via the SoftLayer cloud for worldwide users is set to see
the light of day soon.
Containerized Analytics as a Service in IBM SoftLayer Cloud
The concept of containerization for stuffing and sandboxing mission-critical applications is catching the
attention of developers as well as system administrators. Bundling every kind of software module along with
its binaries, libraries, configuration details and other dependencies together into a single package is one grand
way out for the faster and error-free deployment and delivery of software workloads. This pragmatic idea has
penetrated deep, and these days all kinds of mobile, cloud, social, embedded, middleware, database, enterprise
and IoT applications are methodically being containerized using the sandbox aspect (a subtle and smart
isolation technique) to eliminate restrictive dependencies on underlying operating systems. Such
comprehensively sandboxed and contained applications are being prescribed as a sought-after solution for
meeting portability, extensibility, manoeuvrability, sustainability and security needs.
With the faster maturity of the Docker technology, there is a new paradigm of “containers as a service (CaaS)”
emerging and evolving. That is, containers are being readied, hosted and delivered as a service over the public
Web. All the necessary procedures to deliver application-aware containers as a service are being meticulously
enacted on containers to make them ready for the forthcoming service era. That is, knowledge-filled, service-
oriented, cloud-based, composable, and cognitive containers are being proclaimed as one of the principal
ingredients for the establishment and sustenance of the smarter planet vision. Precisely speaking, applications
are containerized and exposed as services to be discovered and used by a variety of consumers for a growing
set of use cases. Big and fast data analytics via Hadoop, Apache Storm, Spark, etc. are fast maturing and
stabilizing. VMs are widely used for enabling Hadoop as a service. Now, with the faster adoption of
containerization, the prospects are bright for data analytics via portable, substitutable, composable, and
replaceable containers, which are renowned for faster provisioning, live migration, etc. In short, containers
are destined for cloud environments.
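As a concrete illustration of the bundling described above, a container image for a hypothetical analytics worker might be declared as follows (the file names, base image, and command are assumptions for illustration, not taken from any particular deployment):

```dockerfile
# Hypothetical Dockerfile: bundles interpreter, libraries and
# configuration so the image runs the same on a laptop or in a
# SoftLayer instance.
FROM python:3-slim
WORKDIR /app

# Install the worker's library dependencies inside the image.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Ship the application code and its configuration together.
COPY worker.py config.yaml ./
CMD ["python", "worker.py", "--config", "config.yaml"]
```

Built once with `docker build -t analytics-worker .`, the same image can then be run unchanged on any host that provides a Docker runtime, which is precisely the portability argument made above.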
The integration of Hadoop YARN with Docker allows multiple clusters to utilize the same hardware
resources. We have taken YARN containers through the Dockerization steps and hosted them in the IBM
SoftLayer Cloud. We have done sample work in order to understand how containerized big data workloads
and analytical platforms ensure higher efficiency; thus the new offering of containerized analytics as a service
via the SoftLayer cloud seems imminent.
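To give a flavour of the wiring involved, a `yarn-site.xml` fragment for Hadoop's experimental Docker container executor is sketched below; the property names change across Hadoop versions, so treat them as assumptions to be verified against the release in use rather than a definitive configuration:

```xml
<!-- Sketch only: verify property names against your Hadoop release. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  <value>/usr/bin/docker</value>
</property>
```

With the NodeManagers launching tasks inside Docker containers, differently configured analytical stacks can share the same physical SoftLayer hardware without interfering with one another.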
Conclusion
Data has become a strategic asset for any organization these days, enabling it to plan ahead precisely and
proceed with utmost confidence and clarity. Data-driven enterprises are pronounced to be the ones ordained
for continued success, sagaciously overcoming all kinds of unexpected business challenges and changes. That
is, any enterprising endeavor that systematically subjects all of its data, gleaned from different and distributed
sources, to a series of IT-enabled deeper analytics processes, with the help of end-to-end platforms for
extracting actionable insights, is bound to attain and retain greater success in its long and arduous journey.
With the steady increase in data sources, it becomes imperative for organizations to strengthen their
capabilities to capture all the data emanating from different and distributed systems, subject it to a series of
deeper and decisive investigations to extract actionable insights in time, and disseminate the extracted and
extrapolated insights to the people concerned, enabling them to take the correct course of action to steer
their organizations forward. In this white paper, we have explained how IBM SoftLayer can take care of
everything needed to squeeze actionable insights out of your big and real-time data.
The concept of the cloud represents extremely optimized and organized IT, seamlessly enabling every kind of
IT capability and competency to be provided as a service via the open, public and cheap Internet
infrastructure to the increasingly connected world.
Authors
Pethuru Raj & Skylab Vanga
IBM Global CAMS Center of Excellence
IBM India, Manyata Tech Park, Bangalore
E-mails: pechelli@in.ibm.com,
skylab.vanga@in.ibm.com