1. Improving Intelligence Analysis Through Cloud Analytics
A Hybrid Cloud Approach to Enabling the Mission
Ray Hensberger
Lead Associate
Systems Development
This document is confidential and is intended solely for the use and information of the client to whom it is addressed.
2. Table of contents
• The Challenge
• Our Solution
• Cloud Analytic Techniques
• Results & Applications
• Helping Booz Allen’s Clients Be Ready for What’s Next
3. Data Volumes Outstrip Analysis Capabilities
Diagram: Client Applications, Dashboards, Web Applications, Rich Clients over a Service Oriented Architecture / Business Logic layer, fed by multiple Disparate Data Sources
• Complex data inputs
• Variety of formats
• Large volumes
• Riddled with noise
Client Mission Needs
• Data correlation
• Quick access to analytic results
• Ad-hoc query
• Advanced, scalable analytics
• Real-time alerting
5. Our Solution: A Hybrid Cloud Approach
• A U.S. Government client called on Booz Allen Hamilton to help
improve mission performance while leveraging existing infrastructure
• The client needed a secure, scalable, and automated solution that
would more quickly and precisely sift through growing mountains of
data, ensuring that analysts’ pipeline of prioritized, actionable
information would meet current and future needs
• Booz Allen worked closely with the client to adopt a data cloud
implementation by augmenting the legacy relational databases with
cloud computing and analytics
• With many existing systems and applications dependent on the legacy
relational database for transactional queries of data, Booz Allen pulled
together excess servers from the client’s infrastructure to build a hybrid
cloud solution
• The design focused on keeping transaction-based queries in the
current relational databases while doing the “heavy lifting” in the cloud,
outputting the interesting, processed, or desired analytic results into
relational data stores for quick transactional access
6. Hybrid Cloud Architecture
Diagram: Client Applications, Dashboards, Web Applications, Rich Clients over a Service Oriented Architecture / Business Logic layer, which issues Transactional Queries and draws on Cloud Advanced Analytics (Accumulo NoSQL database on HDFS and MapReduce), all fed by multiple Disparate Data Sources
8. Cloud Analytic Techniques
• Rather than focus on gaining IT efficiencies using cloud technology for
infrastructure, Booz Allen focused on applying cloud analytics and in-depth
understanding of client operational and mission needs to extract
more value faster from massive data sets
• The solution called for advanced analytics, specifically predictive
analytics to forecast potential events from existing data and anomaly
detection to extract potentially significant information and patterns
Diagram: Service Oriented Architecture / Business Logic over the Cloud layer, where Accumulo (NoSQL database) provides Content Normalization and Indexing, MapReduce serves as the Pre-computation Engine, and HDFS provides Scalable Ingest and Storage
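The “Scalable Ingest and Storage” layer rests on a simple idea: split incoming data into partitions and load them in parallel rather than one record at a time. The Python sketch below illustrates only that idea with a thread pool. `split_into_partitions`, `load_partition`, and `parallel_ingest` are hypothetical names, not part of HDFS or Accumulo, and the loader merely measures each batch where a real one would write to the cluster.

```python
# Sketch of partitioned, parallel ingest (hypothetical helpers; no real HDFS/Accumulo calls).
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split_into_partitions(records: List[str], num_partitions: int) -> List[List[str]]:
    """Deal records round-robin into num_partitions buckets, like splitting input into blocks."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def load_partition(partition_id: int, records: List[str]) -> Dict[str, int]:
    """Stand-in for writing one partition to distributed storage; returns a load report."""
    # A real loader would write to HDFS or Accumulo here; this sketch only measures the batch.
    return {
        "partition": partition_id,
        "records": len(records),
        "bytes": sum(len(r.encode("utf-8")) for r in records),
    }


def parallel_ingest(records: List[str], num_workers: int = 4) -> List[Dict[str, int]]:
    """Load every partition concurrently; Executor.map returns reports in partition order."""
    partitions = split_into_partitions(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(load_partition, range(num_workers), partitions))


if __name__ == "__main__":
    sample = [f"event-{n}" for n in range(10)]
    for report in parallel_ingest(sample, num_workers=3):
        print(report)
```

Threads are a reasonable stand-in when the per-partition work is dominated by I/O; on a real cluster the partitions would be spread across machines rather than threads in one process.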
9. Cloud Analytic Techniques
• Our approach leverages the core principles of Cloud Analytics that enable:
• Automated analysis techniques
• Pre-computation and aggressive indexing
• Answer previously unanswerable questions
• Create deep insight through fusion of different data types at scale
• Anomaly detection
Diagram: Service Oriented Architecture / Business Logic over the Cloud (Accumulo NoSQL database on HDFS and MapReduce), delivering data correlation, quick access to analytic results, ad-hoc query, advanced scalable analytics, and real-time alerting
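Pre-computation and aggressive indexing trade storage and batch time for fast answers: a frequently requested slice is computed once, and an index turns an ad-hoc query into set lookups instead of a full scan. The Python sketch below shows both on a toy event list. The record shape `(day, source, text)` and the helpers `precompute_counts`, `build_term_index`, and `events_with_all_terms` are illustrative assumptions, not the client system.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (day, source, free text).
Event = Tuple[str, str, str]


def precompute_counts(events: List[Event]) -> Counter:
    """Pre-compute a frequently requested slice: number of events per (day, source)."""
    return Counter((day, source) for day, source, _ in events)


def build_term_index(events: List[Event]) -> Dict[str, Set[int]]:
    """Map every lower-cased term to the positions of the events that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for event_id, (_, _, text) in enumerate(events):
        for term in text.lower().split():
            index[term].add(event_id)
    return index


def events_with_all_terms(index: Dict[str, Set[int]], terms: Iterable[str]) -> Set[int]:
    """Answer an ad-hoc query by intersecting posting sets instead of rescanning the events."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    events = [
        ("day-1", "sensor-a", "Login failure admin"),
        ("day-1", "sensor-b", "port scan detected"),
        ("day-2", "sensor-a", "login failure root"),
    ]
    counts = precompute_counts(events)
    index = build_term_index(events)
    print(counts[("day-1", "sensor-a")])                    # 1
    print(sorted(index["failure"]))                         # [0, 2]
    print(events_with_all_terms(index, ["login", "root"]))  # {2}
```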
11. A Scalable Solution with Significant Results
• The new cloud solution provided immediate and striking improvements
across the client’s increasing volume of structured and unstructured
data using aggressive indexing techniques, on-demand analytics, and
pre-computed results for common analytics
• The final product combined sophistication with scalability to move from
humans stitching together sparse bits of data to distilling real-time,
actionable information from the aggregation of data
• Storing large volumes of data in a data cloud provides the ability to
follow the lineage or pedigree of the data (how good the data really is),
allowing you to map cost against how valuable the data is and how well
it is being used
System | Data Ingest | Index Time | Query (large) | Query (small)
Legacy | 50 GB / day | Minutes | ~45 min | Seconds
Hybrid | 300+ GB / day | Seconds | < 4 min | Milliseconds
13. Helping the U.S. Government Be Ready for What’s Next
• The cloud analytics solution is now accessible throughout the client’s
organization and is part of a larger set of advanced analytic solutions
that Booz Allen is providing
• As the client’s needs change to adapt to the mission, the solution is
scalable and flexible to support future innovation and evolution without
reengineering
• The success of this project has led additional federal organizations
to express interest in adopting similar solutions for their environments
• The demand for cloud analytic solutions will only grow as the White
House Office of Management and Budget seeks to improve policy and
operational decisions, lower costs, and improve mission effectiveness
across the federal government
14. Learn More about our Cloud Analytic Capabilities
www.boozallen.com/analytics
Ray Hensberger
Lead Associate / Systems Development
hensberger_raymond@bah.com
Editor's Notes
I’m Ray Hensberger, Lead Associate at Booz Allen Hamilton with our Systems Development group. Before jumping into this webinar, it’s important to recognize that we are living in the greatest age of information discovery our civilization has ever known. In 2012, we will generate more data each *day* than we did from the invention of the computer through 2003 combined. With more than 5 billion mobile phones in use around the world and data rates growing about 40% each year, our world is increasingly measured, instrumented, monitored, and automated in ways that generate incredible amounts of rich and complex data. The ability to compete and win will be led by powerful analytics that draw insights and value from data, as well as information visualizations that influence decision making and consumer purchasing. Many of the world’s IT systems are not ready for the technology revolution that is coming. Booz Allen is helping clients be ready for what’s next, in this case by modernizing National Intelligence systems to harness the power and cost savings of Cloud Analytics through the creation of reference architectures and new analytic capabilities to capture, store, correlate, pre-compute, and extract value from large sets of data for superiority in Intelligence Analysis. In this webinar we will explore the technical problems and our approach to enabling cloud analytic capabilities for a specific intelligence system overwhelmed with data.
First I’ll review the challenge that we had to overcome, then our solution to the problem and the cloud analytic techniques we applied. And I’ll wrap up by discussing the results and how addressing the complexities we faced through innovation and new ways of thinking can help other clients.
Cyber data sets are often large, can be unstructured, and can grow at exponential rates, which makes managing, processing, and analyzing them particularly challenging. In recent years, a U.S. Government client had seen both its existing and its new data sets grow larger and more complex. The legacy IT system that ingested, stored, and analyzed the data became incapable of supporting the client’s real-time Cyber mission.

For example, indexing 24 hours’ worth of data took the system upwards of 36 hours. You cannot take a day and a half to ingest a single day’s worth of data. That immediately puts you at a disadvantage and introduces a whole series of intelligence problems. Query responses also slowed as data processing increased: large queries could take upwards of 45 minutes to complete, and small, simple queries would take minutes.

Overall, the legacy system’s computational speed and design parameters made it increasingly difficult to scale new and existing analytics. The lack of speed and actionable insight presented unacceptable opportunity costs, because the client’s high-speed operations tempo required near-real-time results, which they simply weren’t getting anymore. They needed a system that could provide data correlation, quick access to analytic results, ad-hoc queries, advanced scalable analytics, and real-time alerting.
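The 36-hours-per-day figure means the legacy indexer ran at two-thirds of the arrival rate, so the gap compounds rather than evening out. The small Python calculation below makes that explicit under stated assumptions: data arrives at a steady rate, a single indexer runs around the clock, and nothing else competes for it. `indexing_backlog` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction
from typing import Tuple


def indexing_backlog(days: int, index_hours_per_day_of_data: int = 36) -> Tuple[float, float]:
    """Return (hours of data still un-indexed, hours of indexing work queued) after `days` days.

    Assumes steady arrival and one indexer running continuously (illustrative only).
    """
    # Throughput in hours of data indexed per wall-clock hour: 24/36 = 2/3 for the legacy figures.
    throughput = Fraction(24, index_hours_per_day_of_data)
    arrived = 24 * days                              # hours of data received so far
    indexed = min(arrived, throughput * 24 * days)   # cannot index data that has not arrived
    backlog_data = arrived - indexed
    backlog_work = backlog_data / throughput         # wall-clock time needed to clear the queue
    return float(backlog_data), float(backlog_work)


if __name__ == "__main__":
    for d in (1, 7, 30):
        data, work = indexing_backlog(d)
        print(f"day {d}: {data:.0f} h of data waiting, {work:.0f} h of indexing work queued")
```

Under these assumptions the queue grows by 8 hours of data (12 hours of indexing work) every day, so after a week 56 hours of data are still waiting; the backlog only stops growing once indexing a day of data takes 24 hours or less.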
So what was our solution to this problem?
We took a hybrid cloud approach. The client called on Booz Allen to help improve mission performance without rebuilding the existing infrastructure. They needed a secure, scalable, and automated solution that would more quickly and precisely sift through the growing mountains of data and provide analysts with the actionable information they needed to execute their mission, not only now but also in the future. We worked closely with the client to adopt a data cloud implementation that augmented the legacy relational databases with cloud computing and analytics. With many existing systems and applications dependent on the legacy relational database for transactional queries of data, we pulled together excess servers from the client’s existing infrastructure to build a hybrid cloud solution. This design focused on keeping transaction-based queries in the current relational databases while doing the “heavy lifting” in the cloud, outputting the interesting, processed, or desired analytic results back into the relational data stores for quick transactional access.
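The hybrid pattern can be sketched in a few lines: an expensive batch job runs outside the relational database, and only its distilled results are written back into a table that existing applications can query cheaply. The Python sketch below uses the standard-library `sqlite3` module purely as a stand-in for the client’s relational stores, and a `Counter` as a stand-in for the cloud-side analytic. The table `entity_activity` and the helpers `heavy_lifting`, `publish_results`, and `lookup` are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple


def heavy_lifting(raw_events: Iterable[Tuple[str, str]]) -> Counter:
    """Stand-in for a cloud-side batch analytic: count events per (entity, kind)."""
    return Counter(raw_events)


def publish_results(conn: sqlite3.Connection, results: Counter) -> None:
    """Write only the distilled results back into the relational store."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entity_activity (
               entity TEXT NOT NULL,
               kind TEXT NOT NULL,
               event_count INTEGER NOT NULL,
               PRIMARY KEY (entity, kind))"""
    )
    # INSERT OR REPLACE lets each batch run refresh previously published rows.
    conn.executemany(
        "INSERT OR REPLACE INTO entity_activity (entity, kind, event_count) VALUES (?, ?, ?)",
        [(entity, kind, count) for (entity, kind), count in results.items()],
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, entity: str) -> List[Tuple[str, int]]:
    """Transactional-style query: an indexed lookup on the leading primary-key column."""
    cursor = conn.execute(
        "SELECT kind, event_count FROM entity_activity WHERE entity = ? ORDER BY kind",
        (entity,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    raw = [("host-1", "login"), ("host-1", "login"), ("host-2", "scan"), ("host-1", "scan")]
    publish_results(conn, heavy_lifting(raw))
    print(lookup(conn, "host-1"))  # [('login', 2), ('scan', 1)]
```

The design choice this illustrates is that the relational side never sees the raw volume; it only stores small, query-ready results, so existing transactional access paths stay fast.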
Looking at the new architecture, you can see we are doing just that. The service layer can access the cloud directly, or the cloud can output data to the relational stores that already tie into the service layer. Within the cloud itself, we use the Hadoop Distributed File System, or HDFS, as our foundation and leverage the MapReduce programming model for many of the analytics. On top of this sits Accumulo, a secure NoSQL database similar to Google’s Bigtable. Accumulo uses HDFS to provide a distributed table store with relaxed schema constraints, so we are able to quickly ingest and store new complex data sets with minimal overhead.
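The MapReduce programming model splits an analytic into a map step that emits key-value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The single-process Python sketch below shows only that data flow, not Hadoop’s distributed execution. The mapper’s input format (`source destination bytes`) and the helpers `map_reduce`, `bytes_per_destination`, and `total` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
Reducer = Callable[[Hashable, List[Any]], Any]


def map_reduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Hashable, Any]:
    """Run the three MapReduce phases in one process: map, shuffle (group by key), reduce."""
    grouped: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            grouped[key].append(value)     # shuffle: collect values under their key
    return {key: reducer(key, values) for key, values in grouped.items()}  # reduce phase


def bytes_per_destination(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper for lines shaped 'source destination bytes'."""
    _source, destination, num_bytes = line.split()
    yield destination, int(num_bytes)


def total(_key: Hashable, values: List[int]) -> int:
    """Reducer: sum all values emitted for one key."""
    return sum(values)


if __name__ == "__main__":
    log = ["10.0.0.1 10.0.0.9 500", "10.0.0.2 10.0.0.9 250", "10.0.0.1 10.0.0.7 40"]
    print(map_reduce(log, bytes_per_destination, total))
    # {'10.0.0.9': 750, '10.0.0.7': 40}
```

Because each map call sees one record and each reduce call sees one key, the framework can spread both phases across many machines; that independence is what lets the same job scale with the cluster.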
Using the hybrid cloud approach gives us lots of flexibility and an environment to be innovative with our analytic techniques.
Rather than focus on gaining IT efficiencies by using cloud technology for infrastructure, we focused on applying cloud analytics and an in-depth understanding of the client’s operational and mission needs to extract more value, faster, from our client’s massive data sets. We needed a platform that enabled advanced analytics, specifically predictive analytics to forecast events from existing data and anomaly detection to extract significant information and patterns.

The solution provides several key capabilities, the first of which is content normalization and indexing. Accumulo lets us represent data as key-value pairs and offers a standard representation for data security markings. Its indexing allows multiple instances of the data to be optimized for the questions being asked. We can use MapReduce as a pre-computation engine, where the ongoing execution of MapReduce jobs produces frequently used data slices. With HDFS, we can execute scalable ingest that parallelizes the loading of large quantities of data.
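To make “key-value pairs with security markings” and “multiple instances of data optimized for the question” concrete, the Python sketch below builds a toy in-memory table loosely modeled on Accumulo’s data model: each key carries a row, column family, qualifier, visibility label, and timestamp, and cells are kept sorted with the newest version first. It is not the Accumulo API; `Key`, `SortedTable`, `visible`, `ingest`, and `find` are hypothetical, and the visibility evaluator handles only flat `&` or `|` expressions without parentheses. A second table stores the same records re-keyed by term, so “which records mention X?” becomes a single-row scan that still honors each record’s marking.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Key:
    """Loosely modeled on Accumulo's key: row, column family, qualifier, visibility, timestamp."""
    row: str
    family: str
    qualifier: str
    visibility: str  # e.g. "" (unlabeled), "ANALYST", "ANALYST&CYBER", "ANALYST|ADMIN"
    timestamp: int


def visible(expression: str, auths: FrozenSet[str]) -> bool:
    """Evaluate a flat visibility expression (no parentheses) against a user's authorizations."""
    if not expression:
        return True  # unlabeled cells are readable by everyone
    if "&" in expression and "|" in expression:
        raise ValueError("mixing & and | requires parentheses, which this sketch does not parse")
    if "&" in expression:
        return all(label in auths for label in expression.split("&"))
    return any(label in auths for label in expression.split("|"))


class SortedTable:
    """Toy in-memory table that keeps cells in key order, newest version first."""

    def __init__(self) -> None:
        self._cells: List[Tuple[Key, str]] = []

    def put(self, key: Key, value: str) -> None:
        self._cells.append((key, value))
        self._cells.sort(key=lambda cell: (cell[0].row, cell[0].family, cell[0].qualifier,
                                           cell[0].visibility, -cell[0].timestamp))

    def scan_row(self, row: str, auths: FrozenSet[str]) -> List[Tuple[Key, str]]:
        """Return the cells of one row that the given authorizations may see."""
        return [(k, v) for k, v in self._cells if k.row == row and visible(k.visibility, auths)]


def ingest(records: Iterable[Dict[str, str]], data: SortedTable, index: SortedTable, ts: int) -> None:
    """Normalize records into key-value cells and write a term index that inherits each marking."""
    for record in records:
        marking = record.get("marking", "")
        data.put(Key(record["id"], "event", "text", marking, ts), record["text"])
        for term in set(record["text"].lower().split()):
            # Index table: row = term, qualifier = id of the record that contains it.
            index.put(Key(term, "event", record["id"], marking, ts), "")


def find(index: SortedTable, term: str, auths: FrozenSet[str]) -> List[str]:
    """Ids of records containing `term` that the caller is cleared to see."""
    return [key.qualifier for key, _ in index.scan_row(term.lower(), auths)]


if __name__ == "__main__":
    data, index = SortedTable(), SortedTable()
    ingest([
        {"id": "e1", "text": "login failure", "marking": "ANALYST"},
        {"id": "e2", "text": "login success", "marking": "ANALYST&CYBER"},
    ], data, index, ts=1)
    print(find(index, "login", frozenset({"ANALYST"})))           # ['e1']
    print(find(index, "login", frozenset({"ANALYST", "CYBER"})))  # ['e1', 'e2']
```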
We think about and leverage the Core Principles of Cloud Analytics that enable our client’s mission needs. Automated analysis techniques such as data mining and machine learning, effectively applied to data, facilitate the transformation of data into knowledge, and knowledge into action. Entity classification and predictive models improve with larger data and move the client towards data-driven decisions.

We can also derive meaning from data at scale with pre-computation and aggressive indexing. Pre-computing complex mathematics against all data and having the results ready on demand provides the quicker insight required for mission success.

Additionally, we gain the ability to answer previously unanswerable questions. By scaling storage and processing power to orders of magnitude beyond the legacy approach, the solution makes very expensive computations feasible. We use cloud analytics to pre-compute and discover very interesting hidden relationships and patterns in data that analysts could never see before, or even ask questions of.

We also create deep insight through fusion of different data types at scale. Allowing analysts to do this type of exploration quickly results in innovative breakthroughs in data fusion.

Furthermore, we enable anomaly detection. Once we have built a baseline of what is “normal,” anomalies that would otherwise be missed stand out in contrast to a large corpus of background data.
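One common way to “build what is normal” and let anomalies stand out is a per-entity statistical baseline with a distance threshold. The Python sketch below uses a z-score (distance from the historical mean in standard deviations); it is a deliberately simple stand-in, not the detection method used in the client system, and `build_baseline`, `find_anomalies`, and the 3.0 threshold are assumptions for illustration.

```python
import math
import statistics
from typing import Dict, List, Tuple


def build_baseline(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Summarize 'normal' per entity as (mean, population standard deviation)."""
    return {entity: (statistics.mean(values), statistics.pstdev(values))
            for entity, values in history.items() if values}


def find_anomalies(baseline: Dict[str, Tuple[float, float]], observed: Dict[str, float],
                   threshold: float = 3.0) -> Dict[str, float]:
    """Return entities whose observed value is `threshold` or more standard deviations from normal."""
    flagged: Dict[str, float] = {}
    for entity, value in observed.items():
        if entity not in baseline:
            continue  # no notion of normal yet for this entity
        mean, std = baseline[entity]
        if std == 0:
            # A perfectly constant history: any change at all is an infinite-score deviation.
            score = 0.0 if value == mean else math.copysign(math.inf, value - mean)
        else:
            score = (value - mean) / std
        if abs(score) >= threshold:
            flagged[entity] = score
    return flagged


if __name__ == "__main__":
    history = {"host-1": [10, 12, 11, 9, 10, 11, 12, 10, 9, 11], "host-2": [5, 5, 5, 5]}
    baseline = build_baseline(history)
    print(find_anomalies(baseline, {"host-1": 40, "host-2": 5, "host-3": 99}))
```

The larger the background corpus behind each baseline, the more stable the mean and spread become, which is why a bigger store of historical data makes genuine outliers easier to separate from noise.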
So what were the results we saw when we implemented this approach?
Well, we saw immediate and striking improvements using aggressive indexing, on-demand analytics, and pre-computation. Looking at the comparison table, we were able to take on 6 times the amount of data and spend 91% less time executing large queries. The final product combined sophistication with scalability to move from humans stitching together sparse bits of data to distilling real-time, actionable information from the aggregation of data. We’re also able to follow the lineage or pedigree of the data, allowing our client to map cost against how valuable the data is and how well it’s being used.
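The two headline figures follow directly from the comparison table: 300 GB/day divided by 50 GB/day gives the ingest ratio, and 4 minutes against roughly 45 minutes gives the reduction in large-query time. The short Python check below reproduces that arithmetic; `improvement` is a hypothetical helper. Since the table reports “300+” and “< 4 min,” the results are conservative bounds, while the ~45-minute legacy figure is itself approximate.

```python
from typing import Tuple


def improvement(legacy_gb_per_day: float, hybrid_gb_per_day: float,
                legacy_query_min: float, hybrid_query_min: float) -> Tuple[float, float]:
    """Return (ingest capacity ratio, fractional reduction in large-query time)."""
    ingest_ratio = hybrid_gb_per_day / legacy_gb_per_day
    query_time_reduction = 1 - hybrid_query_min / legacy_query_min
    return ingest_ratio, query_time_reduction


if __name__ == "__main__":
    ratio, reduction = improvement(50, 300, 45, 4)
    print(f"{ratio:.0f}x ingest, {reduction:.0%} less large-query time")
    # 6x ingest, 91% less large-query time
```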
Pulling all of this together, how are we helping our client be ready for what’s next?
The cloud solution we built is now available across the client organization and is part of a larger set of advanced analytic solutions that Booz Allen is providing. As the client’s needs change to adapt to the mission, the solution is scalable and flexible enough to support the innovation and evolution required. The success of this project has led additional federal organizations to express interest in adopting similar solutions for their environments to support their missions. With many factors pushing clients towards cloud computing and analytics, including sheer advancements in technology as well as federal policy changes and budget mandates, the demand for cloud analytic solutions will only grow in the future, and Booz Allen is helping our clients be ready.
With that, I appreciate your time and interest in our cloud analytic capabilities. To learn more, my contact information is below and feel free to check out our website at www.boozallen.com/analytics. Thanks.