3. The DoDIIS App Conundrum
Budget Cuts
• There is not enough money to transition the 400+ apps within DIA (business and mission)
• Outsourcing IaaS to C2S and GovCloud must be monitored for cost reimbursement over time
• Application elasticity is critical to understanding the true costs of ownership and maintenance
• Data is a much bigger cost than expected
• Need to consolidate systems engineering support
Technology migration is not simple
• Most apps are CRUD based: write a report, find a report
• Security business logic is baked into each app
• Number one question: why can’t I choose the technology that best fits my app?
Not a Big Data problem… yet
• On the order of TBs at best
• Highly connected but not big
Security is the ultimate killer of time
• Most time is spent meeting PL3 needs and encrypting traffic
4. Analytics are great but…
THE COMMUNITY DRIVE TO ANALYTICS AND ENRICHMENT
ENGINES HAS LEFT DIA PLAYING CATCH UP IN ITS MIGRATION OF
APPS TO A COMMON PLATFORM.
1. Make it as easy as possible for any legacy app to transition
2. Will not dictate technologies
3. Provide standards for security and access to datastores
4. The platform must deploy across multiple brokers (e.g. EC2, OpenStack, VMware) and be completely transparent to the app team
5. Where to start?
Starting was hard, but it became very clear that several epics were essential to migrating applications efficiently:
1. Streaming of data into applications must be done in a standard way. The velocity and size of the data are less of a factor for DIA than the method by which the data is consumed and distributed. To answer this, a stream-based data interface must be built to support the nexus of data distribution within the environment; we call this Frack.
2. Everyone likes the concept of migrating to NoSQL, but it becomes unmanageable from a DevOps perspective if everyone picks their own database for their own use cases. Furthermore, the point is to be multi-tenant. So we created Datasets, a means to expose indexing patterns rather than explicit databases, accessed through a common security layer.
3. Too much time is spent baking non-application-specific logic into each application vice supporting a common service tier. In order to build standards around common service-based functions we built Services.
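The stream-based interface in epic 1 can be sketched as a pipe that fans records out to registered workers. The names here (`Pipe`, `Worker`) are illustrative stand-ins, not the actual Frack API:

```python
class Pipe:
    """In-memory stand-in for the physical stream processor Frack abstracts."""
    def __init__(self):
        self.workers = []

    def register(self, worker):
        self.workers.append(worker)

    def emit(self, record):
        # A real stream processor would serialize and route the record;
        # here we fan it out to every registered worker synchronously.
        for w in self.workers:
            w.process(record)

class Worker:
    """Consumes records from the pipe; apps supply the processing logic."""
    def __init__(self):
        self.seen = []

    def process(self, record):
        self.seen.append(record)

pipe = Pipe()
worker = Worker()
pipe.register(worker)
pipe.emit({"doc_id": 1, "text": "daily report"})
```

The point of the abstraction is that app teams code against the pipe, while the physical stream processor behind it can change without touching application code.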
6. Integration vs. Engineering
Of the major issues identified early in the project, the most hindering was the deployment model.
• App teams were spending 80% of their time integrating with new databases and new services vice building application functionality
• Applications would each follow their own System Installation Procedure (SIP) by which each would deploy their own software
• Scale was defined through provisioning of machines vice true automated elasticity
• Goal: start developing within 1 hour and deploy capability within 30 days
7. EzBake
EzBake provides an integrated way to compose the different
elements of your application: collecting, processing, storing, and
querying data.
• Focus on application logic
• Simple API that leverages complex, distributed frameworks
• Easy-to-use local development kit
• Deploy in minutes
• Framework is accredited; applications inherit accreditation
• Subscription-based data-feed model
• Automated elasticity
• Design for failure
8. The Components
The core of the platform is pure open source and is broken into the following primary components:
• Streaming Ingest (Frack): The interface for building data-flow topologies, which abstracts the physical stream processor
• Common Services (Procedures): Scaled and commonly used Thrift services, typically utilized during streaming ingest
• Data Persistence (Datasets): Our indexing patterns, called Datasets, exposed as Thrift services; abstracts the physical databases
• Query: Both direct access to Datasets and Aggregate Query across the various Datasets
• Security: Both at the data persistence and user access layers
• Batch Analytics: MapReduce abstractions that allow input from and output to Datasets and will leverage the GovCloud DataCloud
• Deployment: Currently uses OpenShift for automated deployment, with a planned migration to Docker + YARN
9. Technology Agnostic
Each app has its own needs; it is not for the platform builder to force the team into a particular technology, but rather to offer a solution that meets the use case.
• Instead of jack-of-all-trades indexing for free-text search, geospatial search, etc., use mission-specific indices for specific application logic needs
• Focus on storage patterns vice database-specific operations, thereby enforcing data access standards across the enterprise
• Allow for new cartridges for web frameworks including Node.js, Python, Ruby, etc.
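The storage-pattern idea can be sketched as follows, assuming a hypothetical `TextIndexDataset` interface: applications code against the pattern (put/search), and the backing store is an implementation detail that can be swapped without touching application logic:

```python
from abc import ABC, abstractmethod

class TextIndexDataset(ABC):
    """The indexing pattern apps code against; the database behind it
    (Elasticsearch, Accumulo, ...) is an implementation detail."""
    @abstractmethod
    def put(self, doc_id, text): ...

    @abstractmethod
    def search(self, term): ...

class InMemoryTextIndex(TextIndexDataset):
    """Toy backing store standing in for a real search engine."""
    def __init__(self):
        self.docs = {}

    def put(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        # Naive whole-word match; a real Dataset would delegate to the
        # underlying index's query engine.
        return [d for d, t in self.docs.items() if term in t.split()]

index = InMemoryTextIndex()
index.put("r1", "daily intel report")
index.put("r2", "maintenance log")
```

Because every app sees only `put` and `search`, data access standards are enforced at the pattern boundary rather than per database.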
11. Sharing
Sharing is the key component of EzBake for achieving cost savings and providing agility to the application developer.
• Sharing is exposed via the Common Services and the Aggregate Query
• The intent of the Common Services is to expose any functionality currently ingrained within stove-piped applications. By exposing that functionality as a service, other applications can leverage it instead of application teams writing the same logic over and over again, such as entity extraction, date normalization, etc.
• The Common Services are wrapped in Thrift services, scaled out on the virtual infrastructure deployed through OpenShift
• The Aggregate Query is in development for delivery in EzBake v2.0; the current design will extend Impala to expose the EzBake Datasets as input for the distributed query engine
• App teams will expose “intents” within the Datasets to which they can respond, such as “person”, “place”, or “event”, and the Impala engine will plan the query and aggregate the results back to the requestor
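The intent mechanism might look like the following sketch (class and function names here are hypothetical; the real design extends Impala): each Dataset registers the intents it can answer, and the aggregate query fans a request out to every registered Dataset and merges the results:

```python
class AggregateQuery:
    """Toy fan-out/merge engine standing in for the Impala-based design."""
    def __init__(self):
        self.handlers = {}  # intent -> list of dataset query functions

    def register(self, intent, dataset_query):
        self.handlers.setdefault(intent, []).append(dataset_query)

    def query(self, intent, term):
        # Fan out to every Dataset that declared this intent, then merge.
        results = []
        for ds in self.handlers.get(intent, []):
            results.extend(ds(term))
        return results

agg = AggregateQuery()
agg.register("person", lambda t: [{"source": "hr-dataset", "name": t}])
agg.register("person", lambda t: [{"source": "travel-dataset", "name": t}])
```

The requestor asks for an intent ("person"), never a specific Dataset, which is what lets new Datasets join the aggregate query without changes to callers.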
12. Security
Built-in from the start, EzBake implements security across all features.
• Datasets are where the bulk of the security occurs, applying row-level security to the data based on the user’s authorization string
• Row-level security must be implemented in different ways to support multiple types of datastores; for example, for the term Dataset, which is Elasticsearch, we included a filter plugin that applies the boolean logic check at query time
• Embedding security across the platform allows the application teams to streamline their accreditation process
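As a sketch, the boolean logic check reduces to evaluating a row's visibility expression against the user's authorization set. This assumes a simple grammar where `|` separates OR clauses and `&` joins required tokens; the platform's actual expression language may differ:

```python
def visible(expression, auths):
    """Row-level check: 'TS&SI|TS&REL' means (TS AND SI) OR (TS AND REL).
    The row is returned only if the user's auths satisfy some OR clause."""
    return any(
        all(token in auths for token in clause.split("&"))
        for clause in expression.split("|")
    )
```

In the Elasticsearch case, a check like this runs inside the filter plugin at query time, so unauthorized rows never leave the datastore.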
13. Metering and Monitoring
Data driven decisions
• JavaScript API for web apps, Thrift API for services, and REST for others
• Improve application usability/usefulness by examining analytics on usage patterns
• Diagnose issues with systems, services, and apps
• Determine cost allocation based on which agencies and organizations are using the system
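A toy sketch of the cost-allocation idea: every call is recorded as a metering event tagged with the calling organization, and allocation is an aggregation over those events. The `Meter` class is illustrative, not the platform's API:

```python
from collections import Counter

class Meter:
    """Counts usage events per (agency, service) pair."""
    def __init__(self):
        self.events = Counter()

    def record(self, agency, service):
        self.events[(agency, service)] += 1

    def allocation(self, agency):
        # Total calls attributable to one agency, for cost allocation.
        return sum(n for (a, _), n in self.events.items() if a == agency)

meter = Meter()
meter.record("DIA", "entity-extraction")
meter.record("DIA", "date-normalization")
meter.record("NGA", "entity-extraction")
```

The same event stream also serves the other bullets: usage-pattern analytics and diagnosing issues are just different aggregations over the recorded events.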
15. What’s Next
• Distributed query via Impala (Intents are coming)
• Apache Spark integration (dynamic ranking)
• Graph support - Titan
• Change YARN to control Docker
• Upgrade to CDH5
• Extend Apache Sentry