Big Data Challenges, Presented by Wes Caldwell at SolrExchange DC
2. Big Data Challenges
in the DoD and IC
Wes Caldwell
Chief Architect
Intelligent Software Solutions
3. Topics
• Introduction to ISS
• The growth of data
• Our customer’s data environment
• The need for effective big-data management
• Search as the cornerstone of a big-data strategy
4. About ISS
• Headquartered in Colorado Springs
• Other offices located in Washington DC, Hampton VA,
Tampa FL, and Rome NY
• Innovative Solutions from “Space to Mud
and Everything Between”
• Sole prime on multiple Air Force Research Labs
IDIQ programs
• Currently Executing More Than 100 Software
Development Projects
• Over 800 employees
• Strength in Solutions Development and
Deployment
• Consistently Recognized as a Leader
• Recognized as a Deloitte Fast 50 Colorado
company and a Deloitte Fast 500 company for
eight consecutive years
• Three-time Inc. Magazine 500 winner
• 2009 Defense Company of the Year
5. ISS Solution Space/Value Proposition
• Reusable and license-free to US
Federal Government (GOTS)
• Committed to providing best ROI
to our customers by integrating
leading open-source solutions into
our products and services
• Scalable from a single desktop
solution to large distributed
networks with thousands of users
• Customizable to each
organization’s unique analytical and
information technology
infrastructure
• Operationally proven, secure and
accredited for all major classified
networks
6. ISS Business Strategy
Government Off The Shelf (GOTS)
Commercial Off The Shelf (COTS)
Subject Matter Experts (SMEs)
• Low Barrier to Entry: No license
fees to US Government Agencies
• Fast: Proven baseline provides
immediate capability
• Turnkey: Highly customizable
solutions can be implemented
quickly with no development
• Solutions Oriented: Subject Matter
Experts support implementation in
each domain
• Low Cost: Cost of Adding Features
is shared across large customer
base; all customers benefit
Blending the best elements of each industry model to provide low-risk,
nonproprietary, high-payoff solutions, fast!
7. The growth of data
• Most electronic information is not relational,
but unstructured (textual, binary) or semi-structured
(spreadsheet, RSS feed, etc.)
– In 2007, the estimated information content of all
human knowledge was 295 exabytes (295 million
terabytes)
– Data production will be 44 times greater in 2020
than in 2009
• Approx. 35 zettabytes total (35 billion terabytes)
• A majority of the data produced in the future will
be unstructured
– A tremendous amount of information and
knowledge is dormant within unstructured data
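The unit conversions on this slide can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal (SI) units; the helper name `to_terabytes` is made up for illustration. The implied 2009 volume is derived only from the "44 times" and "35 zettabytes" figures above, not from any other source.

```python
# Decimal (SI) storage units expressed in bytes (assumption: SI, not binary).
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_terabytes(amount, unit_in_bytes):
    """Convert `amount` of a unit (given in bytes) to whole terabytes."""
    return amount * unit_in_bytes // TERABYTE


human_knowledge_2007_tb = to_terabytes(295, EXABYTE)   # 295 million TB
projected_2020_tb = to_terabytes(35, ZETTABYTE)        # 35 billion TB

# If 2020 is 44 times the 2009 figure, the implied 2009 volume in zettabytes:
implied_2009_zb = 35 / 44
```

Under these assumptions, the implied 2009 volume works out to roughly 0.8 zettabytes.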
8. Our customer’s data environment
• Literally thousands of data sources/feeds
from a variety of strategic, national, and
tactical sources
– Media (documents, images, etc.)
– Human interactions
– Geospatial
– Open Source (News feeds, RSS)
– Imagery/Video
– Many more…
10. The need for effective “big-data” management
• Analysts are looking to extract knowledge from the massive heterogeneous
data sets, providing “actionable intelligence”
• Tactical environments absolutely demand effective management of data
– The time to live (relevance window) of collected data can be very short
– Communications pipes are more constrained than those available to large
CONUS-based data centers, so reduction of data based on tactical conditions
(e.g. AOR, Problem Domain) is critical
• Search and Analytics are key enablers to allow an analyst to reliably search
through large amounts of information, and to focus their efforts around a
subset of that information to perform deeper analysis
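One way to picture the data reduction described on this slide is a filter applied before data crosses a constrained link. The sketch below is illustrative only: the `Report` and `TacticalFilter` types, their field names, the bounding-box model of an AOR, and the sample values are all assumptions, not part of the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Report:
    """Hypothetical collected item; field names are illustrative only."""
    doc_id: str
    lat: float
    lon: float
    domain: str
    collected_at: datetime


@dataclass(frozen=True)
class TacticalFilter:
    """Bounding box standing in for an AOR, plus domain and freshness limits."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    domains: frozenset
    time_to_live: timedelta

    def keeps(self, report, now):
        in_aor = (self.min_lat <= report.lat <= self.max_lat
                  and self.min_lon <= report.lon <= self.max_lon)
        in_domain = report.domain in self.domains
        still_fresh = now - report.collected_at <= self.time_to_live
        return in_aor and in_domain and still_fresh


def reduce_for_link(reports, tactical_filter, now):
    """Keep only the reports worth sending over a constrained link."""
    return [r for r in reports if tactical_filter.keeps(r, now)]


# Made-up sample data.
now = datetime(2012, 5, 1, 12, 0, tzinfo=timezone.utc)
reports = [
    Report("fresh-in-aor", 10.5, 20.5, "logistics", now - timedelta(hours=1)),
    Report("stale-in-aor", 10.6, 20.4, "logistics", now - timedelta(days=3)),
    Report("fresh-outside", 50.0, 50.0, "logistics", now - timedelta(hours=1)),
]
aor = TacticalFilter(10.0, 11.0, 20.0, 21.0,
                     frozenset({"logistics"}), timedelta(hours=6))
kept = reduce_for_link(reports, aor, now)
```

Only the report that is inside the box, in the requested domain, and younger than the time to live survives the filter.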
11. Search IS the cornerstone of an effective big-data strategy
Content Acquisition
Content types: Structured, Semi-Structured, Un-Structured
Content Cache (Haystacks)
Tenets
• Connector architecture
• Data normalization
• Data staging
• Data Compartmenting (Multiple Haystacks)
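The content acquisition tenets can be sketched as a small pipeline: connectors pull raw items, a normalizer maps them onto one schema, and staging groups them into separate haystacks. Everything here is a hypothetical illustration (the `Connector` protocol, the two toy connectors, the common schema, and the `compartment` field); the slides do not describe ISS's actual design at this level.

```python
from collections import defaultdict
from typing import Iterable, Protocol


class Connector(Protocol):
    """Anything that can pull raw items from one source."""
    def fetch(self) -> Iterable[dict]: ...


class RssConnector:
    """Toy connector returning a canned item in place of a real feed."""
    def fetch(self):
        return [{"title": " Port closure ", "link": "rss-1",
                 "compartment": "open-source"}]


class ReportConnector:
    """Toy connector whose source uses different key names."""
    def fetch(self):
        return [{"SUBJECT": "Convoy route", "ID": 7,
                 "compartment": "tactical"}]


def normalize(raw: dict) -> dict:
    """Map source-specific keys onto one common schema."""
    lowered = {key.lower(): value for key, value in raw.items()}
    return {
        "id": str(lowered.get("id", lowered.get("link"))),
        "title": str(lowered.get("title", lowered.get("subject", ""))).strip(),
        "compartment": lowered["compartment"],
    }


def stage(connectors):
    """Normalize every item, then group records into per-compartment haystacks."""
    haystacks = defaultdict(list)
    for connector in connectors:
        for raw in connector.fetch():
            record = normalize(raw)
            haystacks[record["compartment"]].append(record)
    return dict(haystacks)


haystacks = stage([RssConnector(), ReportConnector()])
```

Adding a new source in this sketch means writing one more class with a `fetch` method; the normalizer and staging step stay unchanged.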
Search/Discovery
Content Index
Tenets
• Optimized Index of Content for Search and Discovery of Big Data
• Analyst Topics that “Shrink the Haystack”
• Search Features (Facets, Auto-Complete, Tagging, Comments, etc.)
• Semantic (Synonym) Search based on pluggable taxonomies
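Two of the search ideas named on this slide, synonym search driven by a pluggable taxonomy and facet counts that shrink the haystack, can be illustrated with a toy inverted index. This is plain Python, not the Solr API (a search engine such as Solr provides these features natively); the documents, the `SYNONYMS` taxonomy, and all function names are made up.

```python
from collections import Counter, defaultdict

# Hypothetical pluggable taxonomy: each term maps to its synonyms.
SYNONYMS = {"vehicle": {"truck", "car"}}

# Made-up documents.
DOCS = [
    {"id": 1, "text": "truck seen near bridge", "source": "report"},
    {"id": 2, "text": "car parked at market", "source": "news"},
    {"id": 3, "text": "bridge repair schedule", "source": "news"},
]


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return index


def search(index, term, synonyms=SYNONYMS):
    """Return ids of documents matching the term or any of its synonyms."""
    terms = {term} | synonyms.get(term, set())
    hits = set()
    for candidate in terms:
        hits |= index.get(candidate, set())
    return hits


def facet_counts(docs, hit_ids, field):
    """Count hits per value of `field`, as a facet sidebar would."""
    return Counter(doc[field] for doc in docs if doc["id"] in hit_ids)


index = build_index(DOCS)
vehicle_hits = search(index, "vehicle")
bridge_facets = facet_counts(DOCS, search(index, "bridge"), "source")
```

Here a query for "vehicle" matches documents that only mention "truck" or "car", and the facet counts show how the "bridge" hits split by source.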
NLP Pipeline
Semantic Enrichment: Categorization, Named Entity Recognition, Clustering, Gazetteers
Tenets
• “Domain Spaces” that support pluggable entity recognition and categorization
• Continuous feedback loop that improves the system over time with analyst input
• Lexicon-based analytics that allows for targeted categorization across corpus of data
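A gazetteer lookup and a lexicon-based categorizer are among the simplest forms of the enrichment steps listed above. The sketch below assumes a "domain space" is just a gazetteer plus a category lexicon, and models analyst feedback as adding a gazetteer entry; those modeling choices, the entries, and the sample sentence are all illustrative assumptions.

```python
import re

# Hypothetical "domain space": a gazetteer plus a category lexicon.
GAZETTEER = {
    "colorado springs": "LOCATION",
    "tampa": "LOCATION",
    "air force": "ORGANIZATION",
}
LEXICON = {
    "logistics": {"convoy", "supply", "fuel"},
    "maritime": {"port", "vessel", "harbor"},
}


def recognize_entities(text, gazetteer):
    """Return sorted (name, type) pairs for gazetteer entries found in text."""
    lowered = text.lower()
    found = []
    for name, entity_type in gazetteer.items():
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.append((name, entity_type))
    return sorted(found)


def categorize(text, lexicon):
    """Assign every category whose lexicon shares a word with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, terms in lexicon.items() if words & terms)


def add_feedback(gazetteer, name, entity_type):
    """Analyst feedback: a correction becomes a new gazetteer entry."""
    gazetteer[name.lower()] = entity_type


text = "Fuel convoy left Tampa for the port; Rome NY office notified."
before = recognize_entities(text, GAZETTEER)
add_feedback(GAZETTEER, "Rome NY", "LOCATION")
after = recognize_entities(text, GAZETTEER)
categories = categorize(text, LEXICON)
```

The entity missed on the first pass is found after the analyst's correction is added, which is the feedback loop in miniature.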
Data Perspectives
Tenets
• Data Reduction into focused “Data Perspectives”
• Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked
• Leveraging industry-standard parallel processing frameworks for scalable analytics
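To make the idea of data perspectives concrete, the sketch below reduces one set of enriched records into three shapes, each suited to a different question: a time series, an entity co-occurrence graph, and a coarse geo grid. The records, field names, and grid size are made up, and the code runs in a single process; in practice each aggregation could be handed to a parallel processing framework, which is not shown here.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Made-up enriched records; field names are illustrative only.
RECORDS = [
    {"day": "2012-05-01", "entities": ["Alpha", "Bravo"],
     "lat": 38.83, "lon": -104.82},
    {"day": "2012-05-01", "entities": ["Alpha", "Charlie"],
     "lat": 38.91, "lon": -104.70},
    {"day": "2012-05-02", "entities": ["Alpha", "Bravo"],
     "lat": 27.95, "lon": -82.46},
]


def time_series(records):
    """Time perspective: number of records per day."""
    return Counter(r["day"] for r in records)


def entity_graph(records):
    """Graph perspective: edge weight = records mentioning both entities."""
    edges = Counter()
    for r in records:
        for a, b in combinations(sorted(set(r["entities"])), 2):
            edges[(a, b)] += 1
    return edges


def geo_grid(records, cell_degrees=1.0):
    """Geo perspective: record counts per lat/lon grid cell."""
    cells = defaultdict(int)
    for r in records:
        key = (int(r["lat"] // cell_degrees), int(r["lon"] // cell_degrees))
        cells[key] += 1
    return dict(cells)


by_day = time_series(RECORDS)
graph = entity_graph(RECORDS)
grid = geo_grid(RECORDS)
```

Each perspective is far smaller than the records it came from and answers one kind of question directly: when, who with whom, or where.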