1. © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
IBM Confidential
June, 2013
Identity and Biometrics in
the Big Data & Analytics
Context
Dr. Charles Li
Analytics Solution Center
Washington, DC
Charles_Li@us.ibm.com
2.
Topics
• ID Management, Identity & Biometrics
• Views on Biometrics Technology and System
• The Concept of Big Data, Analytics and Challenges
• Identity Establishment from All Sources
• Identity and Biometrics in the Cloud
• Identity and Biometrics Analytics in Near Real Time
• Summary
3.
ID Management, Identity and Biometrics
Identity Elements
• Players
• Entitlement(s)
• Actions
• Identity
• Trust (Rules)
• Status (Environment)
• Reputation (History)
Identity Management
4.
Views on biometrics technology and system
What is missing?
5.
Big Data Concept
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
• Variety: Data in many forms – structured, unstructured, text and multimedia
• Velocity: Data in Motion – analysis of streaming data to enable decisions within fractions of a second
• Volume: Data at Scale – from terabytes to zettabytes
6.
Analytics Concept
Input: Structured Data & Unstructured Content
Analytics types: Descriptive Analytics, Predictive Analytics, Prescriptive Analytics
Made consumable and accessible to everyone
Questions addressed (with the technique where the slide names one):
• What is happening?
• What exactly is the problem?
• How many, how often, where?
• What actions are needed?
• What could happen? (Simulation)
• What if these trends continue? (Forecasting)
• What will happen next if? (Predictive Modelling)
• How can we achieve the best outcome? (Optimisation)
• How can we achieve the best outcome and address variability? (Stochastic Optimisation)
• Extracting insight, concepts and relationships (Content Analytics)
• Deep insights to improve visualization and marketing interactions (Visual Analytics)
7.
Biometrics Data at Scale – Static & Single Instance
• 1 billion arrivals worldwide in 2012; United States: 100-200 million international arrivals in 2012. 1 Exabyte of travel data
• Unique Identification Authority of India (UIDAI) plans to enroll 1.2 billion citizens (UID Program; enroll million /day; half a billion by 2014). 3-4 Exabytes of biometric and biographic data
• Prolific usage of mobile phones: 6 billion mobile phones. 6 Exabytes of behavior data
• ID cards / border crossings / benefits / multiple instances: 7,000,000,000 x (10 Print 0.5-1 MB + Face 200 KB + IRIS KB). 7 Exabytes
• EU VIS Biometrics Matching System (BMS) at 70 million individuals and 100K daily enrollments. ~100 Terabytes
• US DoS has in the range of 100 million faces and others. At least 10-50 Terabytes
• DHS IDENT: over 150 million identities; 125,000 transactions daily. ~100-300 Terabytes
• FBI NGI: over 100 million fingerprints, with more coming plus faces/iris. ~100-200 Terabytes
Units:
1 Gigabyte = 1000 MB
1 Terabyte = 1000 GB
1 Petabyte = 1000 TB
1 Exabyte = 1000 PB
1 Zettabyte = 1000 EB
1 Yottabyte = 1000 ZB
Many instances, history, transactions, logs… of data in reality
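The per-person arithmetic on this slide can be checked with a short sketch. It uses the slide's decimal units and its 10-print and face sizes; the iris size is left blank on the slide, so `iris_kb` is a hypothetical parameter that defaults to 0, and the function name is illustrative. Under these assumptions a single record per person for 7 billion people comes to a few petabytes, so the exabyte totals presumably depend on the multiple instances, history, transactions and logs the slide mentions.

```python
# Decimal units, as defined on the slide (1 GB = 1000 MB, and so on).
KB = 1000
MB = 1000 * KB
PB = 1000 ** 5


def single_instance_bytes(people, tenprint_mb, face_kb=200, iris_kb=0):
    """Raw storage for one biometric record per person.

    iris_kb is a hypothetical parameter: the slide leaves the iris size blank.
    """
    per_person = tenprint_mb * MB + face_kb * KB + iris_kb * KB
    return people * per_person


people = 7_000_000_000
low = single_instance_bytes(people, tenprint_mb=0.5)
high = single_instance_bytes(people, tenprint_mb=1.0)
print(f"{low / PB:.1f} to {high / PB:.1f} PB for a single instance")
```

Passing a non-zero `iris_kb`, or multiplying by an assumed number of instances per person, shows how quickly the total grows from that baseline.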
8.
Big Data Sources
System Transaction, Log and Transition Data – Several Times More!
9.
Other Big data examples
• 150 Exabytes: global size of “Big Data” in healthcare, growing between 1.2 and 2.4 EB / year
• For every session, the NY Stock Exchange captures 1 Terabyte of trade information
• AT&T transfers about 30 Petabytes of data through its network daily
• The Hadron Collider at CERN generates 40 Terabytes of usable data / day
• Facebook processes 500+ Terabytes of data daily
• Google processes > 24 Petabytes of data in a single day
• Twitter processes 12 Terabytes of data daily
• By 2016, annual Internet traffic will reach 1.3 Zettabytes
We don’t have the most challenging problem!
10.
Biometric Performance at Giga Scale*
“Brute Force” De-Duplication
• Cumulative de-duplication: total number of checks = N(N-1)/2 (a “combination problem”)
• De-duplicating a 100 million population enrollment results in 4,999,999,950,000,000 checks
• More than 15 years to complete at 10 million matches per second
Prohibitive!
Biometric Accuracy Challenge
• FMR at 1 identification false match per million
• 500 false matches with a 1 million enrollment population
• 5 million false matches with a 100 million enrollment population
Prohibitive!
We have some unique challenges!
* Courtesy of Bojan Cukic
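The de-duplication figures can be reproduced with a few lines of arithmetic. This is a minimal sketch with illustrative function names. It assumes every pairwise check is independent and errs at a fixed per-comparison false match rate. Under that assumption, the slide's 500 and 5 million false matches correspond to a rate of about one per billion comparisons, while one per million comparisons would give roughly 500,000 and 5 billion. The slide does not say which reading is intended, so both rates are shown.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def pairwise_checks(n):
    """Number of distinct pairs among n enrollments: N(N-1)/2."""
    return n * (n - 1) // 2


def years_to_complete(n, matches_per_second):
    """Wall-clock years for an exhaustive de-duplication at a fixed match rate."""
    return pairwise_checks(n) / matches_per_second / SECONDS_PER_YEAR


def expected_false_matches(n, fmr_per_comparison):
    """Expected false matches, assuming independent errors per comparison."""
    return pairwise_checks(n) * fmr_per_comparison


n = 100_000_000
print(pairwise_checks(n))                           # 4999999950000000
print(round(years_to_complete(n, 10_000_000), 1))   # 15.9

for rate in (1e-6, 1e-9):  # illustrative per-comparison rates
    small = expected_false_matches(1_000_000, rate)
    large = expected_false_matches(n, rate)
    print(f"rate={rate:g}: {small:,.0f} and {large:,.0f} expected false matches")
```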
11.
Face the Challenges
Identity Establishment with All Data Sources
- Leverage Entity Resolution Technologies
Biometrics Services in the Cloud
- Leverage Big Data Infrastructure, Platforms and Software Services
Identity and Biometrics Analytics in Motion
12.
Identity Establishment with All Sources
Biometrics (physical and behavioral)
Biographic information
Behavior data (social media usage)
Travel data (API, PNR)
Banking information
Web or desktop usage behavior
• Emails
• Multimedia
Spatial and temporal information
Entity / Identity Resolution with All Sources
Entity / Identity Resolution: a complex process involving the application of sophisticated algorithms across multiple heterogeneous data sources to resolve multiple records into a single fused view of an individual
• Reduce search space and computing resources
• Complement to low-quality images
• Cost and benefits tradeoff
• Systematic research necessary
• Successful programs
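The idea of resolving records from several sources into one fused view, while reducing the search space, can be sketched as follows. This is a toy illustration, not a production entity-resolution algorithm: the record fields (`id`, `name`, `dob`, `passport`), the blocking key and the matching rule are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(name):
    """Lower-case a name and sort its tokens so 'Doe, Jane' equals 'jane doe'."""
    tokens = name.replace(",", " ").lower().split()
    return " ".join(sorted(tokens))


def resolve(records):
    """Group records that appear to describe the same individual.

    Blocking: only records sharing a date of birth are compared, which
    shrinks the search space. Matching rule (illustrative only): same
    normalized name, or same non-empty passport number.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    blocks = defaultdict(list)
    for r in records:
        blocks[r["dob"]].append(r)

    for block in blocks.values():
        for i, a in enumerate(block):
            for b in block[i + 1:]:
                same_name = normalize(a["name"]) == normalize(b["name"])
                same_doc = a.get("passport") and a.get("passport") == b.get("passport")
                if same_name or same_doc:
                    parent[find(a["id"])] = find(b["id"])  # merge the two groups

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


records = [
    {"id": "travel-1", "name": "Doe, Jane", "dob": "1980-01-02", "passport": "X123"},
    {"id": "bank-7", "name": "jane doe", "dob": "1980-01-02"},
    {"id": "visa-3", "name": "J. Doe", "dob": "1980-01-02", "passport": "X123"},
    {"id": "travel-9", "name": "John Roe", "dob": "1975-05-05", "passport": "Y999"},
]
print(resolve(records))
# [['bank-7', 'travel-1', 'visa-3'], ['travel-9']]
```

The bank record has no passport and the visa record has an abbreviated name, yet both end up in the same group because each links to the travel record by a different attribute.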
13.
Biometrics Services in the Cloud - Leverage Big Data Infrastructure, Platform and Software Services
[Diagram: cloud service layers]
• Cloud Services (Infrastructure and Platform as a Service)
- Infrastructure Platform: Management and Administration; Availability and Performance; Security and Compliance; Usage and Accounting
- Enterprise Application Services: Application Lifecycle; Application Resources; Application Environments; Application Management; Integration
• Cloud Solutions (Software and Business Process as a Service): Smarter Commerce; Smarter Cities; Social Business; Business Analytics and Optimization; Enterprise+
• Service models: Infrastructure (IaaS), Platform (PaaS), Software (SaaS), Business Process (BPaaS)
• Deployment: Private, Public and Hybrid Models
[Diagram: biometric services behind a Standard Interface, each pairing Process and Data]
• Enrolment Service
• 1:1 Identification Service
• ….
Biometric data: Fingerprint, Iris, Face
Note: Cloud & Big Data are not the same
14.
Progress
A Prototype - Leveraging the Cloud for Big Data Biometrics
• E. Kohlwey et al., “Leveraging the Cloud for Big Data Biometrics,” 2011
• A prototype system for generalized searching of cloud-scale biometric data, as well as an application of this system to the task of matching a collection of synthetic human iris images
• Implemented with Hadoop (Map/Reduce framework)
Successful deployment of identification algorithms for the India UID program
• Non-traditional matching vendor technologies
Biometrics as a Service
• Business process as a service
• Software as a service
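To make the Map/Reduce idea concrete, here is a minimal sketch of a partitioned iris search in plain Python. It is not the Kohlwey et al. implementation and does not use the Hadoop API. It assumes iris codes are fixed-length bit strings compared by Hamming distance (a common choice, assumed here), and all names and the toy 16-bit codes are hypothetical.

```python
from functools import reduce


def hamming(a, b):
    """Number of differing bits between two iris codes stored as ints."""
    return bin(a ^ b).count("1")


def map_partition(probe, partition):
    """Map step: best (distance, subject_id) within one gallery partition."""
    return min((hamming(probe, code), subject) for subject, code in partition)


def reduce_best(left, right):
    """Reduce step: keep the closer of two partial results."""
    return min(left, right)


def search(probe, partitions):
    # On a cluster each partition would be handled by a separate worker.
    partials = [map_partition(probe, p) for p in partitions]
    return reduce(reduce_best, partials)


# Toy 16-bit codes; real iris codes are far longer.
gallery = [
    [("alice", 0b1010101010101010), ("bob", 0b1111000011110000)],
    [("carol", 0b0000111100001111), ("dave", 0b1100110011001100)],
]
probe = 0b1111000011110001  # bob's code with one bit flipped
print(search(probe, gallery))
# (1, 'bob')
```

Because each map step touches only its own partition, adding workers shortens the search roughly in proportion, which is the scalability argument behind this style of prototype.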
15.
Challenges
Focus on Parallelism and Scalability
• Excellent research and testing areas
• Bring algorithms into operational environments
Explore defining biometrics as a service program – a new way of thinking about acquisition
• Business process as a service
• Software as a service
Encourage partnership among Big Data & Analytics developers and traditional biometrics solution providers
• Big Data and Analytics players
16.
Big Data Appliance Examples
• IBM Netezza
• Oracle Exadata
• Teradata
• EMC2 Greenplum
• SAP HANA
• Schooner Appliance MySQL
Example (CBP): 40 TB of data per appliance (a few hundred cores each), hosted on a little more than a dozen appliances, supports 30-40% of DHS’s operations
17.
Identity and Biometrics Analytics in Near Real Time
ROC curve calibration along the security vs. convenience trade-off
• Allow systems to dynamically change operating criteria based on the live situation
• This is a real challenge due to the needed ground truth…
Quality Feedback to the Collection
• Avoid collecting ‘bad’ data that degrades the system
Operating Metrics Monitoring
• Rates of enrollment, rejection, etc.
• Geo-location and temporal information
Fuse all data sources based on real-time feedback
• Dynamically allocating fusion algorithms and configurations
Provide controlled parallelism
• System and algorithm levels
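Moving along the ROC curve amounts to choosing a decision threshold that trades false matches (security) against false non-matches (convenience). The sketch below picks the lowest threshold that keeps the false match rate within a limit. It assumes higher scores mean more similar and that labelled genuine and impostor scores are available, which is exactly the ground-truth difficulty noted above. The function names and sample scores are hypothetical.

```python
def false_match_rate(impostor_scores, threshold):
    """Share of impostor comparisons that would be accepted."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def false_non_match_rate(genuine_scores, threshold):
    """Share of genuine comparisons that would be rejected."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)


def pick_threshold(genuine_scores, impostor_scores, max_fmr):
    """Lowest observed score usable as a threshold with FMR <= max_fmr.

    FMR never increases as the threshold rises, so the first candidate that
    satisfies the limit is also the most convenient one (lowest FNMR).
    """
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    for t in candidates:
        if false_match_rate(impostor_scores, t) <= max_fmr:
            return t
    return float("inf")  # no observed score is strict enough: reject everything


genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.10, 0.22, 0.35, 0.41, 0.58, 0.70, 0.15, 0.30, 0.05, 0.48]

strict = pick_threshold(genuine, impostor, max_fmr=0.0)   # high-security setting
relaxed = pick_threshold(genuine, impostor, max_fmr=0.1)  # high-throughput setting
print(strict, false_non_match_rate(genuine, strict))      # 0.78 0.2
print(relaxed, false_non_match_rate(genuine, relaxed))    # 0.66 0.0
```

Re-running `pick_threshold` on a recent window of labelled scores, with `max_fmr` set by the current threat level, is one simple way to let the operating point follow the live situation.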
18.
Near Real Time on Big Data Platform
One Approach - Streams Technology at Work
Continuous ingestion, continuous analysis
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Infrastructure provides services for
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Transform, Filter / Sample, Classify, Correlate, Annotate
Where appropriate:
• Elements can be fused together for lower communication latency
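The operator-pipeline idea can be illustrated with plain Python generators. This is a sketch of the concept only and does not use the IBM Streams API. The operator names mirror the slide, while the record fields, the quality cut-off and the `fuse` helper are hypothetical. Fusing composes several per-item steps into one operator, so an element makes a single hop instead of one per step.

```python
def source(readings):
    """Continuous ingestion: yield one tuple at a time."""
    for r in readings:
        yield r


def filter_op(stream, predicate):
    """Filter / Sample: pass through only the items that satisfy predicate."""
    for item in stream:
        if predicate(item):
            yield item


def map_op(stream, fn):
    """Apply a per-item function to every element of the stream."""
    for item in stream:
        yield fn(item)


def fuse(*fns):
    """Compose per-item functions into one operator, saving a hop per step."""
    def fused(item):
        for fn in fns:
            item = fn(item)
        return item
    return fused


def transform(item):
    return {**item, "quality": item["quality"] / 100}


def classify(item):
    return {**item, "label": "good" if item["quality"] >= 0.6 else "recapture"}


def annotate(item):
    return {**item, "site": "gate-" + item["id"][0]}


readings = [
    {"id": "A1", "quality": 82},
    {"id": "B7", "quality": 35},
    {"id": "A3", "quality": -1},  # sensor fault, dropped by the filter
]
stream = filter_op(source(readings), lambda r: r["quality"] >= 0)
stream = map_op(stream, fuse(transform, classify, annotate))
results = list(stream)
print([(r["id"], r["label"], r["site"]) for r in results])
# [('A1', 'good', 'gate-A'), ('B7', 'recapture', 'gate-B')]
```

In a distributed deployment the unfused operators could run on different hosts, and fusing would be chosen where the communication cost outweighs the benefit of spreading the work.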
19.
Summary
Re-focus on Identity
• Biometrics as an enabling technology
Re-thinking on
• Open architecture
• Vendor agnostic solution via biometrics middleware
Big Impact by Big Data and Cloud Technologies
• Biometrics as a Service to Leverage Cloud Computing
Big Data Real Time Platform
• Near real time analytics requirements
20.
6/18/2013
21.
A New Look - Identity and Biometrics Analytics
[Diagram: Big Data Solution Pipeline]
Inputs: Travel Data; Banking Data; Spatial Data; Temporal Data; Real-time feeds; Biometrics Capture Data; Biographic Data; Unstructured data; Social Media; Info on Web; Behavioral data
Components: Stream in Parallel; Big Data Platform; Entity / Identity Resolution; Identification Services, including many Models; Massively Parallel Processing; Real Time; High Volume
Outputs: Report – Descriptive Analytics; Predictive Models; Business Workflow Resolution; Visualization Analytics; Content Analytics