SlideShare uma empresa Scribd logo
1 de 17
Big Data at Aadhaar

Dr. Pramod K Varma       Regunath Balasubramaian
pramod.uid@gmail.com        regunathb@gmail.com
Twitter: @pramodkvarma       Twitter: @RegunathB
Aadhaar at a Glance




         2
India
• 1.2 billion residents
   – 640,000 villages, ~60% lives under $2/day
   – ~75% literacy, <3% pays Income Tax, <20% banking
   – ~800 million mobile, ~200-300 mn migrant workers


• Govt. spends about $25-40 bn on direct subsidies
   – Residents have no standard identity document
   – Most programs plagued with ghost and multiple
     identities causing leakage of 30-40%


                             3
Vision
• Create a common “national identity” for every
  “resident”
  – Biometric backed identity to eliminate duplicates
  – “Verifiable online identity” for portability


• Applications ecosystem using open APIs
  – Aadhaar enabled bank account and payment platform
  – Aadhaar enabled electronic, paperless KYC



                             4
Aadhaar System
• Enrolment
  –   One time in a person’s lifetime
  –   Minimal demographics
  –   Multi-modal biometrics (Fingerprints, Iris)
  –   12-digit unique Aadhaar number assigned


• Authentication
  – Verify “you are who you claim to be”
  – Open API based
  – Multi-device, multi-factor, multi-modal
                               5
Architecture Principles
• Design for scale
   – Every component needs to scale to large volumes
   – Millions of transactions and billions of records
   – Accommodate failure and design for recovery
• Open architecture
   – Use of open standards to ensure interoperability
   – Allow the ecosystem to build libraries to standard APIs
   – Use of open-source technologies wherever prudent
• Security
   – End to end security of resident data
   – Use of open source
   – Data privacy handling (API and data anonymization)


                                    6
Designed for Scale
• Horizontal scalability for all components
   –   “Open Scale-out” is the key
   –   Distributed computing on commodity hardware
   –   Distributed data store and data partitioning
   –   Horizontal scaling of “data store” a must!
   –   Use of right data store for right purpose
• No single point of bottleneck for scaling
• Asynchronous processing throughout the system
   – Allows loose coupling various components
   – Allows independent component level scaling

                              7
Enrolment Volume
• 600 to 800 million UIDs in 4 years
   – 1 million a day
   – 200+ trillion matches every day!!!
• ~5MB per resident
   – Maps to about 10-15 PB of raw data (2048-bit PKI encrypted!)
   – About 30 TB I/O every day
   – Replication and backup across DCs of about 5+ TB of incremental
     data every day
   – Lifecycle updates and new enrolments will continue for ever
• Additional process data
   – Several million events on an average moving through async
     channels (some persistent and some transient)
   – Needing complete update and insert guarantees across data stores

                                    8
Authentication Volume
• 100+ million authentications per day (10 hrs)
   – Possible high variance on peak and average
   – Sub second response
   – Guaranteed audits
• Multi-DC architecture
   – All changes needs to be propagated from enrolment data stores to
     all authentication sites
• Authentication request is about 4 K
   –   100 million authentications a day
   –   1 billion audit records in 10 days (30+ billion a year)
   –   4 TB encrypted audit logs in 10 days
   –   Audit write must be guaranteed

                                       9
Open APIs
• Aadhaar Services
  – Core Authentication API and supporting Best
    Finger Detection, OTP Request APIs
  – New services being built on top
• Aadhaar Open Standards for Plug-n-play
  – Biometric Device API
  – Biometric SDK API
  – Biometric Identification System API
  – Transliteration API for Indian Languages
                         10
Implementation




       11
Patterns & Technologies
• Principles
    • POJO based application implementation
    • Light-weight, custom application container
    • Http gateway for APIs

• Compute Patterns
   • Data Locality
   • Distribute compute (within a OS process and across)

• Compute Architectures
   • SEDA – Staged Event Driven Architecture
   • Master-Worker(s) Compute Grid

• Data Access types
   • High throughput streaming : bio-dedupe, analytics
   • High volume, moderate latency : workflow, UID records
   • High volume , low latency : auth, demo-dedupe,
                                 search – eAadhaar, KYC
Aadhaar Data Stores
                                           (Data consistency challenges..)
Shard        Shard           Shard        Shard
  0            2               6            9
                                                                                            Low latency indexed read (Documents per sec),
                                                             Solr cluster                   Low latency random search (Documents per sec)
Shard       Shard          Shard            (all enrolment records/documents
  a           d              f
                                                  – selected demographics only)


    Shard        Shard
      1            2
                               Shard                                                 Low latency indexed read (Documents per sec),
                                 3
                                                   Mongo cluster                     High latency random search (seconds per read)
   Shard        Shard                  (all enrolment records/documents
     4            5                               – demographics + photo)


                                                                                                 Low latency indexed read (milli-seconds
                       Enrolment
   UID master             DB                                                   MySQL             per read),
    (sharded)                           (all UID generated records - demographics only,          High latency random search (seconds per
                                                        track & trace, enrolment status )        read)


                                                               HBase                High read throughput (MB per sec),
 Region      Region         Region      Region             (all enrolment           Low-to-Medium latency read (milli-seconds per read)
 Ser. 1      Ser. 10        Ser. ..     Ser. 20
                                                     biometric templates)

 Data
Node 1
             Data
            Node 10
                             Data
                            Node ..
                                           Data
                                          Node 20
                                                                  HDFS               High read throughput (MB per sec),
                                                           (all raw packets)         High latency read (seconds per read)



 LUN 1       LUN 2         LUN 3       LUN 4                                         Moderate read throughput,
                                                                     NFS             High latency read (seconds per read)
                                                  (all archived raw packets)
Aadhaar Architecture
                       • Real-time monitoring using Events


• Work distribution
  using SEDA &
  Messaging
• Ability to scale
  within JVM and
  across
• Recovery through
  check-pointing


• Sync Http based
  Auth gateway
• Protocol Buffers &
  XML payloads
• Sharded clusters

                                        • Near Real-time data delivery to warehouse
                                        • Nightly data-sets used to build
                                          dashboards, data marts and reports
Deployment Monitoring
Learnings
• Make everything API based
• Everything fails
  (hardware, software, network, storage)
  – System must recover, retry transactions, and sort of self-
    heal
• Security and privacy should not be an afterthought
• Scalability does not come from one product
• Open scale out is the only way you should go.
  – Heterogeneous, multi-vendor, commodity
    compute, growing linear fashion. Nothing else can
    adapt!
                              16
Thank You!
Dr. Pramod K Varma            Regunath Balasubramaian
pramod.uid@gmail.com             regunathb@gmail.com
Twitter: @pramodkvarma            Twitter: @RegunathB




                         17

Mais conteúdo relacionado

Mais procurados

Introduction oauth 2.0 et openid connect 1.0
Introduction oauth 2.0 et openid connect 1.0Introduction oauth 2.0 et openid connect 1.0
Introduction oauth 2.0 et openid connect 1.0Marc-André Tousignant
 
Presentation citrix desktop virtualization
Presentation   citrix desktop virtualizationPresentation   citrix desktop virtualization
Presentation citrix desktop virtualizationxKinAnx
 
The IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceThe IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceIBM Sverige
 
MOM - Message Oriented Middleware
MOM - Message Oriented MiddlewareMOM - Message Oriented Middleware
MOM - Message Oriented MiddlewarePeter R. Egli
 
Pragmatic RESTful API Design: Apigee Webinar
Pragmatic RESTful API Design: Apigee WebinarPragmatic RESTful API Design: Apigee Webinar
Pragmatic RESTful API Design: Apigee WebinarApigee | Google Cloud
 
Workshop: Advanced Federation Use-Cases with PingFederate
Workshop: Advanced Federation Use-Cases with PingFederateWorkshop: Advanced Federation Use-Cases with PingFederate
Workshop: Advanced Federation Use-Cases with PingFederateCraig Wu
 
Real Application Security (RAS) and Oracle Application Express (APEX)
Real Application Security (RAS) and Oracle Application Express (APEX)Real Application Security (RAS) and Oracle Application Express (APEX)
Real Application Security (RAS) and Oracle Application Express (APEX)Dimitri Gielis
 
Exploring the replication and sharding in MongoDB
Exploring the replication and sharding in MongoDBExploring the replication and sharding in MongoDB
Exploring the replication and sharding in MongoDBIgor Donchovski
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Cloudciti Disaster Recovery as a Service
Cloudciti Disaster Recovery as a Service   Cloudciti Disaster Recovery as a Service
Cloudciti Disaster Recovery as a Service PT Datacomm Diangraha
 
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATION
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATIONIBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATION
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATIONKellton Tech Solutions Ltd
 
Security: Odoo Code Hardening
Security: Odoo Code HardeningSecurity: Odoo Code Hardening
Security: Odoo Code HardeningOdoo
 

Mais procurados (20)

Datapower Steven Cawn
Datapower Steven CawnDatapower Steven Cawn
Datapower Steven Cawn
 
Introduction oauth 2.0 et openid connect 1.0
Introduction oauth 2.0 et openid connect 1.0Introduction oauth 2.0 et openid connect 1.0
Introduction oauth 2.0 et openid connect 1.0
 
Presentation citrix desktop virtualization
Presentation   citrix desktop virtualizationPresentation   citrix desktop virtualization
Presentation citrix desktop virtualization
 
The IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceThe IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse Appliance
 
Redis introduction
Redis introductionRedis introduction
Redis introduction
 
MOM - Message Oriented Middleware
MOM - Message Oriented MiddlewareMOM - Message Oriented Middleware
MOM - Message Oriented Middleware
 
Pragmatic RESTful API Design: Apigee Webinar
Pragmatic RESTful API Design: Apigee WebinarPragmatic RESTful API Design: Apigee Webinar
Pragmatic RESTful API Design: Apigee Webinar
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Workshop: Advanced Federation Use-Cases with PingFederate
Workshop: Advanced Federation Use-Cases with PingFederateWorkshop: Advanced Federation Use-Cases with PingFederate
Workshop: Advanced Federation Use-Cases with PingFederate
 
Real Application Security (RAS) and Oracle Application Express (APEX)
Real Application Security (RAS) and Oracle Application Express (APEX)Real Application Security (RAS) and Oracle Application Express (APEX)
Real Application Security (RAS) and Oracle Application Express (APEX)
 
Exploring the replication and sharding in MongoDB
Exploring the replication and sharding in MongoDBExploring the replication and sharding in MongoDB
Exploring the replication and sharding in MongoDB
 
Rbac
RbacRbac
Rbac
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Mapreduce Tutorial
Mapreduce TutorialMapreduce Tutorial
Mapreduce Tutorial
 
Cloudciti Disaster Recovery as a Service
Cloudciti Disaster Recovery as a Service   Cloudciti Disaster Recovery as a Service
Cloudciti Disaster Recovery as a Service
 
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATION
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATIONIBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATION
IBM INTEGRATION BUS (IIB V10)—DATA ROUTING AND TRANSFORMATION
 
Security: Odoo Code Hardening
Security: Odoo Code HardeningSecurity: Odoo Code Hardening
Security: Odoo Code Hardening
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 

Destaque

practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome themsaipriyadonthula
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagationRegunath B
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsRegunath B
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantomRegunath B
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsRegunath B
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uidAjit Dadresa
 
E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres Regunath B
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantageRegunath B
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 

Destaque (14)

Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
 
Uid
UidUid
Uid
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streams
 
What database
What databaseWhat database
What database
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
 
E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantage
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Semelhante a Aadhaar at 5th_elephant_v3

Real time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch
Real time monitoring-alerting: storing 2Tb of logs a day in ElasticsearchReal time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch
Real time monitoring-alerting: storing 2Tb of logs a day in ElasticsearchAli Kheyrollahi
 
DRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBITShapeBlue
 
Supercharging Data Performance for Real-Time Data Analysis
Supercharging Data Performance for Real-Time Data Analysis Supercharging Data Performance for Real-Time Data Analysis
Supercharging Data Performance for Real-Time Data Analysis Ryft
 
Key-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaKey-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaMatteo Baglini
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overviewRandall Hauch
 
SSD Performance Benchmarking
SSD Performance BenchmarkingSSD Performance Benchmarking
SSD Performance BenchmarkingShirish Jamthe
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Redis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupRedis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupItamar Haber
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopGeorge Ang
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomicsGuy Coates
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure javaRoman Elizarov
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computingTao Li
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scaledatamantra
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Adrianos Dadis
 
Common MongoDB Use Cases
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use CasesDATAVERSITY
 

Semelhante a Aadhaar at 5th_elephant_v3 (20)

Real time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch
Real time monitoring-alerting: storing 2Tb of logs a day in ElasticsearchReal time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch
Real time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch
 
DRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBIT
 
Supercharging Data Performance for Real-Time Data Analysis
Supercharging Data Performance for Real-Time Data Analysis Supercharging Data Performance for Real-Time Data Analysis
Supercharging Data Performance for Real-Time Data Analysis
 
Key-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaKey-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscana
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overview
 
SSD Performance Benchmarking
SSD Performance BenchmarkingSSD Performance Benchmarking
SSD Performance Benchmarking
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Redis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupRedis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetup
 
Data engineering
Data engineeringData engineering
Data engineering
 
Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using Hadoop
 
Openstack swift - VietOpenStack 6thmeeetup
Openstack swift - VietOpenStack 6thmeeetupOpenstack swift - VietOpenStack 6thmeeetup
Openstack swift - VietOpenStack 6thmeeetup
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomics
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure java
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computing
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 
You suck at Memory Analysis
You suck at Memory AnalysisYou suck at Memory Analysis
You suck at Memory Analysis
 
Common MongoDB Use Cases
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use Cases
 

Último

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Aadhaar at 5th_elephant_v3

  • 1. Big Data at Aadhaar Dr. Pramod K Varma Regunath Balasubramaian pramod.uid@gmail.com regunathb@gmail.com Twitter: @pramodkvarma Twitter: @RegunathB
  • 2. Aadhaar at a Glance 2
  • 3. India • 1.2 billion residents – 640,000 villages, ~60% lives under $2/day – ~75% literacy, <3% pays Income Tax, <20% banking – ~800 million mobile, ~200-300 mn migrant workers • Govt. spends about $25-40 bn on direct subsidies – Residents have no standard identity document – Most programs plagued with ghost and multiple identities causing leakage of 30-40% 3
  • 4. Vision • Create a common “national identity” for every “resident” – Biometric backed identity to eliminate duplicates – “Verifiable online identity” for portability • Applications ecosystem using open APIs – Aadhaar enabled bank account and payment platform – Aadhaar enabled electronic, paperless KYC 4
  • 5. Aadhaar System • Enrolment – One time in a person’s lifetime – Minimal demographics – Multi-modal biometrics (Fingerprints, Iris) – 12-digit unique Aadhaar number assigned • Authentication – Verify “you are who you claim to be” – Open API based – Multi-device, multi-factor, multi-modal 5
  • 6. Architecture Principles • Design for scale – Every component needs to scale to large volumes – Millions of transactions and billions of records – Accommodate failure and design for recovery • Open architecture – Use of open standards to ensure interoperability – Allow the ecosystem to build libraries to standard APIs – Use of open-source technologies wherever prudent • Security – End to end security of resident data – Use of open source – Data privacy handling (API and data anonymization) 6
  • 7. Designed for Scale • Horizontal scalability for all components – “Open Scale-out” is the key – Distributed computing on commodity hardware – Distributed data store and data partitioning – Horizontal scaling of “data store” a must! – Use of right data store for right purpose • No single point of bottleneck for scaling • Asynchronous processing throughout the system – Allows loose coupling various components – Allows independent component level scaling 7
  • 8. Enrolment Volume • 600 to 800 million UIDs in 4 years – 1 million a day – 200+ trillion matches every day!!! • ~5MB per resident – Maps to about 10-15 PB of raw data (2048-bit PKI encrypted!) – About 30 TB I/O every day – Replication and backup across DCs of about 5+ TB of incremental data every day – Lifecycle updates and new enrolments will continue for ever • Additional process data – Several million events on an average moving through async channels (some persistent and some transient) – Needing complete update and insert guarantees across data stores 8
  • 9. Authentication Volume • 100+ million authentications per day (10 hrs) – Possible high variance on peak and average – Sub second response – Guaranteed audits • Multi-DC architecture – All changes needs to be propagated from enrolment data stores to all authentication sites • Authentication request is about 4 K – 100 million authentications a day – 1 billion audit records in 10 days (30+ billion a year) – 4 TB encrypted audit logs in 10 days – Audit write must be guaranteed 9
  • 10. Open APIs • Aadhaar Services – Core Authentication API and supporting Best Finger Detection, OTP Request APIs – New services being built on top • Aadhaar Open Standards for Plug-n-play – Biometric Device API – Biometric SDK API – Biometric Identification System API – Transliteration API for Indian Languages 10
  • 12. Patterns & Technologies • Principles • POJO based application implementation • Light-weight, custom application container • Http gateway for APIs • Compute Patterns • Data Locality • Distribute compute (within a OS process and across) • Compute Architectures • SEDA – Staged Event Driven Architecture • Master-Worker(s) Compute Grid • Data Access types • High throughput streaming : bio-dedupe, analytics • High volume, moderate latency : workflow, UID records • High volume , low latency : auth, demo-dedupe, search – eAadhaar, KYC
  • 13. Aadhaar Data Stores (Data consistency challenges..) Shard Shard Shard Shard 0 2 6 9 Low latency indexed read (Documents per sec), Solr cluster Low latency random search (Documents per sec) Shard Shard Shard (all enrolment records/documents a d f – selected demographics only) Shard Shard 1 2 Shard Low latency indexed read (Documents per sec), 3 Mongo cluster High latency random search (seconds per read) Shard Shard (all enrolment records/documents 4 5 – demographics + photo) Low latency indexed read (milli-seconds Enrolment UID master DB MySQL per read), (sharded) (all UID generated records - demographics only, High latency random search (seconds per track & trace, enrolment status ) read) HBase High read throughput (MB per sec), Region Region Region Region (all enrolment Low-to-Medium latency read (milli-seconds per read) Ser. 1 Ser. 10 Ser. .. Ser. 20 biometric templates) Data Node 1 Data Node 10 Data Node .. Data Node 20 HDFS High read throughput (MB per sec), (all raw packets) High latency read (seconds per read) LUN 1 LUN 2 LUN 3 LUN 4 Moderate read throughput, NFS High latency read (seconds per read) (all archived raw packets)
  • 14. Aadhaar Architecture • Real-time monitoring using Events • Work distribution using SEDA & Messaging • Ability to scale within JVM and across • Recovery through check-pointing • Sync Http based Auth gateway • Protocol Buffers & XML payloads • Sharded clusters • Near Real-time data delivery to warehouse • Nightly data-sets used to build dashboards, data marts and reports
  • 16. Learnings • Make everything API based • Everything fails (hardware, software, network, storage) – System must recover, retry transactions, and sort of self- heal • Security and privacy should not be an afterthought • Scalability does not come from one product • Open scale out is the only way you should go. – Heterogeneous, multi-vendor, commodity compute, growing linear fashion. Nothing else can adapt! 16
  • 17. Thank You! Dr. Pramod K Varma Regunath Balasubramaian pramod.uid@gmail.com regunathb@gmail.com Twitter: @pramodkvarma Twitter: @RegunathB 17