Search data store for the world's largest biometric identity system


                    Regunath Balasubramanian         Shashikant Soni
                      regunathb@gmail.com      soni.shashikant@gmail.com
                       twitter @regunathb




CONFIDENTIAL: For limited circulation only                                 Slide 1
India
● 1.2 billion residents
   ● 640,000 villages; ~60% live on under $2/day
   ● ~75% literacy; <3% pay income tax; <20% use banking
   ● ~800 million mobile; ~200-300 million migrant workers

● Govt. spends about $25-40B on direct subsidies
   ● Residents have no standard identity document
   ● Most programs are plagued by ghost and multiple identities, causing
     leakage of 30-40%




                                                                        Slide 2
Aadhaar
● Create a common ‘national identity’ for every ‘resident’
   ● Biometric-backed identity to eliminate duplicates
   ● ‘Verifiable online identity’ for portability
● Applications ecosystem using open APIs
   ● Aadhaar-enabled bank account and payment platform
   ● Aadhaar-enabled electronic, paperless KYC (Know Your Customer)




                                                             Slide 3
Search Requirements
● Multi-attribute query like:
   name contains ‘regunath’ AND city = ‘bangalore’ AND
   address contains ‘J P Nagar’ AND YearOfBirth = ……


● Search data for 1.2B residents, including photo and history
   ● Average record size: 35 KB
● Response times in milliseconds
● Open scale-out
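
The figures above imply a rough lower bound on storage. The sketch below is only back-of-the-envelope arithmetic; it assumes that "35Kb" means 35 kilobytes and uses decimal units, and it ignores indexes, replication and storage overhead. The function names are illustrative.

```python
# Back-of-the-envelope sizing from the figures on this slide.
# Assumption: "35Kb" means 35 kilobytes, in decimal units (1 KB = 1,000 bytes).
RESIDENTS = 1_200_000_000
AVG_RECORD_BYTES = 35 * 1_000


def raw_dataset_bytes(records: int, avg_record_bytes: int) -> int:
    """Size of one unreplicated copy of the data, before indexes and overhead."""
    return records * avg_record_bytes


def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 1_000_000_000_000


total_tb = to_terabytes(raw_dataset_bytes(RESIDENTS, AVG_RECORD_BYTES))
print(f"{total_tb:.0f} TB")  # 42 TB
```

At this size a single copy of the data does not fit on one commodity node, which is consistent with the scale-out requirement listed above.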


                                                         Slide 4
Why MongoDB
● Auto-sharding
● Replication
● Failover
   … Essentially an AP (slaveOk) data store in CAP parlance

● Evolving schema
● Map-Reduce for analysis
● Full text search
   ● Compound (or) multi-keys


                                                              Slide 5
Design

[Diagram: request flow]
   Client App → Search API, with query Name=‘abcde’, Address=‘some place’, Year=1980
   1. Solr indexes: Name: ‘abcde’, Address: ‘some place’, year: 1980
   2. MongoDB: { _id:123456789, name: ‘abcde’, year:1980, ….. }



● Read/Search
   ● Sharded Solr indexes for search
   ● Keyed document read from MongoDB
● Write
   ● Eventual consistency (across data sources) driven by the
     application
   ● Composite MongoDB-Solr app persistence handler
                                                        Slide 6
Implementation and Deployment
   ● Start - 4M records in 2 shards
   ● Current - 250M records in 8 shards (8 x ~2 TB x 3 replicas)
   ● Performance, Reliability & Durability
      ● SlaveOk
      ● getLastError, Write Concern: availability vs durability
          j = journaling
          w = nodes-to-write
   ● Replica-sets / Shards – how?
      [Diagram: deployment layout
         RS 1:    three members
         RS 2:    three members
         Config:  Config 1, Config 2, Config 3
         Router:  three routers
         Legend:  Primary, Secondary, Arbiter]
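
The availability versus durability trade-off behind `w` and `j` can be illustrated with a toy model. The sketch below is a deliberate simplification and not MongoDB's actual protocol: `MemberAck` and `write_acknowledged` are hypothetical names, and the real semantics of `j` differ between server versions.

```python
from dataclasses import dataclass


@dataclass
class MemberAck:
    """What one replica-set member reports for a write (toy model)."""
    applied: bool     # the write has been applied in memory
    journaled: bool   # the write has reached the on-disk journal


def write_acknowledged(acks: list[MemberAck], w: int, j: bool) -> bool:
    """True once at least `w` members have the write.

    With j=True a member only counts if it has also journaled the write.
    """
    if j:
        confirmed = sum(1 for a in acks if a.applied and a.journaled)
    else:
        confirmed = sum(1 for a in acks if a.applied)
    return confirmed >= w


# One member has journaled, one has only applied, one is lagging.
acks = [MemberAck(True, True), MemberAck(True, False), MemberAck(False, False)]
print(write_acknowledged(acks, w=1, j=False))  # True
print(write_acknowledged(acks, w=2, j=False))  # True
print(write_acknowledged(acks, w=2, j=True))   # False
```

Raising `w` or requiring `j` makes an acknowledged write harder to lose, at the cost of the client waiting longer or failing when members are slow or unavailable.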
                                                                           Slide 7
Monitoring and Troubleshooting
● Monitoring tools evaluated
   ● MMS
   ● munin
● Manual approach - daily ritual
   ● RS, DB, config, router - health and stats
● Problem analysis stats
   ● mongostat, iostat, currentOps, logs
   ● Client connections
● Stats for storage, shards addition
   ● Data file size
   ● Shard data distribution
   ● Replication
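
One way to turn the "shard data distribution" check into a single number is a skew ratio. The sketch below is an assumption about how such a check could look, not a tool the authors describe; `shard_skew` is a hypothetical helper and the document counts are made-up illustrative values.

```python
def shard_skew(doc_counts: dict[str, int]) -> float:
    """Ratio of the fullest shard to the mean shard; 1.0 means perfectly even."""
    if not doc_counts:
        raise ValueError("no shards given")
    mean = sum(doc_counts.values()) / len(doc_counts)
    if mean == 0:
        return 1.0  # all shards empty: treat as balanced
    return max(doc_counts.values()) / mean


# Hypothetical per-shard document counts, for illustration only.
counts = {"shard1": 30, "shard2": 30, "shard3": 60}
print(shard_skew(counts))  # 1.5
```

A ratio that keeps growing between daily checks would suggest that one shard is filling faster than the others, which is the kind of signal that feeds a decision to add shards.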
                                                Slide 8
Key Learnings on MongoDB
● Indexing 32 fields
   ● Compound indexes
   ● Multi-key indexes
       {…"indexes" : [{ "email":"john.doe@email.com", "phone":"123456789" }] }
       db.coll.find({ "indexes.email" : "john.doe@email.com" })
   ● Indexes use a B-tree
   ● Many fields to index
   ● Performs well up to 1-2M documents
   ● Best if the index fits in memory
● Data replication, RS failover
   ● Rollback when RS goes out of sync
       Manual restore (physical data copy)
       Restarting a very stale node
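
The multi-key pattern shown on this slide, where searchable attributes are gathered into one `indexes` array, can be illustrated with a toy index. This sketch is not how MongoDB stores indexes (it uses a hash map rather than a B-tree); `MultikeyIndex` is a hypothetical class that only shows the idea of one index entry per array element.

```python
from collections import defaultdict


class MultikeyIndex:
    """Toy multi-key index over an array field of sub-documents."""

    def __init__(self, array_field: str) -> None:
        self.array_field = array_field
        # (key, value) -> set of document _ids that contain that pair
        self._entries: dict[tuple[str, object], set[int]] = defaultdict(set)

    def add(self, doc: dict) -> None:
        """Create one index entry per key/value pair in each array element."""
        for element in doc.get(self.array_field, []):
            for key, value in element.items():
                self._entries[(key, value)].add(doc["_id"])

    def find(self, dotted_path: str, value: object) -> set[int]:
        """Look up a path such as 'indexes.email', as in the query above."""
        prefix, _, key = dotted_path.partition(".")
        if prefix != self.array_field:
            raise KeyError(f"{dotted_path!r} is not covered by this index")
        return set(self._entries.get((key, value), set()))


index = MultikeyIndex("indexes")
index.add({"_id": 1, "indexes": [{"email": "john.doe@email.com",
                                  "phone": "123456789"}]})
print(index.find("indexes.email", "john.doe@email.com"))  # {1}
```

The appeal of the pattern is that a single index definition covers every attribute placed in the array, which matters when many fields need to be searchable.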
                                                                            Slide 9
Questions?



