Is the Elephant in the room?

                                         Regunath B

                               regunathb@gmail.com
                                Twitter : @RegunathB
Quick read 1.8 million words?




The story is about a battle between great kings and sons, with the principal characters being
Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.
                                                            Source : The Gramener blog for visualizations –
                                                Analysis of the entire text contained in the Mahabharatha
                                                       (http://blog.gramener.com/category/visualisations)
Insights from Social Media




                         Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
                                         (http://ttwick.com/blog/bill-gates-twitter-social-media/)
Insights from Social Media




                                         Source : Impact page of Satyamevjayate
                             (http://www.satyamevjayate.in/impact/impact.php/)
What is Big Data?

●   Big Data challenges and opportunities arise when information in an enterprise
    demonstrates the following characteristics:

     –   Volume
          ●   Transaction data from enterprise systems
                   –   For example : Financial transactions, Orders
     –   Variety
          ●   Structured and Unstructured data
                   –   For example : Customer contact, Social Media, Biometrics
     –   Velocity
          ●   High information arrival rates
                   –   For example : Application events, Tagging, Rating of content



●   Big Data opportunities arise when the enterprise is able to derive Value from the
    data characteristics defined above
Food for thought.... on theorems and laws
●   Do hardware and technology trends affect your technology selection?
     –   CPU, RAM and disk size double every 18-24 months [Moore’s law]
     –   Disk seek time remains nearly constant at around 5% speed-up per year


●   Data Seek vs. Data transfer
     –   Software that leverages one of the above (or) a combination:
         B+ tree index, LSM tree index, “Fractal tree”


●   CAP theorem effect – ability to achieve only 2 of 3 properties of shared-
    data systems : data Consistency, system Availability and tolerance to
    network Partitions


●   Bandwidth is the most scarce commodity in a Data Center
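
The gap between the two hardware trends above can be made concrete with a little arithmetic. This is a minimal sketch: the 10-year horizon and the function names are assumptions chosen for illustration, and the growth rates are the ones quoted on the slide.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Growth of a quantity that doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


def compound(rate_per_year: float, years: float) -> float:
    """Growth of a quantity that improves by `rate_per_year` each year."""
    return (1 + rate_per_year) ** years


if __name__ == "__main__":
    years = 10  # assumed horizon, for illustration only
    slow = growth_factor(24, years)   # doubling every 24 months: 2**5 = 32
    fast = growth_factor(18, years)   # doubling every 18 months: roughly 100x
    seek = compound(0.05, years)      # 5% per year: roughly 1.6x
    print(f"capacity grows {slow:.0f}x to {fast:.0f}x over {years} years")
    print(f"seek time improves only {seek:.2f}x over {years} years")
```

Under these assumptions capacity grows by one to two orders of magnitude while seeks barely improve, which is why index structures that trade random seeks for sequential transfer become more attractive as data grows.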
Aadhaar Patterns & Technologies
•   Principles
      •   POJO based application implementation
      •   Light-weight, custom application container
      •   Http gateway for APIs

•   Compute Patterns
      •   Data Locality
      •   Distribute compute (within an OS process and across)

•   Compute Architectures
      •   SEDA – Staged Event Driven Architecture
      •   Master-Worker(s) Compute Grid

•   Data Access types
      •   High throughput streaming : bio-dedupe, analytics
      •   High volume, moderate latency : workflow, UID records
      •   High volume, low latency : auth, demo-dedupe, search – eAadhaar, KYC
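
The SEDA pattern named above can be sketched as a chain of stages, each with its own event queue and worker threads. This is an illustrative sketch only, not the Aadhaar implementation: the `Stage` class, the three stage names and the handlers are all hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling a stage's workers to shut down


class Stage:
    """One SEDA stage: an inbox queue drained by a small pool of workers."""

    def __init__(self, name, handler, workers=2, downstream=None):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.inbox.put(result)

    def stop(self):
        # One sentinel per worker; events queued earlier are drained first.
        for _ in self.threads:
            self.inbox.put(_STOP)
        for thread in self.threads:
            thread.join()


def run_pipeline(raw_events):
    """Push events through parse -> square -> sink and return the results."""
    collected = []
    sink = Stage("sink", collected.append, workers=1)
    square = Stage("square", lambda n: n * n, downstream=sink)
    parse = Stage("parse", int, downstream=square)
    stages = [parse, square, sink]
    for stage in stages:
        stage.start()
    for event in raw_events:
        parse.inbox.put(event)
    # Stop upstream stages first so nothing is lost in flight.
    for stage in stages:
        stage.stop()
    return sorted(collected)  # worker interleaving does not preserve order


if __name__ == "__main__":
    print(run_pipeline(["1", "2", "3"]))
```

Because each stage owns its queue and thread pool, a slow stage can be given more workers without touching the others; the same shape extends across processes when the queues are replaced by a messaging system.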
Aadhaar Architecture
•   Real-time monitoring using Events
•   Work distribution using SEDA & Messaging
•   Ability to scale within JVM and across
•   Recovery through check-pointing
•   Sync Http based Auth gateway
•   Protocol Buffers & XML payloads
•   Sharded clusters
•   Near Real-time data delivery to warehouse
•   Nightly data-sets used to build dashboards, data marts and reports
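
Recovery through check-pointing can be sketched as follows. This is a minimal illustration under assumptions: the JSON checkpoint file, the `next_index` field and the `fail_at` crash simulation are hypothetical and not taken from the Aadhaar system.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, handle, fail_at=None):
    """Process items in order, recording progress after each one.

    On restart, work resumes from the last recorded position.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        handle(items[i])
        # Write to a temp file, then rename, so the checkpoint is never
        # left half-written if the process dies mid-write.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)


def demo():
    processed = []
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        items = ["a", "b", "c", "d"]
        try:
            process_with_checkpoint(items, path, processed.append, fail_at=2)
        except RuntimeError:
            pass  # the first run "crashes" before item index 2
        process_with_checkpoint(items, path, processed.append)  # resumes
    return processed


if __name__ == "__main__":
    print(demo())
```

A crash between handling an item and writing the checkpoint would cause that item to be handled again on restart, so this sketch gives at-least-once processing and the handler should be idempotent.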
Putting data to work at Aadhaar
Deployment Monitoring
Big Data at Flipkart
 ●   Website traffic
      –   Millions of page hits per day – product catalogs, item availability, promotions,
          search
      –   Millions of active sessions and shopping carts
      –   Latencies measured in low single-digit milliseconds
 ●   Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)
      –   Electronic inventory – MP3, eBooks, movies
 ●   New business models, newer channels
 ●   Understanding users, user profiles, social media, experience
      –   Terabytes of logs containing browsing behavior, data from multiple
          engagement channels
      –   Recommendations based on millions of possible item matches and relevance
          algorithms
Is the Elephant in the room?




From Wikipedia:

"Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored
or goes unaddressed.




Big Data opportunities and challenges are real and present -
It is the Elephant in the room.
Some takeaways from experience


●   Make everything API based
●   Everything fails (hardware, software, network, storage)
     –   System must recover, retry transactions, and sort of self-heal
●   Security and privacy should not be an afterthought
●   Scalability does not come from one product
     –   Watch out for solution and technology stereotyping
●   Open scale out is the only way to go
     –   Heterogeneous, multi-vendor, commodity compute, growing in a linear fashion.
         Nothing else can adapt!
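
The "everything fails, so retry" takeaway can be sketched as a retry wrapper with exponential backoff. This is a hedged illustration: `TransientError`, `retry` and `make_flaky` are hypothetical names, and the delays are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=5, base_delay=0.01, max_delay=1.0, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


def make_flaky(failures):
    """Build an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated outage")
        return state["calls"]

    return operation


if __name__ == "__main__":
    print(retry(make_flaky(2)))  # succeeds on the third call
```

Retries are only safe when the retried transaction is idempotent or can be de-duplicated downstream, which is one reason to make everything API based with well-defined request identities.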
