Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved.
Fingerprinting Latent Structure in Data
MRITYUNJAY KUMAR & GUNTUR RAVINDRA
TECHNOLOGY EXCELLENCE GROUP
TALENTICA SOFTWARE
PRESENTED AT DAIR (DATA ANALYTICS AND INTELLIGENCE RESEARCH, INDIAN INSTITUTE OF TECHNOLOGY, DELHI)
Agenda
• Challenge with building data-driven algorithms
  • Small data
• Introduction to data fingerprinting
• Two problem statements
  • Solving a question complexity problem
  • Solving an image recognition problem
• Fingerprinting the structure in data
  • Extracting structure
  • Representing structure as a signature
• Other complex problems
What is data fingerprinting?
• A method to represent a block of data as an entity
• Applications: easy validation, proof of originality, tamper detection, data loss prevention (DLP)
• Classical techniques
  • Bloom filters, cryptographic hashes
• Main issues with classical fingerprinting
  • Fingerprints do not capture data semantics
  • A large number of fingerprints leads to complexity
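To illustrate the point about semantics, here is a minimal sketch using a cryptographic hash from the Python standard library. The `fingerprint` helper and the two example strings are assumptions made for illustration; they are not taken from the talk. Two questions that mean the same thing differ by one character, yet their SHA-256 fingerprints are unrelated.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Classical fingerprint: SHA-256 digest of the UTF-8 bytes (illustrative helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two near-identical questions (hypothetical inputs).
a = "How many students are in the class?"
b = "How many students are in the class"  # same question, trailing '?' dropped

fp_a, fp_b = fingerprint(a), fingerprint(b)

# Count hex positions where the two digests happen to agree.
matching = sum(ca == cb for ca, cb in zip(fp_a, fp_b))

print(fp_a)
print(fp_b)
print("matching hex positions out of 64:", matching)
```

The digests are useful for tamper detection precisely because any change scrambles them, which is also why they say nothing about how similar two inputs are.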
Two Problems
Recognizing question complexity
Recognizing question complexity
Recognizing structural deformation in cells
Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
Data-driven algorithms with Small-Data
• Need for problem-specific data
• Rule-based approaches
  • Rule-based approaches are easy to implement
  • Not all data characteristics can be captured as rules
  • Rules do not automatically adapt to the data
• Machine learning approach
  • ML approaches need large amounts of data
  • Generic models and open-source data are not suitable for application-specific needs
  • Can build complex structures and designs
Architecting a solution
• Knowledge has a latent structure
  • Sequence, geometry
• There can be hierarchies of structures
• Convert structure to a computational representation
• Objective: the context of the application's capabilities, which influences the computational representation
Problem Formulation
A set of elements: images, questions, text messages
An objective
A subset of structures relevant to the objective
How do we define this subset, and how do we find it?
Transformation of elements into a structure and hence a computational entity
A human in the loop
Structures in Data
How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
How many students are in the class?
Structures in Data
Intensity Projections
Oriented gradients
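As a rough illustration of these two image structures, the sketch below computes row and column intensity projections and a magnitude-weighted histogram of gradient orientations with NumPy. The function names, the 8-bin choice, and the toy image are assumptions for illustration; the slides do not specify how the authors computed either structure.

```python
import numpy as np


def intensity_projections(img):
    """Sum pixel intensities along each axis: one profile per row, one per column."""
    return img.sum(axis=1), img.sum(axis=0)


def orientation_histogram(img, bins=8):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))      # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)    # fold directions into [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy image (hypothetical): a bright rectangle on a dark background.
img = np.zeros((16, 16))
img[4:12, 6:10] = 1.0

rows, cols = intensity_projections(img)
hist = orientation_histogram(img)

print("row projection:", rows)
print("column projection:", cols)
print("orientation histogram:", np.round(hist, 3))
```

Both outputs are short fixed-length vectors, which is what makes them convenient as inputs to a later encoding step.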
Problem Formulation
A function that maps a structure to a vector
The inverse of the function results in one of many structures
For computational ease we make the vector a binary bit-vector
Goal is to find the function so as to satisfy the constraints
This is a constrained optimization formulation
Solution : Optimization formulation
• Based on the problem formulation
  • We have an optimization formulation that has an inverse that results in the variable itself or a subset of variables
  • A related function is a neural auto-encoder
• Solution boils down to
  • Training an auto-encoder with one class of data
  • Recognizing data class involves
    • Data clustering
    • Human intelligence/visual inspection to mark clusters
    • Data in clusters used to train the auto-encoder
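The training step can be sketched with a very small auto-encoder written in NumPy. Everything here is an assumption made for illustration: the `TinyAutoencoder` class, the single sigmoid hidden layer, the 0.5 threshold used to turn the code into a bit-vector, and the synthetic "one class" data. The talk does not describe the authors' architecture or training setup.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyAutoencoder:
    """One hidden layer: sigmoid code units, linear reconstruction (illustrative)."""

    def __init__(self, n_in, n_code, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_code))
        self.b1 = np.zeros(n_code)
        self.W2 = rng.normal(0.0, 0.1, (n_code, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def reconstruct(self, X):
        return self.encode(X) @ self.W2 + self.b2

    def fit(self, X, epochs=3000, lr=0.2):
        """Full-batch gradient descent on half the mean squared reconstruction error."""
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            err = (H @ self.W2 + self.b2) - X           # output-layer error
            dH = (err @ self.W2.T) * H * (1.0 - H)      # back-propagate through sigmoid
            self.W2 -= lr * (H.T @ err) / n
            self.b2 -= lr * err.mean(axis=0)
            self.W1 -= lr * (X.T @ dH) / n
            self.b1 -= lr * dH.mean(axis=0)
        return self

    def signature(self, X):
        """Threshold the code units at 0.5 to obtain a binary bit-vector."""
        return (self.encode(X) > 0.5).astype(np.uint8)


# Synthetic stand-in for "one class of data": points near a line in 8 dimensions.
rng = np.random.default_rng(1)
t = rng.uniform(-0.5, 0.5, (200, 1))
direction = rng.normal(size=(1, 8))
direction /= np.linalg.norm(direction)
X_in = 0.5 + t * direction + rng.normal(0.0, 0.01, (200, 8))

model = TinyAutoencoder(n_in=8, n_code=3).fit(X_in)
train_error = np.mean((model.reconstruct(X_in) - X_in) ** 2, axis=1)
codes = model.signature(X_in)

print("mean reconstruction error on the training class:", train_error.mean())
print("bit-vector signature of the first element:", codes[0])
```

Because the model only ever sees one class, data with a different structure should reconstruct poorly, which is the property the recognition step relies on.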
Recognition : Cell Structure
Recognition : Question Complexity
How much can the SP alter income tax in Scotland?
What is stage 1 in the life of a bill?
Who is the President of Egypt?
Why do some people purposely resist officers of the law?
Why is the need for acceptance of punishment needed?
Why would one plead guilty to a crime involving civil disobedience?
Why is giving a defiant speech sometimes more harmful for the individual?
Why did Harvard end its early admission program?
Solution : Recognition
• The auto-encoder output has distortions
  • Detect the distortion
  • Quantify the distortion
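One possible way to detect and quantify the distortion is sketched below. It assumes the distortion is measured as the mean squared difference between an input and its reconstruction, and that the decision threshold is the training mean plus `k` standard deviations. Both choices, and all function names, are assumptions for illustration; the slides do not say which measure the authors used.

```python
import numpy as np


def distortion(x, x_hat):
    """Quantify: mean squared difference between each input row and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(train_scores, k=3.0):
    """Assumed rule: mean plus k standard deviations of the training-class distortion."""
    scores = np.asarray(train_scores, dtype=float)
    return float(scores.mean() + k * scores.std())


def is_known_class(scores, threshold):
    """Detect: an element is accepted when its distortion stays under the threshold."""
    return np.asarray(scores, dtype=float) <= threshold


# Hypothetical inputs and reconstructions: the second row is badly reconstructed.
x = [[0.0, 0.0], [1.0, 1.0]]
x_hat = [[0.0, 0.0], [1.0, 3.0]]
scores = distortion(x, x_hat)

# Hypothetical distortion scores observed on the training class.
threshold = fit_threshold([0.1, 0.2, 0.3])

print("distortion per element:", scores)
print("threshold:", threshold)
print("accepted as known class:", is_known_class(scores, threshold))
```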
Building Complexity
• Incremental addition of data classes
• Using stacking
  • Unique binary code injected in each stacked layer
  • Collapse stacked layers into a classification model → redeploy
Data Type | Test Cases | True Positive | False Positive | True Negative | False Negative
With classes like in training data | 1781 | 1774 | NA | NA | 7
With classes not like in training data | 8789 | NA | 13 | 8776 | NA
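The counts in the table can be turned into rates with a little arithmetic. The sketch below assumes that a "true positive" is a known-class element that was recognized and a "true negative" is an unseen-class element that was rejected; the variable names are illustrative.

```python
# Counts copied from the results table.
known = {"cases": 1781, "tp": 1774, "fn": 7}      # classes like in training data
unseen = {"cases": 8789, "fp": 13, "tn": 8776}    # classes not like in training data

# Share of known-class elements that were recognized.
recall = known["tp"] / (known["tp"] + known["fn"])

# Share of unseen-class elements that were correctly rejected.
specificity = unseen["tn"] / (unseen["tn"] + unseen["fp"])

print(f"recall on known classes: {recall:.4f}")             # 0.9961
print(f"specificity on unseen classes: {specificity:.4f}")  # 0.9985
```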
Summary
• A large number of applications are still small-data applications
• Data has latent structure
  • Extraction is objective based and data specific
• We can harness data-hungry algorithms for small-data applications
  • Use structures instead of raw data
  • Auto-encoders are powerful tools
  • Build incremental complexity

Data Fingerprinting for Small Data Problems Using Autoencoders

  • 1. Fingerprinting Latent Structure in Data. Mrityunjay Kumar & Guntur Ravindra, Technology Excellence Group, Talentica Software. Presented at DAIR (Data Analytics and Intelligence Research, Indian Institute of Technology, Delhi). Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved.
  • 2. Agenda
      - Challenge with building data-driven algorithms
      - Small data
      - Introduction to data fingerprinting
      - Two problem statements
      - Solving a question complexity problem
      - Solving an image recognition problem
      - Fingerprinting the structure in data
      - Extracting structure
      - Representing structure as a signature
      - Other complex problems
  • 3. What is data fingerprinting
      - A method to represent a block of data as an entity
      - Applications: easy validation, proof of originality, tamper detection, DLP
      - Classical techniques
          - Bloom filters, cryptographic hashes
      - Main issues with fingerprinting
          - Do not capture data semantics
          - Large number of fingerprints → complexity
  • 4. Two Problems
  • 5. Recognizing question complexity
  • 6. Recognizing question complexity
  • 7. Recognizing structural deformation in cells. Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
  • 8. Data-driven algorithms with Small-Data
      - Need for problem-specific data
      - Rule-based approaches
          - Rule-based approaches are easy to implement
          - Not all data characteristics can be captured as rules
          - Do not automatically adapt to the data
      - Machine learning approach
          - ML approaches need large amounts of data
          - Generic models and open-source data are not suitable for application-specific needs
          - Can build complex structures and designs
  • 9. Architecting a solution
      - Knowledge has a latent structure
          - Sequence, geometry
      - There can be hierarchies of structures
      - Convert structure to a computational representation
      - Objective: context of application capabilities
      - Influences computational representation
  • 10. Problem Formulation
      - A set of elements: images, questions, text messages
      - An objective
      - A subset of structures relevant to an objective: how do we define it and how do we find it?
      - Transformation of elements into a structure and hence a computational entity
      - A human in the loop
  • 11. Structures in Data
      - How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
      - How many students are in the class?
  • 12. Structures in Data
      - Intensity projections
      - Oriented gradients
  • 13. Problem Formulation
      - A function that maps a structure to a vector
      - The inverse of the function results in one of many structures
      - For computational ease we make the vector a binary bit-vector
      - Goal is to find the function so as to satisfy the constraints
      - This is a constrained optimization formulation
  • 14. Solution: Optimization formulation
      - Based on the problem formulation
          - We have an optimization formulation that has an inverse that results in the variable itself or a subset of variables
          - A related function is a neural auto-encoder
      - Solution boils down to
          - Training an auto-encoder with one class of data
      - Recognizing data class involves
          - Data clustering
          - Human intelligence/visual inspection to mark clusters
          - Data in clusters used to train the auto-encoder
  • 15. Recognition: Cell Structure
  • 16. Recognition: Question Complexity
      - How much can the SP alter income tax in Scotland?
      - What is stage 1 in the life of a bill?
      - Who is the President of Egypt?
      - Why do some people purposely resist officers of the law?
      - Why is the need for acceptance of punishment needed?
      - Why would one plead guilty to a crime involving civil disobedience?
      - Why is giving a defiant speech sometimes more harmful for the individual?
      - Why did Harvard end its early admission program?
  • 17. Solution: Recognition
      - The auto-encoder output has distortions
      - Detect the distortion
      - Quantify the distortion
  • 18. Building Complexity
      - Incremental addition of data classes
      - Using stacking
      - Unique binary code injected in each stacked layer
      - Collapse stacked layers into a classification model
      - Redeploy
  • 19.
      Data type                                 Test cases  True positive  False positive  True negative  False negative
      With classes like in training data              1781           1774              NA             NA               7
      With classes not like in training data          8789             NA              13           8776              NA
  • 20. Summary
      - A large number of applications are still small-data applications
      - Data has latent structure
      - Extraction is objective based and data specific
      - We can harness data-hungry algorithms for small-data applications
      - Use structures instead of raw data
      - Auto-encoders are powerful tools
      - Build incremental complexity
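
Slide 3 says that classical fingerprints such as cryptographic hashes do not capture data semantics. A minimal sketch of that point, using Python's standard `hashlib` (the helper names and the example strings are illustrative choices, not taken from the talk): two questions that mean the same thing get unrelated digests.

```python
import hashlib

def sha256_fingerprint(text: str) -> str:
    """Return the SHA-256 digest of `text` as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

q1 = "How many students are in the class?"
q2 = "How many students are in the class"    # same question, "?" dropped

fp1 = sha256_fingerprint(q1)
fp2 = sha256_fingerprint(q2)

print(fp1 == sha256_fingerprint(q1))   # True: identical input, identical digest
print(fp1 == fp2)                      # False: a one-character edit changes the digest
print(differing_hex_digits(fp1, fp2), "of", len(fp1), "hex digits differ")
```

This behaviour is what makes hashes useful for tamper detection, and also why they say nothing about whether two blocks of data are similar in structure or meaning.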
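
Slide 12 names intensity projections and oriented gradients as structures in image data, without saying how they were computed. The sketch below is one plausible reading, in plain Python: row and column sums for the projections, and a magnitude-weighted histogram of gradient orientations in the spirit of histogram-of-oriented-gradients features. The function names, the central-difference gradient, and the four-bin layout are all assumptions.

```python
import math

def intensity_projections(img):
    """Row sums and column sums of a 2-D intensity grid (list of equal-length rows)."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols

def orientation_histogram(img, bins=4):
    """Histogram of unsigned gradient orientations over [0, 180) degrees.

    Gradients use central differences on interior pixels; each pixel votes
    with its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if gx == 0 and gy == 0:
                continue                      # flat region, no orientation
            if gx < 0 or (gx == 0 and gy < 0):
                gx, gy = -gx, -gy             # fold opposite directions together
            angle = math.degrees(math.atan2(gy, gx))   # now in (-90, 90]
            if angle < 0:
                angle += 180.0
            hist[min(int(angle // bin_width), bins - 1)] += math.hypot(gx, gy)
    return hist

# Toy 5x5 image: a bright vertical bar in the middle column.
img = [[0, 0, 9, 0, 0] for _ in range(5)]

rows, cols = intensity_projections(img)
print(rows)                          # [9, 9, 9, 9, 9]
print(cols)                          # [0, 0, 45, 0, 0]
print(orientation_histogram(img))    # [54.0, 0.0, 0.0, 0.0]
```

The column projection localizes the bar, and all gradient mass falls in the first orientation bin because intensity changes only along the horizontal axis.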
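
Slides 14 and 17 describe training an auto-encoder on one class of data and then recognizing that class by detecting and quantifying the distortion in the auto-encoder output. The following is a toy sketch of that idea, not the authors' model: a tied-weight linear auto-encoder with a single real-valued code (the talk uses a binary bit-vector), trained by plain gradient descent on made-up 2-D points. The data, the learning rate, and `THRESHOLD` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct(w, x):
    """Encode x as the single number z = w.x, then decode it as z * w (tied weights)."""
    z = dot(w, x)
    return [z * wi for wi in w]

def distortion(w, x):
    """Squared reconstruction error: how much the auto-encoder distorts x."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(w, x)))

def train(data, lr=0.05, epochs=200):
    """Full-batch gradient descent on the mean distortion over `data`."""
    w = [0.5, 0.5]                                     # fixed start, for reproducibility
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            z = dot(w, x)
            r = [xi - z * wi for xi, wi in zip(x, w)]  # residual x - x_hat
            rw = dot(r, w)
            for k in range(2):
                # derivative of |x - (w.x) w|^2 with respect to w_k
                grad[k] += -2.0 * (rw * x[k] + z * r[k])
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

# One class of training data: points on the line y = 2x (hypothetical toy data).
train_class = [(t, 2.0 * t) for t in (-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0)]
w = train(train_class)

THRESHOLD = 0.1   # hypothetical cut-off; in practice it would be tuned on held-out data

def looks_like_training_class(x):
    return distortion(w, x) < THRESHOLD

print(looks_like_training_class((0.6, 1.2)))    # True: lies on the learned line
print(looks_like_training_class((2.0, -1.0)))   # False: reconstruction is heavily distorted
```

Because the model has only seen one class, inputs with the same structure pass through almost unchanged, while inputs with a different structure come back distorted, and the size of that distortion is the recognition signal.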
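
The counts on slide 19 can be turned into the usual summary rates. This sketch reads the last two columns of the table as true negatives and false negatives, and assumes the two rows come from the same recognizer so they can be combined into one confusion matrix; the talk itself reports only the raw counts.

```python
# Counts transcribed from the results table on slide 19.
tp, fn = 1774, 7       # test cases with classes like those in the training data
fp, tn = 13, 8776      # test cases with classes unlike those in the training data

recall = tp / (tp + fn)                       # share of known-class cases accepted
specificity = tn / (tn + fp)                  # share of unknown-class cases rejected
precision = tp / (tp + fp)                    # share of accepted cases that were correct
accuracy = (tp + tn) / (tp + fn + fp + tn)    # share of all cases handled correctly

for name, value in [("recall", recall), ("specificity", specificity),
                    ("precision", precision), ("accuracy", accuracy)]:
    print(f"{name:<12}{value:.4f}")
# recall      0.9961
# specificity 0.9985
# precision   0.9927
# accuracy    0.9981
```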

Editor's Notes

  1. Sequence of system-call executions → a computer program. Sequence of words → a sentence. Organization of pixel intensities in a 2D space → an image. Sequence of images → a video.
  2. Explain the objective: one objective is to detect whether a question can be answered by a trained API-based model. Another objective can be to detect whether a cell is not deformed.
  3. Explain that this is similar to an auto-encoder's F and INV(F), except that INV can return the representation of any element in S'.
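
One way to write down the relationship in note 3, as a hedged reconstruction rather than the formula on the slide: F and S' are the names used in the note, while the bit-vector length n is an assumed symbol.

```latex
% S' : the set of structures relevant to the objective (name from note 3)
% n  : length of the binary bit-vector (assumed symbol)
F : S' \to \{0,1\}^{n}, \qquad
F^{-1} : \{0,1\}^{n} \to S', \qquad
F^{-1}\bigl(F(s)\bigr) \in S' \quad \text{for every } s \in S'.
```

An ordinary auto-encoder asks for the stronger condition that F^{-1}(F(s)) be approximately s itself. Here the inverse only has to land on some element of S'.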