Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Shipment Address Classification in Logistics in
the absence of Geolocation Information
Dr. T. Ravindra Babu,
Data Scientist,
Flipkart
August 1, 2015
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G
Presentation Plan
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Motivation and Problem Definition
Motivation
Problem Definition
Typical Operations Scenario at Delivery Hub without a model
Inscan of shipments received from Mother Hub
Manual reading of address; Assign to the Route/FE
Sorting and Delivery
Overview of Proposed Solution
Capturing FEs’ domain knowledge and modelling around it
Classifying an address as belonging to a pre-defined subarea
Allocation of the shipments to a Route/FE by a Machine
Learning based classifier
Sorting and Delivery
Delivery Hub and Subareas
Insights into Address Data
The number of words in an address ranges from 4 to 75, apart from a
few outliers with more than 100 words.
A word like Apartments is spelt in 263 different ways; whitefield in
24 ways, industrial in 25 ways, Bangalore in 161 ways, karnataka in
70 ways, etc.
Addresses lack structure even in a city like Bangalore.
A few examples.
Some words are specific to certain places/states. Examples:
halli, hobli; bawdi, kuan; society; layout; etc.
Addressing systems across the world: US, Europe, Korea,
Japan; countries like Brazil and India
Proposed Model
Preprocessing
An elaborate preprocessing model was necessary. It accounts
for the following.
Retaining only those terms that are likely to help classification
(discriminability)
Merging of terms using empirical statistical models as well as
domain-knowledge-based rules, n-grams, abbreviations, etc.
Developing data-dependent dictionaries based on pattern
clustering (Machine Learning) and forming equivalent sets of terms
Preprocessing reduces the vocabulary size by 65%, as
measured on a large dataset
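The effect of these steps on vocabulary size can be illustrated with a minimal sketch. It is not the pipeline described in the slides: the equivalence dictionary `EQUIVALENTS`, the stop list `NON_DISCRIMINATIVE`, and the function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical equivalence dictionary (variant -> canonical term).
# In the slides such sets are learned from data; here they are hard-coded.
EQUIVALENTS = {
    "koramanagala": "koramangala",
    "kormangalla": "koramangala",
    "bangalor": "bangalore",
    "banglore": "bangalore",
    "appartment": "apartments",
    "apartment": "apartments",
}

# Hypothetical terms assumed to carry no discriminative value.
NON_DISCRIMINATIVE = {"india", "near"}


def tokenize(address):
    """Lower-case the address and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", address.lower())


def preprocess(address):
    """Map variants to canonical terms, then drop non-discriminative terms."""
    canonical = [EQUIVALENTS.get(tok, tok) for tok in tokenize(address)]
    return [tok for tok in canonical if tok not in NON_DISCRIMINATIVE]


def vocabulary_reduction(addresses):
    """Fraction by which preprocessing shrinks the set of distinct terms."""
    raw = {tok for a in addresses for tok in tokenize(a)}
    cleaned = {tok for a in addresses for tok in preprocess(a)}
    return 1.0 - len(cleaned) / len(raw)


if __name__ == "__main__":
    sample = [
        "12 Appartment, Koramanagala, Banglore",
        "Apartment 4 near Kormangalla Bangalor India",
    ]
    print(preprocess(sample[0]))
    print(f"vocabulary reduction: {vocabulary_reduction(sample):.0%}")
```

The reduction measured on this toy sample says nothing about the 65% figure reported above, which was measured on a large dataset.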
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Models in Preprocessing::Fraud Address Classification -
Address Strings
Sl.No. Address
1 adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2 gasdfashagadfasmejastic
3 fdgdf
4 hjsdhaddsdsasdsa
5 dsfadafadsasdfsdafsda
6 hjsdhaddsdsasdsa
7 asd
8 lmflvml
9 assasfsafasfsasfsfsafashaphilomena
10 faskjbdasdlkjbsaasd
Models in Preprocessing::Fraud Address Classification -
Address Strings-Heatmap
Figure: MonkeyType Addresses
Models in Preprocessing::Fraud Address Classification -
Items Bought
Figure: Items bought by such people
Models in Preprocessing :: Probabilistic Separation of
Compound Words
To a large extent, addresses are not amenable to English
dictionaries
When writing addresses, the customer often inadvertently
omits a space, or the space is removed during
storage/retrieval
Separating such compound words
Compute empirical probabilities of words
Assuming conditional independence, if the probability of a
compound word is less than the product of the probabilities of
its constituent words, separate the words
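The separation rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed model: probabilities are plain unigram relative frequencies, only two-way splits are considered, and the function names and the `min_len` parameter are hypothetical.

```python
from collections import Counter


def word_probabilities(addresses):
    """Empirical probability of each whitespace-separated token."""
    counts = Counter(tok for a in addresses for tok in a.lower().split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def split_compound(token, prob, min_len=3):
    """Split `token` in two if some split has a higher product of part
    probabilities than the probability of the unsplit token.

    `min_len` (an assumption) avoids splitting off very short fragments.
    """
    best_parts = None
    best_p = prob.get(token, 0.0)  # probability of the compound itself
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        p = prob.get(left, 0.0) * prob.get(right, 0.0)
        if p > best_p:
            best_parts, best_p = [left, right], p
    return best_parts if best_parts else [token]


if __name__ == "__main__":
    corpus = [
        "bannerghatta road",
        "bannerghatta road",
        "bannerghatta road bangalore",
        "bannerghattaroad bangalore",
        "hosur road",
        "hosur road bangalore",
    ]
    prob = word_probabilities(corpus)
    print(split_compound("bannerghattaroad", prob))
    print(split_compound("bangalore", prob))
```

In this toy corpus the compound occurs once in 14 tokens, while its parts occur 3 and 5 times, so the product of the part probabilities is larger and the token is split. A word whose fragments never occur on their own is left intact.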
Models in Preprocessing :: Frequent Pattern Tree for
n-gram Generation
Frequent pattern tree is a celebrated approach in mining large
datasets
We implement a modified version of the tree to generate
n-grams
Conventional method
New approach
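The slides do not spell out how the tree was modified, so the following is only a generic sketch of the underlying idea: a count-annotated prefix tree over word sequences from which frequent n-grams are read off. The names `Node`, `build_ngram_tree`, `frequent_ngrams` and the parameters `max_n`, `min_support`, `min_n` are assumptions made for illustration.

```python
class Node:
    """Prefix-tree node holding how often the path to it was observed."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}


def build_ngram_tree(addresses, max_n=3):
    """Insert every word sequence of length <= max_n into a prefix tree."""
    root = Node()
    for address in addresses:
        tokens = address.lower().split()
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_n]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Return {n-gram: count} for paths seen at least `min_support` times."""
    result = {}
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        for tok, child in node.children.items():
            if child.count < min_support:
                continue  # an extension can never be more frequent than its prefix
            new_path = path + (tok,)
            if len(new_path) >= min_n:
                result[" ".join(new_path)] = child.count
            stack.append((child, new_path))
    return result


if __name__ == "__main__":
    sample = [
        "hsr layout sector 2",
        "hsr layout sector 7",
        "btm layout stage 2",
    ]
    print(frequent_ngrams(build_ngram_tree(sample)))
```

Because a child's count never exceeds its parent's, infrequent branches can be pruned without visiting their descendants, which is the same monotonicity that frequent-pattern mining relies on.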
Models in Preprocessing::Clustering for equivalent set of
words with spell variations - Ex. koramangala, electronics
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
Models in Preprocessing:: Clustering for ... spell variations
- Ex. Bannerghattaroad(61 variations)
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
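One way to form such equivalent sets is to group words by edit distance around their most frequent spelling. The sketch below uses Levenshtein distance with a simple leader-style pass. The slides do not name the clustering algorithm or the distance measure, so both are assumptions, as are the counts in the example and the `max_dist` threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def cluster_variants(word_counts, max_dist=2):
    """Group words around frequent 'leader' spellings.

    Words are visited from most to least frequent; each joins the first
    leader within `max_dist` edits, or becomes a new leader.
    """
    leaders = {}
    ordered = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, _ in ordered:
        for leader in leaders:
            if edit_distance(word, leader) <= max_dist:
                leaders[leader].append(word)
                break
        else:
            leaders[word] = [word]
    return leaders


if __name__ == "__main__":
    # Spellings are taken from the slides; the counts are made up.
    counts = {
        "koramangala": 50, "koramanagala": 3, "koromangala": 2,
        "kormangalla": 1, "electronics": 40, "electroincs": 2,
        "elelctronic": 1,
    }
    for leader, members in cluster_variants(counts).items():
        print(leader, "<-", members)
```

A fixed threshold is a blunt instrument: short words and genuinely different place names that differ by one or two letters would need a length-aware threshold or domain rules on top of it.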
Models in Post-processing :: Semi-Supervised Methods
Discussion
Revisiting The Model
Supervised Classification
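The slides do not name the classifier used. As one possible illustration of classifying an address into a pre-defined subarea, the sketch below trains a multinomial naive Bayes model on bag-of-words tokens. The class name, the subarea labels `KOR-4` and `HSR-2`, and the training addresses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAddressClassifier:
    """Multinomial naive Bayes over address tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts = Counter()            # addresses per subarea
        self.token_counts = defaultdict(Counter) # token counts per subarea
        self.vocab = set()

    def fit(self, addresses, subareas):
        for address, subarea in zip(addresses, subareas):
            tokens = address.lower().split()
            self.class_counts[subarea] += 1
            self.token_counts[subarea].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, address):
        # Tokens never seen in training are ignored.
        tokens = [t for t in address.lower().split() if t in self.vocab]
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for subarea, n_docs in self.class_counts.items():
            counts = self.token_counts[subarea]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = subarea, score
        return best


if __name__ == "__main__":
    train = [
        ("12 1st main koramangala 4th block", "KOR-4"),
        ("flat 3 koramangala 4th block near park", "KOR-4"),
        ("22 hsr layout sector 2", "HSR-2"),
        ("villa 9 hsr layout sector 2 bangalore", "HSR-2"),
    ]
    model = NaiveBayesAddressClassifier().fit(*zip(*train))
    print(model.predict("5 cross koramangala block"))
    print(model.predict("hsr sector 2 house"))
```

In practice such a classifier would be trained on preprocessed tokens, so that spelling variants and merged compound words do not fragment the evidence for a subarea.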
Summary
Novelty
Solution is novel and developed in-house
No similar solution found in the Literature
Thank You
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G

More Related Content

Viewers also liked

Online Student Registration System
Online Student Registration SystemOnline Student Registration System
Online Student Registration SystemSanjana Agarwal
 
Student information system project
Student information system projectStudent information system project
Student information system projectRizwan Ashraf
 
Procedure qualification
Procedure qualificationProcedure qualification
Procedure qualificationvaasuBandaru
 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml OverviewDang Tuan
 
Types of Grading and Reporting System
Types of Grading and Reporting System Types of Grading and Reporting System
Types of Grading and Reporting System Cyra Mae Soreda
 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)Amani Mrisho
 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outlineAmit Panwar
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfdUtsav mistry
 
Modeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalModeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalRajani Bhandari
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case DiagramAshesh R
 

Viewers also liked (12)

Online Student Registration System
Online Student Registration SystemOnline Student Registration System
Online Student Registration System
 
Student information system project
Student information system projectStudent information system project
Student information system project
 
Procedure qualification
Procedure qualificationProcedure qualification
Procedure qualification
 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml Overview
 
Types of Grading and Reporting System
Types of Grading and Reporting System Types of Grading and Reporting System
Types of Grading and Reporting System
 
Grading system
Grading systemGrading system
Grading system
 
5 Type Of Architecture Design Process
5 Type Of Architecture Design Process 5 Type Of Architecture Design Process
5 Type Of Architecture Design Process
 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)
 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outline
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfd
 
Modeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalModeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and Functional
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case Diagram
 

Similar to Shipment address classification in logistics, Ravindra Babu, Flipkart

Address classification
Address classificationAddress classification
Address classificationNamanChikara1
 
How to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyHow to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyRecruitingDaily.com LLC
 
Big Data in Human Resources
Big Data in Human ResourcesBig Data in Human Resources
Big Data in Human ResourcesMatthias Vallaey
 
Leaderhip dancefloor weminar
Leaderhip dancefloor weminarLeaderhip dancefloor weminar
Leaderhip dancefloor weminarAngel Diaz-Maroto
 
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Talent Solutions
 
Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides SlideTeam
 
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxjoellemurphey
 
Break Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® ApproachBreak Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® Approachcarlbinder
 
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......ManagementMM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......Managementdr m m bagali, phd in hr
 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon Wilder
 
RDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesRDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesConnected Data World
 
Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides SlideTeam
 
Make L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DMake L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DAlexandra Lederer
 
Successful ERP Selection
Successful ERP SelectionSuccessful ERP Selection
Successful ERP SelectionKatie Flanagan
 
Dfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDorothy Beach
 
Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides SlideTeam
 
Planning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesPlanning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesWorkday, Inc.
 

Similar to Shipment address classification in logistics, Ravindra Babu, Flipkart (20)

Address classification
Address classificationAddress classification
Address classification
 
Vedant Borse
Vedant BorseVedant Borse
Vedant Borse
 
How to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyHow to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI Strategy
 
Big Data in Human Resources
Big Data in Human ResourcesBig Data in Human Resources
Big Data in Human Resources
 
Leaderhip dancefloor weminar
Leaderhip dancefloor weminarLeaderhip dancefloor weminar
Leaderhip dancefloor weminar
 
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
 
4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion
 
Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides
 
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
 
Break Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® ApproachBreak Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® Approach
 
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......ManagementMM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1
 
RDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesRDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the pieces
 
Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides
 
Make L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DMake L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&D
 
Finding Your Path to Value
Finding Your Path to ValueFinding Your Path to Value
Finding Your Path to Value
 
Successful ERP Selection
Successful ERP SelectionSuccessful ERP Selection
Successful ERP Selection
 
Dfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recap
 
Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides
 
Planning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesPlanning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent Times
 

Recently uploaded

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 

Recently uploaded (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night

Shipment address classification in logistics, Ravindra Babu, Flipkart

  • 1. Shipment Address Classification in Logistics in the absence of Geolocation Information. Dr. T. Ravindra Babu, Data Scientist, Flipkart. August 1, 2015.
  • 2. Presentation Plan: (1) Motivation, Problem Definition and Solution Overview; (2) Data Challenges, Modeling, Solutions and Deployment; (3) Summary.
  • 5. Motivation and Problem Definition. Typical operations scenario at a Delivery Hub without a model: (1) inscan of shipments received from the Mother Hub; (2) manual reading of each address and assignment to a Route/FE; (3) sorting and delivery. Overview of the proposed solution: (1) capturing the FEs’ domain knowledge and modelling around it; (2) classifying an address as belonging to a pre-defined subarea; (3) allocating shipments to a Route/FE with a machine-learning classifier; (4) sorting and delivery.
  • 6. Delivery Hub and Subareas.
  • 11. Insights into Address Data. The number of words in an address ranges from 4 to 75, apart from a few outliers with more than 100. A word like Apartments is spelt in 263 different ways; whitefield in 24 ways, industrial in 25 ways, Bangalore in 161 ways, karnataka in 70 ways, etc. Structure in addresses is lacking even in a city like Bangalore (a few examples). Some words are specific to certain places/states, for example: halli, hobli; bawdi, kuan; society; layout; etc. Addressing systems across the world: US, Europe, Korea, Japan; countries like Brazil and India.
  • 12. Proposed Model.
  • 13. Preprocessing. An elaborate preprocessing model was necessary, accounting for the following: (1) retaining only those terms that possibly help classification (discriminability); (2) merging terms through empirical statistical models as well as domain-knowledge-based rules, n-grams, abbreviations, etc.; (3) developing data-dependent dictionaries based on pattern clustering (machine learning) and forming equivalent sets. Preprocessing reduces the vocabulary size by 65%, as measured on a large dataset.
  • 14. Preprocessing for Data Compaction. Figure: Impact of Preprocessing.
  • 15. Models in Preprocessing :: Fraud Address Classification - Address Strings. Sample address strings (Sl. No., Address): (1) adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f; (2) gasdfashagadfasmejastic; (3) fdgdf; (4) hjsdhaddsdsasdsa; (5) dsfadafadsasdfsdafsda; (6) hjsdhaddsdsasdsa; (7) asd; (8) lmflvml; (9) assasfsafasfsasfsfsafashaphilomena; (10) faskjbdasdlkjbsaasd.
  • 16. Models in Preprocessing :: Fraud Address Classification - Address Strings Heatmap. Figure: MonkeyType Addresses.
  • 17. Models in Preprocessing :: Fraud Address Classification - Items Bought. Figure: Items bought by such people.
  • 20. Models in Preprocessing :: Probabilistic Separation of Compound Words. To a large extent, addresses are not amenable to English dictionaries. When addresses are written, the space between two words is often missing, either because the customer inadvertently omitted it or because it was removed during storage/retrieval. Separating such compound words: (1) compute empirical probabilities of words; (2) assuming conditional independence, if the joint probability of a compound word is less than the product of the probabilities of the individual words, separate the words.
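
The separation rule on slide 20 can be sketched as follows. This is only an illustration under stated assumptions, not the deck's implementation: the function name `split_compounds`, the minimum part length, the whitespace tokenisation and the tie-breaking choice (prefer the split with the largest product) are all hypothetical, and empirical probabilities are taken as simple unigram frequencies over the given addresses.

```python
from collections import Counter


def split_compounds(addresses, min_part_len=3):
    """Split tokens such as 'mainroad' into ['main', 'road'].

    A token is split when P(token) < P(left) * P(right), where the
    probabilities are empirical unigram frequencies of the corpus itself.
    (min_part_len and the tie-breaking below are illustrative assumptions.)
    """
    tokenised = [addr.lower().split() for addr in addresses]
    counts = Counter(tok for words in tokenised for tok in words)
    total = sum(counts.values())
    prob = {word: count / total for word, count in counts.items()}

    def split(token):
        best = None  # (product of part probabilities, [left, right])
        for i in range(min_part_len, len(token) - min_part_len + 1):
            left, right = token[:i], token[i:]
            if left in prob and right in prob:
                product = prob[left] * prob[right]
                # Rule from the slide: compound less likely than its parts.
                if prob[token] < product and (best is None or product > best[0]):
                    best = (product, [left, right])
        return best[1] if best else [token]

    return [[part for tok in words for part in split(tok)] for words in tokenised]


corpus = ["main road"] * 20 + ["mainroad"]
print(split_compounds(corpus)[-1])  # ['main', 'road']
```

In the toy corpus, "mainroad" occurs once among 41 tokens while "main" and "road" occur 20 times each, so the product of the part probabilities exceeds the compound's own probability and the token is split.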
  • 24. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation. The frequent pattern tree is a celebrated approach in mining large datasets. We implement a modified version of the tree to generate n-grams. Conventional method. New approach.
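
The transcript does not describe the modified frequent pattern tree, so no attempt is made to reproduce it here. For orientation only, a plain sliding-window count of n-grams with a minimum frequency threshold might look like the sketch below; the function name `frequent_ngrams` and the parameters `n` and `min_count` are hypothetical.

```python
from collections import Counter


def frequent_ngrams(addresses, n=2, min_count=2):
    """Count word n-grams with a sliding window and keep the frequent ones.

    This is a straightforward counting baseline, not the modified
    frequent-pattern-tree approach mentioned on the slide.
    """
    counts = Counter()
    for addr in addresses:
        words = addr.lower().split()
        # Every window of n consecutive words is one n-gram occurrence.
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_count}


sample = [
    "12 bannerghatta road bangalore",
    "bannerghatta road near lake",
    "hosur road",
]
print(frequent_ngrams(sample))  # {('bannerghatta', 'road'): 2}
```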
  • 25. Models in Preprocessing :: Clustering for equivalent sets of words with spell variations - Ex. koramangala, electronics. koramangala: koramanagala, koromangala, kormanagala, koramnagala, koramangalato, kanamangala, koramanagla, koremangala, koaramangala, koramamgala, karamangala, tkoramangala, kormangalla, koramongala, koarmangala, korammangala, koramangalla, koramangale, koramanagal. electronics: electronice, eclectronic, elelctronic, eelectronic, electronica, electroincs, electronics, electroninc, electrinics, electroncis, electronincs.
  • 26. Models in Preprocessing :: Clustering for ... spell variations - Ex. Bannerghattaroad (61 variations), including: bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad, bannerughattaroad, bannarghattaroad, banergattaroad, banneraghattaroad, bannerghettaroad, bannerugattaroad, bhannerghattaroad, bennerghattaroad, bannerghttaroad, bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad, bennarghattaroad, baneerghattaroad, bannergettaroad, banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad, benerghattaroad, bannerghattaroadto, bannergataroad, bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad, bannnerghattaroad, bannarghettaroad, banerughattaroad, bannergahttaroad, bhannerughattaroad, bennergattaroad, bannerghattroad, bannaraghattaroad, bannerhattaroad, bannerghatharoad, banneerghattaroad, bannaerghattaroad, baneergattaroad, bhannergattaroad, bhanerghattaroad
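
The slides show the resulting equivalence sets but not the clustering method. One simple way to obtain groups like these, offered purely as an assumption, is a single-pass leader-style clustering on Levenshtein edit distance. The names `edit_distance` and `cluster_variants` and the threshold `max_distance=2` are hypothetical choices for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cluster_variants(words, max_distance=2):
    """Assign each word to the first leader within max_distance, else start a new cluster.

    The result depends on input order; putting the most frequent (canonical)
    spellings first makes them the leaders. The threshold is an assumption.
    """
    clusters = {}
    for word in words:
        for leader in clusters:
            if edit_distance(word, leader) <= max_distance:
                clusters[leader].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters


words = ["koramangala", "koramanagala", "koromangala",
         "electronics", "electroncis", "eelectronic"]
for leader, members in cluster_variants(words).items():
    print(leader, members)
```

With the six sample spellings taken from slide 25, this yields one cluster led by "koramangala" and one led by "electronics". A fixed threshold can be too loose for short words and too tight for long ones, so it would need tuning on real data.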
  • 27. Models in Post-processing :: Semi-Supervised Methods. Discussion.
  • 28. Revisiting the Model. Supervised Classification.
  • 29. Summary. Novelty: the solution is novel and developed in-house; no similar solution was found in the literature.
  • 30. Thank You.