Fast Exact String Pattern-Matching Algorithm for
Fixed Length Patterns
Ing. Ľuboš Takáč
PhD student
Faculty of Management Science and Informatics
University of Žilina
Presentation overview
• Motivation
• Problem Definition
• Existing Solutions
• Our Implemented Algorithm
• Testing Results
• Conclusion
Motivation
• Word search game generator
• Searching string patterns with
fixed length
– M . . . E R
– . . . A H
– . . . . .
Problem Definition
• Design a fast in-memory data structure (class)
• Requirements
– fast searching, with O(1) complexity if possible
– each found word is returned only once
– each found word must be chosen randomly
– the found word has to match the pattern
class Model
FastStringPatternSearch
+ FastStringPatternSearch(String[], Random)
+ FastStringPatternSearch(String[])
+ reset() : void
+ searchPattern(String) : String
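To make the class model concrete, here is a minimal baseline sketch that offers the same three operations using a plain linear scan. It is not the algorithm presented in these slides. The class name NaivePatternSearch, the helper matches, and the use of '.' (without spaces) as the marker for an undefined position are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical baseline, not the presented algorithm: O(N) scan per search.
class NaivePatternSearch {
    private final List<String> words; // shuffled once so the scan order is random
    private final boolean[] used;     // used[i] is true once words.get(i) was returned

    NaivePatternSearch(String[] dictionary, Random random) {
        words = new ArrayList<>(Arrays.asList(dictionary));
        Collections.shuffle(words, random);
        used = new boolean[words.size()];
    }

    NaivePatternSearch(String[] dictionary) {
        this(dictionary, new Random());
    }

    // Makes every word available again, O(N).
    void reset() {
        Arrays.fill(used, false);
    }

    // Returns an unused word matching the pattern, or null if there is none.
    String searchPattern(String pattern) {
        for (int i = 0; i < words.size(); i++) {
            if (!used[i] && matches(words.get(i), pattern)) {
                used[i] = true;
                return words.get(i);
            }
        }
        return null;
    }

    // A word matches when lengths agree and every defined character is equal.
    static boolean matches(String word, String pattern) {
        if (word.length() != pattern.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != word.charAt(i)) return false;
        }
        return true;
    }
}
```

Calling searchPattern("M...ER") repeatedly on such an object would return each matching word once and then null until reset() is called.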
Existing Solutions
• Relational DB table with full-text index – requires access to the hard drive
• Linked list or array in memory – O(N) complexity
• Indexing of the array – to reach O(1) complexity, all possible combinations of
patterns have to be indexed
Number of undefined positions   Example of pattern   All combinations count
0                               PATTERNS             1
1                               PATT-RNS             8
2                               PA-TE-NS             28
3                               -AT--RNS             56
4                               P-T-E--S             70
5                               -A--ER--             56
6                               --TT----             28
7                               ----E---             8
8                               --------             1
Total combinations count                             256
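The combination counts for the 8-character example are the binomial coefficients C(8, k), which sum to 2^8 = 256. As a quick check, the sketch below enumerates every defined/undefined mask of a word and groups the masks by the number of undefined positions. The class and method names are hypothetical.

```java
// Illustrative check of the combination counts; not part of the presented algorithm.
class PatternMaskCount {
    // counts[k] = number of patterns of the given length with exactly k undefined positions.
    static long[] countByUndefined(int length) {
        long[] counts = new long[length + 1];
        // Each bit of the mask says whether one position is undefined.
        for (int mask = 0; mask < (1 << length); mask++) {
            counts[Integer.bitCount(mask)]++;
        }
        return counts;
    }

    // Total number of index entries a single word of this length would need.
    static long total(int length) {
        long sum = 0;
        for (long c : countByUndefined(length)) sum += c;
        return sum;
    }
}
```

The total doubles with every extra character, which is why indexing every pattern combination for every word quickly becomes impractical.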
Our Implemented Algorithm
• Dynamic in-memory tree(s) with a linked list of words (IDs) on each node
• Roots are stored in a 3-dimensional matrix
• Each node has a 2-dimensional matrix of children
Root
• 3-dimensional matrix of root nodes with shuffled linked lists
– alphabet dimension
– word length dimension
– character position dimension
• Example
– We put the word “NAUTICAL” into nodes [N][8][1], [A][8][2], [U][8][3], …, [L][8][8]
– When we search for the pattern “. . U . . . . .”, we look into root node [U][8][3],
where we find the word “NAUTICAL” in the linked list
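The NAUTICAL example can be sketched as a three-dimensional array of lists. This is only an illustration of the indexing scheme: the class name RootIndexSketch, the restriction to the letters 'A' to 'Z', and the omission of shuffling are assumptions. Positions are 1-based, as in the example.

```java
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch of the root matrix only; child nodes are left out.
class RootIndexSketch {
    static final int ALPHABET = 26; // assumes words use only 'A'..'Z'

    // roots[letter][length][position] -> words having that letter at that position
    private final List<String>[][][] roots;

    @SuppressWarnings("unchecked")
    RootIndexSketch(int maxLength) {
        roots = new List[ALPHABET][maxLength + 1][maxLength + 1];
    }

    // Registers the word once for each of its characters.
    void add(String word) {
        int length = word.length();
        for (int pos = 1; pos <= length; pos++) {
            int letter = word.charAt(pos - 1) - 'A';
            if (roots[letter][length][pos] == null) {
                roots[letter][length][pos] = new LinkedList<>();
            }
            roots[letter][length][pos].add(word);
        }
    }

    // Returns the list stored at [letter][length][position]; empty if nothing was added there.
    List<String> rootList(char letter, int length, int position) {
        List<String> list = roots[letter - 'A'][length][position];
        return list == null ? new LinkedList<String>() : list;
    }
}
```

After add("NAUTICAL"), the call rootList('U', 8, 3) returns a list containing that word, which is the lookup performed for the pattern “. . U . . . . .”.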
Child nodes
• 2-dimensional matrix of child nodes with shuffled linked lists
– alphabet dimension
– word length dimension – can be determined from the ancestor
– character position dimension
Searching algorithm
• Searching for pattern “. . T . E R . .”
1. Get the first defined character, the pattern length and the position of the first
defined character (T, 8, 3). Get the node of the three-dimensional array data structure
at [character][length][position] ([T][8][3]). Continue to step 2 with this node.
2. If the node is null, no string matching this pattern exists. – END.
If the node is not null and it has no children (leaf node), or the pattern has no
further defined characters, find the first string in the node's list that matches the
pattern. Return the found string, or null if no string matches the pattern. – END.
If the node is not null and it has children (non-leaf node), take the next
defined character in the pattern (E, position 5), access the node's two-dimensional array of
children at element [position][character] ([5][E]), and go to step 2
with that child node.
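The two steps above can be sketched in Java as follows. This is a simplified reading of the slides, not the author's implementation. The class name PatternTreeSketch, the rule that a node gets children once its list exceeds maxListSize, the use of '.' for undefined positions, and the restriction to 'A' to 'Z' are all assumptions. Shuffling and "used" tracking are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the splitting rule are assumptions.
class PatternTreeSketch {
    static final int ALPHABET = 26; // assumes words use only 'A'..'Z'

    static class Node {
        final List<String> words = new ArrayList<>();
        Node[][] children; // [position][letter]; null while the node is a leaf
    }

    private final int maxListSize;
    private final int maxLength;
    private final Node[][][] roots; // [letter][length][position], positions are 1-based

    PatternTreeSketch(String[] dictionary, int maxListSize) {
        this.maxListSize = maxListSize;
        int longest = 0;
        for (String w : dictionary) longest = Math.max(longest, w.length());
        maxLength = longest;
        roots = new Node[ALPHABET][maxLength + 1][maxLength + 1];
        for (String w : dictionary) {
            for (int pos = 1; pos <= w.length(); pos++) {
                int c = w.charAt(pos - 1) - 'A';
                if (roots[c][w.length()][pos] == null) roots[c][w.length()][pos] = new Node();
                roots[c][w.length()][pos].words.add(w);
            }
        }
        for (Node[][] byLength : roots)
            for (int len = 1; len <= maxLength; len++)
                for (int pos = 1; pos <= len; pos++)
                    if (byLength[len][pos] != null) split(byLength[len][pos], pos, len);
    }

    // Gives an oversized node children keyed by every later position and letter.
    private void split(Node node, int lastPos, int length) {
        if (node.words.size() <= maxListSize || lastPos == length) return;
        node.children = new Node[length + 1][ALPHABET];
        for (String w : node.words) {
            for (int pos = lastPos + 1; pos <= length; pos++) {
                int c = w.charAt(pos - 1) - 'A';
                if (node.children[pos][c] == null) node.children[pos][c] = new Node();
                node.children[pos][c].words.add(w);
            }
        }
        for (int pos = lastPos + 1; pos <= length; pos++)
            for (Node child : node.children[pos])
                if (child != null) split(child, pos, length);
    }

    // Steps 1 and 2 from the slide; '.' marks an undefined position.
    String searchPattern(String pattern) {
        int length = pattern.length();
        int pos = nextDefined(pattern, 1);
        if (pos == 0 || length > maxLength) return null; // all-undefined patterns are out of scope here
        Node node = roots[pattern.charAt(pos - 1) - 'A'][length][pos];
        while (node != null) {
            int next = nextDefined(pattern, pos + 1);
            if (node.children == null || next == 0) {
                for (String w : node.words) if (matches(w, pattern)) return w;
                return null;
            }
            node = node.children[next][pattern.charAt(next - 1) - 'A'];
            pos = next;
        }
        return null;
    }

    // 1-based position of the first defined character at or after 'from', or 0 if none.
    private static int nextDefined(String pattern, int from) {
        for (int pos = from; pos <= pattern.length(); pos++)
            if (pattern.charAt(pos - 1) != '.') return pos;
        return 0;
    }

    private static boolean matches(String word, String pattern) {
        for (int i = 0; i < pattern.length(); i++)
            if (pattern.charAt(i) != '.' && pattern.charAt(i) != word.charAt(i)) return false;
        return true;
    }
}
```

In this sketch every node keeps its own word list even after it gets children, so a pattern that runs out of defined characters at an inner node can still be answered from that node's list.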
Complexity of algorithm
• We can set MaxListSize on leaf nodes, which bounds the search
complexity to O(L + MaxListSize), where L is the length of the string
• Low MaxListSize = fast searching, high memory consumption,
slow initialization
• High MaxListSize = slow searching, low memory consumption,
fast initialization
• Recommendation
– Set it according to the purpose and the dictionary size
– Create the data structure only once and share it
Other requirements
• Get every word only once
– Keep an array of boolean “used” flags, one per word, and check and update it during search
– The reset function sets all flags to “not used” – O(N)
• Get randomly chosen words
– All linked lists are shuffled after initialization
– After a word is found, it is moved to the end of its linked list – O(1)
• Get words for a pattern with no defined character, e.g. “. . . . . . .”
– Create a special linked list for each word length and put the words from the dictionary there
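The bookkeeping above can be sketched for a single node list, for example one of the special per-length lists. The class name WordListSketch and the method next are hypothetical. The sketch assumes the list stores word IDs and that one shared "used" array covers the whole dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the "used" flags and the move-to-end step for one list.
class WordListSketch {
    private final String[] words;          // dictionary, indexed by word ID
    private final boolean[] used;          // used[id] is true once the word was returned
    private final LinkedList<Integer> ids; // shuffled word IDs held by this list

    WordListSketch(String[] words, Random random) {
        this.words = words;
        this.used = new boolean[words.length];
        List<Integer> shuffled = new ArrayList<>();
        for (int id = 0; id < words.length; id++) shuffled.add(id);
        Collections.shuffle(shuffled, random); // shuffle once after initialization
        this.ids = new LinkedList<>(shuffled);
    }

    // Returns the first unused word and moves its ID to the end of the list.
    String next() {
        Iterator<Integer> it = ids.iterator();
        while (it.hasNext()) {
            int id = it.next();
            if (!used[id]) {
                used[id] = true;
                it.remove();     // unlinking the current element is O(1) in a linked list
                ids.addLast(id); // appending is O(1)
                return words[id];
            }
        }
        return null;
    }

    // Marks all words as "not used" again, O(N); the list order is kept.
    void reset() {
        Arrays.fill(used, false);
    }
}
```

Moving a returned word to the tail means the words at the head after a reset are the ones returned least recently, so the structure does not need to be reshuffled.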
Testing Results
• Dictionary with 225 thousand words
• Generating 5000 word search games of size 25x25
• More than 1300 times faster than the naive algorithm
For testing we used an HP ProBook 6550b with Win 7 Professional 64-bit, Intel® Core™ i5 M450 CPU (2 cores, 2.40 GHz), 4 GB RAM, Java 7.
MaxListSize            Initializing time (s)   Generating time (s)   Memory consumption (MB)
Unlimited              1.508                   989.643               86
5000                   2.726                   839.294               101
1000                   4.843                   400.539               265
500                    7.062                   324.728               340
100                    16.141                  279.410               808
Naive algorithm O(N)   0.095                   381073.600            15
Conclusion
• We designed and implemented a fast in-memory data structure for searching
string patterns with fixed length
• Dynamic structure, up to O(1) complexity
• Randomly chosen words matching the pattern, each returned only
once
• Option to reset the data structure and get all words again without
re-initializing it (complexity O(N))
Thank you for your attention!
lubos.takac@gmail.com
