/**
* @mainpage
* @anchor mainpage
* @brief
* @details
* @copyright Russell John Childs, PhD, 2016
* @author Russell John Childs, PhD
* @date 2016-05-07
*
* This file contains classes: ExtendedRegExp, Parser
*
* Problem statement:
* You are given a dictionary (dictionary.txt), containing a list of words, one
* per line. Imagine you have seven tiles. Each tile is either blank or contains
* a single lowercase letter (a-z).
*
* Please list all the words from the dictionary that can be produced by using
* some or all of the seven tiles, in any order. A blank tile is a wildcard,
* and can be used in place of any letter.
*
* Try to use a minimal amount of memory.
*
* 1. Find all of the words that can be formed if you don't have to deal with
* blank tiles. (You may skip this step and go straight to step 2).
*
* 2. Find all of the words that can be formed, including those where blank
* tiles are used as wildcards.
*
* 3. Would you do things differently if you had to process several hundred
* tile sets with the same dictionary?
*
* Expectations:
*
* a) Please write down the reasoning or the explanation behind your solution in
* plain English or pseudo-code.
*
* b) Please provide the source code of your implementation. Only 1 and 2 need
* source code.
*
* c) Please include instructions on how to compile and run your code.
*
* d) Bonus points for source code in C/C++/C#.
*
*
* Solution: Use a bucket sort array that counts each tile, with the number of
* wildcards stored in the last element,
* e.g. "bbbaa " -> bsa[0]=2, bsa[1]=3, bsa[size-1]=1.
* Iterate through chars in string to be matched:
*
* (1) If chr != wildcard and bsa[chr] > 0, decrement bsa[chr]
*
* (2) If chr == wildcard or bsa[chr] == 0, decrement bsa[size-1]
*
* (3) If bsa[size-1] < 0, there is no match.
*
* Specifications:
*
* (1) C shall denote a range of contiguous ASCII characters
* and shall default to ['a','z'].
* [This is the specification relating to "tiles"]
*
* (2) W shall denote a "wildcard" character.
* [This is the specification relating to "blank tiles"]
*
* (3) The user shall use the single-space ' ' for W.
* [This is the specification relating to "blank tiles"]
*
* (4) The system shall maintain an internal value for
* W of char(127) so that it is greater than the non-wildcard characters.
*
* (5) R shall denote the regular expression [CW]{n}, where n shall be
* specified by the user and default to 7
* [This is the specification relating to "7 tiles"]
*
* (6) S shall denote the set of all permutations of R,
* i.e. the set of all regular expressions that may be formed by
* permuting the characters in R.
*
* (7) D shall be a set of strings delimited by the newline character.
* [This is the specification relating to "dictionary"]
*
* (8) No string in D shall contain the wildcard character ' ',
* i.e. no string shall contain a single space.
*
* (9) The system shall list all strings from D for which a match against
* any element in the set S, or a substr of the element,
* exists, in the order in which they appear in D.
* [This is the specification relating to requirements (1) and (2). (1) & (2)
* may be reduced to 1 requirement by deleting " " in regex,
* eg: "abc" <--> "a b c"]
*
* (10) The list specified in (9) shall be returned one string at a time and
* shall not be stored as an internal list of matching strings.
* [This is the specification relating to "minimal memory"]
*
* (11) Matching on a string in D shall be O(n) in complexity.
* [This is the specification relating to requirement (3)]
*
*
* (12) [Next release]. The system shall be multithreaded and shall divide
* dictionary.tmp into thread data and shall implement a multithreaded
* bucket-sort [This is the specification relating to requirement (3)].
*
* Compiled and tested under Linux Mint, using g++ 4.8.
*
* g++ options: -O0 -g3 -Wall -fopenmp -mavx -m64 -g -c
* -fmessage-length=0 -fno-omit-frame-pointer -ffast-math
* -std=c++11 -I/opt/intel/vtune_amplifier_xe_2013/include/
*
* Linker options: -Wl,--no-as-needed -fopenmp
* -L/opt/intel/vtune_amplifier_xe_2013/lib64/
*
* Cmd line options: -lpthread -latomic -littnotify -ldl
*
* Documentation: Doxygen comments for interfaces, normal for impl.
*
* Usage: Place this cpp and unit_test.hpp in the same directory and compile.
* After compiling, run and specify choices at the prompts:
*
* "Please specify the pattern to be matched"
*
* "Please specify the fully-qualified dictionary filename"
*
* "Please specify the fully-qualified filename for the results"
*
* The binary will send:
*
* (1) test results to stdout
*
* (2) results to results file
*
* (3) Temporary dictionary file to "./tmp_dictionary.txt"
*
* Inputs and outputs:
*
* Input: dictionary file
*
* Input/output: tmp_dictionary.txt file
*
* Output: results file
*
* The file unit_test.hpp is required for the tests and must be requested from
* the author.
*
* @file dimensional_mechanics.cpp
* @see
* @ref mainpage
*/
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <random>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>
#include "unit_test.hpp"
//Unit test framework is written for Linux. This #define ports it to
//Visual Studio 2013
//#define __PRETTY_FUNCTION__ __FUNCSIG__
/**
* @addtogroup RegexpProblem
* @{
*/
namespace RegexpProblem
{
/**
* This class implements an O(n) algorithm that determines whether a match can
* be found between a string and any permutation of the characters in a regular
* expression (including wildcard characters).
*
* The algorithm uses a bucket sort array that counts the number of occurrences
* of each char in the regex, with the number of wildcards recorded at the end
* of the array.
*
* Example: ".a.bb." - bsa[0]=1, bsa[1]=2, bsa[size-1]=3
*
* Having built the array, the string to be matched is examined.
* Each char found in the string is used to decrement the corresponding
* entry in the array. Example "b" -> bsa[1]=2 ---> bsa[1]=1
*
* When a particular entry is 0, then the entry for the wildcard is decremented
* instead to signify it is being used in place of the character.
*
* Finally, if the entry for the wildcard goes negative, it must mean there is
* no match.
*
* This class is intended to be a replaceable helper class that supplements
* the std::regex lib.
*
*/
class ExtendedRegExp
{
public:
/**
* @param regexp {const std::string&} - A regex, e.g. ".a.b.c"
* @param wildcard {char} - The char used as a wildcard, e.g. ' ', '.', '?'
* @param bucket_sort_size {unsigned} - The size for the bucket sort array
*/
ExtendedRegExp(const std::string& regexp, char wildcard=' ',
unsigned bucket_sort_size=28) try :
m_regexp(regexp),
m_buckets(bucket_sort_size, 0),
m_wildcard(127),
m_begin(127)
{
//Initialise bucket sort: map the user's wildcard to the internal one
std::replace(m_regexp.begin(), m_regexp.end(), wildcard, m_wildcard);
//Find smallest char in regex (m_begin stays 127 if regex is empty)
if (m_regexp.empty() == false)
{
m_begin = *std::min_element(m_regexp.begin(), m_regexp.end());
}
for (auto chr : m_regexp)
{
//at() throws std::out_of_range if the index is outside the array
++m_buckets.at(chr != m_wildcard ? index(chr) : bucket_sort_size - 1);
}
}
catch (std::exception& except)
{
//Print any exception thrown (it is rethrown when this handler exits)
std::cout << "Exception initialising bucket sort in constructor: "
<< except.what()
<< std::endl;
}
/**
* Dtor
*/
~ExtendedRegExp(void)
{
}
/**
* This returns the index of a char relative to the smallest char in the
* regex
*
* Example: If the smallest char in the regex is 'a' then the index
* of 'c' will be 2
*
* @param chr {char} - the char whose relative "ascii" index is to be found
*
* @return {unsigned} - The relative index
*/
unsigned index(char chr)
{
return unsigned((unsigned char)chr) - unsigned((unsigned char)m_begin);
};
/**
* This matches a string against all permutations of the regex used to
* initialise the class object. It uses the bucket sort array described
* in the documentation for this class. Blank strings are a match and
* strings matched against a substr of regex are also a match.
*
* @param word {const std::string&} - the string to be matched
*
* @return {bool} - true if the word matches, false otherwise
*/
bool operator==(const std::string& word)
{
bool ret_val = false;
auto buckets = m_buckets;
auto size = buckets.size();
//Only consider words short enough to be spanned by regexp
auto len = word.length();
if ((0 < len) && (len <= m_regexp.length()))
{
//Loop over chars in word
for (auto chr : word)
{
//Decrement corresponding non-wildcard count in regexp
if ((index(chr) < (size-1)) && (buckets[index(chr)]>0))
{
//Decrement char count
ret_val = ((--buckets[index(chr)]) >= 0);
}
else //use wildcard if 0 non-wildcards left or not non-wildcard
{
//Decrement wildcard count
ret_val = ((--buckets[size-1]) >= 0);
}
//Only continue if we have not encountered a non-match
if (ret_val == false)
{
break;
}
}
}
return ((len > 0) ? ret_val : true);
}
/**
* This returns the length of the regex used to initialise this class object.
*
* @return {unsigned} - The length of the regex.
*/
unsigned size(void)
{
return m_regexp.size();
}
private:
std::string m_regexp;
std::vector<int> m_buckets;
char m_wildcard;
char m_begin;
};
/**
* This class iterates over the lines in a dictionary file seeking those that
* match the regular expression provided.
*
* @tparam RegExp - The regular expression matching engine to be used.
* The default is ExtendedRegExp. Any user-defined class must provide the
* same public interface as ExtendedRegExp.
*
*/
template<typename RegExp = ExtendedRegExp >
class Parser
{
public:
/**
* This defines the "no-match" string pattern.
*
* @return {const std::string&} - The "no-match" pattern.
*/
static const std::string& no_match(void)
{
static const std::string no_match("!£$%^&&^%$££$%^&");
return no_match;
}
/**
* @param dictionary {const std::string&} - The dictionary file to be parsed
* @param regexp {const RegExp&} - The regular expression to be used for
* matching
*/
Parser(const std::string& dictionary, const RegExp& regexp) :
m_dictionary(dictionary),
m_regexp(regexp)
{
}
/**
* Dtor
*/
~Parser(void)
{
}
/**
* This resets the dictionary file and regular expression.
*
* @param dictionary {const std::string&} - The dictionary file to be parsed
* @param regexp {const RegExp&} - The regular expression to be used for
* matching
*/
void reset(const std::string& dictionary, const RegExp& regexp)
{
//Reopen in place: stream move-assignment is not available in g++ 4.8
m_dictionary.close();
m_dictionary.clear();
m_dictionary.open(dictionary);
m_regexp = regexp;
}
/**
* This returns the next line in the dictionary file that is successfully
* matched against the regular expression used to initialise this class
* object.
*
* @return {std::string} - The next line matched
*/
std::string get_next_match(void)
{
//Buffer for dictionary string and return value
std::string ret_val = no_match();
bool ret = false;
//Verify file is good
if (m_dictionary && (m_dictionary.eof() == false))
{
//Loop over strings in file until next match is found
std::getline(m_dictionary, ret_val);
ret = (m_regexp == ret_val);
while ((ret == false) && (m_dictionary.eof() == false))
{
std::getline(m_dictionary, ret_val);
ret = (m_regexp == ret_val);
}
}
//Return the match
return ret ? ret_val : no_match();
}
private:
std::ifstream m_dictionary;
RegExp m_regexp;
};
}
/**
* @}
*/
/**
* @addtogroup Tests
* @{
*/
namespace Tests
{
/**
* Wrapper class for std::vector converting {a, b, c, ...} to "a b c ..."
*/
struct PrintVector
{
PrintVector(const std::vector<std::string>& vec) :
m_vec(vec)
{
}
std::string str()
{
std::stringstream ss;
for (auto elem : m_vec)
{
ss << elem << " ";
}
return ss.str();
}
std::vector<std::string> m_vec;
};
/**
*
* =========
*
* Test plan
*
* =========
*
* 1. Extract n random lines from dictionary.txt, store in tmp_dictionary.txt.
* n=500.
*
* 2. Randomly select k, 0 <= k < tmp_dictionary.size, for several values of k.
*
* 3. For each k, extract kth line from tmp_dictionary.txt and use as
* pattern to be matched (regex)
*
* 4. For each k, replace 0, 1, 2, ... all chars in regex with " ",
* to test wildcard substitution to give regex'.
*
* 5. For each regex' extract all matching lines in tmp_dictionary and
* store in "results" vector.
*
* 6. For each regex' iterate over all permutations of the characters.
*
* 7. For each permutation of regex' extract all matching lines from
* tmp_dictionary.txt using std::regex_match and store in "control" vector.
*
* 8. Validate that results == control.
*
* NB: It is unnecessary to consider all permutations of locations to
* place the wildcards, (e.g. "..aa", ".a.a"), since the test already considers
* all permutations of regex and thus covers this equivalence partition.
*
* @param dictionary_file {const std::string&} - Fully-qualified filename
* for dictionary
*
*/
void tests(const std::string& dictionary_file)
{
typedef std::vector<std::string> sv;
//Namespaces for unit test framework and parser.
using namespace UnitTest;
using namespace RegexpProblem;
//Vectors for results of parser algorithm
// "control" generated by exhaustive search
sv results;
sv control;
//The regexp to be matched and the matching lines from dictionary
std::string regexp;
std::string match;
//Lambda to access file as file(line_number)
std::string buffer;
unsigned counter = 0;
auto get_line = [&](std::fstream& stream, unsigned index)
{
while (counter != index)
{
std::getline(stream, buffer);
++counter;
}
return buffer;
};
//Create small dictionary with random entries from original to speed tests.
//Create set of random numbers to extract random lines from dictionary file
const unsigned tmp_dictionary_size = 500;
std::fstream dictionary(dictionary_file);
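//trunc creates tmp_dictionary.txt if it does not already exist
std::fstream tmp_dictionary("tmp_dictionary.txt",
std::ios::in | std::ios::out | std::ios::trunc);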
auto size = std::count(std::istreambuf_iterator<char>(dictionary),
std::istreambuf_iterator<char>(), '\n');
dictionary.seekg(0);
std::set<unsigned> random_lines;
//std::random_device dev;
std::mt19937 generator(0);
std::uniform_int_distribution<> distr(1, size);
for (unsigned i = 1; i <= tmp_dictionary_size; ++i)
{
random_lines.insert(distr(generator));
}
//Create temporary dictionary file using the random numbers
for (auto random_line : random_lines)
{
get_line(dictionary, random_line);
if (counter == random_line)
{
tmp_dictionary << buffer << std::endl;
}
}
//Lambda to perform exhaustive matching over all permutations of regex
//to verify algorithm
auto exhaustive_match = [&](std::string regexp)
{
std::cout << std::endl
<< "Verification requires exhaustive search, please wait";
//Loop over dictionary words while file is good
while (tmp_dictionary && (tmp_dictionary.eof() == false))
{
std::cout << ".";
std::string buffer;
std::getline(tmp_dictionary, buffer);
std::string padded = (buffer.length() >= regexp.length() ? buffer :
buffer + regexp.substr(buffer.length(), regexp.length()));
std::string truncated = regexp.substr(0, buffer.length());
bool is_found = std::regex_match(buffer, std::regex(truncated));
//Sort regexp and remove duplicate chars for std::next_permutation
// (it took me two hours to track down this silly bug).
unsigned count = 126;
std::string tmp = regexp;
for (auto& chr : tmp)
{
if (chr == '.')
{
chr = count;
--count;
}
}
std::sort(tmp.begin(), tmp.end());
//Loop over permutations of regex until match found or exhausted
while ((buffer.length() <= regexp.length()) &&
(is_found == false) &&
std::next_permutation(tmp.begin(), tmp.end()))
{
//Undo the elimination of duplicates for std::next_permutation
//and put them back in
std::string tmp_1 = tmp;
std::replace_if(tmp_1.begin(), tmp_1.end(),
[&](char chr) {return (chr > count); }, '.');
std::string truncated = tmp_1.substr(0, buffer.length());
is_found = std::regex_match(buffer, std::regex(truncated));
}
//Add matches to list of "control" values used for verification
if (is_found)
{
control.push_back(buffer);
}
}
std::cout << std::endl;
};
//Create instances of parser with regexp = tmp_dictionary[random]
unsigned trials = 1;
for (unsigned trial = 1; trial <= trials; ++trial)
{
//Reset dictionary
tmp_dictionary.seekg(0);
counter = 0;
//Get random line from dictionary to act as regexp
std::uniform_int_distribution<> rand_distr(1, tmp_dictionary_size);
unsigned random = rand_distr(generator);
regexp = get_line(tmp_dictionary, random);
//Loop over num of wildcards to use (0 to all chars).
//NB All-wildcards should match all strings of length=regexp.length
auto num_wildcards = regexp.size();
for (unsigned num = 0; num <= num_wildcards; ++num)
{
results.clear();
control.clear();
//Replace relevant char with wildcard
if (num > 0)
{
regexp.replace(num - 1, 1, " ");
}
//Reset dictionary
//tmp_dictionary.seekg(0);
tmp_dictionary.close();
counter = 0;
//Create parser and loop over matches found
Parser<> parser("tmp_dictionary.txt", ExtendedRegExp(regexp, ' '));
while ((match = parser.get_next_match()) != Parser<>::no_match())
{
results.push_back(match);
}
//Perform exhaustive match search for verification
tmp_dictionary.close();
tmp_dictionary.open("tmp_dictionary.txt");
counter = 0;
//for regex - replace ' ' with '.' as wildcard
auto tmp_regexp = regexp;
//std::cout << "regex before= " << regexp << std::endl;
std::replace(tmp_regexp.begin(), tmp_regexp.end(), ' ', '.');
//std::cout << "regex after= " << regexp << std::endl;
exhaustive_match(tmp_regexp);
//Verify algorithm against exhaustive match search
VERIFY(std::string("Regexp = ") + "\"" + tmp_regexp + "\"",
PrintVector(results).str()) == PrintVector(control).str();
}
}
}
}
/**
* @}
*/
int main(void)
{
using namespace Tests;
using namespace UnitTest;
using namespace RegexpProblem;
/*
//This struct pauses at the end of tests to print out results
struct BreakPointAfterMainExits
{
BreakPointAfterMainExits(void)
{
static BreakPointAfterMainExits tmp;
}
~BreakPointAfterMainExits(void)
{
unsigned set_bp_here_for_test_results = 0;
}
} dummy;
*/
std::string pattern;
std::cout << "Please specify the pattern to be matched" << std::endl;
std::getline(std::cin, pattern, '\n');
std::cout << pattern << std::endl;
std::string dictionary;
std::cout << "Please specify the fully-qualified dictionary filename"
<< std::endl;
std::cin >> dictionary;
std::cout << dictionary << std::endl;
std::string results_file;
std::cout << "Please specify the fully-qualified filename for the results"
<< std::endl;
std::cin >> results_file;
std::cout << results_file << std::endl;
//Run tests
char yes_no;
std::cout << "Do you wish to run the tests (they run slowly on Windows "
<< "and quickly under Linux)? - (y/n):";
std::cin >> yes_no;
if (yes_no == 'y' || yes_no == 'Y')
{
tests(dictionary);
Verify<Results> results;
}
//Match the dictionary against the user's pattern, send results to results file
std::cout << "matching strings in dictionary.txt against \""
<< pattern
<< "\" "
<< " Results will be sent to "
<< results_file
<< ". Please wait ..."
<< std::endl;
Parser<> parser(dictionary, ExtendedRegExp(pattern, ' '));
std::string match;
std::ofstream output(results_file);
while ((match = parser.get_next_match()) != Parser<>::no_match())
{
output << match << std::endl;
}
return 0;
}
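
The counting idea described in the header comment can also be sketched on its own, independent of the classes above. The following is a minimal illustration under stated assumptions, not the author's implementation: the function name can_form is hypothetical, tiles and words are assumed to use only 'a' to 'z' plus the blank, and, as in the listing, any other character in a word can only be covered by a blank tile.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the original listing: returns true when
// `word` can be spelled from `tiles`, where each `blank` tile may stand in
// for any one character. Assumes lettered tiles are in the range 'a'..'z'.
bool can_form(const std::string& tiles, const std::string& word,
              char blank = ' ')
{
    if (word.size() > tiles.size())
    {
        return false; // more characters than tiles
    }
    std::array<int, 26> counts{}; // counts[i] = unused tiles showing 'a' + i
    int blanks = 0;               // unused blank tiles
    for (char tile : tiles)
    {
        if (tile == blank)
        {
            ++blanks;
        }
        else if (tile >= 'a' && tile <= 'z')
        {
            ++counts[static_cast<std::size_t>(tile - 'a')];
        }
    }
    for (char chr : word)
    {
        const bool is_letter = (chr >= 'a' && chr <= 'z');
        if (is_letter && counts[static_cast<std::size_t>(chr - 'a')] > 0)
        {
            // A lettered tile is available, so use it
            --counts[static_cast<std::size_t>(chr - 'a')];
        }
        else if (--blanks < 0)
        {
            return false; // no lettered tile and no blank tile left
        }
    }
    return true;
}
```

For example, can_form(" arnage", "grace") is true because the single blank covers the 'c', while can_form(" arnage", "graced") is false because both 'c' and 'd' would need a blank. Each word costs one pass over its characters plus a fixed-size array, so no list of permutations is ever built.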

Mais conteúdo relacionado

Mais procurados

Memory Management with Java and C++
Memory Management with Java and C++Memory Management with Java and C++
Memory Management with Java and C++
Mohammad Shaker
 
Lecture08 stacks and-queues_v3
Lecture08 stacks and-queues_v3Lecture08 stacks and-queues_v3
Lecture08 stacks and-queues_v3
Hariz Mustafa
 

Mais procurados (20)

C++11 - STL Additions
C++11 - STL AdditionsC++11 - STL Additions
C++11 - STL Additions
 
C++11 Multithreading - Futures
C++11 Multithreading - FuturesC++11 Multithreading - Futures
C++11 Multithreading - Futures
 
Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())
 
Memory Management with Java and C++
Memory Management with Java and C++Memory Management with Java and C++
Memory Management with Java and C++
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
WebGL 2.0 Reference Guide
WebGL 2.0 Reference GuideWebGL 2.0 Reference Guide
WebGL 2.0 Reference Guide
 
C1320prespost
C1320prespostC1320prespost
C1320prespost
 
Lecture08 stacks and-queues_v3
Lecture08 stacks and-queues_v3Lecture08 stacks and-queues_v3
Lecture08 stacks and-queues_v3
 
Type Classes in Scala and Haskell
Type Classes in Scala and HaskellType Classes in Scala and Haskell
Type Classes in Scala and Haskell
 
classes & objects in cpp overview
classes & objects in cpp overviewclasses & objects in cpp overview
classes & objects in cpp overview
 
Map(), flatmap() and reduce() are your new best friends: simpler collections,...
Map(), flatmap() and reduce() are your new best friends: simpler collections,...Map(), flatmap() and reduce() are your new best friends: simpler collections,...
Map(), flatmap() and reduce() are your new best friends: simpler collections,...
 
iOS Session-2
iOS Session-2iOS Session-2
iOS Session-2
 
L13 string handling(string class)
L13 string handling(string class)L13 string handling(string class)
L13 string handling(string class)
 
Standard Template Library
Standard Template LibraryStandard Template Library
Standard Template Library
 
Vector class in C++
Vector class in C++Vector class in C++
Vector class in C++
 
An Introduction to Part of C++ STL
An Introduction to Part of C++ STLAn Introduction to Part of C++ STL
An Introduction to Part of C++ STL
 
Linked list
Linked listLinked list
Linked list
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29
 

Semelhante a Interview C++11 code

Lecture 15_Strings and Dynamic Memory Allocation.pptx
Lecture 15_Strings and  Dynamic Memory Allocation.pptxLecture 15_Strings and  Dynamic Memory Allocation.pptx
Lecture 15_Strings and Dynamic Memory Allocation.pptx
JawadTanvir
 
C aptitude questions
C aptitude questionsC aptitude questions
C aptitude questions
Srikanth
 
C - aptitude3
C - aptitude3C - aptitude3
C - aptitude3
Srikanth
 
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docxAssignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
braycarissa250
 
Javascript built in String Functions
Javascript built in String FunctionsJavascript built in String Functions
Javascript built in String Functions
Avanitrambadiya
 
For this lab, you will write the following filesAbstractDataCalc.pdf
For this lab, you will write the following filesAbstractDataCalc.pdfFor this lab, you will write the following filesAbstractDataCalc.pdf
For this lab, you will write the following filesAbstractDataCalc.pdf
alokindustries1
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Anne Nicolas
 
The Morse code (see Table 6.10 in book) is a common code that is use.pdf
The Morse code (see Table 6.10 in book) is a common code that is use.pdfThe Morse code (see Table 6.10 in book) is a common code that is use.pdf
The Morse code (see Table 6.10 in book) is a common code that is use.pdf
bhim1213
 
I have question in c++ program I need the answer as soon as possible.docx
I have question in c++ program I need the answer as soon as possible.docxI have question in c++ program I need the answer as soon as possible.docx
I have question in c++ program I need the answer as soon as possible.docx
delciegreeks
 
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
christiandean12115
 
Lab Assignment 4 CSE330 Spring 2014 Skeleton Code for ex.docx
 Lab Assignment 4 CSE330 Spring 2014  Skeleton Code for ex.docx Lab Assignment 4 CSE330 Spring 2014  Skeleton Code for ex.docx
Lab Assignment 4 CSE330 Spring 2014 Skeleton Code for ex.docx
MARRY7
 

Semelhante a Interview C++11 code (20)

Lecture 15_Strings and Dynamic Memory Allocation.pptx
Lecture 15_Strings and  Dynamic Memory Allocation.pptxLecture 15_Strings and  Dynamic Memory Allocation.pptx
Lecture 15_Strings and Dynamic Memory Allocation.pptx
 
C aptitude questions
C aptitude questionsC aptitude questions
C aptitude questions
 
C - aptitude3
C - aptitude3C - aptitude3
C - aptitude3
 
Beyond javascript using the features of tomorrow
Beyond javascript   using the features of tomorrowBeyond javascript   using the features of tomorrow
Beyond javascript using the features of tomorrow
 
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docxAssignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
Assignment 13assg-13.cppAssignment 13assg-13.cpp   @auth.docx
 
Javascript built in String Functions
Javascript built in String FunctionsJavascript built in String Functions
Javascript built in String Functions
 
For this lab, you will write the following filesAbstractDataCalc.pdf
For this lab, you will write the following filesAbstractDataCalc.pdfFor this lab, you will write the following filesAbstractDataCalc.pdf
For this lab, you will write the following filesAbstractDataCalc.pdf
 
Using Regular Expressions and Staying Sane
Using Regular Expressions and Staying SaneUsing Regular Expressions and Staying Sane
Using Regular Expressions and Staying Sane
 
C (PPS)Programming for problem solving.pptx
C (PPS)Programming for problem solving.pptxC (PPS)Programming for problem solving.pptx
C (PPS)Programming for problem solving.pptx
 
String searching
String searchingString searching
String searching
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
 
Dbms
DbmsDbms
Dbms
 
Best C++ Programming Homework Help
Best C++ Programming Homework HelpBest C++ Programming Homework Help
Best C++ Programming Homework Help
 
The Morse code (see Table 6.10 in book) is a common code that is use.pdf
The Morse code (see Table 6.10 in book) is a common code that is use.pdfThe Morse code (see Table 6.10 in book) is a common code that is use.pdf
The Morse code (see Table 6.10 in book) is a common code that is use.pdf
 
I have question in c++ program I need the answer as soon as possible.docx
I have question in c++ program I need the answer as soon as possible.docxI have question in c++ program I need the answer as soon as possible.docx
I have question in c++ program I need the answer as soon as possible.docx
 
Getting started with Perl XS and Inline::C
Getting started with Perl XS and Inline::CGetting started with Perl XS and Inline::C
Getting started with Perl XS and Inline::C
 
C++ Programming Homework Help
C++ Programming Homework HelpC++ Programming Homework Help
C++ Programming Homework Help
 
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
1.1 Nested Loops – Lab3C.cppLoops often have loops inside .docx
 
Functions
FunctionsFunctions
Functions
 
Lab Assignment 4 CSE330 Spring 2014 Skeleton Code for ex.docx
 Lab Assignment 4 CSE330 Spring 2014  Skeleton Code for ex.docx Lab Assignment 4 CSE330 Spring 2014  Skeleton Code for ex.docx
Lab Assignment 4 CSE330 Spring 2014 Skeleton Code for ex.docx
 

Mais de Russell Childs

Full resume dr_russell_john_childs_2016
Full resume dr_russell_john_childs_2016Full resume dr_russell_john_childs_2016
Full resume dr_russell_john_childs_2016
Russell Childs
 
Full_resume_Dr_Russell_John_Childs
Full_resume_Dr_Russell_John_ChildsFull_resume_Dr_Russell_John_Childs
Full_resume_Dr_Russell_John_Childs
Russell Childs
 

Mais de Russell Childs (20)

spinor_quantum_simulator_user_guide_.pdf
spinor_quantum_simulator_user_guide_.pdfspinor_quantum_simulator_user_guide_.pdf
spinor_quantum_simulator_user_guide_.pdf
 
String searching o_n
String searching o_nString searching o_n
String searching o_n
 
String searching o_n
String searching o_nString searching o_n
String searching o_n
 
String searching o_n
String searching o_nString searching o_n
String searching o_n
 
Permute
PermutePermute
Permute
 
Permute
PermutePermute
Permute
 
Feature extraction using adiabatic theorem
Feature extraction using adiabatic theoremFeature extraction using adiabatic theorem
Feature extraction using adiabatic theorem
 
Feature extraction using adiabatic theorem
Feature extraction using adiabatic theoremFeature extraction using adiabatic theorem
Feature extraction using adiabatic theorem
 
Wavelets_and_multiresolution_in_two_pages
Wavelets_and_multiresolution_in_two_pagesWavelets_and_multiresolution_in_two_pages
Wavelets_and_multiresolution_in_two_pages
 
Relativity 2
Relativity 2Relativity 2
Relativity 2
 
Recursion to iteration automation.
Recursion to iteration automation.Recursion to iteration automation.
Recursion to iteration automation.
 
Dirac demo (quantum mechanics with C++). Please note: There is a problem with...
Dirac demo (quantum mechanics with C++). Please note: There is a problem with...Dirac demo (quantum mechanics with C++). Please note: There is a problem with...
Dirac demo (quantum mechanics with C++). Please note: There is a problem with...
 
Shared_memory_hash_table
Shared_memory_hash_tableShared_memory_hash_table
Shared_memory_hash_table
 
Full resume dr_russell_john_childs_2016
Full resume dr_russell_john_childs_2016Full resume dr_russell_john_childs_2016
Full resume dr_russell_john_childs_2016
 
Simple shared mutex UML
Simple shared mutex UMLSimple shared mutex UML
Simple shared mutex UML
 
Design pattern to avoid downcasting
Design pattern to avoid downcastingDesign pattern to avoid downcasting
Design pattern to avoid downcasting
 
Interview uml design
Interview uml designInterview uml design
Interview uml design
 
Full_resume_Dr_Russell_John_Childs
Full_resume_Dr_Russell_John_ChildsFull_resume_Dr_Russell_John_Childs
Full_resume_Dr_Russell_John_Childs
 
Dynamic programming burglar_problem
Dynamic programming burglar_problemDynamic programming burglar_problem
Dynamic programming burglar_problem
 
K d tree_cpp
K d tree_cppK d tree_cpp
K d tree_cpp
 

Interview C++11 code
