COMSM1402 Advanced Algorithms 2010
                             Raphaël Clifford
                            November 5, 2010

   Due: The coursework should be handed in at the start of the lecture on
Friday, 17 December. This is both the normal and late deadline. Online
submissions can be made up to midnight. Your marks will be based on the best
five answers out of the first six questions plus your mark for question seven.
Problems:
  1. The purpose of this question is to show a weakly universal class of hash
     functions H for which E[M] = √(n − 1). M is the maximum load, assuming
     n items are hashed into n slots using a universal family of hash
     functions. For positive n, we use the notation [n] := {0, . . . , n − 1}.
     Define a family H of hash functions from [n] to [n] as follows. Let ℓ be
     an integer, with 1 ≤ ℓ ≤ n. For each V ⊂ [n] of cardinality ℓ, we define
     the hash function h_V : [n] → [n] by the following property. h_V maps each
     element of V onto 0, and h_V maps [n] \ V injectively into [n] \ {0}. Note
     that h_V is not uniquely determined by this property, but we can always
     choose one h_V satisfying this property (verify). Define
                           H := {h_V : V ⊂ [n], |V| = ℓ}
     Argue that H is weakly universal if ℓ ≤ √(n − 1). Note that the maximum
     load always equals ℓ.
     [10 points]
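
For small n, the construction in question 1 can be checked by brute force before attempting the proof. The sketch below is only an illustration: the names `make_h`, `max_collision_probability` and `ell` (the cardinality of V) are assumptions, and `make_h` picks one arbitrary valid h_V among the many allowed.

```python
from fractions import Fraction
from itertools import combinations


def make_h(V, n):
    """One valid choice of h_V: send V to 0 and the rest of [n]
    injectively into 1..n-1 (possible because |V| >= 1)."""
    h, next_slot = {}, 1
    for x in range(n):
        if x in V:
            h[x] = 0
        else:
            h[x] = next_slot
            next_slot += 1
    return h


def max_collision_probability(n, ell):
    """Largest Pr[h(x) = h(y)] over pairs x != y, with h drawn
    uniformly from the family {h_V : |V| = ell}."""
    family = [make_h(set(V), n) for V in combinations(range(n), ell)]
    worst = Fraction(0)
    for x, y in combinations(range(n), 2):
        hits = sum(1 for h in family if h[x] == h[y])
        worst = max(worst, Fraction(hits, len(family)))
    return worst


if __name__ == "__main__":
    n = 10
    for ell in range(1, 6):
        p = max_collision_probability(n, ell)
        print(ell, p, "<= 1/n" if p <= Fraction(1, n) else "> 1/n")
```

This enumerates every V, so it is only practical for very small n; it is a sanity check, not a substitute for the argument the question asks for.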
  2. The following approach is useful in streaming algorithms; you should think
     about why this might be. Suppose that we have a sequence of items,
     passing by one at a time. We want to maintain a sample of one item that
     has the property that it is uniformly distributed over all the items that
     we have seen at each step. Moreover, we want to accomplish this without
     knowing the total number of items in advance or storing all of the items
     that we see. Consider the following algorithm, which stores just one item
     in memory at all times. When the first item appears, it is stored in the
     memory. When the kth item appears, it replaces the item in memory with
     probability 1/k. Explain why this algorithm solves the problem.
     Now suppose that we want a sample of s items instead of just one,
     without replacement. That is, we don’t want to get the same item multiple
     times in our sample. If this weren’t an issue, we could get a sample of s
     items with replacement just by running s independent copies of the above.
     Generalize the above process to the case of sampling without replacement.
     (Hint: start by taking the first s items and storing them as your sample.
     With what probability should each new item come into the sample?)
     [10 points]
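
The one-item algorithm described in question 2 can be sketched as follows. This covers only the single-item case stated in the text, not the generalisation to s items; the function names and the use of Python's `random` module are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_one(stream, rng=random):
    """Keep one item in memory; the k-th item replaces it with probability 1/k."""
    kept = None
    for k, item in enumerate(stream, start=1):
        # For k = 1 the test always succeeds, so the first item is stored.
        if rng.random() < 1.0 / k:
            kept = item
    return kept


def empirical_frequencies(n_items, trials, seed=0):
    """Run the sampler many times on the stream 0..n_items-1 and
    report how often each item ends up as the sample."""
    rng = random.Random(seed)
    counts = Counter(sample_one(range(n_items), rng) for _ in range(trials))
    return {item: counts[item] / trials for item in range(n_items)}


if __name__ == "__main__":
    # Each frequency should be close to 1/5 if the sample is uniform.
    print(empirical_frequencies(5, 20000))
```

The empirical frequencies only suggest uniformity; the question still asks for an explanation of why it holds at every step.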

3. The simplest variant of cuckoo hashing is as follows. There is a table with
   m cells. Each element x can hash into exactly two locations, given by hash
   functions, h1 (x) and h2 (x). When an item is placed into the hash table,
   if at least one of these two locations is free, the item is placed in the free
   location. If neither location is free, x is placed in one of the two locations,
   and kicks out the element y that is in that location. Then y is placed in
   its alternative location. If that location is free, then all is well, and y is
   placed there. Otherwise, y must kick out the element in that location,
   and this new element must try to move to its alternative location, and so
   on.
   It is possible that, at some point, the process will loop. The loop can
   either be found explicitly, or a limit on the number of times elements can
   be kicked out can be enforced and the whole dataset rehashed if this limit
   is ever reached.
   One way to generalise this is to use more than two hash functions, so that
   each element has more than two candidate locations and the element to
   kick out is chosen randomly at each step. The task is to implement a
   generalised variant of cuckoo hashing. You should make a choice about how
   you will create the hash functions and explain it clearly in terms of the
   randomness and independence you are using. You could, for example, simply
   toss some coins if you only need a small number of random bits to start off. Feel
   free to try different hash function families and report on what effect, if
   any, this has. You may also want to experiment with creating random
   numbers using methods described in the lectures or otherwise. In your
   experiments, use a table of size 8192, and add elements until the first time
   you cannot add an element. (For convenience, you may assume an element
   cannot be added if, after repeating the kick out step 20 times, you are not
   done.) Using 2 hash functions and then 3 hash functions, and running
   the experiment 1000 times, examine how full the hash table can be before
   problems start to occur. Compare your results with the bounds from the
   theory and discuss what you find. For this problem, please submit your
   code.
   You can choose any programming language you like, but please include
   clear instructions on how to run your code on a lab machine in a file called
   readme.txt that is included with your submission.
   [10 points]
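
As a starting point for question 3, the simplest two-function variant described above might look like the sketch below. Everything beyond the text is an assumption: the class and function names, the hash family ((a·x + b) mod p) mod m with a Mersenne prime p, and the choice of random 40-bit keys. The table size 8192 and the limit of 20 kicks come from the question. It does not implement the generalised variant with more hash functions.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, assumed larger than every key


def make_hash(m, rng):
    """Assumed hash family: x -> ((a*x + b) mod P) mod m with random a, b."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m


class CuckooTable:
    def __init__(self, m=8192, max_kicks=20, seed=0):
        self.rng = random.Random(seed)
        self.m, self.max_kicks = m, max_kicks
        self.h = [make_hash(m, self.rng), make_hash(m, self.rng)]
        self.cells = [None] * m
        self.size = 0

    def insert(self, x):
        """Return True on success, False once the kick limit is reached.
        On failure the last evicted element is dropped, so the table
        should be rehashed or discarded afterwards."""
        for h in self.h:  # use a free candidate cell if there is one
            pos = h(x)
            if self.cells[pos] is None:
                self.cells[pos] = x
                self.size += 1
                return True
        pos = self.rng.choice(self.h)(x)  # both full: pick a cell to take over
        for _ in range(self.max_kicks):
            x, self.cells[pos] = self.cells[pos], x  # x is now the evicted element
            p1, p2 = self.h[0](x), self.h[1](x)
            pos = p2 if pos == p1 else p1  # the evicted element's alternative cell
            if self.cells[pos] is None:
                self.cells[pos] = x
                self.size += 1
                return True
        return False


def load_at_first_failure(seed):
    """Insert distinct random keys until one cannot be added; return the load."""
    table = CuckooTable(seed=seed)
    keys = random.Random(seed + 1).sample(range(1 << 40), table.m)
    for key in keys:
        if not table.insert(key):
            break
    return table.size / table.m


if __name__ == "__main__":
    loads = [load_at_first_failure(s) for s in range(20)]
    print(sum(loads) / len(loads))
```

Whether this hash family gives enough independence for the theoretical bounds to apply is exactly the kind of point the question asks you to discuss, so treat the family here as a placeholder.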
4. This question has two parts. A naive implementation of a van Emde Boas
   tree uses O(|U|) space, where |U| is the universe size. Explain in detail
   how this can be reduced to O(n) space (where n is the number of elements
   to be stored). What are the complexities of the different operations in your
   reduced-space data structure?
   The van Emde Boas tree layout can be used to implement a number of
   other data structures and to speed up important applications. Find an
   example from the literature and explain in detail how the van Emde Boas
   tree improves the time complexities of the relevant operations. Your
   explanation should give suitable citations and ideally provide proofs of any
   results you report.
  [10 points]

5. Consider the following pattern matching problem involving wildcard
   symbols. A single character wildcard is said to match any other symbol in the
   input alphabet.
   INPUT: Text T = t_1 . . . t_n, pattern P = p_1 . . . p_m. At most ℓ of the
   pattern characters p_i are non-wildcards (i.e. normal characters) and the rest
   are single character wildcards.
   OUTPUT: The Hamming distance between P and every substring of T of
   length m.

   Example: let p = ab?ab, text t = b?bbabba and ℓ = 4. The output is
   3, 0, 2, 4.


   (a) Give an algorithm that solves this problem.
   (b) What is the asymptotic time complexity of your algorithm? Make
       sure to explain your working carefully.

   The better the time complexity, the more marks will be awarded. In
   particular, extra marks will be given for fast solutions whose running time
   is parameterised by ℓ as well as n and m. A Θ(nm) time solution will gain
   no marks.
   You can assume it takes no more than log_2 n bits (i.e. a single word of
   memory) to represent any of the input symbols and that simple arithmetic
   operations on the input symbols, including addition and multiplication,
   take constant time.
  [10 points]
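
To check the output of a faster algorithm for question 5, a brute-force reference can be useful. The sketch below is the Θ(nm) baseline that the question explicitly says earns no marks; it is meant only for testing. The function name and the use of `?` as the wildcard symbol (taken from the example) are assumptions.

```python
def hamming_with_wildcards(pattern, text, wildcard="?"):
    """Hamming distance between the pattern and every length-m substring
    of the text, where a wildcard on either side matches any symbol."""
    m, n = len(pattern), len(text)
    distances = []
    for i in range(n - m + 1):
        mismatches = sum(
            1
            for p, t in zip(pattern, text[i:i + m])
            if p != wildcard and t != wildcard and p != t
        )
        distances.append(mismatches)
    return distances


if __name__ == "__main__":
    # The example from the question.
    print(hamming_with_wildcards("ab?ab", "b?bbabba"))  # [3, 0, 2, 4]
```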
6. (a) The recurrence for the running time of the algorithm for computing
       a suffix array presented in lectures is T (n) = T (2n/3) + O(n). Show
       how to modify the algorithm to give one whose recurrence is T (n) =
       T (3n/7) + O(n). Is 3/7 the best possible, or can you do better?
   (b) Suppose we have a pattern p and a text t and we want to find for every
       position in t the longest substring of p that matches there exactly.
       Give a fast algorithm to solve this problem together with its analysis.
       The better the time complexity, the more marks will be awarded.
     [10 points]
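
For part (a) of question 6, it may help to recall how recurrences of this shape unroll. The derivation below is a sketch that assumes the O(n) term is at most cn for some constant c, and ignores rounding and the base case; the symbols α and c are introduced here for illustration.

```latex
T(n) \;\le\; T(\alpha n) + cn
     \;\le\; cn \sum_{i \ge 0} \alpha^{i}
     \;=\; \frac{cn}{1-\alpha},
\qquad 0 < \alpha < 1 .
% \alpha = 2/3 gives T(n) \le 3cn;  \alpha = 3/7 gives T(n) \le \tfrac{7}{4}\,cn.
```

Both recurrences therefore solve to O(n); changing the fraction affects only the constant factor, under the stated assumptions.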
  7. For this question you are asked to write a two-page summary of a research
     paper. I would like you to choose a highly cited paper from one of the
     leading algorithms conferences to write about. Luckily there is already a
     website (http://www.cs.utah.edu/~suresh/citations/) that has been
     through the papers written from 1997–2006 for FOCS, STOC and SODA
     (look up what these stand for) and counted the citation numbers for you,
     although these numbers are now underestimates in most cases.
     Alternatively you may choose a paper from any of the conferences listed at
     http://www.cs.tau.ac.il/~iftgam/eventlist.htm. You should check
     on http://scholar.google.com that any paper you choose has a current
     citation count of at least one hundred.
     Please post the title of the paper, its authors, the conference name and
     the number of citations on the unit forum as soon as you have made your
     choice. You may not, of course, choose the same paper as someone else.
     Your two-page review should include:
        • A short one or two paragraph summary of the paper.
        • A deeper, more extensive outline of the main points of the paper,
          including for example assumptions made, arguments presented, data
          analyzed, and conclusions drawn.
        • Any limitations or extensions you see for the ideas in the paper.
        • Your opinion of the paper; primarily, the quality of the ideas and its
          real or potential impact.

     [30 points]

Academic Integrity: All the work you hand in should be your own. If you
work with other students, you should list them on your coursework along with
a brief explanation of which topics you discussed. In general, any source other
than the lectures should be explicitly cited at the point where it is used.



