Automatic text correction is one of the challenges of human-computer interaction. It plays a direct role in several application areas, such as correcting text after handwriting has been digitized, and an indirect role in others, such as correcting users' queries before a retrieval process is run against interactive databases.
Automatic text correction proceeds in two major phases: error detection and candidate suggestion. Techniques for both phases fall into two categories: procedural and statistical. Procedural techniques use rules, including natural language processing techniques, to decide whether a text is acceptable. Statistical techniques, on the other hand, depend on statistics and probabilities collected from large corpora, reflecting what humans commonly write.
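To make the procedural/statistical distinction concrete, here is a minimal sketch in Python. Both functions are hypothetical illustrations, not the system described in this work: `procedural_accepts` applies one toy hand-written rule (English "a"/"an" agreement judged only by the next word's first letter, which ignores cases such as "an hour"), while `statistical_best` simply prefers the candidate that occurs most often in a reference corpus.

```python
import re
from collections import Counter

def procedural_accepts(sentence):
    """Toy rule-based check (hypothetical): 'a' must not precede a
    vowel-initial word and 'an' must not precede a consonant-initial word."""
    words = sentence.lower().split()
    for article, nxt in zip(words, words[1:]):
        starts_with_vowel = nxt[0] in "aeiou"
        if article == "a" and starts_with_vowel:
            return False
        if article == "an" and not starts_with_vowel:
            return False
    return True

def statistical_best(candidates, corpus):
    """Toy statistical choice: pick the candidate seen most often in the corpus."""
    counts = Counter(re.findall(r"[a-z]+", corpus.lower()))
    return max(candidates, key=lambda w: counts[w])
```

For example, `procedural_accepts("I ate a apple")` is rejected by the rule, and `statistical_best(["ten", "tea", "the"], "the cat and the tea")` returns `"the"` because it is the most frequent of the three in that corpus.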
In this work, natural language processing techniques serve as the basis for analyzing English text and for checking both its spelling and its grammatical acceptability. A prefix-dependent hash-indexing scheme shortens lookups in the underlying dictionary, which contains all English tokens. This dictionary is the basis of the error detection process.
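The abstract does not specify the indexing scheme in detail, so the following is only a sketch of the general idea under stated assumptions: words are bucketed by a hash of their first `prefix_len` characters (the class name and the default of 2 characters are hypothetical choices), and a token is flagged as a possible error when it is missing from its bucket. In Python a single `set` would already give average constant-time membership tests; the two-level structure is shown only to make prefix-dependent indexing concrete.

```python
from collections import defaultdict

class PrefixHashDictionary:
    """Hypothetical prefix-indexed dictionary: each word is stored in the
    bucket keyed by its first `prefix_len` characters, so a lookup only
    has to search one small bucket."""

    def __init__(self, words, prefix_len=2):
        self.prefix_len = prefix_len
        self.buckets = defaultdict(set)
        for w in words:
            self.buckets[self._key(w)].add(w.lower())

    def _key(self, token):
        # Hash key = lower-cased prefix of the token.
        return token.lower()[: self.prefix_len]

    def contains(self, token):
        bucket = self.buckets.get(self._key(token))
        return bucket is not None and token.lower() in bucket

    def detect_errors(self, tokens):
        # Error detection: every token absent from the dictionary is flagged.
        return [t for t in tokens if not self.contains(t)]
```

With the dictionary `PrefixHashDictionary(["the", "cat", "sat", "on", "mat"])`, calling `detect_errors(["The", "cat", "sta", "on", "teh", "mat"])` flags `["sta", "teh"]`.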
Candidate generation computes the similarity between the source token and each dictionary token, using an improved Levenshtein method, and ranks the dictionary tokens accordingly. Because this process is time-consuming, the tokens are divided into smaller groups according to spelling similarity, in a way that preserves random access. Finally, candidate suggestion examines a set of features related to commonly committed mistakes. The system selects the optimal candidate, the one that is most suitable and does not violate grammar rules, so that the generated text is linguistically acceptable.
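A minimal sketch of similarity-based ranking over grouped tokens follows. Two assumptions are loud here: it uses the plain Levenshtein distance (not the improved method, which the abstract does not define), and it groups tokens by length, a hypothetical grouping key. Length grouping is safe because the edit distance between two strings is at least the difference of their lengths, so only groups within `max_dist` of the source length can hold candidates. A dict keyed by length keeps random access to each group. Similarity is taken as `1 - distance / max(len)`.

```python
from collections import defaultdict

def levenshtein(a, b):
    """Standard Levenshtein distance, O(len(a) * len(b)) time, one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

class CandidateGenerator:
    def __init__(self, words):
        # Hypothetical grouping: one list per token length, reachable by key.
        self.groups = defaultdict(list)
        for w in set(words):
            self.groups[len(w)].append(w)

    def candidates(self, token, max_dist=2, top_n=5):
        n = len(token)
        scored = []
        # Only groups whose length is within max_dist of n can qualify.
        for length in range(max(0, n - max_dist), n + max_dist + 1):
            for w in self.groups.get(length, ()):
                d = levenshtein(token, w)
                if d <= max_dist:
                    similarity = 1 - d / max(n, len(w), 1)
                    scored.append((similarity, w))
        # Highest similarity first; ties broken alphabetically.
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [w for _, w in scored[:top_n]]
```

With the dictionary `["the", "then", "they", "tea", "ten", "cat"]`, `candidates("teh")` returns `["tea", "ten", "then", "they", "the"]`. Plain Levenshtein ranks the intended word "the" last, because swapping two adjacent letters costs two edits. This is the kind of weakness an improved similarity measure could target.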
In accuracy tests, the system performed better than Microsoft Word and some other systems. The enhanced similarity measure keeps the time complexity within the bounds of the original Levenshtein method while also detecting an additional type of error.
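The abstract does not say which additional error type the enhanced measure detects. As a purely illustrative assumption, the sketch below adds adjacent transposition ("teh" to "the") using the standard optimal string alignment (OSA) variant of Levenshtein distance. It is not the author's method. It shows how an extra error type can be recognized while the dynamic program keeps the same O(m·n) time and table size as the original Levenshtein algorithm: each cell only adds one constant-time check.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein edits plus
    transposition of two adjacent characters, still O(len(a) * len(b))."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Extra error type: two adjacent characters swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Here `osa_distance("teh", "the")` is 1, whereas plain Levenshtein gives 2. OSA is a restricted form of Damerau-Levenshtein: it never edits a substring more than once, so for example `osa_distance("ca", "abc")` is 3 rather than 2.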