1. ParSyll Algorithm
While the code and test environment still refer to SYLLABIX (the earlier name, given to the prototype
algorithm before the year 2000), the algorithm has been renamed because a game named Syllabix now
exists.
At some point the program names, files and environment will be updated to reflect the new name. In the
interim, rights are claimed by way of use, reference, communication and publication, including this
document as emailed, distributed and reflected in electronic media.
Copyright and rights are claimed in terms of the Berne Convention and the
Copyright Act 98 of 1978 of South Africa. No part of this publication or of the program(s) or any
associated code may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording or by any information storage and retrieval system,
without permission in writing from the author Trevor Nigel Gadd. All rights reserved.
The following is a brief description of the algorithm.
The purpose of the algorithm is to segment written words and names into automatically determined
'syllables', which are then partly interpreted phonetically and used to construct a retrieval 'code'. The
code inherently 'groups' like-sounding words or names together, 'broadening' the results of a textual
enquiry.
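To illustrate only the retrieval idea (a shared code that pulls like-sounding names into one bucket), the Python sketch below indexes names by a key and answers a query with the whole bucket. The key function `crude_key` is a hypothetical stand-in (keep the first letter, drop later vowels, collapse repeated letters) and is NOT the ParSyll code; `build_index` and `search` are likewise illustrative names, not part of the program.

```python
from collections import defaultdict

VOWELS = set("AEIOUY")


def crude_key(name: str) -> str:
    """Hypothetical stand-in for a retrieval code (NOT ParSyll): keep the
    first letter, drop later vowels and skip a letter equal to the last kept one."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c not in VOWELS and c != key[-1]:
            key.append(c)
    return "".join(key)


def build_index(names, key_fn=crude_key):
    """Bucket each name under its code so like-sounding names share an entry."""
    index = defaultdict(list)
    for name in names:
        index[key_fn(name)].append(name)
    return index


def search(index, query, key_fn=crude_key):
    """Return every indexed name whose code matches the query's code."""
    return index.get(key_fn(query), [])


index = build_index(["Smith", "Smyth", "Smithe", "Jones"])
print(search(index, "Smythe"))  # ['Smith', 'Smyth', 'Smithe']
```

A query for 'Smythe' returns three stored spellings because all of them reduce to SMTH under this placeholder key. Swapping in a better key function changes how broad the buckets are without changing the index logic.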
It is important to note that the ParSyll Algorithm does not attempt to emulate dictionary syllable
definition. Instead, it uses raw logic to attempt syllable segmentation in isolation from referential data,
and NO WHOLE WORDS are stored or referenced in the execution of its task.
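As an illustration of segmentation by 'raw logic' alone, the following Python sketch splits a word using nothing but letter classes and a small, hypothetical set of consonant pairs (`DIGRAPHS`) that it never divides. It is not ParSyll's segmentation, whose rules are not given here. Under these assumed rules, runs of vowels (Y counts as a vowel unless word-initial) form syllable nuclei, a single consonant between two nuclei opens the next syllable (V~CV), and a longer cluster is split after its first consonant (VC~CV), or after a leading pair from `DIGRAPHS`. The '~' marker follows the notation used in segment 4 of this description.

```python
VOWELS = set("AEIOU")
# Hypothetical: consonant pairs that this sketch never splits.
DIGRAPHS = {"CH", "SH", "TH", "PH", "WH", "CK"}


def is_vowel(word: str, i: int) -> bool:
    """Y counts as a vowel unless it is the first letter."""
    return word[i] in VOWELS or (word[i] == "Y" and i > 0)


def segment(word: str) -> str:
    """Split a word into '~'-separated syllables using letter classes only."""
    word = "".join(c for c in word.upper() if c.isalpha())

    # Find maximal runs of vowels as half-open (start, end) spans.
    groups, i = [], 0
    while i < len(word):
        if is_vowel(word, i):
            j = i
            while j < len(word) and is_vowel(word, j):
                j += 1
            groups.append((i, j))
            i = j
        else:
            i += 1

    # Choose one cut in the consonant cluster between each pair of vowel runs.
    cuts = []
    for (_, end), (nxt, _) in zip(groups, groups[1:]):
        cluster = word[end:nxt]
        if len(cluster) == 1 or cluster in DIGRAPHS:
            cuts.append(end)        # V~CV: the consonant(s) open the next syllable
        elif cluster[:2] in DIGRAPHS:
            cuts.append(end + 2)    # keep a leading pair with the left syllable
        else:
            cuts.append(end + 1)    # VC~CV: split after the first consonant

    # Leading and trailing consonants stay with the first and last syllables.
    parts, prev = [], 0
    for cut in cuts:
        parts.append(word[prev:cut])
        prev = cut
    parts.append(word[prev:])
    return "~".join(parts)
```

For example, `segment("water")` gives WA~TER and `segment("tickle")` gives TICK~LE, while RUIN stays whole because adjacent vowels are kept together. A splitter this naive also produces SIN~GING and MA~KE, the kind of mid-word and ending splits that segments 2 to 4 below are concerned with.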
The algorithm is divided into eight major segments, executed sequentially:
1. An initial segmentation
1.1 Incorporates some temporary augmentation with special character sequences,
which is deleted again at the end of initial segmentation
2. Diphthongs and Triphthongs
2.1 Segmentation is based on 'majority-fit' solutions, which results in some
incorrect sound splits and conjoins (Is 'ruin' one syllable or two? Is
'IENCE' one syllable, as in CONSCIENCE, or two, as in SCIENCE?)
3. Complex segmentation
4. Ending sound segmentation
4.1 Some sequences, e.g. 'NG', might be split by syllable segmentation
when they occur in the middle of a word ('..N~G..'), but not when they
occur in an ending sound or final syllable ('~ING').
5. Phonetic substitutions
The phonetic substitutions of PHONIX are established and documented.
It is anticipated that syllable segmentation will enable different,
if similar, substitutions to be defined. Significantly, simpler
substitutions may suffice by virtue of the 'added definition' of
syllable boundaries.
5.1 First Syllable Substitutions
5.1.1 Leading character substitutions
5.1.2 Embedded & trailing character substitutions and negations
5.2 Middle Syllable Substitutions
5.2.1 General character substitutions and negations
5.3 Last Syllable Substitutions
5.3.1 Ending-sound substitutions and negations
6. Elimination of carrier vocalization (vowels) and 'silent' consonants
7. Character mapping to broaden search results
8. Indexing of results for retrieval purposes
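To make the ordering of segments 4 to 7 concrete, here is a toy Python pipeline that takes an already segmented word (syllables joined by '~') and reduces it to a retrieval code. Every rule table in it (`ENDINGS`, `FIRST_LEADING`, `EMBEDDED`, `GROUPS`) is a hypothetical placeholder, since the actual ParSyll rules are not published here and segment 5 is still being developed; segments 1 to 3 and indexing are omitted. What the sketch shows is only the structure: ending sounds are settled before substitutions, substitutions depend on syllable position, and vowel elimination and character mapping come last.

```python
# All rule tables are hypothetical placeholders, not ParSyll's rules.
VOWELS = set("AEIOUY")
ENDINGS = ("ING",)                                  # segment 4: protected endings
FIRST_LEADING = {"KN": "N", "WR": "R", "PH": "F"}   # segment 5.1.1: word-initial only
EMBEDDED = {"PH": "F", "CK": "K"}                   # segment 5: any syllable
GROUPS = {"K": "CKQ", "T": "TD", "P": "PB", "F": "FV", "S": "SZ"}  # segment 7
CLASS_OF = {c: k for k, letters in GROUPS.items() for c in letters}


def fix_endings(parts):
    """Segment 4: shift leading consonants back so an ending stands alone,
    e.g. SIN~GING -> SING~ING."""
    last = parts[-1]
    for ending in ENDINGS:
        head = last[: len(last) - len(ending)]
        if (len(parts) > 1 and last.endswith(ending) and head
                and not any(c in VOWELS for c in head)):
            return parts[:-2] + [parts[-2] + head, ending]
    return parts


def substitute(parts):
    """Segment 5: leading rules apply to the first syllable only;
    embedded rules apply to every syllable."""
    out = []
    for i, syl in enumerate(parts):
        if i == 0:
            for src, dst in FIRST_LEADING.items():
                if syl.startswith(src):
                    syl = dst + syl[len(src):]
                    break
        for src, dst in EMBEDDED.items():
            syl = syl.replace(src, dst)
        out.append(syl)
    return out


def drop_vowels(parts):
    """Segment 6: remove vowels except the word's first letter
    (silent-consonant rules are omitted in this sketch)."""
    out = []
    for i, syl in enumerate(parts):
        kept = "".join(c for j, c in enumerate(syl)
                       if (i == 0 and j == 0) or c not in VOWELS)
        if kept:
            out.append(kept)
    return out


def map_classes(parts):
    """Segment 7: map letters onto broader classes (e.g. D -> T, V -> F)."""
    return ["".join(CLASS_OF.get(c, c) for c in syl) for syl in parts]


def toy_code(segmented: str) -> str:
    """Run the stages in order and flatten the result into one code."""
    parts = segmented.upper().split("~")
    for stage in (fix_endings, substitute, drop_vowels, map_classes):
        parts = stage(parts)
    code = "".join(parts)
    # Collapse runs of the same letter so LL and L give the same code.
    return "".join(c for i, c in enumerate(code) if i == 0 or c != code[i - 1])
```

Under these placeholder rules PHI~LIP and FIL~LIP both reduce to FLP, so the two spellings would share an index entry, while SIN~GING (re-segmented to SING~ING by the ending rule) and SIN~GER yield SNGNG and SNGR respectively.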
Evaluation of result data for the development of algorithm segment 5 is under way.
T.N. Gadd
22 December 2015