This document outlines a natural language processing engine built in C# that defines tokens and sentences as strongly-typed classes and methods. It uses object-oriented principles like inheritance to model real-world relationships. The engine builds an efficient parsing graph at startup to parse inputs into tokens, handles contextual conversations by tracking history, and was designed for use in home automation but can integrate with various interfaces. Future plans include expanding its knowledge corpus and improving performance.
2. Introduction
- This presentation outlines the C# Natural Language Engine used by the ‘Smartest House in the World’, a home automation system developed by the author.
- If you are interested in using the C# Natural Language Engine presented here in a commercial product, please contact the author.
3. C# Natural Language Engine
Existing Natural Language Engines
- Have a large, STATIC dictionary data file
- Can parse complex sentence structure
- Hand back a tree of tokens (strings)
- Don’t handle conversations
C# NLP Engine
- Defines strongly-typed tokens in code
- Uses type inheritance to model ‘is a’
- Defines sentences in code
- Rules engine executes sentences
- Understands context (conversation history)
4. Sample conversation … Complex temporal expressions … Ask it to play music … become database queries Handles async conversations Understands names …
5. Goals
- Make it easy to define tokens and sentences (not XML)
- Safe, compile-time checked definition of the syntax and grammar (not XML)
- Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
- Handle ambiguity, e.g.
  - play something in the air tonight in the kitchen
  - remind me at 4pm to call john at 5pm
7. Tokens - Token Definition
- A hierarchy of Token-derived classes
- Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token
- This allows a single sentence rule to handle multiple cases, e.g. On and Off
- Derived from base Token class
- Simple tokens are a set of words, e.g. « is | are »
- Complex tokens have a parser, e.g. TokenDouble
8. A Simple Token Definition
    public class TokenPersonalPronoun : TokenGenericNoun
    {
        internal static string wordz { get { return "he,him,she,her,them"; } }
    }
- Recognizes any of the words specified
- Can use inheritance (as in this example, where a PersonalPronoun is modelled as a subclass of GenericNoun)
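The slide does not show how the engine reads the word list, so the following is only a minimal sketch of one way it could work. The `Token` and `TokenGenericNoun` stubs and the `SimpleTokenMatcher` helper are hypothetical; only `TokenPersonalPronoun` and its `wordz` property come from the slide.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the engine's base classes.
public class Token { }
public class TokenGenericNoun : Token { }

// As shown on the slide.
public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}

// Assumed helper, not part of the original engine.
public static class SimpleTokenMatcher
{
    // Reads the static 'wordz' property declared on the token type and
    // checks whether 'word' is one of its comma-separated entries.
    public static bool Matches(Type tokenType, string word)
    {
        PropertyInfo prop = tokenType.GetProperty(
            "wordz",
            BindingFlags.Static | BindingFlags.NonPublic |
            BindingFlags.Public | BindingFlags.DeclaredOnly);
        if (prop == null) return false; // not a simple word-list token

        string words = (string)prop.GetValue(null, null);
        return words.Split(',')
                    .Contains(word.Trim(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Calling `SimpleTokenMatcher.Matches(typeof(TokenPersonalPronoun), "her")` returns true, while a type with no `wordz` property never matches.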
9. A Complex Token
    public abstract class TokenNumber : Token
    {
        public static IEnumerable<TokenResult> Initialize(string input)
        {
            …
- Initialize method parses input and returns one or more possible parses.
- TokenNumber is a good example: it parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.
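The body of Initialize is elided on the slide, so the sketch below only illustrates the general idea of returning every plausible parse of one input. The `NumberParse` types and `NumberParser` class are hypothetical stand-ins and do not reproduce the engine's TokenResult.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical result types, used only for this illustration.
public abstract class NumberParse { public string Text; }
public sealed class IntParse : NumberParse { public int Value; }
public sealed class DoubleParse : NumberParse { public double Value; }
public sealed class PercentageParse : NumberParse { public double Value; }

public static class NumberParser
{
    // Yields every interpretation that fits the input; "42" is both a
    // valid int and a valid double, so both parses are returned.
    public static IEnumerable<NumberParse> Parse(string input)
    {
        string text = input.Trim();

        if (text.EndsWith("%", StringComparison.Ordinal))
        {
            double pct;
            string number = text.Substring(0, text.Length - 1);
            if (double.TryParse(number, NumberStyles.Float,
                                CultureInfo.InvariantCulture, out pct))
                yield return new PercentageParse { Text = text, Value = pct };
            yield break;
        }

        int i;
        if (int.TryParse(text, NumberStyles.Integer,
                         CultureInfo.InvariantCulture, out i))
            yield return new IntParse { Text = text, Value = i };

        double d;
        if (double.TryParse(text, NumberStyles.Float,
                            CultureInfo.InvariantCulture, out d))
            yield return new DoubleParse { Text = text, Value = d };
    }
}
```

Returning all candidates leaves the choice to the rule matcher: a rule that asks for a double can still accept "42".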
10. The catch-all TokenPhrase
    public class TokenPhrase : Token
- TokenPhrase matches anything, especially anything in quote marks
  - add a reminder call Bruno at 4pm
- Sentence signature could be (…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
- This would match the rule too …
  - add a reminder discuss 6pm conference call with Bruno at 4pm
11. Temporal Tokens
A complete set of tokens and related classes for representing time
- Point in time, e.g. today at 5pm
- Approximate time, e.g. who called at 5pm today
- Finite sequence, e.g. every Thursday in May 2009
- Infinite sequence, e.g. every Thursday
- Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)
- Null time
- Unknowable/incomprehensible time
12. Temporal Tokens (Cont.)
Code to merge any sequence of temporal tokens to the smallest canonical representation, e.g.
    the first thursday in may 2009
    {TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
    [TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]
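The merge algorithm itself is not shown in the deck. The sketch below covers only the final arithmetic step, resolving "the first thursday in may 2009" to a calendar date, and the `TemporalMath` helper is a hypothetical name.

```csharp
using System;

// Assumed helper, not part of the original engine.
public static class TemporalMath
{
    // Returns the first occurrence of 'day' in the given month.
    public static DateTime FirstWeekdayInMonth(int year, int month, DayOfWeek day)
    {
        DateTime first = new DateTime(year, month, 1);
        // Days to move forward from the 1st to reach the requested weekday (0..6).
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;
        return first.AddDays(offset);
    }
}
```

For May 2009 the 1st falls on a Friday, so the offset to Thursday is 6 and the result is 7 May 2009, which agrees with the [Thursday 5/7/2009] interval in the slide's example.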
13. Temporal Tokens (Cont.)
Finite Temporal Classes provide
- A way to enumerate the DateTimeRanges they cover
All Temporal Classes provide
- A LINQ expression generator and Entity-SQL expression generator, allowing them to be used to query a database
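As a rough illustration of both capabilities, the sketch below models "every Thursday in May 2009" as a finite set that can enumerate its ranges and build a LINQ predicate from them. The `DateTimeRange` struct here and the `WeekdayInMonth` class are assumptions made for the example; the engine's real types are not shown in the deck, and the Entity-SQL generator is not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical half-open range: Start inclusive, End exclusive.
public struct DateTimeRange
{
    public readonly DateTime Start;
    public readonly DateTime End;
    public DateTimeRange(DateTime start, DateTime end) { Start = start; End = end; }
}

// Hypothetical finite temporal set: every given weekday in one month.
public sealed class WeekdayInMonth
{
    private readonly int year;
    private readonly int month;
    private readonly DayOfWeek day;

    public WeekdayInMonth(int year, int month, DayOfWeek day)
    {
        this.year = year; this.month = month; this.day = day;
    }

    // Enumerates one whole-day range per matching weekday.
    public IEnumerable<DateTimeRange> Ranges()
    {
        for (DateTime d = new DateTime(year, month, 1); d.Month == month; d = d.AddDays(1))
            if (d.DayOfWeek == day)
                yield return new DateTimeRange(d, d.AddDays(1));
    }

    // Builds t => (t >= start1 && t < end1) || (t >= start2 && t < end2) || ...
    public Expression<Func<DateTime, bool>> ToPredicate()
    {
        ParameterExpression t = Expression.Parameter(typeof(DateTime), "t");
        Expression body = Expression.Constant(false);
        foreach (DateTimeRange r in Ranges())
        {
            Expression inRange = Expression.AndAlso(
                Expression.GreaterThanOrEqual(t, Expression.Constant(r.Start)),
                Expression.LessThan(t, Expression.Constant(r.End)));
            body = Expression.OrElse(body, inRange);
        }
        return Expression.Lambda<Func<DateTime, bool>>(body, t);
    }
}
```

The predicate can be compiled and run in memory with `ToPredicate().Compile()`. Whether a given query provider can translate an expression of this shape is a separate question that this sketch does not settle.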
14. Existing Token Types
- Numbers (double, long, int, percentage, phone, temperature)
- File names, Directories
- URLs, Domain names
- Names, Companies, Addresses
- Rooms, Lights, Sensors, Sprinklers, …
- States (On, Off, Dim, Bright, Loud, Quiet, …)
- Units of Time, Weight, Distance
- Songs, albums, artists, genres, tags
- Temporal expressions
- Commands, verbs, nouns, pronouns, …
15. Rules - A simple rule
    /// <summary>
    /// Set a light to a given state
    /// </summary>
    private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
    {
        if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
        if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
        st.Say("I turned it " + ts.LowerCased);
    }
- Any method matching this signature is a sentence rule: NLPState, Token*
- Rule matching respects inheritance, and variable repeats …
    … (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)
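One way to picture rule matching that respects inheritance and array parameters is sketched below. All class stubs, the `RuleBinder` helper and the `SampleRules.SetState` rule are hypothetical, and the greedy, non-backtracking array handling is an assumption made for brevity; the engine's actual matcher is not shown in the deck.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stubs standing in for the engine's types.
public class Token { }
public class TokenState : Token { }
public class TokenOnOff : TokenState { }
public class TokenOn : TokenOnOff { }
public class TokenLight : Token { }
public class TokenTimeConstraint : Token { }
public class NLPState { public readonly List<string> Said = new List<string>(); }

public static class RuleBinder
{
    // Returns the argument list for 'rule' if the token sequence fits its
    // signature (NLPState first, then tokens), or null when it does not.
    public static object[] TryBind(MethodInfo rule, NLPState state, IList<Token> tokens)
    {
        ParameterInfo[] ps = rule.GetParameters();
        if (ps.Length == 0 || ps[0].ParameterType != typeof(NLPState)) return null;

        object[] args = new object[ps.Length];
        args[0] = state;
        int next = 0;
        for (int i = 1; i < ps.Length; i++)
        {
            Type pt = ps[i].ParameterType;
            if (pt.IsArray)
            {
                // Array parameter: greedily take zero or more matching tokens.
                Type et = pt.GetElementType();
                List<Token> run = new List<Token>();
                while (next < tokens.Count && et.IsInstanceOfType(tokens[next]))
                    run.Add(tokens[next++]);
                Array arr = Array.CreateInstance(et, run.Count);
                for (int k = 0; k < run.Count; k++) arr.SetValue(run[k], k);
                args[i] = arr;
            }
            else
            {
                // Single parameter: a derived token satisfies a base-typed slot.
                if (next >= tokens.Count || !pt.IsInstanceOfType(tokens[next])) return null;
                args[i] = tokens[next++];
            }
        }
        return next == tokens.Count ? args : null; // every token must be consumed
    }
}

public static class SampleRules
{
    public static void SetState(NLPState st, TokenLight light, TokenState state,
                                TokenTimeConstraint[] constraints)
    {
        st.Said.Add(state.GetType().Name + " with " + constraints.Length + " constraint(s)");
    }
}
```

A TokenOn instance binds to the TokenState parameter because TokenOn derives from it, and any trailing TokenTimeConstraint tokens are collected into the array. The bound arguments can then be passed to `rule.Invoke(null, args)`.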
16. State - NLPState
- Every sentence method takes an NLPState first parameter
- State includes RememberedObject(s), allowing sentences to react to anything that happened earlier in a conversation
- Non-interactive uses can pass a dummy state
- State can be per-user or per-conversation for non-realtime conversations like email
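The deck does not describe how remembered objects are stored, so the following is only a guess at the idea: a state object that records what was mentioned so a later sentence can resolve a reference such as "it". `ConversationState`, `Light` and `Song` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-conversation state, not the engine's NLPState.
public class ConversationState
{
    private readonly List<object> remembered = new List<object>();

    // Records something that was mentioned or acted on.
    public void Remember(object item) { remembered.Add(item); }

    // Returns the most recently remembered object of type T, or null.
    public T MostRecent<T>() where T : class
    {
        for (int i = remembered.Count - 1; i >= 0; i--)
        {
            T match = remembered[i] as T;
            if (match != null) return match;
        }
        return null;
    }
}

// Hypothetical domain objects.
public class Light { public string Name; public bool IsOn; }
public class Song { public string Title; }
```

After "turn on the kitchen light" the rule would call `Remember(kitchenLight)`; a follow-up "turn it off" could then use `MostRecent<Light>()` to find what "it" refers to, even if a song was mentioned in between.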
17. User Interface
Works with a variety of user interfaces
- Chat (e.g. Jabber/Gtalk)
- Web chat
- Email
- Calendar (do X at time Y)
- Rich client application
18. Token and Rule Discovery
- No configuration needed: all Tokens and Rules are discovered using reflection
- Builds a recursive descent parser tree on startup to efficiently parse any token stream
- Dependency-injection-like code calls rule methods based on matching token sequences
- Parser can handle array parameters as well as single parameters for more flexibility
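Discovery by reflection could look roughly like the sketch below, which scans an assembly for Token subclasses and for static methods whose signature is NLPState followed by token parameters. The stubs and the `Discovery` class are hypothetical, and the construction of the parse tree from the discovered rules is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stubs standing in for the engine's types.
public class Token { }
public class TokenLight : Token { }
public class TokenState : Token { }
public class NLPState { }

public static class LightRules
{
    // Has the rule shape: NLPState first, then Token-derived parameters.
    private static void LightState(NLPState st, TokenLight light, TokenState state) { }

    // Not a rule: the first parameter is not NLPState.
    private static void NotARule(string text) { }
}

// Assumed helper, not part of the original engine.
public static class Discovery
{
    public static List<Type> FindTokenTypes(Assembly assembly)
    {
        return assembly.GetTypes()
                       .Where(t => t.IsSubclassOf(typeof(Token)))
                       .ToList();
    }

    public static List<MethodInfo> FindRules(Assembly assembly)
    {
        const BindingFlags flags = BindingFlags.Static | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        return assembly.GetTypes()
                       .SelectMany(t => t.GetMethods(flags))
                       .Where(m => IsRule(m))
                       .ToList();
    }

    private static bool IsRule(MethodInfo m)
    {
        ParameterInfo[] ps = m.GetParameters();
        if (ps.Length < 2 || ps[0].ParameterType != typeof(NLPState)) return false;
        return ps.Skip(1).All(p => IsTokenParameter(p.ParameterType));
    }

    // Accepts both Token-derived types and arrays of them.
    private static bool IsTokenParameter(Type t)
    {
        Type element = t.IsArray ? t.GetElementType() : t;
        return typeof(Token).IsAssignableFrom(element);
    }
}
```

Because private static methods are included in the scan, a rule such as LightState needs no registration or attribute: adding the method to any class in the assembly is enough for it to be found.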
19. Summary
- Strongly-typed natural language engine
  - Compile time checking, inheritance, …
- Define tokens and sentences (rules) in C#
- Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …
- Builds an efficient parse graph
- Tracks conversation history
20. Future plans
- Expanded corpus of knowledge
  - Company names, locations, documents, …
- Performance improvements
  - Only try parsing tokens valid for current parse tree state
- .NET 4 Optional Arguments
  - Account for these in reflection code during parse tree creation
- Generate iCal/Gdata Recurrence From Time Expressions