January 2015 Archives

Voice interactive ... again.

"Once more unto the breach, dear friends"...

It seems like I was only here not long ago.  This time I was at it with OS X, taking the opportunity to learn a little Objective-C and play with one of the NoSQL databases that seem to be popular these days.  I decided on Elasticsearch as my data store.  The plan is (was) to use the built-in enhanced dictation as the recognizer, and Apple's AppKit NSSpeechSynthesizer, to create an interactive dialog system.

The basic loop is kicked off by starting the recognizer.  When it finishes recognizing a phrase (uninterrupted speech followed by a pause), it gives me a controlTextDidChange notification.  I then interrupt the recognizer and pass the message via another notification to a central controller.  The recognized phrase goes through four phases: direct recognition, query engine, personalized recognition, and finally an AIML-based chat bot.  In other words... yeah, I'm just re-inventing Siri... and doing it poorly, mind you.
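The four phases amount to a fall-through pipeline: try each handler in order, and the first one with something to say wins.  A minimal sketch in Python (the real thing is Objective-C, and every name and canned reply here is hypothetical):

```python
def direct_recognition(phrase):
    """Phase 1: exact match against a few canned queries (toy data)."""
    canned = {"what time is it?": "It is 11:45 PM."}
    return canned.get(phrase.lower())

def query_engine(phrase):
    """Phase 2: knowledge-base lookup (stubbed out here)."""
    return None

def personalized(phrase):
    """Phase 3: remembered facts about the user (stubbed out here)."""
    return None

def aiml_fallback(phrase):
    """Phase 4: an AIML chat bot always has *something* to say."""
    return "Interesting. Tell me more."

def handle(phrase):
    """Run the phrase through each phase; first non-None reply wins."""
    for phase in (direct_recognition, query_engine, personalized, aiml_fallback):
        reply = phase(phrase)
        if reply is not None:
            return reply
```

Since the AIML bot never returns None, every phrase gets some reply, which keeps the dialog loop alive.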
Direct recognition is an exact match against some canned queries: "When is sunset tomorrow?" or "What time is it?".  The phrases get interesting when you consider location, so I dumped zip code + city name + lat/long data into es, and query earthtools.org to determine, for example, sunrise and sunset in any city (in the US).  Their data doesn't quite jibe with weather.com (I didn't like their API), but in the end I'm just doing this for fun, so accuracy matters less than "oh, shiny".
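The es side of the location lookup is just a match query against the city field; this sketch builds the query body (the index mapping and field name are my assumptions, not from the actual app):

```python
def city_location_query(city):
    """Build an Elasticsearch query body that looks up a city record by name.
    The matching hit's lat/long would then be fed to the earthtools.org call.
    The field name "city" is an assumption about the index mapping."""
    return {
        "query": {"match": {"city": city}},
        "size": 1,  # we only need the best match
    }
```

The body would be POSTed to the index's _search endpoint, or passed to a client library's search call.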

There's a pretty cool NSLinguisticTagger class that can be handy.  This page was a pretty useful jumping-off point into the world of NSLinguisticTagger, and I've really only scratched the surface.  For example, given the query "WHO WROTE THE DECLARATION OF INDEPENDENCE" I get back:

2015-01-20 23:47:22.596 Chatter[52212:595641] WHO: Pronoun
2015-01-20 23:47:22.596 Chatter[52212:595641] WROTE: Verb
2015-01-20 23:47:22.596 Chatter[52212:595641] THE: Determiner
2015-01-20 23:47:22.596 Chatter[52212:595641] DECLARATION: Noun
2015-01-20 23:47:22.597 Chatter[52212:595641] OF: Preposition
2015-01-20 23:47:22.597 Chatter[52212:595641] INDEPENDENCE: Noun

Granted, that's not how I dealt with the above query.  I figured a knowledge-system approach would be best for a query engine, so I have a collection of facts dumped in one path of es:

Thomas Jefferson wrote the Declaration of Independence
George Washington was the first President of the United States
Adolf Hitler was the leader of the Nazi party
Robert Oppenheimer was the father of the atomic bomb
Albert Einstein developed the theory of relativity
Bill Clinton was the 42nd President of the United States
... and so on.

Given any "who" question, we can then chop the "who" and search the rest:

"Who developed the theory of relativity?" becomes an es query on "developed the theory of relativity", which becomes the response "Albert Einstein developed the theory of relativity."  There are similar tricks for "where" and "when"; "how" is a bit tougher.  These are of course simple tricks, not conclusive or exhaustive.  The good news is that knowledge representation is a fairly well-studied area, and between Google, Wikipedia, and Wolfram Alpha, we have access to some very nice data sources.
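The "who" trick can be sketched in a few lines, with a plain substring scan standing in for the es query (the facts are the ones listed above; the function name is mine):

```python
FACTS = [
    "Thomas Jefferson wrote the Declaration of Independence",
    "George Washington was the first President of the United States",
    "Robert Oppenheimer was the father of the atomic bomb",
    "Albert Einstein developed the theory of relativity",
    "Bill Clinton was the 42nd President of the United States",
]

def answer_who(question):
    """Chop the leading "who" and search the remainder against the facts."""
    q = question.lower().rstrip("?").strip()
    if not q.startswith("who "):
        return None
    remainder = q[len("who "):]
    for fact in FACTS:
        if remainder in fact.lower():
            return fact + "."  # echo the whole fact back as the answer
    return None
```

Echoing the entire matched fact is what makes the response sound like a full sentence rather than a bare name.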

The personalized recognition is more for things like location, interests, calendar, birthdays, contact information, identifying relationships, and so on.  The idea is that as you interact with the system, it should "remember" information about you, your relationships, and more.  At some point the system should know that I have a child (if I've mentioned it) and even, one day, spontaneously inquire as to the child's well-being.  Still working on this... ;-)

The last part is the fallback.  AIML chat bots have come a pretty long way in simulating basic conversation skills.  If the other phases don't come back with anything, it's up to an AIML-based chat bot to reply.

The response is fed to NSSpeechSynthesizer, which we act as the delegate for.  When we receive the didFinishSpeaking delegate callback, we know we can re-enable the recognizer for the next round.
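That speak/listen handoff is really a tiny state machine: speaking mutes the recognizer, and the finished-speaking callback turns it back on.  A toy Python stand-in (in the real app the synthesizer's delegate method fires the callback; these names are mine):

```python
class DialogTurns:
    """Toy model of the speak/listen handoff around a speech synthesizer."""

    def __init__(self):
        self.listening = True   # recognizer starts enabled

    def speak(self, text):
        self.listening = False  # interrupt the recognizer while we talk
        self.pending = text     # hand the text off to the synthesizer here

    def did_finish_speaking(self):
        # In the real app this is the delegate callback from the synthesizer.
        self.listening = True   # ready for the next round
```

Without the mute step, the recognizer would happily transcribe the system's own speech and feed it back into the loop.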

Alas, other projects and interests have caught me, and this is really just me taking notes while this gets pushed to the back burner.  So until next time... adieu.



