A key part of Skylar’s architecture is the Language Engine. The Language Engine has three primary functions: intent classification, entity extraction, and machine learning (more on that one later). In this post, we’re going to start with intent classification: in other words, how Skylar transforms user input into its corresponding intent!
Intent classification is essentially text classification applied to much shorter text. In text classification, you would normally feed longer documents, often several pages each, into a machine learning algorithm along with their appropriate labels. The algorithm learns which features of the text matter and uses them to classify new text it hasn’t been trained on. Intent classification does the same thing for a single input sentence: it predicts a label for that sentence. This is the same kind of technique used in commercial digital assistants such as Google Home and Amazon’s Alexa, and now Skylar too.
My explanation is very simplified. If you’d like to learn more about text classification and the field of natural language processing, I recommend checking out the resources at the end of this post. I personally used many of them to teach myself about these things!
Let’s say I have the following sentence: “Skylar, play some music on Spotify.” An intent classifier would map this to a label that indicates the user wants music to be played.
In my tests for Skylar’s intent classifier, I created five intents:
- MUSIC_PLAY - play music
- MUSIC_PAUSE - pause music
- SKYLAR_SHUTDOWN - shut down the system
- SKYLAR_TELLTIME - tell the time
- SKYLAR_HELLO - greet the user
Each one corresponds to a different action that Skylar should perform. The classifier is given each intent along with several sample utterances to train on. Instead of writing each utterance by hand, I took some inspiration from this project, which created a DSL (domain-specific language) to simplify the process of creating a dataset for training NLP (natural language processing) models. My current solution reads the appropriate JSON data from a file and generates a given number of utterances for each intent.
Essentially, all I have to do now to create a new intent is come up with a few “skeleton” sentences that each have blanks in them. The utterance generator fills in these blanks, which could be the name of an artist, a streaming service, or a song. I provide Skylar with a list of entities that could fill each of these slots. Skylar then generates every possible unique sentence for each “skeleton” sentence and randomly chooses a user-specified number of them for each intent.
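To make the idea concrete, here is a minimal sketch of what such a generator could look like. The dictionary layout (standing in for the JSON file), the `{slot}` placeholder syntax, and the names `expand` and `generate_utterances` are all assumptions made for illustration; Skylar’s actual file format and code aren’t shown in this post.

```python
import itertools
import random
import re

# Hypothetical intent definition, standing in for one JSON file.
# The real layout used by Skylar is not shown in the post.
INTENT = {
    "name": "MUSIC_PLAY",
    "skeletons": [
        "play {song} by {artist}",
        "play some {artist} on {service}",
    ],
    "entities": {
        "song": ["Yesterday", "Hey Jude"],
        "artist": ["The Beatles", "Queen"],
        "service": ["Spotify", "Pandora"],
    },
}

SLOT = re.compile(r"\{(\w+)\}")  # matches blanks such as {artist}


def expand(skeleton, entities):
    """Yield every sentence obtained by filling each blank with each candidate."""
    slots = SLOT.findall(skeleton)
    for combo in itertools.product(*(entities[slot] for slot in slots)):
        yield skeleton.format(**dict(zip(slots, combo)))


def generate_utterances(intent, count, seed=None):
    """Build all unique sentences for an intent, then randomly keep `count` of them."""
    unique = sorted({
        sentence
        for skeleton in intent["skeletons"]
        for sentence in expand(skeleton, intent["entities"])
    })
    rng = random.Random(seed)  # pass a seed for reproducible datasets
    return rng.sample(unique, min(count, len(unique)))


if __name__ == "__main__":
    for utterance in generate_utterances(INTENT, 5, seed=42):
        print(INTENT["name"], "->", utterance)
```

Two skeletons with two candidates per blank already give eight distinct sentences here, which is why a handful of skeletons and entity lists is enough to produce a decent-sized training set.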
With these sentences, I then train a linear support vector machine (SVM) classifier to produce a model that can classify a sentence into one of the five intents. For more information on what a linear SVM is, click here.
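For a feel of what a linear SVM does with these sentences, here is a small dependency-free sketch. It is not Skylar’s code: it assumes bag-of-words features, trains one binary SVM per intent (one-vs-rest) with plain subgradient descent on the hinge loss, leaves out the intercept term for brevity, and uses a tiny hand-written training set instead of generated utterances. In a real project, a library implementation such as scikit-learn’s `LinearSVC` would be the more likely choice.

```python
from collections import defaultdict

# Tiny illustrative training set (hypothetical utterances, not Skylar's data).
TRAINING = [
    ("play some music", "MUSIC_PLAY"),
    ("play a song on spotify", "MUSIC_PLAY"),
    ("start playing music", "MUSIC_PLAY"),
    ("pause the music", "MUSIC_PAUSE"),
    ("pause this song", "MUSIC_PAUSE"),
    ("stop the music", "MUSIC_PAUSE"),
    ("shut down", "SKYLAR_SHUTDOWN"),
    ("shut down the system", "SKYLAR_SHUTDOWN"),
    ("power off the system", "SKYLAR_SHUTDOWN"),
    ("what time is it", "SKYLAR_TELLTIME"),
    ("tell me the time", "SKYLAR_TELLTIME"),
    ("what is the time", "SKYLAR_TELLTIME"),
    ("hello skylar", "SKYLAR_HELLO"),
    ("hi there", "SKYLAR_HELLO"),
    ("good morning skylar", "SKYLAR_HELLO"),
]


def featurize(sentence):
    """Bag-of-words features: lowercased token -> count."""
    features = defaultdict(float)
    for raw in sentence.lower().split():
        token = raw.strip(".,!?")
        if token:
            features[token] += 1.0
    return dict(features)


def dot(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_binary_svm(samples, epochs=50, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss.

    `samples` is a list of (features, target) pairs with target +1 or -1.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, target in samples:
            margin = target * dot(weights, features)
            for name in list(weights):            # regularisation: shrink all weights
                weights[name] -= lr * lam * weights[name]
            if margin < 1:                        # misclassified or inside the margin
                for name, value in features.items():
                    weights[name] += lr * target * value
    return weights


def train_intent_classifier(examples):
    """One-vs-rest: one binary SVM per intent label."""
    featurized = [(featurize(text), label) for text, label in examples]
    labels = sorted({label for _, label in examples})
    return {
        label: train_binary_svm([(f, 1 if y == label else -1) for f, y in featurized])
        for label in labels
    }


def classify(models, sentence):
    """Pick the intent whose SVM gives the sentence the highest score."""
    features = featurize(sentence)
    return max(models, key=lambda label: dot(models[label], features))


MODELS = train_intent_classifier(TRAINING)
print(classify(MODELS, "play something on spotify"))  # prints MUSIC_PLAY
```

Each intent gets its own weight per word, so words like “play” end up with a positive weight for MUSIC_PLAY and a negative one everywhere else. Words the model has never seen simply contribute nothing to the score.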
Next up, Skylar needs the ability to gather specific pieces of information from the sentences we’ve classified. We call these pieces of information entities, so we need to build an entity extractor. There are many ways to go about this: you could roll your own solution, use open source libraries, or even buy a commercial product. In my case, at least for now, I’ve decided to let an open source library handle it. The library is called spaCy. It has many natural language processing functions, but I’m only concerned with its entity recognizer. spaCy lets you use pre-trained recognizers as well as train your own, either on top of a pre-trained one or entirely from scratch. For Skylar, I’m currently using a custom recognizer to gather the entities specified by the files containing the JSON data for each intent. In the future, I plan on using a few entity extractors, each with its own domain of entities to recognize. Rather than going over how I did this, I’ll point you to a couple of links here and here that cover most of what I did.
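If you just want to see what an entity extractor takes in and gives back, here is a deliberately simple stand-in. It is a plain dictionary lookup written with the standard library, not a trained spaCy recognizer, and the entity lists, labels, and function names are hypothetical. A trained recognizer can also pick up names it has never seen; this sketch only finds phrases that are on its lists.

```python
import re

# Hypothetical entity lists, in the spirit of the per-intent JSON data.
ENTITIES = {
    "ARTIST": ["The Beatles", "Queen"],
    "SERVICE": ["Spotify", "Pandora"],
}


def build_matchers(entities):
    """Compile one case-insensitive pattern per label, longest phrases first."""
    matchers = {}
    for label, phrases in entities.items():
        ordered = sorted(phrases, key=len, reverse=True)
        alternatives = "|".join(re.escape(phrase) for phrase in ordered)
        matchers[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return matchers


def extract_entities(sentence, matchers):
    """Return (text, label, start, end) tuples ordered by position in the sentence."""
    found = []
    for label, pattern in matchers.items():
        for match in pattern.finditer(sentence):
            found.append((match.group(0), label, match.start(), match.end()))
    return sorted(found, key=lambda entity: entity[2])


MATCHERS = build_matchers(ENTITIES)
for text, label, start, end in extract_entities("Skylar, play Queen on Spotify.", MATCHERS):
    print(f"{label}: {text} ({start}-{end})")
```

The output shape (entity text, label, and character offsets) mirrors the kind of information an entity recognizer reports, which is what the rest of the system needs to act on a classified sentence.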
The machine learning component of the Language Engine has yet to be implemented. I plan on making it one of the last features I implement, for a couple of reasons: I’m not too sure what exactly it’ll do, and I don’t have enough meaningful data. I know that I want Skylar to learn from the experiences it has with the user, and I intend for this component to be the one that handles that. Once Skylar has been functioning for a while, I intend to store each interaction locally in a database, kinda like a long-term memory. The machine learning component would use the database to discover patterns and trends in the data that can better help the user, whether by improving conversations or by automating certain actions with the user’s permission. This feature will have to wait until I’ve figured out the rest of Skylar’s architecture, though.
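Since none of this exists yet, the following is only a rough sketch of what the “long-term memory” part could look like, using SQLite from Python’s standard library. The table name, columns, and helper functions are all made up for illustration, and the final query is just one simple example of a pattern you could pull out of stored interactions.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_memory(path=":memory:"):
    """Open (or create) the local interaction store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interactions (
               id INTEGER PRIMARY KEY,
               happened_at TEXT NOT NULL,
               utterance TEXT NOT NULL,
               intent TEXT NOT NULL,
               entities TEXT NOT NULL
           )"""
    )
    return conn


def remember(conn, utterance, intent, entities):
    """Store one interaction, with its entities serialised as JSON."""
    conn.execute(
        "INSERT INTO interactions (happened_at, utterance, intent, entities) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), utterance, intent, json.dumps(entities)),
    )
    conn.commit()


def intent_counts(conn):
    """How often each intent has been used, most frequent first."""
    rows = conn.execute(
        "SELECT intent, COUNT(*) FROM interactions "
        "GROUP BY intent ORDER BY COUNT(*) DESC, intent"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    memory = open_memory()
    remember(memory, "play Queen on Spotify", "MUSIC_PLAY", {"artist": "Queen"})
    remember(memory, "hello skylar", "SKYLAR_HELLO", {})
    print(intent_counts(memory))
```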
That was a lot of information at once, so I’m gonna wrap up the post here. I’ve already made significant progress with Skylar since my last post in April, hopefully enough that I can make a video showing off some basic features. In the meantime, look out for more posts and other projects coming soon!
Thanks for reading!