Interactive Learning¶
Note
This is the place to start if you have a great idea for a bot but you don’t have any conversations to use as training data. We will assume that you’ve already thought of what intents and entities you need (check out the Rasa NLU docs if you don’t know what those are).
Motivation: Why Interactive Learning?¶
Several complications make chatbot training trickier than most machine learning problems.
The first is that there are often several ways of reaching the same goal, and they may all be equally good. It is therefore wrong to say with certainty that, given X, you should do Y, and that anything other than exactly Y is a mistake. Yet this is essentially what a fully supervised setup assumes. We want the bot to learn that it can reach a successful state by a number of different routes.
Secondly, the utterances from users are strongly affected by the actions of the bot. That means a network trained on pre-collected data will suffer from exposure bias: the system is trained to make predictions but never gets to train on its own predictions, because it is given the ground truth at every step. This has been shown to cause problems when predicting sequences multiple steps into the future.
Finally, from a practical perspective, Rasa Core developers should be able to train via the Wizard of Oz method: if you want a bot to do a certain task, you can simply pretend to be the bot for a little while, and at the end it will have learned how to respond. This is a good way of learning how to make natural and flowing conversations.
The Bot¶
We will build a bot that can recommend concerts to go to. We’ll show how to implement context-aware behaviour without writing a flow chart. For example, if our user asks “which of those has better reviews?”, our bot should know whether they want to compare musicians or venues.
Let’s go!
The Domain¶
We will keep the concert domain simple: the only slots are two lists, concerts and venues.
We’ll also only support these intents:
"greet", "thankyou", "goodbye", "search_concerts", "search_venues", "compare_reviews"
Here is the domain definition:
slots:
  concerts:
    type: list
  venues:
    type: list

intents:
  - greet
  - thankyou
  - goodbye
  - search_concerts
  - search_venues
  - compare_reviews

entities:
  - name

templates:
  utter_greet:
    - "hey there!"
  utter_goodbye:
    - "goodbye :("
  utter_default:
    - "default message"
  utter_youarewelcome:
    - "you're very welcome"

actions:
  - utter_default
  - utter_greet
  - utter_goodbye
  - utter_youarewelcome
  - actions.ActionSearchConcerts
  - actions.ActionSearchVenues
  - actions.ActionShowConcertReviews
  - actions.ActionShowVenueReviews
Stateless Stories¶
We start by training a stateless model on some simple dialogues in the Rasa story format.
Below is an excerpt of the stories.
In many cases, your bot’s ‘conversations’ are just a single turn and response: “Hello” is always met with a greeting, “goodbye!” is always met with a sign-off, and the correct response to “thank you” is pretty much always “you’re welcome”.
Notice that below, we’ve defined two stories, showing that action_show_venue_reviews and action_show_concert_reviews are both possible responses to the compare_reviews intent, but neither references any context. That comes later.
## greet
* _greet
    - action_greet

## happy
* _thankyou
    - action_youarewelcome

...

## compare_reviews_venues
* _compare_reviews
    - action_show_venue_reviews

## compare_reviews_concerts
* _compare_reviews
    - action_show_concert_reviews
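To make the ambiguity concrete, here is a minimal sketch that reads the story excerpt above and lists every intent that is followed by more than one action. It is not Rasa Core's own story loader: parse_stories and actions_by_intent are hypothetical helpers written only for illustration, and the elided stories ("...") are simply left out.

```python
from collections import defaultdict

# The story excerpt from this section (elided stories omitted).
STORIES = """\
## greet
* _greet
    - action_greet
## happy
* _thankyou
    - action_youarewelcome
## compare_reviews_venues
* _compare_reviews
    - action_show_venue_reviews
## compare_reviews_concerts
* _compare_reviews
    - action_show_concert_reviews
"""


def parse_stories(text):
    """Hypothetical helper: return {story_name: [(intent, [actions]), ...]}."""
    stories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A new story starts; remember its list of turns.
            current = stories.setdefault(line[3:].strip(), [])
        elif line.startswith("* ") and current is not None:
            # A user turn: drop the leading underscore of the intent.
            current.append((line[2:].strip().lstrip("_"), []))
        elif line.startswith("- ") and current:
            # A bot action belonging to the most recent user turn.
            current[-1][1].append(line[2:].strip())
    return stories


def actions_by_intent(stories):
    """Collect the first bot action that follows each intent."""
    mapping = defaultdict(set)
    for turns in stories.values():
        for intent, actions in turns:
            if actions:
                mapping[intent].add(actions[0])
    return mapping


parsed = parse_stories(STORIES)
ambiguous = {
    intent: sorted(actions)
    for intent, actions in actions_by_intent(parsed).items()
    if len(actions) > 1
}
print(ambiguous)
# {'compare_reviews': ['action_show_concert_reviews', 'action_show_venue_reviews']}
```

Only compare_reviews shows up, which is exactly the case a stateless model cannot resolve on its own.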
Training¶
We start by training the policy to recognise these input-output pairs independently of any context.
(You can see the definition of the ConcertPolicy class in policy.py.)
Run the script train_init.py. This creates a training set of conversations by randomly combining the stories we’ve provided into longer dialogues, and then trains the policy on that dataset.
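The idea of randomly combining short stories into longer dialogues can be sketched in a few lines. This is an illustration only, not the code in train_init.py: build_dialogues, its parameters, and the fixed seed are assumptions made for the example, and each story is reduced to a list of (intent, action) turns.

```python
import random

# Each story reduced to its (intent, action) turns; names follow the excerpt above.
STORIES = {
    "greet": [("greet", "action_greet")],
    "happy": [("thankyou", "action_youarewelcome")],
    "compare_reviews_venues": [("compare_reviews", "action_show_venue_reviews")],
    "compare_reviews_concerts": [("compare_reviews", "action_show_concert_reviews")],
}


def build_dialogues(stories, n_dialogues, max_stories, seed=0):
    """Hypothetical helper: glue randomly chosen stories into longer dialogues."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    names = sorted(stories)
    dialogues = []
    for _ in range(n_dialogues):
        # Pick how many stories to chain, then sample them with replacement.
        length = rng.randint(1, max_stories)
        dialogue = []
        for name in rng.choices(names, k=length):
            dialogue.extend(stories[name])
        dialogues.append(dialogue)
    return dialogues


dialogues = build_dialogues(STORIES, n_dialogues=5, max_stories=3)
for dialogue in dialogues:
    print(" -> ".join(f"{intent}:{action}" for intent, action in dialogue))
```

Because the stories are combined at random, both review actions end up following compare_reviews in all kinds of contexts, so the resulting policy has no reason to prefer one over the other.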
Then, run the script run.py to talk to the bot. You should be able to have a conversation similar to the one below:
Note
We haven’t connected an NLU tool here, so when you type messages to the bot you have to type the intent itself, prefixed with an underscore (for example _greet). If you want to use Rasa NLU / wit.ai / Lex, you can just swap the Interpreter class in run.py.
Bot loaded. Type hello and press enter :
_greet
hey there!
_search_concerts
Here's what I found:
Katy Perry, Foo Fighters
_goodbye
goodbye :(
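The underscore convention used in this transcript amounts to a very small interpreter. The sketch below shows the idea in plain Python; parse_intent_message is a hypothetical function, not the Interpreter class from run.py, and the shape of the returned dictionary (text, intent with a confidence, entities) is an assumption modelled loosely on typical NLU output.

```python
def parse_intent_message(text):
    """Hypothetical helper: turn '_greet' into an NLU-style result, no model involved."""
    message = text.strip()
    if not message.startswith("_") or len(message) == 1:
        raise ValueError("expected an intent name prefixed with '_', e.g. '_greet'")
    return {
        "text": message,
        # The user typed the intent directly, so we treat it as certain.
        "intent": {"name": message[1:], "confidence": 1.0},
        "entities": [],
    }


result = parse_intent_message("_search_concerts")
print(result["intent"]["name"])  # search_concerts
```

Swapping in a real NLU tool would mean replacing this function with one that returns the same kind of structure for free text.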
Now we’ll train the bot to use context to respond correctly to the compare_reviews intent.
Interactive Learning¶
Run the script train_online.py. This first repeats the process in the train_init script, creating a stateless policy. It then runs the bot so that you can provide feedback to train it:
Happy paths
We can start talking to the bot as before, directly entering the intents. For example, if we type _greet, we get the following prompt:
_greet
------
Chat history:
bot did: action_listen
user said: _greet
whose intent is: greet
we currently have slots: {'location': None}
------
The bot wants to [greet] due to the intent. Is this correct?
1. Yes
2. No, intent is right but the action is wrong
3. The intent is wrong
This should give you all the information you need to decide what the bot should have done. In this case, the bot chose the right action (‘greet’), so we type 1 and hit enter.
We continue this loop until the bot chooses the wrong action.
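The feedback step can be pictured as a prompt plus a three-way branch. The sketch below is a hedged illustration of that control flow, not the code in train_online.py: format_prompt, next_step, and the step names they return are all hypothetical.

```python
def format_prompt(history, slots, predicted_action):
    """Hypothetical helper: build a prompt shaped like the one shown above."""
    lines = ["------", "Chat history:"]
    lines.extend(history)
    lines.append(f"we currently have slots: {slots}")
    lines.append("------")
    lines.append(
        f"The bot wants to [{predicted_action}] due to the intent. Is this correct?"
    )
    lines.extend(
        [
            "1. Yes",
            "2. No, intent is right but the action is wrong",
            "3. The intent is wrong",
        ]
    )
    return "\n".join(lines)


def next_step(choice):
    """Map the user's answer to what the training loop should do next."""
    steps = {
        "1": "accept_action",   # keep the prediction and carry on
        "2": "ask_for_action",  # show the action list and ask for the right one
        "3": "ask_for_intent",  # the message was misread; ask for the intent
    }
    key = choice.strip()
    if key not in steps:
        raise ValueError("please answer 1, 2 or 3")
    return steps[key]


prompt = format_prompt(
    ["bot did: action_listen", "user said: _greet", "whose intent is: greet"],
    {"location": None},
    "greet",
)
print(prompt)
print(next_step("1"))  # accept_action
```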
Providing feedback on errors
We’ve just asked the bot to search for concerts, and now we’re asking it to compare reviews. The bot happens to choose the wrong one out of the two possibilities we wrote in the stories:
_compare_reviews
------
Chat history:
bot did: action_search_concerts
bot did: action_suggest
bot did: action_listen
user said: _compare_reviews
whose intent is: compare_reviews
we currently have slots: {'location': None}
------
The bot wants to [show_venue_reviews] due to the intent. Is this correct?
1. Yes
2. No, intent is right but the action is wrong
3. The intent is wrong
Now we type 2, because it chose the wrong action, and we get a new prompt asking for the correct one. This also shows the probabilities the model has assigned to each of the actions.
------
what is the next action for the bot?
0    default               0.00148131744936
1    greet                 0.0970264300704
2    goodbye               0.0288009047508
3    listen                0.00123148341663
6    search_cinemas        0.000627864559647
8    search_films          0.0367559418082
9    suggest               0.0261212754995
11   youarewelcome         0.594935178757
13   explain_options       0.0516758263111
14   store_slot            0.00145904591773
15   show_cinema_reviews   0.00887114647776
16   show_film_reviews     0.0870243906975
In this case, the bot should choose show_film_reviews (rather than cinema reviews!), so we type 16 and hit enter.
Note
The policy model will get updated on-the-fly, so that it’s less likely to make the same mistake again. You can also export all of the conversations you have with the bot so you can add these as training stories in the future.
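To see why a single correction can change later behaviour, consider the toy policy below. It is deliberately not the ConcertPolicy model from policy.py: CountingPolicy is a hypothetical stand-in that simply counts which action followed each (previous action, intent) pair, and using only the previous action as context is a simplifying assumption.

```python
from collections import Counter, defaultdict


class CountingPolicy:
    """Toy policy: prefers the action seen most often for (previous action, intent)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, previous_action, intent, correct_action):
        # One online training step: record what should have happened.
        self.counts[(previous_action, intent)][correct_action] += 1

    def probabilities(self, previous_action, intent):
        seen = self.counts[(previous_action, intent)]
        total = sum(seen.values())
        return {action: n / total for action, n in seen.items()} if total else {}

    def predict(self, previous_action, intent):
        probs = self.probabilities(previous_action, intent)
        return max(probs, key=probs.get) if probs else None


policy = CountingPolicy()

# Stateless start: both review actions were seen equally often in every context.
for previous in ("action_search_concerts", "action_search_venues"):
    for action in ("action_show_venue_reviews", "action_show_concert_reviews"):
        policy.update(previous, "compare_reviews", action)

# The user corrects the bot once, right after a concert search.
policy.update("action_search_concerts", "compare_reviews", "action_show_concert_reviews")

print(policy.predict("action_search_concerts", "compare_reviews"))
# action_show_concert_reviews
```

After the correction, the concert context favours concert reviews with probability 2/3, while the venue context is still undecided until it receives feedback of its own.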
Now we can keep talking to the bot for as long as we like to create a longer conversation. At any point you can type _export and the bot will write the current conversation to a file, which you can then add as a training example for the future.
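An exported conversation is useful because it can be read back in as a story. The sketch below shows one way a list of turns could be rendered in the story format used earlier; conversation_to_story is a hypothetical helper, not what _export actually runs, and the story name and indentation are assumptions.

```python
def conversation_to_story(name, turns):
    """Hypothetical helper: render (intent, [actions]) turns in the story format."""
    lines = [f"## {name}"]
    for intent, actions in turns:
        lines.append(f"* _{intent}")
        lines.extend(f"    - {action}" for action in actions)
    return "\n".join(lines) + "\n"


# A conversation like the one corrected above (action_listen omitted for brevity).
turns = [
    ("greet", ["action_greet"]),
    ("search_concerts", ["action_search_concerts", "action_suggest"]),
    ("compare_reviews", ["action_show_concert_reviews"]),
]

story = conversation_to_story("exported_conversation", turns)
print(story)
```

The resulting text can be written to a file and appended to the existing stories, so that the context-dependent behaviour is part of the training data next time.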