Interactive Learning¶
Note
The complete code for this example is available on GitHub.
The Problem¶
Your bot usually has well-defined goals it should reach when talking to a user, but there are often numerous ways the conversation could develop before reaching that final stage. We'll teach you how to use Rasa Core to bootstrap full-blown conversations from minimal or no training data.
The Bot¶
Say you want to build a bot that recommends concerts to go to. There is one goal: you know that at the end of the conversation you want your bot to make a recommendation. We'll show how to implement context-aware behaviour without writing a flow chart. For example, if our user asks "which of those has better reviews?", our bot should know whether they want to compare musicians or venues.
Head over to examples/concertbot for this example. Let's go!
The Domain¶
We will keep the concert domain simple and won't add any slots just yet. We'll also only support these intents: "greet", "thankyou", "goodbye", "search_concerts", "search_venues", and "compare_reviews". Here is the domain definition (concert_domain.yml):
```yaml
slots:
  concerts:
    type: list
  venues:
    type: list

intents:
  - greet
  - thankyou
  - goodbye
  - search_concerts
  - search_venues
  - compare_reviews

entities:
  - name

templates:
  utter_greet:
    - "hey there!"
  utter_goodbye:
    - "goodbye :("
  utter_default:
    - "default message"
  utter_youarewelcome:
    - "you're very welcome"

actions:
  - utter_default
  - utter_greet
  - utter_goodbye
  - utter_youarewelcome
  - actions.ActionSearchConcerts
  - actions.ActionSearchVenues
  - actions.ActionShowConcertReviews
  - actions.ActionShowVenueReviews
```
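The custom actions listed under actions live in actions.py in the example directory. As a rough sketch of what one of them might look like (this is not a verbatim copy of the repository's code, and the concert list is made-up placeholder data), an action subclasses Action, reports its name, and returns events such as SlotSet:

```python
from rasa_core.actions import Action
from rasa_core.events import SlotSet


class ActionSearchConcerts(Action):
    def name(self):
        # must match the name used in stories (and listed in the domain)
        return 'action_search_concerts'

    def run(self, dispatcher, tracker, domain):
        # placeholder data -- a real bot would query a concert API here
        concerts = [{'artist': 'Foo Fighters', 'reviews': 4.5},
                    {'artist': 'Katy Perry', 'reviews': 5.0}]
        dispatcher.utter_message("here's what I found")
        # store the results in a slot so later actions can refer to them
        return [SlotSet('concerts', concerts)]
```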
Stateless Stories¶
We start by training a stateless model on some simple dialogues in the Rasa story format. This means we define conversations with one user utterance and only a few (typically one) bot actions in response. We will use these stateless stories as a starting point for interactive learning.
In many cases, simple training ‘conversations’ are just a single turn and response: “Hello” is always met with a greeting, “goodbye!” is always met with a sign-off, and the correct response to “thank you” is pretty much always “you’re welcome”.
Below is an excerpt of the stories.
Note
Notice that below we've defined two stories, showing that action_show_venue_reviews and action_show_concert_reviews are both possible responses to the compare_reviews intent, but neither references any context. That comes later.
```
## greet
* greet
    - utter_greet

## happy
* thankyou
    - utter_youarewelcome

...

## compare_reviews_venues
* compare_reviews
    - action_show_venue_reviews

## compare_reviews_concerts
* compare_reviews
    - action_show_concert_reviews
```
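To train and persist a policy on these stateless stories, the example directory includes a train.py script (referenced again at the end of this tutorial). A minimal sketch of what it might contain, assuming the rasa_core Python API; the policy choices, training parameters, and output path are illustrative, not necessarily those in the repository:

```python
from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy

# build an agent from the domain and the policies we want to train
agent = Agent("concert_domain.yml",
              policies=[MemoizationPolicy(), KerasPolicy()])

# train on the stateless stories and save the resulting policy
agent.train("data/stories.md",
            epochs=200,
            validation_split=0.1)
agent.persist("models/policy/init")
```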
Interactive Learning¶
Run the script train_online.py. This first creates a stateless policy by combining the stories we've provided into longer dialogues, and then trains the policy on that dataset. It then runs the bot so that you can provide feedback to train it (this is where the learning becomes interactive):
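For reference, the core of train_online.py might look roughly like this. This is a sketch assuming the rasa_core API; the exact parameters in the repository may differ:

```python
from rasa_core.agent import Agent
from rasa_core.channels.console import ConsoleInputChannel
from rasa_core.interpreter import RegexInterpreter
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy


def run_concertbot_online(input_channel, interpreter,
                          domain_file="concert_domain.yml",
                          training_data_file="data/stories.md"):
    agent = Agent(domain_file,
                  policies=[MemoizationPolicy(), KerasPolicy()],
                  interpreter=interpreter)

    # train on the stateless stories, then hand control to the
    # interactive feedback loop on the given input channel
    agent.train_online(training_data_file,
                       input_channel=input_channel,
                       epochs=200)
    return agent


if __name__ == '__main__':
    # the RegexInterpreter lets us type intents directly, e.g. /greet
    run_concertbot_online(ConsoleInputChannel(), RegexInterpreter())
```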
Happy paths
Note
We haven't connected an NLU tool here, so when you type messages to the bot you have to type the intent starting with a / (see Buttons and other Structured Input). If you want to use Rasa NLU, wit.ai, or Lex, you can just swap the Interpreter class in run.py and train_online.py.
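For example, swapping in a trained Rasa NLU model might look like the following; the model path is hypothetical and depends on where you persisted your NLU model:

```python
from rasa_core.channels.console import ConsoleInputChannel
from rasa_core.interpreter import RasaNLUInterpreter

# path to a trained Rasa NLU model -- adjust to your own setup
interpreter = RasaNLUInterpreter("models/nlu/default/current")

# reusing the run_concertbot_online sketch from above
run_concertbot_online(ConsoleInputChannel(), interpreter)
```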
We now start talking to the bot by directly entering the intents. For example, if we type /greet, we get the following prompt:
```
/greet
------
Chat history:

    bot did:    None
    bot did:    action_listen
    user said:  /greet
    whose intent is:    greet

we currently have slots: concerts: None, venues: None
------
The bot wants to [utter_greet] due to the intent. Is this correct?

    1.  Yes
    2.  No, intent is right but the action is wrong
    3.  The intent is wrong
    0.  Export current conversations as stories and quit
```
This gives you all the info you should hopefully need to decide what the bot should have done. In this case, the bot chose the right action ('utter_greet'), so we type 1 and hit enter. Then we type 1 again, because 'action_listen' is the correct action after greeting. We continue this loop until the bot chooses the wrong action.
Providing feedback on errors
If you ask /search_concerts, the bot should suggest action_search_concerts and then action_listen. Now let's ask it to /compare_reviews. The bot happens to choose the wrong one of the two possibilities we wrote in the stories:
```
/compare_reviews
------
Chat history:

    bot did:    action_search_concerts
    bot did:    action_listen
    user said:  /compare_reviews
    whose intent is:    compare_reviews

we currently have slots: concerts: [{'artist': 'Foo Fighters', 'reviews': 4.5}, {'artist': 'Katy Perry', 'reviews': 5.0}], venues: None
------
The bot wants to [action_show_venue_reviews] due to the intent. Is this correct?

    1.  Yes
    2.  No, intent is right but the action is wrong
    3.  The intent is wrong
    0.  Export current conversations as stories and quit
```
Now we type 2, because it chose the wrong action, and we get a new prompt asking for the correct one. This also shows the probabilities the model has assigned to each of the actions:
```
what is the next action for the bot?

    0    action_listen                  0.19
    1    action_restart                 0.00
    2    utter_default                  0.00
    3    utter_greet                    0.03
    4    utter_goodbye                  0.03
    5    utter_youarewelcome            0.02
    6    action_search_concerts         0.09
    7    action_search_venues           0.02
    8    action_show_concert_reviews    0.29
    9    action_show_venue_reviews      0.33
```
In this case, the bot should show the concert reviews (rather than the venue reviews!), so we type 8 and hit enter.
Note
The policy model will get updated on-the-fly, so that it’s less likely to make the same mistake again. You can also export all of the conversations you have with the bot so you can add these as training stories in the future.
Now we can keep talking to the bot for as long as we like to create a longer conversation. At any point you can type 0 and the bot will write the current conversation to a file and exit. Make sure to combine the dumped stories with your original training data for the next training.
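One simple way to do that is to append the exported stories to your training data file. The exported filename below is hypothetical; substitute whatever path the export step actually wrote:

```python
# append an exported conversation to the training stories
# ("story_export.md" is a hypothetical filename)
with open("story_export.md") as exported, \
        open("data/stories.md", "a") as training:
    training.write("\n\n" + exported.read())
```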
Note
If you run the bot with too little training data, it might pick action_listen as the most probable response to your input and therefore do nothing. If you keep sending messages and get no answer, head back into interactive training and check whether action_listen was chosen as the response. Correct the bot's behaviour, add the additional stories, run train.py, and then run the bot again.
Motivation: Why Interactive Learning?¶
There are some complications to chatbot training which make it trickier than most machine learning problems.
The first is that there are several ways of getting to the same goal, and they may all be equally good. It is therefore wrong to say with certainty that given X, you should do Y, and that if you do not do exactly Y then you are wrong; yet this is essentially what you do in a fully supervised learning setup. We want the bot to be able to learn that it can reach a successful state through a number of different means.
Secondly, the utterances from users are strongly affected by the actions of the bot. That means that a network trained on pre-collected data will suffer from exposure bias: the system is trained to make predictions but is never given the ability to condition on its own predictions, instead being given the ground truth every time. This has been shown to cause problems when predicting sequences multiple steps into the future.
Furthermore, from a practical perspective, Rasa Core developers should be able to train via the Wizard of Oz method. This means that if you want a bot to do a certain task, you can simply pretend to be a bot for a little while and at the end it will learn how to respond. This is a good way of learning how to make the conversation natural and flowing.