.. _tutorial_interactive_learning: Interactive Learning ==================== .. note:: This is the place to start if you have a great idea for a bot but you don't have any conversations to use as training data. We will assume that you've already thought of what intents and entities you need (check out the `Rasa NLU `_ docs if you don't know what those are). We're using this `Example Code on GitHub `_. The Problem ^^^^^^^^^^^ Your bot usually has well-defined goals it should reach when talking to a user. There are often numerous different ways the conversation could develop before reaching this final stage. We'll teach you how to use Rasa Core to bootstrap full-blown conversations from minimal to no training data. The Bot ^^^^^^^ Say you want to build a bot that recommends concerts to go to. There is one goal: you know that at the end of the conversation you want your bot to make a recommendation. We'll show how to implement context-aware behaviour without writing a flow chart. For example, if our user asks the question: *which of those has better reviews?*, our bot should know whether they want to compare *musicians* or *venues*. Head over to ``examples/concertbot`` for this example. Let's go! The Domain ^^^^^^^^^^ We will keep the concert domain simple, and won't add any slots just yet. We'll also only support these intents: ``"greet", "thankyou", "goodbye", "search_concerts", "search_venues", "compare_reviews"``. Here is the domain definition (``concert_domain.yml``): .. literalinclude:: ../examples/concertbot/concert_domain.yml :linenos: Stateless Stories ^^^^^^^^^^^^^^^^^ We start by training a stateless model on some simple dialogues in the Rasa story format. This means we define conversations with one user utterance and only a few (typically one) bot action in response. We will use these stateless stories as a starting point for interactive learning. In many cases, simple training 'conversations' are just a single turn and response: "Hello" is always met with a greeting, "goodbye!" is always met with a sign-off, and the correct response to "thank you" is pretty much always "you're welcome". Below is an excerpt of the stories. .. note:: Notice that below, we've defined two stories, showing that ``action_show_venue_reviews`` and ``action_show_concert_reviews`` are both possible responses to the ``compare_reviews`` intent, but neither references any context. That comes later. .. code-block:: md ## greet * greet - utter_greet ## happy * thankyou - utter_youarewelcome ... ## compare_reviews_venues * compare_reviews - action_show_venue_reviews ## compare_reviews_concerts * compare_reviews - action_show_concert_reviews Interactive Learning ^^^^^^^^^^^^^^^^^^^^ Run the script ``train_online.py``. This first creats a stateless policy by combining the stories we've provided into longer dialogues, and then trains the policy on that dataset. It then runs the bot so that you can provide feedback to train it (this is where the learning *becomes interactive*): **Happy paths** .. note:: We haven't connected an NLU tool here, so when you type messages to the bot you have to type the intent starting with a ``/`` (see :ref:`fixed_intent_format`). If you want to use Rasa NLU / wit.ai / Lex you can just swap the ``Interpreter`` class in ``run.py`` and ``train_online.py``. We now start talking to the bot by directly entering the intents. For example, if we type ``/greet``, we get the following prompt: .. code-block:: text /greet ------ Chat history: bot did: None bot did: action_listen user said: /greet whose intent is: greet we currently have slots: concerts: None, venues: None ------ The bot wants to [utter_greet] due to the intent. Is this correct? 1. Yes 2. No, intent is right but the action is wrong 3. The intent is wrong 0. Export current conversations as stories and quit This gives you all the info you should hopefully need to decide what the bot *should* have done. In this case, the bot chose the right action ('utter_greet'), so we type ``1`` and hit enter. Then we type ``1`` again, because 'action_listen' is the correct action after greeting. We continue this loop until the bot chooses the wrong action. **Providing feedback on errors** If you ask ``/search_concerts``, the bot should suggest ``action_search_concerts`` and then ``action_listen``. Now let's ask it to ``/compare_reviews``. The bot happens to choose the wrong one out of the two possibilities we wrote in the stories: .. code-block:: text /compare_reviews ------ Chat history: bot did: action_search_concerts bot did: action_listen user said: /compare_reviews whose intent is: compare_reviews we currently have slots: concerts: [{'artist': 'Foo Fighters', 'reviews': 4.5}, {'artist': 'Katy Perry', 'reviews': 5.0}], venues: None ------ The bot wants to [action_show_venue_reviews] due to the intent. Is this correct? 1. Yes 2. No, intent is right but the action is wrong 3. The intent is wrong 0. Export current conversations as stories and quit Now we type ``2``, because it chose the wrong action, and we get a new prompt asking for the correct one. This also shows the probabilities the model has assigned to each of the actions. .. code-block:: text what is the next action for the bot? 0 action_listen 0.19 1 action_restart 0.00 2 utter_default 0.00 3 utter_greet 0.03 4 utter_goodbye 0.03 5 utter_youarewelcome 0.02 6 action_search_concerts 0.09 7 action_search_venues 0.02 8 action_show_concert_reviews 0.29 9 action_show_venue_reviews 0.33 In this case, the bot should ``action_show_concert_reviews`` (rather than venue reviews!) so we type ``8`` and hit enter. .. note:: The policy model will get updated *on-the-fly*, so that it's less likely to make the same mistake again. You can also export all of the conversations you have with the bot so you can add these as training stories in the future. Now we can keep talking to the bot for as long as we like to create a longer conversation. At any point you can type ``0`` and the bot will write the current conversation to a file and exit the conversation. Make sure to combine the dumped story with your original training data for the next training. .. note:: If you run the bot with not enough training data, it might get ``action_listen`` as a most probable response to your input and therefore do nothing. If you continue to input something and get no answer, please head to interactive training and check if ``action_listen`` was chosen as a response. Correct the bot's behaviour, add additional stories and run ``train.py`` then run the bot again. Motivation: Why Interactive Learning? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There are some complications to chatbot training which makes it more tricky than most machine learning problems. The first is that there are several ways of getting to the same goal, and they may all be equivalently good. Therefore it is wrong to say with certainty that given X, you should do Y, and if you do not do exactly Y then you are wrong. This is essentially what you do in a fully supervised learning case. We want the bot to be able to learn it can get to a successful state through a number of different means. Secondly, the utterances from users will be strongly affected by the actions of the bot. That means that a network trained on pre-collected data will suffer from `exposure bias `_. This is when a system is trained to make predictions but is never given the ability to train on its own predictions, instead being given the ground truth every time. This has been shown to have issues when trying to predict sequences of multiple steps into the future. Furthermore, from a practical perspective, Rasa Core developers should be able to train via the `Wizard of Oz `_ method. This means that if you want a bot to do a certain task, you can simply pretend to be a bot for a little while and at the end it will learn how to respond. This is a good way of learning how to make the conversation natural and flowing. .. raw:: html :file: poll.html