.. _policies:

Policies
========


The ``Policy`` is the core of your bot, with its most important method:

.. literalinclude:: ../rasa_core/policies/policy.py
   :pyobject: Policy.predict_action_probabilities

This uses the current state of the conversation (provided by the tracker) to choose the next action to take.
The domain is there if you need it, but only some policy types make use of it. The returned array contains
the probabilities for each action to be executed next. The action that is most likely will be executed.


Let's look at a simple example for a custom policy:

.. doctest::

  from rasa_core.policies import Policy
  from rasa_core.actions.action import ACTION_LISTEN_NAME
  from rasa_core import utils
  import numpy as np

  class SimplePolicy(Policy):
      def predict_action_probabilities(self, tracker, domain):
          responses = {"greet": 3}

          if tracker.latest_action_name == ACTION_LISTEN_NAME:
              key = tracker.latest_message.intent["name"]
              action = responses[key] if key in responses else 2
              return utils.one_hot(action, domain.num_actions)
          else:
              return np.zeros(domain.num_actions)


**How does this work?**
When the controller processes a message from a user, it will keep asking for the next most likely action using
``predict_action_probabilities``. The bot then executes that action, then call ``predict_action_probabilities`` again
with a new ``tracker``, until it receives an ``ActionListen`` instruction.
This breaks the loop and makes the bot await further instructions. 

In pseudocode, what the ``SimplePolicy`` above does is:

.. code-block:: md

    -> a new message has come in

    if we were previously listening:
        return a canned response
    else:
        we must have just said something, so let's listen again


Note that the policy itself is stateless, and all the state is carried by the ``tracker`` object.


Creating Policies from Stories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Writing rules like in the SimplePolicy above is not a great way to build a bot, it gets messy fast & is hard to debug.
If you've found Rasa Core, it's likely you've already tried this approach and were looking for something better.

The second important method of any policy is ``train(...)``:

.. literalinclude:: ../rasa_core/policies/policy.py
   :pyobject: Policy.train

This method creates "some rules" for prediction depending on the training data.


Memorising the training data
----------------------------

A good next step is to use our story framework to build a policy by giving it some example conversations.
We won't use machine learning yet, we will just create a policy which memorises these stories. 

We can use the ``MemoizationPolicy`` to do this.

.. note::
    For the ``MemoizationPolicy``, the ``train()`` method just memorises
    the actions taken in the story of ``max_history`` turns, so that when your bot encounters an
    identical situation it will make the decision you intended.


Augmented memoization
---------------------

If it is needed to recall turns from training dialogues
where some ``slots`` might not be set during prediction time,
add relevant stories without such ``slots`` to training data.
E.g. reminder stories.

Since ``slots`` that are set some time in the past are
preserved in all future feature vectors until they are set
to None, this policy has a capability to recall the turns
up to ``max_history`` and less from training stories during prediction,
even if additional slots were filled in the past for current dialogue.


Generalising to new Dialogues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The stories data format gives you a compact way to describe a large number of possible dialogues without much effort. 
But humans are infinitely creative, and you could never hope to describe *every* possible dialogue programatically.
Even if you could, it probably wouldn't fit in memory :)

So how do we create a policy which behaves well even in scenarios you haven't thought of?
We will try to achieve this generalisation by creating a policy based on Machine Learning. 

Any policy should be initialized with a featurizer.
The policy's ``train`` method calls this featurizer on provided ``training_trackers`` to create ``X, y`` data,
suitable for ML algorithm (see :ref:`featurization` for details).

The method to featurize trackers is defined here:

.. literalinclude:: ../rasa_core/policies/policy.py
   :pyobject: Policy.featurize_for_training


Keras policy
------------

You can use whichever machine learning library you like to train your policy.
One implementation that ships with Rasa is the ``KerasPolicy``,
which uses Keras as a machine learning library to train your dialogue model.
This class has already implemented the logic of persisting and reloading models.

The model is defined here:

.. literalinclude:: ../rasa_core/policies/keras_policy.py
   :pyobject: KerasPolicy.model_architecture

and the training is run here:

.. literalinclude:: ../rasa_core/policies/keras_policy.py
   :pyobject: KerasPolicy.train

You can implement the model of your choice by overriding these methods,
or initialize ``KerasPolicy`` with already defined ``keras model``.