Warning: This document is for an old version of Rasa Core. The latest version is 0.14.5.


The Policy is the core of your bot. Its most important method is predict_action_probabilities:

    def predict_action_probabilities(self, tracker, domain):
        # type: (DialogueStateTracker, Domain) -> List[float]
        """Predicts the next action the bot should take
        after seeing the tracker.

        Returns the list of probabilities for the next actions"""

        raise NotImplementedError("Policy must have the capacity "
                                  "to predict.")

This uses the current state of the conversation (provided by the tracker) to choose the next action to take. The domain is there if you need it, but only some policy types make use of it. The returned array contains the probabilities for each action to be executed next. The action that is most likely will be executed.
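
To make the last step concrete, here is a minimal sketch of how a list of probabilities can be turned into a choice of action. The action names and the helper most_likely_action are hypothetical and exist only for this illustration; Rasa Core's own selection logic may differ in detail.

```python
def most_likely_action(probabilities, action_names):
    """Return the name of the action with the highest probability."""
    if len(probabilities) != len(action_names):
        raise ValueError("expected one probability per action")
    # index of the largest probability; ties resolve to the first index
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return action_names[best_index]


# hypothetical domain with three actions
action_names = ["action_listen", "utter_greet", "utter_goodbye"]
chosen = most_likely_action([0.1, 0.7, 0.2], action_names)
print(chosen)
```

Here the second entry has the highest probability, so the second action is selected.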

Let’s look at a simple example of a custom policy:

from rasa_core.policies import Policy
from rasa_core.actions.action import ACTION_LISTEN_NAME
from rasa_core import utils
import numpy as np

class SimplePolicy(Policy):
    def predict_action_probabilities(self, tracker, domain):
        responses = {"greet": 3}

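        if tracker.latest_action_name == ACTION_LISTEN_NAME:
            # a user message has just arrived: answer based on its intent
            key = tracker.latest_message.intent["name"]
            action = responses[key] if key in responses else 2
            return utils.one_hot(action, domain.num_actions)
        else:
            # the bot has just acted: return an all-zero vector
            return np.zeros(domain.num_actions)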

How does this work? When the controller processes a message from a user, it keeps asking the policy for the next most likely action using predict_action_probabilities. The bot executes that action and then calls predict_action_probabilities again with the updated tracker, until it receives an ActionListen instruction. This breaks the loop and makes the bot await further instructions.
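
The loop described above can be sketched without Rasa Core at all. Everything in this example (ToyTracker, toy_policy, handle_message, the action names and the max_steps safety cap) is a hypothetical stand-in chosen for illustration, not the library's controller.

```python
ACTION_LISTEN = "action_listen"  # stand-in for ACTION_LISTEN_NAME


class ToyTracker:
    """Hypothetical, heavily simplified stand-in for a DialogueStateTracker."""

    def __init__(self, intent):
        self.intent = intent
        self.latest_action_name = ACTION_LISTEN
        self.executed = []

    def record(self, action_name):
        self.latest_action_name = action_name
        self.executed.append(action_name)


def toy_policy(tracker, action_names):
    """Answer once after listening, then go back to listening."""
    probabilities = [0.0] * len(action_names)
    if tracker.latest_action_name == ACTION_LISTEN:
        reply = "utter_greet" if tracker.intent == "greet" else "utter_default"
        probabilities[action_names.index(reply)] = 1.0
    else:
        probabilities[action_names.index(ACTION_LISTEN)] = 1.0
    return probabilities


def handle_message(tracker, action_names, max_steps=10):
    """Predict and execute actions until the policy chooses to listen."""
    for _ in range(max_steps):  # assumed cap to avoid an endless loop
        probabilities = toy_policy(tracker, action_names)
        action = action_names[probabilities.index(max(probabilities))]
        tracker.record(action)
        if action == ACTION_LISTEN:
            break
    return tracker.executed


actions = [ACTION_LISTEN, "utter_default", "utter_greet"]
print(handle_message(ToyTracker("greet"), actions))
```

The policy function holds no state of its own: whatever it needs to know about the conversation it reads from the tracker on each call.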

In pseudocode, what the SimplePolicy above does is:

-> a new message has come in

if we were previously listening:
    return a canned response
else:
    we must have just said something, so let's listen again

Note that the policy itself is stateless, and all the state is carried by the tracker object.

Creating Policies from Stories

Writing rules as in the SimplePolicy above is not a great way to build a bot: it gets messy fast and is hard to debug. If you’ve found Rasa Core, it’s likely you’ve already tried this approach and were looking for something better.

The second important method of any policy is train(...):

    def train(self,
              training_trackers,  # type: List[DialogueStateTracker]
              domain,  # type: Domain
              **kwargs  # type: **Any
              ):
        # type: (...) -> None
        """Trains the policy on given training trackers."""

        raise NotImplementedError("Policy must have the capacity "
                                  "to train.")

This method creates “some rules” for prediction depending on the training data.

Memorising the training data

A good next step is to use our story framework to build a policy by giving it some example conversations. We won’t use machine learning yet, we will just create a policy which memorises these stories.

We can use the MemoizationPolicy to do this.


For the MemoizationPolicy, the train() method simply memorises the action taken after each sequence of up to max_history turns in your stories, so that when your bot encounters an identical situation it makes the decision you intended.
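
The idea can be sketched with a plain dictionary. This is a simplified illustration and not Rasa Core's implementation: the class ToyMemoizationPolicy, its predict method and the story format (lists of state/action pairs) are assumptions made for the example.

```python
class ToyMemoizationPolicy:
    """Hypothetical, simplified memoization; not Rasa Core's implementation."""

    def __init__(self, max_history=2):
        self.max_history = max_history
        self.lookup = {}

    def _key(self, history):
        # only the last max_history turns identify a situation
        return tuple(history[-self.max_history:])

    def train(self, stories):
        # each story is a list of (state, action) pairs
        for story in stories:
            history = []
            for state, action in story:
                history.append(state)
                self.lookup[self._key(history)] = action

    def predict(self, history):
        # returns None when the situation was never seen in training
        return self.lookup.get(self._key(history))


stories = [
    [("intent_greet", "utter_greet"), ("intent_bye", "utter_goodbye")],
]
policy = ToyMemoizationPolicy(max_history=2)
policy.train(stories)

print(policy.predict(["intent_greet"]))
print(policy.predict(["intent_greet", "intent_bye"]))
print(policy.predict(["intent_thanks"]))
```

The last call returns None because that situation never occurred in the training stories. A policy built this way cannot say anything about dialogues it has not seen.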

Augmented memoization

If you need to recall turns from training dialogues in which some slots might not be set at prediction time, add relevant stories without those slots to the training data, for example reminder stories.

Slots that were set at some point in the past are preserved in all future feature vectors until they are reset to None. Because of this, the policy is able to recall turns of up to max_history length, or fewer, from the training stories during prediction, even if additional slots were filled earlier in the current dialogue.
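
One way to picture recalling “max_history turns or fewer” is a back-off lookup that tries the longest window first and then progressively shorter ones. The sketch below is an illustrative simplification with hypothetical names (recall, lookup, the state labels); it is not the algorithm Rasa Core implements.

```python
def recall(lookup, history, max_history):
    """Try the longest window first, then progressively shorter ones."""
    window = list(history[-max_history:])
    while window:
        action = lookup.get(tuple(window))
        if action is not None:
            return action
        window = window[1:]  # forget the oldest turn and retry
    return None


# memorised from a hypothetical reminder story that starts without context
lookup = {("intent_remind",): "utter_reminder"}

# the current dialogue has an earlier turn the story never contained
history = ["slot_name_set", "intent_remind"]
print(recall(lookup, history, max_history=2))
```

The full two-turn window has no match, but once the oldest turn is dropped the memorised reminder story is found.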

Generalising to new Dialogues

The stories data format gives you a compact way to describe a large number of possible dialogues without much effort. But humans are infinitely creative, and you could never hope to describe every possible dialogue programmatically. Even if you could, it probably wouldn’t fit in memory :)

So how do we create a policy which behaves well even in scenarios you haven’t thought of? We will try to achieve this generalisation by creating a policy based on Machine Learning.

Any policy should be initialized with a featurizer. The policy’s train method calls this featurizer on the provided training_trackers to create X, y data suitable for an ML algorithm (see Featurization for details).
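
As a rough picture of what “X, y data” means here, the following sketch turns toy stories into padded one-hot windows (X) and one-hot action labels (y). The function featurize, the story format and the padding value of -1 are assumptions made for the illustration; the real featurizers are described in the Featurization section.

```python
def featurize(stories, states, actions, max_history=2, pad=-1):
    """Turn (state, action) stories into padded X and one-hot y lists."""
    X, y = [], []
    for story in stories:
        history = []
        for state, action in story:
            # one-hot encode the current state
            history.append([1 if s == state else 0 for s in states])
            window = history[-max_history:]
            # left-pad short windows so every sample has max_history rows
            missing = max_history - len(window)
            padding = [[pad] * len(states) for _ in range(missing)]
            X.append(padding + window)
            y.append([1 if a == action else 0 for a in actions])
    return X, y


states = ["intent_greet", "intent_bye"]
actions = ["utter_greet", "utter_goodbye"]
stories = [[("intent_greet", "utter_greet"), ("intent_bye", "utter_goodbye")]]

X, y = featurize(stories, states, actions)
print(X[0])  # first sample: one padding row, then the greet turn
print(y[0])
```

Every sample in X has the same shape (max_history rows, one column per state), which is what lets a sequence model consume dialogues of different lengths.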

The method to featurize trackers is defined here:

    def featurize_for_training(
            self,
            training_trackers,  # type: List[DialogueStateTracker]
            domain,  # type: Domain
            **kwargs  # type: **Any
    ):
        # type: (...) -> DialogueTrainingData
        """Transform training trackers into a vector representation.
        The trackers, consisting of multiple turns, will be transformed
        into a float vector which can be used by a ML model."""

        training_data = self.featurizer.featurize_trackers(training_trackers,
                                                           domain)

        max_training_samples = kwargs.get('max_training_samples')
        if max_training_samples is not None:
            logger.debug("Limit training data to {} training samples."
                         "".format(max_training_samples))
            training_data.limit_training_data_to(max_training_samples)

        return training_data

Keras policy

You can use whichever machine learning library you like to train your policy. One implementation that ships with Rasa is the KerasPolicy, which uses Keras as a machine learning library to train your dialogue model. This class has already implemented the logic of persisting and reloading models.

The model is defined here:

    def model_architecture(
            self,
            input_shape,  # type: Tuple[int, int]
            output_shape  # type: Tuple[int, Optional[int]]
    ):
        # type: (...) -> keras.models.Sequential
        """Build a keras model and return a compiled model."""

        from keras.models import Sequential
        from keras.layers import \
            Masking, LSTM, Dense, TimeDistributed, Activation

        # Build Model
        model = Sequential()

        # the shape of the y vector of the labels,
        # determines which output from rnn will be used
        # to calculate the loss
        if len(output_shape) == 1:
            # y is (num examples, num features) so
            # only the last output from the rnn is used to
            # calculate the loss
            model.add(Masking(mask_value=-1, input_shape=input_shape))
            model.add(LSTM(self.rnn_size, dropout=0.2))
            model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
        elif len(output_shape) == 2:
            # y is (num examples, max_dialogue_len, num features) so
            # all the outputs from the rnn are used to
            # calculate the loss, therefore a sequence is returned and
            # time distributed layer is used

            # the first value in input_shape is max dialogue_len,
            # it is set to None, to allow dynamic_rnn creation
            # during prediction
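            model.add(Masking(mask_value=-1,
                              input_shape=(None, input_shape[1])))
            model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
            model.add(TimeDistributed(Dense(units=output_shape[-1])))
        else:
            raise ValueError("Cannot construct the model because "
                             "length of output_shape = {} "
                             "should be 1 or 2."
                             "".format(len(output_shape)))

        # NOTE: the final activation and compile step are missing from this
        # excerpt; the statements below are an assumed reconstruction
        model.add(Activation('softmax'))
        model.compile(loss='categorical_crossentropy',
                      optimizer='rmsprop',
                      metrics=['accuracy'])
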
        return model

and the training is run here:

    def train(self,
              training_trackers,  # type: List[DialogueStateTracker]
              domain,  # type: Domain
              **kwargs  # type: **Any
              ):
        # type: (...) -> Dict[Text: Any]

        if kwargs.get('rnn_size') is not None:
            logger.debug("Parameter `rnn_size` is updated with {}"
                         "".format(kwargs.get('rnn_size')))
            self.rnn_size = kwargs.get('rnn_size')

        training_data = self.featurize_for_training(training_trackers,
                                                    domain,
                                                    **kwargs)

        shuffled_X, shuffled_y = training_data.shuffled_X_y()

        if self.model is None:
            self.model = self.model_architecture(shuffled_X.shape[1:],
                                                 shuffled_y.shape[1:])

        validation_split = kwargs.get("validation_split", 0.0)
        logger.info("Fitting model with {} total samples and a validation "
                    "split of {}".format(training_data.num_examples(),
                                         validation_split))
        # filter out kwargs that cannot be passed to fit
        params = self._get_valid_params(self.model.fit, **kwargs)

        self.model.fit(shuffled_X, shuffled_y, **params)
        # the default parameter for epochs in keras fit is 1
        self.current_epoch = kwargs.get("epochs", 1)
        logger.info("Done fitting keras policy model")
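
The step that filters kwargs before calling fit can be pictured as follows. This is a sketch of one plausible way to do it with the standard library's inspect module; the actual _get_valid_params helper is not shown in this document and may work differently, and the fit function here is a hypothetical stand-in rather than the Keras method.

```python
import inspect


def get_valid_params(func, **kwargs):
    """Keep only the keyword arguments that `func` names explicitly."""
    accepted = inspect.signature(func).parameters
    return {key: value for key, value in kwargs.items() if key in accepted}


def fit(X, y, epochs=1, batch_size=32, validation_split=0.0):
    """Hypothetical stand-in for a Keras-style fit function."""
    return {"epochs": epochs, "batch_size": batch_size,
            "validation_split": validation_split}


# rnn_size and max_training_samples are consumed elsewhere during training
params = get_valid_params(fit, epochs=5, rnn_size=64, max_training_samples=100)
print(params)
```

Filtering this way lets a single set of keyword arguments be passed to train, with each step picking out only the entries it understands.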

You can implement the model of your choice by overriding these methods, or initialize KerasPolicy with an already defined Keras model.