Evaluating and Testing
Evaluating a Trained Model
You can evaluate your trained model on a set of test stories by using the evaluate script:
python -m rasa_core.evaluate -d models/dialogue \
-s test_stories.md -o matrix.pdf --failed failed_stories.md
This will print the failed stories to failed_stories.md. We count any story as failed if at least one of the actions was predicted incorrectly.
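Schematically, an entry in failed_stories.md is just a story in the usual Markdown story format, with the wrongly predicted action annotated in a comment. The story below is an illustrative placeholder, not real script output, and the exact annotation format may differ between versions:

## story_greet
* greet
    - utter_greet   <!-- predicted: utter_goodbye -->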
In addition, this will save a confusion matrix to a file called matrix.pdf. The confusion matrix shows, for each action in your domain, how often that action was predicted, and how often an incorrect action was predicted instead.
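For reference, here is a minimal sketch of how such an action confusion matrix can be produced with scikit-learn and matplotlib. This is not Rasa Core's internal code, and the actual and predicted action lists are made-up placeholders:

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# Hypothetical gold vs. predicted actions collected over test stories.
actual    = ["utter_greet", "utter_goodbye", "action_listen", "utter_greet"]
predicted = ["utter_greet", "utter_greet",   "action_listen", "utter_greet"]

labels = sorted(set(actual) | set(predicted))
cm = confusion_matrix(actual, predicted, labels=labels)

# Rows are the actual actions, columns the predicted ones.
plt.imshow(cm, cmap="Blues")
plt.xticks(range(len(labels)), labels, rotation=90)
plt.yticks(range(len(labels)), labels)
plt.xlabel("predicted action")
plt.ylabel("actual action")
plt.savefig("matrix.pdf", bbox_inches="tight")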
The full list of options for the script is:
Calling `rasa_core.evaluate` is deprecated. Please use `rasa_core.test` instead.

usage: evaluate.py [-h] {default,compare} ...

evaluates a dialogue model

positional arguments:
  {default,compare}  mode
    default          default mode: evaluate a dialogue model
    compare          compare mode: evaluate multiple dialogue models to
                     compare policies

optional arguments:
  -h, --help         show this help message and exit
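As the deprecation notice above says, rasa_core.evaluate has been renamed, so the same evaluation should also be runnable through rasa_core.test. Assuming it accepts the same flags as the evaluate script (an assumption based only on the deprecation message), the call would be:

python -m rasa_core.test -d models/dialogue \
-s test_stories.md -o matrix.pdf --failed failed_stories.md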
Comparing Policies
To choose a specific policy, or to choose hyperparameters for a specific policy, you want to measure how well Rasa Core will generalise to conversations it hasn't seen before. Especially at the beginning of a project, you usually do not have many real conversations to train your bot on, so you don't want to throw any away just to use them as a test set.
Rasa Core has some scripts to help you choose and fine-tune your policy. Once you are happy with it, you can then train your final policy on your full data set. To compare policies, split your training data into multiple files in a single directory. You can then use the train_paper script to train multiple policies on the same data. You can choose one of the files to be partially excluded: Rasa Core will then be trained multiple times, with 0, 5, 25, 50, 70, 90, 95, and 100% of the stories in that file removed from the training data. By evaluating each resulting model on the full set of stories, you can measure how well Rasa Core predicts the held-out stories.
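To make the exclusion scheme concrete, here is a short, hypothetical Python sketch of it; this is an illustration of the idea, not Rasa Core's actual implementation or API:

import random

# Percentages of stories removed from the chosen file, as described above.
EXCLUSION_PERCENTAGES = [0, 5, 25, 50, 70, 90, 95, 100]

def training_subsets(stories, seed=42):
    """Yield (percentage, remaining stories) pairs, with that
    percentage of stories removed from the training data."""
    rng = random.Random(seed)
    shuffled = list(stories)
    rng.shuffle(shuffled)
    for pct in EXCLUSION_PERCENTAGES:
        keep = int(len(shuffled) * (100 - pct) / 100)
        yield pct, shuffled[:keep]

# One model would be trained per subset; evaluating every model on the
# full story set then shows how accuracy degrades as more training
# stories are held out.
for pct, subset in training_subsets(["story_%d" % i for i in range(20)]):
    print(pct, len(subset))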