Continuous Integration of Machine Learning Models

As you annotate more and more NLU and Core training data from real user interactions, it is a good idea to train your NLU and Core models regularly to keep them up to date.

Here we will walk through an automated process of training a new NLU model, evaluating it, and activating it if it passes the desired checks.

The full script is available here

We use a couple of convenience utilities from rasa_nlu:

  • the EndpointConfig class, which simplifies making multiple requests to authenticated endpoints
  • the do_train function, which trains and persists a model.

Here is the code for the main function:

if __name__ == "__main__":

    ### initialize constants
    platform_host = os.environ.get("RASA_PLATFORM_HOST",
    api_token = os.environ.get("RASA_API_TOKEN", "")
    path = "./projects"
    project = "default"
    config_path = "nlu_config.yml"

    data_endpoint = EndpointConfig(

    ### train a model with the latest training data from Rasa Platform
    _, _, model_path = do_train(config.load(config_path),
                                data_endpoint=data_endpoint)
    logger.info("training finished, model path {}".format(model_path))

    ### upload newly trained model to Rasa Platform
    model_name = push_model_to_platform(platform_host, api_token, model_path)

    ### mark new model as active if performance is good
    eval_new, eval_active = eval_new_and_active_models(platform_host,
                                                       api_token, model_name)
    f1_new, f1_active = eval_new["f1_score"], eval_active["f1_score"]
    logger.info("intent evaluation f1 scores:")
    logger.info("new: {}".format(f1_new))
    logger.info("active: {}".format(f1_active))

    if f1_new > f1_active:
        logger.info(
            "Newly trained model is better! going to mark it as active")
        success = set_model_as_active(platform_host, api_token, model_name)
        if success:
            logger.info("successfully activated model {}".format(model_name))
        else:
            logger.info("failed to activate model {}".format(model_name))
    else:
        logger.info("model did not improve")

Once the model is trained, the script calls the push_model_to_platform function, which zips the model directory and uploads it:

def push_model_to_platform(platform_host, api_token, model_path):
    model_name = os.path.basename(model_path)
    filename = zip_folder(model_path)

    # send the zipped model as a multipart file upload
    files = {
        'model': ('{}.zip'.format(model_name),
                  open(filename, 'rb'), 'application/zip')
    }
    url = "{}/api/projects/default/models/nlu?api_token={}"
    url = url.format(platform_host, api_token)
    requests.post(url, files=files)

    return model_name
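
The zip_folder helper called above is not shown in this walkthrough. As a rough illustration only, here is a minimal sketch of what such a helper might look like, assuming it just needs to archive the model directory and return the path of the resulting .zip file. The name zip_folder_sketch is hypothetical, and the helper in the full script may be implemented differently.

```python
import os
import shutil
import tempfile


def zip_folder_sketch(folder):
    """Zip the contents of `folder` and return the path of the archive.

    Hypothetical stand-in for the zip_folder helper; an assumption, not
    necessarily the implementation used by the full script.
    """
    # Build the archive in a fresh temporary directory so the model
    # directory itself is left untouched.
    target_dir = tempfile.mkdtemp()
    model_name = os.path.basename(os.path.normpath(folder))
    base_name = os.path.join(target_dir, model_name)
    # make_archive appends ".zip" to base_name and returns the full path.
    return shutil.make_archive(base_name, "zip", root_dir=folder)
```

The returned path can then be opened in binary mode and passed to the upload request, as push_model_to_platform does with its filename variable.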

After the model is pushed to Rasa Platform, we can evaluate both models, and compare their performance:

def eval_new_and_active_models(platform_host, api_token, model_name):
    # get the active model
    url = "{}/api/projects/default/models/nlu?api_token={}"
    url = url.format(platform_host, api_token)
    models = requests.get(url).json()

    active_model = None
    for model in models:
        if "active" in model["tags"]:
            active_model = model["model"]
    if active_model is None:
        raise ValueError("no active model found")

    logger.debug("active model is: {}".format(active_model))

    def get_eval(model_name):
        url = "{}/api/projects/default/evaluations/{}?api_token={}"
        url = url.format(platform_host, model_name, api_token)
        _ = requests.put(url)
        response = requests.get(url)
        return response.json()["intent_evaluation"]

    eval_new = get_eval(model_name)
    eval_active = get_eval(active_model)

    return eval_new, eval_active

Finally, if we are happy with the new model’s performance, we make it active:

def set_model_as_active(platform_host, api_token, model_name):
    url = "{}/api/projects/default/models/nlu/{}/tags/production?api_token={}"
    url = url.format(platform_host, model_name, api_token)
    response = requests.put(url)
    return response.status_code == 204

Running as a Cron Job

The script relies on a couple of environment variables being set, so the easiest option is to create a small wrapper script that exports them and then runs it:

export RASA_API_TOKEN="alskdjkjcvlkjflkjq34"
/usr/bin/python /home/

Then use the crontab command on Linux to add a line like the following:

0 0 * * * /home/

The five schedule fields (minute, hour, day of month, month, day of week) are set to 0 0 * * *, so this runs the script every day at midnight.

Next Steps: Customization

In this example script, we check whether the intent classification f1 score has improved. It is up to you to define the criteria for whether a new model is acceptable and should be made active. For example, you may have a list of important or frequently used utterances which must be classified correctly, in addition to the f1 score criterion.
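
One way such a combined criterion might be expressed is sketched below. This is an illustration under assumptions, not part of the provided script: the function name is_acceptable and both dictionaries are hypothetical, and the sketch assumes you have already collected the new model's predicted intent for each important utterance by some means of your own.

```python
def is_acceptable(f1_new, f1_active, predicted_intents, required_intents):
    """Decide whether a newly trained model should be activated.

    Both dictionaries map an utterance to an intent name. All names
    here are hypothetical and not part of the provided script.
    """
    # Criterion 1: the intent f1 score must improve on the active model.
    if f1_new <= f1_active:
        return False
    # Criterion 2: every important utterance must get its expected intent.
    for utterance, expected in required_intents.items():
        if predicted_intents.get(utterance) != expected:
            return False
    return True


# Example: the f1 score improved, but one key utterance is misclassified.
required = {"hello": "greet", "bye": "goodbye"}
predicted = {"hello": "greet", "bye": "greet"}
accept = is_acceptable(0.91, 0.88, predicted, required)
```

In the main function, a check like this could take the place of the plain f1_new > f1_active comparison.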

There are many ways to improve the script we have provided. For example, you could perform multiple training runs with different configurations, trying different pipelines and hyperparameters. You might also send a message to Slack or another chat tool to let everyone know that a new model has been activated.
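
As an illustration of the notification idea, the sketch below builds and sends a message to a Slack incoming webhook, which accepts a JSON body with a "text" field. It assumes Python 3 and that you have created a webhook in your own workspace; the function names and the message wording are hypothetical and not part of the provided script.

```python
import json
import urllib.request


def build_slack_message(model_name, f1_new, f1_active):
    """Return the JSON payload for a Slack incoming webhook."""
    text = "Activated NLU model {} (f1 {:.3f}, previously {:.3f})".format(
        model_name, f1_new, f1_active)
    return {"text": text}


def notify_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code.

    webhook_url is assumed to be the incoming-webhook URL of your own
    Slack workspace; it is not provided by the script in this guide.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call to notify_slack could be placed in the branch of the main function that runs after the model has been activated successfully.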