Continuous Integration of Machine Learning Models¶
As you annotate more and more NLU and Core training data from real user interactions, it is a good idea to retrain your models regularly to keep them up to date.
Here we will walk through an automated process of training a new NLU model, evaluating it, and activating it if it passes the desired checks.
The full script is available here.
We use a couple of convenience utilities from rasa_nlu:

- the EndpointConfig class, which simplifies making multiple requests to authenticated endpoints
- the do_train function, which trains and persists a model
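All of the snippets below assume the following imports. This is a sketch; the exact module paths may differ between rasa_nlu versions:

import io
import logging
import os

import requests

from rasa_nlu import config
from rasa_nlu.train import do_train
from rasa_nlu.utils import EndpointConfig

logger = logging.getLogger(__name__)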
Here is the code for the main function:
if __name__ == "__main__":
    ### initialize constants
    platform_host = os.environ.get("RASA_PLATFORM_HOST",
                                   "https://rasa.example.com")
    api_token = os.environ.get("RASA_API_TOKEN", "")
    path = "./projects"
    project = "default"
    config_path = "nlu_config.yml"

    data_endpoint = EndpointConfig(
        "{}/api/projects/default/data.json".format(platform_host),
        token=api_token,
        token_name="api_token")

    ### train a model with the latest training data from Rasa Platform
    _, _, model_path = do_train(config.load(config_path),
                                '',
                                path=path,
                                project=project,
                                data_endpoint=data_endpoint)
    logger.info("training finished, model path {}".format(model_path))

    ### upload newly trained model to Rasa Platform
    model_name = push_model_to_platform(platform_host, api_token, model_path)

    ### mark new model as active if performance is good
    eval_new, eval_active = eval_new_and_active_models(platform_host,
                                                       api_token, model_name)
    f1_new, f1_active = eval_new["f1_score"], eval_active["f1_score"]
    logger.info("intent evaluation f1 scores:")
    logger.info("new: {}".format(f1_new))
    logger.info("active: {}".format(f1_active))

    if f1_new > f1_active:
        logger.info(
            "Newly trained model is better! going to mark it as active")
        success = set_model_as_active(platform_host, api_token, model_name)
        if success:
            logger.info("successfully activated model {}".format(model_name))
        else:
            logger.info("failed to activate model {}".format(model_name))
    else:
        logger.info("model did not improve")
Once the model is trained, the script calls the push_model_to_platform function, which looks like this:
def push_model_to_platform(platform_host, api_token, model_path):
    model_name = os.path.basename(model_path)
    filename = zip_folder(model_path)
    # open the archive in a context manager so the file handle is closed
    with io.open(filename, 'rb') as model_file:
        files = {
            'model': ('{}.zip'.format(model_name),
                      model_file, 'application/zip')
        }
        url = "{}/api/projects/default/models/nlu?api_token={}"
        url = url.format(platform_host, api_token)
        requests.post(url, files=files)
    return model_name
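The zip_folder helper is not shown above. If your version of rasa_nlu does not provide one, a minimal sketch using shutil.make_archive could look like this:

import shutil

def zip_folder(folder_path):
    # package the model directory as a zip archive and return its path
    return shutil.make_archive(folder_path, 'zip', folder_path)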
After the model is pushed to Rasa Platform, we can evaluate both models and compare their performance:
def eval_new_and_active_models(platform_host, api_token, model_name):
    # get the active model
    url = "{}/api/projects/default/models/nlu?api_token={}"
    url = url.format(platform_host, api_token)
    models = requests.get(url).json()

    active_model = None
    for model in models:
        if "active" in model["tags"]:
            active_model = model["model"]
            break

    if active_model is None:
        raise ValueError("no active model found")

    logger.debug("active model is: {}".format(active_model))

    def get_eval(model_name):
        # trigger the evaluation with a PUT, then fetch the result
        url = "{}/api/projects/default/evaluations/{}?api_token={}"
        url = url.format(platform_host, model_name, api_token)
        _ = requests.put(url)
        response = requests.get(url)
        return response.json()["intent_evaluation"]

    eval_new = get_eval(model_name)
    eval_active = get_eval(active_model)
    return eval_new, eval_active
Finally, if we are happy with the new model’s performance, we make it active:
def set_model_as_active(platform_host, api_token, model_name):
    url = "{}/api/projects/default/models/nlu/{}/tags/active?api_token={}"
    url = url.format(platform_host, model_name, api_token)
    response = requests.put(url)
    return response.status_code == 204
Running as a Cron Job¶
The script we’ve provided relies on a couple of environment variables being set, so the easiest option is to create a small run_train.sh script to run it:
#!/bin/bash
export RASA_PLATFORM_HOST="https://rasa.example.com/"
export RASA_API_TOKEN="alskdjkjcvlkjflkjq34"
/usr/bin/python /home/train_cron.py
Then use the crontab command on Linux to add a line like the following:
0 0 * * * /home/run_train.sh
This will run the script every day at midnight.
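Make sure the wrapper script is executable before registering the job, for example:

chmod +x /home/run_train.sh
crontab -e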
Next Steps: Customization¶
In this example script, we are checking whether the intent classification f1 score has improved. It is up to you to define the criteria for whether a new model is acceptable and should be made active. For example, you may have a list of important or frequently used utterances which must be classified correctly in addition to the f1 score criterion.
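As a sketch of such a check, assuming the intent_evaluation payload includes a predictions list with text and predicted keys (as rasa_nlu's intent evaluation output typically does), you could verify a hand-picked set of utterances before activating a model. The MUST_PASS examples below are hypothetical placeholders:

# hypothetical must-pass utterances: replace with your own examples
MUST_PASS = [
    ("hello there", "greet"),
    ("i want to unsubscribe", "unsubscribe"),
]

def passes_critical_utterances(intent_evaluation):
    # map each evaluated utterance to the intent the model predicted
    predictions = {p["text"]: p["predicted"]
                   for p in intent_evaluation.get("predictions", [])}
    return all(predictions.get(text) == intent
               for text, intent in MUST_PASS)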
There are many ways to improve the script we have provided. For example, you could perform multiple training runs with different configurations, trying different pipelines and hyperparameters. You might also send a message to Slack or another chat tool to let everyone know that a new model has been activated.
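For instance, a minimal sketch of a Slack notification via an incoming webhook could look like this (the webhook URL is a placeholder for your own):

def notify_slack(webhook_url, model_name):
    # Slack incoming webhooks accept a JSON payload with a "text" field
    requests.post(webhook_url, json={
        "text": "New NLU model activated: {}".format(model_name)
    })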