Using Rasa NLU as an HTTP server
Note
Before you can use the server, you need to train a model! See Training a New Model for your Project.
The HTTP API exists to make it easy for non-Python projects to use Rasa NLU, and to make it trivial for projects currently using wit/LUIS/Dialogflow to try it out.
Running the server
You can run a simple HTTP server that handles requests for your projects with:
$ python -m rasa_nlu.server --path projects
The server will look for existing projects under the folder defined by the path parameter. By default, a project loads its latest trained model.
Emulation
Rasa NLU can ‘emulate’ any of these three services (wit, LUIS and Dialogflow) by making the /parse endpoint compatible with your existing code. To activate this, either add 'emulate' : 'luis' to your config file or run the server with -e luis.
For example, if you would normally send your text to be parsed to LUIS, you would make a GET request to
https://api.projectoxford.ai/luis/v2.0/apps/<app-id>?q=hello%20there
In LUIS emulation mode, you can call Rasa by sending this request to
http://localhost:5000/parse?q=hello%20there
Any extra query parameters are ignored by Rasa, so you can safely send them along.
To use the emulation, pass the emulation mode to the server script:
$ python -m rasa_nlu.server --path projects --emulate wit
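To make the URL mapping concrete, here is a minimal Python sketch that rewrites a LUIS request URL into the equivalent Rasa NLU /parse URL. The helper name luis_to_rasa_url and the default base URL http://localhost:5000 are assumptions for illustration. The sketch copies the original query string unchanged, relying on the statement above that the server ignores extra query parameters.

```python
from urllib.parse import urlsplit, urlunsplit


def luis_to_rasa_url(luis_url: str, rasa_base: str = "http://localhost:5000") -> str:
    """Point a LUIS-style query URL at a Rasa NLU server running with --emulate luis.

    Hypothetical helper: the whole query string (q plus any extra parameters)
    is copied verbatim, since Rasa NLU ignores parameters it does not use.
    """
    query = urlsplit(luis_url).query
    base = urlsplit(rasa_base)
    return urlunsplit((base.scheme, base.netloc, "/parse", query, ""))


print(luis_to_rasa_url(
    "https://api.projectoxford.ai/luis/v2.0/apps/<app-id>?q=hello%20there"))
# prints http://localhost:5000/parse?q=hello%20there
```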
Endpoints
POST /parse (no emulation)
You must POST data in the format '{"q":"<your text to parse>"}'. You can do this with:
$ curl -XPOST localhost:5000/parse -d '{"q":"hello there"}'
By default, when the project is not specified in the query, the "default" project is used.
You can (and should) specify the project you want to use in your query:
$ curl -XPOST localhost:5000/parse -d '{"q":"hello there", "project": "my_restaurant_search_bot"}'
By default, the latest trained model for the project is loaded. You can also query a specific model of a project:
$ curl -XPOST localhost:5000/parse -d '{"q":"hello there", "project": "my_restaurant_search_bot", "model": "<model_XXXXXX>"}'
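If you call the endpoint from code rather than from curl, the request body is just the JSON object shown above. The following Python sketch builds an equivalent request with the standard library. The helper name build_parse_request is hypothetical. Setting a JSON Content-Type header is our own choice, because the curl examples do not set one.

```python
import json
from typing import Optional
from urllib.request import Request


def build_parse_request(text: str,
                        project: Optional[str] = None,
                        model: Optional[str] = None,
                        base_url: str = "http://localhost:5000") -> Request:
    """Build (but do not send) a POST /parse request like the curl examples."""
    payload = {"q": text}
    if project is not None:
        payload["project"] = project  # omitted: the server uses "default"
    if model is not None:
        payload["model"] = model  # omitted: latest trained model of the project
    body = json.dumps(payload).encode("utf-8")
    return Request(base_url + "/parse", data=body, method="POST",
                   headers={"Content-Type": "application/json"})


req = build_parse_request("hello there", project="my_restaurant_search_bot")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Against a running server, the parsed result would then be json.load(urllib.request.urlopen(req)).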
POST /train
You can POST your training data to this endpoint to train a new model for a project.
The request blocks until the server answers: either the model was trained successfully or the training exited with an error.
You must specify the project you want to train the new model for, so that you can use it in parse requests later on: /train?project=my_project. Post the model configuration as the content of the request:
Using training data in JSON format:
language: "en"
pipeline: "spacy_sklearn"
# data contains the same JSON as described in the training data section
data: {
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "hey",
        "intent": "greet",
        "entities": []
      }
    ]
  }
}
Using training data in Markdown format:
language: "en"
pipeline: "spacy_sklearn"
# data contains the same Markdown as described in the training data section
data: |
  ## intent:affirm
  - yes
  - yep
  ## intent:goodbye
  - bye
  - goodbye
Here is an example request showcasing how to send the config to the server to start the training:
$ curl -XPOST -H "Content-Type: application/x-yml" "localhost:5000/train?project=my_project" \
    --data-binary @sample_configs/config_train_server_md.yml
Note
You cannot send a training request for a project already training a new model (see below).
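Here is a minimal sketch of the same training call from Python. It assumes the Markdown config shown above and a server on localhost:5000. The helper build_train_request and the TRAIN_CONFIG constant are hypothetical names. The config string keeps the two-space indentation under data: |, because without it the Markdown lines would not belong to the YAML block scalar.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The Markdown-flavoured training config shown above. The Markdown lines are
# indented so that they belong to the YAML block scalar under "data: |".
TRAIN_CONFIG = """\
language: "en"
pipeline: "spacy_sklearn"
data: |
  ## intent:affirm
  - yes
  - yep
  ## intent:goodbye
  - bye
  - goodbye
"""


def build_train_request(project: str, config_yaml: str,
                        base_url: str = "http://localhost:5000") -> Request:
    """Build (but do not send) a POST /train?project=... request."""
    url = base_url + "/train?" + urlencode({"project": project})
    # The body is sent byte for byte, so the newlines YAML relies on survive.
    return Request(url, data=config_yaml.encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/x-yml"})


req = build_train_request("my_project", TRAIN_CONFIG)
print(req.full_url)
```

The request only returns once training has finished or failed, so pass a generous timeout when sending it, for example urllib.request.urlopen(req, timeout=3600). The exact value is up to you.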
POST /evaluate
You can use this endpoint to evaluate a model on test data. The query string takes the project (required) and a model (optional). The project must be the one in which the model is located. If you don't specify a model, the latest one is selected. This endpoint returns some common sklearn evaluation metrics (accuracy, F1 score and precision), the individual predictions, and a summary report.
$ curl -XPOST "localhost:5000/evaluate?project=my_project&model=model_XXXXXX" -d @data/examples/rasa/demo-rasa.json | python -mjson.tool
{
    "accuracy": 0.19047619047619047,
    "f1_score": 0.06095238095238095,
    "precision": 0.036281179138321996,
    "predictions": [
        {
            "intent": "greet",
            "predicted": "greet",
            "text": "hey",
            "confidence": 1.0
        },
        ...
    ],
    "report": ...
}
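The predictions list lets you inspect errors yourself. This Python sketch recomputes plain accuracy, meaning the fraction of examples whose predicted intent equals the labelled one, and collects the misclassified examples. It assumes each entry has the intent and predicted keys shown above. It does not try to reproduce f1_score or precision, because the averaging scheme the server uses is not stated here. The function name summarize_predictions is hypothetical.

```python
from typing import Dict, List, Tuple


def summarize_predictions(predictions: List[Dict]) -> Tuple[float, List[Dict]]:
    """Return (accuracy, misclassified) for the "predictions" of POST /evaluate."""
    if not predictions:
        return 0.0, []
    # An example is misclassified when the predicted intent differs from the label.
    wrong = [p for p in predictions if p["predicted"] != p["intent"]]
    accuracy = (len(predictions) - len(wrong)) / len(predictions)
    return accuracy, wrong


# Made-up input in the response format shown above.
sample = [
    {"intent": "greet", "predicted": "greet", "text": "hey", "confidence": 1.0},
    {"intent": "affirm", "predicted": "greet", "text": "yep", "confidence": 0.4},
]
acc, wrong = summarize_predictions(sample)
print(acc, [p["text"] for p in wrong])  # 0.5 ['yep']
```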
GET /status
This returns all currently available projects, their status (training or ready) and the models they have loaded in memory. It also returns the list of available projects the server can use to fulfill /parse requests.
$ curl localhost:5000/status | python -mjson.tool
{
    "available_projects": {
        "my_restaurant_search_bot": {
            "status": "ready",
            "available_models": [
                "<model_XXXXXX>",
                "<model_XXXXXX>"
            ]
        }
    }
}
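A project that is already training cannot accept another training request, so a client may want to check /status first. Below is a minimal Python sketch over the response shape shown above. The function names are hypothetical, and the sketch assumes the status values are exactly "ready" and "training".

```python
import json
from typing import Dict, List, Union


def ready_projects(status_body: Union[str, bytes]) -> Dict[str, List[str]]:
    """Map each project with status "ready" to its list of available models."""
    status = json.loads(status_body)
    return {
        name: info.get("available_models", [])
        for name, info in status.get("available_projects", {}).items()
        if info.get("status") == "ready"
    }


def is_training(status_body: Union[str, bytes], project: str) -> bool:
    """True if the project is currently training, so a new /train call would be refused."""
    projects = json.loads(status_body).get("available_projects", {})
    return projects.get(project, {}).get("status") == "training"
```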
GET /version
This will return the current version of the Rasa NLU instance.
$ curl localhost:5000/version | python -mjson.tool
{
"version" : "0.8.2"
}
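If your client depends on the behaviour of a particular Rasa NLU release, you can compare the reported version against a minimum. This Python sketch assumes a plain dotted numeric version string such as "0.8.2". parse_version is a hypothetical helper. Versions with suffixes, such as release candidates, would need a real version parser.

```python
import json
from typing import Tuple, Union


def parse_version(version_body: Union[str, bytes]) -> Tuple[int, ...]:
    """Turn {"version": "0.8.2"} into (0, 8, 2) for easy comparison."""
    version = json.loads(version_body)["version"]
    return tuple(int(part) for part in version.split("."))


# Example with the response shown above.
server_version = parse_version('{"version" : "0.8.2"}')
print(server_version >= (0, 8))  # True
```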
GET /config
This will return the default model configuration of the Rasa NLU instance.
$ curl localhost:5000/config | python -mjson.tool
{
"config": "/app/rasa_shared/config_mitie.json",
"data": "/app/rasa_nlu/data/examples/rasa/demo-rasa.json",
"duckling_dimensions": null,
"emulate": null,
...
}
DELETE /models
This unloads a model from the server's memory:
$ curl -X DELETE localhost:5000/models -d '{"project": "my_restaurant_search_bot", "model": "<model_XXXXXX>"}'
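Here is the same unload call as a Python sketch. It mirrors the curl command above: the project and model names go in a JSON request body sent with the DELETE method. build_unload_request is a hypothetical helper name, and the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_unload_request(project: str, model: str,
                         base_url: str = "http://localhost:5000") -> Request:
    """Build (but do not send) the DELETE /models request from the curl example."""
    body = json.dumps({"project": project, "model": model}).encode("utf-8")
    return Request(base_url + "/models", data=body, method="DELETE")


req = build_unload_request("my_restaurant_search_bot", "<model_XXXXXX>")
print(req.get_method(), req.full_url)
```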
Authorization
To protect your server, you can specify a token in your Rasa NLU configuration, e.g. by adding "token": "12345" to your config file, or by setting the RASA_TOKEN environment variable.
If set, this token must be passed as a query parameter in all requests, e.g.:
$ curl localhost:5000/status?token=12345
By default, CORS (cross-origin resource sharing) calls are not allowed. If you want to call your Rasa NLU server from another domain (for example, from a training web UI), you can whitelist that domain by adding it to the config value cors_origin.
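When the token is set, every URL needs a token query parameter, including URLs that already carry other parameters such as q or project. Below is a minimal Python sketch of a hypothetical with_token helper. It adds the parameter, or replaces an existing one, while keeping the other parameters. Spaces are encoded as %20 to match the examples on this page.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_token(url: str, token: str) -> str:
    """Return url with a `token` query parameter added (or replaced)."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "token"]
    params.append(("token", token))
    # quote_via=quote writes spaces as %20 rather than "+".
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))


print(with_token("http://localhost:5000/status", "12345"))
# http://localhost:5000/status?token=12345
```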
Serving Multiple Apps
Depending on your choice of backend, Rasa NLU can use quite a lot of memory. If you are serving multiple models in production, you therefore want to serve them from the same process and avoid duplicating the memory load.
Note
Although this avoids loading the same backend twice, the server still needs to load one set of word vectors (which make up most of the memory consumption) per language and backend.
As stated previously, Rasa NLU naturally handles serving multiple apps: by default, the server loads all projects found under the path directory defined in the configuration. The file structure under the path directory is as follows:
- <path>
    - <project_A>
        - <model_XXXXXX>
        - <model_XXXXXX>
        ...
    - <project_B>
        - <model_XXXXXX>
        ...
    ...
You can therefore specify which project to use in your /parse requests:
$ curl 'localhost:5000/parse?q=hello&project=my_restaurant_search_bot'
or
$ curl -XPOST localhost:5000/parse -d '{"q":"I am looking for Chinese food", "project":"my_restaurant_search_bot"}'
You can also specify the model you want to use for a given project. By default, the latest trained model is used:
$ curl -XPOST localhost:5000/parse -d '{"q":"I am looking for Chinese food", "project":"my_restaurant_search_bot", "model":"<model_XXXXXX>"}'
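For the GET form of /parse, the text has to be URL-encoded in the query string. This Python sketch builds such a URL from the q and project parameters used in the GET example above. parse_url is a hypothetical helper, and localhost:5000 is an assumed base URL.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def parse_url(text: str, project: Optional[str] = None,
              base_url: str = "http://localhost:5000") -> str:
    """Build a GET /parse URL; omitting project falls back to "default"."""
    params = {"q": text}
    if project is not None:
        params["project"] = project
    # quote_via=quote percent-encodes spaces as %20, as in the examples above.
    return base_url + "/parse?" + urlencode(params, quote_via=quote)


print(parse_url("I am looking for Chinese food", "my_restaurant_search_bot"))
```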
If the server finds no project under the path directory, a "default" project with a simple fallback model is used.
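To see which projects and models the server would find, you can walk the path directory laid out above. This Python sketch lists each project sub-directory together with its model sub-directories. As an illustrative assumption only, latest_model picks the most recently modified model directory; the server may determine the latest model differently. Both function names are hypothetical.

```python
import os
from typing import Dict, List, Optional


def list_projects(path: str) -> Dict[str, List[str]]:
    """Map each project directory under `path` to its model directories."""
    projects = {}
    for project in sorted(os.listdir(path)):
        project_dir = os.path.join(path, project)
        if os.path.isdir(project_dir):
            projects[project] = sorted(
                model for model in os.listdir(project_dir)
                if os.path.isdir(os.path.join(project_dir, model)))
    return projects


def latest_model(path: str, project: str) -> Optional[str]:
    """ASSUMPTION: treat the most recently modified model directory as the latest."""
    models = list_projects(path).get(project, [])
    if not models:
        return None
    project_dir = os.path.join(path, project)
    return max(models, key=lambda m: os.path.getmtime(os.path.join(project_dir, m)))
```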
Server Parameters
There are a number of parameters you can pass when running the server.
$ python -m rasa_nlu.server
Here is a quick overview:
usage: server.py [-h] [-e {wit,luis,dialogflow}] [-P PORT]
                 [--pre_load PRE_LOAD [PRE_LOAD ...]] [-t TOKEN] [-w WRITE]
                 --path PATH [--cors [CORS [CORS ...]]]
                 [--max_training_processes MAX_TRAINING_PROCESSES]
                 [--num_threads NUM_THREADS] [--endpoints ENDPOINTS]
                 [--wait_time_between_pulls WAIT_TIME_BETWEEN_PULLS]
                 [--response_log RESPONSE_LOG] [--storage STORAGE] [-c CONFIG]
                 [--debug] [-v]

parse incoming text

optional arguments:
  -h, --help            show this help message and exit
  -e {wit,luis,dialogflow}, --emulate {wit,luis,dialogflow}
                        which service to emulate (default: None i.e. use
                        simple built in format)
  -P PORT, --port PORT  port on which to run server
  --pre_load PRE_LOAD [PRE_LOAD ...]
                        Preload models into memory before starting the server.
                        If given `all` as input all the models will be loaded.
                        Else you can specify a list of specific project names.
                        Eg: python -m rasa_nlu.server --pre_load project1
                        --path projects -c config.yaml
  -t TOKEN, --token TOKEN
                        auth token. If set, reject requests which don't
                        provide this token as a query parameter
  -w WRITE, --write WRITE
                        file where logs will be saved
  --path PATH           working directory of the server. Models are loaded
                        from this directory and trained models will be saved
                        here.
  --cors [CORS [CORS ...]]
                        List of domain patterns from where CORS (cross-origin
                        resource sharing) calls are allowed. The default value
                        is `[]` which forbids all CORS requests.
  --max_training_processes MAX_TRAINING_PROCESSES
                        Number of processes used to handle training requests.
                        Increasing this value will have a great impact on
                        memory usage. It is recommended to keep the default
                        value.
  --num_threads NUM_THREADS
                        Number of parallel threads to use for handling parse
                        requests.
  --endpoints ENDPOINTS
                        Configuration file for the model server as a yaml file
  --wait_time_between_pulls WAIT_TIME_BETWEEN_PULLS
                        Wait time in seconds between NLU model server queries.
  --response_log RESPONSE_LOG
                        Directory where logs will be saved (containing queries
                        and responses). If set to ``null`` logging will be
                        disabled.
  --storage STORAGE     Set the remote location where models are stored. E.g.
                        on AWS. If nothing is configured, the server will only
                        serve the models that are on disk in the configured
                        `path`.
  -c CONFIG, --config CONFIG
                        Default model configuration file used for training.
  --debug               Print lots of debugging statements. Sets logging level
                        to DEBUG
  -v, --verbose         Be verbose. Sets logging level to INFO
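To start the server from a script or test harness, you can assemble the command line from the options listed above. The following Python sketch is a hypothetical helper that covers a handful of them (--path, --port, --emulate, --token, --pre_load, --config). It returns an argument list suitable for subprocess.Popen rather than starting anything itself. Port 5000 is passed explicitly to match the examples on this page, not because it is necessarily the server's default.

```python
import sys
from typing import List, Optional

EMULATION_MODES = ("wit", "luis", "dialogflow")


def server_command(path: str, port: int = 5000,
                   emulate: Optional[str] = None,
                   token: Optional[str] = None,
                   pre_load: Optional[List[str]] = None,
                   config: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) a `python -m rasa_nlu.server` command line."""
    args = [sys.executable, "-m", "rasa_nlu.server",
            "--path", path, "--port", str(port)]
    if emulate is not None:
        if emulate not in EMULATION_MODES:
            raise ValueError("emulate must be one of %s" % (EMULATION_MODES,))
        args += ["--emulate", emulate]
    if token is not None:
        args += ["--token", token]
    if pre_load:
        args += ["--pre_load", *pre_load]
    if config is not None:
        args += ["--config", config]
    return args


# e.g. subprocess.Popen(server_command("projects", pre_load=["all"]))
print(" ".join(server_command("projects", pre_load=["project1"], config="config.yaml")))
```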