Change Log¶
All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning starting with version 0.7.0.
[0.9.1] - 2017-07-11¶
Fixed¶
- removed obsolete --output parameter of train.py. use --path instead. fixes #473
[0.9.0] - 2017-07-07¶
Added¶
- increased test coverage to avoid regressions (ongoing)
- added regex featurization to support intent classification and entity extraction (intent_entity_featurizer_regex)
Changed¶
- replaced existing CRF library (python-crfsuite) with sklearn-crfsuite (due to better windows support)
- updated to spacy 1.8.2
- logging format of logged request now includes model name and timestamp
- use module specific loggers instead of default python root logger
- output format of the duckling extractor changed. the value field now includes the complete value from duckling instead of just text (so this property is an object now instead of just text). includes granularity information now.
- deprecated intent_examples and entity_examples sections in training data. all examples should go into the common_examples section
- weight training samples based on class distribution during ner_crf cross validation and sklearn intent classification training
- large refactoring of the internal training data structure and pipeline architecture
- numpy is now a required dependency
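The deprecation of intent_examples and entity_examples could be handled with a one-off migration along the following lines. This is only a sketch, not a tool shipped with the project: merge_into_common_examples is a hypothetical name, and the code assumes the training file keeps its sections under a top-level rasa_nlu_data key, with every example stored as a dict.

```python
import copy


def merge_into_common_examples(training_data):
    """Return a copy of training_data with all examples in common_examples.

    Assumed layout (not confirmed by this changelog):
    {"rasa_nlu_data": {"<section name>": [example, ...]}}
    """
    migrated = copy.deepcopy(training_data)
    sections = migrated.setdefault("rasa_nlu_data", {})
    common = sections.setdefault("common_examples", [])

    # Move examples out of the deprecated sections, keeping their order.
    for deprecated in ("intent_examples", "entity_examples"):
        common.extend(sections.pop(deprecated, []))

    return migrated


old = {
    "rasa_nlu_data": {
        "intent_examples": [{"text": "hello", "intent": "greeting"}],
        "entity_examples": [
            {
                "text": "flights to New York City",
                "entities": [
                    {"start": 11, "end": 24, "value": "New York City", "entity": "GPE"}
                ],
            }
        ],
    }
}
new = merge_into_common_examples(old)
```

The input is deep-copied, so the original data stays untouched and the result can be reviewed before it is written back to disk.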
Removed¶
- luis data tokenizer configuration value (not used anymore, luis exports char offsets now)
Fixed¶
- properly update coveralls coverage report from travis
- persistence of duckling dimensions
- changed default response of untrained intent_classifier_sklearn from "intent": None to "intent": {"name": None, "confidence": 0.0}
- /status endpoint showing all available models instead of only those whose name starts with model
- properly return training process ids #391
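Callers that used to test for "intent": None may need to look at the nested name after the change to the untrained default response. A minimal sketch, assuming the parse result is available as a plain dict; has_intent is a hypothetical helper and not part of the project.

```python
def has_intent(result):
    """Return True if the parse result carries a classified intent."""
    intent = result.get("intent")
    # Before this release an untrained classifier returned None for the
    # whole field; now it returns a dict whose name is None.
    if intent is None:
        return False
    return intent.get("name") is not None
```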
[0.8.9] - 2017-05-26¶
Fixed¶
- properly handle response_log configuration variable being set to null
[0.8.8] - 2017-05-26¶
Fixed¶
- /status endpoint showing all available models instead of only those whose name starts with model
[0.8.0] - 2017-05-08¶
Added¶
- ngram character featurizer (allows better handling of out-of-vocab words)
- replaced pre-wired backends with more flexible pipeline definitions
- return top 10 intents with sklearn classifier #199
- python type annotations for nearly all public functions
- added alternative method of defining entity synonyms
- support for arbitrary spacy language model names
- duckling components to provide normalized output for structured entities
- Conditional random field entity extraction (Markov model for entity tagging; better named entity recognition with low and medium amounts of data, and similarly good results at big data level)
- allow naming of trained models instead of generated model names
- dynamic check of requirements for the different components & error messages on missing dependencies
- support for using multiple entity extractors and combining results downstream
Changed¶
- unified tokenizers, classifiers and feature extractors to implement common component interface
- src directory renamed to rasa_nlu
- when loading data in a foreign format (api.ai, luis, wit) the data gets properly split into intent & entity examples
- Configuration:
  - added max_number_of_ngrams
  - removed backend and added pipeline as a replacement
  - added luis_data_tokenizer
  - added duckling_dimensions
- parser output format changed
  from
  {"intent": "greeting", "confidence": 0.9, "entities": []}
  to
  {"intent": {"name": "greeting", "confidence": 0.9}, "entities": []}
- entities output format changed
  from
  {"start": 15, "end": 28, "value": "New York City", "entity": "GPE"}
  to
  {"extractor": "ner_mitie", "processors": ["ner_synonyms"], "start": 15, "end": 28, "value": "New York City", "entity": "GPE"}
  where extractor denotes the entity extractor that originally found an entity, and processors denotes components that alter entities, such as the synonym component.
- camel cased MITIE classes (e.g. MITIETokenizer → MitieTokenizer)
- model metadata changed, see migration guide
- updated to spacy 1.7 and dropped training and loading capabilities for the spacy component (breaks existing spacy models!)
- introduced compatibility with both Python 2 and 3
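Client code that stored or consumed the old flat parser output can bridge the change with a small adapter. The following is a sketch under the assumption that results are handled as plain dicts shaped like the examples in this section; upgrade_parse_result is a hypothetical name, not a function provided by the project.

```python
def upgrade_parse_result(result):
    """Convert a pre-0.8.0 parse result to the nested intent format."""
    intent = result.get("intent")
    if isinstance(intent, dict):
        # Already in the new format; return a shallow copy unchanged.
        return dict(result)

    # Drop the top-level confidence and nest it under the intent instead.
    upgraded = {key: value for key, value in result.items() if key != "confidence"}
    upgraded["intent"] = {"name": intent, "confidence": result.get("confidence")}
    return upgraded


old = {"intent": "greeting", "confidence": 0.9, "entities": []}
new = upgrade_parse_result(old)
# new == {"intent": {"name": "greeting", "confidence": 0.9}, "entities": []}
```

Entities are passed through as they are: the new extractor and processors fields describe which components produced an entity, so they cannot be reconstructed from old output.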
Removed¶
[0.7.4] - 2017-03-27¶
Fixed¶
- fixed failed loading of example data after renaming attributes, i.e. “KeyError: ‘entities’”
[0.7.3] - 2017-03-15¶
Fixed¶
- fixed regression in mitie entity extraction on special characters
- fixed spacy fine tuning and entity recognition on passed language instance
[0.7.1] - 2017-03-10¶
[0.7.0] - 2017-03-10¶
This is a major version update. Please also have a look at the Migration Guide.
Added¶
- Changelog ;)
- option to use multi-threading during classifier training
- entity synonym support
- proper temporary file creation during tests
- mitie_sklearn backend using mitie tokenization and sklearn classification
- option to fine-tune spacy NER models
- multithreading support of built-in REST server (e.g. using gunicorn)
- multitenancy implementation to allow loading multiple models which share the same backend
Fixed¶
- error propagation on failed vector model loading (spacy)
- escaping of special characters during mitie tokenization