2. What is
• Startup: Barcelona / San Francisco
• SaaS / On-Premise
• Connects sources of information, gathers and analyzes their content, and offers a UI to search it
• Knowledge Management and apps for eDiscovery and contract management
• API-centric solution
• API centric solution
6. Static models
• Detect interesting content types
• Get a dataset and apply generic feature extraction
• Train a model, for example a binary SVM; model versions are stored as files
• Embed the model in the processing container (Jenkinsfile / Dockerfile)
• Build the container and push the version to the registry
• Add it to the production stack
• Continuous build on each commit
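The train-and-version step above can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: it trains a binary linear SVM with plain sub-gradient descent on the hinge loss (standard library only) and writes each model version to its own file, which a container build could then copy in. The toy features, the hyperparameters, and the `model-v<N>.json` naming are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Binary linear SVM trained by sub-gradient descent on the hinge loss.

    `labels` must be +1 or -1. Hyperparameters are illustrative defaults.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # L2 regularisation shrinks the weights on every step.
            weights = [w * (1.0 - lr * reg) for w in weights]
            if margin < 1.0:
                # Inside the margin or misclassified: move towards the sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return {"weights": weights, "bias": bias}


def predict(model, x):
    score = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if score >= 0 else -1


def save_model(model, directory, version):
    """Write the model as one versioned file (hypothetical naming scheme)."""
    path = Path(directory) / f"model-v{version}.json"
    path.write_text(json.dumps(model))
    return path


def load_model(path):
    return json.loads(Path(path).read_text())


# Toy, linearly separable data: two made-up features per document.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, -1, -1]

with tempfile.TemporaryDirectory() as model_dir:
    saved_path = save_model(train_linear_svm(samples, labels), model_dir, version=1)
    saved_name = saved_path.name
    restored = load_model(saved_path)

print(saved_name, predict(restored, [5.0, 0.0]), predict(restored, [0.0, 5.0]))
```

Keeping one file per version means the container image tag and the model version can move together, so a rollback of the image also rolls back the model.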
9. Static models
• Test accuracy in production by processing new messages: plug into the queue and don’t ack the messages, so they get reprocessed
• Scale the number of pods per beat based on the size of its queue
• Dead-letter queues for messages that fail processing
• Delay queues for incremental processing
• Experimental beats
• REST API to our Beats engine to extract information about a resource
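The queue behaviour described on this slide can be sketched with an in-memory stand-in. This is an illustration under stated assumptions, not the production broker setup: the retry budget, the `messages_per_pod` ratio, the pod limits, and the `extract` handler are all hypothetical. A message whose handler fails is not acked and goes back on the queue; after too many attempts it is moved to a dead-letter list. A separate helper derives a replica count from the queue size.

```python
import math
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget before dead-lettering


def desired_replicas(queue_size, messages_per_pod=100, min_pods=1, max_pods=20):
    """Pick a pod count proportional to the backlog, clamped to a range."""
    needed = math.ceil(queue_size / messages_per_pod)
    return max(min_pods, min(max_pods, needed))


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages; failed ones are requeued, then dead-lettered."""
    processed, dead_letters = [], []
    while queue:
        message = queue.popleft()
        try:
            processed.append(handler(message["body"]))  # success counts as an ack
        except Exception:
            message["attempts"] += 1
            if message["attempts"] >= max_attempts:
                dead_letters.append(message)  # give up on this message
            else:
                queue.append(message)  # not acked: redeliver later
    return processed, dead_letters


def extract(body):
    """Toy handler that fails on one specific payload."""
    if body == "corrupt":
        raise ValueError("cannot parse message")
    return body.upper()


queue = deque({"body": b, "attempts": 0} for b in ["mail-1", "corrupt", "mail-2"])
processed, dead_letters = drain(queue, extract)
print(processed, [m["body"] for m in dead_letters], desired_replicas(250))
```

With a real broker the requeue and dead-letter routing would normally be configured on the broker side rather than coded in the consumer; the loop above only mirrors the observable behaviour.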
11. Guillotina
• Open Source scalable resource management framework (data platform)
• Provides an extensible Transactional Traversal REST API with distributed DB support
• Event triggering
• Security model
• All async
• PyData talk 21/5
12. Dynamic models
• Distributed continuous training
• Serve multiple models live
• Model storage
• Direct access to distributed data
• Inference and training
14. Simplified example
• User defines labels: ML, BUSINESS
• Asks to train a classification model based on logistic regression
• A model is saved in the DB (SavedModel)
• User defines a Rule to apply the model (TF Serving) each time a new mail arrives from Gmail
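The flow above can be mimicked end to end in a few lines. This is only a stand-in under stated assumptions: the talk stores a TensorFlow SavedModel and applies it through TF Serving, whereas this sketch trains a bag-of-words logistic regression with the standard library and keeps the model in a plain dict. The example documents and the `on_new_mail` rule function are hypothetical.

```python
import math


def tokenize(text):
    return text.lower().split()


def train_logistic_regression(docs, labels, negative, positive, epochs=300, lr=0.5):
    """Binary bag-of-words logistic regression trained with SGD."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    vocab = {tok: i for i, tok in enumerate(tokens)}
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            idxs = [vocab[tok] for tok in tokenize(doc)]
            z = bias + sum(weights[i] for i in idxs)
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - (1.0 if label == positive else 0.0)
            # Gradient of the log loss for each token present in the document.
            for i in idxs:
                weights[i] -= lr * error
            bias -= lr * error
    return {"vocab": vocab, "weights": weights, "bias": bias,
            "labels": [negative, positive]}


def predict(model, text):
    # Tokens never seen during training are ignored.
    idxs = [model["vocab"][t] for t in tokenize(text) if t in model["vocab"]]
    z = model["bias"] + sum(model["weights"][i] for i in idxs)
    return model["labels"][1] if z >= 0 else model["labels"][0]


def on_new_mail(model, subject):
    """Hypothetical rule: label every incoming mail with the trained model."""
    return {"subject": subject, "label": predict(model, subject)}


docs = [
    "train neural network model",
    "gradient descent classifier tuning",
    "quarterly revenue sales report",
    "contract renewal invoice client",
]
labels = ["ML", "ML", "BUSINESS", "BUSINESS"]
model = train_logistic_regression(docs, labels, negative="BUSINESS", positive="ML")

print(on_new_mail(model, "neural network gradient"))
print(on_new_mail(model, "invoice revenue report"))
```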
15. TF Serving pods (C++)
• gRPC from Canonical for inference, with mini-batches / model spec
• SourceAdapter (Loader) connector for Canonical model loading (handles model versions)
• ServableAdapter with a shared vocabulary and multiple loaded models (multiple models on one serving component)
• k8s scaling and monitoring
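To make the "mini batch / model spec" idea concrete, here is a small sketch of how inference requests could be assembled. Note the swap: the slide describes gRPC, while this example targets TF Serving's REST predict endpoint because it needs no generated stubs. The host name `tf-serving`, the model name `mail_classifier`, and the version number are assumptions, and the code only builds the requests without sending them.

```python
import json


def mini_batches(instances, batch_size):
    """Split a list of input rows into consecutive mini-batches."""
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


def build_predict_request(host, model_name, version, instances, port=8501):
    """Return (url, body) for a TF Serving REST predict call.

    The model name and version in the URL play the role of the model spec.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}/versions/{version}:predict"
    body = json.dumps({"instances": instances})
    return url, body


rows = [[1.0], [2.0], [3.0]]
requests_to_send = [
    build_predict_request("tf-serving", "mail_classifier", 3, batch)
    for batch in mini_batches(rows, batch_size=2)
]

for url, body in requests_to_send:
    print(url, body)
```

Each pair could then be posted with any HTTP client; pinning the version in the URL is what lets several model versions be served side by side.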
16. Dynamic distributed flow
• Allocate variable and worker pods dynamically with the k8s Python API (there are limits)
• Pack an Experiment and the Estimator (with a Keras model or direct TF)
• input_fn is a feed generator that gets the data from the WebSocket API into TFRecord (Guillotina model to Protobuf model)
• Re-train with new documents, loading the model's last checkpoint
• The main worker writes the model to the canonical
• Another pod runs validation on the saved canonical for each checkpoint
• Devs can run TensorBoard against the canonical model endpoint
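One piece of this flow that is easy to show in isolation is the per-pod cluster description. Distributed TensorFlow Estimators read their role from a `TF_CONFIG` environment variable, so a launcher that creates pods dynamically would need to generate one value per pod. The sketch below assumes that the "variable" pods are parameter servers, that the main worker is the `chief` task, and that the validation pod runs as an `evaluator`; the pod DNS names, the port, and the job name are hypothetical, and no Kubernetes call is made.

```python
import json


def build_cluster(job_name, n_workers, n_ps, port=2222):
    """Describe the training cluster using hypothetical pod DNS names."""
    return {
        "chief": [f"{job_name}-chief-0:{port}"],
        "worker": [f"{job_name}-worker-{i}:{port}" for i in range(n_workers)],
        "ps": [f"{job_name}-ps-{i}:{port}" for i in range(n_ps)],
    }


def tf_config(cluster, task_type, index):
    """Build the TF_CONFIG JSON string for one pod.

    The evaluator is not listed in the cluster itself; it only follows
    the checkpoints written by the chief.
    """
    if task_type != "evaluator" and index >= len(cluster[task_type]):
        raise ValueError(f"no {task_type} task with index {index}")
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": index}})


cluster = build_cluster("retrain-42", n_workers=2, n_ps=1)

# One environment value per pod that the launcher would create.
pod_envs = {
    "chief-0": tf_config(cluster, "chief", 0),
    "worker-0": tf_config(cluster, "worker", 0),
    "worker-1": tf_config(cluster, "worker", 1),
    "ps-0": tf_config(cluster, "ps", 0),
    "evaluator-0": tf_config(cluster, "evaluator", 0),
}

for pod, env in pod_envs.items():
    print(pod, env)
```

Each string would be set as the `TF_CONFIG` environment variable in the corresponding pod spec, so the same training image can act as chief, worker, parameter server, or evaluator.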