Aspect-based sentiment analysis is a text analysis technique that breaks down text into aspects (attributes or components of a product or service), and then scores the sentiment level (positive, negative or neutral) of each aspect. In this talk we'll walk through a production pipeline for training large Aspect Based Sentiment Analysis model in python with the Intel NLP Architect package based on the following open sourced code https://github.com/microsoft/nlp-recipes/tree/master/examples/sentiment_analysis/absa
12. What is Azure Machine Learning Service?
Set of Azure
Cloud Services
Python
SDK
Prepare Data
Build Models
Train Models
Manage Models
Track Experiments
Deploy Models
That enables
you to:
19. Challenges of distributed training
Dependencies and
Containers
Provision clusters of
VMs
Schedule jobs
Distribute data
Gather results
Handle failures
Scale resources
Secure Access
24. from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
subscription_id='<azure-subscription-id>',
resource_group='myresourcegroup',
create_resource_group=True,
location='eastus2' # or other supported Azure region
)
# see workspace details
ws.get_details()
Step 2 – Create an Experiment
experiment_name = ‘my-experiment-1'
from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)
Step 1 – Create a workspace
25. Step 3 – Create remote compute target
# choose a name for your cluster, specify min and max nodes
compute_name = os.environ.get("BATCHAI_CLUSTER_NAME", "cpucluster")
compute_min_nodes = os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 0)
compute_max_nodes = os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 4)
# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_D2_V2")
provisioning_config = AmlCompute.provisioning_configuration(
vm_size = vm_size,
min_nodes = compute_min_nodes,
max_nodes = compute_max_nodes)
# create the cluster
print(‘ creating a new compute target... ')
compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
# You can poll for a minimum number of nodes and for a specific timeout.
# if no min node count is provided it will use the scale settings for the cluster
compute_target.wait_for_completion(show_output=True,
min_node_count=None, timeout_in_minutes=20)
26. # note that while loading, we are shrinking the intensity values (X) from 0-255 to 0-1 so that the
model converge faster.
X_train = load_data('./data/train-images.gz', False) / 255.0
y_train = load_data('./data/train-labels.gz', True).reshape(-1)
X_test = load_data('./data/test-images.gz', False) / 255.0
y_test = load_data('./data/test-labels.gz', True).reshape(-1)
First load the compressed files into numpy arrays. Note the ‘load_data’ is a custom function that simply parses the
compressed files into numpy arrays.
Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be
accessed for remote training. The files are uploaded into a directory named mnist at the root of the datastore.
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)
We now have everything you need to start training a model.
Step 4 – Upload data to the cloud
27. %%time from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Next, make predictions using the test set and calculate the accuracy
y_hat = clf.predict(X_test)
print(np.average(y_hat == y_test))
You should see the local model accuracy displayed. [It should be a number like 0.915]
Train a simple logistic regression model using scikit-learn locally. This should take a minute or two.
Step 5 – Train a local model
28. To submit a training job to a remote you have to perform the following tasks:
• 6.1: Create a directory
• 6.2: Create a training script
• 6.3: Create an estimator object
• 6.4: Submit the job
Step 6.1 – Create a directory
Create a directory to deliver the required code from your computer to the remote resource.
import os
script_folder = './sklearn-mnist' os.makedirs(script_folder, exist_ok=True)
Step 6 – Train model on remote cluster
29. %%writefile $script_folder/train.py
# load train and test set into numpy arrays
# Note: we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can
# converge faster.
# ‘data_folder’ variable holds the location of the data files (from datastore)
Reg = 0.8 # regularization rate of the logistic regression model.
X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = 'n’)
# get hold of the current run
run = Run.get_context()
#Train a logistic regression model with regularizaion rate of’ ‘reg’
clf = LogisticRegression(C=1.0/reg, random_state=42)
clf.fit(X_train, y_train)
Step 6.2 – Create a Training Script (1/2)
30. print('Predict the test set’)
y_hat = clf.predict(X_test)
# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print('Accuracy is', acc)
run.log('regularization rate', np.float(args.reg))
run.log('accuracy', np.float(acc)) os.makedirs('outputs', exist_ok=True)
# The training script saves the model into a directory named ‘outputs’. Note files saved in the
# outputs folder are automatically uploaded into experiment record. Anything written in this
# directory is automatically uploaded into the workspace.
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
Step 6.2 – Create a Training Script (2/2)
31. An estimator object is used to submit the run.
from azureml.train.estimator import Estimator
script_params = { '--data-folder': ds.as_mount(), '--regularization': 0.8 }
est = Estimator(source_directory=script_folder,
script_params=script_params,
compute_target=compute_target,
entry_script='train.py’,
conda_packages=['scikit-learn'])
Step 6.4 – Submit the job to the cluster for training
run = exp.submit(config=est)
run
Step 6.3 – Create an Estimator
32. Step 7 – Monitor a run
You can watch the progress of the run with a Jupyter widget. The widget is asynchronous and provides live
updates every 10-15 seconds until the job completes.
from azureml.widgets import RunDetails
RunDetails(run).show()
Here is a still snapshot of the widget shown at the end of training:
33. Step 8 – See the results
As model training and monitoring happen in the background. Wait until the model has completed training before
running more code. Use wait_for_completion to show when the model training is complete
run.wait_for_completion(show_output=False)
# now there is a trained model on the remote cluster
print(run.get_metrics())
{'regularization rate': 0.8, 'accuracy':
0.9204}
34. Step 9 – Register the model
This wrote the file ‘outputs/sklearn_mnist_model.pkl’ in a directory named ‘outputs’ in the VM of the cluster where
the job is executed.
• outputs is a special directory in that all content in this directory is automatically uploaded to your workspace.
• This content appears in the run record in the experiment under your workspace.
• Hence, the model file is now also available in your workspace.
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
Recall that the last step in the training script is:
# register the model in the workspace
model = run.register_model (
model_name='sklearn_mnist’,
model_path='outputs/sklearn_mnist_model.pkl’)
The model is now available to query, examine, or deploy
36. Step 9.1 – Create the scoring script
Create the scoring script, called score.py, used by the web service call to show how to use the model.
It requires two functions – init() and run (input data)
from azureml.core.model import Model
def init():
global model
# retreive the path to the model file using the model name
model_path = Model.get_model_path('sklearn_mnist’)
model = joblib.load(model_path)
def run(raw_data):
data = np.array(json.loads(raw_data)['data’])
# make prediction
y_hat = model.predict(data) return json.dumps(y_hat.tolist())
37. Step 9.2 – Create environment file
Create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is
used to ensure that all of those dependencies are installed in the Docker image. This example needs scikit-learn
and azureml-sdk.
from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
with open("myenv.yml","w") as f:
f.write(myenv.serialize_to_string())
Step 9.3 – Create configuration file
Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for the
ACI container. Here we will use the defaults (1 core and 1 gigabyte of RAM)
from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1,
tags={"data": "MNIST", "method" : "sklearn"},
description='Predict MNIST with sklearn')
38. Step 9.4 – Deploy the model to ACI
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
# configure the image
image_config = ContainerImage.image_configuration(
execution_script ="score.py",
runtime ="python",
conda_file ="myenv.yml")
service = Webservice.deploy_from_model(workspace=ws, name='sklearn-mnist-svc’,
deployment_config=aciconfig, models=[model],
image_config=image_config)
service.wait_for_deployment(show_output=True)
39. Step 10 – Test the deployed model using the HTTP end point
Test the deployed model by sending images to be classified to the HTTP endpoint
import requests
import json
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test)-1)
input_data = "{"data": [" + str(list(X_test[random_index])) + "]}"
headers = {'Content-Type':'application/json’}
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", resp.text)
43. Typical ‘manual’ approach to hyperparameter tuning
Dataset
Training
Algorithm 1
Hyperparameter
Values – config 1
Model 1
Hyperparameter
Values – config 2
Model 2
Hyperparameter
Values – config 3
Model 3
Model
Training
InfrastructureTraining
Algorithm 2
Hyperparameter
Values – config 4
Model 4
Complex
Tedious
Repetitive
Time consuming
Expensive
44. Automated Hyperparameter Tuning
Manage Active Jobs
Evaluate training runs for
specified primary metric
Early terminate poor performing
training runs. Early termination
policies include:
Early terminate poorly
performing runs
45. /Docs alert
Get started with Azure
Machine Learning Service
Hyperparameter Tuning
https://aka.ms/AzureMLHyperDrive
For this, the simplest solution would be to use Text analytics, which is Azure Cognitive service that enables you to enter your text and get automatically metadata and features. Tailwind Traders can use it to have insight into how people are interacting with their restaurants. Let’s have a look at how it works.
Why choose these APIs ? They work, and it’s easy.
Easy: The APIs are easy to implement because of the simple REST calls. Being REST APIs, there’s a common way to implement and you can get started with all of them for free simply by going to one place, one website, www.microsoft.com/cognitive. (You don’t have to hunt around to different places.)
Flexible: We’ve got a breadth of intelligence and knowledge APIs so developers will be able to find what intelligence feature they need; and importantly, they all work on whatever language, framework, or platform developers choose. So, devs can integrated into their apps—iOS, Android, Windows—using their own tools they know and love (such as python or node.js, etc.).
Tested: Tap into an ever-growing collection of powerful AI algorithms developed by experts. Developers can trust the quality and expertise build into each by experts in their field from Microsoft’s Research organization, Bing, and Azure machine learning and these capabilities are used across many Microsoft first party products such as Cortana, Bing and Skype.
The pre-built AI services, whether it's in text or computer vision or language understanding usually will fill 80% of your business
scenario needs. However, you're going to find that there are 20% of your business needs that might not be filled by a
prebuilt model, and in those cases you're going to need something more custom. What tends to be really interesting,
in my own personal experience, and as you're going to see this applies to Tailwind Traders as well, is that sometimes these 20% end up being the most valuable.
In many cases, I might want to start with the cognitive services because it really makes it easy to get started, and that enables me to focus more deeply on the 20% of scenarios that I can't
reach with the cognitive services.
So let me give you a quick example in demo. We take the Text Analytics API, and we see here. I had a wonderful trip to
Seattle last week. I even visited the Space Needle 2 times and we get really nice 84% positive sentiment. But here's something that's
Interesting. If I changed one letter in “wonderful”, all of a sudden this typo causes the sentiment to drop to 50%. Now, this specific example is not a crazy
example because you know it's something that I can change with maybe putting in a model for spelling detection. But there
might be other things as I'm building and trying to build more robust and more custom versions of sentiment analysis that might be domain specific to my restaurant, and less applicable in the general model.
In that case, I might want to build my own so the way that I do this is I can use the Azure Machine Learning Service.
What the Azure Machine Learning Services is -- it's a set of cloud services and APIs, with a Python SDK, that enable you to prepare your data, build models, train the models, and manage them, as well as the different experiments that you're using to build these models, and finaly deploy them at scale.
So the Azure ML Services are broken up to a few different artifacts.
We have the models that we're going to build, the experiments of how we come up and develop them, the pipelines that allow us to take our experiments
and build the pipeline for their deployment. This is especially useful for linking our data to model development and redeployment.
We have compute management, which allows us to
manage compute at scale, creating clusters of VMs as needed.
Image management allows us to manage different environments for our models and deployments, allowing us to
deploy them on edge devices, on the cloud, or even on prem. And finally the data storage is also integrated into one workspace.
So now that we have our baseline and I think that's great. Let’s discuss some of the more advanced Azura ML Services. They provide a foundation, on top of which we can train any models, including some open source models, and then put them into production and manage.
We're going to be taking an open source model developed by Intel AI Labs as part of their NLP Architect Library. What it does is it provides us with something called aspect based sentiment analysis. Before when we were looking at our restaurant reviews, we got just a very simple sentiment average for each restaurant. However, for Tailwind Traders that doesn't help to actually go in and fix any of the problems, because they still have one more step that they have to do - they have to send in an inspector, who can actually quantify and see why sentiment is so good at one branch versus another. Aspect based sentiment analysis allows us to address this challenge. What it does is is it takes our text an it extracts what's called a lexicon from it, of both aspect-related and opinion-related words. Each of the aspects represents things we're interested in, like the price of food, cleanness of the place, etc. And when we analyze actual review - it would give us a sentiment for each one of these individual aspects,
which then in turn allows us to provide much more actionable responses from our models.
These are the ABSA key messages:
Producing knowledge regarding specific aspects hence enables to gain targeted business insight.
Unsupervised - does not require costly manually tagged data for train
Domain Adaptive – produces domain specific aspect and opinion terms
These are the additional NLP architect key messages:
Develop low-resource and NLP models that require small amounts of training data.
Increase adoption of NLP by developing deep-learning based tools that can be used by non-machine-learning experts.
Produce optimized NLP models that are ready for deployment on Intel AI hardware.
So, in order to put this Intel model into production, we’re going to use the Azure Machine Learning Service. I think it's important to understand how the end to end process works, from taking a model from open source up to deploying it into production. The first thing that we have to do is to prepare a data set, then we have to choose our experiment environment, in this case we're going to be using Jupyter notebooks. We have to go and train this model, and
iterate a few of different versions. Once we do have a model that is ready for deployment, we need to work on putting the model into production.
In this session, we're going to be focusing on the experimentation phase, though I strongly recommend that you check out the AIML 50 session, which talks about how you can take your models that you've developed and put them into production.
So for Tailwind Traders when they're thinking of how to take an open source state of the art model and put it into production, they've noticed that they have a bunch of critical challenges. Things like being able to manage the environmental dependency across the different packages and their different versions. Training a model with a globally distributed team in a globally distributed company is also a challenge. Managing large data sets and understanding how to optimize costs for storing their data and compute, as well as scaling their compute. Also in terms of compliance and making sure that they are ensured data protection, that only certain people have access to data and only the correct parts of the data.
When they build the models, making sure that the models are interpretable in a way that they can understand how they were trained how they can be reproduced and the last real big challenge that they have is managing distributed training.
Each of these things is somehow provided by the Azure ML Service, but it would take a very long time to go through each and every single one of them individually So what will do here is we're going to drill down into one of them and you will see the complexity of it, and then as we go and we solve and put this aspect based sentiment analysis model for Tailwind Traders in production.
19
20
I am providing links to relevant docs here for your convenience.
When I need to train an Azure ML model, it works as follows. There are just a few steps that we need to get started.
The first thing that I need to do is I need to take the code that I have for model training and I need to send it to the Azure Machine Learning
Service. The service will then create the configuration for the experiment that I’ve just sent, and it will create a Docker image. It will deploy it to the
compute target that I've configured which will be linked to the data that I’ve uploaded in my dataset. It will then launch the training script that I've configured and
then in real time it will send me back the stream logs and metrics so that I can see how the process goes, as if I were developing on my own computer!
To take this Aspect-Based Sentiment Analysis model into production, there would by 8 simple steps!
1. Configure Azure Machine Learning Workspace that
we saw above, it would act like a central place for all our activities – from storing data to training models.
2. Define a compute target – on which our training code is going to run.
3. Upload our data set.
4. Create the experiment with the train code, encompassing all the dependencies that we have and the training script.
5. Training script can be taken from the existing Intel’s code. Sometimes we might need to adjust the train script from the original library for distributed training, in other cases changes to the code would be very minimal.
6. Run the estimator to perform the training
7. Once we're done with the results of training, we get the model which we then need to register.
8. Registered model can be deployed and tested, and becomes ready for production.
Let’s have a look at how such a model can be trained using a compute cluster. Typically, as a data scientist, I will do most of my development from a Jupyter Notebook. So I can select any Jupyter environment (local, Azure Notebooks or a Notebook inside Azure ML Workspace), and start my work from there using Azure ML Python SDK.
When we're dealing with model optimization, we also have a challenge of distributed hyperparameter tuning.
What is distributed hyperparameter tuning?
What are hyperparameters typically? When you see a new state of the art
model that's been open sourced, whether that’s BERT, or
in this case our aspect based sentiment analysis model, typically you see these models trained on certain data. The performance of the model is
Dependent on the distribution of the data that it’s been trained on, and there are set of parameters that are configured as part of the models
that are optimized for the given data set, so that the researcher who is publishing the model can get the best possible performance.
Now, when you take these models and you run them on your own data or train them for your own scenario, you're going to have a
different distribution of data, so these hyperparameters might need to be adjusted, otherwise you will see a big gap in performance of your model when you apply it to Tailwind
Traders data, comparing to what’s written in the leaderboards or in the papers.
This it can be a real challenge going through a manual hyperparameter tuning. Typically, what we do is we take a data set and we come up with an algorithm, and then we configure these parameters to build our first model, and then we change the parameters again and we build a second model and we tried to see whether it's better than the first one, and we do
this a couple of times until we max out the potential of this algorithm. And then we might change our model and then we have to do this process all over again, and you can imagine it’s time-consuming tedious repetitive and really expensive process.
Well, the Azure ML Service helps you manage this by running distributed versions of your model and allowing you to choose which are the primary metrics. You want to optimize based on the hyperparameters that you provide, and it provides a bunch of different policies, such as bandit policies, medium stopping policy and truncation policies that enable early termination, so if it sees that a certain set of hyperparameters isn't providing the best possible results, it will terminate those run jobs and release and then run and prioritize the compute for models that are much more promising -- until you find the optimal set of hyperparameters
If you want to get started with that here's the documentation. We will be diving into it in a second so you can see how this service works now.
So let's tie this all together what we've seen here in literally less than an hour.
There are also a few takeways that I want you to get away with. First of all, AI is instrumental to companies success in the modern world, and most of the AI can be incorporated into production using Cognitve Services. However, for more complex tasks, Microsoft provides both simple “code-less” ways to train a baseline model, and very sophisticated Azure ML Service that can simplify life of a serious data scientist quite a bit. Whether you are experiences ML professional or a “traditional” developer – we have you covered!
Here is the summary of all docs pages that we have touched upon – they all present a great material for you to study after the session. I want especially to draw your attention to the last link, which is an overview article explaining how the thights we have learnt today can be useful in developing production models.
So thank you for your time and I hope you enjoyed.