This notebook contains the steps and code to demonstrate how to work with AutoAI experiments in the Watson Machine Learning service. It introduces Python SDK commands for data retrieval, training experiments, persisting pipelines, testing pipelines, refining pipelines, and scoring the resulting model.
Note: Notebook code generated using AutoAI will execute successfully as generated. If the code is modified or reordered, there is no guarantee it will execute successfully. For details, see: Saving an AutoAI experiment as a notebook
Some familiarity with Python is helpful. This notebook uses Python 3.7 and the ibm_watson_machine_learning package.
The learning goals of this notebook are:

- Working with a completed AutoAI experiment
- Comparing and inspecting the trained pipelines
- Deploying and scoring the selected pipeline model
- Running an AutoAI experiment with the Python SDK
This notebook contains the following parts:
- Setup
  - Package installation
  - Watson Machine Learning connection
- Experiment configuration
  - Experiment metadata
- Working with completed AutoAI experiment
  - Get fitted AutoAI optimizer
  - Pipelines comparison
  - Get pipeline as scikit-learn pipeline model
  - Inspect pipeline
    - Visualize pipeline model
    - Preview pipeline model as python code
- Deploy and Score
  - Working with spaces
- Running AutoAI experiment with Python SDK
- Clean up
- Next steps
- Copyrights
Before you use the sample code in this notebook, install the following packages:
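A typical installation cell, assuming a notebook environment where `pip` is available, might look like this (package versions may differ in your setup):

```python
!pip install -U ibm-watson-machine-learning | tail -n 1
!pip install -U autoai-libs | tail -n 1
!pip install -U scikit-learn | tail -n 1
```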
This cell defines the metadata for the experiment, including: `training_data_reference`, `training_result_reference`, and `experiment_metadata`.
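A sketch of what the `experiment_metadata` part of such a cell can look like; the values below are hypothetical placeholders, while the generated notebook fills them in from your actual experiment (the data references are built from connection objects like those shown later in this notebook):

```python
experiment_metadata = dict(
    prediction_type='binary',       # e.g. binary classification
    prediction_column='class',      # hypothetical target column name
    scoring='accuracy',             # optimization metric
    holdout_size=0.1,               # fraction of data held out for evaluation
    csv_separator=',',
    excel_sheet=0,
    positive_label='Positive',      # hypothetical positive class label
    drop_duplicates=True,
    include_only_estimators=[],     # estimators allowed in the search
    project_id='PLACE_YOUR_PROJECT_ID_HERE',
)
```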
This cell imports the pipelines generated for the experiment so they can be compared to find the optimal pipeline to save as a model.
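A minimal sketch of retrieving the fitted optimizer from a historical training run, assuming the `experiment_metadata` defined above and the `wml_credentials` from the Watson Machine Learning connection step:

```python
from ibm_watson_machine_learning.experiment import AutoAI

pipeline_optimizer = AutoAI(
    wml_credentials,
    project_id=experiment_metadata['project_id']
).runs.get_optimizer(metadata=experiment_metadata)
```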
Use the `get_params()` method to retrieve the configuration parameters of the fitted optimizer.
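For example:

```python
pipeline_optimizer.get_params()
```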
Use the `summary()` method to list the trained pipelines and their evaluation metrics as a pandas DataFrame. You can use the DataFrame to compare all discovered pipelines and select one for further testing.
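For example:

```python
summary = pipeline_optimizer.summary()
summary
```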
You can visualize the scoring metric calculated on a holdout data set.
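A minimal plotting sketch, assuming the `summary` DataFrame from above and that the holdout metric column is named `holdout_accuracy` (the actual column name depends on the scoring metric of your experiment):

```python
summary['holdout_accuracy'].plot(kind='barh', title='Holdout accuracy per pipeline');
```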
After you compare the pipelines, download and save a scikit-learn pipeline model object from the AutoAI training job.

Tip: To get a specific pipeline, pass its name to `pipeline_optimizer.get_pipeline(pipeline_name=pipeline_name)`.
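For example:

```python
# Returns the best pipeline by default; 'Pipeline_4' below is a hypothetical name.
pipeline_model = pipeline_optimizer.get_pipeline()
# pipeline_model = pipeline_optimizer.get_pipeline(pipeline_name='Pipeline_4')
```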
Next, check the feature importance for the selected pipeline.

Tip: To see the details of all model evaluation metrics, use `pipeline_optimizer.get_pipeline_details()`.
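For example (the exact keys of the returned dictionary, such as `features_importance` below, may vary between releases, so inspect the result in your own run):

```python
pipeline_details = pipeline_optimizer.get_pipeline_details()
pipeline_details.get('features_importance')
```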
Preview pipeline model stages as a graph. Each node's name links to a detailed description of the stage.
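A minimal sketch, assuming the saved pipeline is a lale pipeline object (the default returned by `get_pipeline()`) and that graphviz is installed:

```python
pipeline_model.visualize()
```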
In the next cell, you can preview the saved pipeline model as Python code. You will be able to review the exact steps used to create the model.

Note: To get the scikit-learn representation, add the parameter `astype='sklearn'` to the `pretty_print` call.
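For example:

```python
pipeline_model.pretty_print(combinators=False, ipython_display=True)

# scikit-learn representation:
# pipeline_model.pretty_print(astype='sklearn', combinators=False, ipython_display=True)
```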
In this section you will specify a deployment space for organizing the assets for deploying and scoring the model. If you do not have an existing space, use the Deployment Spaces Dashboard to create a new space, then copy its `space_id` and paste it below.

Tip: You can also use the SDK to prepare the space for your work. Learn more here.

Action: Assign or update the space ID below.
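A deployment sketch, assuming the `pipeline_model` saved earlier; the space ID placeholder and the deployment name are hypothetical:

```python
from ibm_watson_machine_learning.deployment import WebService

target_space_id = 'PASTE_YOUR_SPACE_ID_HERE'

service = WebService(
    source_wml_credentials=wml_credentials,
    target_wml_credentials=wml_credentials,
    source_project_id=experiment_metadata['project_id'],
    target_space_id=target_space_id
)

service.create(
    model=pipeline_model,
    metadata=experiment_metadata,
    deployment_name='diabetes_risk_webservice'  # hypothetical name
)
```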
Use the `print` method on the deployment object to show basic information about the service. To show all available information about the deployment, use the `get_params()` method.
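For example:

```python
print(service)

service.get_params()
```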
You can make a scoring request by calling `score()` on the deployed pipeline.
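A scoring sketch, where `test_df` is a hypothetical pandas DataFrame with the same columns as the training data (minus the prediction column):

```python
predictions = service.score(payload=test_df)
predictions
```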
If you want to work with the web service in an external Python application, follow these steps to retrieve the service object (see the sketch after this list):

1. Initialize the service with `service = WebService(wml_credentials)`.
2. Get the `deployment_id` with the `service.list()` method.
3. Get the web service object with the `service.get('deployment_id')` method.

After that, you can call the `service.score()` method.
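Putting those steps together, a minimal sketch (the deployment ID argument is a placeholder):

```python
from ibm_watson_machine_learning.deployment import WebService

service = WebService(wml_credentials)
service.list()                 # look up the deployment_id of your web service
service.get('deployment_id')   # replace with your actual deployment ID

# The service object is now ready for scoring:
# predictions = service.score(payload=test_df)
```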
You can delete an existing deployment by calling `service.delete()`. To list the existing web services, use `service.list()`.
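For example:

```python
service.delete()  # delete the deployment held by this service object
service.list()    # list the remaining web services
```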
If you want to run the AutoAI experiment using the Python API, follow the steps described below. The experiment settings were generated based on the parameters set in the UI.
When creating credentials for your Cloud Object Storage instance, add the inline configuration parameter `{"HMAC":true}`, then click Add. This configuration parameter adds the following section to the instance credentials (for use later in this notebook):

```
"cos_hmac_keys": {
    "access_key_id": "***",
    "secret_access_key": "***"
}
```
Action: Provide your Cloud Object Storage credentials in the following cells.
```python
from ibm_watson_machine_learning.experiment import AutoAI

experiment = AutoAI(wml_credentials, project_id=experiment_metadata['project_id'])
```
```python
#@hidden_cell
cos_hmac_keys = {
    "access_key_id": "PLACE_YOUR_ACCESS_KEY_ID_HERE",
    "secret_access_key": "PLACE_YOUR_SECRET_ACCESS_KEY_HERE"
}

cos_api_key = "PLACE_YOUR_API_KEY_HERE"

OPTIMIZER_NAME = 'custom_name'
```
```python
from ibm_watson_machine_learning.helpers import DataConnection
from ibm_watson_machine_learning.helpers import S3Connection, S3Location

training_data_reference = [
    DataConnection(
        connection=S3Connection(
            api_key=cos_api_key,
            auth_endpoint='https://iam.bluemix.net/oidc/token/',
            endpoint_url='https://s3.ap-geo.objectstorage.softlayer.net',
            access_key_id=cos_hmac_keys['access_key_id'],
            secret_access_key=cos_hmac_keys['secret_access_key']
        ),
        location=S3Location(
            bucket='diabetesearlyrisk-donotdelete-pr-b94sfvhycadc8k',
            path='diabetes_data_upload[1].csv'
        )
    ),
]
```
```python
from ibm_watson_machine_learning.helpers import S3Connection, S3Location

training_result_reference = DataConnection(
    connection=S3Connection(
        api_key=cos_api_key,
        auth_endpoint='https://iam.bluemix.net/oidc/token/',
        endpoint_url='https://s3.ap-geo.objectstorage.softlayer.net',
        access_key_id=cos_hmac_keys['access_key_id'],
        secret_access_key=cos_hmac_keys['secret_access_key']
    ),
    location=S3Location(
        bucket='diabetesearlyrisk-donotdelete-pr-b94sfvhycadc8k',
        path='auto_ml/ff16ec8b-bc41-43f7-863f-a2370f2ec0ca/wml_data/1e08fcc0-f64c-4d19-a325-d90f7ff096e6/data/automl'
    )
)
```
```python
pipeline_optimizer = experiment.optimizer(
    name=OPTIMIZER_NAME,
    prediction_type=experiment_metadata['prediction_type'],
    prediction_column=experiment_metadata['prediction_column'],
    scoring=experiment_metadata['scoring'],
    include_only_estimators=experiment_metadata['include_only_estimators'],
    holdout_size=experiment_metadata['holdout_size'],
    csv_separator=experiment_metadata['csv_separator'],
    excel_sheet=experiment_metadata['excel_sheet'],
    positive_label=experiment_metadata['positive_label'],
    drop_duplicates=experiment_metadata['drop_duplicates']
)
```
```python
pipeline_optimizer.fit(
    training_data_reference=training_data_reference,
    training_results_reference=training_result_reference,
    background_mode=False
)
```
Licensed Materials - Copyright © 2021 IBM. This notebook and its source code are released under the terms of the ILAN License. Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Note: The auto-generated notebooks are subject to the International License Agreement for Non-Warranted Programs (or equivalent) and the License Information document for Watson Studio Auto-generated Notebooks (License Terms), located at the link below. Specifically, the Source Components and Sample Materials clause included in the License Information document for Watson Studio Auto-generated Notebooks applies to the auto-generated notebooks. By downloading, copying, accessing, or otherwise using the materials, you agree to the License Terms.