What? AutoAI? What about Google Cloud AI, Microsoft Azure AI, or the AWS AI platform? Well, yes, these cloud platforms are very useful, but AutoAI is better. AutoAI has an advantage from an unexpected side: you don't need to write a single line of code! How does it work?
Creating models can be time-consuming. If you do it in a notebook, you have to decide which attributes to use, deal with missing values, and convert all attributes to a format that the chosen algorithm accepts. Then you have to try different algorithms to find which one performs best. So you spend a lot of time on things you would rather not spend it on. AutoAI helps you with these routine tasks.
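To make the manual routine concrete, here is a minimal sketch of two of the chores mentioned above: filling in missing values and rescaling a column. It uses only the Python standard library. The helper names (`impute_mean`, `min_max_scale`) and the toy rows are assumptions made up for illustration; they are not taken from the data set or from AutoAI.

```python
from statistics import mean


def impute_mean(rows, column):
    """Replace None in `column` with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def min_max_scale(rows, column):
    """Rescale `column` linearly to the [0, 1] range."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [{**row, column: (row[column] - low) / span} for row in rows]


# Toy rows; the values are invented for illustration only.
rows = [
    {"Age": 40, "Glucose": 80.0},
    {"Age": 60, "Glucose": None},
    {"Age": 50, "Glucose": 100.0},
]
rows = impute_mean(rows, "Glucose")
rows = min_max_scale(rows, "Glucose")
print([row["Glucose"] for row in rows])
```

In a real notebook you would repeat this kind of step for every column and every algorithm you try, which is exactly the work AutoAI takes over.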
First, you need to collect and process data. I chose the Breast Cancer Coimbra data set. You can find it here. Let's take a look at the data set.
There are 10 predictors, all quantitative, and a binary dependent variable, indicating the presence or absence of breast cancer. The predictors are anthropometric data and parameters which can be gathered in routine blood analysis. Prediction models based on these predictors, if accurate, can potentially be used as a biomarker of breast cancer.
Using a regular notebook, we would have to write the parameters for each model we chose, train the models separately, and compare their results. How does it go with AutoAI? You need a free IBM Cloud account. You can register here. Once you are logged in to your account, type "Watson Studio" in the search bar. AutoAI is a part of Watson Studio.
Then choose this service.
In Watson Studio, create a new project.
Choose an empty project.
Choose a name and a description for your project. Here you will also need to create a Cloud Object Storage instance; I already have one. In simple words, this is the place on IBM servers where you will store your data set. You have the option to create it directly from the new project page.
Now you need to add an AutoAI experiment to your project. Choose "Add to project" and then select AutoAI experiment.
Now choose a name for your AutoAI experiment.
You also need to create a Machine Learning service (just open the link in a new tab).
Choose the Machine Learning service.
Choose the region of the ML service.
You can also see the configuration of the ML service (this is the configuration for a free account).
Create the ML service.
Choose the ML service and press "Associate service".
Go back to the tab where you are creating the new AutoAI experiment and press Reload.
Now you can associate the new ML service with your experiment. Just press the Create button.
Now you need to upload your data set.
Once you have chosen your CSV file, you will see this page.
You can predict a time series (if your data set is a time series), or press No and predict a column from your data set. I chose the "Classification" column in my data set.
Now you can see which type of prediction your experiment will use and the optimization metric.
Press "Experiment settings".
In the settings you can see four types of prediction. I chose Binary classification, which fits my data set. I also tried Multiclass classification, and it gave me slightly worse results.
In Data source you can customize how AutoAI works with your data set. You can drop duplicate rows, use a subsample for large data sets, enable text detection (when enabled, columns detected as text are transformed into vectors to better analyze semantic similarity between strings; enabling this setting may increase run time), and customize how your training data is split.
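For readers who want to see what these data source options amount to, here is a rough standard-library sketch of dropping duplicate rows, taking a subsample, and splitting off a holdout set. The function names, the 10% holdout fraction, and the toy rows are all assumptions for illustration; this is not how AutoAI implements these settings internally.

```python
import random


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def subsample(rows, fraction, seed=0):
    """Return a reproducible random sample containing `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def holdout_split(rows, holdout_fraction=0.1, seed=0):
    """Shuffle the rows and split them into (training, holdout) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Toy rows (feature, label) with two duplicates appended at the end.
rows = [(i, i % 2) for i in range(20)] + [(0, 0), (1, 1)]
unique = drop_duplicates(rows)
train, holdout = holdout_split(unique, holdout_fraction=0.1)
print(len(unique), len(train), len(holdout))
```

Fixing the seed keeps the split reproducible, which matters when you compare models trained on the same data.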
You can also choose which columns to use.
In the Runtime tab you can see the runtime settings
And the compute configuration
Close the settings and run the experiment. While I went to make myself a cup of tea, the experiment finished. Very fast!
AutoAI used 8 pipelines: 4 for logistic regression and 4 for the LGBM classifier. Each has different enhancements: HPO-1 (first round of hyperparameter optimization), FE (feature engineering), HPO-2 (final round of hyperparameter optimization), and one pipeline without enhancements.
You can choose a pipeline (press a violet or blue circle) and see the pipeline details.
An interesting tab is Feature Importance. It shows which features matter most in a model. You can also see that AutoAI creates its own features, for example a new feature made by summing two columns (Glucose and Resistin in my example).
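The summed feature is easy to picture in code. Below is a minimal sketch of that kind of derived column; the helper name `add_sum_feature`, the name of the new column, and the numbers are assumptions for illustration, not the code AutoAI generates.

```python
def add_sum_feature(rows, first, second):
    """Return copies of the rows with an extra column holding first + second."""
    name = f"{first}+{second}"
    return [{**row, name: row[first] + row[second]} for row in rows]


# Toy rows; the values are invented for illustration only.
rows = [
    {"Glucose": 70.0, "Resistin": 8.0},
    {"Glucose": 92.0, "Resistin": 4.5},
]
engineered = add_sum_feature(rows, "Glucose", "Resistin")
print(engineered[0])
```

The original rows are left untouched, so the model can be trained with or without the engineered column and the two results compared.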
The Pipeline comparison tab shows the metrics of all pipelines side by side, so you can choose the model you need based on the metric you need.
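Choosing a pipeline by metric boils down to a simple lookup. The sketch below shows the idea with placeholder names and placeholder scores; they are not the results of my experiment, and `best_pipeline` is a hypothetical helper.

```python
def best_pipeline(results, metric):
    """Return the name of the pipeline with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])


# Placeholder scores for illustration only, NOT results from the experiment.
results = {
    "Pipeline A": {"accuracy": 0.70, "roc_auc": 0.78},
    "Pipeline B": {"accuracy": 0.74, "roc_auc": 0.75},
}
print(best_pipeline(results, "accuracy"))
print(best_pipeline(results, "roc_auc"))
```

Note that the winner can change with the metric, which is why it is worth deciding what you optimize for before you pick a model.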
Now you can save the experiment code (without writing a single line of code!). Press the Save as button.
Save your model. As you can see, you can also save it as a notebook.
You will see this pop-up. Just press the link.
Now you can start deploying your AI-based application. Press the Promote to deployment space button.
As you can see, you don't have a space to promote to yet. Deployment spaces allow you to create deployments for machine learning models and functions and to view and manage all of the activity and assets for the deployments, including data connections, data refinery flows, and connected data assets.
Click on the menu in the upper left corner
In this menu, open View all spaces in a new tab.
Now press the New deployment space button.
Select your existing ML service.
Name your space and create it.
Go back to the previous tab, where you were promoting your model to a space. In the target space menu, choose the space you just created. Click the Promote button.
You will see a new pop-up alert. Just click the link in this alert.
Click Deploy model (the little rocket next to the last modified date).
Choose Online and press the Create button.
Open the Deployments tab and choose the deployment you just created.
You will see this page.
You can also test your model. Just open the Test tab and enter your data.
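If you would rather send test data from code than type it into the form, the request body can be prepared as JSON. The sketch below only builds the payload and does not call any service. The `input_data` / `fields` / `values` layout is my understanding of what Watson Machine Learning online deployments expect, so treat it as an assumption and check the API reference of your deployment. The field names are assumed to match the header of the CSV file, and the numbers are placeholders rather than a real patient record.

```python
import json

# Assumed to match the CSV header of the data set (without the target column).
FIELDS = [
    "Age", "BMI", "Glucose", "Insulin", "HOMA",
    "Leptin", "Adiponectin", "Resistin", "MCP.1",
]


def build_scoring_payload(fields, rows):
    """Build a scoring request body; every row needs one value per field."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(
                f"expected {len(fields)} values per row, got {len(row)}"
            )
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


# Placeholder values for illustration only.
sample_row = [50, 24.0, 90, 5.0, 1.1, 10.0, 9.0, 8.0, 400.0]
payload = build_scoring_payload(FIELDS, [sample_row])
print(json.dumps(payload, indent=2))
```

The length check catches the most common mistake, a row with a missing or extra value, before the request is sent.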
You can also download a notebook. Go back to the page of your AutoAI experiment and save your model (as we did above), but save it as a notebook. You will get this.
You can see my notebook on my git.
You can also find a web application that I created based on my data set. All you need to do is copy the files from my git account to your local computer. The link is here.
Conclusion
The downside of IBM AutoAI is the difficult navigation in the IBM Cloud ecosystem. I spent a lot of time just finding where the AutoAI experiment is and how to start it. I have shown you the easiest way I found to create an AutoAI experiment. I think IBM needs to improve navigation in IBM Cloud. The division between IBM Cloud and IBM Watson Studio is also not clear to me.
The upside of IBM AutoAI: I didn't write a single line of code! It is incredible, and it helps me focus on more important aspects of data science: gathering data, cleaning data, analyzing data, and working with the results of AI experiments.
Information about data set:
[Patricio, 2018] Patrício, M., Pereira, J., Crisóstomo, J., Matafome, P., Gomes, M., Seiça, R., & Caramelo, F. (2018). Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer, 18(1).