Run and compare multiple models
This example is all about running code in AskAnna. It shows how you can train and evaluate multiple models. We based this example on the article Quickly Compare Multiple Models, written by data scientist Cole Brendel.
As mentioned in the article, the main idea behind comparing models is finding the model architecture that best fits your data. In this demo you will train multiple models and evaluate their performance using AskAnna. Additionally, we demonstrate how you can track what you do. It's like version control for data science.
With this example you will learn to:
- Quickly compare multiple models
- Create a project in AskAnna using a project template
- Configure tracking metrics and variables
- Run both Python scripts and Jupyter Notebooks
- Use the web interface to review the run
If you want to run the models yourself, you need:
- an AskAnna account (you can sign up for free)
- Python > 3.7
- the AskAnna CLI installed, and to be logged in (see the commands below)
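As a reference, setting this up usually comes down to two commands, assuming you install the askanna package with pip:
pip install askanna
askanna login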
First, we will give a quick tour of how to run this example project. Next, we will explain step by step what we did.
Quick tour
In your terminal, run the following command to start a new project using this demo project template:
askanna create --template https://gitlab.com/askanna/demo/demo-multiple-models.git --push
You only have to add a project name. We named it Quickly compare multiple models. The description is optional. If you confirm, the AskAnna CLI will set up a new project.
Now you can run the job to compare multiple models:
askanna run compare-multiple-models
Go to https://beta.askanna.eu and select the project you just created. Or open the project URL that the AskAnna CLI showed when you created the project.
On the project page, open the tab JOBS and:
- click on the job compare-multiple-models
- click on the run
- on the run page, click on the tab RESULT
If the run has finished, you see an image with a summary of the models in the RESULT tab.
The above is a really short version of how you can run jobs in AskAnna to compare multiple models. Next, we will dive deeper into the different steps to share more background about what just happened.
What happened in the quick tour?
Create the "Quickly compare multiple models" project
In your terminal, you run the command:
askanna create --template https://gitlab.com/askanna/demo/demo-multiple-models.git --push
After you confirmed that you want to create the project, this is what happened:
- a new project in AskAnna was created
- a new local directory for the project was created based on the project name
- the project files were copied from the project template in the local directory
- the askanna.yml config was updated with the newly created project URL
- a version of the project files was pushed to AskAnna
For this example, we used one of our demo project templates to quickly set up the project. In the create command, we referred to the template's location via the argument --template.
With the argument --push we gave the instruction to push a version of the project to AskAnna right after creating it. If you first want to modify the project locally before pushing, then remove --push from the above command.
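If you leave out --push, you can push a version of your project later on. Assuming you use the standard AskAnna CLI, this comes down to running the following command from inside the project directory:
askanna push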
Read more about creating projects and project templates
Run the jobs
With this example project you can run two jobs:
- compare-multiple-models (used in the quick tour)
- run-jupyter-notebook
You can use the AskAnna CLI or the web interface to run the jobs. In the quick tour we used the CLI to run the job. In your terminal, you can run the next commands to run both jobs:
askanna run compare-multiple-models
askanna run run-jupyter-notebook
In the web interface, on the project page, you can click on the tab JOBS. Then:
- click on a job to open the job page
- on the job page, scroll down to the section Running the job
- start a run with a simple click on the button Run this job
Review the run
On the job page, you find a tab RUNS listing the job runs. When you click on a run, the run page opens. Here you find a summary of the run and several tabs with more metadata.
Result
If you click the RESULT tab, you should see an image containing the performance benchmarks of the different models.
Artifact
On the tab ARTIFACT, you find the content of the output directory. If you run the run-jupyter-notebook job, you can find two images and a Jupyter Notebook here.
Metrics & Variables
The tab METRICS shows the metrics that were tracked, and the tab VARIABLES shows the tracked variables.
About the project code
In the code directory, you see four files and one directory. The directory output will be used to save the output of the script. The project files are:
demo-multiple-models.py
A Python script that uses the breast cancer dataset from scikit-learn. The script runs multiple models and compares their performance and execution time. For more background information about what happens in the script, please read the article "Quickly Compare Multiple Models".
To track variables and metrics, we added a few lines to the script. First, we imported the tracking functions for AskAnna on line 8:
from askanna import track_metrics, track_metric, track_variable
The script sets some specific scorings to calculate. We track these settings as variables of the run. Imagine you try different scorings: by tracking this variable, you can always filter on runs that used a specific scoring. For the full configuration, see lines 37 to 41 in the script. A code snippet for the scoring variable:
scoring = [
    "accuracy",
    "precision_weighted",
    "recall_weighted",
    "f1_weighted",
    "roc_auc"
]
track_variable(name="scoring", value=scoring)
For each model, the script calculates a classification report. We track this report as a metric. To trace back which model was used, we also add a label with the model name to the tracked metric:
model_report = classification_report(y_test, y_pred,
    target_names=target_names,
    output_dict=True
)
track_metrics(model_report, label={"model": name})
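Besides track_metrics for a whole dictionary, the imported track_metric function can be used to track a single value. A minimal, hypothetical sketch (not part of the demo script), assuming track_metric accepts a name, a value and an optional label like the calls above:
from askanna import track_metric

# Hypothetical example (assumed signature): track one headline score per
# model, with a label so you can filter runs by model later
track_metric(name="accuracy", value=0.97, label={"model": "RandomForestClassifier"})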
At the end of the script (lines 93 and 99), we save the generated plots in the output directory:
plt.savefig('output/benchmark_models_performance.png')
As you can see, we added just a few extra lines of code to the example described in the article. These extra lines make it possible to track information about the run. Of course, you don't have to do this, but it can be helpful to trace back what happened in a run, or to compare the settings used per run when you experiment with them.
demo-multiple-models.ipynb
Functionally it does exactly the same as demo-multiple-models.py, but in a Jupyter Notebook. In AskAnna you can view Jupyter Notebooks, and also run them. Run the job run-jupyter-notebook to see an example.
askanna.yml
This is the configuration file for AskAnna. Here we define the jobs. In the job definition, we describe what to run and which output to save. With the push-target, we refer to the project in AskAnna, so you can push code and start jobs via your terminal.
In the askanna.yml we specified two jobs:
- compare-multiple-models: to run the Python code that runs and compares multiple models
- run-jupyter-notebook: to run the Jupyter Notebook that also runs and compares multiple models, but in a Notebook
The config of these jobs is similar. Let's use compare-multiple-models as an example:
compare-multiple-models:
  job:
    - pip install -r requirements.txt
    - python demo-multiple-models.py
  output:
    result: output/benchmark_models_performance.png
    artifact:
      - output/
The job section contains the commands you want to run. To run the code, we only need to install the Python requirements and run the Python code. In the output section, we specify the result of the run and the artifact we want to save.
The Python script writes all relevant data to the output directory. We save this output directory as the artifact of the run. For this job, we consider the plot that benchmarks the models' performance to be the main result. So, we save this image as the result.
Read more about creating jobs
requirements.txt
This file lists the Python packages that are needed to run the project. If you install the packages in a virtual environment, you can also run the script locally yourself (see the example below).
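For example, a local run in a fresh virtual environment could look like this (a sketch assuming a Unix-like shell):
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python demo-multiple-models.py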
Summary
With this example, you have seen how you can use a project template to set up a project in AskAnna. In this project, two jobs are configured. These jobs read data, run multiple models and compare the performance of these models.
The output images of the Python script were saved, and you have seen how you can review the result in AskAnna. Also, you have seen how to track variables & metrics, and save the output of the run.
We hope this demo gives you an idea of the power of AskAnna and how the platform can help you and your team collaborate on data science projects. If you have questions or need help starting your first project, don't hesitate to contact us. We love to support you!