Train, select and serve a model

The title summarizes what this example is all about. We will show how you can:

train multiple models
get run metrics to select the best model
use the best model to serve a prediction
work with input data for jobs

In this project, we train an AI model that you can use to predict the quality of a wine. Besides training the model, we also add a job to show how to serve a model using AskAnna. In this example, serving the model means you can predict the wine quality of a new wine using the trained model.

For training, we use an Elastic Net model. The dataset used is Wine Quality. The source code can be found on GitLab. Inspiration for this example came from MLflow and UbiOps.

If you want to run the models yourself, you need:

an AskAnna account (you can sign up for free)
Python > 3.6
the AskAnna CLI installed and logged in

First, we will give you a quick tour of running this example. After the quick tour, we explain what happened and share more details about this project.

Quick tour

In your terminal, run the following to create a new project using the Demo train, select and serve model template:

askanna create --template https://gitlab.com/askanna/demo/demo-train-select-serve-model.git --push

You only have to specify the project name. We named it Train, select and serve. If you confirm, the AskAnna CLI will set up a new project.

Now you can run the job that starts different train runs and selects the best model:

askanna run --data-file input/dummy-select-model.json select-best-model

To check the status of the run, go to https://beta.askanna.eu and select the project you just created. In this project, open the JOBS tab and click on the job select-best-model. Next, click on the run and check the status.

If the job select-best-model is finished, you can also run the serve job. In your terminal, enter the command:

askanna run --data-file input/dummy-serve-single.json serve-model

Go back to the project in the browser, and open the run of the job serve-model. When the run has finished, you can find the prediction for the quality of the wine in the RESULT tab.

The above is an abridged version of how you can train, select and serve models in AskAnna. Next, we dive deeper into the different steps and share more background about what just happened.

What happened in the quick tour?

Create new project

In your terminal, you ran the AskAnna command to create a new project using one of our demo projects:

askanna create --template https://gitlab.com/askanna/demo/demo-train-select-serve-model.git --push

After you confirmed that you want to create the project, the following happened:

a new project in AskAnna was created
a new local directory for the project was created based on the project name
the project files were copied from the project template to the local directory
a version of the project files was pushed to AskAnna

Read more about creating projects and project templates

Run jobs

In this project, you can run three different jobs. Below, we describe how to run each job in AskAnna. The section About the project code explains the code and the askanna.yml configuration.

Select best model

In the quick tour, we showed how you run the job using the command line:

askanna run --data-file input/dummy-select-model.json select-best-model

Running this command will start a run with the JSON file as the data payload. If you want to try other parameters, refer to another JSON file using the same parameter structure.

You can also start the run using the web interface. After you have created the project and pushed the code:

go to the project page
open the JOBS tab
click on the job select-best-model
scroll down and start a new run

You can add an optional JSON payload, for example:

{
    "parameters": [
        {"alpha": 0.3, "l1_ratio": 0.1},
        {"alpha": 0.2, "l1_ratio": 0.7},
        {"alpha": 0.7, "l1_ratio": 0.4}
    ]
}

When you start a new run, AskAnna will spin up an environment that also starts the configured train jobs. It can take a couple of minutes before the run finishes. When all jobs are finished, you can see which model is selected by checking the metrics. See also the section Select best model run under Review runs.
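To try other parameter combinations, a payload file with the same structure can be generated with a few lines of Python. The following is a minimal sketch, not part of the project code. The function name build_payload and the file name my-select-model.json are hypothetical.

```python
import itertools
import json


def build_payload(alphas, l1_ratios):
    """Return a payload dict with one entry per (alpha, l1_ratio) combination."""
    return {
        "parameters": [
            {"alpha": alpha, "l1_ratio": l1_ratio}
            for alpha, l1_ratio in itertools.product(alphas, l1_ratios)
        ]
    }


payload = build_payload(alphas=[0.3, 0.7], l1_ratios=[0.1, 0.4])

# Hypothetical file name; any JSON file with this structure should work.
with open("my-select-model.json", "w") as f:
    json.dump(payload, f, indent=4)
```

The resulting file can then be passed to the job with the --data-file option shown earlier.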

Train model

If you want to train a model with parameters alpha and l1_ratio, you can use the following command as the base and change the parameter values:

askanna run --data '{"alpha": 0.7, "l1_ratio": 0.4}' train-model

You can also use the JSON data to start the job via the web interface:

{
    "alpha": 0.7,
    "l1_ratio": 0.4
}

Serve model

When at least one select-best-model run is finished, you can serve a result using the AskAnna CLI:

askanna run --data-file input/dummy-serve-single.json serve-model

Besides the option to start the run via the web interface, on the job page you can also find information on how to start this job using the AskAnna API. The API makes it easy to integrate serving a model with an existing application that wants to consume the result of a job.

Review runs

Train run

You can find the trained model on the RESULT tab in the web interface for each train run. Here you can also download the trained model. You can also review the metrics to see the performance of the model, or the variables to check the parameters used.

Select best model run

On the run page of this job, you can find information about the selected best model in the metrics.

If you want to see the train jobs used to select the best model, you can find this information in the VARIABLES tab.

Serve model run

The most important information for the job serve-model is the result, which contains a quality prediction for each wine. You can review these predictions on the RESULT tab. Here you can also download the CSV file.

If you want to check the input used to make the prediction, you can find the payload on the INPUT tab. To make results reproducible, you also need to know which model was used. The SUUID of the run containing the trained model is tracked as a variable of the run.

About the project code

In the project code, you see two files and three directories. If you did not create the project, you can also view the source code on GitLab. Let's go through the code directory by directory, file by file.

data

This directory contains the Wine Quality dataset from P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. We use the data to train the model.

input

The input directory has some dummy files that can be used to run the jobs select-best-model and serve-model. More about these jobs in the description of the file askanna.yml.

src

This directory contains the Python functions used to train, select and serve the model.

train.py

This script trains the model that predicts the quality of wine. For the model, we use ElasticNet from scikit-learn. The model is trained on the wine-quality.csv file from the data directory.

You can run train.py locally from the command line:

python src/train.py ${alpha} ${l1_ratio}

The arguments alpha and l1_ratio are optional. If you don't provide these arguments, the script uses the default value 0.5 for both.
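Optional positional arguments with a fallback value can be handled in several ways. The sketch below shows one of them, assuming the arguments arrive as plain strings on the command line. The function parse_params is hypothetical and is not taken from train.py.

```python
def parse_params(argv, default=0.5):
    """Read alpha and l1_ratio from positional arguments, falling back to a default."""
    alpha = float(argv[1]) if len(argv) > 1 else default
    l1_ratio = float(argv[2]) if len(argv) > 2 else default
    return alpha, l1_ratio


# argv[0] is the script name, as in sys.argv
print(parse_params(["train.py"]))
print(parse_params(["train.py", "0.7", "0.4"]))
```

In a real script you would pass sys.argv to this function.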

The script's output is a model.pkl file containing the trained model to predict the wine quality.

The train.py script also tracks some additional meta-information:

Variables: model, alpha, l1_ratio
Metrics: rmse, r2, mae
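For readers less familiar with these metrics, the sketch below computes rmse, r2 and mae from their standard definitions using only the Python standard library. It is an illustration, not the code from train.py, which may well rely on library functions instead. The function name regression_metrics and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RMSE, R2 and MAE for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    # ss_tot is zero when all actual values are equal; R2 is undefined then.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"rmse": rmse, "r2": r2, "mae": mae}


# Made-up quality scores, for illustration only
metrics = regression_metrics(actual=[5, 6, 7, 5], predicted=[5, 5, 7, 6])
print(metrics)
```

A lower rmse and mae and a higher r2 indicate a better fit.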

Track metrics in a local run

You also track metrics when you run the script locally. In the standard output you can find the location of the JSON files containing the tracked data. Read more about tracking metrics.

select-best-model.py

This Python script starts multiple train jobs in AskAnna. Because of the relation with the train jobs, you can only run this Python script via the command line if you are connected to the AskAnna platform.

The script checks if a payload is available. If not, it will use the dummy payload dummy-select-model.json from the input directory. For loading the payload data we use:

input_path = os.getenv("AA_PAYLOAD_PATH", "input/dummy-select-model.json")
with open(input_path, "r") as f:
    input_data = json.load(f)

parameters = input_data.get("parameters", None)

The first line checks if an environment variable AA_PAYLOAD_PATH is available. When you run a job with a payload, AskAnna creates this variable. If there is no payload, we use the default dummy dataset. The following lines load the JSON data from the file and extract the parameters from it.

Next, the script starts multiple train jobs using the parameters alpha and l1_ratio from the input data. These train jobs run in parallel. All SUUIDs of the train runs are stored in a list.

Starting a run via a Python script is done using:

askanna.run.start(job_name="train-model", data=param)

When the train jobs are finished, the script will get the metadata, including metrics and variables from these train runs. The script finds the run with the lowest rmse and selects that run as the best model.

To get the metadata of multiple runs in Python, use:

runs = askanna.run.list(
    run_suuid_list=train_runs,
    include_metrics=True,
    include_variables=True
)

The above code returns a list of runs. Per run, you can get the metric value for the RMSE via:

for run in runs:
    run_rmse = run.metrics.get("rmse").metric.value

After selecting the best model, the script tracks the SUUID and metrics of this train run.
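The selection step itself comes down to picking the run with the lowest RMSE. The sketch below illustrates that logic with stand-in data so it can run without a connection to AskAnna. The TrainRun class, the SUUID strings and the RMSE values are hypothetical and do not reflect the objects returned by the AskAnna SDK.

```python
from dataclasses import dataclass


@dataclass
class TrainRun:
    """Hypothetical stand-in for the metadata of a train run."""

    suuid: str
    rmse: float


def select_best(runs):
    """Return the run with the lowest RMSE."""
    return min(runs, key=lambda run: run.rmse)


# Made-up runs, for illustration only
runs = [
    TrainRun(suuid="run-a", rmse=0.82),
    TrainRun(suuid="run-b", rmse=0.74),
    TrainRun(suuid="run-c", rmse=0.79),
]
best = select_best(runs)
print(f"Best run: {best.suuid} (rmse={best.rmse})")
```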

serve.py

The Python script serve.py does the following. First, from the most recent finished run of the job select-best-model, it gets the SUUID of the run that is selected as the best model:

runs = askanna.run.list(
    job_name="select-best-model",
    number_of_results=10,
    order_by="-created_at",
    include_metrics=True
)

for run in runs:
    if run.status == "finished":
        train_run_suuid = run.metrics.get("run_suuid").metric.value
        askanna.track_variable(name="model_suuid", value=train_run_suuid)
        print(f"Selected train run: {train_run_suuid}")
        break

Then it loads the result object of the selected train run using pickle:

model_data = askanna.result.get(train_run_suuid)
model = pickle.loads(model_data)

If a payload is available, the script loads the input data. If not, it will load the dummy payload dummy-serve-single.json from the input directory:

input_path = os.getenv("AA_PAYLOAD_PATH", "input/dummy-serve-single.json")
input_data = pd.read_json(input_path)

And it predicts the wine quality using the loaded model and data:

prediction = model.predict(input_data)

Then it saves the prediction to a CSV file:

pd.DataFrame(prediction).to_csv("prediction.csv", header=["MPG"], index_label="index")

This script also tracks:

Identifier of the used trained model
Count for input rows
Count for output rows

requirements.txt

This file lists the Python packages needed to run the project. If you install the packages in a virtual environment, you can also run the scripts locally.

askanna.yml

In the file askanna.yml we define the jobs to train, select and serve models.

job: train-model

The job train-model is defined as follows:

train-model:
  job:
    - pip install -r requirements.txt
    - python src/train.py ${alpha} ${l1_ratio}
  output:
    result: model.pkl

The job installs the packages from requirements.txt and then runs the script train.py. Input for this job is optional. When running the job, you can add a JSON payload with the alpha and l1_ratio parameters:

{
  "alpha": 0.7,
  "l1_ratio": 0.4
}

If you provide the above JSON payload, AskAnna will make alpha and l1_ratio available as payload variables.
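To picture how payload variables end up in the command from the job definition, the sketch below mimics the substitution with Python's string.Template. It only illustrates the idea. How AskAnna performs the substitution internally is not described here and may differ.

```python
from string import Template

# The command line from the train-model job definition
command = Template("python src/train.py ${alpha} ${l1_ratio}")

# The JSON payload, loaded as a dict
payload = {"alpha": 0.7, "l1_ratio": 0.4}

rendered = command.substitute(payload)
print(rendered)  # python src/train.py 0.7 0.4
```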

The created model.pkl file is saved as the run's result.

job: select-best-model

This job runs the script that starts multiple train jobs, compares the metrics and selects the best model. The job tracks the best model as a metric. This metric is used by the serve.py script.

The job definition to run the Python script is:

select-best-model:
  job:
    - python src/select-best-model.py

job: serve-model

The job serve-model first installs the packages from requirements.txt and then runs the script serve.py. It saves the prediction.csv file created by the script as the run's result. The definition for this job:

serve-model:
  job:
    - pip install -r requirements.txt
    - python src/serve.py
  output:
    result: prediction.csv

If you want to predict the wine quality, you can provide this job with a JSON payload using the template below. This template contains input for two wines, but you can include any number of wines in a single payload.

{
    "fixed acidity": {
        "0": 7.0,
        "1": 6.3
    },
    "volatile acidity": {
        "0": 0.27,
        "1": 0.3
    },
    "citric acid": {
        "0": 0.36,
        "1": 0.34
    },
    "residual sugar": {
        "0": 20.7,
        "1": 1.6
    },
    "chlorides": {
        "0": 0.045,
        "1": 0.049
    },
    "free sulfur dioxide": {
        "0": 45.0,
        "1": 14.0
    },
    "total sulfur dioxide": {
        "0": 170.0,
        "1": 132.0
    },
    "density": {
        "0": 1.001,
        "1": 0.994
    },
    "pH": {
        "0": 3.0,
        "1": 3.3
    },
    "sulphates": {
        "0": 0.45,
        "1": 0.49
    },
    "alcohol": {
        "0": 8.8,
        "1": 9.5
    }
}

You can also find an example file dummy-serve-multiple.json in the input directory. This dataset contains 347 different wines.
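The template above is column-oriented: each feature maps row indices to values. If your data is organized per wine instead, a small helper can convert it. The following is a minimal sketch using only the standard library. The function rows_to_payload is hypothetical, and the two sample wines are truncated to two features for brevity, while a real payload needs all eleven features from the template.

```python
import json


def rows_to_payload(rows):
    """Convert a list of per-wine dicts into the column-oriented payload format."""
    columns = {}
    for index, row in enumerate(rows):
        for feature, value in row.items():
            # Row indices are strings in the JSON template
            columns.setdefault(feature, {})[str(index)] = value
    return columns


# Truncated example rows; a real payload needs every feature
wines = [
    {"fixed acidity": 7.0, "alcohol": 8.8},
    {"fixed acidity": 6.3, "alcohol": 9.5},
]
print(json.dumps(rows_to_payload(wines), indent=4))
```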

Summary

With this example, you have seen how you can use a project template to set up a project in AskAnna. In this project, three jobs are configured. With these jobs, you can:

Train a model with different parameters
Select the best model by comparing a metric from several train runs
Serve a prediction using the best model
Optionally use JSON data as input for a run

You have seen how you can review each job, get the results of a trained model and get predictions of the wine quality. You have also seen how to track metrics and variables.

We hope this demo gives you an idea of the power of AskAnna and how the platform can help you and your team collaborate on data science projects. If you have questions or need help starting your first project, don't hesitate to contact us. We love to support you!