Streamline Production ML With BentoML And Kubeflow

Mar 24, 2023 • Written By Eric Liu

Introduction

The BentoML team is thrilled to announce the integration between BentoML and Kubeflow in the latest Kubeflow 1.7 release. This marks the first step toward a streamlined machine learning solution at scale.

"As a longtime user of Kubeflow and a very satisfied user of BentoML, this integration makes it more exciting for us to upgrade to Kubeflow 1.7! I believe our Data Scientist colleagues would appreciate being able to build, package, and deploy models easily with minimum hassle!"

- Benjamin Tan, Machine Learning Engineer, DKatalis

Kubeflow has emerged as a comprehensive and adaptable ML platform for Kubernetes, with mature components to address the critical challenges in developing and training models. BentoML allows developers to build AI applications once and deploy them on various platforms without needing to modify any code. This unique ability has made it a preferred tool in the industry.  

Benefits of the Integration

With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the most suitable hardware for their respective tasks and scaling independently.

How the Integration Works

To showcase the integration's capabilities, this tutorial walks you through building a fraud detection service with the Kaggle IEEE-CIS Fraud Detection dataset. The tutorial covers everything from training the models in Kubeflow notebooks to packaging and deploying the resulting BentoML service to a Kubernetes cluster.

This example can also be run from the notebook.ipynb included in this directory.

What you'll need

This guide assumes that Kubeflow is already installed in the Kubernetes cluster. See Kubeflow Manifests for installation instructions.

Install BentoML cloud-native components and custom resource definitions.

kustomize build bentoml-yatai-stack/default | kubectl apply -n kubeflow --server-side -f -
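
If the install succeeded, the new custom resource definitions should now be registered in the cluster. A quick sanity check (the exact CRD names may differ between versions, but they live under the yatai.ai API groups):

kubectl get crds | grep yatai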

Install the required packages to run this example.

git clone --depth 1 https://github.com/bentoml/BentoML
cd BentoML/examples/kubeflow
pip install -r requirements.txt

Download Kaggle Dataset

Set Kaggle user credentials for API access. Accepting the rules of the competition is required for downloading the dataset.

export KAGGLE_USERNAME=
export KAGGLE_KEY=

Download Kaggle dataset.

kaggle competitions download -c ieee-fraud-detection
rm -rf ./data/
unzip -d ./data/ ieee-fraud-detection.zip && rm ieee-fraud-detection.zip

Train Models

In this demonstration, we'll train three fraud detection models using the Kaggle IEEE-CIS Fraud Detection dataset. We'll split the dataset into three equal-sized chunks and use each chunk to train a separate model. While this approach has no practical benefit, it illustrates how to save and serve multiple models with Kubeflow and BentoML.

import pandas as pd

df_transactions = pd.read_csv("./data/train_transaction.csv")

X = df_transactions.drop(columns=["isFraud"])
y = df_transactions.isFraud

Define the preprocessor.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

numeric_features = df_transactions.select_dtypes(include="float64").columns
categorical_features = df_transactions.select_dtypes(include="object").columns

preprocessor = ColumnTransformer(
    transformers=[
        ("num", SimpleImputer(strategy="median"), numeric_features),
        (
            "cat",
            OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
            categorical_features,
        ),
    ],
    verbose_feature_names_out=False,
    remainder="passthrough",
)

X = preprocessor.fit_transform(X)

Define our training function with the number of boosting rounds and maximum depths.

import xgboost as xgb


def train(n_estimators, max_depth):
    return xgb.XGBClassifier(
        tree_method="hist",
        n_estimators=n_estimators,
        max_depth=max_depth,
        eval_metric="aucpr",
        objective="binary:logistic",
        enable_categorical=True,
    ).fit(X_train, y_train, eval_set=[(X_test, y_test)])

We will divide the training data into three equal-sized chunks and treat them as independent data sets. Based on these data sets, we will train three separate fraud detection models. Each trained model will be saved to the local model store using the BentoML model-saving API.

import bentoml
from sklearn.model_selection import train_test_split

CHUNKS = 3
CHUNK_SIZE = len(X) // CHUNKS

for i in range(CHUNKS):
    START = i * CHUNK_SIZE
    END = (i + 1) * CHUNK_SIZE
    X_train, X_test, y_train, y_test = train_test_split(X[START:END], y[START:END])

    name = f"ieee-fraud-detection-{i}"
    model = train(10, 5)
    score = model.score(X_test, y_test)
    print(f"Successfully trained model {name} with score {score}.")

    bentoml.xgboost.save_model(
        name,
        model,
        signatures={
            "predict_proba": {"batchable": True},
        },
        custom_objects={"preprocessor": preprocessor},
    )
    print(f"Successfully saved model {name} to the local model store.")

Saved models can be loaded back into memory and debugged in the notebook.

import bentoml
import pandas as pd
import numpy as np

model_ref = bentoml.xgboost.get("ieee-fraud-detection-0:latest")
model_runner = model_ref.to_runner()
model_runner.init_local()
model_preprocessor = model_ref.custom_objects["preprocessor"]

test_transactions = pd.read_csv("./data/test_transaction.csv")[0:500]
test_transactions = model_preprocessor.transform(test_transactions)
result = model_runner.predict_proba.run(test_transactions)
np.argmax(result, axis=1)

Define Service API

After the models are built and scored, let's create the service definition. You can find the service definition in the service.py module in this example. Let's break down the service.py module and explain what each section does.
First, we will create a list of preprocessors and runners from the three models we saved earlier. Runners are abstractions of the model inferences that can be scaled independently. See Using Runners for more details.

fraud_detection_preprocessors = []
fraud_detection_runners = []

for model_name in [
    "ieee-fraud-detection-0",
    "ieee-fraud-detection-1",
    "ieee-fraud-detection-2",
]:
    model_ref = bentoml.xgboost.get(model_name)
    fraud_detection_preprocessors.append(model_ref.custom_objects["preprocessor"])
    fraud_detection_runners.append(model_ref.to_runner())

Next, we will create a service with the list of runners passed in.

svc = bentoml.Service("fraud_detection", runners=fraud_detection_runners)

Finally, we will create the API function is_fraud. We'll use the @api decorator to declare that the function is an API and specify the input and output types as pandas.DataFrame and JSON, respectively. The function is defined as async so that the inference calls to the runners can happen simultaneously without waiting for the results to return before calling the next runner. The inner function _is_fraud defines the model inference logic for each runner. All runners are called simultaneously through the asyncio.gather function, and the results are aggregated into a list. The function will return True if any of the models return True.

@svc.api(input=PandasDataFrame.from_sample(sample_input), output=JSON())
async def is_fraud(input_df: pd.DataFrame):
    input_df = input_df.astype(sample_input.dtypes)

    async def _is_fraud(preprocessor, runner, input_df):
        input_features = preprocessor.transform(input_df)
        results = await runner.predict_proba.async_run(input_features)
        predictions = np.argmax(results, axis=1)  # 0 is not fraud, 1 is fraud
        return bool(predictions[0])

    # Simultaneously run all models
    results = await asyncio.gather(
        *[
            _is_fraud(p, r, input_df)
            for p, r in zip(fraud_detection_preprocessors, fraud_detection_runners)
        ]
    )

    # Return fraud if at least one model returns True
    return any(results)

For more about service definitions, please see Service and APIs.

Build Service

Building the service and models into a bento allows it to be distributed among collaborators, containerized into an OCI image, and deployed in the Kubernetes cluster. To build a service into a bento, we first need to define the bentofile.yaml file. See Building Bentos for more options.

service: "service:svc" include: - "service.py" - "sample.py" python: requirements_txt: ./requirements.txt

Running the following command will build the service into a bento and store it to the local bento store.

bentoml build
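
To confirm the build, you can list the bentos in the local store with the bentoml CLI:

bentoml list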

Serve Bento

Serving the bento will bring up a service endpoint in HTTP or gRPC for the service API we defined. Use --help to see more serving options.

bentoml serve
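
With the server running (BentoML serves HTTP on port 3000 by default), you can exercise the endpoint with curl. The payload file here is a hypothetical JSON-serialized transaction record matching the sample input schema; in practice you would generate it from the Kaggle data or sample.py:

curl -X POST -H "Content-Type: application/json" -d @sample_transaction.json http://0.0.0.0:3000/is_fraud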

Deploy to Kubernetes Cluster

BentoML offers three custom resource definitions (CRDs) in the Kubernetes cluster.

BentoRequest - Describes the metadata needed for building the container image of the Bento, such as the download URL. Created by the user.

Bento - Describes the metadata for the Bento, such as the address of the image and the runners. Created by users or by the yatai-image-builder operator when reconciling BentoRequest resources.

BentoDeployment - Describes the metadata of the deployment such as resources and autoscaling behaviors. Reconciled by the yatai-deployment operator to create Kubernetes deployments of API Servers and Runners.
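
To make these concrete, here is a minimal sketch of a BentoDeployment resource. Field names follow the yatai-deployment v2alpha1 CRD; treat this as illustrative and refer to the deployment YAML files included in this example for the authoritative schema:

apiVersion: serving.yatai.ai/v2alpha1
kind: BentoDeployment
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  bento: fraud-detection  # the Bento resource to deploy
  autoscaling:
    minReplicas: 1
    maxReplicas: 2
  runners:
    - name: ieee-fraud-detection-0  # each runner scales independently
      autoscaling:
        minReplicas: 1
        maxReplicas: 2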

Next, we will demonstrate two ways of deployment.

• Deploying using a BentoRequest resource by providing a Bento

• Deploying using a Bento resource by providing a pre-built container image from a Bento

Deploy with BentoRequest CRD

In this workflow, we will export the Bento to remote storage. We will then leverage the yatai-image-builder operator to containerize the Bento and the yatai-deployment operator to deploy the containerized Bento image.

Push the Bento built and saved in the local Bento store to cloud storage such as AWS S3.

bentoml export fraud_detection:o5smnagbncigycvj s3://your_bucket/fraud_detection.bento

Apply the BentoRequest and BentoDeployment resources as defined in deployment_from_bentorequest.yaml included in this example.

kubectl apply -f deployment_from_bentorequest.yaml
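
For reference, the BentoRequest portion of that file looks roughly like the following sketch (the bucket URL is a placeholder; field names follow the yatai-image-builder v1alpha1 CRD and may vary by version):

apiVersion: resources.yatai.ai/v1alpha1
kind: BentoRequest
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  bentoTag: fraud_detection:o5smnagbncigycvj
  downloadUrl: s3://your_bucket/fraud_detection.bento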

Once the resources are created, the yatai-image-builder operator will reconcile the BentoRequest resource and spawn a pod to build the container image from the provided Bento defined in the resource. The yatai-image-builder operator will push the built image to the container registry specified during the installation and create a Bento resource with the same name. At the same time, the yatai-deployment operator will reconcile the BentoDeployment resource with the provided name and create Kubernetes deployments of API Servers and Runners from the container image specified in the Bento resource.

Deploy with Bento CRD

In this workflow, we will build and push the container image from the Bento. We will then leverage the yatai-deployment operator to deploy the containerized Bento image.

Containerize the Bento with the containerize subcommand.

bentoml containerize fraud_detection:o5smnagbncigycvj -t your-username/fraud_detection:o5smnagbncigycvj

Push the containerized Bento image to a remote repository of your choice.

docker push your-username/fraud_detection:o5smnagbncigycvj

Apply the Bento and BentoDeployment resources as defined in deployment_from_bento.yaml file included in this example.

kubectl apply -f deployment_from_bento.yaml
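
For reference, the Bento resource in that file looks roughly like the sketch below, pointing at the image pushed in the previous step (field names follow the v1alpha1 CRD and may vary by version):

apiVersion: resources.yatai.ai/v1alpha1
kind: Bento
metadata:
  name: fraud-detection
  namespace: kubeflow
spec:
  tag: fraud_detection:o5smnagbncigycvj
  image: your-username/fraud_detection:o5smnagbncigycvj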

Once the resources are created, the yatai-deployment operator will reconcile the BentoDeployment resource with the provided name and create Kubernetes deployments of API Servers and Runners from the container image specified in the Bento resource.

Verify Deployment

Verify the deployment of the API server and runners. Note that the API server and runners run in separate pods, created as separate deployments that can be scaled independently.

kubectl -n kubeflow get pods -l yatai.ai/bento-deployment=fraud-detection

NAME                                        READY   STATUS    RESTARTS   AGE
fraud-detection-67f84686c4-9zzdz            4/4     Running   0          10s
fraud-detection-runner-0-86dc8b5c57-q4c9f   3/3     Running   0          10s
fraud-detection-runner-1-846bdfcf56-c5g6m   3/3     Running   0          10s
fraud-detection-runner-2-6d48794b7-xws4j    3/3     Running   0          10s

Port forward the Fraud Detection service to test locally. You should be able to visit the Swagger page of the service at http://0.0.0.0:8080 while port forwarding.

kubectl -n kubeflow port-forward svc/fraud-detection 8080:3000 --address 0.0.0.0

Delete the Bento and BentoDeployment resources.

kubectl delete -f deployment.yaml

Let’s Review

The 1.7 release is just the beginning of an exciting collaboration between BentoML and Kubeflow. The integration allows developers to easily deploy BentoML services on Kubernetes for optimized hardware utilization and independent scaling. Future plans include integration with Kubeflow Pipeline for more deployment options. Whether you're new to MLOps or a current user of BentoML or Kubeflow, we invite you to try out the integration and provide feedback for further improvements.

If you enjoyed this article, please show your support by ⭐ our projects on GitHub (BentoML, Kubeflow) and joining both the Kubeflow and the BentoML Slack Community. Searching for a great place to run your ML services? Check out Bento Cloud for the easiest and fastest way to deploy your bento.