Deploy Model to KFServing

Note

If you just want to try Chassis, you can use the test drive, which deploys Chassis and KFServing for you. With it, you can use Chassis to containerize an MLflow model, push it to your Docker Hub account, and then publish it to the KFServing instance running inside the test drive by following the kfserving.ipynb sample notebook:

💻 Launch Test Drive 💻

Install KFServing in minikube

You just need to clone the KFServing repository and run the quick_install.sh script.

# Install KFServing. Versions above 0.5.1 don't work, so pin it to v0.5.1.
git clone --single-branch --branch v0.5.1 git@github.com:kubeflow/kfserving.git
./kfserving/hack/quick_install.sh
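
Once the script finishes, you can sanity-check the installation, for example by confirming that the KFServing controller pods are running (the namespace below is the one used by KFServing 0.5; adjust if your install differs):

kubectl get pods -n kfserving-system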

Required variables

The following environment variables must be defined in the container for KFServing to work. They are set in the InferenceService manifest shown below, or passed with -e when running the container locally:

  • INTERFACE: must be set to kfserving
  • HTTP_PORT: port where the KFServing server will listen
  • PROTOCOL: the KFServing protocol version, either v1 or v2
  • MODEL_NAME: the name under which the model will be served

Deploy the model

Assuming the image generated by ChassisML has been pushed to Docker Hub as carmilso/chassisml-sklearn-demo:latest, deploy the file that defines the InferenceService for KFServing protocol v1 (saved here as custom_v1.yaml):

apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: chassisml-sklearn-demo
spec:
  predictor:
    containers:
    - image: carmilso/chassisml-sklearn-demo:latest
      name: chassisml-sklearn-demo-container
      env:
        - name: INTERFACE
          value: kfserving
        - name: HTTP_PORT
          value: "8080"
        - name: PROTOCOL
          value: v1
        - name: MODEL_NAME
          value: digits
      ports:
        - containerPort: 8080
          protocol: TCP

In this case, the MODEL_NAME variable is not strictly necessary, since it is already defined when the image is created; setting it in the InferenceService simply overrides that value.

kubectl apply -f custom_v1.yaml

This should output a success message.
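
You can optionally check that the InferenceService becomes ready before querying it (the name matches the metadata.name defined above):

kubectl get inferenceservice chassisml-sklearn-demo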

Define required variables to query the pod

These variables are needed to communicate with the deployed service.

The SERVICE_NAME must match the name defined in the metadata.name of the InferenceService created above.

The MODEL_NAME must match the name of your model. It can be defined by the data scientist when making the request against the Chassis service, or overridden in the InferenceService as shown above.

export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
export SERVICE_NAME=chassisml-sklearn-demo
export MODEL_NAME=digits
export SERVICE_HOSTNAME=$(kubectl get inferenceservice ${SERVICE_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
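
Optionally, you can verify that the model is loaded before sending data. The v1 protocol exposes a model status endpoint, so a request along these lines should return the model's ready state:

curl -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}"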

Query the model

Now you can make a request to get predictions for some data. Note that you must download inputsv1.json before making the request.

curl -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict" -d@inputsv1.json | jq

The output should be similar to this:

{
  "predictions": [
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "4",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "4",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    }
  ]
}

In this case the data was prepared for protocol v1, but the image can also be deployed using protocol v2 and queried with data formatted for v2, as sketched below.
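
As a rough sketch (assuming a file named custom_v2.yaml; everything except the PROTOCOL value stays the same as in custom_v1.yaml), the v2 deployment would look like this:

apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: chassisml-sklearn-demo
spec:
  predictor:
    containers:
    - image: carmilso/chassisml-sklearn-demo:latest
      name: chassisml-sklearn-demo-container
      env:
        - name: INTERFACE
          value: kfserving
        - name: HTTP_PORT
          value: "8080"
        # The only change compared to custom_v1.yaml is the protocol version.
        - name: PROTOCOL
          value: v2
        - name: MODEL_NAME
          value: digits
      ports:
        - containerPort: 8080
          protocol: TCP

kubectl apply -f custom_v2.yaml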

Deploy the model locally

The model can also be deployed locally:

docker run --rm -p 8080:8080 \
-e INTERFACE=kfserving \
-e HTTP_PORT=8080 \
-e PROTOCOL=v2 \
-e MODEL_NAME=digits \
carmilso/chassisml-sklearn-demo:latest

Then you can query it as follows. Note that you must download inputsv2.json before making the request:

curl localhost:8080/v2/models/digits/infer -d@inputsv2.json
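
For reference, a v2 protocol request body generally follows the KFServing v2 inference format shown below. This is only an illustration of the format with placeholder values, not the actual contents of inputsv2.json:

{
  "inputs": [
    {
      "name": "input-0",
      "shape": [1, 3],
      "datatype": "FP32",
      "data": [0.0, 0.0, 0.0]
    }
  ]
}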