3.2 Offline Scoring Pipelines

We just saw how to run offline scoring in a notebook. It works, but just like in the real-time inference case, notebooks are not a good choice for long-running processes and production-like scenarios. In the case of offline scoring, it’s best to define a pipeline as a series of tasks and submit that pipeline to a backend system that runs and orchestrates the pipeline tasks on the cluster. Let’s have a look at how this works with RHOAI and Data Science Pipelines.

  • In your workbench, double click on the offline-scoring.pipeline file. You can now see a graphical representation of the offline scoring pipeline within the Elyra Pipeline editor.

Elyra is an extension of JupyterLab, which offers a graphical editor for pipelines that can be run on Data Science Pipelines.
  • If you compare this pipeline with the previous offline scoring notebook, you will find the same steps. In fact, the pipeline includes the very same code since the pipeline tasks represent the underlying Python modules of the notebook steps. While the notebook defines a purely linear sequence of steps, the pipeline allows to parallelize tasks such as ingesting the raw data and downloading the trained model, both of which are independent of each other.

  • Submit the pipeline by selecting Run Pipeline (second button in top toolbar). Click OK.

  • Revisit the RHOAI Dashboard and go to the Data Science Pipelines section on the left side. Here you can track all submitted pipelines including their versions.

alt text
  • To monitor your pipeline, select Runs in the left menu and look at the Triggered tab. Each run represents an execution of a given pipeline. Find the run that you submitted and click on its name.

  • You can now see a graphical representation of your pipeline which updates in real-time as the tasks are completed. You can select each task and monitor its progress. Once the final task is finished, you can verify its completion by finding the timestamped results table in your S3 bucket.

Congratulations! You have run your first offline scoring pipeline in RHOAI, which concludes this section of the lab.