Use Vertex Pipelines to build an AutoML classification end-to-end workflow

The example notebook has the full component definition.

Sharing component specifications

When the component is compiled, we can also request that a yaml component specification be generated. We did this via the optional output_component_file="tables_eval_component.yaml" argument passed to the @component decorator.

The yaml format allows the component specification to be put under version control and shared with others.

Then, the component can be used in other pipelines by calling the kfp.components.load_component_from_url function (and other variants like load_component_from_file).

Running a pipeline job on Vertex Pipelines

Once a pipeline is defined, the next step is to compile it, which generates a json job spec file, and then submit and run it on Vertex Pipelines. When you submit a pipeline job, you can specify values for pipeline input parameters, overriding their defaults.

The example notebook shows the details of how to do this.

Once a pipeline is running, you can view its details in the Cloud Console, including the pipeline run and lineage graphs shown above, as well as pipeline step logs and pipeline artifact details.

You can also submit pipeline job specs via the Cloud Console UI, and the UI makes it easy to clone pipeline runs. The json pipeline specification file may also be put under version control and shared with others.

Leveraging Pipeline step caching to develop and debug

Vertex Pipelines supports step caching, which helps when iterating on pipeline development: when you rerun a pipeline, if a component’s inputs have not changed, its cached execution results can be reused. If you run this pipeline more than once, you might notice this feature in action.
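The idea behind input-based caching can be illustrated with a toy sketch (this is not Vertex Pipelines’ actual implementation): key each step on its name plus a stable hash of its inputs, and reuse the stored result on a cache hit instead of re-executing the step.

```python
# Toy illustration of input-based step caching (not Vertex's internals).
import hashlib
import json

_cache = {}
executions = []  # records which steps actually ran

def run_step(step_name, fn, **inputs):
    # Key the cache on the step name plus a stable hash of its inputs.
    key = hashlib.sha256(
        json.dumps([step_name, inputs], sort_keys=True).encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key]  # cache hit: reuse the prior execution's result
    executions.append(step_name)
    result = fn(**inputs)
    _cache[key] = result
    return result

# The first run executes the step; a rerun with identical inputs does not.
run_step("train", lambda epochs: epochs * 10, epochs=3)
run_step("train", lambda epochs: epochs * 10, epochs=3)
```

Changing any input (here, epochs) produces a different cache key, so only steps downstream of the change re-execute, which is the behavior described below.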

If you’re running the example, try making a small change to the example notebook cell that holds the custom component definition (the classif_model_eval_metrics function in the “Define a metrics eval custom component” section) by uncommenting this line:

 # metrics.metadata["model_type"] = "AutoML Tabular classification"

Then recompile the component, recompile the pipeline without changing the DISPLAY_NAME value, and run the pipeline again. When you do so, you should see that Vertex Pipelines can leverage the cached executions for the upstream steps, since their inputs didn’t change, and only needs to re-execute from the changed component onward. The pipeline DAG for the new run should look as follows, with the ‘recycle’ icon on some of the steps indicating that their cached executions were used.