The example notebook has the full component definition.
Sharing component specifications
When the component is compiled, we can also request that a yaml component specification be generated. We did this via the optional
output_component_file="tables_eval_component.yaml" arg passed to the component decorator.
The yaml format allows the component specification to be put under version control and shared with others.
Then, the component can be used in other pipelines by calling the kfp.components.load_component_from_url function (or variants like kfp.components.load_component_from_file and load_component_from_text).
Running a pipeline job on Vertex Pipelines
Once a pipeline is defined, the next step is to compile it (which generates a json job spec file) and then submit and run it on Vertex Pipelines. When you submit a pipeline job, you can specify values for pipeline input parameters, overriding their defaults.
The example notebook shows the details of how to do this.
Once a pipeline is running, you can view its details in the Cloud Console, including the pipeline run and lineage graphs shown above, as well as pipeline step logs and pipeline Artifact details.
You can also submit pipeline job specs via the Cloud Console UI, and the UI makes it easy to clone pipeline runs. The json pipeline specification file may also be put under version control and shared with others.
Leveraging Pipeline step caching to develop and debug
Vertex Pipelines supports step caching, which helps when iterating on pipeline development: when you rerun a pipeline, if a component's inputs have not changed, its cached execution results can be reused. If you run this pipeline more than once, you may notice this feature in action.
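The caching rule (reuse a step's result when the same implementation runs with identical inputs) can be sketched in plain Python. This is just an illustration of the idea, not how Vertex Pipelines actually implements it:

```python
import hashlib
import json

_cache = {}

def run_step(name, impl, inputs):
    """Run a pipeline step, reusing the cached result when the same
    implementation has already run with identical inputs."""
    key = hashlib.sha256(
        json.dumps({"name": name, "impl": impl.__name__, "inputs": inputs},
                   sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key], True          # cache hit (the 'recycle' icon)
    result = impl(**inputs)
    _cache[key] = result
    return result, False                  # cache miss: step re-executed

def double(x):
    return 2 * x

r1, hit1 = run_step("double", double, {"x": 21})  # first run: executes
r2, hit2 = run_step("double", double, {"x": 21})  # same inputs: cached
r3, hit3 = run_step("double", double, {"x": 5})   # changed inputs: re-runs
```

Changing a component's inputs (or its implementation) changes the key, which is why editing one component invalidates only that step and everything downstream of it.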
If you’re running the example, try making a small change to the example notebook cell that holds the custom component definition (the classif_model_eval_metrics function in the “Define a metrics eval custom component” section) by uncommenting this line:
# metrics.metadata["model_type"] = "AutoML Tabular classification"
Then recompile the component, recompile the pipeline without changing the
DISPLAY_NAME value, and run it again. When you do so, you should see that Vertex Pipelines can leverage the cached executions for the upstream steps (as their inputs didn't change) and only needs to re-execute from the changed component onwards. The pipeline DAG for the new run should look as follows, with the 'recycle' icon on some of the steps indicating that their cached executions were used.