Monitoring feature attributions: How Google saved one of the largest ML services in trouble

Monitoring without the ground truth

So what happened? F1 was a feature generated by a separate team. On further investigation, it was found that an infrastructure migration had caused F1 to lose significant coverage, and consequently its attribution dropped across examples.

The easiest way to detect this kind of model failure is to track one or more model quality metrics (e.g., accuracy) and alert the developer if the metric drops below a threshold. Unfortunately, most model quality metrics rely on comparing the model’s predictions to “ground truth” labels, which may not be available in real time. For instance, in tasks such as fraud detection, credit lending, or estimating conversion rates for online ads, the ground truth for a prediction may lag by days, weeks, or months.
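For concreteness, the naive approach looks roughly like the sketch below: compute a quality metric once labels arrive and alert when it dips under a threshold. The metric, threshold, and function name are illustrative, not a specific monitoring API; the problem described above is that the labels (`y_true`) are often simply not available in time.

```python
# A minimal sketch of threshold-based quality alerting, assuming ground-truth
# labels arrive alongside predictions (which, as noted above, is often not the
# case in production). Names and the threshold value are illustrative.
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.90  # hypothetical service-level target

def check_model_quality(y_true, y_pred):
    """Return an alert message if accuracy falls below the threshold, else None."""
    accuracy = accuracy_score(y_true, y_pred)
    if accuracy < ACCURACY_THRESHOLD:
        return f"ALERT: accuracy {accuracy:.3f} below threshold {ACCURACY_THRESHOLD}"
    return None
```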

In the absence of ground truth, ML engineers at Google rely on proxy measures of model quality degradation, derived from the two observables that are always available: model inputs and predictions. There are two main measures:

  • Feature Distribution monitoring: detecting skew and drift in feature distributions 
  • Feature Attribution monitoring: detecting skew and drift in feature importance scores (a minimal sketch of both measures follows this list)
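As a rough illustration of what these proxy measures compute, the sketch below (not Google's internal implementation or the Vertex AI API) compares a baseline window against a serving window using the Jensen-Shannon distance over histograms, applying the same statistic once to raw feature values and once to attribution scores. The binning, thresholds, and the synthetic F1-like feature are assumptions made purely for illustration.

```python
# A minimal sketch of the two proxy measures, assuming we have per-example
# feature values and feature attribution scores for a training (baseline)
# window and a serving window.
import numpy as np
from scipy.spatial.distance import jensenshannon

def distribution_drift(baseline_values, serving_values, bins=20):
    """Drift of a feature's value distribution between baseline and serving windows."""
    lo = min(baseline_values.min(), serving_values.min())
    hi = max(baseline_values.max(), serving_values.max())
    p, _ = np.histogram(baseline_values, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(serving_values, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q)  # normalizes the histograms internally

def attribution_drift(baseline_attrs, serving_attrs, bins=20):
    """The same statistic, applied to a feature's attribution scores instead."""
    return distribution_drift(baseline_attrs, serving_attrs, bins)

# Example: flag the feature if either statistic exceeds a hypothetical threshold.
rng = np.random.default_rng(0)
baseline_f1 = rng.normal(0.0, 1.0, 10_000)
serving_f1 = rng.normal(0.0, 0.1, 10_000)  # simulates values collapsing toward a default
if distribution_drift(baseline_f1, serving_f1) > 0.3:
    print("ALERT: feature F1 distribution drifted between training and serving")
```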

In the recent post Monitor models for training-serving skew with Vertex AI, we explored the first measure, Feature Distribution monitoring, for detecting skew and anomalies in the feature values themselves at serving time (in comparison to training or some other baseline). In the rest of this post, we discuss the second measure, Feature Attribution monitoring, which has also been used successfully to monitor large ML services at Google.

Feature Attributions monitoring

Feature Attributions is a family of methods for explaining a model’s prediction on a given input by attributing the prediction to the features of that input. The attributions are proportional to each feature’s contribution to the prediction. They are typically signed, indicating whether a feature pushes the prediction up or down. Finally, the attributions across all features are required to add up to the model’s prediction score.
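As a toy illustration of these properties (signed, additive attributions), here is a linear model where exact attributions can be computed in closed form relative to a baseline input. The weights, bias, and baseline are made up for the example; production systems would use attribution methods such as integrated gradients or sampled Shapley instead.

```python
# A toy example of signed, additive feature attributions for a linear model,
# where the exact attribution of each feature is weight * (input - baseline).
import numpy as np

weights = np.array([2.0, -1.5, 0.5])   # hypothetical model weights
bias = 0.3
baseline = np.zeros(3)                 # reference input used for attribution
x = np.array([1.0, 2.0, -1.0])         # the example being explained

def predict(v):
    return float(weights @ v + bias)

# Signed attributions: positive values push the prediction up, negative values down.
attributions = weights * (x - baseline)   # -> [ 2. , -3. , -0.5]

# Completeness: the attributions sum to the prediction relative to the baseline.
assert np.isclose(attributions.sum(), predict(x) - predict(baseline))
```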
