Point-in-time lookups to fetch training data
For model training, you need a training data set that contains examples of your prediction task. These examples consist of instances that include their features and labels. For example, an instance might be a home and you want to determine its market value. Its features might include its location, age, and the prices of nearby homes that were sold. A label is an answer for the prediction task, such as the home eventually sold for $100K.
Because each label is an observation at a specific point in time, you need to fetch feature values that correspond to that point in time when the observation was made, like what were the prices of nearby homes when a particular home was sold. As labels and feature values are collected over time, those feature values change. Hence, when you fetch data from a feature store for model training, it performs point-in-time lookups to fetch the feature values corresponding to the time of each label. What is noteworthy is that the Feature Store performs these point-in-time lookups efficiently, even when the training dataset has tens of millions of labels.
In the following example, we want to retrieve feature values for two training instances with labels L1 and L2. The two labels are observed at time T1 and T2, respectively. Imagine freezing the state of the feature values at those timestamps. Hence, for the point-in-time lookup at T1, Vertex Feature Store returns the latest feature values up to time T1 for Feature 1, Feature 2, and Feature 3 and does not leak any values past T1. As time progresses, the feature values change and, consequently, so does the label. So, at T2, Vertex Feature Store returns different feature values for that point in time.