nuclio mlrun

How do I use get_offline_features() in mlrun.feature_store?


I am trying to retrieve features from an existing feature store. The documentation at https://docs.mlrun.org/en/latest/api/mlrun.feature_store.html says you can pass either a feature vector URI or a FeatureVector object to mlrun.feature_store.get_offline_features().
What is the URI for a feature store?
Where can I find an example?


Solution

  • In MLRun, a Feature Set is a group of features that are ingested together. A Feature Vector is a selection of features from one or more Feature Sets (a few columns here, a few columns there, etc.). This is useful for joining several data sources together on a common entity/key.

    A full example of creating a feature set and querying it through a feature vector can be found below:

    import mlrun.feature_store as fs
    from mlrun import set_environment
    import pandas as pd
    
    # Set project - for retrieving features later
    set_environment(project="my-project")
    
    # Feature set to ingest
    df = pd.DataFrame({
        "key" : [0, 1, 2, 3],
        "value" : ["A", "B", "C", "D"]
    })
    
    # Create feature set with desired name and entity/key
    fset = fs.FeatureSet("my-feature-set", entities=[fs.Entity("key")])
    
    # Ingest
    fs.ingest(featureset=fset, source=df)
    
    # Create feature vector (allows for joining multiple feature sets together)
    features = ["my-feature-set.*"] # can also list features explicitly, e.g. ["my-feature-set.value", ...]
    vector = fs.FeatureVector("my-feature-vector", features)
    
    # Retrieve offline features (vector object)
    fs.get_offline_features(vector)
    
    # Retrieve offline features (URI string in the form "<project>/<name>")
    fs.get_offline_features("my-project/my-feature-vector")
    
    # Retrieve offline features as pandas dataframe
    fs.get_offline_features("my-project/my-feature-vector").to_dataframe()
    

    You can find more feature store examples in the documentation here: https://docs.mlrun.org/en/latest/feature-store/feature-store.html
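
To make the Feature Set / Feature Vector relationship above more concrete, here is a rough plain-Python sketch of what "selecting features from several sets and joining them on a common key" means. It is a conceptual illustration only and not how MLRun implements it: the `select_features` helper, the `other-feature-set` data, and the choice to keep every key seen in any set are all assumptions made for this example.

```python
# Conceptual sketch only: plain Python, not MLRun's implementation.
# Each feature set is modeled as {entity_key: {feature_name: value}}.

def select_features(feature_sets, features):
    """Join the requested features on the shared entity key.

    feature_sets: {set_name: {entity_key: {feature_name: value}}}
    features: list of "set_name.feature_name" or "set_name.*" strings
    """
    rows = {}
    for spec in features:
        # Split "set_name.feature_name" at the first dot
        set_name, feature_name = spec.split(".", 1)
        for key, record in feature_sets[set_name].items():
            row = rows.setdefault(key, {})
            if feature_name == "*":
                # Wildcard: take every feature in the set
                row.update(record)
            else:
                row[feature_name] = record[feature_name]
    return rows


# Two hypothetical feature sets that share the entity "key"
feature_sets = {
    "my-feature-set": {0: {"value": "A"}, 1: {"value": "B"}},
    "other-feature-set": {
        0: {"score": 0.5, "label": "x"},
        1: {"score": 0.9, "label": "y"},
    },
}

# The list of strings plays the role of the feature vector's feature list
joined = select_features(
    feature_sets, ["my-feature-set.*", "other-feature-set.score"]
)
print(joined)
```

The feature strings follow the same "<feature-set>.<feature>" pattern used in the MLRun example, so the wildcard pulls in `value` while only `score` is taken from the second set.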