MVVM vs Data Virtualization

I have a TreeView that is bound to a tree of ViewModel instances. The problem is that the model data is coming from a slow repository so I need data virtualization. The list of sub ViewModel below a node should only be loaded when the parent tree view node is expanded and it should be unloaded when it is collapsed.

How can this be implemented while adhering to MVVM principles? How can the ViewModel get notified that it needs to load or unload subnodes? That is when a node was expanded or collapsed without knowning anything about the treeview's existence?

Something makes me feel that data virtualization doesn't go well with MVVM. Since in data virtualization the ViewModel generally needs to know quite a lot about the current state of the UI and aslo needs to control quite a lot of aspects in the UI. Take another example:

A listview with data virtualization. The ViewModel would need to control the length of the ListView's scrollthumb since it depends on the number of items in the Model. Also when the user scrolls, the ViewModel would need to known to what position he scrolled to and how big the listview is (how many items currently fit in) to be able to load the right portion of Model data form the repository.

Solution

The easy way to solve this is with a "virtualizing collection" implementation that maintains weak references to its items along with an algorithm for fetching / creating items. The code for this collection is rather complex, what with all the interfaces required and the data structures to efficiently track ranges of loaded data but here is a partial API for a class that virtualized based on indexes:

public class VirtualizingCollection<T>
  : IList<T>, ICollection<T>, IEnumerable<T>,
    IList, ICollection, IEnumerable,
    INotifyPropertyChanged, INotifyCollectionChanged
{
  protected abstract void FetchItems(int requestedIndex, int gapStartIndex, int gapEndIndex);
  protected void RecordFetchedItems(int startIndex, int count, IEnumerable items) ...
  protected void RecordInsertOrDelete(int startIndex, int countPlusOrMinus) ...
  protected virtual void OnCollectionChanged(CollectionChangedEventArgs e) ...
  protected virtual void Cleanup();
}

The internal data structure here is a balanced tree of data ranges, with each data range containing a start index and an array of weak references.

This class is designed to be subclassed to provide the logic for actually loading the data. Here is how it works:

In the subclass constructor, RecordInsertOrDelete is called to set the initial collection size
When an item is accessed using IList/ICollection/IEnumerable, the tree is used to find the data item. If found in the tree and there is a weak reference and the weak reference still points to a life object, that object is returned, otherwise it is loaded and returned.
When an item needs to be loaded, an index range is computed by searching forward and back though the data structure for the next/previous already-loaded item, then the abstract FetchItems is called so the subclass can load the items.
In the subclass FetchItems implementation, items are fetched and then RecordFetchedItems is called to update the tree of ranges with the new items. Some complexity here is required to merge adjacent nodes to prevent too much tree growth.
When the subclass gets a notification of external data changes it can call RecordInsertOrDelete to update the index tracking. This updates start indexes. For an insert, this may also split a range, and for a delete this may require one or more ranges to be recreated smaller. This same algorithm is used internally when items are added / deleted through the IList and IList<T> interfaces.
The Cleanup method is called in the background to incrementally search the tree of ranges for WeakReferences and whole ranges that can be disposed, and also for ranges that are too sparse (eg only one WeakReference in a range with 1000 slots)

Note that FetchItems is passed a range of unloaded items so it can use a heuristic to load multiple items at once. A simple such heuristic would be loading the next 100 items or up to the end of the current gap, whichever comes first.

With a VirtualizingCollection, WPF's built-in virtualization will cause data loading at the appropriate times for ListBox, ComboBox, etc, as long as you are using eg. VirtualizingStackPanel instead of StackPanel.

For a TreeView, one more step is required: In the HierarchicalDataTemplate set a MultiBinding for ItemsSource that binds to your real ItemsSource and also to IsExpanded on the templated parent. The converter for the MultiBinding returns its first value (the ItemsSource) if the second value (the IsExpanded value) is true, otherwise it returns null. What this does is makes it so that when you collapse a node in the TreeView all references to the collection contents are immediately dropped so that VirtualizingCollection can clean them up.

Note that virtualization need not be done based on indexes. In a tree scenario, it can be all-or-nothing, and in a list scenario an estimated count can be used and ranges filled in as necessary using a "start key" / "end key" mechanism. This is useful when the underlying data may change and the virtualized view should track its current location based on which key is at the top of the screen.