
How to Load<T> a RavenDB Document, constrained to a Collection, when not using default ID generation strategy


In RavenDB 4 (v4.0.3-patch-40031) I have two Document types: Apple and Orange. Both have similar, but also distinct, properties. I run into a bug in my code at runtime where sometimes the ID of an Apple is provided, but an Orange is returned. Scary!

Diving into it, it somewhat makes sense. But I'm struggling with an appropriate solution.

Here goes. In RavenDB, I have stored a single Apple as a Document:

id: "078ff39b-da50-4405-9615-86b0d185ba17"
{
    "Name": "Elstar",
    "@metadata": {
        "@collection": "Apples",
        "Raven-Clr-Type": "FruitTest.Apple, FruitTest"
    }
}

Assume for the sake of this example that I have no Orange documents stored in the database. I would expect this test to succeed:

// arrange - use the ID of an apple, which does not exist in Orange collection
var id_of_apple = "078ff39b-da50-4405-9615-86b0d185ba17";

// act - load an Orange
var target = await _session.LoadAsync<Orange>(id_of_apple);

// assert - should be null, because there is no Orange with that Id
target.Should().BeNull(because: "provided ID is not of an Orange but of an Apple");

... but it fails. The Document ID exists, so RavenDB loads the document without caring what type it is, and then attempts to map the properties automatically. I expected (incorrectly, it seems) that the type argument of Load would limit the lookup to that particular document collection. Instead, it looks up the ID across the entire database and maps whatever it finds, without constraining it to type <T>. So the behaviour differs from .Query<T>, which does constrain to the collection.

It is important to note that I'm using guids as the identity strategy, by setting the Id to string.Empty (as per the docs). I assume the default ID strategy, which produces IDs like entityname/1001, would not have this issue.

The docs on Loading Entities don't really mention if this is intentional, or not. It only says: "download documents from a database and convert them to entities.".

However, for reasons, I do want to constrain the Load operation to a single collection. Or, better put, as efficiently as possible load a document by ID, from a specific collection. And if it does not exist, return null.

AFAIK, there are two options to achieve this:

  1. Use the more expensive .Query<T>.Where(x => x.Id == id), instead of .Load<T>(id)
  2. Do the .Load<T>(id) first and then check (~somehow, see bottom) if it is part of collection T
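
For reference, option 1 could look roughly like the sketch below. It is a minimal illustration only: the FruitQueries class and the LoadOrangeOrNullAsync helper are hypothetical names, the Orange class simply mirrors the model from this question, and it assumes an already opened IAsyncDocumentSession. I have not measured how it compares to Load in cost.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Raven.Client.Documents;          // FirstOrDefaultAsync extension for RavenDB queries
using Raven.Client.Documents.Session;  // IAsyncDocumentSession

// Mirrors the Orange model used in this question.
public class Orange
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper class, not part of the RavenDB client API.
public static class FruitQueries
{
    // Queries the collection that Orange maps to and filters on the document ID.
    // Returns null when that collection has no document with this ID,
    // even if a document with the same ID exists in another collection.
    public static Task<Orange> LoadOrangeOrNullAsync(IAsyncDocumentSession session, string id)
    {
        return session.Query<Orange>()
            .Where(o => o.Id == id)
            .FirstOrDefaultAsync();
    }
}
```

Called as await FruitQueries.LoadOrangeOrNullAsync(_session, id_of_apple), this should return null for the Apple ID from the example, because the query is scoped to the collection of Orange.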

My problem can be summarized in two questions:

  1. Is there another, more performant or stable way, than the two options mentioned above?
  2. If there is not, out of the two options - which is recommended in terms of performance and stability?

The second question in particular is very hard to measure properly. As for stability, e.g. not having side effects, I guess someone with more in-depth knowledge or experience of the RavenDB internals might be able to shed some light on that.

N.B. The question assumes that the explained behaviour is intentional and not a RavenDB bug.

~Somehow would be:

public async Task<T> Get(string id)
{
    var instance = await _session.LoadAsync<T>(id);
    if (instance == null) return null;

    // the "somehow" check for collection
    var expectedTypeName = string.Concat(typeof(T).Name, "s");
    var actualTypeName = _session.Advanced.GetMetadataFor(instance)[Constants.Documents.Metadata.Collection].ToString();
    if (actualTypeName != expectedTypeName)
    {
        // Edge case: Apple != Orange
        return null;
    }

    return instance;
}
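
A variation on the check above that avoids guessing the collection name by appending "s" is to ask the store's conventions which collection T maps to. This is a sketch under assumptions: FruitRepository is a hypothetical wrapper class, and it assumes the session was opened from a store whose conventions are the same ones used when the documents were stored.

```csharp
using System;
using System.Threading.Tasks;
using Raven.Client;                    // Constants
using Raven.Client.Documents.Session;  // IAsyncDocumentSession

// Hypothetical repository wrapper, only for illustration.
public class FruitRepository<T> where T : class
{
    private readonly IAsyncDocumentSession _session;

    public FruitRepository(IAsyncDocumentSession session)
    {
        _session = session;
    }

    public async Task<T> Get(string id)
    {
        var instance = await _session.LoadAsync<T>(id);
        if (instance == null) return null;

        // Collection name the client conventions assign to T (e.g. "Oranges" for Orange).
        var expected = _session.Advanced.DocumentStore.Conventions.GetCollectionName(typeof(T));

        // Collection name recorded in the metadata of the document that was actually loaded.
        var metadata = _session.Advanced.GetMetadataFor(instance);
        object actual;
        if (!metadata.TryGetValue(Constants.Documents.Metadata.Collection, out actual))
            return null;

        // Edge case: the ID belongs to a document from another collection (Apple != Orange).
        if (!string.Equals(actual?.ToString(), expected, StringComparison.Ordinal))
            return null;

        return instance;
    }
}
```

Note that the mismatching document has still been loaded and is tracked by the session at this point; the check only hides it from the caller.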

How to reproduce

UPDATE 2018/04/19 - Added this reproducible sample after helpful comments (thanks for that).

Models

public interface IFruit
{
    string Id { get; set; }
    string Name { get; set; }
}

public class Apple : IFruit
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Orange : IFruit
{
    public string Id { get; set; }
    public string Name { get; set; }
}

Tests
The first test loads in the same session that stored the Apple and throws an InvalidCastException, as expected. The second test loads in a different session and does not throw.

public class UnitTest1
{
    [Fact]
    public async Task SameSession_Works_And_Throws_InvalidCastException()
    {
        var store = new DocumentStore()
        {
            Urls = new[] {"http://192.168.99.100:32772"},
            Database = "fruit"
        }.Initialize();

        using (var session = store.OpenAsyncSession())
        {
            var apple = new Apple
            {
                Id = Guid.NewGuid().ToString(),
                Name = "Elstar"
            };

            await session.StoreAsync(apple);
            await session.SaveChangesAsync();

            await Assert.ThrowsAsync<InvalidCastException>(() => session.LoadAsync<Orange>(apple.Id));
        }
    }

    [Fact]
    public async Task Different_Session_Fails()
    {
        var store = new DocumentStore()
        {
            Urls = new[] {"http://192.168.99.100:32772"},
            Database = "fruit"
        }.Initialize();

        using (var session = store.OpenAsyncSession())
        {
            var appleId = "ca5d9fd0-475b-41de-a1ab-57bb1e3ce018";

            // this *should* break, because... it's an apple
            // ... but it doesn't - it returns an ORANGE
            var orange = await session.LoadAsync<Orange>(appleId);

            await Assert.ThrowsAsync<InvalidCastException>(() => session.LoadAsync<Orange>(appleId));
        }
    }
}

Solution

  • Well, I found what seems to be the cause, but I don't understand why it happens.

    You said:

    by setting the Id to string.Empty

    but in the example you wrote Id = Guid.NewGuid().ToString();. In my tests, when I explicitly assign string.Empty I get the cast exception; when I assign the generated Guid to the entity (as you do), I reproduce your situation. RavenDB probably treats these two cases differently, which produces this behaviour; I don't know whether it should be considered a bug.

    So use string.Empty.