RavenDB and the Repository pattern
I recently had a short email exchange with Ayende Rahien and he suggested something I hadn't considered before: not using a Repository pattern.
Background
Allow me to elaborate. Before trying RavenDB, I was frequently dealing with data APIs that necessitated use of the Repository pattern (or at least some pattern of abstraction):
- Legacy APIs: legacy APIs often expose data in a format very different to your domain model. Translation of the legacy data model to the domain model should occur in one place and one place only
- Web services: a lot of commercial web services are an absolute mess and you want to avoid exposing them to the rest of the application as much as possible
- Sitecore Data API: the majority of CRUD and field access operations involve magic strings and you don't want those dotted all over your application
Reasons for using layers of abstraction boil down to:
- Impedance mismatch between your data source and your domain model
- Badly designed APIs
- Aversion to magic strings
- Other reasons exist, but these three are enough to demonstrate my point
I was so used to dealing with these issues that trying to shoehorn RavenDB into a repository just seemed natural. I didn't give it much thought until I spoke to Ayende.
Why you should not use Repository pattern with RavenDB
I wanted to know how this approach would work in real life. I branched my website's local Git repository and removed all the repositories along with the infrastructure that supported them. Now I get what Ayende meant:
- There is no impedance mismatch: when you ask RavenDB to store some data, you give it an instance of your domain model. When you read that data back out, it is returned to you as an instance of your domain model. There are no properties to map, it works automatically
- Well-designed API: considering the underlying complexity, probably one of the best I have worked with
- No magic strings: earlier builds of RavenDB did use strings in some places, but that is no longer the case. The current API uses LINQ and lambda expressions, which makes it very refactor-friendly
Here is an example. This was the old API:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web.Mvc;
using Raven.Client;

public class BlogPost
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public DateTimeOffset PublishedOn { get; set; }
}

public interface IRepository<T>
{
    T GetById(int id);
}

public interface IBlogPostRepository : IRepository<BlogPost>
{
    IList<BlogPost> GetRecentBlogPosts();
}

// Generic base class: wraps the RavenDB session for all repositories
public abstract class Repository<T>
{
    protected readonly IDocumentSession Session;

    public Repository(IDocumentSession session)
    {
        Session = session;
    }

    public virtual T GetById(int id)
    {
        return Session.Load<T>(id);
    }
}

public class BlogPostRepository : Repository<BlogPost>, IBlogPostRepository
{
    public BlogPostRepository(IDocumentSession session) : base(session)
    {
    }

    public IList<BlogPost> GetRecentBlogPosts()
    {
        // Query<T>() is a method call, so the parentheses are required
        var blogPosts = Session.Query<BlogPost>()
            .OrderByDescending(bp => bp.PublishedOn)
            .ToList();
        return blogPosts;
    }
}

public class BlogController : Controller
{
    private readonly IBlogPostRepository _repository;

    public BlogController(IBlogPostRepository repository)
    {
        _repository = repository;
    }

    public ActionResult ViewBlogPost(int id)
    {
        var blogPost = _repository.GetById(id);
        return View(blogPost);
    }

    public ActionResult ViewRecentBlogPosts()
    {
        var blogPosts = _repository.GetRecentBlogPosts();
        return View(blogPosts);
    }
}
Wow, that's a lot of code just to do a couple of simple queries. This is the new API:
using System;
using System.Linq;
using System.Web.Mvc;
using Raven.Client;

public class BlogPost
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public DateTimeOffset PublishedOn { get; set; }
}

public class BlogController : Controller
{
    // The controller talks to the RavenDB session directly
    private readonly IDocumentSession _session;

    public BlogController(IDocumentSession session)
    {
        _session = session;
    }

    public ActionResult ViewBlogPost(int id)
    {
        var blogPost = _session.Load<BlogPost>(id);
        return View(blogPost);
    }

    public ActionResult ViewRecentBlogPosts()
    {
        // Query<T>() is a method call, so the parentheses are required
        var blogPosts = _session.Query<BlogPost>()
            .OrderByDescending(bp => bp.PublishedOn)
            .ToList();
        return View(blogPosts);
    }
}
Can you spot the difference? The repository-free approach brings with it a number of advantages:
- Less code: you can write and refactor features a lot quicker
- Easier to read and understand: logic does not span multiple files and there are no inheritance hierarchies to deal with
- Easier to test: just test the class that does the query. Thanks to In-Memory mode, you can execute tests without having to mimic the database
- Handles requirement changes more elegantly: given a requirement to limit the number of results returned by GetRecentBlogPosts to 10, how would you handle that in a repository? Add a parameter? Create an overload? Will you have to keep creating new overloads whenever new requirements are presented? Using the new approach, the query logic is isolated and can be changed independently of the rest of the application
- Allows use of advanced APIs: for example, you can enable aggressive caching on ViewBlogPost(id) in a way that does not affect the rest of the application
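To make the last two points a bit more concrete, here is a rough sketch of how the controller might look after both changes. It reuses the BlogPost class from the example above and assumes an older RavenDB client in which IDocumentSession lives in the Raven.Client namespace. The five-minute cache duration is an arbitrary value I picked for illustration, so treat this as a starting point rather than a recommendation.

```csharp
using System;
using System.Linq;
using System.Web.Mvc;
using Raven.Client; // assumption: older client where IDocumentSession lives here

public class BlogController : Controller
{
    private readonly IDocumentSession _session;

    public BlogController(IDocumentSession session)
    {
        _session = session;
    }

    public ActionResult ViewBlogPost(int id)
    {
        // Aggressive caching applies only inside this using block,
        // so no other query in the application is affected.
        // The five-minute duration is an illustrative value.
        using (_session.Advanced.DocumentStore.AggressivelyCacheFor(TimeSpan.FromMinutes(5)))
        {
            var blogPost = _session.Load<BlogPost>(id);
            return View(blogPost);
        }
    }

    public ActionResult ViewRecentBlogPosts()
    {
        // The new requirement is a one-line change to this query:
        // no new parameter, no new overload, no interface to update.
        var blogPosts = _session.Query<BlogPost>()
            .OrderByDescending(bp => bp.PublishedOn)
            .Take(10)
            .ToList();
        return View(blogPosts);
    }
}
```

Neither change touches any other class, which is the point: each action owns its query and can evolve on its own.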
When you take the above points into consideration, do you really want to use an abstraction?
But what about…
By now you may be thinking: "Hold on a sec. My application is different. I really need that extra layer". I have picked out three common concerns people have about this.
What if later I decide to switch to a relational database?
Relational databases and document databases have very different modelling requirements. You will have to not only rewrite the data access portion of your code, but also adjust internal repository APIs to handle the new reality.
Also, remember this is a strategic decision and doesn't happen overnight (if it does where you work, I feel sorry for you).
What if I later decide to switch to another document database?
Switching to another database is no simple task even if it belongs to the same family of databases. Expect to be dealing with a different API, different usage patterns and different optimisations. The Repository pattern doesn't protect you from that. You still have to rewrite code. You still can't use the advanced features because you are shackled by a layer of abstraction.
Won't this lead to a lot of code duplication?
It won't, provided you use RavenDB correctly. I have seen plenty of examples where people initialise a new DocumentSession for every CRUD operation. Don't do that: session lifetime management is an infrastructure concern and should be handled at a different level. Initialise your session once, at the beginning of the HTTP request. Close your session at the end of the HTTP request. Reuse it across all operations. This way your code is simply performing a CRUD operation and there is nothing else to think about. This also allows RavenDB to optimise writes (via batching) and makes unit testing simpler.
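One possible way to wire up a session per request in a classic ASP.NET application is sketched below. This is only an illustration under a few assumptions: the server URL, the "RavenSession" key and the MvcApplication class name are made up for the example, and the DocumentStore setup follows the older client API (Raven.Client.Document). How the session then reaches your controllers (for instance, a container that resolves IDocumentSession from HttpContext.Items) is left out.

```csharp
using System.Web;
using Raven.Client;
using Raven.Client.Document; // assumption: older client API

public class MvcApplication : HttpApplication
{
    // Hypothetical key used to stash the session for the current request
    private const string SessionKey = "RavenSession";

    // One document store for the lifetime of the application
    public static IDocumentStore Store { get; private set; }

    protected void Application_Start()
    {
        // Hypothetical URL: point this at your own RavenDB server
        Store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();
    }

    protected void Application_BeginRequest()
    {
        // Open a single session at the start of the HTTP request
        Context.Items[SessionKey] = Store.OpenSession();
    }

    protected void Application_EndRequest()
    {
        var session = Context.Items[SessionKey] as IDocumentSession;
        if (session == null)
        {
            return;
        }

        using (session)
        {
            // Send all pending changes in one batch,
            // but only if the request did not fail
            if (Server.GetLastError() == null)
            {
                session.SaveChanges();
            }
        }
    }
}
```

With something like this in place, controllers never open, save or dispose a session themselves, so there is no session boilerplate to duplicate.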
Sometimes it is ok to use abstractions
I am not saying you should never ever use layers of abstraction with RavenDB. If you have good reasons, then by all means go ahead. I just want you to consider, next time, whether the need for an abstraction outweighs the advantages of using the RavenDB API directly.