Thursday 28 April 2016

Bad Event Sourcing

One of the worst pieces of advice I've read recently involved event sourcing... specifically, the advice was in relation to performance.

So a question you'll regularly get is "Isn't event sourcing slow?". Loading all those events and replaying them to get an aggregate back to its current state, just to make a change to it, MUST be slow.

Firstly... in general... no, it's not slow. Most aggregates don't have thousands or even hundreds of events; usually it's in the tens. Replaying that many events isn't expensive for a few reasons: computers are fast, you only replay state that is relevant, and most events don't actually carry much relevant state. I'll explain that last one in a bit.
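To make that concrete, here's a rough sketch in Python of what replaying events into an aggregate looks like. The aggregate, event names and the event store's methods are all made up for illustration; the point is just that replay is a loop over a handful of events, applying only the fields that matter.

    # Minimal sketch of event replay (hypothetical domain, event names and store API).
    class OrderAggregate:
        def __init__(self):
            self.status = "new"
            self.total = 0

        def apply(self, event):
            # Only the state that is relevant to current behaviour gets applied.
            if event["type"] == "ItemAdded":
                self.total += event["price"]
            elif event["type"] == "OrderShipped":
                self.status = "shipped"
            # other event types carry no state this aggregate needs

    def load(event_store, order_id):
        aggregate = OrderAggregate()
        for event in event_store.events_for(order_id):  # usually tens of events, not thousands
            aggregate.apply(event)
        return aggregate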

Secondly... the general answer is that if you are concerned about event replay on aggregates you can always use snapshots. A snapshot, for those new to the idea, is a record of the 'current' state of an aggregate at a point in time. Normally you'd pick a pattern, like every 10, 20 or 50 events, and take a snapshot to speed up event replay, because you no longer have to load and replay all events, just the most recent snapshot and the events since then.
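A rough sketch of that loading pattern, reusing the hypothetical OrderAggregate from above (the snapshot store and its methods are equally made up), might look like this:

    # Minimal sketch of loading via a snapshot (hypothetical store APIs).
    def load_with_snapshot(event_store, snapshot_store, order_id):
        aggregate = OrderAggregate()
        version = 0
        snapshot = snapshot_store.latest_for(order_id)
        if snapshot is not None:
            aggregate.__dict__.update(snapshot["state"])  # copy the snapshotted fields back in
            version = snapshot["version"]
        # replay only the events recorded after the snapshot was taken
        for event in event_store.events_for(order_id, after_version=version):
            aggregate.apply(event)
            version += 1
        return aggregate

    def maybe_snapshot(snapshot_store, order_id, aggregate, version, interval=20):
        # e.g. every 20 events, record current state alongside (never instead of) the events
        if version % interval == 0:
            snapshot_store.put(order_id, {"version": version, "state": dict(aggregate.__dict__)})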

Now we come to the worst advice I've read. That advice was to discard all events prior to the snapshot. I think this was suggested to save on storage (although storage is now so cheap it's irrelevant, and the cost of storage falls at the same rate at which more storage becomes available, so you never end up paying less, you just pay the same amount for even more storage... which makes budgeting very easy).

What was so very, very wrong about this is that it misses the whole point of an event store. To quote Greg Young (from one of his many videos), an event store/log is like a doctor's records about a patient. A doctor doesn't just keep a single record saying the patient is healthy or not (that is the analogy for the current pattern of storing only the current state of an object... think a product or order row in a database). Instead a doctor keeps many records covering the patient's whole history, which they can use to build a complete picture of the patient. The advice above is the same as a doctor throwing out all of the patient's records just because they wrote one record about the patient's current state... AKA took a snapshot.

While this might seem like a trivial issue, software solutions are not constant... otherwise we wouldn't have jobs. The world of business is always changing, and something that happened a year or two ago might now have a different meaning. In many cases businesses write off the ability to accurately migrate data because of a lack of history, and they've been convinced to make do with a best-guess approach to what existing data should look like with new business rules applied. BUT it doesn't have to be this way. With an event store, new business rules can be applied to historical records/events to accurately build new, current state. When using snapshots, this just means you remove the snapshots and rebuild them, applying the new business rules.
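As a rough sketch of what that rebuild looks like (again using the hypothetical aggregate and store APIs from the earlier sketches), the new rules simply live in the aggregate's apply logic and get run over the full history:

    # Minimal sketch of rebuilding snapshots after a business-rule change (hypothetical APIs).
    def rebuild_snapshots(event_store, snapshot_store, aggregate_ids):
        for aggregate_id in aggregate_ids:
            snapshot_store.delete_for(aggregate_id)   # throw away the old snapshots...
            aggregate = OrderAggregate()              # ...and rebuild from the full history,
            version = 0                               # with the new rules living in apply()
            for event in event_store.events_for(aggregate_id):
                aggregate.apply(event)
                version += 1
            snapshot_store.put(aggregate_id, {"version": version, "state": dict(aggregate.__dict__)})

This is only possible because every event is still in the store; discard them and the best you can do is guess.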

The important thing to remember is that just because it's not relevant or applicable to you today doesn't mean it won't be tomorrow, and it's never a good look when you have to tell your boss you can't be as agile and nimble as your competitors.

Now back to the point I raised earlier about not all event data being relevant: I'll write a post about that next, as it's a topic all of its own.

For a great video on event stores/logs and why remembering events, not just current state, is so important, watch one of Greg Young's videos at vimeo.com/31645099

Tuesday 26 April 2016

CQRS Explained Easily - Reads/Writes

When starting to talk about CQRS, there are a lot of details that CTOs and architects can get bogged down in.

One of the easiest ways to start the discussion is with very high level concepts, like separating reading (querying) data from writing it... allowing the separate systems to be scaled independently of each other.


[Illustration: separating the read and write sides of a CQRS system] Taken from https://lostechies.com/gabrielschenker/2012/06/12/how-we-got-rid-of-the-database/
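For a feel of what that split looks like in code, here's a rough Python sketch. All the class, method and store names are invented for illustration; the point is that commands only touch the write side and queries only touch the read side, so each can scale on its own.

    # Minimal sketch of the read/write split (hypothetical names throughout).
    class WriteSide:
        """Handles commands; its only job is to validate and record what happened."""
        def __init__(self, event_store):
            self.event_store = event_store

        def place_order(self, order_id, total):
            # validation / aggregate loading elided; the decision ends up as an event
            self.event_store.append(order_id, [{"type": "OrderPlaced", "total": total}])

    class ReadSide:
        """Answers queries from a denormalised store shaped for the screens that use it."""
        def __init__(self, read_db):
            self.read_db = read_db

        def orders_for_customer(self, customer_id):
            return self.read_db.find("orders_by_customer", customer_id)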

Monday 25 April 2016

How do you answer the question "How do I query events?"

One of the hardest concepts to learn and then overcome is "How do I query events?"

The question in itself shows a disconnect from what events in an event sourced or CQRS application are there for. In short... you don't query them. BUT... when you are faced with an architect or CTO with a solid grounding in the RDBMS world, how do you sell event sourcing? Smoke and mirrors is the best way to give a high-level understanding. Just to be clear, we're talking a very, VERY high level explanation. One so high it glosses over several concepts and ALL implementation concerns, so it doesn't scare them off. It simply gets their feet wet in a friendly way that lets the rest of the conversations that are needed happen.

So, how exactly do you explain this?

You start by explaining that an event store is like (but avoid the fact that it isn't) a master database in a master/slave configuration. Events are written into a master-like database (or store, as we'll call them from now on) and queries are then executed on a replicated, slave-like database/store.

Avoid the details of how this operates; that's an implementation concern that first requires a conversation on the CAP theorem. Remember to mention that you are talking about the high level concepts, and that the platform, patterns, frameworks and/or persistence technologies (SQL vs NoSQL) that end up being used will all influence the answer, so it's a little premature to speculate exactly how it will be implemented.
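If it helps to picture the analogy, here's a rough Python sketch of that flow. The event types, read store and its methods are all made up for illustration: events land in the store (the "master"), and a projection copies the relevant bits into a query-friendly read store (the "slave").

    # Minimal sketch of the "master/slave" analogy (hypothetical names throughout).
    def project(event, read_db):
        # copy just the bits a query needs into the slave-like read store
        if event["type"] == "OrderPlaced":
            read_db.insert("orders", {"id": event["order_id"], "status": "placed"})
        elif event["type"] == "OrderShipped":
            read_db.update("orders", event["order_id"], {"status": "shipped"})

    def on_event_appended(event, read_db):
        # in practice this runs asynchronously behind the event store,
        # which is exactly where the CAP theorem conversation comes in
        project(event, read_db)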

Here is an excellent illustration that covers very high level concepts with just a few low level details (the example/sample commands and queries illustrated).

[Illustration: How do I query events in CQRS?]