Saturday 29 October 2016

Version 2.0 in more detail

I've had several enquiries seeking more detail about version 2.0 of cqrs.net, and found myself sharing the same answers, so here they are in one place.

Currently cqrs.net is two things. Firstly, it is a core framework that provides a solid structure to build on in pretty much any way you want to. Secondly, it is tooling that removes a lot of hand-written code IF you want a strong domain-driven design (DDD) structure over the top of a CQRS base. This is an important point. As with all software there are compromises, be it architecture, maintainability, technical debt, ease of support or deployment.

Given that most of DDD is repetition, in the form of creating classes that follow a certain set of rules and patterns to create separation and order, much of a developer's time at the keyboard isn't actually spent writing the problem-solving code... it's spent creating the structure for that separation.

Currently our tooling tackles this issue by allowing you to define (using UML) your commands, events, aggregates, queries and some thin façade services to provide public access to these generally internal concepts. Once defined, you can use the code generation built into our tooling to have the scaffold of all your classes created for you. Literally all you need to code is the actual business logic itself... usually some form of if or switch statement that asserts some condition and then publishes an event. Really, this is what's left for developers to do once you take away the need to code structure - obviously there are things like calling third party services, sending emails or communicating with firmware, but basic process and work-flow control should be fast to write.
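
To give a feel for that, here's a rough sketch in plain C# (with made-up class names, not the code our tooling actually generates) of the kind of handler body that's all that's left to write once the scaffold exists:

using System;

// Hypothetical classes for illustration only; not the generated code.
public class CancelOrder
{
    public Guid OrderId { get; set; }
    public string Reason { get; set; }
}

public class OrderCancelled
{
    public Guid OrderId { get; set; }
    public string Reason { get; set; }
}

public class OrderCancellationRejected
{
    public Guid OrderId { get; set; }
}

public class OrderAggregate
{
    private bool _isCancelled; // set when OrderCancelled is applied back to the aggregate

    // Roughly all that's left to write by hand: assert a condition, publish an event.
    public void Handle(CancelOrder command)
    {
        if (_isCancelled)
        {
            Publish(new OrderCancellationRejected { OrderId = command.OrderId });
            return;
        }

        Publish(new OrderCancelled { OrderId = command.OrderId, Reason = command.Reason });
    }

    // Stand-in for whatever the generated scaffold uses to raise events.
    private void Publish(object @event)
    {
        // Hand off to the framework / event store.
    }
}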

While writing our documentation, it really hit home how crude and basic our tooling was; just look at our first attempt at https://github.com/Chinchilla-Software-Com/CQRS/wiki/Tutorial-0:-Quick-Northwind-sample. It's all branch, node and leaf based in a tree... not really that visual.

What version 2.0 is focused on is taking our tooling to the next step by adding visual work-flows, similar to what is shown in the blog post at https://cqrs-net.blogspot.co.nz/2016/10/version-20.html. This means you can drop a command, an aggregate and several events onto a visual view and connect them together to define, visually, that we start with this command, it will be handled by this aggregate, and it will return one or more of the following events.

From there the code generation will create your scaffold, again leaving you to write only the code that reads the incoming command, does whatever if/switch logic is required and, depending on the path chosen, publishes the event(s) needed. The advantage here is that less technical people can be involved in the process of defining what the work-flow should do. This then becomes a promise from the development team that this is what you have all agreed will be done. Other possible work-flows are out of scope (probably to be done later) - thus scope creep is avoided and unintentional work-flow changes don't occur. If you do need to be agile and modify the work-flow, the consequences of doing so are very visually apparent and easily spotted. This will all be backwards compatible with our existing tooling, so if you started with the branch/node/leaf based tooling you won't be wasting time. You'll be able to use whichever part of the tooling is most suitable to you and your needs at the time.

With version 2.0 we also aim to have our akka.net modules supported - we're currently still testing and developing the module as the akka.net project moves towards a production-ready state.

We already have some improvements in place, such as simpler SQL implementations of the event and data stores, and more Azure Service Bus options (Event Hubs and topics will be supported out of the box).

Version 3 is where we'll be redefining some internal workings of the tooling (a simple migration path is a requirement, so this might take some time) to prepare us for future development, which includes .net core. So that would be the earliest you'll see .net core become active on our road map. We're also very dependent on our third-party dependencies, like Azure and MongoDB.


Saturday 1 October 2016

Sensible micro-services

Micro Servicing Properly

One of the biggest mistakes you can make is getting the granularity of your micro-services wrong. Where most developers go wrong is thinking only about the code, and not the operational resourcing required. By this I mean technical support, ensuring up-time, scaling easily, reconfiguring without recompiling etc. Think like you're the guy who has to wake up at 3:30 am on a Sunday morning because a database is being thrashed or something has gone offline. When you start thinking like that guy, you realise smaller can (but not always) be better.



Is it too big?


Take the following simple workflow where the main objective is some data analytics with the result stored. The steps might be:
  1. Locate the objects that need to be analysed.
  2. Load the data for each object (let's assume this is external data, even if by "external" we mean a database, a flat file or some other such source).
  3. Analyse the data for each object.
  4. Store the result of each object's analysis.
To many developers this might seem like a simple case of (roughly sketched in the code after this list):
  1. Execute a DB query to get the objects
  2. For-each over every item (maybe in parallel with PLINQ or async/await in C#)
    1. Load the object's data
    2. Sum or average the values loaded in 2.1
    3. Store the result into the database
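
Sketched out in C# (with made-up repository and service names), those "simple" 10-ish lines tend to look something like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Made-up abstractions purely for illustration.
public interface IObjectRepository
{
    Task<IReadOnlyList<Guid>> GetObjectIdsToAnalyseAsync();     // step 1
    Task SaveResultAsync(Guid objectId, double average);        // step 4
}

public interface IExternalDataService
{
    Task<IReadOnlyList<double>> LoadValuesAsync(Guid objectId); // step 2
}

public class NaiveReportRunner
{
    // All four steps in one method: quick to write, painful to support at 3:30 am.
    public async Task RunAsync(IObjectRepository repository, IExternalDataService externalData)
    {
        var objectIds = await repository.GetObjectIdsToAnalyseAsync();  // 1. query the objects
        foreach (var objectId in objectIds)                             // 2. for-each every item
        {
            var values = await externalData.LoadValuesAsync(objectId);  // 2.1 load the object's data
            var average = values.Average();                             // 2.2 sum/average the values
            await repository.SaveResultAsync(objectId, average);        // 2.3 store the result
        }
    }
}
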
This seems perfectly sensible for a report that is as simple as loading data, averaging the values and saving the result, especially when the next report has a slightly different requirement and you have 15 of these reports to write. But think like that guy who's been woken up at 3:30 am with a report from the customer that the reports aren't working.

Imagine in this case it's a (relatively) simple issue: a report is killing your database by locking tables and trying to write too much data. On its own that only slows down how long the data takes to be written, except you wrote each of these reports as 10 lines in a single method. It's not hard. It's only 10 lines of code per report. But now all your writes are failing, and worse still... the data sourced in step 2 comes from a remote system, like an accounting, CRM or ERP system. If the system isn't brought back up now, all of today's data analysis will be missing.

How to think better.


If this had been built at a smaller, more fine-grained level, say with each step as a different service, then you start getting into some really cool possibilities.

Take our poor support person looking at a thrashed database at 3:30 am. If each step was a separate micro-service with something like a service/message bus between each service, then the resulting data from step 3 would have been saved into a queue. When the database had recovered, all those saved messages could be processed and there would have been no loss of data. The micro-service running step 3 could also have gone offline, as the data loaded by step 2 would also have been saved into a queue. You've now added fault tolerance to two steps in your reporting system. The poor support person could go back to bed knowing the reports would still be accurately saved at some point.
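
As a rough sketch (the IMessageBus abstraction and the message classes here are hypothetical; any real service/message bus would fill the role), step 3 as its own micro-service might look like this:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message-bus abstraction.
public interface IMessageBus
{
    void Publish<TMessage>(TMessage message);
    void Subscribe<TMessage>(Action<TMessage> handler);
}

// The data loaded by step 2, waiting in a queue for step 3.
public class ObjectDataLoaded
{
    public Guid ObjectId { get; set; }
    public IList<double> Values { get; set; }
}

// The result of step 3, waiting in a queue for step 4 to store.
public class AnalysisCompleted
{
    public Guid ObjectId { get; set; }
    public double Average { get; set; }
}

// Step 3 as its own micro-service. If the database behind step 4 is down,
// AnalysisCompleted messages simply sit in the queue until it recovers.
public class AnalysisService
{
    public AnalysisService(IMessageBus bus)
    {
        bus.Subscribe<ObjectDataLoaded>(message =>
            bus.Publish(new AnalysisCompleted
            {
                ObjectId = message.ObjectId,
                Average = message.Values.Average()
            }));
    }
}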

How to think global

With micro-servicing at this smaller, more finely grained level, you can start using resourcing effectively. Steps 1 and 2 might not require much CPU, as they spend most of their time waiting on databases and external data, but they need fast network access. Step 3, however, might require a lot of CPU as it processes and analyses the data as quickly as possible. Step 4 again only requires network access, but this time it's internal network access to the database.
With this thinking you can now start moving some of the micro-services around. Take the micro-services running step 2: they can be moved to other parts of the world, closer to the external service they get their data from, making them faster and cheaper. The micro-services running step 3 could be moved to somewhere with cheap CPU cycles per minute. And step 4 might actually be on premise.

When you start thinking beyond just those 10 lines of code, the world opens up and expands greatly.

Version 2.0

New things to come.

We've been a little quiet as we realised, while writing our tutorials, that our UI wasn't as good as it could be. In fact it was quite cumbersome for those just getting started. Soooo, we decided to start writing a Visual Studio extension. This has meant a few small changes here and there, and it also means some new things are coming.

UML and Profile Updates:

The UML profiles are getting an update with new stereotype properties, but should be backwards compatible.

Build By Diagram

We've put a heap of effort into enabling developers to code less and model faster, so we've made a heap of improvements to building workflows and relationships using the built-in diagram editor tools in Visual Studio. This then means all your classes are written for you using automation via even more tools built into Visual Studio. Why code when you can manage how your classes and methods work with each other via drag and drop?



Solution and Project Templates

You can now find a CQRS solution template in the online Visual Studio gallery.

Roadmap

We're working to polish off the new UI and editors to make the framework far more usable and accessible, which means we'll re-write all the tutorials (to iron out any minor bugs) and then release version 2 of the UI tooling. We expect this to be finished by the end of the year.

Anyone wanting an early preview and play around, contact us and we'll invite you to our private beta programme.

Friday 6 May 2016

Not all events are equal

One of the more interesting realisations to make as you travel the CQRS path is that not all event data is equal. One of the simplest examples to help demonstrate this is applying event data to an aggregate.


Paint this picture if you will.

You have an aggregate, let's call it a Product aggregate. When you create a Product, the business has said we have to provide a name and a category. As developers we duly capture these two requirements as properties on a CreateProduct command.
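
In code that command needn't be anything more than this (a plain sketch, not tied to any particular framework's base classes):

using System;

// A plain sketch of the command; a real framework would normally add
// identifiers, correlation data and so on via a base class.
public class CreateProduct
{
    public Guid ProductId { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}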


The command gets passed into our aggregate, maybe directly or maybe indirectly via a command handler, but that's not important; what is important is that the name and category arrive in the Product aggregate's Create method.


Maybe we do basic data validation, like checking that the name and category properties are provided (we'll avoid implementation details here, as there's a long list of ways to validate, and of opinions on what is appropriate and where and when to do it). We then raise a ProductCreated event saying the Product has been created.
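
A rough sketch of that Create method, reusing the CreateProduct sketch above (the Raise method is just a stand-in for whatever your framework provides):

using System;

public class ProductCreated
{
    public Guid ProductId { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

public class Product
{
    public void Create(CreateProduct command)
    {
        // Basic data validation only; where and how to validate is a much longer conversation.
        if (string.IsNullOrWhiteSpace(command.Name))
            throw new ArgumentException("A name must be provided.");
        if (string.IsNullOrWhiteSpace(command.Category))
            throw new ArgumentException("A category must be provided.");

        Raise(new ProductCreated
        {
            ProductId = command.ProductId,
            Name = command.Name,
            Category = command.Category
        });
    }

    // Stand-in for however your framework records and publishes events.
    private void Raise(object @event)
    {
        // Append to the uncommitted event list / hand off to the framework.
    }
}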


Eventually this might arrive in an event handler that updates several read stores for your projections and read models/views, but before that your aggregate will handle the event itself to apply the state within the event to itself. This is where things can get interesting.
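
A very rough sketch of such an event handler, reusing the ProductCreated event from above (the view repository here is entirely hypothetical):

using System;
using System.Threading.Tasks;

public class ProductView
{
    public Guid ProductId { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

// Hypothetical read-store abstraction.
public interface IProductViewRepository
{
    Task InsertAsync(ProductView view);
}

public class ProductProjectionHandler
{
    private readonly IProductViewRepository _views;

    public ProductProjectionHandler(IProductViewRepository views)
    {
        _views = views;
    }

    // Keeps the read store up to date; the aggregate has already applied the event to itself.
    public Task Handle(ProductCreated @event)
    {
        return _views.InsertAsync(new ProductView
        {
            ProductId = @event.ProductId,
            Name = @event.Name,
            Category = @event.Category
        });
    }
}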


As it stands the business hasn't given us any other business rules, so here's where we start to see that not all event data is equal.


Take the name property. This data has no meaning to the domain (if you are doing domain driven design) or to the business for that matter. The name is just a human-readable, short description of the product. It means absolutely nothing to the business's computer systems. It's view data only, something to return when viewing a projection or view of the Product aggregate.
And here's where the penny should hopefully drop for you (replace penny with whatever currency you use).


Your aggregate doesn't actually have to remember the value of the name property internally. It doesn't need to read it off the event or apply it to itself. It can ignore it completely. Even if we had a RenameProduct command and ProductRenamed event, we still wouldn't need to remember their values.


The category property, for now, is exactly the same. There are no business rules based on it, so it can also be completely ignored. Remember the event log/event store will keep all the events so you can see later on what was recorded, plus you can always look at your read store for its current value.
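
Continuing the Product sketch from above, the apply handlers can be left deliberately empty:

using System;

// Sketch only: with no business rules depending on the name or the category,
// the apply handlers have nothing worth remembering.
public class Product
{
    public void Apply(ProductCreated @event)
    {
        // Deliberately empty: the name and category are view data only.
        // The event store keeps the full event; the read stores keep the current values.
    }

    public void Apply(ProductRenamed @event)
    {
        // Also deliberately empty, for the same reason.
    }
}

public class ProductRenamed
{
    public Guid ProductId { get; set; }
    public string NewName { get; set; }
}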

Changes to the business rules are fine - don't fear the refactor

Several months later, the business brings you an enhancement: an email now needs to be sent IF the category changes. We now have a business rule that is based on the category of the Product.

If you chose to model this by introducing a ChangeCategory command and CategoryChanged event, each with a NewCategory property, the same rule as above would still apply. Because you can tell directly from the command that the category is to be changed, there's no need to retain the value of the category internally. Life goes on and things are simple... your Product aggregate is still extremely light-weight, fast and nimble.
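
A sketch of that stateless path (again, the Raise method is just a stand-in for whatever your framework provides):

using System;

public class ChangeCategory
{
    public Guid ProductId { get; set; }
    public string NewCategory { get; set; }
}

public class CategoryChanged
{
    public Guid ProductId { get; set; }
    public string NewCategory { get; set; }
}

// Still no internal state needed: everything the "send an email when the category
// changes" rule requires travels on the command and the event.
public class Product
{
    public void Handle(ChangeCategory command)
    {
        Raise(new CategoryChanged
        {
            ProductId = command.ProductId,
            NewCategory = command.NewCategory
        });
    }

    public void Apply(CategoryChanged @event)
    {
        // Still nothing to remember.
    }

    private void Raise(object @event) { /* hand off to the framework */ }
}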

Several more months go on and now the business has decided that the email needs to include what the product was originally categorised as.

Now things get a little more interesting. Here you can take one of two paths.

If you choose to change the original ChangeCategory command and CategoryChanged event to include an additional property called OriginalCategory, then again the same rules from above apply. All the required information for your existing business rules is there without the aggregate having to maintain state on the category.

If, however, you choose to add the new OriginalCategory property only to the CategoryChanged event and not to the ChangeCategory command, then you have an issue... how do you populate the new property when it's not going to be provided by the client application? This is when you need to start recording the value of Category internally, so that you can read its current value and populate the OriginalCategory property on the CategoryChanged event, allowing the email to be generated with the newly required information.
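
A sketch of that second path, reusing the ChangeCategory command from above, shows the category quietly becoming domain state:

using System;

// OriginalCategory lives only on the event, so the aggregate now has to remember the category itself.
public class CategoryChanged
{
    public Guid ProductId { get; set; }
    public string NewCategory { get; set; }
    public string OriginalCategory { get; set; }
}

public class Product
{
    private string _currentCategory;

    public void Handle(ChangeCategory command)
    {
        Raise(new CategoryChanged
        {
            ProductId = command.ProductId,
            NewCategory = command.NewCategory,
            // The client never sends this, so it has to come from internal state.
            OriginalCategory = _currentCategory
        });
    }

    public void Apply(ProductCreated @event)
    {
        // Even the created event now carries state the aggregate cares about.
        _currentCategory = @event.Category;
    }

    public void Apply(CategoryChanged @event)
    {
        // No longer empty: the category has become domain state.
        _currentCategory = @event.NewCategory;
    }

    private void Raise(object @event) { /* hand off to the framework */ }
}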

I'm not going to suggest one way over the other. There are advantages and disadvantages to both, but the lesson to take away from this is that, as stated at the beginning "not all event data is equal".

Thursday 28 April 2016

Bad Event Sourcing

One of the worst pieces of advice I've read recently involved event sourcing... specifically the advice was in relation to performance.

So a question you'll regularly get is "Isn't event sourcing slow?". Loading all those events and replaying them to get an aggregate back to its current state just to make a change MUST be slow.

Firstly... in general... no, it's not slow. Most aggregates don't have 1,000s or even 100s of events; usually it's in the 10s. Replaying that many events isn't that expensive for a few reasons... computers are fast, you only replay state that is relevant, and most events don't actually have much relevant state. I'll explain that one in a bit.

Secondly... the general answer is that if you are concerned about event replay on aggregates you can always use snapshots. A snapshot, for those to whom this is new, is a record of the 'current' state of an aggregate at a point in time. Normally you'd pick a pattern, like taking a snapshot every 10, 20 or 50 events, to speed up event replay because you don't have to load and replay all events, just the most recent snapshot and then all events since then.
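
A rough sketch of how snapshot-assisted loading works (the store interfaces here are hypothetical, not any particular framework's API):

using System;
using System.Collections.Generic;

public class Snapshot
{
    public int Version { get; set; }
    public object State { get; set; }
}

public interface ISnapshotStore
{
    // Returns the latest snapshot for the aggregate, or null if none has been taken yet.
    Snapshot GetLatest(Guid aggregateId);
}

public interface IEventStore
{
    // Returns all events for the aggregate recorded after the given version.
    IEnumerable<object> GetEventsSince(Guid aggregateId, int version);
}

public interface IAggregate
{
    void RestoreFrom(object snapshotState);
    void Apply(object @event);
}

public class AggregateLoader
{
    private readonly ISnapshotStore _snapshots;
    private readonly IEventStore _events;

    public AggregateLoader(ISnapshotStore snapshots, IEventStore events)
    {
        _snapshots = snapshots;
        _events = events;
    }

    public TAggregate Load<TAggregate>(Guid id)
        where TAggregate : IAggregate, new()
    {
        var aggregate = new TAggregate();

        // Start from the most recent snapshot if there is one...
        var snapshot = _snapshots.GetLatest(id);
        var fromVersion = 0;
        if (snapshot != null)
        {
            aggregate.RestoreFrom(snapshot.State);
            fromVersion = snapshot.Version;
        }

        // ...then replay only the events recorded since that snapshot.
        foreach (var @event in _events.GetEventsSince(id, fromVersion))
            aggregate.Apply(@event);

        return aggregate;
    }
}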

Now we come to the worst advice I've read. That advice was to discard all events prior to the snapshot. I think this was suggested to save on storage (although storage is now so cheap it's irrelevant, and the cost of storage falls at the same rate at which more storage becomes available, so you never end up paying less, you just pay the same amount for even more storage... which makes budgeting very easy).

What was so very, very wrong about this advice is that it misses the whole point of an event store. To quote Greg Young (from one of his many videos), an event store/log is like a doctor's records about a patient. A doctor doesn't just keep a single record about a patient saying they are healthy or not (this is an analogy for the usual pattern of only storing the current state of an object... think a product row or order in a database). Instead a doctor keeps many records covering the patient's whole history, which they can use to build a complete picture of the patient. The above advice is the same as a doctor throwing out all of the patient's records just because they wrote a record about the patient's current state... AKA took a snapshot.

While this might seem like a trivial issue, software solutions are not constant... otherwise we wouldn't have jobs. The world of business is always changing, and something that happened a year or two ago might now have a different meaning. In many cases businesses write off the ability to accurately migrate data because of a lack of history, and they've been convinced to make do with a best-guess approach to what existing data should look like with new business rules applied. BUT it doesn't have to be this way. With an event store, new business rules can be applied to the historical records/events to accurately build new, current state. When using snapshots, this just means you remove the snapshots and rebuild them, applying the new business rules.
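
Sketched against the same hypothetical interfaces as above, a rebuild is nothing more than a replay of the full history:

using System;

// When business rules change, throw the snapshots away and rebuild them by
// replaying every event through the new apply logic.
public class SnapshotRebuilder
{
    private readonly IEventStore _events;

    public SnapshotRebuilder(IEventStore events)
    {
        _events = events;
    }

    public TAggregate Rebuild<TAggregate>(Guid id)
        where TAggregate : IAggregate, new()
    {
        var aggregate = new TAggregate();

        // No snapshot this time: replay the complete history from version 0
        // so the new rules see every event that ever happened.
        foreach (var @event in _events.GetEventsSince(id, 0))
            aggregate.Apply(@event);

        return aggregate;
    }
}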

The important thing to remember is that just because something isn't relevant or applicable to you today doesn't mean it won't be tomorrow, and it's never a good look when you have to tell your boss you can't be as agile and nimble as your competitors.

Now back to the point I raised earlier about not all event data being relevant, I'll write a post about that next as it's a topic all to itself.

For a great video on event stores/logs and why remembering events, not just current state, is so important, watch one of Greg Young's talks at vimeo.com/31645099

Tuesday 26 April 2016

CQRS Explained Easily - Reads/Writes

When starting to talk about CQRS, there are a lot of details that CTOs and architects can get bogged down by.

One of the easiest ways to start the discussion is with very high-level concepts, like separating reading (querying) from writing data... allowing the separate systems to be scaled independently of each other.


Taken from https://lostechies.com/gabrielschenker/2012/06/12/how-we-got-rid-of-the-database/

Monday 25 April 2016

How do you answer the question "How do I query events?"

One of the hardest concepts to learn and then overcome is "How do I query events?"

The question in itself shows a disconnect from the concept of what events in an event-sourced or CQRS application are there for. In short... you don't query them. BUT... when you are faced with an architect or CTO with a solid grounding in the RDBMS world, how do you sell event sourcing? Smoke and mirrors is the best way to give a high-level understanding. Just to be clear, we're talking a very, VERY high-level explanation. One so high that it glosses over several concepts and ALL implementation concerns, so it doesn't scare them off. It simply wets their toes in a friendly way that lets the rest of the conversations that are needed happen.

So, how exactly do you explain this?

You start by explaining that an event store is like (while glossing over the fact that it isn't) a master database in a master/slave configuration. Events are written into a master-like database (or store, as we'll call them now) and queries are then executed on a replicated, slave-like database/store.

Avoid the details of how this operates; that's an implementation concern that first requires a conversation on the CAP theorem. Remember to mention that you are talking about the high-level concepts, and that the platform, patterns, frameworks and/or persistence technologies (SQL vs NoSQL) that will be used all influence the answer, so it's a little premature to speculate exactly how it will be implemented.

Here is an excellent illustration that covers the very high-level concepts with just a few low-level details (the example/sample commands and queries illustrated).

How do I query events in CQRS?