Evolving code and adding new technologies - Part 2

In Part 1 of this series we saw a basic, naive implementation of a monolithic desktop client that draws input from some input producer and displays data. Now imagine that this is your system. You built and deployed it a while ago and time went by. On a quiet morning, while you sit at your desk sipping coffee and reading your favorite news outlet, your PO walks in and hands you the new requirements for the next version. TL;DR: it will get messy before it gets sweet.

Parts 3-5 are here: Part 3, Part 4 and Part 5

Version 2

GitHub branch

According to the requirements, you now have more than one producer, and they produce data faster. However, your processing of the data just got slower: you have to fetch some older data from the DB, do some correlations and then save the data back to the DB. No matter what the processing is, it takes anywhere from 600 ms to 1000 ms per message. Yes, you read that right: up to 1 second by the time data is correlated, stored in the database and displayed. Database optimization and more threads are out of the scope of this series, and in the real-life system this article is drawn from they were out of the scope of the deadline as well.

So you start typing and come up with an application which manages to do the processing and display the data in a timely fashion. Enter PO: the producers will not be creating one message every second anymore. They will be producing a list of DataMessages whenever they have the chance. So the producer goes from this:

To this:

Notice what happened here. The producer creates a list of 100 messages, adds them to a collection and pushes them to the consumer. The processing time, however, has not changed. It still takes up to 1 second to process each message. The consumer now looks as such:

And your application looks like this:

So as you can see the client is stuck. This is where you realize that you need to take drastic measures.
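To make the stall concrete, here is a minimal sketch of the situation described above. It is not the code from the GitHub branch: the `DataMessage` shape, `BatchProducer` and `BlockingConsumer` are hypothetical stand-ins, and `Thread.Sleep` is assumed as a placeholder for the DB fetch, correlation and save.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the series' DataMessage type.
public sealed class DataMessage
{
    public DataMessage(int producerId, DateTime createdAt)
    {
        ProducerId = producerId;
        CreatedAt = createdAt;
    }

    public int ProducerId { get; }
    public DateTime CreatedAt { get; }
}

public static class BatchProducer
{
    // Builds a whole list in one go, as the new requirement describes.
    public static List<DataMessage> CreateBatch(int producerId, int size = 100)
    {
        var batch = new List<DataMessage>(size);
        for (var i = 0; i < size; i++)
        {
            batch.Add(new DataMessage(producerId, DateTime.UtcNow));
        }
        return batch;
    }
}

public sealed class BlockingConsumer
{
    private readonly TimeSpan _processingTime;

    public BlockingConsumer(TimeSpan processingTime)
    {
        _processingTime = processingTime;
    }

    // Processes the whole batch on the calling thread and returns the elapsed time.
    // Thread.Sleep is a placeholder for the real per-message work.
    public TimeSpan Consume(IReadOnlyList<DataMessage> batch)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < batch.Count; i++)
        {
            Thread.Sleep(_processingTime);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

With a batch of 100 and a processing time of one second, a single `Consume` call holds its thread for well over a minute. If that thread is the UI thread, the window stops responding for the whole time.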

Synchronized collections to the rescue

As pompous as the heading is, they do come to the rescue to some extent. No matter which type of collection you use or how you use it, the outcome is the same: the code becomes more complex and harder to debug. Concurrency and threading tend to do that to systems. They complicate flows and create unexpected race conditions.

What is important to understand at this point in the code is that not only did we render our client inoperable, we are also facing starvation on the producer side. If we had a requirement of "Group all messages produced at the same time with a margin of 500 ms on either side of an anchor time", we would fail miserably to address it. Considering the above, we can now calculate the possible starvation time of a producer.

With n producers queued behind a single consumer, any producer may have to wait up to (n-1) * 100 messages * 1000 ms, plus about 100 ms per list for filling, before it can push its data. With ten producers, the last one waits for roughly 9 * 100 * 1000 = 900,000 milliseconds, or 15 minutes. Yes folks, it was like that and worse. Times would not correlate and things became messier by the day.

So how do we solve the starvation on the producer side? Well take a look at the following code:

We are using a synchronized list to add messages and process them one by one.
There can be a few variations to this code but all you will do is shave off a few milliseconds.
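As an illustration of the idea, and not the listing from the branch, a synchronized list could be sketched roughly like this. `SynchronizedMessageList` and `SerialConsumer` are hypothetical names, and a plain `lock` around a `List<T>` is an assumption about how the synchronization is done.

```csharp
using System;
using System.Collections.Generic;

public sealed class SynchronizedMessageList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    // Called by producers: the lock is held only while the batch is copied in.
    public void AddRange(IEnumerable<T> batch)
    {
        lock (_gate)
        {
            _items.AddRange(batch);
        }
    }

    // Called by the consumer: removes and returns the oldest item, if any.
    // RemoveAt(0) is O(n); a Queue<T> would be cheaper, a list is kept to match the text.
    public bool TryTakeFirst(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = _items[0];
            _items.RemoveAt(0);
            return true;
        }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }
}

public static class SerialConsumer
{
    // Processes messages one by one, outside the lock, so producers are
    // never blocked for the duration of a slow process() call.
    public static int Drain<T>(SynchronizedMessageList<T> list, Action<T> process)
    {
        var processed = 0;
        while (list.TryTakeFirst(out var item))
        {
            process(item);
            processed++;
        }
        return processed;
    }
}
```

Producers now return as soon as their batch is copied, which addresses the starvation. The consumer still handles one message at a time, so the backlog grows just as fast as before.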

The complexity of concurrency

In the real system behind this post, in versions before things changed radically, there was a synchronized collection per producer. Did that help? It did a bit: it shaved starvation time on the producer side. It did little to nothing to improve the processing time of the data. In fact, improving processing time meant putting more concurrency in place, which complicated things even more, not to mention that debugging was possible only via trace logs.
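The post does not show what the per-producer collections looked like, so the following is only a sketch of the general shape. `PerProducerQueues` is a hypothetical name, and using `ConcurrentDictionary` with one `ConcurrentQueue` per producer is an assumption, not the original design.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class PerProducerQueues<T>
{
    private readonly ConcurrentDictionary<int, ConcurrentQueue<T>> _queues =
        new ConcurrentDictionary<int, ConcurrentQueue<T>>();

    // A producer only ever touches its own queue, so it does not wait on other producers.
    public void Push(int producerId, IEnumerable<T> batch)
    {
        var queue = _queues.GetOrAdd(producerId, _ => new ConcurrentQueue<T>());
        foreach (var item in batch)
        {
            queue.Enqueue(item);
        }
    }

    // The consumer takes at most one message from each producer per pass.
    // The returned messages are still processed serially by the caller,
    // so total processing time is not improved by this structure.
    public List<T> TakeOneFromEach()
    {
        var taken = new List<T>();
        foreach (var pair in _queues)
        {
            if (pair.Value.TryDequeue(out var item))
            {
                taken.Add(item);
            }
        }
        return taken;
    }
}
```

Taking one message from each producer per pass keeps any single producer from monopolizing the consumer, which matches the "shaved starvation time" observation. The slow part, processing each message, is untouched.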

I urge you to download and run the code for yourself and notice how, while you print data of a specific minute, your computer clock moves forward and you are still processing old data.

While writing these lines I realized the code throws an OutOfMemoryException when adding items to a simple WinForms list box. The control is not built for such abuse, and the original system used WPF. Since UI is not in the scope of this blog I will leave it at that. After all, the final chapter will hold a surprise in that area. This explains the efforts to clean as many collections as possible in the code. The effect of the delay, however, is apparent by the time the exception is thrown. In any case the code for this part is here.

So how do you think the problem of the timing will be solved? Which technology will solve our problems until the next bottleneck? Stay tuned! 

