You have probably read or heard about a company with a data-driven solution, or an organisation that has come out and said that they are data-driven. This feels positive, of course, because data is the new oil, so data-driven must be great. But what does this mean? In this blog post, Stefan Palm reflects on what data-driven means and describes different communication methods.
As always with these kinds of on-trend concepts, they can mean as little as you want; I am sure you know what I mean if you think of concepts like ‘digital transformation’ and ‘AI’. So this is my spin on the concept of data-driven, but remember that there may be other interpretations. I hope you will stay with me as I take you on a short journey that ends with a concrete example.
What does ‘data-driven’ mean?
So, let’s start by analysing what data-driven means. The word ‘drive’ relates to movement and propulsion, so ‘driven’ refers to moving things forward and making things happen. Another way of defining an expression is to say what it is not. So what does ‘data-driven’ not mean?
NB: It can be really enjoyable to take this approach in other situations. For example, when someone says, “Our organisation takes this issue very seriously”, it makes me wonder whether they think that there are other organisations that do not take the issue seriously. Newspapers are full of these comical expressions.
Let us not get too abstract here. Imagine that we have a well-known process that we want to run whenever a specific kind of data is entered. I am sure that you can find an example close to home. How can we transfer knowledge and insights from data to make sure something happens? We need a communication method, and this is where it starts getting tricky.
Historically, this was not a problem, as you wrote one program that did everything. That is simple, but problems start to appear as these systems grow. Most organisations have at least one ‘monolith’: a system that is carefully looked after by a few people, but which is impossible to communicate with (apart from exporting files).
Nowadays, when you sit down to design the solution for a relatively simple process, it is highly likely that it will need to communicate with other systems or services. Instead of monoliths, we now talk about microservices, where each service solves a specific task, and we can create solutions by combining several of them.
All modern systems/services have Application Programming Interfaces (APIs) for this purpose. But do not be fooled; ‘API’ is yet another of these concepts that can mean as little as you want, because it covers many different technologies. Common abbreviations include SOAP/XML, typically found in older systems, and function-oriented, RPC-style interfaces such as gRPC. But when most people talk about APIs today, they mean REST over HTTP with JSON, often simply called REST APIs.
REST stands for ‘Representational State Transfer’. The focus is on resources, which some people refer to as ‘objects’. The easiest way to understand this is to think of a resource as a noun. If you have forgotten your English classes from school, a noun is a word you can put, for example, ‘one’ or ‘several’ in front of. In REST, a collection of resources is usually named with the plural form of the noun (several sensors), and each individual resource is one item in that collection (one sensor). A resource has several ‘attributes’, and there are a number of functions for communicating with it. NB: to avoid implementing an endless number of resources from scratch, object-oriented design is often used to reuse attributes and functions.
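To make the noun idea a little more concrete, here is a minimal sketch in Python. The resource type `TemperatureSensor`, its attributes and the helper functions `collection_path` and `item_path` are all hypothetical names chosen for illustration; the point is only that the plural noun names the collection (‘several’), an id picks out one resource (‘one’), and shared attributes are defined once in a base class and reused.

```python
from dataclasses import asdict, dataclass


@dataclass
class Resource:
    """Attributes shared by every resource, defined once and reused."""
    id: int
    name: str


@dataclass
class TemperatureSensor(Resource):
    """A hypothetical resource type that inherits id and name."""
    location: str
    unit: str = "celsius"


def collection_path(plural_noun: str) -> str:
    """Path to the collection ('several'), e.g. /sensors."""
    return f"/{plural_noun}"


def item_path(plural_noun: str, resource_id: int) -> str:
    """Path to a single resource ('one') in the collection, e.g. /sensors/7."""
    return f"/{plural_noun}/{resource_id}"


sensor = TemperatureSensor(id=7, name="Office sensor", location="office")
print(collection_path("sensors"))        # /sensors
print(item_path("sensors", sensor.id))   # /sensors/7
print(asdict(sensor))  # the attributes, as they might be sent as JSON
```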
Communication takes place over HTTP, and the HTTP method (often called the verb) determines how you exchange data with the resource. The simplest way to think of it is GET to retrieve data from the resource and POST to send data to it.
If we are going to nitpick, POST is used to create a resource, PUT to update (replace) an existing resource completely, PATCH to update an existing resource partially, and DELETE to delete a resource. But this is not what we are focusing on right now.
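For readers who like to see the verbs in action, the sketch below mimics how a REST API might map each HTTP method onto a hypothetical `/sensors` collection. It is an in-memory simulation with no real HTTP server; the class `SensorCollection`, its `handle` method and the status codes it returns are illustrative assumptions, not any particular framework’s API. Note how PATCH keeps the attributes it is not given, while PUT replaces the resource completely.

```python
import itertools


class SensorCollection:
    """Hypothetical in-memory stand-in for a /sensors REST resource.

    No real HTTP is involved: handle() only mimics how a REST API maps the
    HTTP method and the path onto operations on the resource.
    """

    def __init__(self):
        self._items = {}
        self._next_id = itertools.count(1)

    def handle(self, method, path, body=None):
        parts = path.strip("/").split("/")
        if parts[0] != "sensors":
            return 404, None
        item_id = int(parts[1]) if len(parts) > 1 else None

        if item_id is None:                      # the collection: /sensors
            if method == "GET":
                return 200, list(self._items.values())
            if method == "POST":                 # create a new resource
                new_id = next(self._next_id)
                self._items[new_id] = {"id": new_id, **body}
                return 201, self._items[new_id]
            return 405, None

        if item_id not in self._items:           # one resource: /sensors/<id>
            return 404, None
        if method == "GET":                      # retrieve it
            return 200, self._items[item_id]
        if method == "PUT":                      # replace it completely
            self._items[item_id] = {"id": item_id, **body}
            return 200, self._items[item_id]
        if method == "PATCH":                    # update only the given attributes
            self._items[item_id].update(body)
            return 200, self._items[item_id]
        if method == "DELETE":                   # delete it
            del self._items[item_id]
            return 204, None
        return 405, None


api = SensorCollection()
api.handle("POST", "/sensors", {"name": "Office", "location": "office"})
print(api.handle("PATCH", "/sensors/1", {"location": "lab"}))
# (200, {'id': 1, 'name': 'Office', 'location': 'lab'})  name is kept
print(api.handle("PUT", "/sensors/1", {"name": "Lab"}))
# (200, {'id': 1, 'name': 'Lab'})  location is gone
```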
Synchronous communication and asynchronous communication
What I want to highlight here is that REST APIs use synchronous communication. Think about how a web browser works. I type in a web address (URL), the browser performs an HTTP GET and waits for a response, and when the response comes, the page appears. In other words, the application sends a ‘request’ and then waits for a ‘response’ before it continues. We have created a dependency between the one sending the request and the one responding.
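The blocking behaviour can be simulated in a few lines of Python. Here `slow_endpoint` is a hypothetical stand-in for a server that takes 0.2 seconds to answer (there is no real network call); the caller cannot do anything else until the response is back.

```python
import time


def slow_endpoint():
    """Hypothetical server that needs 0.2 s to produce its response."""
    time.sleep(0.2)
    return {"temperature": 22.5}


def synchronous_client():
    start = time.perf_counter()
    response = slow_endpoint()      # the caller is blocked on this line...
    waited = time.perf_counter() - start
    return response, waited         # ...and only continues once it has the response


response, waited = synchronous_client()
print(response, f"after waiting {waited:.2f} s")
```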
To remove this blocking dependency, you can use ‘asynchronous communication’ instead. There are ways of implementing asynchronous communication with REST APIs, but they present certain challenges. People also have lots of opinions on this, but I am not going to get into that here.
Another way of achieving asynchronous communication is to use event-based technologies, which are asynchronous by nature. There are many variants, but we normally talk about pub/sub (publish/subscribe). Here, the ‘event’ is in focus, and events are sent to a ‘topic’. One or more components can ‘publish’ to a topic, and one or more can listen to it, i.e. ‘subscribe’. The one writing to a topic has no direct link to the one(s) listening, and vice versa, i.e. there are no direct dependencies between the components in a solution.
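As a rough illustration of the caller’s side of asynchronous communication, the sketch below uses Python’s `asyncio`: the request is started, the client carries on with other work, and the response is collected later. It only shows the non-blocking idea; `slow_endpoint` is a hypothetical stand-in, and this is not a description of any particular way of making REST APIs asynchronous.

```python
import asyncio


async def slow_endpoint():
    """Hypothetical responder that needs 0.2 s to answer."""
    await asyncio.sleep(0.2)
    return {"temperature": 22.5}


async def asynchronous_client():
    log = []
    # Send the request, but do not wait for the answer yet.
    pending = asyncio.create_task(slow_endpoint())
    log.append("request sent")
    # Meanwhile, the client is free to carry on with other work.
    for step in range(3):
        log.append(f"other work {step}")
        await asyncio.sleep(0.01)
    # Pick up the response when it is needed.
    response = await pending
    log.append("response received")
    return response, log


response, log = asyncio.run(asynchronous_client())
print(log)
```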
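The core of pub/sub fits in a small in-memory sketch. The `Broker` class and the topic name `office/temperature` are hypothetical; to keep it short, events are delivered immediately in the same process, whereas a real message broker would queue them and deliver them asynchronously. What the sketch does show is that the publisher only knows the topic, never the subscribers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Minimal in-memory pub/sub broker (a sketch, not a production system).

    For simplicity it delivers each event immediately, in the same process;
    a real broker would queue events and deliver them asynchronously.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher only names a topic; it never sees who is subscribed.
        for handler in self._subscribers[topic]:
            handler(event)


broker = Broker()
dashboard, logger = [], []
broker.subscribe("office/temperature", dashboard.append)
broker.subscribe("office/temperature", logger.append)
broker.publish("office/temperature", {"celsius": 26.1})
print(dashboard, logger)   # both received {'celsius': 26.1}
```

Adding a third subscriber is just one more `subscribe` call; the publishing code does not change, which is what makes it easy to add new functions to this kind of solution.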
This property has made this kind of communication popular for modern solutions based on microservices. It is easy to replace components in such a solution, or add new functions, without affecting existing elements.
So there are many advantages to introducing an event-driven architecture, but it is not the same as being ‘data-driven’. In an event-driven solution, there is still someone who interprets the data and, when certain conditions are met, creates an event. It is through this event that you communicate and that others can react.
When talking about data-driven, concepts such as ‘observer’ and ‘observable’ are introduced. The main difference with data-driven communication is that it takes place through the data itself; and yes, this may sound a little strange, as new things often do. To put it in very simple terms: a service can choose to share data as an observable, and someone else can choose to become an observer of this data. Unlike in event-driven communication, the one sharing the data does not evaluate it; it is up to the observer to react to it as they wish.
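A minimal sketch of the observer/observable idea, assuming a hypothetical `Observable` class (not a specific library such as RxJS): the source passes on every value without applying any rules, and each observer decides for itself what the data means.

```python
from typing import Callable, List


class Observable:
    """A hypothetical data source that shares raw values without judging them."""

    def __init__(self) -> None:
        self._observers: List[Callable[[float], None]] = []

    def subscribe(self, observer: Callable[[float], None]) -> None:
        self._observers.append(observer)

    def emit(self, value: float) -> None:
        # No thresholds or business rules here: every value is passed on as-is.
        for observer in self._observers:
            observer(value)


temperature = Observable()

# Observer 1 decides for itself what counts as "too warm".
too_warm = []


def alert_if_too_warm(celsius: float) -> None:
    if celsius > 25:
        too_warm.append(celsius)


# Observer 2 keeps every value, for example for later analysis.
history = []

temperature.subscribe(alert_if_too_warm)
temperature.subscribe(history.append)

for reading in [23.0, 25.5, 24.8, 26.2]:
    temperature.emit(reading)

print(too_warm)   # [25.5, 26.2]
print(history)    # [23.0, 25.5, 24.8, 26.2]
```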
One concept that is starting to emerge is data streams, where time becomes an aspect. Data that is shared over time becomes a flow of data, a flow that can be managed in a data stream. Data streams are often discussed in the context of Internet of Things (IoT) solutions, where sensors send data regularly. The data flow from a sensor can be defined as observable, and a service can observe that flow. However, data streams are not limited to IoT; any system that shares data with some degree of regularity can do so through a data stream.
A data stream normally gives you more than just the most recent value; you can also use the time aspect to help analyse it. Average, maximum and minimum values over a specific period of time are easy to understand, but the technology that enables data streams often also supports building complex analysis functions for the observers. ‘AI’ has made an impression here as well, and machine learning for data streams is available on the major platforms.
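Time-based aggregates like these are often computed over a sliding window. The sketch below assumes readings arrive as `(timestamp_in_seconds, value)` pairs in time order; the `SlidingWindow` class is a hypothetical helper, not the API of any streaming platform, which would typically offer windowing functions of its own.

```python
from collections import deque


class SlidingWindow:
    """Keeps the (timestamp, value) pairs from the last `window_seconds` seconds
    of a stream, so observers can ask for time-based aggregates."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._points = deque()

    def add(self, timestamp: float, value: float) -> None:
        self._points.append((timestamp, value))
        # Drop points that have fallen out of the window (assumes time order).
        while self._points and self._points[0][0] <= timestamp - self.window_seconds:
            self._points.popleft()

    def values(self):
        return [value for _, value in self._points]

    def average(self) -> float:
        values = self.values()
        return sum(values) / len(values)

    def minimum(self) -> float:
        return min(self.values())

    def maximum(self) -> float:
        return max(self.values())


# A one-hour window over readings taken every 20 minutes.
window = SlidingWindow(3600)
for ts, celsius in [(0, 22.0), (1200, 24.0), (2400, 26.0), (3600, 25.0)]:
    window.add(ts, celsius)

print(window.values())   # [24.0, 26.0, 25.0]  the reading at t=0 has expired
print(window.average(), window.minimum(), window.maximum())   # 25.0 24.0 26.0
```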
Summary of communication methods
So here is a summary to clarify the differences between the three communication methods, using one simple example.
We are going to build an app that will send a text message if the temperature in the office goes above 25 degrees.
To help us, we have three different sensors that communicate in different ways.
Sensor 1 communicates via REST API.
Our app can then regularly send a request to the sensor to obtain the current value.
If the value goes above 25 degrees, the app will send a text message.
However, we will not know whether the temperature has gone above 25 degrees at any point between two requests. This means that we can be tempted to ask very often. (Yes, one challenge with polling a REST API is that it can create a very large amount of traffic.)
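The polling problem is easy to simulate. In this sketch, `temperature_at` is a hypothetical temperature curve with a short spike, and `poll` stands in for the app sending one GET request per interval; no real sensor or HTTP is involved.

```python
def temperature_at(minute: int) -> float:
    """Hypothetical office temperature: 22 degrees, except for a short
    spike to 27 degrees between minutes 12 and 14."""
    return 27.0 if 12 <= minute <= 14 else 22.0


def poll(interval_minutes: int, duration_minutes: int, threshold: float = 25.0):
    """Simulates the app asking the sensor for its current value every
    `interval_minutes` (one GET request each time). Returns the minutes at
    which a value above `threshold` was seen and the number of requests made."""
    above, requests = [], 0
    for minute in range(0, duration_minutes + 1, interval_minutes):
        requests += 1
        if temperature_at(minute) > threshold:
            above.append(minute)
    return above, requests


print(poll(10, 60))   # ([], 7): the spike falls between two requests
print(poll(1, 60))    # ([12, 13, 14], 61): caught, at the cost of many more requests
```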
Sensor 2 communicates via events.
Our app can publish the threshold value the sensor should use (25 degrees) to a topic.
Our app will then listen to another topic that the sensor will publish to when the temperature goes above the threshold value. When this event happens, the app will send a text message.
This involves much less communication, but requires a ‘smarter’ sensor (which also means a more expensive one).
And we ‘lose’ data/information from the sensor. If we want to calculate the average temperature over the past 24 hours, we will not have any underlying data. Building a smarter sensor to resolve this will also be more expensive and will require the ability to maintain a distributed solution.
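Sensor 2 can be sketched with a tiny in-memory broker. Everything here is hypothetical: the topic names `sensor2/threshold` and `sensor2/above_threshold`, the `SmartSensor` class, and the list `text_messages` standing in for actually sending texts. The sensor does the evaluation itself and only publishes when the temperature crosses the threshold, so the app never sees the individual readings.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory pub/sub broker, just enough for this sketch."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class SmartSensor:
    """Hypothetical 'smarter' sensor that evaluates its own readings."""

    def __init__(self, broker):
        self.broker = broker
        self.threshold = None
        self.above = False
        # The threshold arrives as configuration on a topic.
        broker.subscribe("sensor2/threshold", self.set_threshold)

    def set_threshold(self, value):
        self.threshold = value

    def measure(self, celsius):
        if self.threshold is None:
            return
        now_above = celsius > self.threshold
        if now_above and not self.above:
            # Only the crossing is published; the readings themselves are never shared.
            self.broker.publish("sensor2/above_threshold", {"celsius": celsius})
        self.above = now_above


broker = Broker()
sensor = SmartSensor(broker)
text_messages = []
broker.subscribe(
    "sensor2/above_threshold",
    lambda event: text_messages.append(f"Office is at {event['celsius']} degrees"),
)
broker.publish("sensor2/threshold", 25)   # the app configures the sensor

for reading in [23.0, 24.5, 25.8, 26.3, 24.0, 25.4]:
    sensor.measure(reading)

print(text_messages)
```

The app ends up with two text messages and nothing else, which is exactly why it cannot compute something like an average over the past 24 hours.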
Sensor 3 communicates via a data stream.
Our app can observe the data stream, and we can set a threshold value (25 degrees) and the length of time the temperature has to remain above it before our app sends a text message.
The sensor can be relatively straightforward (inexpensive), and by compressing data we can keep the amount of data traffic down. As we have access to all the data from the sensor, we can try things out and continually develop new functions.
Maybe we want a trend curve to show the average temperature per hour over the last week?
Or maybe we want to train an ML model to predict whether the temperature will go above 25 degrees in the next hour, based on data from the last 24 hours?
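The threshold-plus-duration rule for Sensor 3 can be sketched as a small observer over the stream. It assumes the stream delivers `(timestamp_in_seconds, celsius)` pairs in time order; the function name `alerts_from_stream` and the 10-minute (600-second) duration are hypothetical choices, and compression and platform-specific streaming APIs are left out.

```python
def alerts_from_stream(readings, threshold=25.0, min_duration=600):
    """Hypothetical stream observer: `readings` is an iterable of
    (timestamp_seconds, celsius) pairs in time order. Yields the timestamp
    at which the temperature has stayed above `threshold` for at least
    `min_duration` seconds, once per warm period."""
    above_since = None
    alerted = False
    for ts, celsius in readings:
        if celsius > threshold:
            if above_since is None:
                above_since = ts          # start of a warm period
            if not alerted and ts - above_since >= min_duration:
                alerted = True
                yield ts                  # time to send the text message
        else:
            above_since = None            # back below the threshold: reset
            alerted = False


readings = [(0, 24.0), (60, 26.0), (120, 26.5), (180, 24.0)]   # short spike
readings += [(t, 25.5) for t in range(240, 901, 60)]           # sustained warmth
print(list(alerts_from_stream(readings)))   # [840]
```

The short spike is ignored, and only the warm period that lasts ten minutes triggers a message. Because the app still has every reading, the same stream could later feed other functions, such as the hourly trend or the ML model mentioned above.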
Remember that the technology you should use depends on your needs.
So no single approach fits all; you are likely to need all of the kinds of communication I have described above.
But different technologies create different opportunities, so it is all about being aware of this when making a choice, and about creating an architecture in which you can easily keep optimising, because new technologies that create new opportunities are emerging all the time.
No one person can have insight into all the new technologies, but at Softronic we have a broad range of expertise.
So contact us if you would like to get a better understanding of this or would like help realising the value of new technology, because this is what we do.
Blog post written by Stefan Palm for Softronic AB.