Today, companies are actively turning their attention to the cloud: they are thinking about optimizing and consolidating computation across their own and cloud resources, and looking for ways to cut the cost of "large" computing. As cloud services mature, computing increasingly resembles a utility, and consumers are no longer interested in the physical server infrastructure that performs the calculations.
We talked about the experience of designing and building his own computing platform, and about the prospects of this approach to consuming IT, with Anatoly Makarov, a leading architect in Russia and an expert in implementing technically complex, highly loaded, distributed systems.
Serverless architecture – what is behind this mysterious concept?
Speaking about serverless computing, you need to understand that physical and virtual servers cannot actually be dispensed with under any conditions. So we are not talking about abandoning hardware, but about a new approach: providing services in the form of functions.
For example, suppose I want to write the back end of an application. Either way there will be plenty of headaches: you have to prepare the infrastructure, work out the application's dependencies, think about the host operating system, and so on. And let's not forget fault tolerance and scaling under load.
The next stage of development is to take ephemeral containers in which the required dependencies are already pre-installed, and which are isolated from each other and from the host OS. Microservices, each of which can be updated and scaled independently of the others, can then run on any infrastructure.
But what if you don't want to configure containers? What if you don't want to think about scaling the application, or to pay for idle containers when the load on the service is minimal? You just want to write code: focus on business logic and ship products at the speed of light.
This is where serverless computing comes in. Serverless here means no infrastructure-management headaches.
The idea is that the application logic is broken down into independent, event-driven functions, each of which performs one microtask. All the developer has to do is upload the functions to the provider's console and bind them to event sources.
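To make this concrete, here is a minimal sketch of such an event-driven function. The handler signature follows the AWS Lambda convention for Python purely as an illustration; the event shape and field names are hypothetical.

```python
# A minimal FaaS-style function: the platform invokes the handler once
# per event, so there is no server or container for the developer to manage.
# The event structure below is an invented example.

def handler(event, context=None):
    # Each function performs exactly one microtask: here, normalizing a record.
    record = event.get("record", {})
    return {
        "id": record.get("id"),
        "name": str(record.get("name", "")).strip().lower(),
    }
```

Scaling, fault tolerance, and billing per invocation are then the platform's problem, not the developer's.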
FaaS services of this kind were first implemented in 2014 in the Open-Source Microservice Hosting Platform project. Amazon's AWS Lambda soon appeared, followed in 2016 by Google Cloud Functions, Microsoft Azure Functions, and IBM/Apache's OpenWhisk, and in 2017 by Oracle's Cloud Fn.
And what is a reasonable form of delivering computing as a service?
The most basic form is IaaS: you are provided with standard server infrastructure or storage.
Then came DaaS, DBaaS, PaaS… – in a word, XaaS, "anything as a service." All of this is nothing more than a search for forms of delivering the services that John McCarthy wrote about half a century ago.
The form of delivering software as a service (SaaS) has changed significantly over the years, and it now has three faces: containers (CaaS), applications (PaaS), and functions (FaaS). We could stop there, but marketers love the term serverless computing. In other words, FaaS is nothing but the next step towards utility computing.
How and when did you come to need this approach?
A few years ago I took part in a large federal project to transform the state system of registration, cadastre, and cartography.
My task was to create a system for analyzing a large volume of geographically dispersed data and then migrating it to the target system. We needed a high-performance cluster capable of processing about 200 TB of streaming data while utilizing all available resources as fully as possible. Classical architectures for such systems cannot exploit the full power of the cluster – hence the new vision, and the need for our own Lambda.
Tell us more about the project.
As I said, the task, globally, was to build a system for loading and processing data. At first glance this is a classic ETL process, but several factors had to be taken into account at once: the streaming nature of the data, its distribution and volume, the complexity of the computation chain, the cost of the infrastructure, and, of course, the deadline.
Initially, we built the solution's architecture around a queue system: the data-processing pipeline was a set of processor processes with input/output queues acting as transport between them, plus dedicated infrastructure for shared services. Ultimately this approach did not let us process data efficiently, for several reasons: downtime when dedicated nodes dropped out, statically pre-configured resources, and slow message transport. Overall utilization of the available resources left much to be desired.
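The queue-based design described above can be sketched in a few lines: processor stages connected by queues that act as the transport between them. In-memory queues and threads here are a toy stand-in for the real distributed message transport, and all names are illustrative; note how each stage sits idle whenever its input queue is empty, which is exactly the utilization problem described above.

```python
import queue
import threading

def stage(in_q, out_q, fn):
    # A processor stage: read from the input queue, transform, write to output.
    # A dedicated worker like this idles (wastes resources) when in_q is empty.
    while True:
        item = in_q.get()
        if item is None:       # poison pill: propagate shutdown downstream
            out_q.put(None)
            break
        out_q.put(fn(item))

# Two stages chained via queues: double the value, then add one.
src, mid, dst = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(src, mid, lambda x: x * 2)).start()
threading.Thread(target=stage, args=(mid, dst, lambda x: x + 1)).start()

for i in range(3):
    src.put(i)
src.put(None)

results = []
while (item := dst.get()) is not None:
    results.append(item)
# results == [1, 3, 5]
```

In the serverless rework, such long-lived dedicated stages are replaced by functions that the platform spins up only when events arrive.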
What were the solutions, was there anything ready?
By then, of course, AWS Lambda and Google Cloud Functions were already on the market, but those are closed cloud providers, and not under Russian jurisdiction, so they did not suit us from the start.
As a big fan of open and free software, I began to analyze existing solutions that could be reused. The open solutions available at the time did not suit us either: they were mostly built on almost the same dedicated-node architecture that we already had. The decision to build our own platform seemed reasonable and strategically correct.
What is the result? What effect did own development give?
The result is impressive. Moving some services to this paradigm allowed us to cut resource consumption by 30% without losing overall performance. It also let us run other processing functions, in isolation, on the same resources of a shared cluster, "compacting" our computations, and to raise the SLA for service availability and fault tolerance.
Any plans for the development of this platform?
Yes, of course. I now devote all my free time to popularizing and developing a free and open version of the solution. You can already reuse my work to build such systems. After I spoke at the major HighLoad++ conference, a community formed around the project. There is momentum, and it is inspiring.
Are you a fan of free and open source?
I have always been interested in open source software, but companies are not always ready to use such solutions in production systems. I think the paradigm will change over time. In my own projects I successfully use PostgreSQL, Cassandra, OpenShift, Docker, and Google's Kubernetes, and I am an expert on many Red Hat products and their open branches – Ceph, Gluster, Ansible. So I believe this is the only true development model.
Are you planning to attract investment?
Yes, I am working on it. Investment helps you reach your goals faster and improve the product.
When is the inevitable serverless future coming?
The past decade has accustomed us to the convenience of cloud services. The next step is platforms that provide higher-level services: queues, APIs, gateways, authentication tools. The tooling is now improving very quickly: both function-based serverless computing and flowchart-based systems have made significant progress over the past year. Container technologies are still under active development, but their relevance is driving adoption.
When the future we are discussing here arrives, it will even be possible to host previous-generation applications on the new platforms, which will be far less dependent on infrastructure. That said, such a qualitative transition will require a new approach to application design.
In general, the serverless technology industry is developing dynamically, and we can expect a lot of new and interesting things from it in the future.