Operators face many challenges on their way to the cloud, some more complex than others. There is no one-size-fits-all path for operators migrating to the cloud, because they have specific requirements for security, observability, reliability, and performance.
As a result, Microsoft offers operators far more support than simply leveraging and repackaging existing cloud services. We understand that it is critical to know exactly which workloads operators need, and what it takes to meet the demands that come from their commitment to providing fault-tolerant services to customers.
In this blog, we look at one example of Microsoft’s commitment to operators and to developing a practical product strategy that emerged from years of research. This research has led directly to solutions that we now offer operators as part of Azure for Operators.
Bringing a cellular packet core to the cloud
A few years ago, Microsoft began a research project to determine the feasibility of implementing a cellular core network, the Evolved Packet Core (EPC), in a hyperscale public cloud. This work resulted in a research prototype of a distributed network architecture for the EPC in the public cloud. It was implemented as a cloud service that ensures high network availability while compensating for the unpredictability of public clouds. While preserving the original EPC design, the cloud EPC prototype provided the same basic functionality as a cellular core network and was compatible with standard cellular equipment (such as phones and base stations).
To better understand the needs of operators planning to migrate to the cloud, we deployed the cloud EPC on Azure and tested it with a combination of real cell phones and synthetic workloads. It showed higher availability than the telecommunication solutions existing at the time, as well as performance comparable to many commercial cellular networks.
Figure 1: High-level network deployment of the research prototype
Ultimately, this work showed the potential of a distributed network architecture that leverages the public cloud, a testament to Microsoft’s ability to potentially relieve operators of the burden of managing their own infrastructure.
Going one step further
For Microsoft, the project above was just the beginning of building greater awareness and expertise, both of which are now being used for the benefit of operators on their cloud migration journeys. To better understand how the earlier cloud EPC prototype would perform in the real world while carrying actual mobile traffic, the findings from this early project were incorporated into a two-year trial on a live cellular network.
The real-world trial was created in collaboration with the City of Cambridge in the UK. It consisted of five cell towers installed in various locations around the city and was intended to benefit underserved communities that lacked traditional broadband access. Microsoft deployed the cloud EPC in public Azure regions in Dublin, Ireland, with a failover in Amsterdam, the Netherlands. The trial with this small network ran successfully over the entire period without a single failure. Microsoft gained a wealth of technological and operational data that we are now using for the benefit of operators.
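The primary-plus-failover arrangement described above can be illustrated with a minimal sketch. This is not the trial's actual failover mechanism, which the post does not describe. The region labels, the `select_active_region` helper, and the simulated health table are all hypothetical, standing in for whatever health probing a real deployment would use.

```python
from typing import Callable, Optional, Sequence

# Hypothetical labels based on the trial description; these are not Azure region identifiers.
REGIONS_IN_PRIORITY_ORDER = ("dublin", "amsterdam")


def select_active_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Simulated health state standing in for a real probe of the EPC endpoint.
health = {"dublin": True, "amsterdam": True}
print(select_active_region(REGIONS_IN_PRIORITY_ORDER, lambda r: health[r]))

health["dublin"] = False  # simulate an outage in the primary region
print(select_active_region(REGIONS_IN_PRIORITY_ORDER, lambda r: health[r]))
```

Keeping the priority order explicit means traffic returns to the primary region as soon as it reports healthy again, which may or may not be desirable in a real network.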
Figure 2: Data from the Cambridge study
Four important lessons learned
1. It’s very doable
One of the key findings of the Cambridge study is that a telecommunications virtual network function (VNF) can actually work in the hyperscale public cloud. Even though the EPC was hosted in the cloud outside the country, Microsoft was still able to deliver a live LTE network with solid network performance. Daily data traffic ranged between 20 and 40 GB, and peak connection throughput exceeded 20 Mbps (with 2×2 MIMO on 5 MHz). During this time, most users got a download speed of at least 4 Mbps.
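To put the daily traffic figures in perspective, the short calculation below converts a daily volume into an average rate across the whole network. It assumes decimal gigabytes (10^9 bytes) and traffic spread evenly over 24 hours, neither of which the study states. The `average_mbps` helper is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_BYTE = 8


def average_mbps(gigabytes_per_day: float) -> float:
    """Convert a daily traffic volume in decimal GB to an average rate in Mbps."""
    bits_per_day = gigabytes_per_day * 1e9 * BITS_PER_BYTE
    return bits_per_day / SECONDS_PER_DAY / 1e6


for volume_gb in (20, 40):
    print(f"{volume_gb} GB/day is about {average_mbps(volume_gb):.2f} Mbps on average")
```

Under these assumptions, the network-wide average works out to roughly 1.85 to 3.70 Mbps. That is well below the peak per-connection throughput, as expected when real traffic is concentrated in busy periods.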
2. The setup is much faster
Operators want to know what tools and services are available to help manage the complexity of moving to the cloud. Based on our experiment, locating the EPC in the cloud makes cellular deployment a lot easier. Sourcing and bringing traditional EPC equipment into operation could have taken months, not to mention the investment required. Instead, launching in a new Azure region took less than five minutes.
3. It’s very reliable
This study addressed concerns that hyperscale clouds cannot meet the high availability standard operators require, and demonstrated that VNFs running on Azure can be extremely reliable. When the reliability of various Azure components was measured over a period of three months, Azure achieved four nines of availability, which was sufficient for the trial. Other Azure services (such as Azure ExpressRoute, deployments in Azure availability sets and Azure availability zones, and reliable data storage) can be integrated into the deployment to further improve network availability.
4. It’s easy to maintain
Another important result of the experiment was the ease of network management. Microsoft was able to build a network management interface in Azure to handle day-to-day operations. Using Azure’s data analysis tools, we were able to monitor the health of the network and generate alerts via the Azure portal. We achieved all of this without writing a single line of code, so a single team member could manage the entire network.
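As a rough guide to what four nines means in practice, the sketch below computes the downtime budget for a 90-day window (an approximation of the three-month measurement period). It also shows how redundancy could raise availability under the simplifying assumption that replicas fail independently. Both helper functions are illustrative and are not taken from the study.

```python
def downtime_minutes(availability: float, days: float) -> float:
    """Downtime budget in minutes for a given availability over a period."""
    return (1.0 - availability) * days * 24 * 60


def parallel_availability(availability: float, replicas: int) -> float:
    """Availability of N redundant replicas, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** replicas


FOUR_NINES = 0.9999
print(f"{downtime_minutes(FOUR_NINES, 90):.2f} minutes of downtime in 90 days")
print(f"{parallel_availability(FOUR_NINES, 2):.8f} with two independent replicas")
```

At 99.99 percent, the budget is about 13 minutes of downtime over 90 days. Real deployments rarely fail fully independently, so the two-replica figure should be read as an upper bound.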
What does that mean for operators today?
Microsoft continues to refine its strategy and portfolio of services based on research such as the Cambridge Study, hands-on experience with hundreds of customers operating networks using advanced technology, and strong relationships with operators around the world. Some of the design features derived from these findings include:
Use microservices-based architectures to reduce footprint and improve performance
For example, our IP Multimedia Core Network Subsystem (IMS) was the first commercial cloud-native network function specifically designed to run in containers within hybrid cloud architectures. Because a small compute footprint is essential to meeting the financial requirements of operator infrastructures, care was taken to ensure that the microservices are granular enough to realize the advantages of reliability, flexibility, and scalability, but not so granular that they affect data persistence and performance.
Additional complexity had to be considered for network functions that support real-time user traffic. This required innovations in accelerating the data plane and the packet processing pipeline that enable near silicon-like throughput without excessive CPU cycles or custom hardware. These technologies were first deployed in our Session Border Controller and later in the 5G core, which is specifically designed for multi-access edge computing environments where resources are tightly constrained.
Similarly, solutions like Unity Cloud Orchestration take advantage of cloud-native technologies to simplify orchestration by using a single tool such as Kubernetes to manage containerized network functions. Unity Cloud reduces the time and complexity of network planning for capacity and high availability, because these functions are managed dynamically. Unity Cloud also uses microservices to simplify feature rollout and the software upgrade and patching process.
Use an automated management layer to reduce operating costs
With the service automation functions enabled by ServiceIQ and Unity Cloud Operations, which are exposed as APIs (application programming interfaces) to a business intelligence layer, operators can create networks that are more secure, flexible, efficient, scalable, and stable. These networks will be:
- Self-healing: Using big data and AI, the network can create predictive failure models, which are then combined with automated processes that change network configurations to avoid failure conditions.
- Self-defending: Behavioral analysis models can be created to identify abnormal behavior in network elements that may indicate a compromised component. An automated process could then sandbox the suspicious network element for further analysis and correction, or even revert it to the last known good configuration.
- Self-optimizing: AI recognizes patterns that lead to more efficient use of computing resources, radio resources, and energy settings, and adapts the network configuration accordingly.
- Self-configuring: When new network elements are added, they are automatically detected, provisioned, and configured on the network.
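The self-defending behavior in the list above can be sketched in a few lines. This is only a toy illustration, not how ServiceIQ or Unity Cloud Operations work. A simple z-score test stands in for the behavioral analysis models, and the `NetworkElement` class, its configuration keys, and the error-rate samples are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


class NetworkElement:
    """Hypothetical network element that remembers its last known good configuration."""

    def __init__(self, name: str, config: dict) -> None:
        self.name = name
        self.config = dict(config)
        self.last_known_good = dict(config)
        self.quarantined = False

    def apply(self, config: dict) -> None:
        self.config = dict(config)

    def quarantine_and_revert(self) -> None:
        """Isolate the element and restore the last known good configuration."""
        self.quarantined = True
        self.config = dict(self.last_known_good)


element = NetworkElement("element-1", {"max_sessions": 1000})
element.apply({"max_sessions": 50000})  # a change that later proves problematic

baseline_error_rates = [0.010, 0.012, 0.011, 0.009, 0.010]
if is_anomalous(baseline_error_rates, latest=0.25):
    element.quarantine_and_revert()

print(element.quarantined, element.config)
```

A production system would likely use richer models and a human review step before reverting, but the detect, isolate, and revert loop has the same shape.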
Regardless of whether an operator selects a Microsoft first-party VNF or cloud-native network function (CNF), or works with third-party VNF or CNF partners certified on the Azure platform, Azure ensures that the underlying cloud platform provides the orchestration, management, and exposure functions required to ensure resilience, manage performance, and automate the execution needed to support core network workloads.
Microsoft’s commitment to helping operators migrate to the cloud should not be underestimated. The research cited here is just one example of the many projects that we continue to pursue and that will continue to influence our roadmap. Ultimately, this knowledge is instrumental in helping operators make their migration to the cloud as smooth as possible. When it comes to reliability, speed, and consistency in hyperscale public cloud deployments, Microsoft believes no one is better suited to assist operators.
To learn more about our vision and strategy, take a look at our Azure for Operators: a cloud for network operators ebook and Azure for Operators infographic.