October 31st, 2019 by Brian Beeler
One of the biggest challenges that small and medium businesses (SMBs) and remote offices/branch offices (ROBOs) face is how to ensure business continuity if their primary site goes down. Typically, vendors and integrators suggest that they set up a secondary site that mirrors business-critical applications, send backups to a secure location, or both, so they can recreate their environment after a catastrophic failure. But these schemes have some serious drawbacks. For example, a secondary site can sit unused for weeks or months, and if you are very fortunate it may never be used at all. Even so, it must be maintained and kept functional so that it is ready if the primary site goes down. Having backups and a mechanism to restore them can alleviate some of the cost of a secondary site, but it presents problems of its own: your plan to restore your compute infrastructure from backups must be tested and verified on a regular basis, and the downtime your business suffers during a restore can be crippling.
We have been working with DataON for the last couple of months to get a better understanding of their Microsoft Azure Stack HCI solution (those articles can be located here, here, here and here). We asked the folks at DataON to help us understand the solutions they have in place to help customers that have deployed Microsoft Azure Stack HCI maintain business continuity.
As background information, Microsoft Azure Stack HCI is an on-premise, Windows Server-based hyperconverged solution that connects to Microsoft Azure cloud services. It allows companies to run virtualized applications on-premises with direct access to Azure management tools and services, including Azure Site Recovery (ASR).
Many people get confused about Microsoft Azure Stack HCI and Azure Stack. While both are on-premise solutions, Azure Stack runs Azure OS with Azure Services, while Azure Stack HCI runs Windows Server OS with Azure Services. Azure Stack is an IaaS and PaaS solution while Azure Stack HCI allows you to run your virtualized workloads in the manner that you are used to with the added benefit of connecting to Azure cloud for additional services.
DataON is a Microsoft cloud provider that provides their customers with proven engineered solutions they support fully, from hardware to cloud. This prevents organizations from having to try and locate a specialist to solve complex problems as DataON takes ownership of problems regardless of where the issue is. As a bonus, DataON offers Microsoft cloud services for less than you can acquire them directly from Microsoft.
With that background out of the way, let’s discuss some of the DR options for Microsoft Azure Stack HCI. By the way, HCI is an acronym that typically means Hyperconverged Infrastructure, but in this case it may better stand for Hybrid Converged Infrastructure, as it can seamlessly integrate your on-premise IT infrastructure with Microsoft cloud services.
At its core, DataON’s DR solution for Microsoft Azure Stack HCI uses Azure Site Recovery, Azure’s built-in disaster recovery as a service (DRaaS), as the backup site. This allows organizations to avoid the cost of setting up and maintaining a secondary site and the time constraints of restoring the primary site from backup media. Microsoft has put a lot of time into developing Azure Site Recovery, and Gartner, in its 2018 Magic Quadrant for Disaster Recovery as a Service, recognized it as a leader in DRaaS. To help visualize all of the moving parts, Microsoft has a relevant graphic showing the topology of Azure Site Recovery. The topology shows Microsoft Azure Stack HCI as the on-premise compute resource and the Azure Cloud as the DRaaS secondary site.
The topology is a bit deceiving, as the Azure Cloud shows compute resources as well as the replica (backup) disks. In reality you only use and, more importantly, are only charged for the replica disks on the secondary site until you need to recover, or test the recovery of, your site on Azure Cloud. The topology below shows the components that you will be using and paying for most of the time; you will not be charged for the grayed-out resources until you actually consume them. To help your understanding of the charges associated with Azure Site Recovery, you can visit a page Microsoft put up here.
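To make the billing distinction concrete, here is a minimal sketch in Python of the "pay for replica disks all the time, pay for compute only during a failover" idea described above. The class name, the parameters and every rate in it are hypothetical placeholders chosen for illustration; they are not Microsoft's prices, and the real line items should be taken from Microsoft's pricing page.

```python
from dataclasses import dataclass


@dataclass
class DrCostModel:
    """Illustrative model only; both rates are made-up placeholders."""

    replica_storage_per_gb_month: float  # charged every month for replica disks
    compute_per_vm_hour: float           # charged only while VMs run in Azure

    def monthly_cost(self, replica_gb: float, vm_count: int, failover_hours: float) -> float:
        # Standby cost: replica disks are stored whether or not a failover occurs.
        standby = replica_gb * self.replica_storage_per_gb_month
        # Active cost: compute is consumed only during a failover or test failover.
        active = vm_count * failover_hours * self.compute_per_vm_hour
        return standby + active


# Hypothetical rates and sizes, for illustration only.
model = DrCostModel(replica_storage_per_gb_month=0.05, compute_per_vm_hour=0.20)
quiet_month = model.monthly_cost(replica_gb=500, vm_count=4, failover_hours=0)
test_month = model.monthly_cost(replica_gb=500, vm_count=4, failover_hours=8)

print(f"Month with no failover: {quiet_month:.2f}")
print(f"Month with an 8-hour test failover: {test_month:.2f}")
```

In a month with no failover the compute term is zero, which is the point the grayed-out resources in the graphic are making.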
Setting up a DR site can be painful, but Microsoft and DataON have greatly simplified the process by using Microsoft Cloud Services and Windows Admin Center (WAC). WAC is a locally-deployed, browser-based management tool that can manage on-premise and Azure cloud-based instances of Windows 10 and Windows Server. The idea behind WAC was to simplify server management by placing the tools system administrators use most often into a single pane.
Before delving into Azure Site Recovery, let’s look at WAC. WAC is installed on a client system and uses remote PowerShell and Windows Management Instrumentation (WMI) over WinRM (Windows Remote Management) to monitor and manage Windows systems, including HCI clusters and Azure virtual machines.
WAC’s overview pane gives you a quick view of CPU, memory, network and disk activity, along with a restart button for the systems that you are monitoring. WAC includes tools for certificate and device management, an event viewer, a file browser, a local users and groups manager, a firewall manager, a process viewer, a Registry editor, Role and Feature installation, a Services manager and a Storage manager. WAC even includes a Windows Update manager tool.
WAC also allows you to open a remote PowerShell session and an RDP session to the server if desired.
An example of WAC’s management capabilities can be seen in how it measures the overall health of a system and displays metrics in real time.
Microsoft realized that WAC cannot be all things to all people, so it allows third parties to build extensions for it. DataON was one of the first companies to take advantage of this ability and quickly ported its MUST extension to WAC. DataON MUST (Management Utility Software Tool) provides infrastructure visibility, monitoring, and management for Windows Server-based hyper-converged infrastructure, networking and storage.
Diving deeper into the DataON MUST extension we saw how it can be used to show the usage and health of the storage on a system.
In this case we see how DataON capitalizes on its Intel Select vendor status and uses Intel Optane SSD DC P4800X Series PCIe 3.0 x4 NVMe drives for both caching and capacity. We did a full review of the Intel Optane SSD DC P4800X (located here) and found that for low-latency workloads, there is currently nothing that performs as well.
The beauty of WAC is that it can be used with Azure Hybrid Services such as ASR; this ability is the backbone of using Azure as a recovery site for your on-premise VMs. WAC has tools that make it extremely easy to consume Azure services.
The initial setup for ASR requires you to register your WAC with Azure. To do this you use the Azure hybrid services tool in WAC. Once this has been completed you will be able to create and connect Azure services using WAC.
ASR copies VMs from your on-premise site to the Azure cloud and then keeps them in sync according to a recovery point objective (RPO) of your choosing. If you have an on-premise outage, you can then start the replicated VMs on Azure cloud to keep your IT running.
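As a rough illustration of what a recovery point objective means in practice, the Python sketch below flags any VM whose newest recovery point is older than the chosen RPO. It only shows the arithmetic behind the concept; it does not call ASR or reflect how ASR is implemented, and the function name, VM names and timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def rpo_violations(last_recovery_points, rpo, now):
    """Return the names of VMs whose newest recovery point is older than the RPO."""
    return sorted(
        name
        for name, taken_at in last_recovery_points.items()
        if now - taken_at > rpo
    )


# Hypothetical VM names and recovery point times, for illustration only.
now = datetime(2019, 10, 31, 12, 0)
points = {
    "sql-01": now - timedelta(minutes=4),   # within a 15-minute RPO
    "web-01": now - timedelta(minutes=20),  # older than a 15-minute RPO
}

late = rpo_violations(points, rpo=timedelta(minutes=15), now=now)
print("VMs outside the RPO:", late)
```

A shorter RPO means less data at risk if the primary site is lost, at the cost of more frequent synchronization.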
To set up ASR using WAC, you select the virtual machines tab under tools and then select Update now. The update wizard will walk you through creating a new resource group and the Recovery Services vault in the geographical location of your choosing.
You will then have the option to select a VM and protect it. The VM can be running while you protect it. After the VM has been copied over to Azure cloud it will show as Protected.
Once the VM has a protected status, you can use the Azure portal to fail over the protected VM. You can also use Test Failover to bring up the VM in a sandbox that does not have network connectivity, so it will not interfere with production servers.
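The protection steps above can be pictured as a simple lifecycle. The Python sketch below is our own simplified model of that lifecycle, not Azure's. Apart from "Protected", the state names are hypothetical labels rather than status strings from WAC or the Azure portal, the return from a test failover to the protected state is an assumption about cleanup, and failback is left out entirely.

```python
from enum import Enum


class VmState(Enum):
    UNPROTECTED = "unprotected"
    REPLICATING = "replicating"      # initial copy to Azure in progress
    PROTECTED = "protected"          # the status shown once the copy completes
    TEST_FAILOVER = "test failover"  # running in an isolated sandbox
    FAILED_OVER = "failed over"      # running in Azure in place of on-premise


# Allowed transitions in this simplified model (an assumption, not Azure's rules).
ALLOWED = {
    VmState.UNPROTECTED: {VmState.REPLICATING},
    VmState.REPLICATING: {VmState.PROTECTED},
    VmState.PROTECTED: {VmState.TEST_FAILOVER, VmState.FAILED_OVER},
    VmState.TEST_FAILOVER: {VmState.PROTECTED},
    VmState.FAILED_OVER: set(),
}


def advance(current, target):
    """Move to the target state, rejecting transitions the model does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


# Walk one VM through protection, a test failover, and then a real failover.
state = VmState.UNPROTECTED
for step in (
    VmState.REPLICATING,
    VmState.PROTECTED,
    VmState.TEST_FAILOVER,
    VmState.PROTECTED,
    VmState.FAILED_OVER,
):
    state = advance(state, step)
    print(state.value)
```

The model encodes the one constraint the walkthrough makes explicit: a VM has to reach the protected state before either kind of failover is available.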
We have not had the opportunity to work with WAC in depth before and were pleasantly surprised by the functionality it provides for on-premise HCI servers, the VMs running on them, and Azure cloud services and objects. By integrating WAC with both on-premise systems and its cloud services, Microsoft has greatly simplified the process of disaster recovery. We do realize that we protected only a single VM using ASR and that protecting an entire datacenter is far more complicated, but it does appear that Microsoft has put all the pieces in place to do just that. DataON has documentation on the process used to protect their on-premise HCI solution with ASR, offers professional services for customers that need assistance with more complicated DR planning and implementation, and offers Azure services for less than you can get them from Microsoft.
For many HCI customers, the cloud still sounds ominous and complicated. By enabling ASR within Azure Stack HCI, Microsoft and its partners give organizations easy first steps for engaging with the cloud in a meaningful way. In our experience the process is very straightforward; ASR should provide a springboard to greater cloud engagement for Azure Stack HCI customers.
This report was sponsored by DataON. All views and opinions expressed in this report are based on our unbiased view of the product(s) under consideration.