Microsoft Azure Stack HCI Review (DataON HCI-224 with Intel Optane NVMe) | StorageReview.com


by StorageReview Enterprise Lab

To date, we’ve taken a deep dive into Microsoft Azure Stack HCI, the on-premises implementation of Microsoft’s Azure cloud service. Azure Stack HCI can be seen as a best-of-both-worlds kind of platform. It has all of the administration tools from Azure like Azure Monitor, Azure Security Center, Azure Update Management, Azure Network Adapter, and Azure Site Recovery, while housing the data on-prem and meeting certain regulations. Azure Stack HCI is broken down into three parts: software-defined architecture, Azure services, and hardware.

Choosing the right hardware is important, as we detailed in our article, “The Importance of Hardware in Microsoft Azure Stack HCI.” The first step to deploying Azure Stack HCI is to find a certified hardware vendor, in this case, DataON. DataON has had a strong partnership with both Microsoft and Intel for several years and brings this partnership to full realization in the hardware architecture for Azure Stack HCI in an Intel Select configuration. An interesting aspect of the partnership with Intel is the ability to leverage the company’s PMEM (and of course, its latest processors) with Azure Stack HCI.

In many cases, the DataON HCI Intel Select solutions are configured and shipped in their own rack, ready to deploy immediately. This delivery method is particularly useful at the edge, where existing IT infrastructure is limited or nonexistent. In the StorageReview lab, we deployed the four storage and compute nodes, domain controller, and switches as diagrammed below.

Build and Design

The Microsoft Azure Stack HCI cluster we reviewed is built on the DataON HCI-224 All-Flash NVMe platform. These servers are 2U in size with 24 NVMe bays up front, offering plenty of expansion in the rear for PCIe-based components. The labeling is high contrast against the matte-black drive caddies, making it easy to spot specific drives if the time comes to swap one out. Everything is labeled, which isn’t that unusual, but the extent of the labeling is. Our deployment has each node labeled (1 through 4), as well as a number of other items that make it easy to deploy and manage in the datacenter.

Our configuration came equipped with 48 NVMe SSDs, or 12 per node. These included four 375GB Intel Optane P4800X SSDs and eight 2TB Intel P4510 SSDs per node.

In the back, we have two dual-port 100GbE Mellanox ConnectX-5 NICs, providing a fully redundant connection through two Mellanox 100GbE switches (SN2100) for cluster network traffic. Not shown in our studio photos are all of the connections, with full labeling on each end of the corresponding network cable to allow for error-free cabling at the deployment stage.

StorageReview Microsoft Azure Stack HCI DataON Cluster Diagram

Prior to this, we’ve never had a solution arrive with this level of documentation, down to the label. Microsoft and DataON make deploying Azure Stack a painless process so customers can get operational immediately. Each cable is color-coded to its specific use and labeled for where each end goes. Combined with the customized sheet DataON provides customers, it nearly guarantees an error-free deployment. In our deployment, the system was pre-configured with IP addresses prior to shipment, with the IP addresses for management and IPMI labeled.

Management and Usability

For customers running a Hyper-V shop on Windows Server, Microsoft Azure Stack HCI will be an easy transition. Many of the same management tools are in place, with many offering a more integrated and straightforward workflow. In our review process, we leveraged both the Windows Failover Cluster Manager to manage the DataON HCI cluster, as well as Windows Admin Center to monitor workloads and see how they were performing.

Looking more at the node level first, through a Microsoft Remote Desktop (RDP) session logged into one of the nodes, we looked at the Windows Failover Cluster Manager. This offers both node-level management capabilities as well as cluster-level visibility. This type of access would be more geared toward initial deployment, where day-to-day monitoring would take place from Windows Admin Center.

First up, we click on our particular cluster and get general information about it, the ability to configure it, and a look into resources. This gives a summary view of the selected cluster, allowing you to see where things are and start drilling into specific areas.

Next up is failover roles. Here we can see all of the Hyper-V VMs running on the cluster. Shown are the many vmfleet VMs we used for stress testing the cluster.

Networks allows us to see which cluster networks are available and the status of each. Selecting a cluster network lets you see the underlying network card associated with it, as well as its IP address.

Under the storage selection are Disks, Pools, and Enclosures. For Disks, one can click on virtual disks and get information such as status, where it is assigned, the owner node, the disk number, partition style, and capacity. Users can drill down a bit deeper as well, with even more information provided, such as Pool ID, name, and description, as well as Virtual Disk ID, name, and description, health and operational status, and resiliency.

Pools are similar, with information on a given storage pool such as status, health, owner node, operational state, and total capacity, as well as free and used space.

Under Nodes, one can easily see all the nodes in the cluster and their status.

On the right, one can switch to failover disks and see the individual disks for a given node at the bottom.

From the same sidebar, one can also look at the network for a given node.

While Windows Failover Cluster Manager is a more “down to the details” management tool, it requires users to connect through Windows Remote Desktop to a server itself (or another server connected to that cluster) to work with it. While this management style is fine for many uses, Microsoft did make things easier with a newer platform called Windows Admin Center. Unlike Failover Cluster Manager, Windows Admin Center is entirely web-browser based, making it easier to connect to from any computer or tablet in the office. It also offers a modernized and aesthetically pleasing look and feel, making day-to-day monitoring a more pleasant task. It offers a look into much of the same information, with a stronger focus on activity monitoring that Failover Cluster Manager doesn’t offer to the same extent.

Once Windows Admin Center is connected to a cluster, you can drill down into specific areas to view and manage operations. Here we see overall cluster compute performance information, which keeps track of the overall resources the VMs are utilizing.

While Windows Admin Center is great for viewing activity, you’re still able to interact with the VMs in your cluster. Below we’re powering on a number of vmfleet VMs.

Users can also drill into information on specific VMs.

Under roles, we get a slightly different take on roles but much of the same key information.

Under settings, users can download, install, and update extensions for Azure.

Through Windows Admin Center, we can also go into the Hyper-Converged Cluster Manager to look more closely at compute and storage. We open up to the Dashboard, which has general information like the number of servers, drives, VMs, and volumes, as well as the usage of CPU, memory, and storage. Along the bottom of the dashboard is cluster performance, broken down into a selected timeframe and into IOPS and latency.

Under compute, admins can drill into the servers themselves for management, including removal of a server from the cluster. Here, there is general information about the server such as uptime, location, domain, manufacturer, model, serial number, OS name, version, and build number. Users can also look at performance specific to the server.

Clicking on the Volumes tab brings users to a summary of all the volumes on the cluster. The health of the volumes is color-coded: green for healthy, red for critical, and yellow for warning. Performance is also tracked for all the volumes, broken down by timeframe and into IOPS, latency, and throughput.

Drilling down into a single volume gives specific properties of the volume including status, file system, path, fault domain awareness, total size, used size, resiliency, and footprint. There are optional features (deduplication and compression, as well as integrity checksums) that can be turned on or off here. The capacity is shown graphically, indicating used versus available. And again, we see performance.

Under the Drives tab we get a summary of all the drives in the system. Here we see the total number of drives and whether or not there are any alerts, with the same color coding as volumes. We can also see the capacity: used, available, and reserve.

Clicking on Inventory, we get a list of all the drives and several details. The details include the status of the drive, its model, the capacity, the type, what it is used for, and the amount of storage used.

We can drill down into a single drive and see properties such as status, location, size, type, used for, manufacturer, model, serial number, firmware version, and the storage pool it is in. We can see the amount of capacity used versus available for the individual drive and its performance in IOPS, latency, and throughput.

Below the performance, we can also see drive latency and error statistics.

Performance

Performance inside the Microsoft Azure Stack ecosystem has always been great, a strong suit that has carried through from the Storage Spaces days. With that in mind, we looked at some common benchmarking workloads in this review to allow users to see how well this platform compares to other HCI solutions on the market. We used workloads that stress random small-block transfers as well as large-block sequential transfers to show what this Microsoft solution can offer. In our Azure Stack HCI review, we leveraged vmfleet for performance benchmarks, whereas on VMware or bare-metal Linux we use vdbench.

For the performance testing here, we tested the system with both a 2-way mirror and a 3-way mirror. The mirror refers to the method of data protection (either two copies or three copies of the data). Obviously, with more copies, users will lose some usable capacity. From a performance perspective, 3-way should lead to better reads through the increase in parallelism, while 2-way is better for write performance, with a third less network traffic.
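
To make that capacity and write-amplification trade-off concrete, here is a minimal back-of-the-envelope sketch in Python (illustrative numbers only, not DataON sizing guidance):

```python
def mirror_overview(raw_capacity_tb: float, copies: int) -> dict:
    """Rough view of a mirrored storage pool.

    copies=2 models a 2-way mirror, copies=3 a 3-way mirror. Real clusters
    also reserve capacity for repairs and caching, so treat the usable
    figure as an upper bound.
    """
    return {
        "copies": copies,
        "usable_capacity_tb": raw_capacity_tb / copies,  # capacity lost to extra copies
        "writes_per_host_write": copies,                 # each host write lands on every copy
    }

# Illustrative 100TB raw pool, not this cluster's exact pool size
for copies in (2, 3):
    print(mirror_overview(raw_capacity_tb=100.0, copies=copies))
```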

For our 4K random test, the 2-way mirror saw a read throughput of 2,204,296 IOPS at an average latency of 247µs and a write throughput of 564,601 IOPS at an average latency of 3.69ms. The 3-way mirror saw a read throughput of 2,302,610 IOPS at an average latency of 170µs, and for write, a throughput of 338,538 IOPS at an average latency of 9.12ms. To put some of this into perspective, VMware’s vSAN offering, using two Optane SSDs and four NVMe capacity SSDs per node, measured 521K IOPS 4K read at its peak and 202K IOPS write.

Next up, we look at our 32K sequential benchmark. For reads, we saw the 2-way hit 42.59GB/s and the 3-way hit 39.48GB/s. For writes, the HCI gave us 13.8GB/s for the 2-way and 7.19GB/s for the 3-way.

Continuing with our sequential work, we move on to our 64K tests. Here the 2-way hit 39.5GB/s read and 15.24GB/s write, and the 3-way hit 46.47GB/s read and 7.72GB/s write. Compared to vSAN, the read bandwidth isn’t even close; bandwidth in its tests topped out at just over 5.3GB/s with a 64K block size. Write bandwidth showed a similar gap, with vSAN topping out at 2.55GB/s.
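
For readers who want to relate these bandwidth figures back to IOPS (or vice versa), bandwidth is simply IOPS multiplied by block size. A quick sketch, assuming KiB block sizes and decimal GB/s (vendor reporting conventions can differ slightly):

```python
def bandwidth_gbps(iops: float, block_kib: int) -> float:
    """Approximate bandwidth in GB/s from IOPS at a given block size (KiB)."""
    return iops * block_kib * 1024 / 1e9

def iops_from_bandwidth(gbps: float, block_kib: int) -> float:
    """Approximate IOPS implied by a bandwidth figure at a given block size."""
    return gbps * 1e9 / (block_kib * 1024)

# e.g. the 2-way mirror's 39.5GB/s at 64K works out to roughly 600K IOPS
print(f"{iops_from_bandwidth(39.5, 64):,.0f} IOPS")
```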

Our next benchmark is SQL with mixed read/write performance. Here the 2-way had a throughput of 1,959,921 IOPS at an average latency of 324µs. The 3-way hit 1,929,030 IOPS with an average latency of 185µs. The SQL workload is another area where Azure Stack HCI is able to show its strength, measuring just under 2 million IOPS, whereas VMware’s vSAN in the same workload profile measured 321K IOPS.

With SQL 90-10, the 2-way hit 1,745,560 IOPS with an average latency of 411µs, and the 3-way had 1,547,388 IOPS at 285µs of latency.

For SQL 80-20, the 2-way had a throughput of 1,530,319 IOPS at 581µs of latency. The 3-way hit 1,175,469 IOPS at 681µs of latency.

SPECsfs

Next up is our SPECsfs 2014 SP2 benchmark, a new test for us here. SPECsfs is a benchmark suite that measures file server throughput and response time. The benchmark gives us a standardized method for comparing performance across different vendor platforms. The benchmark operates by setting a scale and incrementing it upward until latency grows too great for the specifications of the benchmark. Here we look at the scale that can be reached before 11ms is breached, as well as the bandwidth the server hits when it fails the latency requirement.
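
The pass/fail logic we apply when reading the results is simple: walk the scale points in order and take the last one under the 11ms ceiling as the final passing build. A minimal sketch, using the 3-way mirror values from the tables that follow:

```python
LATENCY_CEILING_MS = 11.0

def last_passing_point(results):
    """results: (scale, latency_ms, bandwidth_kbps) tuples in ascending scale order."""
    passing = None
    for scale, latency_ms, bandwidth_kbps in results:
        if latency_ms > LATENCY_CEILING_MS:
            break  # benchmark criteria breached; everything beyond is a fail
        passing = (scale, latency_ms, bandwidth_kbps)
    return passing

# Values taken from the 3-way mirror tables below
three_way = [(1300, 8.557, 3888303), (1400, 10.449, 4026720), (1500, 11.285, 4000079)]
print(last_passing_point(three_way))  # -> (1400, 10.449, 4026720)
```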

We’ll look at latency first, as it sheds more light on why the bandwidth stopped where it did in the second half. The scales and their latencies for both the 2-way and 3-way are in the table below:

SPECsfs Latency (ms)
Scale DataON HCI-224 2-Way Mirror DataON HCI-224 3-Way Mirror
100 0.243 0.262
200 0.329 0.371
300 0.466 0.499
400 0.636 0.699
500 0.753 0.896
600 0.953 1.083
700 1.113 1.314
800 1.326 1.557
900 1.501 1.826
1000 1.88 2.167
1100 2.061 2.807
1200 2.323 4.64
1300 2.749 8.557
1400 5.47 10.449
1500 8.616 11.285 (fail)
1600 10.485 11.414 (fail)
1700 11.069  
1800 11.697 (fail)  
1900 12.51 (fail)  

As one can see, both configurations started near 250µs, the 2-way slightly below and staying that way throughout. At a scale of 1500, the 3-way failed, going to 11.285ms, giving it a passing range of 262µs to 10.45ms. The 2-way failed at a scale of 1800, hitting 11.7ms, giving it a passing range of 243µs to 11.07ms.

The next table shows the bandwidth for each configuration at each build, with the same failure points noted as in the latency table above.

SPECsfs Bandwidth (KB/s)
Scale DataON HCI-224 2-Way Mirror DataON HCI-224 3-Way Mirror
100 300897 300880
200 600372 600857
300 901672 902964
400 1202779 1203106
500 1504492 1503394
600 1805952 1806455
700 2105973 2108432
800 2408183 2406171
900 2710895 2707106
1000 3007499 3009280
1100 3308648 3308168
1200 3608244 3610219
1300 3910414 3888303
1400 4212976 4026720
1500 4513454 4000079 (fail)
1600 4587183 4229678 (fail)
1700 4621067  
1800 4630352 (fail)  
1900 4569824 (fail)  

For bandwidth, both configurations ran neck and neck in roughly 300MB/s increments until the 3-way failed on latency, with a final passing bandwidth of 4.02GB/s; the 2-way had a final passing bandwidth of 4.62GB/s.

Conclusion

It’s been a while since we’ve gone this deep into anything in the Microsoft storage-centric stack, and boy, are we glad to be back. With the rebranded Microsoft Azure Stack HCI solution, Microsoft has done something so basic and fundamental that it’s easy to under-appreciate. Microsoft has made their HCI solution dead simple to operate without layering anything on top that dampens performance. As seen in our results, the DataON cluster we’ve been testing posted tremendous numbers, the fastest we’ve seen in a mid-market 4-node HCI cluster. To be fair, we’re not even testing the latest and greatest hardware from DataON, either. While this config is clearly no slouch, complete with Intel Optane DC SSDs, DataON offers faster solutions that take advantage of 2nd Generation Intel Xeon CPUs, persistent memory, and faster networking. The fact that there’s even more performance available in an Azure Stack HCI solution is exciting, but it’s also important to remember the solution can scale down as well, to deployments as small as a two-node HCI cluster that can be configured switchless for a low-cost edge or SMB solution.

Drilling into the performance numbers, the Microsoft Azure Stack HCI cluster was able to offer an incredible amount of I/O and bandwidth. In the four-corners realm, we measured in excess of 2.3M IOPS 4K random read with a 3-way mirror configuration, and 338K IOPS 4K random write. For those who require greater write performance, a 2-way mirror configuration was able to increase the 4K random write speed to 564K IOPS. Bandwidth, though, is where the Microsoft Azure Stack really shines. In our 64K block sequential transfer workload, the 2-way mirror measured 39.5GB/s read and 15.24GB/s write, while the 3-way mirror measured 46.47GB/s read and 7.72GB/s write. This far exceeds what we have measured from previous HCI clusters.

Overall, Microsoft’s Azure Stack HCI solution proved to be simple to deploy, easy to manage, and exceptionally performant, all things you want. DataON, as a solutions partner, excelled in providing a turnkey build, offering built-to-spec hardware with clear instructions, ultimately sold in a configuration that can be up and running in no time. Customers can even skip the wiring in many cases, so it really comes down to the specific need. Either way, Azure Stack HCI combined with Intel Optane, Intel NVMe SSDs, and Mellanox 100GbE networking proved itself a force to be reckoned with.

DataON HCI Solutions
