I recently wrote a PowerShell script for a customer to snapshot and restore an existing VM. The script requires only three parameters: VM name, snapshot name, and action (backup or restore). Its logic revolves around the snapshot name. When you run a protection (backup) with a given snapshot name, the script creates one snapshot per disk in the same resource group as the original VM. Each snapshot is named after its disk, with .snap.<snapshotname> appended.
The snapshot name also drives a restore. In that case, the script creates a managed disk called <OriginalDiskName>.<Snapshotname> from each snapshot. As a final step, it replaces all of the VM's disks with the newly created ones. The original disks are kept, so they can serve as a plan B if an unexpected problem requires us to return the VM to its original state.
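The naming convention is what ties a backup to a later restore, so it is worth seeing on its own. Below is a minimal Python sketch of that convention. It assumes the disk names are already known; in the real runbook they would come from the VM's configuration. The helper names and example disk names are hypothetical, and the code only builds names without calling Azure.

```python
from typing import Dict, Iterable, Tuple

SNAP_MARKER = ".snap."

def snapshot_name(disk_name: str, snapshot: str) -> str:
    """Backup: the snapshot of a disk is named '<disk>.snap.<snapshot>'."""
    return f"{disk_name}{SNAP_MARKER}{snapshot}"

def restored_disk_name(snapshot_resource: str, snapshot: str) -> str:
    """Restore: snapshot '<disk>.snap.<snapshot>' becomes managed disk '<disk>.<snapshot>'."""
    suffix = f"{SNAP_MARKER}{snapshot}"
    if not snapshot_resource.endswith(suffix):
        raise ValueError(f"{snapshot_resource!r} does not belong to snapshot {snapshot!r}")
    original_disk = snapshot_resource[: -len(suffix)]
    return f"{original_disk}.{snapshot}"

def restore_plan(disk_names: Iterable[str], snapshot: str) -> Dict[str, Tuple[str, str]]:
    """Map each current disk to (source snapshot, new disk); the original disks are left untouched."""
    plan = {}
    for disk in disk_names:
        source = snapshot_name(disk, snapshot)
        plan[disk] = (source, restored_disk_name(source, snapshot))
    return plan
```

For example, `snapshot_name("apvm_OsDisk", "Patch5")` yields `apvm_OsDisk.snap.Patch5`, and restoring from that snapshot produces a disk named `apvm_OsDisk.Patch5`. Keeping the original disk out of the plan's outputs is what preserves the plan B described above.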
The same customer was using the script to protect and restore more than 30 VMs in the same resource group, and performance wasn't ideal. The script is linear, processing one machine at a time. With 30 VMs, each with several disks, a run took a couple of hours to complete.
The second complaint was about the process. A PowerShell script saves a lot of time and consistently performs all validations, but the customer wanted something simpler and more resilient. Based on these requirements, I had to implement two changes. First, move the script to Azure Automation to create the snapshots. Azure Automation is an excellent fit and makes the process accessible to anyone on the team. However, the original script saved JSON files on the local file system, so we had to incorporate a Storage Account to keep that information in a central location. Second, improve performance. The only way to do that was multithreading, which is not that simple because it requires refactoring the current script.
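The speed-up comes from fanning out one unit of work per VM instead of looping over the VMs one at a time. The author did this with PowerShell Workflow. As a language-neutral illustration only, here is the same fan-out pattern in Python using `concurrent.futures.ThreadPoolExecutor`. `protect_vm` is a hypothetical stand-in that simulates the per-VM work with a short sleep rather than calling Azure.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

def protect_vm(vm_name: str, snapshot: str) -> str:
    """Hypothetical stand-in for 'snapshot every disk of one VM'."""
    time.sleep(0.2)  # simulate waiting on a slow remote snapshot operation
    return f"{vm_name}: snapshot {snapshot} created"

def protect_all(vm_names: List[str], snapshot: str, max_workers: int = 10) -> List[str]:
    """Run protect_vm for every VM concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits one task per VM and yields the results in the original order
        return list(pool.map(lambda vm: protect_vm(vm, snapshot), vm_names))

if __name__ == "__main__":
    vms = [f"vm{i:02d}" for i in range(30)]
    start = time.perf_counter()
    results = protect_all(vms, "Patch5")
    print(f"{len(results)} VMs processed in {time.perf_counter() - start:.1f}s")
```

With 30 simulated VMs and 10 workers, the run takes roughly three rounds of 0.2 seconds instead of 30 sequential ones. Threads are a reasonable fit here because each task spends most of its time waiting on a remote operation rather than using the CPU.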
Understanding the runbook operation
The goal is to let the operations team use an Azure Automation runbook to create snapshots of all VMs in any given resource group and, when required, restore them to that point in time.
We are going to work on a scenario similar to the image below. The current VM (apvm) has four disks: one for the operating system and three data disks. We also created two snapshots, called Time0 and AP6. Each snapshot is simply a snapshot resource in the same resource group, named after the original disk with .snap.Time0 or .snap.AP6 appended.
To make sure the script works as expected, I used a different storage account type for each data disk and configured different host caching settings. This way, I could confirm that those settings are preserved when a snapshot is restored.
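To preserve those settings, the runbook has to record them at backup time, which is what the JSON files mentioned earlier are for. The author's actual schema isn't shown, so the field names below are hypothetical. The sketch only captures per-disk settings (SKU, host caching, LUN) to JSON and reads them back, so that a restore can recreate each disk with the same configuration. The SKU and caching strings are real Azure values. Uploading the JSON to a Storage Account is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple

@dataclass
class DiskRecord:
    name: str
    sku: str            # e.g. "Premium_LRS", "StandardSSD_LRS", "Standard_LRS"
    caching: str        # "None", "ReadOnly" or "ReadWrite"
    lun: Optional[int]  # None for the OS disk, LUN number for data disks

def to_json(vm_name: str, disks: List[DiskRecord]) -> str:
    """Serialize one VM's disk layout; the runbook would store this blob centrally."""
    return json.dumps({"vm": vm_name, "disks": [asdict(d) for d in disks]}, indent=2)

def from_json(payload: str) -> Tuple[str, List[DiskRecord]]:
    """Rebuild the records at restore time so each new disk gets the same settings."""
    data = json.loads(payload)
    return data["vm"], [DiskRecord(**d) for d in data["disks"]]
```

A round trip through `to_json` and `from_json` returns records equal to the originals. A restore that reads these records, rather than assuming defaults, is what keeps each data disk's storage type and caching setting intact.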
Our operations team will deploy an application-level patch to all VMs in the resource group, and we need to create a snapshot before starting the process. We are going to call it Patch5. We recommend deallocating all VMs before running the runbook, so that the disks are not being written to while the snapshots are taken.
When running the runbook, the operator needs to define the resource group name and the snapshot name. By default, the runbook performs a backup, and the Restore option is disabled.
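The runbook's inputs and the deallocation recommendation can be expressed as a small pre-flight check. The sketch below is hypothetical. The parameters mirror the ones described above: resource group, snapshot name, and Restore off by default. The power states are plain strings in the `PowerState/<state>` format that Azure reports in a VM's instance view. In a real runbook they would come from a VM status query rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RunbookParams:
    resource_group: str
    snapshot_name: str
    restore: bool = False  # backup is the default; restore must be enabled explicitly

    @property
    def action(self) -> str:
        return "restore" if self.restore else "backup"

def vms_not_deallocated(power_states: Dict[str, str]) -> List[str]:
    """Return the VMs whose power state is anything other than deallocated."""
    return sorted(vm for vm, state in power_states.items()
                  if state.lower() != "powerstate/deallocated")

def preflight(params: RunbookParams, power_states: Dict[str, str]) -> List[str]:
    """Collect warnings before any snapshot or disk swap starts."""
    warnings = []
    if not params.resource_group or not params.snapshot_name:
        warnings.append("resource group and snapshot name are required")
    running = vms_not_deallocated(power_states)
    if running:
        warnings.append("not deallocated: " + ", ".join(running))
    return warnings
```

Because deallocation is a recommendation rather than a hard requirement, the sketch reports warnings instead of refusing to run. A stricter variant could stop whenever `preflight` returns anything.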
After the runbook is submitted, we can check its status on any of the available tabs. In this example, we are monitoring the output of the script. It writes a line for each significant task it completes and, at the end, reports the total time taken to execute all tasks.
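Reporting one line per completed task plus a total at the end only needs a start timestamp and a small helper. Here is a minimal sketch that assumes messages go to standard output. In Azure Automation, a runbook's output stream is what appears on the job's Output tab. `JobLog` and the task message are hypothetical.

```python
import time
from typing import List

class JobLog:
    """Collects one message per finished task and reports the total runtime."""

    def __init__(self) -> None:
        self.start = time.perf_counter()
        self.lines: List[str] = []

    def task_done(self, message: str) -> None:
        # Prefix each message with the time elapsed since the job started
        elapsed = time.perf_counter() - self.start
        line = f"[{elapsed:7.2f}s] {message}"
        self.lines.append(line)
        print(line)

    def finish(self) -> float:
        total = time.perf_counter() - self.start
        print(f"Total time to execute all tasks: {total:.2f}s")
        return total

if __name__ == "__main__":
    log = JobLog()
    log.task_done("apvm: snapshot apvm_OsDisk.snap.Patch5 created")
    log.finish()
```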
The result of the operation can be seen at the resource group level by searching for the snapshot name (patch5 in our case). The results list the snapshots of all disks associated with all VMs.
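The search above uses patch5 while the snapshot was named Patch5, so the match is evidently case-insensitive. The sketch below reproduces that search over a list of resource names and adds a completeness check: every disk should have a matching `<disk>.snap.<snapshot>` resource. The resource and disk names are hypothetical, and listing the resource group's contents is assumed to happen elsewhere.

```python
from typing import Iterable, List

def matching_resources(resource_names: Iterable[str], snapshot: str) -> List[str]:
    """Case-insensitive substring match, like searching the resource group for 'patch5'."""
    needle = snapshot.lower()
    return sorted(n for n in resource_names if needle in n.lower())

def disks_missing_snapshot(disk_names: Iterable[str],
                           resource_names: Iterable[str],
                           snapshot: str) -> List[str]:
    """Disks with no '<disk>.snap.<snapshot>' resource, compared case-insensitively."""
    existing = {n.lower() for n in resource_names}
    return sorted(d for d in disk_names
                  if f"{d}.snap.{snapshot}".lower() not in existing)
```

An empty result from `disks_missing_snapshot` is a quick signal that the backup covered every disk before anyone starts patching.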
Restoring a snapshot
From the previous section, we know that all disks attached to the running VM carry the AP6 name. After applying all patches, we realized the change was a huge mistake. We need to restore to the point just before it, which is our Patch5 snapshot. To do that, we run the runbook again against the same resource group, set the snapshot name to Patch5, and enable the Restore option.
Wait for the runbook to complete. Keep in mind that we are using multithreading, which means all VMs are processed at the same time. The results are easy to see in the Azure Portal. Check the disks of any VM in the resource group, and you will find that the attached disks now carry the snapshot name (patch5 in our case).
Where is the script? I shared the code on GitHub, and you can click here to get the complete code. The code was adapted from a regular PowerShell script to a PowerShell Workflow, which provides the multithreading capability. Again, the script was created to fit the requirements of a specific customer who wanted to use Azure Automation to create snapshots of all Azure virtual machines in a resource group. However, you have all the tools to implement it in your own Azure subscription, and if you have any special needs, feel free to change the code to fit them.
Featured image: Shutterstock