I wanted to share an experience I had on a project, and some of the lessons I learned from it.
This project was typical, with both Production and Non-Production environments; all hosted within Azure.
In the Non-Production environment, the various Application Teams had engaged their respective Vendors to perform the supported installation and configuration of the applications on the Non-Production Azure Virtual Machines. Nothing different or complex there.
Where it gets interesting is that the Application Teams did not want to spend the money to have the Vendors install and configure the applications again on the Production Azure Virtual Machines.
So, they “simply” wanted to clone the Non-Production Virtual Machines into the Production environment.
That’s the scenario we’re dealing with here.
Using Azure Backup to “Clone”
So with this “simple” request of cloning, I devised the following high-level steps:
- Create a VM-level backup with Azure Backup
- Perform a VM Restore operation through Azure Backup
- Ensure the restored VM name aligns to the Production naming convention
- After the VM restore operation, unjoin the Non-Prod domain, rename the computer, and join the Production domain
Without getting into all of the little details, those are the steps I followed.
But, it was a little more complicated than that. Let’s break it all down.
System Local Admin Access
Anyone who has worked within a domain environment will know that when you move a computer from one domain to another, there is a point in the process where the system is not joined to any domain.
For reference, this is what that process looks like:
- Log in to the VM with Non-Prod credentials; unjoin the domain (usually into a Workgroup)
- The system will reboot
- Log in to the VM with non-domain (local) credentials; join the new domain (Note: credentials with permission to join computers to the domain, such as Domain Admin, are required for this)
But this specific process included renaming the cloned VM to a Production standard name. For example, if the VM was called “Azure-Test-Web”, then in Production it would be renamed to “Azure-Prod-Web”.
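As a minimal sketch of that renaming rule, the Python helper below swaps the environment segment of a hyphen-delimited hostname. The function name, the default "Test" and "Prod" segment values, and the assumption that every hostname follows the "Azure-Test-Web" pattern are illustrative only; they are not taken from the Client's actual naming standard. The 15-character check reflects the NetBIOS limit on Windows computer names.

```python
def to_production_name(hostname: str,
                       source_env: str = "Test",
                       target_env: str = "Prod") -> str:
    """Return the Production hostname for a Non-Prod hostname.

    Hypothetical helper: assumes names are hyphen-delimited and that one
    segment identifies the environment (e.g. "Azure-Test-Web").
    """
    parts = hostname.split("-")
    if source_env not in parts:
        raise ValueError(
            f"{hostname!r} has no {source_env!r} segment to replace"
        )

    # Replace only whole segments, so a role name such as "Tester"
    # would not be altered by accident.
    renamed = "-".join(target_env if p == source_env else p for p in parts)

    # Windows (NetBIOS) computer names are limited to 15 characters.
    if len(renamed) > 15:
        raise ValueError(f"{renamed!r} exceeds the 15-character limit")
    return renamed


if __name__ == "__main__":
    print(to_production_name("Azure-Test-Web"))
```

Computing the target name up front means the same value can be used for both the restored VM resource name and the computer rename inside the guest OS.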
In this specific project, the Client had the standard local Administrator account renamed, not once, but twice! Once while the system was being built (i.e. during OS deployment, or OSD), and again through a GPO after joining the domain!
So to avoid the complication of trying to figure out which “local” account was actually the right one, we ended up creating a new Local Admin account as a temporary option (we, of course, included the removal of this temp local account at the end of the process).
Azure Disk Encryption / Key Vault
Here’s another complication in the process. The source Virtual Machine is encrypted with Azure Disk Encryption (which uses BitLocker on Windows VMs).
If you’re not familiar with Azure Disk Encryption (ADE), and its dependent Azure service, Key Vault, here are a few important points to be aware of:
- The encryption and decryption of the VHDs require that the Virtual Machine and the Key Vault are both in the same Azure region and subscription
- Only Standard Tier VM classes are supported
- Only ARM-based VMs are supported (no Classic VMs)
- No integration with on-prem Key Management Services
So when a clone is created, it is still using the Non-Production Key Vault encryption keys. If there is a requirement to maintain complete separation between Non-Prod and Prod, then after the clone VM is created you will have to decrypt it and then re-encrypt it with keys from a Production Key Vault.
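The first point in the list above (same region and same subscription) is easy to check before attempting a clone. The Python sketch below is a plain data check, not a call to any Azure API; the `Resource` class, its field names, and the example values are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    """Hypothetical, minimal description of an Azure resource."""
    name: str
    region: str
    subscription_id: str


def _normalize_region(region: str) -> str:
    # Treat "Canada Central" and "canadacentral" as the same region.
    return region.replace(" ", "").lower()


def ade_colocation_problems(vm: Resource, key_vault: Resource) -> List[str]:
    """List the ways a VM / Key Vault pair breaks the ADE co-location rule."""
    problems = []
    if _normalize_region(vm.region) != _normalize_region(key_vault.region):
        problems.append(
            f"{vm.name} is in {vm.region} but {key_vault.name} "
            f"is in {key_vault.region}"
        )
    if vm.subscription_id != key_vault.subscription_id:
        problems.append(
            f"{vm.name} and {key_vault.name} are in different subscriptions"
        )
    return problems


if __name__ == "__main__":
    vault = Resource("nonprod-kv", "Canada Central", "sub-0001")
    clone = Resource("Azure-Prod-Web", "canadaeast", "sub-0001")
    for problem in ade_colocation_problems(clone, vault):
        print(problem)
```

An empty list means the pair satisfies the co-location rule; it says nothing about the other points in the list, such as VM tier or deployment model.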
VNets and Subnets
In this particular environment, both Production and Non-Prod were contained within the same Virtual Network, but segregation was accomplished via Subnets and Network Security Groups (NSGs).
That’s all good, but think about this for a minute. If you made an exact copy of a system, built a new Virtual Machine from it, and stood it up on the network, what would happen? Think about the types of issues you would experience with two servers that have the same FQDN and the same SIDs, but different IP addresses.
So, in this particular case, we created a completely isolated Subnet. This allowed us to unjoin the VM clone from the Non-Prod domain, change the computer hostname, etc. before moving it into the appropriate Production subnet and joining it to the Production domain.
Computer Object OU
And finally, don’t forget about the Computer Object itself, and the associated Organizational Unit (OU) in Active Directory.
When you join a new computer to a domain, by default the computer object is placed in the “Computers” container. This may not be the correct location, depending on how your AD OU structure is designed.
So after building the cloned VM, we have to remember to move the computer object to the correct OU, to ensure all the applicable GPOs, etc. are applied as they would be for a Production system.
Repeat Across Azure Regions
Remember the points mentioned in the Azure Disk Encryption (ADE) and Key Vault section? There’s one small but important point that states: “The encryption and decryption of the VHDs require that the Virtual Machine and the Key Vault are both in the same Azure region and subscription.”
One of the requests from this Client was to make a “clone” of an Azure Virtual Machine that was hosted in one region (in this case, Canada Central), and bring up the clone in another region (in this case, Canada East).
There are a couple of important points around this.
Cross-Region VM Restore
The Azure Backup service does not (currently) have an option to perform a restore operation across different regions. This means that, since the Non-Prod Virtual Machine and the Recovery Services Vault (RSV) both reside in Canada Central, we do not have a direct way to perform a restore (and thus build a new VM from that restore) in Canada East.
That being said, a potential workaround is to perform a restore of the VHDs in Canada Central (to an Azure Storage account) and then copy them over to an Azure Storage account in Canada East. From there, we could build a Virtual Machine from these VHDs. However, this potential workaround is nullified due to the next challenge.
All of the Azure Virtual Machines within this Client’s environment were encrypted, which means that to perform a restore operation from the Recovery Services Vault (RSV), we have to use PowerShell for the restoration process (which is not an issue). The issue occurs when we create the clone VM, since the disks will still be encrypted with the original encryption key (held in Azure Key Vault). This does not pose a problem for cloning in the same region, but it does when we cross regions.
There is a requirement that the Azure Key Vault (which holds the encryption keys) be located in the same region as the Virtual Machine. So, even if we are able to copy the VHDs across regions, the new VM will be unable to boot, since it cannot access the Key Vault (located in Canada Central) to authorize decryption and start-up.
What did we do? The end-to-end solution was this:
- Decrypt the Non-Prod Virtual Machine
- Shut down the Virtual Machine
- Initiate a full VM-level backup/snapshot
- Re-encrypt the Non-Prod VM
- Perform a VHD disk restore to an Azure Storage account (located in Canada Central)
- Utilize Azure Storage Explorer to copy the VHDs from the Storage Account in Canada Central to a Storage Account in Canada East
- Create a new VM in Canada East using the VHDs that were copied over
- Re-apply Azure Disk Encryption to the new VM now running in Canada East
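To make the ordering of those steps explicit, here is a small Python sketch that transcribes the list above into data. It does not call Azure in any way; the function name and parameters are hypothetical. The `encrypted=False` branch is my own assumption about which steps could be skipped for an unencrypted VM, and it was not part of the process we actually ran.

```python
from typing import List


def cross_region_clone_plan(source_region: str,
                            target_region: str,
                            encrypted: bool = True) -> List[str]:
    """Return the ordered manual steps for a cross-region VM clone.

    Illustrative only: this builds a checklist, it performs no Azure calls.
    """
    steps = []
    if encrypted:
        # The backup must be taken while the disks are decrypted, so the
        # copied VHDs do not depend on the source region's Key Vault.
        steps.append(f"Decrypt the source VM in {source_region}")
    steps.append("Shut down the source VM")
    steps.append("Initiate a full VM-level backup")
    if encrypted:
        steps.append(f"Re-encrypt the source VM in {source_region}")
    steps.append(f"Restore the VHDs to a Storage account in {source_region}")
    steps.append(f"Copy the VHDs to a Storage account in {target_region}")
    steps.append(f"Create a new VM in {target_region} from the copied VHDs")
    if encrypted:
        # Assumption: this uses a Key Vault located in the target region.
        steps.append(f"Apply Azure Disk Encryption to the new VM in {target_region}")
    return steps


if __name__ == "__main__":
    plan = cross_region_clone_plan("Canada Central", "Canada East")
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

The key ordering constraint is that the backup is taken after decryption and before re-encryption of the source VM.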
Even though Azure Backup can definitely create VM-level backups, and subsequently restore those backups in the form of another Virtual Machine, I want to reiterate a few important reasons why cloning this way could be a bad idea.
- Duplicate SIDs in the same environment
- Duplicate VMs with the same hostname
- Unknown issues within the Registry, DLLs, etc.
- Unsupported by software Vendors
Other than that, it works great!
Reference: Ermie, A. (2018) Cloning Azure VMs into Production via Azure Backup. Available at: https://adinermie.com/cloning-azure-vms-into-production-via-azure-backup/ [Accessed 30th May 2018]