Microsoft Cloud Adoption Framework for Azure – Manage (Part VI)

The guidance in the last phase of the Microsoft Cloud Adoption Framework serves two purposes. First, it helps you develop the business and technical approaches needed for cloud management that powers ongoing operations, and it provides examples of actionable operations management approaches drawn from common customer experiences. Second, it helps you create personalized management solutions based on business commitments.

Delivering on a cloud strategy requires solid planning, readiness, and adoption, but it’s the ongoing operation of the digital assets that delivers tangible business outcomes. Without a plan for reliable, well-managed operation of the cloud solutions, those efforts will yield little value. As a business moves to a cloud-based model, the importance of proper management and operations can’t be overstated. Unfortunately, few organizations are prepared for the shift in IT management that’s required to build a cloud-first operating model.

One section of the Manage phase is the Azure management guide. It helps Azure customers create a management baseline that establishes resource consistency across Azure. The guide outlines the basic tools needed for any Azure production environment, especially environments that host sensitive data, and teaches you how to establish tooling for a management baseline. It also outlines ways to extend the baseline or build resiliency beyond it. Among other topics, the guide covers inventory and visibility, operational compliance, protect and recover, and enhanced baseline options.

Inventory and visibility

Inventory and visibility is the first of three disciplines in a cloud management baseline. This discipline comes first because collecting proper operational data is vital when you make decisions about operations. Cloud management teams must understand what is managed and how well those assets are operated. The Inventory and Visibility article describes tools, such as Azure Service Health and Log Analytics, that provide both an inventory and visibility into the inventory’s run state.

Azure Service Health provides a personalized view of the health of your Azure services and regions. Information about active issues is posted to Service Health to help you understand the effect on your resources, and regular updates keep you informed as issues are resolved. Planned maintenance events are also published to Service Health, so you’ll know about changes that can affect resource availability.

A Log Analytics workspace is a unique environment for storing Azure Monitor log data. Each workspace has its own data repository and configuration. Data sources and solutions are configured to store their data in particular workspaces. Azure monitoring solutions require all servers to be connected to a workspace so that their log data can be stored and accessed.
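The requirement that every server reports to a workspace can be pictured as a simple coverage check over an inventory. The following Python sketch is only an illustration under stated assumptions: the `Server` record, the `unconnected_servers` and `workspace_coverage` helpers, and the sample names are hypothetical and do not call any Azure API. In a real environment the inventory would come from your own tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical inventory record; not an Azure SDK type."""
    name: str
    workspace: Optional[str]  # workspace the server reports to, or None/"" if none


def unconnected_servers(servers: List[Server]) -> List[str]:
    """Return the sorted names of servers that report to no workspace."""
    return sorted(s.name for s in servers if not s.workspace)


def workspace_coverage(servers: List[Server]) -> float:
    """Return the fraction of servers connected to a workspace (1.0 if empty)."""
    if not servers:
        return 1.0
    connected = sum(1 for s in servers if s.workspace)
    return connected / len(servers)


# Illustrative data only; the names are made up.
inventory = [
    Server("web-01", "ws-prod"),
    Server("db-01", None),
    Server("app-01", ""),
]
gaps = unconnected_servers(inventory)
```

A check like this makes the visibility gap explicit: any server in `gaps` produces no log data that the monitoring solutions can store or query.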

Operational compliance

Operational compliance is the second discipline in any cloud management baseline. Improving operational compliance reduces the likelihood of an outage related to configuration drift or vulnerabilities related to systems being improperly patched.

Protect and recover

Protect and recover is the third and final discipline in any cloud management baseline. This section of the Azure management guide aims to reduce the duration and impact of outages that can’t be prevented. Two useful tools are Azure Backup and Azure Site Recovery.

With Azure Backup, you can back up, protect, and recover your data in the Microsoft cloud. Azure Backup replaces your existing on-premises or offsite backup solution with a cloud-based solution that is reliable, secure, and cost competitive. Azure Backup can also help protect and recover on-premises assets through one consistent solution.

Azure Site Recovery is a critical component in your disaster recovery strategy. Site Recovery replicates VMs and workloads that are hosted in a primary Azure region to a copy that is hosted in a secondary region. When an outage occurs in your primary region, you fail over to the copy running in the secondary region and continue to access your applications and services from there. This proactive approach to recovery can significantly reduce recovery times. When the recovery environment is no longer needed, production traffic can fall back to the original environment.
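The failover and fallback flow described above can be sketched as a small state model. This is a conceptual illustration only, not how Azure Site Recovery is implemented or invoked: the `ReplicatedWorkload` class, the `route` function, and the region names are hypothetical, and the choice to return to the primary region as soon as it is healthy again is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ReplicatedWorkload:
    """Hypothetical model of a workload replicated across two regions."""
    name: str
    primary_region: str
    secondary_region: str
    active_region: str = ""

    def __post_init__(self) -> None:
        # A workload starts out serving traffic from its primary region.
        if not self.active_region:
            self.active_region = self.primary_region

    def fail_over(self) -> None:
        """Serve traffic from the replicated copy in the secondary region."""
        self.active_region = self.secondary_region

    def fall_back(self) -> None:
        """Return production traffic to the original environment."""
        self.active_region = self.primary_region


def route(workload: ReplicatedWorkload, healthy_regions: Set[str]) -> str:
    """Choose the active region based on which regions are healthy.

    Assumption for this sketch: prefer the primary whenever it is healthy.
    """
    if workload.primary_region in healthy_regions:
        workload.fall_back()
    elif workload.secondary_region in healthy_regions:
        workload.fail_over()
    else:
        raise RuntimeError("no healthy region available for " + workload.name)
    return workload.active_region


# Illustrative usage; region names are examples only.
shop = ReplicatedWorkload("shop", "westeurope", "northeurope")
during_outage = route(shop, {"northeurope"})
after_recovery = route(shop, {"westeurope", "northeurope"})
```

In practice the decision to fall back is usually deliberate and operator-driven; the automatic rule here only keeps the example short.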

The first three cloud management disciplines describe a management baseline. The purpose of a management baseline is to create a consistent offering that provides a minimum level of business commitment for all supported workloads. With this baseline of common, repeatable management offerings, the team can deliver highly optimized operational management with minimal deviation. However, you might need a greater commitment to the business beyond the standard offering. The following image and list show three ways to go beyond the management baseline.

[Image: Microsoft Cloud Adoption Framework]
  • Workload operations: The largest per-workload operations investment and the highest degree of resiliency. Workload operations are suggested for the approximately 20% of workloads that drive business value. This specialization is usually reserved for high criticality or mission-critical workloads.
  • Platform operations: Operations investment is spread across many workloads. Resiliency improvements affect all workloads that use the defined platform. Platform operations are suggested for the approximately 20% of platforms that have the highest criticality. This specialization is usually reserved for medium to high criticality workloads.
  • Enhanced management baseline: The lowest relative operations investment. This specialization slightly improves business commitments by using additional cloud-native operations tools and processes.

Both workload operations and platform operations require changes to design and architecture principles. Those changes can take time and might result in increased operating expenses. To reduce the number of workloads that require such investments, an enhanced management baseline can provide enough of an improvement to the business commitment.
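One way to read the three options is as a decision rule that tries the cheapest sufficient option first. The sketch below is an assumption-laden illustration, not a rule taken from the Cloud Adoption Framework: the `Criticality` levels, the `suggest_management_level` function, its inputs, and the order of the checks are all hypothetical.

```python
from enum import Enum


class Criticality(Enum):
    """Hypothetical criticality scale for this example."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    MISSION_CRITICAL = 4


def suggest_management_level(
    criticality: Criticality,
    runs_on_shared_platform: bool,
    enhanced_baseline_sufficient: bool,
) -> str:
    """Suggest a management level, preferring the lowest sufficient investment."""
    if criticality is Criticality.LOW:
        # The standard offering covers workloads with no extra commitment.
        return "management baseline"
    if enhanced_baseline_sufficient:
        # Avoids design and architecture changes when a small improvement is enough.
        return "enhanced management baseline"
    if runs_on_shared_platform:
        # The investment is spread across every workload on the platform.
        return "platform operations"
    if criticality in (Criticality.HIGH, Criticality.MISSION_CRITICAL):
        # Largest per-workload investment, reserved for the most critical workloads.
        return "workload operations"
    # Assumption: medium-criticality workloads without a shared platform
    # stay on the enhanced baseline rather than getting workload-specific work.
    return "enhanced management baseline"


# Illustrative usage.
level = suggest_management_level(Criticality.MISSION_CRITICAL, False, False)
```

The ordering reflects the point made above: an enhanced management baseline is checked before platform or workload operations so that fewer workloads need the more expensive investments.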

Enhanced management baseline

Two example tools in customers’ enhanced management baselines are Azure Automation and Azure Security Center.

Azure Automation provides a centralized system for the management of automated controls. In Azure Automation, you can run simple remediation, scale, and optimization processes in response to environmental metrics. These processes reduce the overhead associated with manual incident processing. Most importantly, automated remediation can be delivered in near real time, significantly reducing interruptions to business processes.

Azure Security Center provides advanced threat detection by using machine learning and behavioral analytics to help identify active threats targeting your Azure resources. It also provides threat protection that blocks malware and other unwanted code, and it reduces the surface area exposed to brute-force and other network attacks. When Azure Security Center identifies a threat, it triggers a security alert with the steps you need to respond to the attack. It also provides a report with information about the detected threat.
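The idea of running remediation in response to environmental metrics can be sketched as a set of threshold rules. This is a minimal illustration, not the Azure Automation API: the `RemediationRule` type, the `evaluate` function, the metric names, and the thresholds are all hypothetical, and the actions only return a description instead of changing anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RemediationRule:
    """Hypothetical rule: run `action` when `metric` exceeds `threshold`."""
    metric: str
    threshold: float
    action: Callable[[], str]


def evaluate(rules: List[RemediationRule], metrics: Dict[str, float]) -> List[str]:
    """Run the action of every rule whose metric is above its threshold.

    Metrics missing from the reading are skipped rather than treated as zero.
    """
    results = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            results.append(rule.action())
    return results


# Illustrative rules; names and thresholds are made up for the example.
rules = [
    RemediationRule("cpu_percent", 90.0, lambda: "scale out"),
    RemediationRule("disk_used_percent", 85.0, lambda: "clean temp files"),
]
actions_taken = evaluate(rules, {"cpu_percent": 95.0, "disk_used_percent": 40.0})
```

Keeping the rules as data makes it easy to review which conditions are remediated automatically and which still need manual incident processing.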

Platform specialization

Platform specialization consists of a disciplined execution of four processes in an iterative approach.

  • Improve system design: Improve the design of common systems or platforms to effectively minimize interruptions.
  • Automate remediation: Some improvements aren’t cost effective. In such cases, it might make more sense to automate remediation and reduce the effect of interruptions.
  • Scale the solution: As systems design and automated remediation are improved, those changes can be scaled across the environment through the service catalog.
  • Continuous improvement: Different monitoring tools can be used to discover incremental improvements. These improvements can be addressed in the next pass of system design, automation, and scale.

Workload specialization

Workload specialization also consists of a disciplined execution of the same four processes in an iterative approach.

  • Improve system design: Improve the design of a specific workload to effectively minimize interruptions.
  • Automate remediation: Some improvements aren’t cost effective. In such cases, it might make more sense to automate remediation and reduce the effect of interruptions.
  • Scale the solution: As you improve systems design and automated remediation, you can scale those changes across the environment through the service catalog.
  • Continuous improvement: You can use different monitoring tools to discover incremental improvements. These improvements can be addressed in the next pass of system design, automation, and scale.

Workload specialization often triggers a cultural change in traditional IT build processes that focus on delivering a management baseline, enhanced baselines, and platform operations. Those types of offerings can be scaled across the environment. Workload specialization is similar in execution to platform specialization, but unlike common platforms, the specialization required by individual workloads often doesn’t scale.

When workload specialization is required, operational management commonly evolves beyond a central IT perspective. The approach suggested in the Cloud Adoption Framework is a distribution of cloud management functionality. In this model, operational tasks like monitoring, deployment, DevOps, and other innovation-focused functions shift to an application-development or business-unit organization. The Cloud Platform and core Cloud Monitoring teams still deliver on the management baseline across the environment. Those centralized teams also guide and instruct workload-specialized teams on the operation of their workloads. But the day-to-day operational responsibility falls on a cloud management team that is managed outside of IT. This type of distributed control is one of the primary indicators of maturity in a cloud center of excellence.

About the Author:

I am Matthias Gessenay, a Microsoft MVP for Azure, Microsoft Certified Trainer, and Azure Architect for Corporate Software. I have been in IT for about 20 years and have worked with Azure for about six years. I am passionate about community and run four Meetup groups.

Reference:

Gessenay, M. (2020). Microsoft Cloud Adoption Framework for Azure – Manage (Part VI). Available at: https://cloudspeed.ch/post/azure-cloud-adoption-framework-part6/ [Accessed: 19th May 2020].
