Azure Batch is a practical and highly customizable service. It is one of the Azure Compute services, designed for compute-intensive or data-intensive tasks. Combined with Azure Storage, it forms a pair of draft horses that can handle unusual workloads. Setup is easy, and the programming interface is intuitive to code against. There is even an API for retrieving files from your application's working directory.
What workload is Azure Batch good for
Azure Batch is good for data transformation. It is ideal when you need a large amount of computing power for a short period of time. I would probably not recommend Azure Batch for high-availability scenarios; Service Fabric serves those much better. I don’t even know how to apply OS updates to Azure Batch compute nodes, because I never needed to. I allocated over a hundred virtual machines, ran the computation I needed, and deallocated them when the job was done.
Where input and output are stored
It is inefficient to access a single database from hundreds of machines at once. Azure Batch is designed to read data from and write data to Azure Blob Storage, and you don’t have to write this logic yourself. Azure Batch downloads the input files from storage to the compute node before the job starts. When the job is finished, it uploads the output files back to storage automatically (if a blob with the same name already exists, it is overwritten).
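As a rough illustration of how inputs and outputs can be declared, the sketch below builds a task description shaped like the request body of the Batch REST API, with `resourceFiles` for inputs and `outputFiles` for results. The helper `build_task_body`, the URLs, the file names, and the command line are hypothetical placeholders. The field names reflect my understanding of the REST API and should be checked against the current reference. The code only constructs a dictionary and never calls Azure.

```python
import json


def build_task_body(task_id, command_line, input_url, input_name, output_container_url):
    """Return a dict shaped like a Batch REST API task body (illustrative only)."""
    return {
        "id": task_id,
        "commandLine": command_line,
        # Input files copied from Blob Storage to the node before the command runs.
        "resourceFiles": [
            {"httpUrl": input_url, "filePath": input_name},
        ],
        # Output files copied back to Blob Storage after the command finishes.
        "outputFiles": [
            {
                "filePattern": "output/*.csv",
                "destination": {"container": {"containerUrl": output_container_url}},
                "uploadOptions": {"uploadCondition": "tasksuccess"},
            },
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints.
    body = build_task_body(
        task_id="transform-001",
        command_line="cmd /c transform.exe input.csv",
        input_url="https://example.invalid/input/input.csv",
        input_name="input.csv",
        output_container_url="https://example.invalid/output",
    )
    print(json.dumps(body, indent=2))
```

In a real project, the Python or .NET SDK would build the equivalent objects for you. The point here is only that inputs and outputs are declared as data on the task rather than coded by hand.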
How to code the logic that is being executed
The business logic must be packaged as a traditional executable (.exe) application. This is convenient because you are not limited by any sandbox; the only boundary is the virtual machine itself. In other words, the application can run with administrator privileges.
Application deployment & updates
The application is stored as a single ZIP archive in Azure Storage. From there, it is deployed to every compute node after the node is allocated. You can easily update the app, but you must restart the compute node for the updated application to be deployed automatically. You are not limited to a single application (although you cannot exceed 20 applications), so you can separate application dependencies into individual packages that can be updated independently.
Virtual machine nodes are grouped into pools. All tasks of the same kind are grouped into jobs. One pool must be assigned to a job. One Azure Batch account can contain multiple pools. A task can declare multiple tasks that it depends on, all of which must complete before it starts. A task consists of two important parts: a unique name and a command, which is executed on the command line and calls one of your applications on the compute node. Azure Batch itself is an orchestration service that holds the list of your jobs and takes care of the compute nodes in your pools.
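To make the dependency rule concrete, here is a minimal sketch in plain Python. It does not use the Batch SDK. It models tasks as a mapping from a task name to the set of tasks it depends on, and shows which tasks may start at a given moment. The task names and the helpers `ready_tasks` and `execution_order` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def ready_tasks(dependencies, completed):
    """Return tasks not yet completed whose dependencies have all completed."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in completed and deps <= completed
    )


def execution_order(dependencies):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical job: two tasks both wait for "transform".
    deps = {
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
        "report": {"transform"},
    }
    print(ready_tasks(deps, completed=set()))
    print(ready_tasks(deps, completed={"extract", "transform"}))
    print(execution_order(deps))
```

Azure Batch performs this bookkeeping for you. The sketch only shows the rule it applies: a follow-up task is held back until every task it depends on has finished.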
You can manage Azure Batch manually in the Azure Portal, with the Azure CLI, or in Batch Explorer. You can also manage it programmatically in Python or in .NET (Core) through the Azure.Batch NuGet package. A new pool can be created only with Active Directory Batch account credentials. The pool can be scaled manually, or automatically by a custom script that has an overview of pending tasks.
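The scaling decision itself can be sketched in a few lines. Azure Batch evaluates autoscale rules written in its own formula syntax, which is not shown here. The plain Python below only illustrates the kind of logic such a rule might encode, and the function name, parameters, and limits are all assumptions.

```python
import math


def target_node_count(pending_tasks, tasks_per_node, min_nodes=0, max_nodes=100):
    """Pick a pool size that covers the pending tasks, within fixed bounds."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Enough nodes to run all pending tasks at once, rounded up.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for pending in (0, 10, 1000):
        print(pending, "->", target_node_count(pending, tasks_per_node=4))
```

Capping the result at `max_nodes` keeps a burst of tasks from allocating more machines than your quota or budget allows, and a `min_nodes` of zero lets the pool shrink to nothing when the work is done.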
How to start experimenting with Azure Batch
- Create your free Azure account.
- Add a new Resource group.
- Create a Storage / Storage account – blob, file, table, queue resource.
- Create a Compute / Batch Service resource.
- Add a new pool.
- Download & install Visual Studio.
- Create a new console app.
- Install the Azure.Batch and Azure.Storage NuGet packages.
- Copy the Batch account keys into your code.
- Take a look at the Azure Batch .NET Quickstart.
About the Author:
Václav Dajbych is a cloud architect, software developer and MVP from the Czech Republic.
Dajbych, D. (2018). Working with Azure Batch. Available at: https://dajbych.net/working-with-azure-batch [Accessed: 21st March 2019]