Updating Azure Functions in Production

In this post, I’m going to address updating Azure Function Apps. When you’ve gone live in production and made a change to the functions’ code, how do you update the app without disrupting your users? There are various scenarios for updating a function app, depending on the app’s nature, responsibilities and deployment strategy.

Problem definition

Generally speaking, some or all of the following features are required in production app upgrades:

  • Zero or minimum downtime
  • Zero or minimum action/data loss
  • CI/CD Pipeline integration

Depending on what the function app is meant to do and how it does it, we can decide what’s really needed in our upgrade process. As a case in point, if we have a serverless web app that is used by many users scattered around the globe, zero/minimum downtime would be the main concern. With users in different time zones, the app has to stay up and running 24 hours a day and we can’t take it down for an update.

If you need minimum downtime, the best way to update the app is through deployment slots. The way slots work is pretty simple. Think of a slot as another Azure function app that runs side by side with your current app and can be swapped with it quickly, with nearly zero downtime. In fact, the main function app is itself a slot named “production”, and once you swap it with another named slot, Azure just switches the routing, addresses and settings almost instantaneously, so all the slots stay up and running just as before. Additionally, we can easily roll back (by swapping again) if something goes wrong.

At the time of writing this post, function slots are still in preview version. You can see how to create and use slots in the following blog post:
https://blogs.msdn.microsoft.com/appserviceteam/2017/06/13/deployment-slots-preview-for-azure-functions/

Deployment slots work like a charm for most apps, but they’re not a silver bullet and have their own downsides.

In one of our sprints, I was supposed to do some research on slots. As Azure Function Apps are still pretty new, I couldn’t find many useful documents or benchmarks on slots and the swap feature, so I decided to benchmark it myself to see how reliable it is for our app.

Let me give you a little bit of background about how we use Azure Functions in our solution. The main app is a multi-tenant SaaS cloud app. Each tenant can define a few connectors that monitor a file repository and submit the file/folder changes to the main cloud app. These connectors are actually Azure Function apps that are deployed at run time, once a tenant creates a new connector. So, as you can see, in our scenario function app deployments, and consequently upgrades, happen at run time.

[Figure: connectors deployed as Function Apps]

Another important point in our scenario is that some of the functions in the function app have timer triggers, meaning they run scheduled tasks. For instance, one of the functions polls for file/folder modification events on the file repository every 5 minutes and puts the detected event data in an Azure Storage Queue, so that another function picks it up through a queue trigger and processes it (a rough sketch of such a pair of functions follows the link). If you’re not familiar with timer and queue bindings/triggers, the following link is a good starting point:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings
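
To make that concrete, here is a minimal sketch of what such a pair of functions could look like. The function names, the queue name and the connection setting name are made-up placeholders for illustration, not our actual connector code:

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

public static class ChangePoller
{
    // Runs every 5 minutes and enqueues every detected file/folder change.
    [FunctionName("ChangePoller")]
    public static void Run(
        [TimerTrigger("0 */5 * * * *")]TimerInfo timer,
        [Queue("detected-changes", Connection = "StorageConnectionSetting")]ICollector<string> changesQueue,
        TraceWriter log)
    {
        // The repository-polling code goes here; each detected change is pushed to the queue.
        changesQueue.Add("{ \"path\": \"/some/changed/file\" }");
        log.Info("Polling cycle finished");
    }
}

public static class ChangeProcessor
{
    // Picks up each queued change and submits it to the main cloud app.
    [FunctionName("ChangeProcessor")]
    public static void Run(
        [QueueTrigger("detected-changes", Connection = "StorageConnectionSetting")]string change,
        TraceWriter log)
    {
        log.Info($"Processing change: {change}");
        // The code that submits the change to the main app goes here.
    }
}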

As you might have concluded from what I said about the app’s mechanism, zero data loss is the main concern in our app, not zero/minimum downtime. The app monitors the repository every 5 minutes, and we don’t care if those 5 minutes occasionally become 7. Additionally, our app is not used directly by end users, so zero downtime is not really relevant in this case.

Upgrading through Function Slots

Now that we know about the requirements, it’s time to see what happens once we update our app. Suppose we’re using Azure function slots for deployments:

  • The main function app (V1) is already running in the production slot
  • Then we deploy V2 into a new slot
  • Once V2 is deployed and up and running, we swap the slots

This process is quick, with almost zero downtime, but it’s likely to lose some data or operations during the swap.

There weren’t many articles or benchmarks on the swap behaviour, so I ran a test myself: I made a function with a timer trigger that runs every 10 seconds and, in each run, goes through a loop and submits 10 items to an HTTP endpoint one at a time, with a 1-second delay between them. So the function would be submitting items all the time, every second. Each submitted item had an identifier, so that I’d know which function run and loop iteration it came from. The identifier was something like A(3,2), meaning A is sent from the 3rd function run and 2nd loop iteration; so the identifier format was A(m,n).
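
Here is roughly what that test function looked like. It’s a reconstruction for illustration rather than the exact code, and the endpoint URL and function name are placeholders:

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

public static class SwapTestSubmitter
{
    private static readonly HttpClient HttpClient = new HttpClient();
    private static int _runNumber = 0;

    // Fires every 10 seconds; each run submits 10 identifiable items, one per second.
    [FunctionName("SwapTestSubmitter")]
    public static async Task Run(
        [TimerTrigger("*/10 * * * * *")]TimerInfo timer,
        TraceWriter log)
    {
        var run = Interlocked.Increment(ref _runNumber);
        for (var i = 1; i <= 10; i++)
        {
            // "A" marks the V1 deployment; in the V2 code this prefix was changed to "B".
            var id = $"A({run},{i})";
            await HttpClient.PostAsync("https://example.com/api/swap-test", new StringContent(id));
            log.Info($"Submitted {id}");
            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}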

Then I modified my function app’s code, this time changing A(m,n) to B(m,n) to distinguish the submitted items, deployed it into a new slot and swapped it with the main function app. I was impressed with the swap, as it took less than 2 seconds, but there were 2 missing item submissions during the swap. That seems normal to me: during the swap we’re actually changing the routes, so we may lose our network sockets and other IO-related stuff. But it’s painful in production. It means that if a function is in the middle of something when we swap, we’re going to miss that item, which is not desirable.
[Figure: results of the slot swap test]

Graceful app stop by handling the CancellationToken

Potentially, there is another way to make upgrades safe and graceful without losing any data or operations: handling the CancellationToken in the functions.

Take this sample function:

using System.Threading;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

public static class ProcessItem
{
    [FunctionName("ProcessItem")]
    public static void Run(
        [QueueTrigger("myqueue-items", Connection = "StorageConnectionSetting")]string myQueueItem,
        TraceWriter log,
        CancellationToken cancellationToken)
    {
        cancellationToken.Register(() =>
        {
            log.Info("Function app is stopping");
            log.Info("Stashing the item in a blob storage");
            // The code for stashing the item goes here
            log.Info("Item stashed so we're good to upgrade");
            log.Flush();
        });

        if (!cancellationToken.IsCancellationRequested)
        {
            // The code to process the item goes here
        }
    }
}

As you can see, we have a binding of type CancellationToken in the Azure Function. Once we add this binding to the method, we’re able to know when the app is about to stop. As soon as the function app starts stopping, the cancellation token requests cancellation, and we can do whatever is needed so that we don’t lose anything as a result of the app stopping (in our case, the stop is caused by the upgrade).

Sounds exciting, but here’s the thing: by default, it only gives you 5 seconds to do whatever needs to be done before the app stops. And it’s good to know that in practice you won’t even get the full 5 seconds; it will be less!

The good news is that we can increase this wait time by adding the “stopping_wait_time” setting to the “settings.job” file (you need to create the settings.job file in the “site” folder through the Kudu console, as shown in the following screenshot).

Settings.job file that sets the stopping wait time to 2 minutes:

{
     "stopping_wait_time": 120
}
[Screenshot: adding/editing settings.job from the Kudu console]

So far so good, huh? The bad news is that although this increases the waiting time, it doesn’t get anywhere near our 2 minutes. And it’s not reliable: sometimes it gives you enough time to stop gracefully and sometimes it doesn’t. If you try the function I mentioned above, it sometimes hits the cancellation block and sometimes not. It was not reliable at all. I didn’t have enough time to figure out the cause and resolve it, so I gave up on the cancellation token approach and decided to take another route to solve our problem.

Our App Mode approach

In our scenario, we decided not to go with function slots, as they don’t add much value to our upgrade process, and we couldn’t rely on the CancellationToken either. Instead, we implemented our own logic, which goes like this (a rough sketch of the signalling functions follows the list):

  1. The function app works in 3 modes: Normal, PreparingForStop and SafeToStop. When the app starts, it’s in Normal mode and does its jobs as usual.
  2. Before starting an upgrade, we call a function on the function app to signal that we’re about to upgrade.
  3. Once the app receives that signal, it changes its mode to PreparingForStop, tries to finish up all the current jobs quickly and doesn’t start new ones. If we need to stash any data to be processed later, so that we can quit the current function quickly, we do that at this stage.
  4. There is another function in the app through which we can poll the app mode. We keep calling it until it returns “SafeToStop”.
  5. As soon as the app turns to “SafeToStop” mode, we upgrade the app.
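
Here is a minimal sketch of what the signalling side of this could look like. The function names, the in-memory mode store and the running-jobs counter are simplified placeholders rather than our production implementation (in reality the mode and job state need to survive restarts, e.g. in blob storage):

using System.Net;
using System.Net.Http;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;

public enum AppMode { Normal, PreparingForStop, SafeToStop }

public static class AppModeFunctions
{
    // In-memory state for illustration only. _runningJobs would be incremented and
    // decremented by the worker functions themselves as they start and finish jobs.
    private static AppMode _mode = AppMode.Normal;
    private static int _runningJobs = 0;

    // Called by the deployment process to announce the upcoming upgrade.
    [FunctionName("PrepareForStop")]
    public static HttpResponseMessage PrepareForStop(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)]HttpRequestMessage req,
        TraceWriter log)
    {
        _mode = AppMode.PreparingForStop;
        log.Info("App mode changed to PreparingForStop");
        return req.CreateResponse(HttpStatusCode.OK);
    }

    // Polled by the deployment process until it returns "SafeToStop".
    [FunctionName("GetAppMode")]
    public static HttpResponseMessage GetAppMode(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = null)]HttpRequestMessage req,
        TraceWriter log)
    {
        if (_mode == AppMode.PreparingForStop && _runningJobs == 0)
        {
            _mode = AppMode.SafeToStop;
        }
        return req.CreateResponse(HttpStatusCode.OK, _mode.ToString());
    }
}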

This might not be the best possible solution, but at least it keeps everything under control, and it worked in practice.
That’s how we handled our safe function app upgrades. Let me know if you have any comments or suggestions!

How to fix the assembly binding redirect problem in Azure Functions

In one of our recent gigs, we had to use MS Azure Functions. I’ll discuss the architecture and how we’re using Azure Functions in our SaaS cloud solution in another post. In this post, I’m going to talk about a problem we had with Azure Functions and propose a solution for it.

Assembly Binding Redirect issue:

As you may know, in .NET there is a feature called “Assembly Binding Redirect” that works for any type of application. Binding redirects are used when your app depends on multiple assemblies (DLLs) that themselves use different versions of the same assembly. As a case in point, let’s say you’re using the Microsoft.Rest NuGet package, which depends on Newtonsoft.Json v9.0.0.0, and also the Box.V2 NuGet package, which depends on Newtonsoft.Json v10.0.0.0. At run time, we then need something to redirect all Newtonsoft.Json dependencies to a single version (say v10.0.0.0), because we can’t have multiple versions of the same assembly loaded at run time. For more info about binding redirects, have a look at the following link:
https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/redirect-assembly-versions

Adding an assembly binding redirect is simple in most types of .NET apps. You just need to add a section to your app.config (or web.config, in the case of web apps) like the following:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-10.0.0.0" newVersion="10.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

As you can see, this setting redirects references to any version of the Newtonsoft.Json assembly (from 0.0.0.0 up to 10.0.0.0) to version 10.0.0.0. Simple as that!

In Azure Functions though, we can’t do that, at least at the time of writing this post. If you add an app.config to your Azure Function project, it doesn’t take effect; it’s as if there were no app.config at all. The reason is that the way Functions run on Azure is a bit different from normal .NET apps, as they themselves run on top of the WebJobs platform. Check out the following GitHub issue to see why binding redirects don’t work:
https://github.com/Azure/azure-webjobs-sdk-script/issues/992

If you’re working on a real project, this problem is quite annoying, as the code simply doesn’t work once your project’s dependencies point to multiple versions of the same assembly. You get stuck, just as we did!

How to resolve it:

Although, at the time of writing this post, the problem has not been resolved by Microsoft, there is a workaround. Let’s say we want to add something to our application settings, which locally live in “local.settings.json” and on Azure can be accessed through the portal, as shown below:

[Screenshot: Function Application Settings in the Azure Portal]

So we just added a new field named “BindingRedirects” to the settings and set its value like this:

[ { "ShortName": "Newtonsoft.Json", "RedirectToVersion": "10.0.0.0", "PublicKeyToken": "30ad4fe6b2a6aeed" } ]

This means: redirect all references to “Newtonsoft.Json” to version 10.0.0.0.

And here’s what our local.settings.json looks like:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "BindingRedirects": "[ { \"ShortName\": \"Newtonsoft.Json\", \"RedirectToVersion\": \"10.0.0.0\", \"PublicKeyToken\": \"30ad4fe6b2a6aeed\" } ]"
  }
}
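
Next, we need a helper class that reads this setting at run time and applies the redirects:
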
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Reflection;
using System.Runtime.Serialization.Json;
using System.Text;
namespace Connectors.AzureFunctions.Application
{
    public static class AssemblyBindingRedirectHelper
    {

        ///<summary>
        /// Reads the "BindingRedirects" field from the app settings and applies the redirection to the
        /// specified assemblies
        /// </summary>

        public static void ConfigureBindingRedirects()
        {
            var redirects = GetBindingRedirects();
            redirects.ForEach(RedirectAssembly);
        }

        private static List<BindingRedirect> GetBindingRedirects()
        {
            var result = new List<BindingRedirect>();
            var bindingRedirectListJson = Environment.GetEnvironmentVariable("BindingRedirects");
            using (var memoryStream = new MemoryStream(Encoding.Unicode.GetBytes(bindingRedirectListJson)))
            {
                var serializer = new DataContractJsonSerializer(typeof(List<BindingRedirect>));
                result = (List<BindingRedirect>)serializer.ReadObject(memoryStream);
            }
            return result;
        }

        private static void RedirectAssembly(BindingRedirect bindingRedirect)
        {
            ResolveEventHandler handler = null;
            handler = (sender, args) =>
            {
                var requestedAssembly = new AssemblyName(args.Name);
                if (requestedAssembly.Name != bindingRedirect.ShortName)
                {
                    return null;
                }
                var targetPublicKeyToken = new AssemblyName("x, PublicKeyToken=" + bindingRedirect.PublicKeyToken).GetPublicKeyToken();
                requestedAssembly.SetPublicKeyToken(targetPublicKeyToken);
                requestedAssembly.Version = new Version(bindingRedirect.RedirectToVersion);
                requestedAssembly.CultureInfo = CultureInfo.InvariantCulture;
                AppDomain.CurrentDomain.AssemblyResolve -= handler;
                return Assembly.Load(requestedAssembly);
            };
            AppDomain.CurrentDomain.AssemblyResolve += handler;
        }

        public class BindingRedirect
        {
            public string ShortName { get; set; }
            public string PublicKeyToken { get; set; }
            public string RedirectToVersion { get; set; }
        }
    }
}

The code simply reads a setting named “BindingRedirects” from the function app settings, deserializes it into a list of BindingRedirect objects, goes through the list and hooks a handler onto the “AppDomain.CurrentDomain.AssemblyResolve” event. That event is raised when the runtime fails to resolve an assembly reference, for example when a dependency asks for a version that isn’t there. If the requested assembly is in our binding redirect list, we load the version specified in the settings instead; otherwise we do nothing.

We need to call this method only once, before running any other code in the function app. So what I did was to create the following helper class, which guarantees that the code above runs only once.

public static class ApplicationHelper
{
    private static bool IsStarted = false;
    private static object _syncLock = new object();

    /// <summary>
    /// Sets up the app before running any other code
    /// </summary>
    public static void Startup()
    {
        if (!IsStarted)
        {
            lock (_syncLock)
            {
                if (!IsStarted)
                {
                    AssemblyBindingRedirectHelper.ConfigureBindingRedirects();
                    IsStarted = true;
                }
            }
        }
    }
}

Then, in your Azure Function, simply call this helper. We need to make sure it runs before any other code, so I put it in the static constructor of the class. I’m assuming you’re using the Visual Studio 2017 Azure Functions tools project template for development.

using System.Linq;
using System.Net;
using System.Net.Http;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;
using System.Threading.Tasks;
using Newtonsoft.Json;
using System.Collections.Generic;

namespace Connectors.AzureFunctions
{
    public static class EventProcessor
    {
        static EventProcessor()
        {
            ApplicationHelper.Startup();
        }

         [FunctionName("EventProcessor")]
        public static async Task<HttpResponseMessage> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)]HttpRequestMessage req,
            TraceWriter log)
        {
            // Method's body, in our case, let's say it's a code which is using Newtonsoft.Json
        }
    }

}

Now we should be good. If you need to add more binding redirects, there’s no need to touch the code; just add the new redirect to the settings, sit back and enjoy.
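
For instance, the “BindingRedirects” value can hold more than one entry; the second assembly name and public key token below are purely hypothetical placeholders:

[
  { "ShortName": "Newtonsoft.Json", "RedirectToVersion": "10.0.0.0", "PublicKeyToken": "30ad4fe6b2a6aeed" },
  { "ShortName": "My.Other.Assembly", "RedirectToVersion": "2.0.0.0", "PublicKeyToken": "0123456789abcdef" }
]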

Hope this workaround resolves your binding redirect issues in the Function apps and saves some time and effort!