Going Serverless

The previous post was about utilizing cloud services to build processes. However, I didn’t go all out on this: the Python code was running on a Raspberry Pi located at my home. And it is the weakest link from (at least) two points of view:

  • The Raspberry Pi’s SD memory card will eventually get corrupted.
  • Somebody has to maintain the Raspberry Pi by patching the operating system and runtimes, monitoring the disk so it doesn’t fill up, etc. All the regular maintenance tasks administrators have to deal with. That somebody is me: a lazy data engineer.

I already had my first SD card corruption last fall: it took only two years from purchase for it to happen. The second bullet point is a much more general problem, and it covers all on-premise systems as well as the IaaS options in the cloud: maintenance overhead. One option to avoid SD corruption would be to use a VM running on Azure. But then again there’s the problem from bullet point two: somebody has to maintain that VM. I wish there was a service into which I could just submit the code and not worry about these kinds of things…

And there is: Azure has services for building business processes in the cloud without having to worry about the server(s) running them. This concept is called serverless. It’s a slightly misleading term since there actually are servers under the hood, but the main point is that developers don’t have to worry about them. Traditionally difficult problems, like scaling, are taken care of by the platform so developers can focus on the main thing: building business processes. Serverless also means apps are billed only when they actually run, so no cost is generated when there is no usage. This is called a consumption-based plan. So it’s like bring-your-own-code-and-don’t-worry-about-the-platform®.

The serverless offering in Azure builds on top of Logic Apps and Azure Functions. The first one is a service for building workflows with a visual designer, automating business processes without coding. It has lots of built-in connectors available for both cloud and on-premise systems and services. It’s based on triggers, which means a workflow is kick-started when something happens, e.g. a timer fires, a certain event occurs (like a file landing in a blob container) or an HTTP request is received.

Azure Functions is a service for running code in a cloud environment. Language choices at the time of writing are C#, F#, Python, JavaScript, Java, PowerShell and TypeScript; check the official documentation for the current list.

Usually Logic Apps and Functions go hand in hand so that a Logic App works as an orchestrator and calls different functions to control the flow of business processes.

The architecture I was aiming at was this: get rid of the Raspberry Pi and set up an Azure Function to run the existing Python code:

Serverless architecture

Python is one of the supported languages, so it was a natural choice since the original code was written in it. It turned out that the only changes I had to make concerned how to schedule the code and how to interact with the storage account.

Scheduling can be done using triggers. One type of trigger is a timer, and that’s the one used here. There are also other types available, like event-based triggers (e.g. a file landing in a blob container).

Connecting the function to a storage account is achieved using bindings, which are endpoints to different Azure services like Blob storage, Event Hubs, Cosmos DB etc. In this case we are uploading a csv file to a blob, so it’s an output binding. To get data into a function, one can use input bindings; check the documentation for details.
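As an illustration, an input binding surfaces in the Python code as an extra function parameter, just like the output binding used in this project. Here is a minimal hypothetical sketch (the inputBlob name and what it contains are illustrative, not part of this project):

import azure.functions as func

def main(mytimer: func.TimerRequest,
         inputBlob: func.InputStream,
         outputBlob: func.Out[str]) -> None:
    # The input binding delivers the blob contents as a stream
    previous_data = inputBlob.read().decode('utf-8')
    # ...process the data and write the result through the output binding
    outputBlob.set(previous_data)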

Triggers and bindings are configured in a file named function.json, which in this case looks like this:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 8 * * *"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputBlob",
      "path": "sahko/inputdata.csv",
      "connection": "AzureWebJobsStorage"
    }
  ]
}

mytimer is the schedule for running the function, described as an NCRONTAB expression. It’s like classic CRON except that it has an extra field at the beginning which describes seconds, for six fields in total (classic CRON has time granularity at the minute level). The schedule 0 0 8 * * * means the function runs every morning at 08:00. outputBlob is a binding to the storage account container/filename (sahko/inputdata.csv).
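To give a feel for the NCRONTAB format, here are a few more example expressions (these are illustrative, not used in this project):

0 */5 * * * *     run every five minutes
0 30 9 * * 1-5    run at 09:30 on weekdays (Monday to Friday)
30 0 8 * * *      run every day at 08:00:30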

Here’s the actual Python code:

import os
import requests
from lxml import html
from datetime import datetime, timezone
import azure.functions as func

def main(mytimer: func.TimerRequest, outputBlob: func.Out[str]) -> None:

    # Credentials and urls come from application settings instead of being hard coded
    LOGIN = os.environ["LOGIN"]
    PASSWORD = os.environ["PASSWORD"]
    LOGIN_URL = os.environ["LOGIN_URL"]
    URL = os.environ["URL"]

    session_requests = requests.session()

    # Get login CSRF token
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = list(set(tree.xpath("//input[@name='__RequestVerificationToken']/@value")))[0]

    # Create payload
    payload = {
        "username": LOGIN,
        "password": PASSWORD,
        "__RequestVerificationToken": authenticity_token
    }

    # Perform login
    result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

    # Scrape url
    result = session_requests.get(URL, headers=dict(referer=URL))

    # The page source contains literal "\r\n" sequences; turn them into real newlines
    formatted_output = result.text.replace('\\r\\n', '\n')

    for line in formatted_output.splitlines():
        if line.lstrip()[0:11] == 'var model =':

            # Extract the JavaScript array of [epoch, measure] pairs
            jsonni = line.lstrip()[12:-1]
            start = jsonni.find('[[')
            end = jsonni.find(']]')
            jsonni = jsonni[start+1:end+1]
            jsonni = jsonni.replace('],[', '\n')
            jsonni = jsonni.replace(']', '')
            jsonni = jsonni.replace('[', '')

            output = 'timestamp;consumption\n' # header for csv-file
            for row in jsonni.splitlines():
                start = row.find(',')
                epoc = row[0:start]
                measure = row[start+1:]
                # Epoch is in milliseconds; convert to a UTC datetime
                timestamp = int(epoc)/1000
                timedate = datetime.fromtimestamp(timestamp, timezone.utc)
                timestamp_str = str(timedate)[:-6] # drop the '+00:00' offset
                output += timestamp_str + ';' + measure + '\n'

            # Write the csv content to blob storage through the output binding
            outputBlob.set(output)


The function signature shows how the timer trigger and the output binding are utilized: they arrive as the mytimer and outputBlob parameters. The outputBlob.set(output) call at the end shows how the csv file is written into the blob. I also moved the hard-coded connection strings, urls, user id and password out of the code file and into application settings; those values are fetched with os.environ at the beginning of the function.
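For local development those settings live in a file called local.settings.json (which shouldn’t be committed to version control), while in Azure they are configured as application settings on the Function App. A minimal sketch with placeholder values could look like this:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage account connection string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "LOGIN": "<username>",
    "PASSWORD": "<password>",
    "LOGIN_URL": "<login page url>",
    "URL": "<data page url>"
  }
}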

All other sections are identical to the original code, so in that sense it can be said that migrating an existing application from on-premise to serverless was pretty straightforward.

Development was done using Visual Studio Code. One thing to note is that setting up the Azure Functions & Python environments in VS Code was a painful experience and could be a blog post of its own.

The solution has now been running serverless for two weeks without problems. Here’s a screenshot from the Azure Portal of the actual execution logs. It takes about 5 seconds for the code to run:

Execution statistics

Originally I planned that a Logic App would orchestrate all the components, meaning the Logic App would also call the Function based on a timer set up on the Logic App side. However, this didn’t work as I initially planned since at the moment Logic Apps only support calling Functions that use .NET or JavaScript as the runtime stack. Most likely this will change in the future so that Python (and other currently unsupported languages) can also be called from Logic Apps. And when that happens, there will be a blog post about it!
