Author Archives: Chris Johnson

About Chris Johnson

Chris is an avid developer and speaker, and is the General Manager of Provoke Solutions Inc., a Microsoft Gold Partner in Seattle, WA, that is one of the world's most renowned and sought-after online experience consultancies. Provoke Solutions specializes in software solutions for SharePoint and the Microsoft technology stack (http://www.provokesolutions.com). In Nov 2011 Chris left Microsoft Corporation after nine and a half years, where he was most recently a Senior Technical Product Manager for the SharePoint product group in Redmond, Washington, managing the technical marketing programs for SharePoint’s professional developer audience. Chris moved to Redmond in 2007 to work in the software engineering team on the SharePoint 2010 release after working for Microsoft New Zealand. In New Zealand he consulted to customers across the Asia Pacific region on designing and implementing Content Management Server and SharePoint deployments. Chris’ background is in Microsoft software development and he enjoys all things technical. He is a speaker at numerous conferences around the world such as Tech.Ed, SharePoint Best Practices Conference, SharePoint Connections and the worldwide SharePoint Conference. Chris holds a Bachelor of Computer Science and enjoys throwing himself out of perfectly good airplanes from time to time. Chris blogs and can be contacted via www.looselytyped.net

Parsing Azure Blob Storage logs using Azure Functions

One of the ways AC and I track how many people are listening to the Microsoft Cloud Show episodes we put out is by using the logs created for Azure Storage. These track the various requests for the mp3 files for each episode. You have to turn on this logging for your account; once that is done, log files are written into the /$logs/blob folder of your storage account. You can read more about Azure Storage Analytics here: https://msdn.microsoft.com/en-us/library/hh343270.aspx

However, the way the storage logs are filed does not exactly make analyzing them easy in something like Excel.   They are logged in /year/month/day/hour/5min chunk folders/files in CSV format. (detail of what is logged is here).
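To make the folder layout concrete, here is a minimal Python sketch that builds the blob-name prefix for the hour containing a given time. It assumes the documented Storage Analytics naming convention of `<service>/YYYY/MM/DD/hhmm/<counter>.log` with the minutes always `00`; the function name is made up for illustration and is not part of any SDK.

```python
from datetime import datetime, timezone


def log_prefix_for_hour(when: datetime, service: str = "blob") -> str:
    """Return the prefix inside the $logs container for the hour containing `when`.

    Assumes the <service>/YYYY/MM/DD/hhmm/ layout, where mm is always 00.
    A naive datetime is interpreted as local time by astimezone().
    """
    utc = when.astimezone(timezone.utc)
    return f"{service}/{utc:%Y/%m/%d/%H}00/"


# Hypothetical example time, not taken from real logs.
print(log_prefix_for_hour(datetime(2016, 4, 20, 18, 42, tzinfo=timezone.utc)))
```

Listing blobs under a prefix like this is one way to look at a single hour of logs instead of walking the whole container.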

Until recently our process for downloading and analyzing these was rather laborious and included:

  1. Download all the log files.
  2. Use some PowerShell to squash all the thousands of CSV files into one big one.
  3. Use the SQL Import tool to take the CSV and import each line into a temp table in SQL Azure.
  4. Run a T-SQL script over the data to remove all the rows we were not interested in (we only wanted the logs for downloaded MP3s, not other assets etc.).
  5. Mutate the data in various columns for better reporting, e.g. pulling important parts from the User Agent string and pulling an episode number out of the filenames.
  6. Finally, insert all the new rows into the final reporting table.
  7. Use Excel to report over the data.

It was error prone, painfully slow at times and required all these things to be done each time we wanted new statistics.  At times we would go months without doing this and were flying blind on how many people were listening.  Not really very good.

What we really needed was an automated process!

Azure Function Apps to the rescue!

In March 2016 Microsoft announced Azure Functions.  They are lightweight bits of code that can be run in Azure when certain events occur, such as on a timed basis, when an HTTP request is made or when a blob is added to storage.  What is nice about them is that you are not paying for a Virtual Machine to sit there mostly idle.  You only pay for what you use.  You can write them in a variety of languages and trigger them in various ways.  If you know how to write a console app in C# they are super easy to understand, same with a hello world app in Node.js.

They looked perfect for what I wanted to do.  I wanted to read in log files as they were created, parse the content out of them and then log that to an Azure SQL Database.  From there we just attach Excel to the DB and go to town with graphs and pivot tables etc.
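As a rough illustration of the parsing step, here is a minimal Python sketch (the real functions are C# and live in the GitHub repo linked further down). It assumes the version 1.0 log format, where each line is semicolon-delimited with some fields quoted, and it assumes the 0-based field positions shown in the constants; check those against the Storage Analytics log format documentation before relying on them. The `episode-<number>.mp3` filename pattern is purely hypothetical.

```python
import csv
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed 0-based field positions in a version 1.0 log line (verify in the docs).
OPERATION_TYPE = 2
REQUEST_URL = 11
USER_AGENT = 27

# Hypothetical filename convention, e.g. ".../episode-124.mp3".
EPISODE_RE = re.compile(r"episode-(\d+)\.mp3$", re.IGNORECASE)


def parse_download(line: str) -> Optional[dict]:
    """Return details for an MP3 download, or None for any other log line."""
    fields = next(csv.reader([line], delimiter=";", quotechar='"'), [])
    if len(fields) <= USER_AGENT or fields[OPERATION_TYPE] != "GetBlob":
        return None
    path = urlsplit(fields[REQUEST_URL]).path
    if not path.lower().endswith(".mp3"):
        return None  # ignore other assets
    match = EPISODE_RE.search(path)
    return {
        "path": path,
        "episode": int(match.group(1)) if match else None,
        "user_agent": fields[USER_AGENT],
    }


# Build a made-up log line with only the fields this sketch reads.
fields = [""] * 30
fields[OPERATION_TYPE] = "GetBlob"
fields[REQUEST_URL] = '"https://example.blob.core.windows.net/shows/episode-124.mp3"'
fields[USER_AGENT] = '"AppleCoreMedia/1.0 (iPhone; U)"'
print(parse_download(";".join(fields)))
```

Using the `csv` module with a semicolon delimiter keeps quoted values intact, which matters because User Agent strings often contain semicolons themselves.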

Microsoft.Azure.WebJobs.Host: Invalid container name: $logs.

Problem! Currently it looks like you can’t trigger a Function off blob creation of a log file in the $logs container, which is where the files are dropped 🙁  I don’t know if this is something that will change, but I’ll try and find out.

This means that for now we would have to do one of two things:

  1. Manually copy log files to another container periodically. Ok solution.
  2. Make another function that is triggered every 15 mins or so to copy the log files out to another container, which would trigger our other function. Better solution.

I opted for copying the blobs on a schedule.  I wrote some pretty crude code that:

  1. Enumerates all the blobs in the source $logs container
  2. Checks if they exist in the destination container
  3. Copies them over if they don’t

I say this is crude because it doesn’t keep track of the last blob that it copied and therefore enumerates all the logs every time it runs. This isn’t ideal and I will probably need to make it a bit more sophisticated in the future. But for now running it once a day shouldn’t be a problem.
Update:  Since I published the post I updated the copy code to check whether it needs to sync files since the last time the function ran + 1 extra day (to be safe).  This means it won’t scan all your logs every time.
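The selection logic described above (skip what already exists in the destination, and only look back to the last run plus one extra day) can be sketched in Python as follows. This is an illustration only, not the C# code from the repo: the function names are invented, blob names are assumed to look like `blob/YYYY/MM/DD/hhmm/<counter>.log`, and the actual listing and copying of blobs through a storage SDK is left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


def blob_hour(name: str) -> datetime:
    """Parse the hour from a name like 'blob/2016/04/20/1800/000000.log'."""
    _, year, month, day, hhmm, _ = name.split("/")
    return datetime(int(year), int(month), int(day), int(hhmm[:2]))


def blobs_to_copy(
    source: Iterable[str],
    destination: Set[str],
    last_run: Optional[datetime] = None,
    safety: timedelta = timedelta(days=1),
) -> List[str]:
    """Return source blob names that are missing from the destination.

    With no last_run every source blob is considered (the crude version);
    otherwise only blobs from last_run minus the safety margin onwards.
    """
    cutoff = last_run - safety if last_run else None
    return [
        name
        for name in sorted(source)
        if name not in destination
        and (cutoff is None or blob_hour(name) >= cutoff)
    ]


# Made-up blob names for illustration.
source = [
    "blob/2016/04/18/0900/000000.log",
    "blob/2016/04/20/1800/000000.log",
    "blob/2016/04/21/0100/000000.log",
]
already_copied = {"blob/2016/04/20/1800/000000.log"}
print(blobs_to_copy(source, already_copied, last_run=datetime(2016, 4, 21)))
```

Keeping the decision separate from the copy itself makes it easy to check the look-back window without touching a real storage account.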

So here is what we have end to end …

  1. Copy logs from $logs in one subscription to /showlogs in another subscription
  2. Process each log file as it arrives and put the data in the Database
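To give a feel for the second step, here is a small Python sketch that stores parsed download rows and counts downloads per episode. It uses the standard library's `sqlite3` with an in-memory database purely as a local stand-in for the Azure SQL Database; the table name, column names and sample rows are all hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int, str]  # (requested_at, episode, user_agent)


def store_downloads(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert parsed download rows into a (hypothetical) reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads "
        "(requested_at TEXT, episode INTEGER, user_agent TEXT)"
    )
    conn.executemany("INSERT INTO downloads VALUES (?, ?, ?)", rows)
    conn.commit()


def downloads_per_episode(conn: sqlite3.Connection) -> Dict[int, int]:
    """Return {episode: download count}, the kind of figure a report needs."""
    cur = conn.execute(
        "SELECT episode, COUNT(*) FROM downloads GROUP BY episode ORDER BY episode"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")  # stand-in for the real database
store_downloads(conn, [
    ("2016-04-20T18:42:00Z", 124, "AppleCoreMedia/1.0"),
    ("2016-04-20T18:45:10Z", 124, "Overcast/2.0"),
    ("2016-04-20T19:01:33Z", 123, "iTunes/12.3"),
])
print(downloads_per_episode(conn))
```

Note that this sketch does nothing to stop the same log file being loaded twice; a real pipeline would want some way to avoid duplicate rows.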

Show me the code!

OK, so how did I do this?  Below are the two Azure Functions.  I have tried to comment them so you can follow along.

Copying log files from /$logs to another Azure Blob container.

Next, processing those log files when they arrive.

I have put all the source for these two functions in a GitHub repo here: https://github.com/LoungeFlyZ/AzureBlobLogProcessing

There are two files for each function:

  • project.json – contains some dependency information for libraries that I used in each function
  • run.csx – the main Azure Function code

Hopefully someone else will find this useful!

Azure Functions provide a really handy and simple way to run code periodically.  If you are familiar with Node.js, C#, Python or PHP you should go take a look at them.

-CJ

Microsoft Cloud Show: Blog retired…

We just posted a new podcast episode!

When we started the site we thought we needed a blog feed as well as a podcast feed.  Turns out all we have been posting there are notices about the new podcast episodes, which isn’t really adding any value.

So, with that in mind we are going to retire the blog.  The feed will remain, but you should definitely subscribe to our podcast feed instead to get new updates on shows etc. 

PODCAST FEED

Thanks,

AC & CJ

Microsoft Cloud Show: Episode 124 | Geek Out – Behind the Scenes at Kennedy Space Center and Rocket Launches with NASA Social

We just posted a new podcast episode!

In this 124th episode, AC and CJ share their experiences with NASA Social at the last two rocket launches at NASA’s Kennedy Space Center.

Be sure to download the latest episode here: Episode 124 | Geek Out – Behind the Scenes at Kennedy Space Center and Rocket Launches with NASA Social.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 123 | Microsoft Build 2016 Conference Recap

We just posted a new podcast episode!

In this 123rd episode, AC and CJ recap the Microsoft Build 2016 Conference, covering additional news since our keynote recap episodes and our overall takeaways and impressions of the conference.

Be sure to download the latest episode here: Episode 123 | Microsoft Build 2016 Conference Recap.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 122 | Microsoft Build 2016 Day 2 Keynote Recap

We just posted a new podcast episode!

In this 122nd episode, AC and CJ recap the second keynote from the Microsoft Build conference in San Francisco, CA this week. Today’s keynote focused on Microsoft Azure, Xamarin & Office.

Be sure to download the latest episode here: Episode 122 | Microsoft Build 2016 Day 2 Keynote Recap.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 121 | Microsoft Build 2016 Day 1 Keynote Recap

We just posted a new podcast episode!

In this 121st episode, AC and CJ recap the first keynote from the Microsoft Build conference in San Francisco, CA this week. Today’s keynote focused on personal computing such as Skype, Windows, HoloLens and a new Microsoft Bot Framework.

Be sure to download the latest episode here: Episode 121 | Microsoft Build 2016 Day 1 Keynote Recap.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 120 | Microsoft invests in Mesosphere, Docker for Mac, Windows and Azure, Google Cloud, Office 365 Connectors and Microsoft’s Build Conference

We just posted a new podcast episode!

In this 120th episode, AC and CJ talk about the recent Microsoft investment in Mesosphere, improvements to the Docker developer experience on Mac, Windows and Azure, Office 365 Connectors and the Google Cloud Platform’s Stackdriver, and look forward to Microsoft’s Build conference.

Be sure to download the latest episode here: Episode 120 | Microsoft invests in Mesosphere, Docker for Mac, Windows and Azure, Google Cloud, Office 365 Connectors and Microsoft’s Build Conference.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 119 | Google Cloud Big Wins, AWS Big Losses, SharePoint 2016 RTM and New Azure Options for DoD and in Germany

We just posted a new podcast episode!

In this 119th episode, AC and CJ discuss the latest news where AWS lost some big customers in Dropbox and iCloud moving to the Google Cloud Platform as well as the SharePoint 2016 RTM announcement plus new Azure data centers for the US Department of Defense and secure options in Germany.

Be sure to download the latest episode here: Episode 119 | Google Cloud Big Wins, AWS Big Losses, SharePoint 2016 RTM and New Azure Options for DoD and in Germany.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl 

Microsoft Cloud Show: Episode 118 | SQL Server on Linux, Stretched Databases, Mulling an Slack bid and more Cloud News

We just posted a new podcast episode!

In this 118th episode, AC and CJ talk about the latest news with Microsoft’s SQL Server, Microsoft considering an $8 billion bid for Slack and other cloud news.

Be sure to download the latest episode here: Episode 118 | SQL Server on Linux, Stretched Databases, Mulling an Slack bid and more Cloud News.

Remember if you have a question, send it in or leave a comment.

This episode is sponsored by Valid.nl