Fix: AccessToKeyVaultDenied when using KeyVault with Function App application setting

After following the instructions on the Microsoft website to create a Key Vault reference and put it in my App Settings, I set up a Managed Service Identity and granted that identity access to my KeyVault key. Next, wanting to follow Microsoft’s advice, I put a firewall around the KeyVault, making sure to enable the “Allow trusted Microsoft services to bypass this firewall?” setting. Despite that, I was still receiving an AccessToKeyVaultDenied error:

Screen shot showing that System assigned managed identity is receiving the AccessToKeyVaultDenied error with the explanation 'Key Vault reference was not able to be resolved because site was denied access to Key Vault reference's vault.'

I even checked, and yes, App Service is supposed to be able to bypass the firewall – so what was going on? Well, the Key Vault references documentation page has this text:

Key Vault references are not presently able to resolve secrets stored in a key vault with network restrictions.

That seemed fine when I first read it – after all, there’s an explicit setting to bypass the firewall. But when I disabled the network firewall (allowing access from all networks), everything suddenly worked, and the key status became Resolved with a nice green tick:

Screen shot showing that the KeyVault key status is "Resolved"
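
For reference, the two moving parts look roughly like this (the vault, secret and resource group names are placeholders, not my real resources). The first line is the Key Vault reference syntax that goes in the application setting; the second is the Azure CLI equivalent of switching the vault back to allowing access from all networks:

@Microsoft.KeyVault(SecretUri=https://myvault.vault.azure.net/secrets/mysecret/)
az keyvault update --name myvault --resource-group myresourcegroup --default-action Allow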

Fix: This must be accepted explicitly before updates for this repository can be applied

Some repos, such as the one for the UniFi Controller, change ‘Release’ field values (in this case the Codename) from one release to the next, and apt then demands manual confirmation before updates can be applied. For someone like me who has a standalone, automated controller setup designed mainly to keep the firmware up to date without much intervention, this is a hassle. It looks something like this:

[email protected]:~$ sudo apt-get update
[sudo] password for robert: 
Hit:1 http://mirrors.linode.com/ubuntu bionic InRelease
Get:2 http://mirrors.linode.com/ubuntu bionic-updates InRelease [88.7 kB]          
Get:3 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]        
Get:4 http://mirrors.linode.com/ubuntu bionic-backports InRelease [74.6 kB]                                       
Ign:5 http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 InRelease                                         
Hit:6 http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 Release                                           
Get:7 https://dl.ubnt.com/unifi/debian stable InRelease [3,024 B]        
Reading package lists... Done                             
E: Repository 'https://dl.ubnt.com/unifi/debian stable InRelease' changed its 'Codename' value from 'unifi-5.12' to 'unifi-5.13'
N: This must be accepted explicitly before updates for this repository can be applied. See apt-secure(8) manpage for details.

It’s an easy fix. Just tell apt to accept changes to the Codename field automatically:

[email protected]:~$ echo 'Acquire::AllowReleaseInfoChange::Codename "true";' | sudo tee /etc/apt/apt.conf.d/99releaseinfochange
Acquire::AllowReleaseInfoChange::Codename "true";

and then it works!

[email protected]:~$ sudo apt-get update
Ign:1 http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 InRelease
Hit:2 http://mirrors.linode.com/ubuntu bionic InRelease                                                           
Hit:3 http://mirrors.linode.com/ubuntu bionic-updates InRelease                                                   
Hit:4 http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 Release                                           
Hit:5 http://mirrors.linode.com/ubuntu bionic-backports InRelease                                                 
Hit:6 http://security.ubuntu.com/ubuntu bionic-security InRelease                                                 
Hit:7 https://dl.ubnt.com/unifi/debian stable InRelease                                                           
Reading package lists... Done
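
If you’d rather approve the change as a one-off instead of permanently allowing Codename changes, newer versions of apt also have a command-line switch for this (I haven’t needed it on this box, so treat it as a pointer rather than a tested recipe):

sudo apt-get update --allow-releaseinfo-change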

Fix pyodbc.Error: (‘01000’, “[01000] [unixODBC][Driver Manager]Can’t open lib ‘ODBC Driver 13 for SQL Server’ : file not found (0) (SQLDriverConnect)”)

I was connecting from my MacBook to an Azure SQL Database when I hit the following error:

>>> environ.get('cloud_sql_conn_string')
'Driver={ODBC Driver 13 for SQL Server};Server=tcp:cynexia-sql.database.windows.net,1433;Database=cloud_scales;Uid=<username>;Pwd=<password>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;Authentication=ActiveDirectoryPassword'
>>> import pyodbc
>>> cnxn = pyodbc.connect(environ.get('cloud_sql_conn_string'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")

The solution was to install the ODBC driver, following the instructions on the Microsoft website:

brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_NO_ENV_FILTERING=1 ACCEPT_EULA=Y brew install msodbcsql17 mssql-tools
ACCEPT_EULA=Y brew install msodbcsql@13.1.9.2 mssql-tools@14.0.6.0
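
To confirm the driver actually registered with unixODBC afterwards (the name has to match the Driver= value in the connection string exactly), you can ask odbcinst what it knows about – a quick sanity check rather than part of the official instructions:

odbcinst -j       # show where unixODBC looks for its config files
odbcinst -q -d    # list the registered driver names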

Fix: unable to kmem_alloc enough memory for scatter/gather list in ZFS Solaris 10.5

The ZFS pool on my server was showing a degraded state. After checking the SMART status of the constituent drives and finding no problems, I discovered that there’s a bug in Solaris 10.5 where the system reports a growing number of errors and eventually fails the pool. dmesg shows the error unable to kmem_alloc enough memory for scatter/gather list, but there is actually nothing wrong with the pool. Running zpool status shows the degraded state:

[email protected]:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM      CAP            Product /Disks     IOstat mess          SN/LUN
        rpool       ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0      32.2 GB        VMware Virtual S   S:5 H:25 T:0         000000000000000

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 12h15m with 0 errors on Fri Dec 21 00:08:43 2020
config:

        NAME                       STATE     READ WRITE CKSUM      CAP            Product /Disks     IOstat mess          SN/LUN
        tank                       DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            c0t50014EE20BF0750Dd0  ONLINE       0     0     0      4 TB           WDC WD40EFRX-68W   S:0 H:0 T:0          WDWCC4E6NAXVAS
            c0t50014EE263348A3Ed0  ONLINE       0     0     0      4 TB           WDC WD40EFRX-68W   S:0 H:0 T:0          WDWCC4E0FRRRRP
            c0t50014EE2B69D2D68d0  DEGRADED     0     0    20  too many errors      4 TB           WDC WD40EFRX-68W   S:0 H:0 T:0          WDWCC4E3AN2Y99

errors: No known data errors

Running zpool clear against the affected pool recovers it:

[email protected]:~# zpool clear tank
[email protected]:~# zpool status    
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t50014EE20BF0750Dd0  ONLINE       0     0     2
            c0t50014EE263348A3Ed0  ONLINE       0     0     0
            c0t50014EE2B69D2D68d0  ONLINE       0     0     0
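
Until the underlying bug is fixed, the errors will keep reappearing, so it’s worth checking pool health periodically and clearing the spurious counters by hand. A rough sketch, assuming the pool is called tank as above:

zpool status -x     # prints "all pools are healthy" when nothing is wrong
zpool clear tank    # reset the spurious error counters
zpool scrub tank    # optionally verify that the data really is intact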

Fix: /var/lib/docker/aufs/diff is consuming entire drive

Some of my docker containers were complaining that they didn’t have enough drive space. This looked odd – so I logged in to the host and checked around:

[email protected]:/$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
udev                           2.0G     0  2.0G   0% /dev
tmpfs                          394M   12M  382M   3% /run
/dev/sda1                       26G   25G     0 100% /
tmpfs                          2.0G  2.0M  2.0G   1% /dev/shm
tmpfs                          5.0M     0  5.0M   0% /run/lock
tmpfs                          2.0G     0  2.0G   0% /sys/fs/cgroup
//fs.cynexia.net/largeappdata  2.7T  285G  2.4T  11% /mnt/largeappdata
//fs.cynexia.net/video         5.4T  3.0T  2.4T  56% /mnt/video
//fs.cynexia.net/appdata       2.5T   52G  2.4T   3% /mnt/appdata

All space used up. Huh. Wonder why? I did a quick check to see what’s using most space:

[email protected]:/$ sudo du -xh / | grep '[0-9\.]\+G'
8.0K /var/lib/docker/aufs/diff/1dba1b90260105df03d0147c535c104cca0dd24fcc9273f0bc27b725c7cc676f/usr/local/crashplan/jre/lib/locale/zh.GBK/LC_MESSAGES
12K /var/lib/docker/aufs/diff/1dba1b90260105df03d0147c535c104cca0dd24fcc9273f0bc27b725c7cc676f/usr/local/crashplan/jre/lib/locale/zh.GBK
19G /var/lib/docker/aufs/diff
19G /var/lib/docker/aufs
2.2G /var/lib/docker/containers/8c5725f63f681e012fcc479e78133f31ab1c760b7d8d4e0a7e150d213face41f
2.3G /var/lib/docker/containers
21G /var/lib/docker
21G /var/lib
22G /var
25G /

Clearly /var/lib/docker/aufs/diff is what’s consuming the space. Let’s clean that up by removing dangling images:

[email protected]:/$ docker rmi $(docker images -aq --filter=dangling=true)
Untagged: mnbf9rca/[email protected]:9846b7570b5ba6d686be21623446cec8abd9db04cf55a39ce45cabfaa0d63f9f
Deleted: sha256:011bf974552570c536f8f98c73e0ed7d09ef9e2bfcbc7b3f3e02e19682b7480e
Deleted: sha256:a0637dd0588be6aee9f4655260176e6da802fcd92347cdf789ae84f3503322c3
Deleted: sha256:6e21a0999ad14a1cc0ccc8e31611b137793e3614338e01f920e13bfeb4128fdc
Deleted: sha256:b98c7813439119c3d2f859060fe11bf10151f69587f850a48448cae0fa4d9305
Untagged: mnbf9rca/[email protected]:ad493202d196dfae418769428ba6dea4d576ce1adec7ebe90837d0b965fe9b42
Deleted: sha256:b8df5a1ffa1eedd7be03d4a2a37549bf81699cc6fa1586c1d3510d90d4e9e562
...
Deleted: sha256:07c09e3cb65b3cec786933f882a08d5b0a34cd94f6922ada0d6f0cf779482ee0

Let’s check now…

[email protected]:/$ sudo du -xh / | grep '[0-9\.]\+G'
8.0K /var/lib/docker/aufs/diff/1dba1b90260105df03d0147c535c104cca0dd24fcc9273f0bc27b725c7cc676f/usr/local/crashplan/jre/lib/locale/zh.GBK/LC_MESSAGES
12K /var/lib/docker/aufs/diff/1dba1b90260105df03d0147c535c104cca0dd24fcc9273f0bc27b725c7cc676f/usr/local/crashplan/jre/lib/locale/zh.GBK
4.8G /var/lib/docker/aufs/diff
4.8G /var/lib/docker/aufs
2.2G /var/lib/docker/containers/8c5725f63f681e012fcc479e78133f31ab1c760b7d8d4e0a7e150d213face41f
2.3G /var/lib/docker/containers
7.4G /var/lib/docker
7.6G /var/lib
8.3G /var
11G /

Much better! It turns out there are a few great cleanup agents, e.g. docker-gc-cron, which will do the job for me automatically in future.
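
On more recent Docker releases the built-in prune commands do much the same job without a third-party tool; roughly:

docker image prune     # remove dangling images
docker system prune    # also remove stopped containers, unused networks and dangling build cache
docker system df       # show how much space Docker is currently using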

Fix: Flashing an M1015 – Error code = 64 Failed to flash the image. Please retry recovery

I purchased an IBM M1015 to use as an HBA in my server. As part of that, I wanted to flash it with the IT-mode firmware; however, I kept getting errors, even when I used the original IBM firmware from their website:

C:\> megarec -cleanflash 0 M1000FW.ROM

MegaRAID HWR Controller Recovery tool. Version 01.01-004 February 05, 2010
Copyright (c) 2006-2008 LSI Corp.
Supports 1079 controller and its Successors


Erasing Flash Chip (8MB)....
 Completed: 100%
Flashing Image. Please wait...


Currently flashing component = BIOS
Programming Flash....
 Completed: 100%
Verifying the Flashed Data...


Currently flashing component = HIIM
Programming Flash....
 Completed: 100%
Verifying the Flashed Data...


Currently flashing component = APP
Error in downloading the image.
Error code = 64
Failed to flash the image. Please retry recovery

I’d never seen an M1015 before, at least not close up. Closer inspection of the card, however, revealed a code: FRU 46C8927. “FRU” means “Field Replaceable Unit”, i.e. something you can order as a replacement part. So I googled that code and discovered that this was actually an IBM M5015, not an M1015. The M5015 cannot be flashed to IT mode, so I had to send it back.

Fix: Emby Docker fails to start when config on mounted share – SQLite “database is locked”

I have a clean VM running Ubuntu 16.04 on VMware ESXi 6.5, with a CIFS share mounted at /mnt/appdata using the noperm flag. The share is writeable.

I installed Emby in Docker using the instructions here: https://hub.docker.c…mby/embyserver/

docker run -it --rm -v /usr/local/bin:/target \
     -e "APP_USER=robert" \
     -e "APP_CONFIG=/mnt/appdata/emby" \
     emby/embyserver instl

then

docker run -it --rm -v /etc/systemd/system:/target \
    emby/embyserver instl services

The next command, sudo systemctl enable emby-server.service, didn’t work. Instead I had to do:

sudo systemctl enable emby-server@robert.service

Then I ran emby-server and configured it with the path /mnt/video (also a CIFS share mounted on the local machine). However, Emby doesn’t work – and I see an error in the attached log (“svc.txt”):

2016-12-04T09:49:04.572948238Z Error, Main, UnhandledException
2016-12-04T09:49:04.573027012Z  *** Error Report ***
2016-12-04T09:49:04.573039000Z  Version: 3.0.8500.0
2016-12-04T09:49:04.573049078Z  Command line: /usr/lib/emby-server/bin/MediaBrowser.Server.Mono.exe -programdata /config -ffmpeg /bin/ffmpeg -ffprobe /bin/ffprobe -restartpath /usr/lib/emby-server/restart.sh
2016-12-04T09:49:04.573081031Z  Operating system: Unix 4.4.0.31
2016-12-04T09:49:04.573090300Z  Processor count: 4
2016-12-04T09:49:04.573097909Z  64-Bit OS: True
2016-12-04T09:49:04.573105143Z  64-Bit Process: True
2016-12-04T09:49:04.573112588Z  Program data path: /config
2016-12-04T09:49:04.573119889Z  Mono: 4.6.2 (Stable 4.6.2.7/08fd525 Mon Nov 21 15:56:40 UTC 2016)
2016-12-04T09:49:04.573127634Z  Application Path: /usr/lib/emby-server/bin/MediaBrowser.Server.Mono.exe
2016-12-04T09:49:04.573135097Z  One or more errors occurred.
2016-12-04T09:49:04.573142348Z  System.AggregateException
2016-12-04T09:49:04.573151009Z    at System.Threading.Tasks.Task.WaitAll (System.Threading.Tasks.Task[] tasks, System.Int32 millisecondsTimeout, System.Threading.CancellationToken cancellationToken) [0x00242] in <8f2c484307284b51944a1a13a14c0266>:0 
2016-12-04T09:49:04.573161976Z    at System.Threading.Tasks.Task.WaitAll (System.Threading.Tasks.Task[] tasks, System.Int32 millisecondsTimeout) [0x00000] in <8f2c484307284b51944a1a13a14c0266>:0 
2016-12-04T09:49:04.573172331Z    at System.Threading.Tasks.Task.WaitAll (System.Threading.Tasks.Task[] tasks) [0x00000] in <8f2c484307284b51944a1a13a14c0266>:0 
2016-12-04T09:49:04.573181859Z    at MediaBrowser.Server.Mono.MainClass.RunApplication (MediaBrowser.Server.Implementations.ServerApplicationPaths appPaths, MediaBrowser.Model.Logging.ILogManager logManager, MediaBrowser.Server.Startup.Common.StartupOptions options) [0x000cf] in <8385af0cf454438f8df15fa62f41afa4>:0 
2016-12-04T09:49:04.573191220Z    at MediaBrowser.Server.Mono.MainClass.Main (System.String[] args) [0x0008a] in <8385af0cf454438f8df15fa62f41afa4>:0 
2016-12-04T09:49:04.573199399Z  InnerException: System.Data.SQLite.SQLiteException
2016-12-04T09:49:04.573206751Z  database is locked
2016-12-04T09:49:04.573213834Z  database is locked

I tried running the container directly:

docker run -d --name="EmbyServer" \
      --net="host" \
      -e TZ="UTC" \
      -e HOST_OS="ubuntu" \
      -e "TCP_PORT_8096"="8096" \
      -v "/mnt/appdata/emby/":"/config":rw \
      emby/embyserver

but I get the same error (“just run.txt”). I checked, and the /mnt/appdata/emby folder is being created:

[email protected]:~$ ls /mnt/appdata/emby
abc config data localization logs
[email protected]:~$ du -sh /mnt/appdata/emby
3.4M /mnt/appdata/emby

So clearly the share is writeable from within the container. If I run the container without the mapped volume for the config:

docker run -d --name="EmbyServer" \
      --net="host" \
      -e TZ="UTC" \
      -e HOST_OS="ubuntu" \
      -e "TCP_PORT_8096"="8096" \
      emby/embyserver

it’s reachable at http://host:8096 and works fine (“no map.txt”) – but obviously the configuration isn’t persistent.

It turns out that the root of the problem is the way CIFS handles byte-range locking, which is incompatible with SQLite. One way to fix this is to add the nobrl parameter to the mount in /etc/fstab, e.g.:

//fs.cynexia.net/appdata /mnt/appdata cifs iocharset=utf8,credentials=/root/.smbcredentials,nobrl,dir_mode=0775,nofail,gid=10,noperm 0 0
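
After adding nobrl, the share needs to be unmounted and remounted (and the container restarted) for the new option to take effect. Something like the following, assuming the mount point and container name used earlier:

docker stop EmbyServer
sudo umount /mnt/appdata
sudo mount /mnt/appdata    # re-reads the options from /etc/fstab
docker start EmbyServer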

HP Gen8 Microserver error “Embedded media manager failed initialization” – how to get HPQLOCFG

During the process of installing VMware onto my Gen8 Microserver, I had trouble writing data to the internal SD card – in fact, I couldn’t even see it. Looking in the iLO Event Log I saw this:

Embedded Flash/SD-CARD: Embedded media manager failed initialization.

Googling this didn’t get me much – just forum posts with people complaining about it – but then I found this HPE Customer Advisory, which lists the steps needed to reset the error. Basically:

  1. create an XML file with the following content:
    <!-- RIBCL Sample Script for HP Lights-Out Products --> 
    <!--Copyright (c) 2016 Hewlett-Packard Enterprise Development Company,L.P. --> 
    
    <!-- Description: This is a sample XML script to force format all --> 
    <!-- the iLO partitions. --> 
    <!-- iLO resets automatically for this operation to take effect --> 
    
    <!-- Warning: This command erases all data on the partition(s) --> 
    <!-- External providers will need to be re-configured if --> 
    <!-- partition is formatted --> 
    
    <!-- Input: VALUE tag: all - format all available partitions --> 
    
    <!-- NOTE:You will need to replace the USER_LOGIN and PASSWORD values --> 
    <!-- with values that are appropriate for your environment --> 
    
    <!-- See "HP Integrated Lights-Out Management Processor Scripting --> 
    <!-- and Command Line Resource Guide" for more information on --> 
    <!-- scripting and the syntax of the RIBCL XML --> 
    
    <!-- Firmware support information for this script: --> 
    <!-- iLO 4 - Version 2.42 or later. --> 
    <!-- iLO 3 - None. --> 
    <!-- iLO 2 - None. -->
    
    <RIBCL VERSION="2.0"> 
    <LOGIN USER_LOGIN="Administrator" PASSWORD=""> 
    <RIB_INFO MODE="write"> 
    <FORCE_FORMAT VALUE="all" /> 
    </RIB_INFO> 
    </LOGIN> 
    </RIBCL>
  2. run that file against the server using HPQLOCFG.exe:
    hpqlocfg -s <server IP> -l c:\hpqcfg.log -f c:\Force_Format.xml -v -t user=Administrator,password=<password>
  3. some other steps to reinstall intelligent provisioning, if you use it.

All well and good – but where do you get HPQLOCFG from? If you follow the link in the article, it refuses to install because I don’t have the full PSP installed. So how can I apply the change?

Well, in my case, I installed VMware to an internal USB stick and then ran the command from there – you could even do this with all of your other existing drives removed so that they don’t get erased, then restart the process afterwards. Problem solved!
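
If you can’t get HPQLOCFG installed at all, HPE also ships a Perl equivalent, locfg.pl, as part of its Lights-Out scripting samples. I didn’t need it in the end, so take the invocation below as an untested sketch rather than a recipe:

perl locfg.pl -s <iLO hostname or IP> -f Force_Format.xml -u Administrator -p <password>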

Fix: “error processing package apt-show-versions” on Ubuntu 14.04 or Ubuntu 16.04

When installing Webmin, I’ve sometimes come across an error installing a dependency package, apt-show-versions:

Setting up apt-show-versions (0.22.7) ...
** initializing cache. This may take a while **
FATAL -> Failed to fork.
dpkg: error processing package apt-show-versions (--configure):
subprocess installed post-installation script returned error exit status 100
dpkg: dependency problems prevent configuration of webmin:FATAL -> Failed to fork.

This is caused by the fact that apt-show-versions can’t read compressed index files. Thankfully, the solution is quite simple:

First, we need to tell APT not to compress the index. To do this we create an entry in a file called /etc/apt/apt.conf.d/02compress-indexes:

sudo nano /etc/apt/apt.conf.d/02compress-indexes

If the file is empty (mine was), simply put this line in it:

Acquire::GzipIndexes "false";

If the file already has some text, check whether this parameter is set to “true” and, if so, change it to “false”. If it’s missing, just add it.

Then, we need to clear out apt-show-versions’ broken install state and re-download the package indexes:

sudo rm /var/lib/dpkg/info/apt-show*

followed by

sudo apt-get update

Finally, we just need to complete the installation:

sudo apt-get -f install webmin

And job done.
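
To double-check that the package really is configured now, you can ask dpkg for its status and run it against something:

dpkg -s apt-show-versions | grep Status
apt-show-versions webmin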

How to ensure you can revert changes to function apps

As I’ve been playing around with Azure Functions I’ve slowly outgrown the web-based editor. It’s not that it isn’t useful, it’s just that I miss IntelliSense (I’ll come back to this in a later post), and I accidentally deployed a change which broke one of my functions. I’d made dozens of tiny changes, but I simply could not figure out which one was to blame. Without a version history, I was kinda screwed.

I had seen the “Configure Continuous Integration” option before, but never really looked at it. I keep my source code in private GitHub repos, so it was relatively trivial to set up a new repo tied to this function app. After reading the setup instructions, however, I was a little confused about what exactly I needed to do to put my existing functions into the repo, but it was actually much simpler than I thought. It turns out one of the best features is the ability to roll back to a previous commit with a single click:

Screen shot showing the option to roll back a continuous-integration deployment in the Azure portal

First, I created a new private GitHub repo and cloned it to my local machine. I chose not to use branching – but I guess you could map different function apps to different branches to support a separation between “dev”, “test”, “production” etc. In the root of my repo, I created a folder for each of the functions I wished to deploy, named exactly the same as the existing functions (I assume they’re not case sensitive but I kept to the same case).

Then, I needed to put the actual code in there. Under the visual editor for each of the functions is a “view files” link. Clicking this, I was able to see the function.json and run.csx files within each function. I simply cut and pasted the code from there into a file of the same name in the relevant folder, as sketched below.
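
As a rough sketch of what that ends up looking like (using two of the function names that appear in the dev console listing further down – substitute your own function names and repo path):

cd <your-functions-repo>
mkdir fetchDepartureBoardOrchestrator postToTwitter
# then paste each function's files from the "view files" pane into the matching folder:
#   fetchDepartureBoardOrchestrator/function.json
#   fetchDepartureBoardOrchestrator/run.csx
#   postToTwitter/function.json
#   postToTwitter/run.csx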

Next, I needed to find the host.json file. That’s a bit more tricky. In the end, I figured the easiest way was to use the Dev Console. Navigate to Function App Settings, and select “Open dev console”. After a few seconds, the dev console appears:

Screen shot showing the Azure Functions dev console

This appears to be a Linux-style shell. You should start in the D:\home\site\wwwroot folder – that’s where host.json lives. Just type cat host.json to see the contents. It turns out mine was empty (just an opening and closing curly brace):

D:\home\site\wwwroot

> ls
D:\home\site\wwwroot
QueueTriggerCSharp1
fetchDepartureBoardOrchestrator
host.json
postToTwitter
> cat host.json
D:\home\site\wwwroot
{}
>

I created this file in the root of my repo, then committed the changes and pushed them back to GitHub. Within a few seconds, I was able to see the change by clicking “Configure Continuous Integration” in Function App Settings. My changes deployed immediately. And when I next screw up, because I’m forced to push changes via Git, I know I’ll be able to roll back to a known-good configuration.
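
For completeness, the commit-and-push step is just ordinary Git – something like this, assuming the folder and file names above:

git add host.json fetchDepartureBoardOrchestrator postToTwitter
git commit -m "Import existing functions from the portal"
git push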