Fixing VMware ESXi 6.5 Upgrade Issues

A few days ago VMware released vSphere 6.5, and with it came a raft of improvements to vCenter and other fluffy features that everyone loves. I’ve been running a small dual host setup at home for a while now, and while it’s not in any way a “real production environment”, it’s been the host of a lot of household services, most notably (for my other half) a Plex server. Unfortunately not everything went to plan. My lab host (Obelisk) took the update without issue, being managed by the now embedded Update Manager in vCenter. My other host (Anshar) didn’t take to it at all.

The error I encountered was “Cannot run upgrade script on host”, a lovely generic error which had me scrabbling around inside the ESXi logs to find the solution. It turns out that at one time or another I had put the USB stick with the ESXi install into my Mac, which in turn sprayed a collection of “.Spotlight-V100”, “.fseventsd”, and various other Mac specific files across the local datastore and various critical folders in the filesystem. Thankfully the host still booted, so I was able to resolve it.

  • Enable SSH on your ESXi host
  • Run find / -name ".Spotlight-V100" -type d -exec rm -rf {} \;
  • Run find / -name ".Trashes" -type d -exec rm -rf {} \;
  • Run find / -name ".fseventsd" -type d -exec rm -rf {} \;
  • Re-run the Upgrade
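
If you’d rather see what is about to be deleted before deleting it, the three find commands above can be folded into a pair of small shell functions. This is only a sketch: the function names are made up for illustration, it assumes the find on your host supports -prune and -o, and it uses the directory name macOS actually creates for its event log (.fseventsd).

```shell
#!/bin/sh
# list_mac_debris ROOT
# Print every macOS metadata directory found under ROOT (default: /).
# -prune stops find descending into a match, so each hit is printed once.
list_mac_debris() {
    debris_root="${1:-/}"
    find "$debris_root" -type d \
        \( -name ".Spotlight-V100" -o -name ".Trashes" -o -name ".fseventsd" \) \
        -prune -print 2>/dev/null
}

# remove_mac_debris ROOT
# Delete exactly the directories that list_mac_debris reports.
remove_mac_debris() {
    list_mac_debris "$1" | while IFS= read -r debris_dir; do
        rm -rf -- "$debris_dir"
    done
}
```

Run list_mac_debris / first and read the output. Only call remove_mac_debris / once the list contains nothing you want to keep.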

Now this should have all worked, and the logs indicated it wasn’t failing on any silly parts, but again I was hit with the “Cannot run upgrade script on host” error. Further digging was required.

It turns out that VUM writes out a very detailed log of exactly what it is doing into /var/log/vua.log, and this should be your first port of call for debugging any issues. My log indicated that it was expecting 6.5 to be already installed, and when it compared the list of VIBs to update it was extremely confused about why everything was out of date. It seems that ESXi depends on a system called “locker” to store all its package information, and one of the first VIBs to be updated includes the updated locker files. Somehow I had to revert these files back to the 6.0 versions. VMware itself seems to recommend copying over the files from a working host, which wasn’t possible in my case as the other host was already on 6.5. So I held my breath and did the following:

  • Remove tools-light using esxcli software vib remove -n tools-light
  • Install the 6.0 version with a profile update: esxcli software profile update -p ESXi-6.0.0-20161004001-standard -d (followed by the location of your 6.0 depot)

…and thankfully it seemed to complete without issue. Running VUM from then on updated the host to 6.5 without any trouble. Each situation is different, but looking at the logs can really give you some insight into what is going on, or you can run the installer ISO directly and watch it spit out the specific issue it’s hitting.
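
As a starting point for that log reading, something like the following pulls out the most recent lines that mention an error or failure. It is a rough sketch: skim_log is a made-up name, and the search pattern is a guess at what a useful line looks like rather than anything documented by VMware, so widen it if it finds nothing.

```shell
#!/bin/sh
# skim_log [FILE] [LINES]
# Show the last LINES (default 20) lines of FILE (default /var/log/vua.log)
# that mention an error or failure, prefixed with their line numbers.
skim_log() {
    skim_file="${1:-/var/log/vua.log}"
    skim_lines="${2:-20}"
    # -i: ignore case, -n: print line numbers, -E: extended regex
    grep -inE 'error|fail' "$skim_file" | tail -n "$skim_lines"
}
```

The line numbers in the output make it easy to jump to the surrounding context in the full log.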

Upgrading a Google GB-7007 / U1 firmware

For many years Google has produced a range of “Search Appliances”. The idea is that you have a miniature Google search engine within your business that is able to index all your internal files and make them available in a nice interface that everyone is used to. Over the years they’ve produced many iterations of the product, with the most recent ones being essentially rebadged Dell hardware.

The “current” generation (and I use that loosely) is a rebadged Dell R710 with the bare minimum fitting options:

  • 2 x Intel Xeon E5620
  • 48GB ECC RAM
  • PERC H700 with 8 x 2.5” SAS drives
  • iDRAC Express

Extra niceties have been left out to cut costs, so no internal SD card reader, no CD/DVD drive. The hardware was generally given free with a license so you can find these devices popping up on the market every so often when people’s licenses expire and they don’t want the hardware filling up their racks.

Google actually publishes some quick guidelines on repurposing the hardware after the end of the license, which should be good enough for the vast majority of people. However, the BIOS is out of date, it’s still tagged up with the Google Search Appliance boot screen, and finding updates for it is near impossible. If you want to run ESXi with all the fluffy bits it can be a bit of a pain.

But this is an R710, can’t we just use Dell’s version?

Actually, yes you can. On the Dell R710 support page, grab the latest BIOS package in the “Non-Packaged” format and put it on a bootable USB stick with FreeDOS on it. Then just follow these steps:

  1. Pop the lid off your GB-7007 and disable the BIOS password (check on the back of the lid for details)
  2. Boot the system and enter the BIOS, change the boot order to USB first.
  3. Reboot with the USB stick in one of the ports, wait until you hit the FreeDOS prompt
  4. From the prompt run R710-060400C.exe /forcetype

OK, that last item may look a little scary. The update process checks whether the system you’re trying to flash is the system the BIOS was built for. This appliance identifies itself as a Google Search Appliance, so it will always fail that check, even though the hardware is identical to the R710. The /forcetype option disables the check and forces the BIOS to install.

After a minute or two your system will reboot and you’ll get the normal Dell boot logo and options. Congratulations, your Google Search Appliance is now a Dell R710 sporting a lovely yellow case.

Installing VMware vSphere CLI 6.0 on Debian

In an attempt to improve monitoring on my ESX system, I’ve started to poke around with a few Munin plugins which look interesting. The biggest road block was the requirement to have the VMware vSphere CLI installed. Unfortunately it doesn’t seem to be a simple task of install and forget, as like most commercial software companies they’re yet to sign up to the RPM/dpkg route for distributing their software.

Thankfully, after a while Googling and a few experiments I’ve found the following magic bullet:

# apt-get install libxml-libxml-perl perl-doc libssl-dev e2fsprogs libarchive-zip-perl libcrypt-ssleay-perl libclass-methodmaker-perl libdata-dump-perl libsoap-lite-perl libdatetime-format-iso8601-perl

# echo "ubuntu" > /etc/tmp-release
# export http_proxy=
# export ftp_proxy=

This works for Debian 8 (Jessie), and it’s been reported that it works for Debian 7 as well.

P.S. VMware, no, /usr/bin isn’t a sane default for installing your software into!

Upgrading the firmware on a HP ProCurve 2824

As it turns out, the “new” switch I’ve acquired was very out of date in regards to firmware. A few bugs have been fixed and some silly Java problems have been resolved in the Web UI, so it’s worth taking the time to update it.

First of all, check what firmware and boot ROM your switch is running with the show flash command on the CLI:

sw3# show flash
Image           Size(Bytes)   Date   Version
-----           ----------  -------- -------
Primary Image   : 3003952   12/21/05 I.08.87 
Secondary Image : 3003952   12/21/05 I.08.87 
Boot Rom Version: I.08.07
Current Boot    : Primary

All firmware versions I.08.07 onwards need the I.08.07 Boot ROM, and you’ll need to flash up to this version first. Thankfully HP provide that specific version on the website to download. Follow the exact same steps as below for I.08.07, then repeat for whatever version you’re upgrading to.

To get the firmware to the switch we use a TFTP server. Setting one up is a little out of scope for this article, but you can find a lot of free and open source servers for this. I’m using my local pfSense gateway’s TFTP server: I’ve uploaded the I_10_107.swi firmware file to it, and from the switch’s CLI I run the following:

sw3# copy tftp flash <tftp-server-ip> I_10_107.swi secondary
The Secondary OS Image will be deleted, continue [y/n]?  y

After a few seconds you’ll be back at the prompt. To check everything has worked as expected, run show flash again:

sw3# show flash
Image           Size(Bytes)   Date   Version
-----           ----------  -------- -------
Primary Image   : 3003952   12/21/05 I.08.87 
Secondary Image : 3428242   08/24/15 I.10.107
Boot Rom Version: I.08.07
Current Boot    : Primary

All you need to do is reboot the switch with the new firmware, check everything works, then flash over the image to the primary flash storage:

sw3# boot system flash secondary
Device will be rebooted, do you want to continue [y/n]?  

Once the system is up and working, use show flash again to check it’s booted from the secondary area.

sw3# show flash
Image           Size(Bytes)   Date   Version
-----           ----------  -------- -------
Primary Image   : 3003952   12/21/05 I.08.87 
Secondary Image : 3428242   08/24/15 I.10.107
Boot Rom Version: I.08.07
Current Boot    : Secondary
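
If you capture the show flash output to a file (for example from a serial console log), the same check can be scripted. A minimal sketch, assuming the output format shown above; flash_field and secondary_is_live are illustrative names, not switch commands:

```shell
#!/bin/sh
# flash_field FILE LABEL
# Print the last field of the line in FILE that starts with LABEL,
# e.g. "Secondary Image" gives the version, "Current Boot" the active area.
flash_field() {
    awk -v label="$2" '
        { sub(/\r$/, "") }                     # tolerate CRLF from a serial capture
        index($0, label) == 1 { print $NF }    # label must start the line
    ' "$1"
}

# secondary_is_live FILE VERSION
# Succeed only if the switch booted from Secondary and that image is VERSION.
secondary_is_live() {
    [ "$(flash_field "$1" "Current Boot")" = "Secondary" ] &&
    [ "$(flash_field "$1" "Secondary Image")" = "$2" ]
}
```

For example, secondary_is_live flash.txt I.10.107 succeeding is a reasonable signal that the new image both landed and booted before you overwrite the primary.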

And if everything is working as expected, flash the firmware over to the primary image exactly the same way as before:

sw3# copy tftp flash <tftp-server-ip> I_10_107.swi primary
The Primary OS Image will be deleted, continue [y/n]?  y

For the final (optional) step, switch back to the primary image:

sw3# boot system flash primary
Device will be rebooted, do you want to continue [y/n]?  

And you’re all done.

Resetting a HP ProCurve 2824

Another day, another switch. This time I’ve bought a second HP ProCurve 2824. They’re solid and reliable, and with a quick replacement of the fans they’re damn near silent. Throw in the full Layer 2 feature set and basic Layer 3 (named by HP as L3-lite) and it’s a workhorse of a switch, suited to small environments or as an edge switch on larger networks.

The main problem is that most of these ex-corporate switches come pre-configured with some setup you neither know nor care about. Thankfully, resetting this switch is amazingly easy.

  1. With the power on, poke the Reset and Clear buttons at the same time with whatever pokey devices you can find.
  2. Release the Reset button
  3. Wait until the Test LED starts blinking
  4. Release the Clear button

Within a few seconds you’ll have a factory default switch. Grab your straight through serial cable and have a play with the CLI.

It’s worth taking the time to get the firmware up to date. Remember to check the change logs and the documentation, as some interim steps may be needed to bring it up to the current version. The software for this switch has progressed quite a bit. It’s still the same horrible Java based Web UI, but little features introduced here and there really help out.

RHCSA Lab - Day 1

As part of my 2016 professional objectives I’ve got attaining an RHCSA high on the list. I’ve been meaning to get this done for the last few years but never got around to it due to the cost. Thankfully this year my manager has signed off on it, along with my Puppet Certified Professional, so I can’t grumble. The business is going through a big drive of getting certificates for people in our managed service department and I’m being taken along for the ride.

First task on the list is getting a RHEL 7 system installed and ready to be twiddled with. Red Hat offers a 30 day trial of Red Hat Enterprise Linux, so if you don’t want to use CentOS 7 (which is more than acceptable, as RHCSA doesn’t delve deep into the Red Hat proprietary technologies) you can grab a recent DVD installer from their website without much hassle. To avoid the question of entitlement usage on the business’ account I’ve set up my own personal account, which will be useful when I need to apply my certificate somewhere.

As I won’t be using a physical machine for my base system, I’ll be making use of my ESXi system to create the base VM. Hopefully when it comes to creating the KVM guests it’ll actually work.

I’m working to the “RHCSA & RHCE Training and Exam Preparation Guide” from Asghar Ghori. Its recommended setup is a single host system with 40GB storage and 4GB of RAM, which should be more than enough for the lab exercises outlined in the book. A quick run through the standard Red Hat installer with some minor custom options and we’ve got a working system up and ready.

With that, I’ll call it a “day”. I’m not going to cover the entire RHCSA certification, as I’m already experienced with working with RHEL on a day to day basis. I’ll be mostly revising the odd parts I usually don’t touch: ACLs, SELinux, KVM, and AutoFS… Joy.

Broken OpenVPN IPv4 routing with iOS9 and IPv6

After finally taking the time to get tunnelled IPv6 into the homelab via Hurricane Electric, I thought it would be nice to extend the routing out to my VPN clients. After all, they connect in and appear like local devices to the rest of the network, so why not?

What I thought was a simple configuration change has been puzzling me for the last few days. What I didn’t realise is that after switching on IPv6 in the OpenVPN server, IPv4 traffic was no longer being routed via the VPN. It turns out a small issue in either the OpenVPN client, iOS, or something in-between has broken the configuration, but thankfully it only requires a small fix.

The solution finally came from the OpenVPN bug tracker, ticket 614:

IPv4 routing on iOS 9 is broken if IPv6 is enabled inside the tunnel. The tests were done with tun-ipv6 and redirect-gateway activated and all the IPv4 traffic bypasses VPN gateway, while IPv6 works fine. Works as expected without tun-ipv6. Doesn’t work with tun-ipv6 but no IPv6 address.

Exactly what I was experiencing. Thankfully fkooman came across an entry in the FAQ which mentioned an undocumented option called redirect-gateway ipv6. Injecting this option in the OpenVPN server resolves the routing issues.

On pfSense you just need to add push "redirect-gateway ipv6" to the “Advanced Options” section of the OpenVPN server configuration.
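
To confirm the option actually made it into the generated server configuration, a quick grep does the job. This is a sketch under assumptions: has_ipv6_redirect is an illustrative name, and the path to the generated config file varies between systems, so pass in whichever file your OpenVPN server is really reading.

```shell
#!/bin/sh
# has_ipv6_redirect FILE
# Succeed if FILE contains an uncommented push "redirect-gateway ..." line
# whose flags include ipv6.
has_ipv6_redirect() {
    grep -Eq '^[[:space:]]*push[[:space:]]+"redirect-gateway[^"]*[[:space:]]ipv6([[:space:]"]|$)' "$1"
}
```

A commented-out line (one starting with #) is deliberately not counted, since OpenVPN ignores it too.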

Miniflux - Easy, self-hosted RSS

Since the demise of Google Reader a lot of new tools and sites have tried to take over the mantle as the de-facto RSS reader for the masses. The biggest (to my understanding) is Feedly, which used the shutdown to push their product. Unfortunately, over time the investment in the “free” Feedly seems to have slowly slipped away in favour of their Pro offering, which isn’t surprising for any company wanting to turn a profit. This issue seems to be replicated across all the hosted providers who are trying to make a profit out of a service Google had supplied for free, and old stalwarts like me still struggle with the idea of paying $3-$7 a month for aggregating RSS.

With the aim of taking matters into my own hands I decided to hunt around for an open source solution that I could self host. I’m already paying for a dedicated server, so why not use that to host it?

Thankfully, it seems that a lot of other people had the same issue and a large list of open source solutions has popped up. The interesting ones seem to involve the “Fever” API, which is a simple method of exposing these feed readers to mobile and desktop clients without any quirky reader dependent applications. My favourite RSS application, Reeder, supports this API, which really helped with deciding which solution I needed.

Miniflux seems to be the perfect balance between function and simplicity. It can be installed damn near anywhere as it only uses PHP and a few standard modules. In addition it supports importing and exporting OPML files, and the Fever API allows my desktop and mobile clients to keep in sync with no extra work.

Installation couldn’t be simpler. Check out the repo, move it to a folder of your choice and throw in an Nginx configuration:

server {
  listen 80;
  root /home/user/www/;
  index index.php index.html index.htm;

  # the following line is responsible for clean URLs
  try_files $uri $uri/ /index.php?$args;

  # serve static files directly
  location ~* ^.+\.(jpg|jpeg|gif|css|png|js|ico|html|xml|txt)$ {
    access_log        off;
    expires           max;
  }

  # block direct access to the data directory
  location ^~ /data/ {
    deny all;
  }

  location ~ \.php$ {
    # Security: must set cgi.fix_pathinfo to 0 in php.ini!
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
    include /etc/nginx/fastcgi_params;
    # add a fastcgi_pass line here pointing at your PHP-FPM socket or address
  }
}

Done. Your Fever API endpoint is available at /fever/, and the username and password can be configured in the application’s UI. Everything is stored in SQLite, so it’s easy to back up and move around.

If you’re looking for something that’s simple and works, I’d recommend giving it a try!

Homelab Puppet

It might sound like using a nuclear weapon to swat a fly, but when you’re working with Puppet in your day job it can be really useful to have a test bench at home to fiddle with new ideas. After all, that’s what homelabs exist for, right?

Puppet Enterprise comes with a free 10 node license as stock. For a small homelab it’s perfect for managing the configuration that applies to all systems: DNS, routing, SSH keys, you get the idea. Also, as my day job runs Puppet Open Source, it’s great to test out the commercial version and get to know it before the inevitable upgrade where a lot more is at stake.

For my installation I went with CentOS 7 and a single node installation. I use Code Manager to automatically deploy my configuration from a git repository I have stored in Gogs, which if you’ve not seen already I highly suggest checking out. Agents are mostly Debian 8 with a sprinkle of CentOS 7 and RHEL 7 for my learning needs.

Here are some handy hints from my Puppet usage, both at work and at home:

Use Puppet Enterprise

10 free nodes! Take advantage of it if you can. While open source Puppet is great, the installer and Console make Enterprise worth the $100/year/node just for the time saved fiddling with config.

Use Puppet Forge

It might seem obvious, but a lot of places suffer from NIH when it comes to Puppet and decide to re-write from scratch instead of expanding on an already open source module. While the vast majority of modules I use are public on the Forge I have slipped into a habit of quickly hacking together a profile for an application rather than write a full module to share. In general using the Forge will save you time, so take advantage of it.

Use distro packages

While you can grab x .tar.gz from y website, extract, run, copy files and such, save yourself the pain and use distribution packages whenever possible. Not only does it make for much easier installation and management, it also saves you a lot of time when it comes to upgrading.

Don’t aim for 100% coverage

Trying to configure every part of a system with Puppet will burn you out quickly. Cover the required elements and tick them off first. In my opinion Puppet shouldn’t be handling assigning IPs to devices or managing file systems, but setting DNS, firewall rules and package repositories is right up its street.

Things break, so check your config first

The --noop option is your friend. Make use of it to check that your shiny new config won’t blow a hole in the side of your system due to a dodgy Hiera YAML file. In Puppet Enterprise you can even run this from the console.

If you have the infrastructure to spare, get a Jenkins system set up and lint/test that config before it hits a live system. If you want to get really fancy, have Jenkins auto push to your production branch after testing for that continuous deployment feeling.
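
Even without Jenkins, the same idea fits in a few lines of shell that you can run by hand or from a git hook. The sketch below is an assumption about how you might wire it up, not a description of any particular setup: check_files is a made-up helper, and the checker commands in the usage comments (puppet parser validate, and a Ruby one-liner to parse YAML) are simply the ones that ship alongside a normal Puppet install.

```shell
#!/bin/sh
# check_files DIR PATTERN COMMAND [ARGS...]
# Run COMMAND once over every file under DIR whose name matches PATTERN.
# With "-exec ... {} +", find exits non-zero if any invocation fails,
# so the function's exit status reflects the checker's verdict.
check_files() {
    check_dir="$1"
    check_pattern="$2"
    shift 2
    find "$check_dir" -type f -name "$check_pattern" -exec "$@" {} +
}

# Usage (run from the root of your control repository):
#   check_files . '*.pp'   puppet parser validate
#   check_files . '*.yaml' ruby -ryaml -e 'ARGV.each { |f| YAML.load_file(f) }'
#   puppet agent --test --noop   # then dry-run on a node before applying
```

Keeping the checker as an argument means the same helper works for puppet-lint or anything else that takes a list of files and exits non-zero on failure.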

Puppet Enterprise has Application Orchestration!

While it’s a recent development, I highly suggest you read the documentation and have a play with this. Hand holding multi system deployments is no longer needed!

I’m sure people have a hundred and one other things to say, but I’ll leave that for the experts…

The strange case of an OCZ Petrol SSD

A few years ago I took the risk and installed an SSD into my father’s PC. At the time the 300GB Seagate drive in his stock Dell PC had failed, just a touch outside of the warranty period, and in an attempt to keep the costs low I ended up picking a cheap SSD for him. The cheapest at the time was an OCZ Petrol 64GB. Only after a year or so did the horror stories about OCZ SSDs start appearing, with a lot of people experiencing failures after just a few weeks or months. My father’s SSD carried on chugging for a good few years and died just a few weeks ago. Not bad for a cursed brand…

The strange part was how it failed. Usually these SSDs just stopped working in every way and would no longer appear to the BIOS. In this instance it was still there, it still booted, and it got about half way through the boot sequence for Windows XP before dying with an IO error BSOD. At the time I wrote off the disk as a complete failure. Plugging it into another PC didn’t work, a USB to SATA connector didn’t work, and even when I did manage to get it recognised on a system it said around 95% of the blocks on the device were bad. A new SSD was purchased and this one was forgotten about on my desk until I picked up a new USB 3.0 to SATA cable from Amazon today.

On a whim I decided to plug the drive into the cable, then into my Mac. OS X by default doesn’t write to NTFS but can read it, and that turned out to reveal something very weird about this device. When operated in read-only mode, with no writes attempted, it works perfectly. This also matches what I was seeing in the PC: the boot loader and initial stages of Windows XP worked fine, but as soon as it came to actually checking the disk and doing a write, the device locked solid.

So, if you have an OCZ Petrol that you need to recover data from, try getting a device that supports write blocking and give it a go.
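
If you don’t have a hardware write blocker to hand, imaging the drive with a tool that only ever reads from it is a reasonable software stand-in. The sketch below is not what was used in this post: image_drive is an illustrative name, the device path in the usage comment is a placeholder, and it assumes the operating system itself has not mounted the drive read-write behind your back.

```shell
#!/bin/sh
# image_drive SOURCE DEST
# Copy SOURCE (a device or file) to the image file DEST without ever
# writing to SOURCE. Refuses to overwrite an existing DEST.
image_drive() {
    image_src="$1"
    image_dest="$2"
    if [ -e "$image_dest" ]; then
        echo "refusing to overwrite $image_dest" >&2
        return 1
    fi
    # dd only opens its input for reading.
    # conv=noerror carries on past unreadable blocks; sync pads each short
    # or failed read to a full block so later data stays at the right offset.
    dd if="$image_src" of="$image_dest" bs=4096 conv=noerror,sync
}

# Usage (placeholder device name, check yours before running):
#   image_drive /dev/sdX petrol.img
```

Once you have the image, do all further recovery work on the copy and leave the original drive unplugged.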