How to write MCollective Agents to run actions in background

The Marionette Collective, or MCollective in short, is a cutting-edge tech for running system administration tasks and jobs in parallel against a cluster of servers. When the number of servers you have to manage grows, the task of managing them, including keeping the OS and packages installed on them updated, becomes without a doubt a nightmare. MCollective helps you drag yourself out of that nightmare and into a jolly dream, where you are the king, and at your disposal is a powerful tool, by merely wielding which you can control all of your servers in one go. I’ll probably do a bad job of painting MCollective in good light, so I’d recommend you read all about it on MCollective’s main website.

Like every good tool worth its while, MCollective gives you the power to extend it by writing what it calls agents to run custom code on your servers to perform any kind of job. You can read about agents here. MCollective is all in Ruby, so if you know Ruby, which is a pretty little programming language by the way, you can take full advantage of MCollective. Incidentally, a part of my day job revolves around writing MCollective agents to automate all sorts of jobs you can think of performing on a server.

For a while I have been perplexed at the lack of support for being able to run agents in the background. Not every job takes milliseconds to finish itself. Most average-level jobs, in terms of what they have to do, take anywhere from seconds to, even longer, minutes. And since I write an API which uses MCollective agents to execute jobs, I often run into the problem of having the API block while the agent is taking its sweet time to run. As far as I’ve looked, I haven’t found support within MCollective for asynchronously running actions.

So, I got down to thinking, and came up with a solution, which you could call a bit of a hack. But insofar as my experience testing it has been, I’ve yet to face any issues with it.

I’ve an example hosted on GitHub. It’s very straightforward, even if crude. The code is self explanatory. At the heart of it is creating your actions in agents to fork and run as childs, without having the parent wait for the childs to reap, and having one extra action for each agent to fetch the status of the other actions. So, essentially, the approach has to be taken for every agent you have, but only for those actions which you wish to run asynchronously. With agents in place, all you’ll need to do is call the agents thus:

$ mco rpc bg run_bg -I node -j
{"status": 0, "result": "2434"}

and repeatedly fetch the status of the async action thus:

$ mco rpc bg status pid=2434 operation=run_bug -I node -j
{"status": 0, "result": ""}

It’s a solution that works. I’ve tested it over a month and a half now over many servers without any issues. I would like to see people play with it. The code is full of comments which explain how to do what. But if you have any questions, I’d love to entertain them

Mobile Safari Web Inspector on Safari on Mac.

If you tweak HTML and CSS and JavaScript on webpages, then you may appreciate how simple and convenient the plugin that goes by the name of Firebug for Firefox has made your life. I know it has for me. It is simply an indispensable piece of technology that I use every time I have to debug a CSS or HTML or JS issue on a webpage. Other browsers, notably Chrome and Safari, have built-in web inspectors which allow similar features.

I have of late been working on a lot of Mobile versions of websites, particularly those optimized for Mobile Safari and iOS devices using the fascinating jQuery Mobile framework. One of my biggest pet peeves while debugging issues on Mobile websites was not being able to debug the sites live on a mobile browser. More times than I’d care to acknowledge a particular mobile site, when viewed on a normal browser would look fine, but would have slight aberrations when viewed on a mobile browser. Those kind of issues are a real pain in the neck, because you are, more often than not, left to get sucked in an annoying cycle of wildly making changes in the HTML/CSS and refreshing the page on the mobile browser continuously in the hopes that the problem would fix itself.

However, today I bumped into a way to inspect and debug a mobile site on Mobile Safari by using Safari on a Mac computer (I am sure that it may work equally well for Safari on Windows, but I have not had reason nor occasion to try to find out). It not only works so well, but is also very straightforward to set up.

My testing environment includes an iPhone 4S running iOS 6 and a MacBook Pro running OS X Lion 10.7.5. Here are the steps you must follow in order to be able to inspect mobile web pages live via Safari on your Mac:

  • On your iOS device, turn on “Web Inspector” inside the Advanced settings for Safari. You’ll find these under the Settings app.
  • On the Advanced tab under Preferences for Safari on your Mac, you must check the “Show Develop menu in menu bar” setting.
  • Connect your iOS device to your Mac via the data cable.
  • You may now load on Mobile Safari the mobile webpage you want to inspect.
  • On Safari on Mac, open the Develop menu in the menu bar, select the name of your iOS device that will appear in the list, and click on the webpage shown on the pop-up that slides to the right.

The web inspector in Safari will kick in. You may explore the different tabs and functionality available there. The only downside is that no preview of the page is available on the inspector. Instead, the live page open on your Mobile Safari is the preview. You may go through the HTML source displayed inside the inspector, selecting the elements you want to inspect and notice that the corresponding elements are highlighted on the device. What’s more, you can also live edit the source as well as CSS, and they take effect almost immediately. How awesome is that!

Guide: Deploying web.py on IIS7 using PyISAPIe

I spent the last week travailing away, trying to painfully find a way to deploy a web.py based API on IIS7 using PyISAPIe. As frustrations had begun to mount up, I had nearly decided to give up. Being a die-hard Linux and Mac guy, I despise having to work on Windows. Here I was, not only forced to work on Windows, but to find a solution for a problem that left no leaves unturned in its effort to drive me crazy. As if someone decided all this misery wasn’t quite enough, I had to work with a remote desktop session in order to research, tweak, bang my head, and get things to work. Eventually, I cut through massive frustration and despair, managing to find a satisfactory solution. I almost danced in excitment and relief, letting out all sorts of expletives directed at Windows in general and IIS in particular.

To get back to the important question of deploying a web.py script on IIS7 using PyISAPIe, I will make it such that this guide will list down various steps I took, including snippets of relevant code I changed, to tame the beast. I can only hope that what is below will help a poor, miserable soul looking for help as I did (and found none).

I worked with PyISAPIe because I had successfully deployed multiple Django websites on IIS7 on it. The script in question was going to be a part of another Django website (though acting independently). It only made sense to use PyISAPIe for it as well.

First and foremost, I had to install the web.py module on the system. Having had trouble before with IIS with web.py installed through easy_install, I decided to be safe and installed it from source.. Getting web.py to work with PyISAPIe required a small hack (I notice I may make it sound as though it all came down to me in a dream, but in reality, it took me days to figure it out, and clearly after much anguish and pain). In the file Lib\site-packages\web\wsgi.py lies the following function:

def _is_dev_mode():
    # quick hack to check if the program is running in dev mode.
    if os.environ.has_key('SERVER_SOFTWARE') \
        or os.environ.has_key('PHP_FCGI_CHILDREN') \
        or 'fcgi' in sys.argv or 'fastcgi' in sys.argv \
        or 'mod_wsgi' in sys.argv:
            return False
    return True

In its pristine state, when web.py is imported from a source file through PyISAPIe, an exception is thrown. The exception, while I don’t have the exact message, is about it complaining about sys.argv not having an attribute argv, which reads fishy. Since the function _is_dev_mode() only checks whether web.py is being run in development mode, I thought I didn’t care about it since I wanted everything to run in production mode. I edited the function such that its body would be bypassed, while it returned a False boolean value. It looked like this (the important changes I made are highlighted):

def _is_dev_mode():
    return False
    # quick hack to check if the program is running in dev mode.
    if os.environ.has_key('SERVER_SOFTWARE') \
        or os.environ.has_key('PHP_FCGI_CHILDREN') \
        or 'fcgi' in sys.argv or 'fastcgi' in sys.argv \
        or 'mod_wsgi' in sys.argv:
            return False
    return True

This innocuous little addition did away with the exception.

Next up, I used default Hello World-esque example of web.py found on their site to test the deployment (of course, I went on to use my original API script, which was far too complex to trim down and fit into as an example). I called it code.py (I placed it inside the folder C:\websites\myproject). It looked like this:

  import web
  urls = (
      '/.*', 'hello',
      )
  class hello:
      def GET(self):
          return "Hello, world."
  application = web.application(urls, globals()).wsgifunc()

It was pretty simple. You have to pay particular attention on the call to web.application. I called the wsgifunc() to return a WSGI-compatible function to boot the application. I prefer WSGI.

I set up a website under IIS using the IIS Management Console. Since I was working on a 64-bit server edition of Windows and had chosen to use 32-bit version of Python and all modules, I made sure to enable 32-bit support for the application pool being used for the website. This was important.

I decided to keep the PyISAPIe folder inside the folder where code.py rested. This PyISAPIe folder contained, of import, the PyISAPIe.dll file, and the Http folder. Inside the Http folder, I placed the most important file of all: the Isapi.py. That file could be thought of as the starting point for each request that is made, what glues the Request to the proper Handler and code. I worked with the Examples\WSGI\Isapi.py available as part of PyISAPIe. I tweaked the file to look like this:

from Http.WSGI import RunWSGI
from Http import Env
#from md5 import md5
from hashlib import md5
import imp
import os
import sys
sys.path.append(r"C:\websites\myproject")
from code import application
ScriptHandlers = {
	"/api/": application,
}
def RunScript(Path):
  global ScriptHandlers
  try:
    # attempt to call an already-loaded request function.
    return ScriptHandlers[Path]()
  except KeyError:
    # uses the script path's md5 hash to ensure a unique
    # name - not the best way to do it, but it keeps
    # undesired characters out of the name that will
    # mess up the loading.
    Name = '__'+md5(Path).hexdigest().upper()
    ScriptHandlers[Path] = \
      imp.load_source(Name, Env.SCRIPT_TRANSLATED).Request
    return ScriptHandlers[Path]()
# URL prefixes to map to the roots of each application.
Apps = {
  "/api/" : lambda P: RunWSGI(application),
}
# The main request handler.
def Request():
  # Might be better to do some caching here?
  Name = Env.SCRIPT_NAME
  # Apps might be better off as a tuple-of-tuples,
  # but for the sake of representation I leave it
  # as a dict.
  for App, Handler in Apps.items():
    if Name.startswith(App):
      return Handler(Name)
  # Cause 500 error: there should be a 404 handler, eh?
  raise Exception, "Handler not found."

The important bits to note in the above code are the following:

  • I import application from my code module. I set the PATH to include the directory in which the file code.py is so that the import statement does not complain. (I’ve to admit that the idea of import application and feeding it into RunWSGI came to while I was in the loo.)
  • I defined a script handler which matches the URL prefix I want to associate with my web.py script. (In hindsight, this isn’t necessary, as the RunScript() is not being used in this example).
  • In the Apps dictionary, I again route the URL prefix to the lambda function which actually calls the `RunWSGI` function and feeds it application.
  • I also imported the md5 function from the hashlib module instead of the md5 module as originally defined in the file. This was because Python complained about md5 module being deprecated and suggested instead of use hashlib.

And that’s pretty much it. It worked. I couldn’t believe what I saw on the browser in front of me. I danced around my room (while hurling all kinds of expletives).

There’s a caveat though. If you have specific URLs in your web.py script, as I did in my API script, you will have to modify each of those URLs are add the /api/ prefix to them (or whatever URL prefix you set in the Isapi.py. Without that, web.py will not match any URLs in the file.

What a nightmare! I hope this guide serves to help others.

Thank you for reading. Good bye!

PS: If you want to avoid using PyISAPIe, there is a simpler way of deploying web.py on IIS. It is documented crudely over here.

Get all public interface IPs on a system using Python

     I recently came across a requirement in a project where I had to, in Python, programmatically extract all available public IPs on available interfaces on the machine the code would run. I looked around and settled with the following snippet of code that uses the built-in, standard socket Python module:

import socket
ip_list = [ip for ip in socket.gethostbyname_ex(socket.gethostname())[2] if not ip.startswith("127.")]

     While this piece of code does find a public IP listening on any of the available interfaces, its restriction lies in not being able to return all public IPs on interfaces: It gives back just one IP.

     This wasn’t clearly sufficient. I looked around again, and this time, found a third-party Python module called pynetinfo. This module could make possible working with different network device settings.

     I rearranged the code around pynetinfo and produced this:

def get_inet_ips():
  try:
    import netinfo
  except ImportError:
    return None
  else:
    inetIPs = []
    for interface in netinfo.list_active_devs():
      if not interface.startswith('lo'):
        ip = netinfo.get_ip(interface)
        inetIPs.append(ip)
    return inetIPs

     The code above loops through all available and active interfaces on the system, fetching and storing their IP in a simple datastructure. That got me all of the IPs available to a machine, excluding the loopback one, which the code was set to discard.

     But that wasn’t it. There was a slight problem. Not all the active interfaces on the system had public IPs. Some had private, local LAN IPs in the 192.168.0.0/16 and 10.0.0.0/8 subnets. The code above was returning all the IPs it could find, including public and private ones.

     I then found the netaddr third-party Python module which provided a Pythonic means of manipulating network addresses. I modified my code to use the netaddr module and got the following to boot with:

def get_inet_ips():
  try:
    import netinfo
    from netaddr import IPAddress, AddrFormatError
  except ImportError:
    return None
  else:
    inetIPs = []
    for interface in netinfo.list_active_devs():
      if not interface.startswith('lo'):
        ip = netinfo.get_ip(interface)
        try:
          ip_address = IPAddress(ip)
        except AddrFormatError:
          continue
        else:
          # If the IP is not private, use it.
          if not ip_address.is_private():
            inetIPs.append(ip)
    return inetIPs

     The netaddr.IPAddress.is_private() method in the code above determines whether the given IP is part of any of the defined private networks.

     Admittedly, there is much room for improvement in the code above. I can only hope that if it doesn’t help, then at the very least it serves as an interesting read.

Browsing on the BlackBerry simulator

     I am not a big fan of BlackBerry smart-phones. I realize that there are a lot of people who can’t seemingly exist without access to their emails virtually all the time, and for those lot, BlackBerry, with its prominent push email feature, is perhaps a better fit than any other smart-phone platforms out there. When it comes to me and my smart-phone usage, I would not go so far as to say that I can’t live without my phone. I can. By every measure, I consider myself a hardcore geek, perhaps more hardcore than most others, but I am by no means a gadget freak. While it will be unfair to say that I absolutely abhor typing on small (semi-) keyboards, I don’t also quite enjoy the experience. When it comes down to typing, I would much rather prefer a full-fledged keyboard. That is why to me a compact laptop is many times more important than a fully equipped smart-phone. (For the curious reader, I own a Nokia E72.)

     For a recent mobile website project that I worked on, I had to face a complaint from the client where the layout of certain pages on the site didn’t look quite as expected on BlackBerry devices. Naturally, I didn’t have a BlackBerry handset nor an easy equivalent to test the issue for myself, so I did what anyone stuck in the same corner as I would do: I went over the BlackBerry developer portal online to look for BlackBerry simulators.

     Unlike the Epoch simulator for Symbian/Nokia phones and the iPhone simulator, the BlackBerry simulators were spread out such that for each possible BlackBerry smart-phone model in existence, there was a simulator available for it. And each one of the download was anywhere from 50 to 150 MB in size.

     I chose the simulator for one of the latest BlackBerry handsets, and downloaded it. Like the Epoch simulator, BlackBerry simulators are Windows-specific, in that, they are available in the form of Windows executable binaries. I didn’t have Windows anywhere in my study, so I had to set up a Windows guest inside VMware Fusion in order to set up the simulator. To cut a long, painful story short, I was able to install the simulator after tirelessly downloading a big Java SDK update, without which the installation wouldn’t continue. And then, I powered up the simulator. I was instantly reminded of the never-ending pain I had to suffer through the hands of the Epoch simulator in my previous life where I used to be a Symbian developer. The BlackBerry simulator took ages to start up. I almost cursed out loud because that fact alone opened up old, deep gashes that I had thought I had patched up for good. I was mistaken. Never in my dreams had I thought of having to deal with such monstrosity ever again. And, to my utter, absolute dismay, here I was.

     Eventually, after what seemed to have been ages since I booted up the simulator, I was able to navigate my way to the BlackBerry browser. I let out a deep sigh and thought that I could now finally concentrate on the problem I set out to tackle. But, no! I couldn’t browse on the BlackBerry browser at all. No amount of fiddling with the 3G and WiFi settings all over the BlackBerry OS got browsing working. From what I could tell, both the 3G and WiFi networks were alive, but there was no traffic flowing through. I almost gave up.

     After groping on the Internet with a wince on my face, I was finally able to find out why. Apparently, by default, the BlackBerry simulator are unable to simulate network connections. In order to do this, you have to download and install an additional, supplementary BlackBerry simulator that is called the BlackBerry MDS Simulator. Once this simulator is up and running, your actual BlackBerry simulator will be able to simulate network connections, browse, and do all sorts of network related functions. Who knew!

     As an aside, there’s also the BlackBerry Email Simulator that simulates messaging functionality.

Ranting about Mercurial (hg) (part 1)

     The day I got a grip of git, I fell in love with it. Yes, I am talking about the famous distributed version control application: git. I was a happy Subversion (svn) user before discovering git. The day to day workflow with git and the way git worked made me hate Subversion. I was totally head over heels in love with git.

     I have come to a point where the first thing I do when I am about to start a new project is to start a git repository for that project. Why? Because it is so easy and conveneient to do. I really like the git workflow and the little tools that come with it. I know that people who complain about git often lament about its lack of good, proper GUI front-ends. While the state of GUI front-ends for git has improved incredibly and drastically over the years, I consider myself a happy command-line user of git. All my interactions with git happen on the command-line. And, I will admit, it does not slow me down or hinder my progress or limit me in any way in what I do. I have managed fairly large projects on git from the command-line, and let me tell you, if you are not afraid of working on the command-line, it is extremely manageable. However, I understand that a lot of people run away from the command-line as if it were the plague. For those people, there are some really good git GUI front-ends available.

     I have to admit that for someone coming from no version control background or centralized version control background (such as Subversion or CVS), the learning curve to git could turn out to be a litle steep. I know that it was for me, especially when it came to getting to grips with concepts involving pushing, pulling, rebasing, merging, etc. I don’t think I will be off-base here to think that a good number of people who use version control software for fairly basic tasks are afraid of merge conflicts in code when collaborating code. I know I was always afraid of it, until I actually finally faced a merge conflict for the first time. That was the last time I felt afraid of it, because I was surprised to find out how easy it was to deal with and (possibly) resolve merge conflicts. There’s no rocket science to it.

     I have been plenty impressed with git that I wince when I am forced to use any other version control. In fact, I refuse to use Subversion, as an example, and try my best to convince people to move to git. Why? Because git is just that good. I should probably point the reader to this lovely article that puts down severals reasons that explain why git is better that a number of other popular version control applications out there.

     Recently, at work, I have had to deal with the beast they call Mercurial (hg). Yes, it is another famous distributed version control system out there. I was furious at the decision of my peers to select Mercurial for work-related use, and tried my best to convince them to use git instead, but they didn’t give in to my attempts to persaude them. I had no choice but to unwilling embrace the beast and work with it.

     Instantly, I had big gripes with the way Mercurial worked as well as its general workflow. Where I fell in love with how git tracked content and not files, I hated the fact that Mercurial only tracked files. What I mean by that is if you add a file in a Mercurial repository to be tracked and commit it, and then make any changes to the file, Mercurial would automatically add it to the staging commit. Yes, it wouldn’t let you specify manually which files that were changed should be part of the staging commit, something that git does. A staging commit is a staging area where files that have been changed and have been selected to be part of the next commit are loaded to be ready for commit. With git, you have to manually specify which file to add to the staging commit. And I loved doing it that way. With Mercurial, this was done automatically. This in turn brings me to the second big gripe I had with Mercurial: incremental commits. Git workflow places an indirect emphasis on incremental commits. What are incremental commits? In the version control world, it is considered best practice to perform a commit that is as small as possible and that does not break any existing functionality. So, if for example, you have made changes to ten files to add five different functionalities, that may be independent of each other, but would only like to commit one functionality at a time. That is where incremental commits come in action. With incremental commits, you specify only the files you want to be committed and commit them, instead of committing all the changed files in a bunch and add a general purpose commit message. Incremental commits is a big win for me, and I adore that feature. Because with git you had to manually specify which of the changed files you wanted in the staging commit area, incremental commits came easily and naturally. With Mercurial, where it automatically added all changes to tracked files into the staging commit area, it did not.

     So I asked the obvious question: How do we do incremental commits with Mercurial? Aftering sifting through the logs, I found out that it was indeed possible, but in a very ugly way. The Mercurial commit command takes an extra parameter, -I, that can be used to specify which files explicity to make part of the commit. But that wasn’t it. If you had, say, five files of the ten changed files you wanted to commit, you would have to tag the -I switch behind each of those files, otherwise, Mercurial will fail to include the files in the commit. To this day, this little quirk bites me every now and then.

     I will admit that the one thing I really liked about Mercurial was the built-in web server it came with that could provide a web-based interface to your Mercurial repositories for viewing. This was something that was missing from or quite difficult to do with git. And, this was also one of the biggest reasons why my peers at work decided to go with Mercurial for. They all wanted a convenient way to access the repositories over the web.

To be continued.

Creating click-able hyper-links in CSV files programmatically

If you are programmatically creating CSV files intended for Microsoft Excel users, and cannot figure out how to make URL data in the CSV files behave like click-able hyper-links for your users, this tip is for you.

There is a special format in which URLs should be formatted, that is used by a number of spreadsheet applications to interpret URLs as click-able hyper-links. The format is this:

=HYPERLINK("URL")

The URL must be enclosed in quotes for the format to be interpreted properly (without which, at least, Microsoft Excel complains).