Scalatra tutorial part 3

Part 1 and Part 2 are here. You can follow along on github at https://github.com/jieyangh/scalatra-sample-API. The git hash for part 3 is 92828543922dafb66b00bff03e9f115647747427 (simply run git checkout 92828543922dafb66b00bff03e9f115647747427 from the command line in your local repository). There's a lot of code in this one, so I'll mostly post snippets here; the full code is on github.

Let’s make a simple health check API. To keep things simple, the API will consist of a single endpoint that reports the current application version. This can be used as a sanity check to quickly determine if the service is up and running and whether or not the right version was deployed. As more functionality is added to the API, the health check API can be expanded in scope to check connections to dependencies such as the database and other web services.

This feature can be broken down into three subtasks. First, we will modify the build script to store the version info inside a dynamically generated file. Second, we will create a singleton that is responsible for parsing this file and handling storage and retrieval of the version information. Once the first two steps are done, the third task is simple: We need to implement the actual API controller for the new endpoint.

1. Modifying the Build:
The build.sbt file we were using before allowed simple definitions of build tasks. By switching to a build.scala file, we get more freedom over what the build does and can use the full expressiveness of the Scala programming language to specify build tasks. We will want the build to write out a text file with all the version information. The first difference between the previous build.sbt and build.scala is that we now define a project object and set the library dependencies in there:

 lazy val project = Project (
    "sampleAPI",
    file("."),
    settings = Defaults.defaultSettings ++ Seq(webSettings :_*) ++ Seq(
      libraryDependencies ++= Seq(
        "org.scalatra" %% "scalatra" % "2.2.2",
        "org.eclipse.jetty" % "jetty-webapp" % "8.1.7.v20120910" % "container,compile",
        "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016",
        "ch.qos.logback" % "logback-classic" % "1.0.1",
        "org.scalatra" %% "scalatra-scalatest" % "2.2.2" % "test"
      )
    )
  )

The second difference is that we can now define various build tasks in the code. Here, we call

VersionHelper.BuildVersionsPlugin.buildVersionFile("src/main/resources/version.conf")

VersionHelper mixes in the sbt.Plugin trait, which is essentially a way to create code modules that can be run during the build. VersionHelper will be responsible for creating the text file with all the version info. For simplicity's sake, the version will simply be the major, minor, and patch versions concatenated with the git SHA. The major, minor, and patch versions will be hardcoded. The git SHA is calculated in what is the least readable part of the code:

 private val sha1Matcher = """[0-9a-f]{5,40}""".r

  private val sha = sha1Matcher findFirstIn "git log --pretty=format:'%H' -1".!! 

  def gitSha: String = sha match {
    case Some(s) => s
    case None => "0000000000000000000000000000000000000000" //error
  }

The ".r" following the string creates a Regex using the magic of Scala implicits. Implicits allow methods to be applied to objects that normally wouldn't have them. I'm not a fan of them because they make the code hard to read and follow, sacrificing maintainability for convenience. But this is the recommended way to create a Regex in Scala.

The regex is then run. findFirstIn is called on the sha1Matcher Regex object using infix notation, which is syntactic sugar. It matches against the string output by the expression "git log --pretty=format:'%H' -1".!!

A note here: In Scala, operators are methods. To allow natural extensions to the language, Scala allows method names to take on various symbols normally reserved for operators, such as "!". The !! method comes from scala.sys.process; it runs the preceding string as a command in the shell and returns its output. In our case, it gets the latest git SHA. Again, this syntactic sugar sacrifices readability and allows people to abuse the language by creating all sorts of ugly and unintuitive method names. However, this is part of the official Scala library.

Finally, once we retrieve the git SHA, we do a pattern match on the result of the Regex match. This is an Option, which can either be Some or None. If there was no match, we just return an error string. The rest of the code in VersionHelper.scala writes out the version information to the version.conf file under /src/main/resources and is relatively straightforward in comparison with the snippet we just looked at. It can be found on github as linked at the start of this article.
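
For reference, here is a rough sketch of what that file-writing code might look like. This is not the exact code from the repository; the object nesting follows the buildVersionFile call shown earlier, gitSha is the helper from the previous snippet, and the hardcoded version numbers and timestamp format are assumptions based on the example conf file shown below.

import java.io.File
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}
import scala.sys.process._

object VersionHelper extends sbt.Plugin {

  object BuildVersionsPlugin {
    // Hardcoded for simplicity, as described above
    private val major = 1
    private val minor = 1
    private val patch = 0

    // Writes the version info out to the given path in the format shown below
    def buildVersionFile(path: String): Unit = {
      val formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS 'UTC'")
      formatter.setTimeZone(TimeZone.getTimeZone("UTC"))
      val contents =
        "build {\n" +
        "  name = \"Sample API\"\n" +
        "  version = \"" + major + "." + minor + "." + patch + "." + gitSha + "\"\n" +
        "  lastBuilt = \"" + formatter.format(new Date()) + "\"\n" +
        "}\n"
      sbt.IO.write(new File(path), contents)
    }
  }

  // The git SHA helper from the snippet above, with the pattern match
  // collapsed into a getOrElse for brevity
  private val sha1Matcher = """[0-9a-f]{5,40}""".r
  private val sha = sha1Matcher findFirstIn "git log --pretty=format:'%H' -1".!!
  def gitSha: String = sha.getOrElse("0" * 40)
}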

The conf file will look something like the following:

 build {
   name = "Sample API"
   version = "1.1.0.92828543922dafb66b00bff03e9f115647747427"
   lastBuilt = "2015-01-08 05:37:22.401 UTC"
 }

Now that we have the configuration file, we will want to create a VersionInfo.scala class in our web application that reads from it. Typesafe's config library makes reading from .conf files easy. The VersionInfo.scala class can be found on github. To load the config, we use ConfigFactory.load as follows:

 val config = {
   try {
     ConfigFactory.load("version.conf")
   }
   catch {
     case ex: Exception => null
   }
 }

This will load version.conf from the classpath. Scala will be able to locate version.conf underneath the resources folder. The rest of the code in VersionInfo.scala then parses the file using a series of intuitive method calls that do exactly what you would expect them to:

   val name = loadConfig("build.name")
   val version = loadConfig("build.version")
   val lastBuilt = loadConfig("build.lastBuilt")

Each “.” in the string denotes a level of hierarchy in the conf file.
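
The loadConfig helper itself isn't shown above. Here is a minimal sketch of what it might look like, assembled with the config value loaded earlier and falling back to a placeholder string if the file could not be read; the real VersionInfo.scala on github may differ in the details.

import com.typesafe.config.{Config, ConfigFactory}

object VersionInfo {
  val config: Config = {
    try {
      ConfigFactory.load("version.conf")
    }
    catch {
      case ex: Exception => null
    }
  }

  // Looks up a value by its path in the conf file, e.g. "build.version"
  private def loadConfig(path: String): String =
    if (config != null && config.hasPath(path)) config.getString(path)
    else "unknown"

  val name = loadConfig("build.name")
  val version = loadConfig("build.version")
  val lastBuilt = loadConfig("build.lastBuilt")
}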

Finally, we get to the easy part. We create a controller that outputs the version info, and mount it to a heartbeat endpoint in the ScalatraBootstrap file. This is covered in part 1 of the Scalatra tutorial. All the code can be found on github as linked earlier. HealthCheckController is just a few lines and is trivial given that all the heavy lifting has been done.
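
To give a feel for it, here is a sketch of what such a controller could look like. The response format and field names are assumptions for illustration; the real HealthCheckController is in the repository.

package sampleApi.controllers

import org.scalatra.ScalatraServlet

// VersionInfo is assumed to be importable from wherever you placed VersionInfo.scala
class HealthCheckController extends ScalatraServlet {
  get("/") {
    "name: " + VersionInfo.name + "\n" +
      "version: " + VersionInfo.version + "\n" +
      "lastBuilt: " + VersionInfo.lastBuilt
  }
}

It would then be mounted in ScalatraBootstrap the same way GreetingController was in part 1 (the exact route, e.g. "/heartbeat/*", is up to you).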

Interview practice: Racing horses

This one is a puzzle question. I'm personally not the biggest fan of these types of problems and would never ask something like this in an interview. That being said, they are kind of fun, and the following problem is no exception. It's an entertaining logic question that can give insight into a candidate's thought process and how they approach an optimization problem.

There are 25 race horses and you want to find the fastest one. You can only race 5 horses at a time, and you can’t write down their times. How many races does it take to find the fastest three horses?

Here's how I approached it. The first thing to do is figure out all the assumptions inherent in any puzzle question. The first and most crucial assumption is that the finish times for each horse remain the same across races. Without this assumption, the task is impossible, as the fastest horse could change between races: horse #1 could beat horse #2 in the first race, but lose to it in the second race. The second assumption is that there are no ties. There is a clear ranking in finish times for the horses.

Now with those out of the way, we can approach the problem. The most obvious thing to do first is to divide the 25 horses into five groups, and then race each group. This will identify the fastest horse in each heat. Now we just race the five fastest horses from each group. The winner will be the fastest out of all 25 horses, and we will have identified the fastest horse in just six races. Let’s call this winner horse #1, and assume he came from group #1. That’s the easy part. Now the question remains, how many races do we now need to find the second fastest horse? Should we just pick the second fastest horse from race #6? Should we race all the second place finishers from the five groups as well?

At this point it would be helpful to do some whiteboarding. Let’s write down some concrete race times.

Race #1:
Horse #1 1 minute
Horse #2 2 minutes
Horse #3 3 minutes
Horse #4 4 minutes
Horse #5 5 minutes

Race #2:
Horse #6 1.6 minutes
Horse #7 1.7 minutes
Horse #8 1.8 minutes
Horse #9 1.9 minutes
Horse #10 1.95 minutes

By writing out the first two races and assigning some times we can see that it is entirely possible that all the horses in the second heat are actually faster than all of the other horses in the first heat save for horse #1. Let’s write down some more race times for the other heats.

Race #3:
Horse #11 1.1 minutes
Horse #12 1.2 minutes
Horse #13 1.3 minutes
Horse #14 1.4 minutes
Horse #15 1.5 minutes

Again, we see that for the third heat, it is entirely possible that all the horses are faster than all of the other horses in the first heat save for horse #1. At this point it may seem that it is necessary to race all the second place finishers in each heat, before racing the winner of the seventh race against the second place finisher of the sixth race. But that's not quite true.

Race #4:
Horse #16 1.01 minutes
Horse #17 1.02 minutes
Horse #18 1.03 minutes
Horse #19 1.04 minutes
Horse #20 1.05 minutes

Race #5:
Horse #21 1.001 minutes
Horse #22 1.002 minutes
Horse #23 1.003 minutes
Horse #24 1.004 minutes
Horse #25 1.005 minutes

Race #6:
Horse #1 comes in first place at 1 minute
Horse #21 comes in second at 1.001 minutes
Horse #16 comes in third at 1.01 minutes
Horse #11 comes in fourth at 1.1 minutes
Horse #6 comes in fifth at 1.6 minutes

From this we can quickly see that the second place finisher of race #6 (horse #21 from heat 5 in our whiteboard example) is faster than the first place finishers of heats 2, 3, and 4. This means that he is also going to be faster than all the horses in heats 2, 3, and 4. We can quickly eliminate all those horses as candidates for second fastest. Thus the only horses to consider for second fastest are the second place finisher of race #6 and the second fastest horse from the heat that the fastest horse came from (horse #2 from heat 1 in this case). So we race them. So far, we've had to do seven races.

So what about identifying the third fastest horse? Based on the same logic as above, we know that the third place finisher of race #6 (horse #16 from heat 4 from our whiteboard example) is faster than all the horses in heats 2 and 3. We do not know if this third place finisher is faster than the second or third place finisher from the winning heat (Horse #2 and #3 from heat 1), or the second place finisher from the heat that the second fastest horse came from (horse #22 from heat 5 in this case). Note for the latter case, we don’t need to consider the third place finisher from this heat, because at best it could only be the fourth fastest horse overall (we know by its placement that it must be slower than the first and second place finishers in its heat and slower than the fastest horse overall). Since we don’t care about the fourth fastest horse, we can eliminate it from consideration. So, there are only four horses to consider for third fastest, and we will need to have them compete in an eighth race to find the third fastest horse overall.

Or do we? In race #7, we would be racing horse #2 from heat 1 against horse #21 from heat 5. In race #8, we would be racing horse #2 from heat 1 again. There are only five horses we need to consider for second and third fastest, so we can combine races #7 and #8 into one. The second and third place finishers of the seventh race are the second and third fastest horses, respectively. In total, we just need seven races.

The general algorithm is:
Divide the 25 horses into groups of five.
Race each group.
Select the winner from each group and race them.
The winner from this race is the fastest horse. Let heat A denote the winning heat this horse is from, heat B denote the heat that the second place winner is from, and heat C denote the heat that the third place winner is from.
Now race horse #2 and #3 from heat A, horse #1 and #2 from heat B, and horse #1 from heat C (where #1, #2, and #3 denotes what place the horse finished for that given heat) in a final seventh race.
The first and second place winners of this race are the second and third fastest horses.
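
Here is a quick Scala sketch of that procedure, just to make it concrete. The times are hidden from the algorithm; the only operation allowed is race(), which reveals the finish order of up to five horses.

object HorseRace {

  // Simulates one race: returns the given horses ordered from fastest to slowest.
  // Only the finish order is revealed, never the times themselves.
  def race(horses: Seq[Int], time: Int => Double): Seq[Int] =
    horses.sortBy(time)

  // Finds the three fastest of 25 horses (numbered 0 to 24) in exactly seven races.
  def fastestThree(time: Int => Double): Seq[Int] = {
    val heats = (0 until 25).grouped(5).toVector.map(g => race(g, time)) // races 1-5
    val winnersRace = race(heats.map(_.head), time)                      // race 6

    def heatOf(horse: Int) = heats.find(_.contains(horse)).get
    val heatA = heatOf(winnersRace(0)) // heat of the overall winner
    val heatB = heatOf(winnersRace(1)) // heat of race 6's second place finisher
    val heatC = heatOf(winnersRace(2)) // heat of race 6's third place finisher

    // Race 7: #2 and #3 from heat A, #1 and #2 from heat B, #1 from heat C
    val finalRace = race(Seq(heatA(1), heatA(2), heatB(0), heatB(1), heatC(0)), time)

    Seq(winnersRace(0), finalRace(0), finalRace(1))
  }
}

For example, HorseRace.fastestThree(Vector.fill(25)(scala.util.Random.nextDouble())) returns the indices of the three fastest horses, since a Vector[Double] can itself be used as an Int => Double function (and random doubles satisfy the no-ties assumption).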

Scalatra Tutorial (Part 2): Logging, debugging, and tests

Part 1 of this tutorial is here

You can follow along by getting the code from github here: https://github.com/jieyangh/scalatra-sample-API
The git hash for part 2 is 32190b1ae5eb685f6a06eaae6cd5fa15d5cf23bd

Now that we have a barebones implementation of a web API, let’s start adding on functionality and utility.

Debugging 

Remote debugging with IntelliJ makes life much easier. Here's how to set it up. Open the project in IntelliJ and go to Run->Edit Configurations. Hit the "+" button, choose Remote, and enter a name for the configuration (I chose "ScalatraDebug" for simplicity). The default settings are fine and will set up a debug port on port 5005:

[Screenshot: remote debug configuration]

We will need to modify the SBT parameters accordingly, since it defaults to a different port. Go to File->Settings->Other settings and select SBT from the side nav. Under IDE Settings, change the VM parameters so that it's listening on port 5005:

-Xmx512M -XX:MaxPermSize=256M -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

[Screenshot: SBT debug settings]

Launch the application in the SBT console by typing “container:start” as we did before in part 1. Once the app is running, go to Run->Debug and select the remote debugging configuration that you previously created.

Logging

Any serious enterprise level application will need logging. Scalatra uses the logback framework by default, so that's what we'll use too. First we'll need to add this dependency to the libraryDependencies sequence in build.sbt:

"ch.qos.logback" % "logback-classic" % "1.0.1"

Next we will create a logback.xml under src/main/resources. Chances are if you're reading this you've used logback before. If not, the file contents are fairly self explanatory. We can modify the pattern to suit our purposes, but including the date, thread, level, logger, and actual message is as good a starting point as any:

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>Captain's Log Stardate %date{ISO8601}. [%thread] %level %logger %msg
            </pattern>
        </encoder>
    </appender>

    <root level="debug">
        <appender-ref ref="STDOUT" />
    </root>
</configuration>

Finally, we’ll need to utilize the logger in our code. Let’s add some logging to the GreetingController scala class. First let’s create the logger:

val logger =  LoggerFactory.getLogger(getClass)

 

And now we just need to use our logger object to log things:

logger.info("We have entered a spectacular binary star system in the Kavis Alpha sector");

Again, pretty self explanatory. Here’s the full listing for the GreetingController:

package sampleApi.controllers

import org.scalatra.ScalatraServlet
import org.slf4j.{Logger, LoggerFactory}

class GreetingController extends ScalatraServlet {
  val logger =  LoggerFactory.getLogger(getClass)

  get("/") {
    logger.info("We have entered a spectacular binary star system in the Kavis Alpha sector");
    "Hello world"
  }

  get("/:name") {
    val name = params.getOrElse("name", "world")
    "Hello " + name
  }
}

Tests

Now we'll want to add some automated testing. ScalaTest appears to be the go-to test framework for Scala, and it provides support for Scalatra as well. This can be accomplished by adding a dependency for Scalatra ScalaTest in the libraryDependencies sequence in build.sbt:

"org.scalatra" %% "scalatra-scalatest" % "2.2.2" % "test"

Next we’ll want to create a test under src/test/scala. We’ll want our test class to extend the ScalatraSuite class, which allows us to use simple sentences to specify the expected behavior of our application. This is Behavior Driven Development (BDD). We also want our test class to have the “FunSuite” trait so that it can treat each test case as a function value. Note that in the official Scalatra testing example, the listed code didn’t compile for me, because it could not resolve “FunSuiteLike”. Replacing “FunSuiteLike” with “FunSuite” fixed the issue for me.

import org.scalatra.test.scalatest.ScalatraSuite
import org.scalatest.FunSuite
import sampleApi.controllers.GreetingController

class GreetingControllerTests extends ScalatraSuite  with FunSuite {
  addServlet(classOf[GreetingController], "/*")

  test("simple get") {
    get("/bob") {
      status should equal (200)
      body should include ("Hello bob")
    }
  }
} 

The test code is also pretty straightforward. An HTTP GET request for "/bob" should return a status code of 200, and the response body should include "Hello bob". You can run the test in IntelliJ by right clicking anywhere on the test code, or by creating a test configuration in the IDE and running it via the Run menu.

Scalatra tutorial (Part 1): Setting up a barebones “Hello world” REST API

I went through the exercise of setting up a Scalatra web API on Mac OS X for work. There are lots of guides out there, most of which use pre-existing projects on Github as a starting point. The official Scalatra documentation does this, although I ran into issues trying to follow their instructions (it seems to work fine on Ubuntu but not on Mac). I used these guides as a reference, but I wanted to start from scratch and build out a simple GET endpoint that returned "Hello World" using the least amount of code possible. The following is a guide on how to do exactly that. The goal was to familiarize myself with how Scalatra APIs are structured and build upon that foundation.

I used Mac OS X (Mavericks) and IntelliJ (13.1) with the SBT and Scala plugins. The SBT plugin is optional, as you can use homebrew to install SBT and run the API from the command line. Most of the skeleton frameworks on github have an SBT bash script in the project root directory which unpacks and runs the SBT jar file (typically found under the project folder underneath the project root directory). Keep in mind that this is why most documentation tells you to run the SBT commands using "./sbt". The homebrew install adds SBT to your path, so you can simply type "sbt" instead.

If you want to follow along in github, you can check the sample project out at https://github.com/jieyangh/scalatra-sample-API. The hash for part 1 is 6c66a3df87ab321fa736f6ab0a819cb16d8e039e

Let’s get started!

Install Scala:

brew install scala

Install SBT (optional):

brew install sbt

SBT is a simple build tool, analogous to Maven. Typically there is a bash shell script with the same name in the project root that runs the sbt jar file (usually located in the project directory, and run via "./sbt"; if you install with brew or opt for a manual install, you can just type "sbt" instead). It is used to generate the IntelliJ project files, update dependencies, and launch the application. However, the IntelliJ plugin makes this unnecessary, so you can skip this step.

Install the Scala and SBT plugins for IntelliJ:

Go to Preferences->IDE Settings->Plugins->Browse Repositories. Search for "Scala" and "SBT" and install them both. The SBT plugin is optional, but it just makes life much easier. You'll need to restart IntelliJ for the changes to take effect.

[Screenshot: Scala and SBT plugins]

Create a new intelliJ project:

In the New Project dialog window, select Scala and choose “SBT”

[Screenshot: new project dialog]

Define the web.xml file:

Create a webapp/WEB-INF directory under src/main and add a web.xml file there. This file is a carryover from Java, which uses web.xml for servlet configuration. In Scalatra, web.xml is still utilized, but its role is much diminished. All that's needed here is a definition for our listener class:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
version="3.0">
        <listener>
                <listener-class>org.scalatra.servlet.ScalatraListener</listener-class>
        </listener>
</web-app>

Configure the routes in ScalatraBootstrap.scala:

Create a ScalatraBootstrap.scala file under src/main/scala, which will contain the bulk of your application configuration code. The design philosophy seems to favor using code over XML, a pattern that we will see repeated later when we see how SBT handles plugins and dependencies. The ScalatraBootstrap file is where our route definitions will go. Since we're keeping things simple, we will specify a "sample" endpoint underneath the web root and then map it to the yet-to-be-implemented servlet, GreetingController, that services it.

import javax.servlet.ServletContext
import org.scalatra.LifeCycle

import sampleApi.controllers.GreetingController

class ScalatraBootstrap extends LifeCycle {
  override def init(context: ServletContext) {
    // Mount servlets.
    context.mount(new GreetingController, "/sample/*")
  }
} 

Implement the GreetingController servlet:

Here is where we will implement our HelloWorld endpoints. Create a sampleApi.controllers package under src/main and add a GreetingController.scala. Note that it's customary to put the API endpoints under a controllers package, but not necessary for the implementation.

package sampleApi.controllers
import org.scalatra.ScalatraServlet

class GreetingController extends ScalatraServlet {
  get("/") {
    "Hello world"
  }

  get("/:name") {
    val name = params.getOrElse("name", "world")
    "Hello " + name
  }
} 

Two HTTP GET routes are defined: one that accepts a "name" parameter in the route path, and one that doesn't. Both are mapped relative to the root of the servlet, which we mounted at /sample (so in our case, http://localhost:8080/sample/).

Add dependencies in build.sbt:

Now edit the build.sbt file under the project root. Here is where we manage all our package dependencies. Modify build.sbt so that it contains the following code (you can safely overwrite the default auto-generated file provided by IntelliJ):

libraryDependencies ++= Seq(
  "org.scalatra" %% "scalatra" % "2.2.2",
  "org.eclipse.jetty" % "jetty-webapp" % "8.1.7.v20120910" % "container,compile",
  "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016")

Seq(webSettings :_*)

libraryDependencies is a variable declared in keys.scala that stores all the managed dependencies. “++= Seq” appends our list of dependencies to the variable. Each dependency uses the same groupID % artifactID % revision format that Maven does. The main difference here is that instead of using XML to specify the dependencies in a pom file, Scala code is used instead. Again, the design philosophy seems to favor the expressiveness and flexibility of code over that of XML.
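
One thing worth spelling out: the %% operator appends the project's Scala version to the artifact ID before resolving it. For example, assuming the project is built against Scala 2.10 (an assumption here purely for illustration; substitute whatever Scala version you actually use), these two declarations would resolve to the same artifact:

// Equivalent dependency declarations when building against Scala 2.10
"org.scalatra" %% "scalatra" % "2.2.2"
"org.scalatra" % "scalatra_2.10" % "2.2.2"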

IntelliJ will prompt you to re-import your project. Enable auto-import to save you some clicks in the future:

[Screenshot: re-import prompt]

Add Jetty plugin to SBT:
We are still missing one more dependency in order to get our application to run successfully. We will want to launch our application from SBT by running it inside a Jetty web container (a lightweight servlet container, similar to Tomcat). SBT plugins are managed in the project/plugins.sbt file. Note that plugins.sbt manages all the SBT plugins, and build.sbt manages all the dependencies for the actual application itself. Add the following line:

addSbtPlugin("com.earldouglas" % "xsbt-web-plugin" % "0.7.0")

Launch the application:

Your project structure should look like the following:
[Screenshot: project structure]

Go to View->Tool Windows->SBT Console. Hit the green arrow button to launch SBT. It will load the project definition and update the dependencies automatically. Once SBT has finished loading, you should see a “success” message followed by a command prompt. Type “container:start” to launch the jetty web server:

[Screenshot: SBT console]

You can now open up a browser and point it to "localhost:8080/sample" and see "Hello world". Congratulations!

References

http://ethanway.com/scalatra/

Great guide but it does some things that aren’t standard. Namely, it uses a pom file to manage dependencies and launches the application by manually spinning up a jetty server in code instead of using SBT.

https://github.com/laurilehmijoki/sbt-scalatra-skeleton

A simple HelloWorld API with JSON output, complete with a working test and instructions on how to launch the application.

http://www.scalatra.org/2.3/guides/deployment/configuration.html

Official documentation outlining how to set up your API.

null + null = 0, +new Date, and why loosely typed languages like Javascript are a pain

Javascript is a loosely typed language. This was done to expedite the writing of code and avoid tedious tasks such as declaring variables and explicitly defining their types. You know, the kind of stuff only suckers would do. The irony is that these so-called shortcuts actually result in more tedium, as everything requires a tremendous amount of type checking. Consider the simple act of concatenating two strings: str1 and str2.

str1.concat(str2) fails immediately if str1 is undefined or null, so the concat method is out of the question. But we can always use the javascript "+" operator right? Sure, if you want to trust the implicit type casting in Javascript. In a strongly typed language such as C#, if str1 and str2 are null, str1 + str2 simply yields an empty string. The result is intuitive and expected. In Javascript, if both are null, the result of str1 + str2 is the number 0. This can lead to some hilarious and unintended consequences in the code that can be a pain in the ass to track down. But wait, there's more! If both str1 and str2 are undefined, str1 + str2 is NaN (not a number). If str1 is "" and str2 is null, then str1 + str2 is equal to "null". That's right, because str1 is a string, Javascript implicitly casts str2 as a string as well. The expression becomes "" + "null" which evaluates to "null". Ugh. If str1 is "" and str2 is undefined, well guess what? The string "undefined" is what you get when you concatenate the two.

If you want a result that is not nonsensical, you wind up having to write some hideous code to convert all null and undefined values to an empty string first:

if (str1 === undefined || str1 === null)
{
    str1 = "";
}

if (str2 === undefined || str2 === null)
{
    str2 = "";
}

var result = str1 + str2;

And yes, the triple equals is absolutely necessary. Without it, the implicit casting rears its ugly head again. So to sum up, use common sense when writing Javascript code. Stick with the basic best practices.

DECLARE YOUR VARIABLES.

str1 = "hello"; //BAD but valid, sets str1 as a property of the global object. That's right, str1 is now a global variable. Horrible
var str2 = "world"; //You have to type three extra characters but it saves you a ton of grief

DO NOT USE YOUR VARIABLES FOR MORE THAN ONE PURPOSE

var str1 = "hello";
console.log(str1);
str1 = 5;  //For the love of god, just declare a new variable here instead
console.log(str1);

DO NOT DO CUTE STUFF WITH IMPLICIT CASTING TO LOOK COOL

var start = +new Date();

This is a clever snippet of code, but it can be rewritten into this functionally equivalent form:

//+new Date() is shorthand for:
var startDateTime = new Date();
var startTicks = startDateTime.getTime();

Both versions of the code do the same thing. The second requires a bit more typing but is far more readable. +new Date is gimmicky. Yes it "looks cool" and shows off your Javascript skillz, but the brevity in this case is detrimental in the long run to maintainability. +new Date works because it first creates a new Date object. By default, the Date constructor returns the current date. The "+" unary operator is then applied, which implicitly converts the Date to a number: the number of milliseconds since 1/1/1970. Great, but why have this esoteric code? It relies on an implicit cast, so it is entirely dependent on the version of Javascript and the browser it is running on. Any future changes could potentially break the code. If typing two lines instead of one is so painful, create a utility function instead that calls +new Date and slap a comment on there. Or don't, because +new Date has horrible performance.

So remember, scripting languages such as javascript make writing code easy, but the ease of use leads to ease of abuse. Be careful or you’ll wind up making things more difficult in the long run.

Scoping of “this” – client side Javascript versus server side NodeJS

Every function in Javascript has a "this" object that behaves differently depending on the execution context. Every time a function is executed in Javascript, there is an object context associated with it; "this" provides a way to reference that object. When calling an object method, such as Foo.bar(), "this" refers to the Foo object. When calling a method in a global context, such as bar(), "this" refers to the Javascript global object. In most browsers, that would be the window object. bar() is essentially shorthand for window.bar().

Now, the behavior in NodeJS is quite different. In order to prevent pollution of the global namespace, all functions running in the top level scope are not scoped to the global object, but to a unique local object instead. That means every function in Node has its own “this” variable. According to the official NodeJS documentation:
In browsers, the top-level scope is the global scope. That means that in browsers if you’re in the global scope var something will define a global variable. In Node this is different. The top-level scope is not the global scope; var something inside a Node module will be local to that module.

Consider the following code:


function somefunc() {
    this.value = 100;
    console.log("this.value inside somefunc() is " + this.value);
}

this.value = 1;
somefunc();
console.log("this.value in top level scope is " + this.value);

The NodeJS output is:

this.value inside somefunc() is 100
this.value in top level scope is 1

The output from the javascript console in a browser such as Chrome or Firefox is:

this.value inside somefunc() is 100 
this.value in top level scope is 100 

Because somefunc is attached to the global object, “this.value” actually refers to window.value. Thus, changing the value of “this.value” inside somefunc() changes window.value, so “this.value” is still 100 even after somefunc() has finished execution. In NodeJS however, “this.value” inside somefunc is not the same “this.value” outside of somefunc. Again, every function has its own “this” variable, scoped to the object that it is referenced in. In NodeJS, the same holds true for anonymous functions as well:

 
function somefunc(cb) {
     this.value = 100;
     console.log("this.value inside somefunc() is: " + this.value);
     cb();
}

this.value = 1;
somefunc(function() {
    this.value = 50;
    console.log("this.value inside the anonymous function is: " + this.value);
});

console.log("this.value in top level scope is " + this.value); 

NodeJS outputs:

this.value inside somefunc() is: 100
this.value inside the anonymous function is: 50
this.value in top level scope is 1 

The browser outputs:

this.value inside somefunc() is: 100 
this.value inside the anonymous function is: 50  
this.value in top level scope is 50 

As before, the browser behavior is a result of somefunc being executed in a global context. In the preceding code, this.value always refers to window.value, even when it appears inside the anonymous function callback (more on this later). In NodeJS, the anonymous function has its own “this” object that is different from the “this” object inside somefunc. This results in the three different values for “this.value”.

Now let’s take a look at what happens when we explicitly set the execution context of somefunc to that of a local object:

function somefunc(cb) {
    this.value = 100;
    console.log("this.value within somefunc is: " + this.value);
    cb();
}

var obj = {
    method: somefunc
};

this.value = 1;

obj.method(function() {
    this.value = 50;
    console.log("this.value within anonymous function is: " + this.value);
});

console.log("obj.value is: " + obj.value);
console.log("this.value in top level scope is: " + this.value);

The browser prints out:

this.value within somefunc is: 100
this.value within anonymous function is: 50
obj.value is: 100
this.value in top level scope is: 50

NodeJS prints out:

this.value within somefunc is: 100
this.value within anonymous function is: 50
obj.value is: 100
this.value in top level scope is: 1

When run from a browser, obj.value is set to 100 and window.value is set to 50. Since somefunc is no longer being called from a global context, "this" inside somefunc is owned by the object that calls it (imaginatively named obj in this example). "this.value" is actually referencing obj.value. Note that in the anonymous callback function, "this" is not scoped to obj, but rather to the global object. Because the anonymous function is not explicitly called via an object context, it still defaults to a global context!

The behavior in Node is consistent with what we observed before. When called in an object context, the “this” object inside of somefunc refers to obj. The anonymous function has its own “this” object, and any modifications to it are not reflected in obj nor in the top level scope. As a result, obj.value is 100 whereas this.value remains 1 in the top level scope – it was initialized once and never modified again.

The key takeaway here is that "this" can be incredibly confusing and unintuitive. Javascript provides methods such as apply() and call() for a reason: so that you can explicitly set "this" and avoid all the headache of second-guessing yourself trying to figure out what "this" actually is.

When keeping it Agile goes horribly wrong

Disclaimer: I don't like being Agile. I also don't like complaining. Too much negativity is a bad thing, and in some cases, it can be even worse than being Agile. I subscribe to the simple philosophy that time spent whining can be better spent improving the situation. Given that I'm not always successful at following my own principles and given that this article is (mostly) facetious, I've decided to compromise: The following essay is a tongue-in-cheek commentary about why Agile sucks. It makes little to no attempt to come up with something better (perhaps because much like democracy, Agile is the worst methodology, except for all the other methodologies tried before it). So without further ado, here is my anti-Agile manifesto:

Once upon a time, there was this development methodology called Waterfall. It sucked and everyone wrote huge essays ripping it to pieces. It was such an easy target because of how ludicrous it was. Anyone who has ever worked on a project with complexity greater than that of a "Hello World" application knows that getting all the requirements and design right on the first try, let alone the implementation, is a delusional dream. In some ways, it's not even really fair to criticize the Waterfall model anymore. It's like beating a dead horse. Why bother doing it? There's no challenge. Well, people still like to berate the Waterfall model because it serves as a useful strawman, a foil against which they can hawk their new and improved development methodologies. The thing is, a wise man named Fred Brooks once said that there is no silver bullet. Software complexity spans across multiple layers, covering architectural distances that are orders of magnitude apart. So long as humans are writing the code, the building of software is always going to be an imperfect process. Any methodologies should merely be guidelines, not strict rules defining orthodoxy that constrict and bind the builders. Anyone promising the one true way to software development must be regarded with skepticism.

Yet over the years, new false prophets in software engineering delivered sermons promising untold productivity and awesomeness. The most popular of these was from the cult of Agile, which quickly became a full blown religion, complete with its own church and accompanying dogma. A methodology that was supposed to be about “individuals and interactions over process and tools” is now bogged down by the process it sought to avoid. Instead of circumventing tools by shuffling sticky notes around on a whiteboard, software tools designed to simulate sticky notes shuffled around on a whiteboard were adopted. Heretics who complained about how it was a bit too much were castigated and excoriated by the church. Cowed into submission, they would dutifully complete their two week sprints while complaining about how stupid it all was.

Among the front line of enforcers in the Agile army are the scrummasters. They ensure strict adherence to the doctrine and processes of Agile. Straying from the path is punished by meetings on improving the process improvement process. Scrummasters come in two flavors: the full time scrummaster and the hapless developer who is asked to do double duty writing code and managing the sprint. The former has no technical skills, and if they have no domain knowledge either, then it's best to run. Run and don't look back. Apparently you can get certified as a scrummaster, and once you have been ordained, you are supposed to be able to provide value to a project just because you are familiar with Agile. I've seen this in action and the results aren't pretty.

I remember at one meeting, while the developers were having a technical discussion, the scrummaster stared on helplessly and asked everyone to please use the "As an X, I want Y, so that Z" format so as to provide the proper context. He was promptly ignored for being useless. Later, said scrummaster tried to randomize the team by asking the developers to retroactively edit the completed stories in a sprint to be tagged under a different release. Presumably this is because the scrummaster had no idea how to use Microsoft Team Foundation Server. Such a scrummaster is dead weight. The problem is exacerbated because the scrummaster has no real power or control. The product owner/manager controls what features go in the milestone. The developers not only don't respect the scrummaster, they don't report to him either. They report to their dev managers. The takeaway here is don't hire full time scrummasters, but if you must, make sure they have a few years of domain knowledge. The developer/scrummaster hybrid role is not much better and is an unholy abomination. Developers are typically overloaded and randomized enough as it is. Having to manage a sprint on top of that while being invited to yet more useless meetings is a major pain in the ass.

The meetings all revolve around completing the sprint. Sprints are typically two weeks long and are comprised of stories, the basic unit of work. Stories are the building blocks of epics. Unlike the epic of Gilgamesh, which details the exploits of the legendary warrior king of Uruk and his partner in crime Enkidu, these epics are typically mundane high level overviews of new features that need to be built. Depending on how orthodox the particular Agile church you belong to is, the stories need to be in the famous "As an X, I want Y, so that Z" format. For example, "As a user, I want this super duper awesome feature that shoots pew pew laser beams, so that I feel awesome and give more money to the company". Or: "As a developer, I want to drink wine out of the bottle, so I can dull the pain of shuffling sticky notes around a whiteboard every day during my daily standup meetings".

If the stories are not written correctly, they are rewritten or removed from the sprint in the grooming meetings, where the wheat is separated from the chaff. These are supposed to be highly efficient meetings, but I've seen them degenerate into pointless semantic arguments where people debate the definition of "done" and what the proper "acceptance criteria" for a story are. These typically spin off yet more banal meetings about vocabulary words where there is much weeping and gnashing of teeth. Once the stories have been groomed, the sprint planning meeting can begin. The stories in the sprint are then "sized", and scope is adjusted accordingly. "Size" is just an arbitrary metric measuring how much work a story is. It can either be measured in terms of complexity (how difficult a task is) or time (how long it will take). The two are not necessarily the same. On an increasing scale of 0 to 5 points, banging your head against the wall 1000 times would be a 1 point story based on complexity, but a 5 point story based on time. These meetings can drag on forever if huge arguments ensue about how large a story is. The good news is that most people just want to get the meeting over with. As the sprint planning drags on, most people's spirits are broken and they simply stop caring enough to even bother arguing.

Once the sprint is finally complete, there is the sprint demo. A lot of the time this is to non-technical stakeholders or customers. While demos are great if you're working on a UI and have something flashy to show off, web services, messaging queues, build scripts, database work, and the other sort of behind the scenes stuff that makes everything work don't lend themselves to exciting presentations. It's pretty awkward, but there are workarounds such as demoing only the UI and hiding any XML, database queries, and command line interfaces. Following the demo there is usually a sprint review. This is a meeting where everyone looks at the sprint burndown chart, a graph that tracks the number of remaining story points in the sprint over time. A perfect burndown is a straight line with a negative slope that reaches 0 at the end of day 14. In reality, the burndown chart resembles a drunk person stumbling around the park. Sometimes the line plateaus for days before small and erratic stepwise increases and decreases eventually reach a number that is not quite zero. During the review meeting, everyone discusses ways to make the burndown chart prettier in the future:
“Perhaps instead of sizing based on time or complexity, we can size based on the position of the moon relative to the earth and the sun?”
“How about consulting the oracle bones instead?”
“How about basing it on how many developers are weeping when they read the story?”
Regardless of whatever zany ideas are proposed, the resulting burndown never even remotely resembles a straight downward line.

Closely related but subtly different from the sprint review is the sprint retro, where the team reflects back on the sprint. On a dysfunctional team, this is mostly for catharsis as team members yell about what went wrong. Typically, this results in a set of action items with clearly identified owners. This is actually a great exercise in team improvement, but your mileage may vary. Those who feel empowered and accountable for the project at large will actually follow up on the action items. Those who are shackled by bureaucracy and red tape will shrug their shoulders, send out a few passive aggressive emails, and call it a day. On teams that are well run, the retro is mostly just a formality.

Throughout the entirety of the sprint are the daily standup meetings. This is where people talk about what they're working on. Each person usually says something along the lines of: "Yesterday I worked on this stuff, today I'm going to work on this other stuff. No blockers. Sunny outside with a small chance of showers." These tend to be tedious. To be fair though, they can provide some modicum of value by making sure the entire team is on the same page and aware of any potential risks or blocking issues. Then again, simply having a team that talks to one another accomplishes the exact same thing. The potential danger lurking in the standup meetings is having a weak scrummaster. It's up to the scrummaster to make sure that people don't take up too much time reporting their status, ramble on, go off topic, or get into a huge debate with others. Any kind of serious issue should be taken offline and discussed after the meeting. If the scrummaster is not strict about this, the standup meetings will start to drag on and become a major source of pain for everyone involved. The problem grows exponentially worse the larger the team, as though daily standup drama is directly proportional to how many people are unlucky enough to attend.

On paper, Agile isn’t completely terrible. There’s a lot of good ideas. The problem is when it is actually put into practice. In that regard, it is quite similar to both Waterfall and Communism. While both are ideas that sound great on paper, they fail miserably in the real world. Agile, an anti-process process of writing software, becomes bogged down in process. Managers obsessed with metrics start demanding more granular detail on story size estimates, more information on what business value a particular story provides, and more visibility into the sprint progress (typically accomplished by asking for daily status reports). Soon things get so ridiculous that something as simple as fixing a typo in a variable name requires a story, approval from business, and a triage meeting before the scrummaster moves it into the sprint. Those on a dysfunctional team will hide behind process like a government worker hides behind red tape to not get anything done. They thrive in an environment like this, feeding off the death of team morale.

At this point in my life, I'd rather do everything ad hoc. There has to be a certain amount of flexibility in the software development process. Sometimes the requirements will not be clear, and they will rarely be perfect. Nor will the requirements be set in stone. Unfortunately, requirements do change, and sometimes at the last possible minute. It sucks, but the solution to this isn't to add yet more processes and meetings to prevent it from happening. Minimizing volatility is a good goal, but the team should eschew the rigidity of a predefined process and feel empowered to make their own decisions. Agile helps foster communication among team members. But if the team is filled with toxic people who can't play well together, no methodology will ever solve that. If the team is filled with competent people who have good rapport and open lines of communication, then Agile is not really necessary. I once worked on a team that had a hard ship date in two months. For those two months we kept the daily standup but eliminated the overhead of grooming, planning, and retros, and simply did what needed to be done. This requires a certain level of proactivity and an ability to handle the almost chaotic nature of such freedom, but it was a refreshing reprieve from the suffocating nature of Agile.

The TLDR is: teams should follow the spirit of Agile rather than enforcing it to the letter. Rigidity leads to bureaucracy, which defeats the whole purpose of Agile in the first place. Agile was meant to be a lightweight methodology designed to ship code quickly and frequently. The keyword here is lightweight. The frequent grooming, design, planning, demos, and retros are there to foster effective communication, not to stifle people with the tedium of it all.

Deduping an XML document with XSLT (listing only unique elements)

XSLT provides a key() function that retrieves a subset of nodes in an XML document based on selection criteria specified in an xsl:key element. In essence, xsl:key is an index of the XML document, and key() serves as an index lookup function. This provides a lot of versatility to XSLT and allows you to accomplish a lot with a little. Let's look at how we could use this to dedupe an XML document that contains a lot of duplicate items and only list each unique element once. Consider the following XML:

<Inventory>
<Product>
	<ProductID>1</ProductID>
	<name>Sword of the ancients</name>
	<description>A legendary sword fit for a hero</description> 
</Product>
<Product>
	<ProductID>2</ProductID>
	<name>BFG 9000</name>
	<description>A slayer of demons</description> 
</Product>
<Product>
	<ProductID>2</ProductID>
	<name>BFG 9000</name>
	<description>A slayer of demons</description> 
</Product>
<Product>
	<ProductID>3</ProductID>
	<name>Flaming sword</name>
	<description>Effective against the undead</description> 
</Product>
<Product>
	<ProductID>4</ProductID>
	<name>Aegis bulwark</name>
	<description>A stalwart shield that blocks the deadliest blows</description> 
</Product>
<Product>
	<ProductID>4</ProductID>
	<name>Aegis bulwark</name>
	<description>A stalwart shield that blocks the deadliest blows</description> 
</Product>
<Product>
	<ProductID>4</ProductID>
	<name>Aegis bulwark</name>
	<description>A stalwart shield that blocks the deadliest blows</description> 
</Product>
</Inventory>

Each Product is uniquely identified by its ProductID. Some of the products appear more than once. Let's write a transform to clean up the XML and list each item only once. In order to do so, we will need to make use of the key() and generate-id() functions. Let's take a look at key() in detail first. key() takes two parameters: the name of the key to use for the lookup and a search string to match against:

key(key-name, search-string)

search-string is then matched against the selection criteria defined in the xsl:key named key-name. An xsl:key element has three attributes: @name, @match, and @use. @name is the key-name parameter that will be passed in to the key() function. @match is an XPATH expression that specifies which elements in the XML document to index. @use is an XPATH expression that serves as the lookup key. It must be relative to the elements specified in @match, and it will be evaluated against the key() search-string parameter. The following xsl:key indexes all the Products in the previous XML document based on the ProductID element.

<xsl:key name="ID-key" match="Product" use="ProductID" />  

key('ID-key', '2') would return the two BFG 9000 Product elements. In order to extract only the unique elements, we will need to use generate-id(). generate-id() takes a node as a parameter and generates an ID for it that is unique across the entire document. No two nodes in the same document will ever map to the same ID. However, generate-id() does not guarantee that the value generated for a given node will be the same on a different run. In order to identify all the unique elements, we will use generate-id() as a way to test for equality between nodes. The following expression selects the first Product for each unique ProductID value in the document:

<xsl:for-each select="//Product[generate-id(.)=generate-id(key('ID-key', ProductID)[1])]">

Let's break this gnarly expression down into smaller, more manageable parts. The xsl:for-each is selecting a set of Products that satisfy the equality conditional inside the bracket predicate. What is the predicate trying to accomplish? Starting with the innermost expression on the right hand side of the conditional, we see that key('ID-key', ProductID) returns a list of nodes that have the same ProductID value as the current node being evaluated. [1] returns the first node in this key node-set, which is then passed as an argument to generate-id(). Note that we are always guaranteed at least one result from the key lookup, since key() will always return the current node in the node-set.

On the left hand side of the conditional is the expression generate-id(.). This generates an ID for the current Product node being evaluated. If the two generated IDs are equal, we include the node in the result-set. Subsequent nodes with the same ProductID will fail this equality test, since we are only comparing against the *first* node returned by key(). The rest will be ignored. This guarantees that we get a list of Product nodes where no two ProductIDs will be the same. The equality test essentially filters out the redundant elements.

With that hairy expression out of the way, it's just a simple matter of copying out all the unique elements:

<xsl:for-each select="//Product[generate-id(.)=generate-id(key('ID-key', ProductID)[1])]">
  <xsl:copy-of select="."/>
</xsl:for-each>

The complete code listing is here:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes"/>
  <xsl:key name="ID-key" match="Product" use="ProductID" />


  <xsl:template match="/">
    <xsl:element name="Inventory">
      <xsl:for-each select="//Product[generate-id(.)=generate-id(key('ID-key', ProductID)[1])]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:element>

  </xsl:template>
</xsl:stylesheet>

Running the transform on the previously listed input XML will output the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<Inventory>
  <Product>
    <ProductID>1</ProductID>
    <name>Sword of the ancients</name>
    <description>A legendary sword fit for a hero</description>
  </Product>
  <Product>
    <ProductID>2</ProductID>
    <name>BFG 9000</name>
    <description>A slayer of demons</description>
  </Product>
  <Product>
    <ProductID>3</ProductID>
    <name>Flaming sword</name>
    <description>Effective against the undead</description>
  </Product>
  <Product>
    <ProductID>4</ProductID>
    <name>Aegis bulwark</name>
    <description>A stalwart shield that blocks the deadliest blows</description>
  </Product>
</Inventory>

Exploring Javascript closures

Javascript closures are a concept that is easier to explain by example:

function outer(increment) {
    var outer_foo = 1;
    function inner() {
        outer_foo += increment;
        return outer_foo; 
    }
    return inner;
}

var outer_instance = outer(2);
var foo1 = outer_instance();
var foo2 = outer_instance();  

alert(foo1);   //3
alert(foo2);   //5

The inner function references the local variables of the outer function. The inner function is the closure. It is enclosed by its parent function and has access to its parent’s execution context, even after the parent has finished execution and returned. In essence, variables in the scope chain of the parent function stay in scope, as though they were allocated on the heap and not on the stack. Contrast this to programming languages such as C, where parameters and variables local to a function are declared on the stack frame, and immediately deallocated after the function has exited.

Because the parent function’s variables stay in scope and are essentially passed by reference to the closure, this can lead to bugs. Case in point:

function createFunctions() {
    var result = new Array();

    for (var i = 0; i < 10; i++) {
        result[i] = function() { return i; };
    }

    return result;
}

var functionArray = createFunctions();
var func1 = functionArray[0];
var foo1 = func1();  
alert(foo1); //10
var func7 = functionArray[6];
var foo7 = func7(); 
alert(foo7) //10

The createFunctions method returns an array of closures, each of which is a function that returns the value of the loop index, i. At first glance, it would seem as though the closures would return the numbers 0 through 9, respectively. In actuality, every closure in the array returns 10. This is because the i variable in each closure references the i variable of the parent execution context. Because this is a reference, the value of i will not be the value at the time of the creation of the closure, but rather at the time of the execution of the closure. The for loop will have already terminated by this point and set i to 10.

In order to match the original intent behind the function, createFunctions() can be rewritten as follows:

function createFunctions2() {
    var result = new Array();

    for (var i = 0; i < 10; i++) {
        result[i] = function(num) { 
	    return function() {
	        return num; 
	    };
	}(i);
    }

    return result;
}

This syntax can be a bit jarring to read at first. The inner-most anonymous function is a closure that references the variable num, which is scoped to its parent function:

 
    //snipped out code ... 
        function() {
            return num; 
        }
    //...

The inner function is the return value for its parent function, which takes in a single argument: num. This parent function is immediately invoked with the loop-counter i as its parameter, as denoted by the “(i)” found at the end of the function declaration.

//snipped out code ...
        function(num) {
            //...
        }(i);
//...

Because function arguments are passed by value and not by reference, num receives a copy of i's value at the time the outer function is invoked. In other words, num will equal the value of i at the current iteration of the for loop, and will keep that value regardless of any future modifications to i. The outer anonymous function serves as an intermediary: it returns a closure that returns the value of num, “frozen” at that point in time, allowing the code to function as originally intended. Here is the rewritten function again, together with some code that exercises it:

function createFunctions2() {
    var result = new Array();

    for (var i = 0; i < 10; i++) {
        result[i] = function(num) {
            return function() {
                return num;
            };
        }(i);
    }

    return result;
}

var functionArray = createFunctions2();

//prints out the numbers 0 through 9
for (var i = 0; i < 10; i++) {
    var bub = functionArray[i]();
    alert(bub);
}

The key takeaway here is that closures access variables in their parent's scope chain by reference, not by value. To work around this behavior, the variables that the closure accesses must themselves be copies of the originals, which is accomplished by passing them in as parameters to an intermediary function. The code that accomplishes this can look unwieldy, but it's not so bad once you recognize the basic pattern behind it.
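
As an aside, and assuming a runtime with ES2015 support (everything above deliberately uses var), the let keyword gives you this behavior for free: let creates a fresh binding of the loop variable for each iteration, so each closure captures a distinct value without the intermediary function. A minimal sketch (createFunctions3 is just an illustrative name):

function createFunctions3() {
    var result = [];

    // let creates a fresh binding of i for every iteration,
    // so each closure captures its own value of i
    for (let i = 0; i < 10; i++) {
        result[i] = function() { return i; };
    }

    return result;
}

var letArray = createFunctions3();
alert(letArray[0]()); //0
alert(letArray[6]()); //6

The pass-a-copy-as-a-parameter pattern is still worth knowing, though, since it works in older environments and generalizes to any situation where a closure needs a snapshot of a mutating variable.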

Implementing a for loop in XSLT

XSLT is a language designed to transform XML documents. Rather than being a procedural language, XSLT models itself after functional programming and borrows many of its fundamental design concepts. Functional programming deals with expressions rather than statements. Statements are executed: they may or may not return a value, and any variables they directly or indirectly touch can be modified, producing potentially far-reaching side effects once a particular statement has run. Expressions, on the other hand, are evaluated. They can be nested, chained together as arguments to a parent expression, and combined in various other ways, but every expression is eventually evaluated down to a single value. In a functional language, the parameters that an expression acts upon are immutable, so evaluating an expression has no side effects.

XSLT follows the same design philosophy. In XSLT, templates are essentially expressions that can be combined in various ways, eventually being evaluated and producing output that is typically, although not necessarily, a well-formed XML document. Because XSLT variables are immutable, evaluating a template has no side effects. This means an XSLT engine is free to optimize the transform, applying template rules in any order, sometimes even in parallel.

Imagine what would happen if variables were mutable instead. A given template in an XSLT stylesheet can match multiple nodes in an XML document. The order in which those nodes are processed would suddenly matter if, say, the template output some text from the node and appended the value of an auto-incrementing integer variable. Different XSLT engines would then produce different outputs, making the result depend on implementation details rather than on the stylesheet itself. Worse, any implementation that evaluates templates in a non-deterministic order (or in parallel) would have race conditions and produce different output every time the transform is run. The same problem appears across templates: a side effect in one template that modifies a variable used by another could change the other template's output. Nondeterministic output in a language designed specifically to produce documents is a nonstarter, so XSLT would suddenly need some mechanism for specifying the order of evaluation. The end result would be a cumbersome and awkward language, as opposed to the clean and concise one XSLT is today.

It's important to have this functional programming mindset when working with XSLT. Trying to do procedural programming in XSLT is like trying to fit a square peg into a round hole: it's possible, but the end result is not pretty. Take, for instance, a for loop. XSLT provides iteration via xsl:for-each and XPath, which lets it iterate over a set of nodes and select the ones that match certain criteria, as sketched below. However, this is not technically the same as a for loop in a procedural language, which specifies a counter, an increment/decrement operation, and a comparison that terminates the loop. Because variables are immutable in XSLT, something as simple as maintaining a loop counter suddenly becomes a nontrivial nightmare.
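
To make that concrete, here is a minimal sketch of node-set iteration, assuming the Inventory/Product document from the earlier section as input:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- iterate over every Product whose ProductID is greater than 2 -->
  <xsl:template match="/Inventory">
    <xsl:for-each select="Product[ProductID &gt; 2]">
      <xsl:value-of select="name"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The selection logic lives entirely in the XPath predicate; there is no counter to increment and no state to mutate.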

It is indeed possible to simulate a for loop in XSLT, but whether you would want to is another question altogether. This is more an interesting exercise in mapping a procedural construct onto a functional language. The following solution is based on the mammoth O'Reilly XSLT book by Doug Tidwell. The important thing to keep in mind, however, is that in a real-world scenario, the need for a for loop may be a sign that XSLT is not the right tool for the job. I've encountered such for loops in some hairy XSLT code, and it isn't pretty. It's best to leave procedural programming tasks to actual procedural languages. This is apparently a shocking concept to a lot of people, who forget that they can either do any necessary pre- or post-processing in code instead of in XSLT, or write a module in a more powerful language and call it from XSLT as an extension function.

But I digress. If you're a glutton for punishment, here's how you would go about implementing a for loop. The basic idea is that because variables are immutable, we cannot increment the loop counter directly. Instead, we pass it as a parameter, recursively calling the template with the incremented value.

 
<xsl:template name="for-loop">
	<xsl:param name="loop-counter" />
	<xsl:param name="increment-value" /> 

        ......do some processing here.....

	<xsl:call-template name="for-loop">
		<xsl:with-param name="loop-counter"   select="$loop-counter + $increment-value" />
		<xsl:with-param name="increment-value" select="$increment-value"/> 
	</xsl:call-template>
</xsl:template>

Remember that any task that can be done iteratively can also be done with recursion (given a large enough call stack). The recursion terminates once the conditional of the for-loop template is no longer satisfied, so we need some way of testing that conditional against the current value of the loop counter. To make the template more flexible, we define a “$conditional-test” parameter that specifies which comparison operator to apply and a “$conditional-value” parameter to compare the counter against. A “$conditional-passed” variable stores the result of the comparison. Because XSLT has no way to pass in an expression to be dynamically evaluated at runtime, we have to manually map each potential value of “$conditional-test” to the appropriate test. This requires writing a lot of tedious code:


  <xsl:variable name="conditional-passed">
      <xsl:choose>
        <xsl:when test="$conditional-test = '!='">
          <xsl:if test="$loop-counter != $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$conditional-test = '&gt;='">
          <xsl:if test="$loop-counter &gt;= $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        ... add support for other conditional tests here ...

        <xsl:otherwise>
          <xsl:message terminate="yes">
            Invalid conditional-test passed in
          </xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

If the value of “$conditional-test” matches none of the pre-defined values, we use xsl:message to write the error to the output and terminate the transform. Otherwise, the final step is to check the value of “$conditional-passed”:

 
     <xsl:if test="$conditional-passed='true'">
         
         ... Add any custom code here that the for loop needs to execute.   In this case, we merely print the loop-counter
         <xsl:text>loop-counter: </xsl:text>
         <xsl:value-of select="$loop-counter" />
         <xsl:text> &#xA;</xsl:text>

      <xsl:call-template name="for-loop">
        <xsl:with-param name="loop-counter"   select="$loop-counter + $increment-value" />
        <xsl:with-param name="increment-value" select="$increment-value"/>
        <xsl:with-param name="conditional-test" select="$conditional-test"/>
        <xsl:with-param name="conditional-value" select="$conditional-value"/>
      </xsl:call-template>
    </xsl:if>

Note that the custom code inside this if-statement corresponds to the user-defined body of a standard procedural for loop. If $conditional-passed is not equal to ‘true’, we have reached the end of our loop and can exit the template. Otherwise, we execute the body and recursively call the template again with $loop-counter incremented accordingly, simulating the next iteration of the loop. Note that this section of the template has to change every time the for loop needs to execute a different body. Because there is no way in XSLT to pass in a set of instructions as a parameter to be dynamically evaluated at runtime, an XSLT for-loop template is hard to reuse: any time new functionality is needed inside the loop, a new template must be created. The old template can be copied and the custom code inside the if-statement modified accordingly, but this is not an ideal solution. It further drives home the point that it's best to use the right tool for the job.

Putting it all together, we arrive at the following code in all its glory:

<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:param name="loop-counter" select="10" />
  <xsl:param name="increment-value" select="-1"/>
  <xsl:param name="conditional-test" select="'&gt;='"/>
  <xsl:param name="conditional-value" select="0" />

  <xsl:template match="/">
    <xsl:call-template name="for-loop">
      <xsl:with-param name="loop-counter" select="$loop-counter" />
      <xsl:with-param name="increment-value" select="$increment-value"/>
      <xsl:with-param name="conditional-test" select="$conditional-test"/>
      <xsl:with-param name="conditional-value" select="$conditional-value"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="for-loop">
    <xsl:param name="loop-counter" />
    <xsl:param name="increment-value" />
    <xsl:param name="operator" />
    <xsl:param name="conditional-value" />

    <xsl:variable name="conditional-passed">
      <xsl:choose>
        <xsl:when test="$operator = '!='">
          <xsl:if test="$loop-counter != $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$operator = '&gt;='">
          <xsl:if test="$loop-counter &gt;= $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$operator = '&lt;='">
          <xsl:if test="$loop-counter &lt;= $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$operator = '&gt;'">
          <xsl:if test="$loop-counter &gt; $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$operator = '&lt;'">
          <xsl:if test="$loop-counter &lt; $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>

        <xsl:when test="$operator = '='">
          <xsl:if test="$loop-counter = $conditional-value">
            <xsl:text>true</xsl:text>
          </xsl:if>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message terminate="yes">
            Invalid conditional-test passed in
          </xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>


    <xsl:if test="$conditional-passed='true'">
      <xsl:text>loop-counter: </xsl:text>
      <xsl:value-of select="$loop-counter" />
      <xsl:text> &#xA;</xsl:text>

      <xsl:call-template name="for-loop">
        <xsl:with-param name="loop-counter"   select="$loop-counter + $increment-value" />
        <xsl:with-param name="increment-value" select="$increment-value"/>
        <xsl:with-param name="conditional-test" select="$conditional-test"/>
        <xsl:with-param name="conditional-value" select="$conditional-value"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
 

A simple .NET program can be whipped up to apply the transform:

using System.Xml.Xsl;

namespace XSLTTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // compile the for-loop stylesheet and run it; the input document is effectively ignored
            XslCompiledTransform stylesheet = new XslCompiledTransform();
            stylesheet.Load("Forloop.xslt");
            stylesheet.Transform("input.xml", "output.xml");
        }
    }
}

Since this template blindly prints out the loop-counter, the input XML is completely irrelevant. For a real-world example, one would likely define additional parameters on the for-loop template holding the values that the loop should operate on; a sketch of that follows the sample output below. In our case, the output simply looks like the following:

loop-counter: 10 
loop-counter: 9 
loop-counter: 8 
loop-counter: 7 
loop-counter: 6 
loop-counter: 5 
loop-counter: 4 
loop-counter: 3 
loop-counter: 2 
loop-counter: 1 
loop-counter: 0
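
To give a flavor of that, here is a hypothetical variant (not part of the Tidwell-derived stylesheet above) in which the for-loop template takes a node-set parameter and prints the name of the product at the current loop position, again assuming the Inventory/Product document from the earlier section as input:

<xsl:template name="for-loop-over-items">
  <xsl:param name="loop-counter" select="1" />
  <xsl:param name="items" />

  <!-- keep recursing while the counter is still within the node-set -->
  <xsl:if test="$loop-counter &lt;= count($items)">
    <xsl:text>item: </xsl:text>
    <xsl:value-of select="$items[$loop-counter]/name" />
    <xsl:text>&#xA;</xsl:text>

    <xsl:call-template name="for-loop-over-items">
      <xsl:with-param name="loop-counter" select="$loop-counter + 1" />
      <xsl:with-param name="items" select="$items" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>

It would be invoked from the root template with something like <xsl:with-param name="items" select="/Inventory/Product"/>, and the numeric predicate $items[$loop-counter] picks out the node at the current position. The same caveat applies: the body is still hardcoded into the template, so each new use case means yet another template.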