UW CSE P576 notes – Harris corner detection

The following are my notes on the Harris corner detection algorithm for finding the features in an image. These slide screenshots were taken from the University of Washington course homepage here:

http://courses.cs.washington.edu/courses/csep576/15sp/projects/project1/web/project1.htm

The idea is to consider a small window around each pixel p in an image. We want to identify all such pixel windows that are unique. Uniqueness can be measured by shifting each window by a small amount in a given direction and measuring the amount of change that occurs in the pixel values.

window

More formally, we take the sum squared difference (SSD) of the pixel values before and after the shift, and identifying pixel windows where the SSD is large for shifts in all 8 directions. Let us define the change function E(u,v) as the sum of all the sum squared differences (SSD), where u,v are the x,y coordinates of every pixel in our 3 x 3 window and I is the intensity value of the pixel. The features in the image are all pixels that have large values of E(u,v), as defined by some threshold.

equation

After some fancy math that is best left explained by Wikipedia or the original slides which essentially involve taking the first order approximation of the Taylor series expansion for I(x + u, y + v), we are left with:

equationsimplified

 

where H is the Harris matrix and the I_x and I_y terms are the gradients in the x and y directions, respectively (the gradient values for each pixel can be done using the Sobel operator). Note that this is a sum of all the matrices in the window W. This is important later.

Remember that we want the SSD to be large in shifts for all eight directions, or conversely, for the SSD to be small for none of the directions. By solving for the eigenvectors of H, we can obtain the directions for both the largest and smallest increases in SSD. The corresponding eigenvalues give us the actual value amount of these increases. Because H is a 2×2 matrix, solving for the eigenvalues can be done by taking the determinant and setting it to 0, and using the quadratic equation to find the two possible solutions.

Because solving the quadratic equation for every pixel is computationally expensive (it requires the square root operator), we can use a variant where instead of solving for the eigenvalues directly, we compute a corner strength function as defined by:

c(H) = determinant(H) / trace(H) where the trace is the sum of the two elements in the main diagonal (upper left to lower right). This is the Harris operator.

One question that tripped me up as well as other students is why the determinant of the Harris matrix isn’t always equal to zero. The determinant of H at first glance is equal to

I_x^2 * I_y^2 –  I_x*I_y * I_x * I_y. This becomes I_x^2 * I_y^2 – I_x^2 * I_y^2 = 0. However, as previously noted, these individual terms represent the sums across all the pixel values in the window. So I_x^2 is summed up over all the pixels in the window W, as is I_x*I_y and such.

Here then is the high level pseudocode:

1. Take the grayscale of the original image

2. Apply a Gaussian filter to smooth out any noise

3.  Apply sobel operator to find the x and y gradient values for every pixel in the grayscale image

4. For each pixel p in the grayscale image, consider a 3×3 window around it and compute the corner strength function. Call this its Harris value.

5. Find all pixels that exceed a certain threshold and are the local maxima within a certain window (to prevent redundant dupes of features)

6. For each pixel that meets the criteria in 5, compute a feature descriptor.

Step 5 is itself a topic of much discussion that is out of scope for these notes. The simplest approach is to use a 5 x 5 window. In terms of feature matching, such a feature descriptor is invariant to translation, but nothing else. Better feature descriptors would be invariant to rotation, illumination, and scaling.

 

 

UW CSE P 505 – Monads

Monads were created in response to the problems that arise when a pure functional programming language encounters the dirty impurities of the real world. One particular problem that vexed researchers and developers early on was how to represent file I/O in a language such as Haskell. Take a relatively straightforward method such as a function prints a string to the console. It’s pretty obvious that the entire purpose of the function is so that it can execute a side effect. It’s also not clear that this function has a return value. Indeed, in many procedural languages, methods such as Console.WriteLine in C# have a return type of void. Functions that have side effects and have no return value clearly aren’t functions in the mathematical sense.

To address this issue, languages such as Haskell differentiated between pure functions and actions, the latter which is a value that represents the result of a computation that may cause side effects. A monad encapsulates this notion of computations and their state, and provides a way to transition from one computation to the next based on the resulting action from the previous computation. This is known as the bind method, or simply “>>=”. Essentially, it takes the value “wrapped” in the state of the previous action, performs some computation on the unwrapped value, and then wraps the result with the monad. The other important method, return, explicitly wraps some value type “a” with the monad.
This is very abstract, so let’s look at the Haskell IO monad as an example. The bind method would have a function signature of type:

IO a -> (a -> IO b) -> IO b

In other words, given some action IO a, bind takes a function (call it F) that takes in the unwrapped type a and returns a wrapped type b, applies F to a and returns the resulting b “wrapped” in an IO action. Let’s look at how we would use bind to read a line from the user and output it back to the screen.

The getLine function in Haskell reads a line from standard input and has type IO String. putStrLn has a return type of IO (). We want to call getLine, take the resulting string from that, and call putStrLn with it. In the IO bind method, a is type String, b is type Unit (), and the mapping function F is putStrLn. The (a -> IO b) matches up with (String -> IO ())

The use of bind would look something like this:

getLine >>= \input ->
putStrLn (“You typed “ ++ input)

What this has accomplished is the separation of concerns between the side effects of an action with the results of an action. By encapsulating any state changes brought about by a computation inside of a monad, we retain the ability to interact with the results of these computations in a functional way. Thanks to the syntactic sugar that Haskell monads provide, we also get rid of a lot of boilerplate code that would normally be required to pass the state from one computation to the next, as well as any tedious checking that involves. One can easily imagine a scenario in an imperative language that consists of a series of nested if statements in the code that do null pointer checks on the results from a series of computations. The pseudocode of such a scenario might look like:

httpRequest = parseHttpRequest()
If (httpRequest != null)
{
	queryString = parseQueryString()
	If (queryString != null)
	{
		print(queryString)
	}
}

Haskell provides syntactic sugar in the form of the “do” notation:

Do
	Req <- parseHttpRequest()
	QueryString <- parseQueryString()
        putStrLn QueryString)

Where we could imagine there being some sort of monad that would wrap the imaginary request and querystring types returned from parseHttpRequest and parseQueryString respectively. This looks similar to the imperative style of coding where we define a series of computational, but what’s nice about the monad is that if at any point there is an error in the chain of computations, the rest of the computations in the do block are short circuited, bypassing the need for cumbersome if statements along every step in the computation.

This of course, only scratches the surface of what a monad is and what they can be used for, as evidenced by the plethora of guides out there written about them.

UW CSE P 505 Notes – Continuation Passing Style

Continuation passing style (CPS) is a style of programming wherein function calls are chained together. Each function in the chain performs some computation on a value and then passes the result of that computation on to the next function in the chain. Using this style, functions do not directly return a value. Instead, they return a function call. Exactly which function is called next is determined by making it an explicit argument to the current function.

Functions written in CPS will have the form … ->(a -> r) -> r in Haskell where “…->” denotes the set of argument values that this function does a computation on. This portion of the function signature would match up exactly with an equivalent function that did not use CPS. For example, a sum function would have signature Int->Int->Int in direct style, and have signature Int->Int->(Int->r)->r in CPS style. The difference lies in the (a -> r) –> r, that is, a function that takes as an argument another function that takes a value and returns a value, and that itself returns a value. (a->r) is the continuation and represents the next function in the chain that will be called. Note that the type of the argument a does not have to match the return type of r. The only constraint is that the return type of the continuation must match up with the return type of the current function.
This is abstract, so some simple examples can help illustrate what is going on. Consider the quadratic equation y =3×2 + 2x + 5. A function that solves for y given a value of x might look something like the following without continuations:

add :: Int -> Int -> Int
add x y = x + y
 
square :: Int -> Int
square x = x * x

mult :: Int -> Int -> Int
mult x y = x * y

solveQuadraticExample :: Int -> Int
solveQuadraticExample x = 
    add (add (mult 3 (square x)) (mult 2 x)) 5

With continuations, the code becomes:

addCont :: Int -> Int -> ((Int -> r) -> r)
addCont x y = \k -> k (x + y)
 
squareCont :: Int -> ((Int -> r) -> r)
squareCont x = \k -> k (x * x)

multCont :: Int -> Int -> ((Int -> r) -> r)
multCont x y = \k -> k (x * y) 
 
solveQuadraticExampleCont :: Int -> ((Int -> r) -> r)
solveQuadraticExampleCont x = (\k ->
multCont 2 x (\twoX ->
    squareCont x (\x2 ->
        multCont 3 x2 (\threeXSquared ->
            addCont threeXSquared twoX (\sum -> 
                addCont sum 5 k)))))

This can be invoked by passing in a top level continuation of type (Int -> r) to the solveQuadraticExampleCont. We can use the identity function:
solveQuadraticExampleCont 3 id which prints out 38 from the ghci REPL. The continuations serve to dictate what happens next with the result of the current function, with the order of operations and the intermediate values calculated made explicit. The continuations are somewhat analogous to callback functions. Event driven frameworks that use callbacks to define what happens next in an asynchronous call are a form of CPS. Indeed in Node JS, you will typically see nested anonymous callback functions chained together, as in this contrived example:

http.createServer(function (req, res) {
  parseHeaders(req, function(headers) {
	parseContentType(headers, function(contentType) {
		//....... do some stuff here
	})
  })
}); 

Where in the imaginary parseBody and parseQueryString methods, the second argument is a function that represents what is essentially a continuation to call next with the results of the current function. Often times, these callbacks in NodeJS will also have an error argument that is itself an error handling function. The power of continuations lies in this flexibility of being able to control the flow and add new functionality to existing functions (without the need to modify the code which might be in third party libraries) by defining their continuation. Indeed, continuations can be used to implement any form of non-local control flow, from exception handling to implementing Y combinators.

Notes from UW CSE P 505 – Writing an interpreter

One of the canonical exercises given in Computer Science graduate level Programming Language courses is writing an interpreter for a Scheme like language, due to Scheme’s relatively simple syntax and minimalist design. My UW CSE P 505 class was no exception. Using Haskell as our implementation language, we started off implementing a simple version of the Scheme language and gradually added features to illustrate concepts such as store passing style, continuation passing style, and type systems. How we approached the problem of writing an interpreter was particularly insightful, and the following is a high level overview.
The key takeaway from the course is to think of the interpreter as a series of mappers between successive layers. Much like a web application ultimately maps from the data access layer all the way up to the presentation layer that presents it to the end user, an interpreter maps between an input string to a core expression tree. By adding multiple intermediate steps along the way, each layer is only responsible for a few specific tasks, making it easier to reason about any given step.
The first task of the interpreter is to tokenize the input. Let’s call this method parseToken. It will take in a string and return a list of tokens. It will strip out all the whitespace and comments until we are left with a sequence of tokens that represent numbers, parentheses, and identifiers. In the Haskell code for the interpreter, the Token type looks something like this:

data Brace = Round -- ()
           | Square -- []
           | Curly -- {}
           deriving (Eq, Show)

data Token = Open Brace -- An open brace (of the given shape).
           | Close Brace -- A closing brace (of the given shape).
           | NumTok Integer -- A numeric literal.
           | IdTok String -- An identifier.
           deriving (Eq, Show)

Note that at this layer, an ID Token has no semantic meaning. It exists as a catch all bucket to differentiate between a brace and a number token. The ID Token could be a keyword, an operator, a variable, or something else entirely, but the tokenizing layer does not care. All it cares about is finding them. Passing in “{(5)}” to the parseToken method returns [Open Curly, Open Round, NumTok 5, Close Round, Close Curly]
The subsequent task is to convert the list of tokens into an SExpression tree. Let’s call this method parseSExp. It will take a list of tokens as input and return a single SExp and any leftover tokens not consumed in the parsing (more on this later). This can be visualized as a tree with the single SExp being the root. It may also be helpful to think of the top level SExpression as containing a nested list of SExpressions as well.

data SExp = NumS Integer -- numeric expression
          | IdS String -- identifier
          | ListS [SExp] -- list of S-expressions
          deriving Eq

Because this layer deals with SExpressions, we create new data types that correspond to the types. Although at first glance IdS and NumS seem redundant with IdTok and NumTok, its important for each layer to have its own types so that in the mapping code it is always clear what layer we are mapping to and from. Converting between open and close brace tokens to the newly introduced ListS type is the main responsibility of this layer. An open brace token denotes the start of a new list while a close brace token denotes the end of a list. This means the SExp parser will need to make sure the parentheses are balanced correctly. An SExpressions appearing outside of a pair of open close braces can only appear once. This means that any tokens not consumed in the parsing will be leftover (and can optionally be returned in the output from parseSExp).
If we run parseExpr on the list of tokens corresponding to “(foo (4 bar))”

[Open Round, IdTok "foo", Open Round, NumTok 4, IdTok "bar", Close Round, Close Round]

We get:

ListS [IdS “foo”, ListS [NumS 4, IdS “bar”]]

Again, this layer is not concerned with syntax, just mapping brace tokens into lists of SExpressions and checking that the braces match. It is in the subsequent step that we map the nested list of SExpressions into an expression tree. Let’s call this method parseExpr. Assuming no syntax errors (which would cause the method to fail accordingly), the output from parseExpr will represent a valid expression in our language that can be interpreted. This does not necessarily mean however, that it will not result in any runtime errors. For example, the expression (+ (* 2 3) (+ 1 2)) is represented as a tree with root “+” and two subtrees with * and + as their roots, respectively. Representing the expression as a tree means that we can write our subsequent parse and interpret methods recursively, simplifying the logic.
For simplicity sake, let’s say that our language supports if statements, functions, apply statements, and with* statements (a binding expression that binds a variable to a function or some value):

data Expr = NumE Integer
          | IfE Expr Expr Expr
          | VarE Var
          | FunE [Var] Expr
          | AppE [Expr]
          | WithStarE [(Var, Expr)] Expr
          deriving (Eq, Show)

FunE, AppE, and WithStarE all take a list of arguments. This makes writing an interpreter for the expression more complex than necessary. To further simplify things, we want to desugar our language into a core syntax. We can get rid of the WithStarE which is just a specialized case of AppE. Likewise, by utilizing currying, we can restrict functions and apply statements to only have one argument in the parameter list. Any statements with more than one argument will desugar into partial applications. The core expression would look something like:

data CExpr = NumC Integer
           | IfC CExpr CExpr CExpr
           | VarC Var
           | FunC Var CExpr
           | AppC CExpr CExpr
           deriving (Eq, Show)

Here are some examples of desguaring. The following with expression

(with* ([x (+ 1 2)]
        [y (* x x)]
        [x (+ y 3)])
  (+ x y))

Would desugar into something that looks like this.

((fun (x)
   ((fun (y)
      ((fun (x)
         (+ x y))
       (+ y 3)))
    (* x x)))
 (+ 1 2))

In terms of actual CExpr, it’d parse into:

(AppC (FunC "x" (AppC (FunC "y" (AppC (FunC "x" (AppC (AppC (VarC "+") (VarC "x")) (VarC "y"))) (AppC (AppC (VarC "+") (VarC "y")) (NumC 3)))) (AppC (AppC (VarC "*") (VarC "x")) (VarC "x")))) (AppC (AppC (VarC "+") (NumC 1)) (NumC 2)))

Here’s a simpler example:

((fun (x) (* x x)) (+ 1 2))

Desugars into

AppC (FunC "x" (AppC (AppC (VarC "*") (VarC "x")) (VarC "x")))
     (AppC (AppC (VarC "+") (NumC 1)) (NumC 2))

This concept of desugaring is a useful pattern to have in the toolbox: Whenever the logic for parsing or interpreting an expression becomes too unwieldy, mapping it to a simpler intermediate language with more rigid constraints helps to simplify things.

The final step in the interpreter of course, is to actually interpret the expression. This will map the desugared CExpr into a Value type. Again, because our CExpr is a tree, we can call our interp method starting from the top level of the tree. If there are no run time errors, this will reduce to a final value:

data Val = NumV Integer
         | BoolV Bool
         | FunV Var CExpr Env
         | PrimV String (Val -> Result Val)   

The PrimV type here is of interest, as it represents primitive operators in our language. Its first argument is a string representing the operator (eg “+” for addition), and the second argument is a function that takes a val parameter and returns a val. In order for our interp method to succeed, we will pass in an initial environment that contains all the predefined primitives. Note that our desugared CExpressions have no notion of a Boolean expression, because true and false will be defined as PrimV in this initial environment instead. This environment will grow during the interpretation as we encounter additional variables and their bindings (via variable application expressions). When interp encounters a function application, it up its argument parameter (the desugared CExpr guarantees only one variable per function) in the environment. If it does not exist, the result is an error.
To give you an idea of what the interp code looks like, here is the snippet for interp being called on an IfC statement would look something like:

case expr of
    IfC cond b1 b2 -> case interp cond env of
        Ok (BoolV True) -> interp b1 env
        Ok (BoolV False) -> interp b2 env
        _ -> Err "Conditional does not evaluate to a boolean"   
… other cases here

How to represent addition, subtraction, comparison operators and the like as a PrimV in Haskell is out of the scope of this article, as is how to write the parsing and interpretation methods. As is the popular saying, these are left as an exercise for the reader (or perhaps explained in a subsequent tutorial).

Haskell is cool (infinite self referencing lists)

I’m currently taking CSE 505:Programming Languages at the University of Washington for my masters degree in computer science. We are using Scala at work, a language that incorporates a lot of advanced programming language constructs, so I felt that work and school would synergize well together. As luck would have it, we are learning Haskell for class, and its pretty obvious that a lot of the Scala design and syntax is inspired by Haskell. I’ve been pretty amazed at how expressive a purely functional programming language is compared to a hybrid language like Scala that incorporates both OOP and functional paradigms into its design.

Take for example, lists. In Haskell, you can define data types that are self referencing recursive types. This is not limited to just data types though, you can do the same for lists as well! In the ghci REPL, you can type:

Prelude> let ones = 1:ones

This does exactly what it looks like. “ones” is a list variable that recursively references itself. It is the number one appended to itself over and over again. In fact, if you then type “ones” in ghci, it will print out an infinite list of ones. The beauty here is that Haskell does not evaluate the “ones” list until you actually do something with it. This lazy eval lets you literally work with lists of infinite length! You can then type something like:

Prelude> take 10 ones

which will evaluate to a list with ten 1’s:

[1,1,1,1,1,1,1,1,1,1]

You can also use functions that combine lists. zip takes two lists and produces a single list of pairs, where each nth pair in the list consists of the nth element from lists one and two. If one of the lists has more elements than the other, the resulting list has length equal to that of the shorter list. The extra elements are not paired up with anything and discarded. zipWith is similar to zip. It zips a pair of lists and takes in an additional function parameter that is then applied to each pair in the list. This function takes in two arguments and returns just one.

Prelude> let twos = zipWith (+) ones ones
Prelude> take 10 twos

This prints out
[2,2,2,2,2,2,2,2,2,2]

This leads up to the canonical “hello world” example in Haskell: the one line definition of the fibonacci sequence.

Prelude> let fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

So what exactly is going on here? The tail function returns a list with every element in the original except the first one. To see how the statement works, first recall that zipWith ignores extra elements. So you can think of the above statement as being built out as follows:

Prelude> zipWith (+) [0, 1] [1]
[1]

0:1 are then prepended to [1] which produces [0,1,1]

Prelude> zipWith (+) [0,1,1] [1,1]
[1,2]

0:1 are prepended to [1,2] which produces [0,1,1,2]

Prelude> zipWith (+) [0,1,1,2] [1,1,2]
[1,2,3]

0:1 are prepended to [1,2,3] which produces [0,1,1,2,3]

Prelude> zipWith (+) [0,1,1,2,3] [1,1,2,3]
[1,2,3,5]

0:1 are prepended to [1,2,3,5] which produces [0,1,1,2,3,5] and so on, repeat an infinite number of times.

Note that each recursion adds just one new element to the list, which is the sum of the last and second to last elements in the list. This is exactly the definition of the fibbonacci sequence.

I can interact with this infinite list just like I could any other list. If I wanted to get the 100th element in the sequence, I can type the following and find out that the answer is 354224848179261915075:

Prelude> fibs !! 1000

Awesome.

Committing to git as a different user

I have two accounts on git, one for work, and one for personal use. On my work laptop, I push all my commits to origin on my work account. But as part of learning Scala for work, I created a sandbox where I could play around with Scalatra. I created a repository on git for my sandbox, under my personal user name. I wanted to push all commits in this repository under my personal account. I had gotten all this working and promptly forgot how I set it all up. Unfortunately, it stopped working, so I set it all up again.

I first removed the [Credential] section of my git config:
git config –global –edit

It had been using credential.helper osxkeychain and my keychain had been hosed before. This was probably why everything stopped working. I decided to start from scratch and forgo using keychain.

The first step was to create the ssh key and then adding it to git, as documented here

The second step was to tell git to use my personal account for my scalatra sandbox repository. For some reason, git was still trying to use my work email to push commits to my sandbox repo. After much frustration and googling, I found the answer on stack overflow

The missing step was to edit ~/.ssh/config and add two sections, one for work, and one for personal:

#personal
Host github.com-jieyangh
     HostName github.com
     User git
     IdentityFile ~/.ssh/id_rsa
     IdentitiesOnly yes
   
#work
   Host github.com-jiehu
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_work

Note that you always connect to git as the user “git”.

Next I deleted my remote origin and re-added it, this time specifying the key to use:

git remote remove
git remote add origin git@github.com-jieyangh:jieyangh/scalatra-sample-API.git

where git@github.com-jieyangh was the host value specified under ~/.ssh/config and jieyangh/scalatra-sample-API.git was the username followed by the git repo (git could not find the repo unless it was preceded by my username).

After that, I was able to push to my work repos as my work account, and my sandbox under my personal account once more.

Scalatra tutorial part 4: Adding scalastyle

Scalastyle provides functionality to enforce style rules and best practices in Scala, similar to jslint, FXCop, or PMD. Setting it up is straightforward.

You can follow along here:
https://github.com/jieyangh/scalatra-sample-API with git hash 5d40618622e79835e870fac1533a90bbf9694dc3

First, we will modify plugins.sbt to add the scalastyle plugin and a resolver to help sbt find the plugin:

resolvers += "sonatype-releases" at "https://oss.sonatype.org/content/repositories/releases/"

addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "0.6.0")

Next we add the scalastyle project settings to the project settings. The scalastyle default project settings will automatically cause the build task currently being executed to fail if scalastyle fails (scalastyleFailOnError):

  lazy val project = Project (
    "sampleAPI",
    file("."),
    settings = Defaults.defaultSettings
      ++ Seq(webSettings :_*)
      ++ Seq(org.scalastyle.sbt.ScalastylePlugin.projectSettings :_*)
      ++ Seq(
      libraryDependencies ++= Seq(
        "org.scalatra" %% "scalatra" % "2.2.2",
        "org.eclipse.jetty" % "jetty-webapp" % "8.1.7.v20120910" % "container,compile",
        "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016",
        "ch.qos.logback" % "logback-classic" % "1.0.1",
        "org.scalatra" %% "scalatra-scalatest" % "2.2.2" % "test"
      )
    ) 
  )

For convenience, we can define our own custom build task “full” in build.sbt that does a clean, updates the packages, runs the tests and then runs scalastyle:

addCommandAlias("full", ";clean ;compile ;test ;scalastyle")

Before we run scalastyle we need to generate the config by first running:
./sbt scalastyleGenerateConfig

Now when we run ./sbt full we will see the scalastyle errors. You can see that it enforces some Scala best practices:

[info] scalastyle using config /Users/jiehu/scalatra-sample-API/scalastyle-config.xml
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala:1: Header does not match expected text
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala:37:0: Whitespace at end of line
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala:27:7: Avoid using return
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala:32:30: Avoid using return
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala:16:28: Avoid using null
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/config/VersionInfo.scala: File must end with newline character
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/controllers/GreetingController.scala:1: Header does not match expected text
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/controllers/GreetingController.scala: File must end with newline character
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/sampleApi/controllers/HealthCheckController.scala:1: Header does not match expected text
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/ScalatraBootstrap.scala:1: Header does not match expected text
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/ScalatraBootstrap.scala:12:1: Whitespace at end of line
[warn] /Users/jiehu/scalatra-sample-API/src/main/scala/ScalatraBootstrap.scala: File must end with newline character

We can look up the rules here. I disabled the “Header does not match expected text” which is configured in scalastyle-config.xml file by setting enabled to false. This just checks if the predefined header appears at the start of every file. Since this is a simple tutorial project, there is no need for any such headers to exist. This makes the rule un-necessary.

 <check level="warning" class="org.scalastyle.file.HeaderMatchesChecker" enabled="false">

You can also turn off scalastyle in sections of code by putting them between //scalastyle:off and //scalastyle:on
It is best practice to fully specify the rule id being turned off //scalastyle:off magic.number This way it will only suppress that specific warning and not any others that may exist.

The rest of the checkin is just fixing the miscellaneous scalastyle errors. Once all of them have been fixed, the build succeeds!

NOTE: I had also tried using findbugs as well but found it didn’t play well with Scala. Findbugs works on Java bytecode. However when Scala compiles down to Java bytecode, it often results in Findbugs errors that aren’t really errors. For example, it will report that class names aren’t capitalized when in actuality they are, its just that traits apparently have lowercase class names in the java byte code.

Scalatra tutorial part 3

Part 1 and Part 2 are here. You can follow along on github at https://github.com/jieyangh/scalatra-sample-API. The git hash for part 3 is 92828543922dafb66b00bff03e9f115647747427 (simply type git checkout 92828543922dafb66b00bff03e9f115647747427 from the command line in your local repository). There’s a lot of code in this one, so I’ll just post mostly code snippets from the code on github.

Let’s make a simple health check API. To keep things simple, the API will consist of a single endpoint that reports the current application version. This can be used as a sanity check to quickly determine if the service is up and running and whether or not the right version was deployed. As more functionality is added to the API, the health check API can be expanded in scope to check connections to dependencies such as the database and other web services.

This feature can be broken down into three subtasks. First, we will modify the build script to store the version info inside a dynamically generated file. Second, we will create a singleton that is responsible for parsing this file and handling storage and retrieval of the version information. Once the first two steps are done, the third task is simple: We need to implement the actual API controller for the new endpoint.

1. Modifying the Build:
The build.sbt file we were using before allowed simple definitions of build tasks. By using a build.scala file, we have more freedom over what the build does, and use the full expressiveness of the scala programming language to specify build tasks. We will want the build to write out a text file with all the version information. The first difference between the previous build.sbt and build.scala is that we now define a project object, and set the library dependencies in there:

 lazy val project = Project (
    "sampleAPI",
    file("."),
    settings = Defaults.defaultSettings ++ Seq(webSettings :_*)++ Seq(
      libraryDependencies ++= Seq(
        "org.scalatra" %% "scalatra" % "2.2.2",
        "org.eclipse.jetty" % "jetty-webapp" % "8.1.7.v20120910" % "container,compile",
        "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016",
        "ch.qos.logback" % "logback-classic" % "1.0.1",
        "org.scalatra" %% "scalatra-scalatest" % "2.2.2" % "test"
      )
    )
  )

The second difference is that we can now define various build tasks in the code. Here, we call

VersionHelper.BuildVersionsPlugin.buildVersionFile("src/main/resources/version.conf")

VersionHelper mixes in the sbt.Plugin trait, which basically is a way to create code modules that can be run during the build. VersionHelper will be responsible for creating the text file with all the version info. For simplicity sake, the version will simply be the major, minor, patch versions concatenated with the git sha. The major, minor, and patch versions will be hardcoded. The git sha is calculated in what is the least readable part of the code:

 private val sha1Matcher = """[0-9a-f]{5,40}""".r

  private val sha = sha1Matcher findFirstIn "git log --pretty=format:'%H' -1".!! 

  def gitSha: String = sha match {
    case Some(s) => s
    case None => "0000000000000000000000000000000000000000" //error
  }

The “.r” following the string creates a Regex using the magic of Scala implicits. This allows methods to be applied to objects that normally wouldn’t have them. I’m not a fan of them because it makes the code hard to read and follow, sacrificing maintainability for convenience. But this is the recommended way to create a Regex expression in Scala.

The regex is then run. findFirstIn is called as a postfix operator of the sha1Matcher Regex object. This is syntactic sugar. It matches with the string output by the expression: “git log –pretty=format:’%H’ -1″.!!

A note here: In Scala, operators are methods. To allow natural extensions to their language, Scala allows method names to take on various symbols reserved for operators such as “!”. What !! does is actually call scala.sys.process, which then runs the preceding string in the command line, returning the output. In our case, it gets the latest git sha. Again, this syntactic sugar sacrifices readability and allows people to abuse the language by making all sorts of ugly and unintuitive method names. However, this is part of the official Scala library. Finally, once we retrieve the gitSha, we do a pattern match on the result of the Regex match. This is an Option, which can either be Some or None. If there was no match, we just return an error string. The rest of the code in VersionHelper.scala writes out the version information to the version.conf file under /src/main/resources and is relatively straightforward in comparison with the snippet we just looked at. It can be found on github as linked in the start of this article.

The conf file will look something like the following:

 build {
   name = "Sample API"
   version = "1.1.0.92828543922dafb66b00bff03e9f115647747427"
   lastBuilt = "2015-01-08 05:37:22.401 UTC"
 }

Now that we have the configuration file, we will want to create a VersionInfo.scala class in our web application that reads from it. TypeSafe provides the config package that makes reading from .conf files easy. The VersionInfo.scala class can be found on github. To load the config, we use ConfigFactory.load as follows:

 val config = {
     try {
        ConfigFactory.load("version.conf")
     }
     catch {
       case ex:Exception => null
     }
   }

This will load version.conf from the classpath. Scala will be able to locate the version.conf underneath resources folder. The rest of the code in VersionInfo.scala then parses the file using a series of intuitive method calls that do exactly what you would expect them to:

   val name = loadConfig("build.name")
   val version = loadConfig("build.version")
   val lastBuilt = loadConfig("build.lastBuilt")

Each “.” in the string denotes a level of hierarchy in the conf file.

Finally, we get to the easy part. We create a controller that outputs the version info, and mount it to a heartbeat endpoint in the ScalatraBootstrap file. This is covered in part 1 of the Scalatra tutorial. All the code can be found on github as linked earlier. HealthCheckController is just a few lines and is trivial given that all the heavy lifting has been done.

Interview practice: Racing horses

This one is a puzzle question. I’m personally not the biggest fan of these types of problems and would never ask something like this in an interview. That being said, they are kind of fun, and the following problem is no exception. Its an entertaining logic question that can give insight into a candidate’s thought process and how they approach an optimization problem.

There are 25 race horses and you want to find the fastest one. You can only race 5 horses at a time, and you can’t write down their times. How many races does it take to find the fastest three horses?

Here’s how I approached it. The first thing to do is figure out all the assumptions inherent in any puzzle question. The first and most crucial assumption is that the finish times for each horse remains the same across races. Without this assumption, the task is impossible, as the fastest horse could change between races: horse #1 could beat horse #2 in the first race, but lose to it in the second race. The second assumption is that there are no ties. There is a clear ranking in finish times for the horses.

Now with those out of the way, we can approach the problem. The most obvious thing to do first is to divide the 25 horses into five groups, and then race each group. This will identify the fastest horse in each heat. Now we just race the five fastest horses from each group. The winner will be the fastest out of all 25 horses, and we will have identified the fastest horse in just six races. Let’s call this winner horse #1, and assume he came from group #1. That’s the easy part. Now the question remains, how many races do we now need to find the second fastest horse? Should we just pick the second fastest horse from race #6? Should we race all the second place finishers from the five groups as well?

At this point it would be helpful to do some whiteboarding. Let’s write down some concrete race times.

Race #1:
Horse #1 1 minute
Horse #2 2 minute
Horse #3 3 minute
Horse #4 4 minute
Horse #5 5 minute

Race #2:
Horse #6 1.6 minutes
Horse #7 1.7 minutes
Horse #8 1.8 minutes
Horse #9 1.9 minutes
Horse #10 1.95 minutes

By writing out the first two races and assigning some times we can see that it is entirely possible that all the horses in the second heat are actually faster than all of the other horses in the first heat save for horse #1. Let’s write down some more race times for the other heats.

Race #3:
Horse #11 1.1 minutes
Horse #12 1.2 minutes
Horse #13 1.3 minutes
Horse #14 1.4 minutes
Horse #15 1.5 minutes

Again, we see that for the third heat, it is entirely possible that all the horses are faster than all of the other horses in the first heat save for horse #2. At this point it may seem that it is necessary to race all the second place finishers in each heat, before racing the winner of the seventh race against the second place finisher of the sixth race. But that’s not quite true.

Race #4:
Horse #16 1.01 minutes
Horse #17 1.02 minutes
Horse #18 1.03 minutes
Horse #19 1.04 minutes
Horse #20 1.05 minutes

Race #5:
Horse #21 1.001 minutes
Horse #22 1.002 minutes
Horse #23 1.003 minutes
Horse #24 1.004 minutes
Horse #25 1.005 minutes

Race #6:
Horse #1 comes in first place at 1 minute
Horse #21 comes in second at 1.001 minutes
Horse #16 comes in third at 1.01 minutes
Horse #11 comes in fourth at 1.1 minutes
Horse #6 comes in fifth at 1.6 minutes

From this we can quickly see that the second place finisher of race #6 (horse #21 from heat 5 from our whiteboard example) is faster than the first place finishers of heats 2,3, and 4. This means that he is also going to be faster than all the horses in heats 2,3, and 4. We can quickly eliminate all those horses as being the second fastest. Thus the only horses to consider for being second fastest are either the second place finisher in heat 6, or the second fastest horse from the winning heat the fastest horse was from (horse #2 from heat 1 in this case). So we race them. So far, we’ve had to do seven races.

So what about identifying the third fastest horse? Based on the same logic as above, we know that the third place finisher of race #6 (horse #16 from heat 4 from our whiteboard example) is faster than all the horses in heats 2 and 3. We do not know if this third place finisher is faster than the second or third place finisher from the winning heat (Horse #2 and #3 from heat 1), or the second place finisher from the heat that the second fastest horse came from (horse #22 from heat 5 in this case). Note for the latter case, we don’t need to consider the third place finisher from this heat, because at best it could only be the fourth fastest horse overall (we know by its placement that it must be slower than the first and second place finishers in its heat and slower than the fastest horse overall). Since we don’t care about the fourth fastest horse, we can eliminate it from consideration. So, there are only four horses to consider for third fastest, and we will need to have them compete in an eighth race to find the third fastest horse overall.

Or do we? In race #7, we raced horse #2 from heat 1 against horse #21 from heat 5. In race #8, we raced horse #2 from heat 1 again. There are only five horses we need to consider for second and third fastest, so we can combine races #7 and #8 into one. The second and third place finishers of the seventh race are the second and third fastest horses, respectively. In total, we just need seven races.

The general algorithm is:
Divide the 25 horses into groups of five.
Race each group.
Select the winner from each group and race them.
The winner from this race is the fastest horse. Let heat A denote the winning heat this horse is from, heat B denote the heat that the second place winner is from, and heat C denote the heat that the third place winner is from.
Now race horse #2 and #3 from heat A, horse #1 and #2 from heat B, and horse #1 from heat C (where #1, #2, and #3 denotes what place the horse finished for that given heat) in a final seventh race.
The first and second place winners of this race are the second and third fastest horses.

Scalatra Tutorial (Part 2): Logging, debugging, and tests

Part 1 of this tutorial is here

You can follow along by getting the code from github here: https://github.com/jieyangh/scalatra-sample-API
The git hash for part 2 is 32190b1ae5eb685f6a06eaae6cd5fa15d5cf23bd

Now that we have a barebones implementation of a web API, let’s start adding on functionality and utility.

Debugging 

Remote debugging with IntelliJ makes life much easier. Here’s how to set it up. Open the project in IntelliJ and go to Run->Edit->Configurations. Choose Remote , hit the “+” button, and enter in a name for the configuration (I chose “ScalatraDebug” for simplicity). The default settings are fine and will set up a debug port on port 5005:

remote debug

We will need to modify the SBT parameters accordingly, since it defaults to a different port. Go to File->Settings->Other settings and select SBT from the side nav. Under IDE Settings, change the VM parameters so that its listening on port 5005:

-Xmx512M -XX:MaxPermSize=256M -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

SBTDebug

Launch the application in the SBT console by typing “container:start” as we did before in part 1. Once the app is running, go to Run->Debug and select the remote debugging configuration that you previously created.

Logging

Any serious enterprise level application will need logging. Scalatra uses the logback framework by default, so that’s what we’ll use too. First we’ll need to add this dependency in the librarydependencies sequence in build.sbt:

"ch.qos.logback" % "logback-classic" % "1.0.1"

Next we will create a logback.xml under src/main/resources. Chances are if you’re reading this you’ve used logback before. If not, the file contents are fairly self explanatory. We can modify the pattern to suit our purposes, but including the date, thread, level, logger and actual message is a good a starting point as any:

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>Captain's Log Stardate %date{ISO8601}. [%thread] %level %logger %msg
            </pattern>
        </encoder>
    </appender>

    <root level="debug">
        <appender-ref ref="STDOUT" />
    </root>
</configuration>

Finally, we’ll need to utilize the logger in our code. Let’s add some logging to the GreetingController scala class. First let’s create the logger:

val logger =  LoggerFactory.getLogger(getClass)

 

And now we just need to use our logger object to log things:

logger.info("We have entered a spectacular binary star system in the Kavis Alpha sector");

Again, pretty self explanatory. Here’s the full listing for the GreetingController:

package sampleApi.controllers

import org.scalatra.ScalatraServlet
import org.slf4j.{Logger, LoggerFactory}

class GreetingController extends ScalatraServlet {
  val logger =  LoggerFactory.getLogger(getClass)

  get("/") {
    logger.info("We have entered a spectacular binary star system in the Kavis Alpha sector");
    "Hello world"
  }

  get("/:name") {
    val name = params.getOrElse("name", "world")
    "Hello " + name
  }
}

Tests

Now we’ll want to add some automated testing. ScalaTest appears to be the go to test framework for Scala, and it provides support for Scalatra as well. This can be accomplished by adding a dependency for Scalatra ScalaTest in the libraryDependencies sequence in build.sbt:

"org.scalatra" %% "scalatra-scalatest" % "2.2.2" % "test"

Next we’ll want to create a test under src/test/scala. We’ll want our test class to extend the ScalatraSuite class, which allows us to use simple sentences to specify the expected behavior of our application. This is Behavior Driven Development (BDD). We also want our test class to have the “FunSuite” trait so that it can treat each test case as a function value. Note that in the official Scalatra testing example, the listed code didn’t compile for me, because it could not resolve “FunSuiteLike”. Replacing “FunSuiteLike” with “FunSuite” fixed the issue for me.

import org.scalatra.test.scalatest.ScalatraSuite
import org.scalatest.FunSuite
import sampleApi.controllers.GreetingController

class GreetingControllerTests extends ScalatraSuite  with FunSuite {
  addServlet(classOf[GreetingController], "/*")

  test("simple get") {
    get("/bob") {
      status should equal (200)
      body should include ("Hello bob")
    }
  }
} 

The test code is also pretty straightforward. An HTTP get request for “/bob” should return a status code of 200, and the response message should include “Hello bob”. You can run the test in IntelliJ by right clicking anywhere on the test code, or by creating a test configuration in the IDE and running it via the Run menu.