UW CSE P576 notes – Harris corner detection

The following are my notes on the Harris corner detection algorithm for finding the features in an image. These slide screenshots were taken from the University of Washington course homepage here:


The idea is to consider a small window around each pixel p in an image. We want to identify all such pixel windows that are unique. Uniqueness can be measured by shifting each window by a small amount in a given direction and measuring the amount of change that occurs in the pixel values.


More formally, we take the sum of squared differences (SSD) of the pixel values before and after the shift, and identify the pixel windows where the SSD is large for shifts in all 8 directions. Let us define the change function E(u,v) as this SSD, where (u,v) is the shift (for example, one pixel in one of the 8 directions), (x,y) ranges over every pixel in our 3 x 3 window W, and I is the intensity value of a pixel:

E(u,v) = \sum_{(x,y) \in W} \left[ I(x+u, y+v) - I(x,y) \right]^2

The features in the image are all pixels whose windows have large values of E(u,v) for every shift, as defined by some threshold.
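To make the definition concrete, here is a minimal brute-force sketch in Python with NumPy that evaluates E(u,v) over a 3 x 3 window for each of the 8 one-pixel shifts. The helper names (`ssd_change`, `min_change`) and the tiny test images are hypothetical; images are indexed `img[row, col]`, so I(x, y) is `img[y, x]`.

```python
import numpy as np

# The 8 one-pixel shifts (u, v): every neighbour direction except (0, 0).
SHIFTS = [(u, v) for v in (-1, 0, 1) for u in (-1, 0, 1) if (u, v) != (0, 0)]


def ssd_change(img, x, y, u, v, half=1):
    """E(u, v) for the (2*half+1) x (2*half+1) window centred on (x, y):
    sum of [I(x' + u, y' + v) - I(x', y')]^2 over the window pixels (x', y')."""
    total = 0.0
    for yy in range(y - half, y + half + 1):
        for xx in range(x - half, x + half + 1):
            diff = float(img[yy + v, xx + u]) - float(img[yy, xx])
            total += diff * diff
    return total


def min_change(img, x, y):
    """Smallest E(u, v) over the 8 shifts: large only if every direction changes."""
    return min(ssd_change(img, x, y, u, v) for u, v in SHIFTS)


# Hypothetical 7x7 toy images: flat, a vertical edge, and the corner of a bright square.
flat = np.zeros((7, 7))
edge = np.zeros((7, 7))
edge[:, 4:] = 1.0        # intensity changes only along x
corner = np.zeros((7, 7))
corner[3:, 3:] = 1.0     # bright quadrant whose corner is at (3, 3)

for name, img in [("flat", flat), ("edge", edge), ("corner", corner)]:
    print(name, min_change(img, 3, 3))
```

On these toy images the flat patch and the vertical edge each have at least one shift with zero change (any shift for the flat patch, a vertical shift for the edge), while the corner changes in every direction, so only the corner would pass a threshold on the smallest E(u,v).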


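After some fancy math that is best left to Wikipedia or the original slides, which essentially involves substituting the first-order Taylor approximation I(x + u, y + v) ≈ I(x, y) + I_x·u + I_y·v, we are left with:

E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix}, \qquad H = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
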
where H is the Harris matrix and I_x and I_y are the gradients in the x and y directions, respectively (the gradient at each pixel can be computed with the Sobel operator). Note that H is the sum of the per-pixel matrices over all the pixels in the window W. This will be important later.
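As a sketch of how H is assembled, assuming the gradient images I_x and I_y have already been computed (for example with a Sobel filter), the code below sums the per-pixel matrices over a 3 x 3 window and evaluates the quadratic form [u v] H [u v]^T that approximates E(u,v). The function names and gradient values are hypothetical.

```python
import numpy as np


def harris_matrix(Ix, Iy, x, y, half=1):
    """Sum [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]] over the (2*half+1) x (2*half+1) window
    centred on pixel (x, y). Gradient arrays are indexed [row, col] = [y, x]."""
    wx = Ix[y - half:y + half + 1, x - half:x + half + 1]
    wy = Iy[y - half:y + half + 1, x - half:x + half + 1]
    return np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                     [np.sum(wx * wy), np.sum(wy * wy)]])


def approx_change(H, u, v):
    """First-order approximation E(u, v) ~ [u v] H [u v]^T."""
    d = np.array([u, v], dtype=float)
    return float(d @ H @ d)


# Hypothetical 3 x 3 gradient images: two pixels change along x, one along y.
Ix = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
Iy = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]])
H = harris_matrix(Ix, Iy, x=1, y=1)
print(H)
print(approx_change(H, 1, 0), approx_change(H, 0, 1))
```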

Remember that we want the SSD to be large for shifts in all eight directions, or equivalently, small for none of them. The eigenvectors of H give the directions of the largest and smallest increase in SSD, and the corresponding eigenvalues give the size of those increases. Because H is a 2×2 matrix, the eigenvalues can be found by setting det(H - λ·Id) = 0, where Id is the 2×2 identity matrix, and solving the resulting quadratic equation with the quadratic formula to get the two possible solutions.
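For a symmetric 2×2 matrix [[a, b], [b, c]], det(H - λ·Id) = 0 (with Id the 2×2 identity) expands to λ^2 - (a + c)λ + (ac - b^2) = 0. A minimal Python sketch of solving it with the quadratic formula, checked against NumPy's `numpy.linalg.eigvalsh`, follows; `eigenvalues_2x2` is a hypothetical helper name. Note the `math.sqrt` call: that square root is the per-pixel cost the corner strength function avoids.

```python
import math

import numpy as np


def eigenvalues_2x2(H):
    """Eigenvalues of a symmetric 2x2 matrix from det(H - lambda*Id) = 0,
    i.e. lambda^2 - trace*lambda + det = 0, solved with the quadratic formula."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    trace = a + c
    det = a * c - b * b
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))  # clamp tiny negative round-off
    return (trace - disc) / 2.0, (trace + disc) / 2.0     # (lambda_min, lambda_max)


H = [[2.0, 1.0], [1.0, 2.0]]
lam_min, lam_max = eigenvalues_2x2(H)
print(lam_min, lam_max)
print(np.linalg.eigvalsh(np.array(H)))  # NumPy's symmetric eigensolver, ascending order
```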

Because solving the quadratic equation for every pixel is computationally expensive (it requires the square root operator), we can use a variant where instead of solving for the eigenvalues directly, we compute a corner strength function as defined by:

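c(H) = det(H) / trace(H), where the trace is the sum of the two elements on the main diagonal (upper left to lower right). Since det(H) = λ1·λ2 and trace(H) = λ1 + λ2, where λ1 and λ2 are the eigenvalues of H, this equals λ1·λ2 / (λ1 + λ2): it is large only when both eigenvalues are large, and it needs no square root. This is the Harris operator.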
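A small sketch of the Harris operator, checking numerically that det(H)/trace(H) agrees with λ1λ2/(λ1 + λ2) computed from the eigenvalues. The `eps` term is my own assumption, added to avoid dividing by zero in perfectly flat windows where the trace is 0; `corner_strength` is a hypothetical name.

```python
import numpy as np


def corner_strength(H, eps=1e-12):
    """Harris operator c(H) = det(H) / trace(H); eps guards windows where trace is 0."""
    det = H[0, 0] * H[1, 1] - H[0, 1] * H[1, 0]
    trace = H[0, 0] + H[1, 1]
    return det / (trace + eps)


H = np.array([[2.0, 1.0], [1.0, 2.0]])
lams = np.linalg.eigvalsh(H)
print(corner_strength(H))                        # from det and trace, no square root
print(lams[0] * lams[1] / (lams[0] + lams[1]))   # same value from the eigenvalues
```

An edge-like window such as H = [[4, 0], [0, 0]] has one zero eigenvalue, so its determinant and therefore its corner strength are 0.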

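One question that tripped me up, as well as other students, is why the determinant of the Harris matrix isn't always equal to zero. At first glance, the determinant of H is

I_x^2 * I_y^2 - (I_x*I_y) * (I_x*I_y) = I_x^2 * I_y^2 - I_x^2 * I_y^2 = 0.

However, as previously noted, each entry of H is a sum over all the pixels in the window W: I_x^2 is summed over W, as are I_x*I_y and I_y^2. The determinant is therefore

\det(H) = \left(\sum_W I_x^2\right)\left(\sum_W I_y^2\right) - \left(\sum_W I_x I_y\right)^2

which by the Cauchy-Schwarz inequality is never negative, and is zero only when all the gradient vectors in the window lie along a single line (as in a flat region or along a straight edge).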
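The difference between the per-pixel product and the windowed sum is easy to check numerically. In this sketch the gradient values for the 9 pixels of a 3 x 3 window are made up for illustration: every single-pixel determinant is exactly 0, but the determinant of the summed matrix is not.

```python
import numpy as np

# Made-up gradients (Ix, Iy) for the 9 pixels of a 3 x 3 window near a corner:
# some pixels see a vertical edge (Ix only), some a horizontal one (Iy only), one sees both.
Ix = np.array([1.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
Iy = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.0, 1.0, 1.0])

# Per pixel, det([[Ix^2, IxIy], [IxIy, Iy^2]]) = Ix^2 * Iy^2 - (Ix*Iy)^2 = 0.
per_pixel_det = Ix**2 * Iy**2 - (Ix * Iy) ** 2
print(per_pixel_det)

# Summing each entry over the window first, as H does, gives a nonzero determinant.
sxx, syy, sxy = np.sum(Ix**2), np.sum(Iy**2), np.sum(Ix * Iy)
det_H = sxx * syy - sxy**2
print(det_H)
```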

Here then is the high-level pseudocode:

1. Convert the original image to grayscale

2. Apply a Gaussian filter to smooth out any noise

3. Apply the Sobel operator to find the x and y gradient values for every pixel in the smoothed grayscale image

4. For each pixel p in the grayscale image, consider a 3×3 window around it and compute the corner strength function. Call this its Harris value.

5. Find all pixels whose Harris value exceeds a certain threshold and is a local maximum within a certain window (to avoid duplicate detections of the same feature)

6. For each pixel that meets the criteria in 5, compute a feature descriptor.
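Steps 1 through 5 can be sketched end to end in Python with NumPy only. This is a minimal illustration under several assumptions of my own: luma weights for the grayscale conversion, a 5×5 Gaussian kernel with sigma = 1, border pixels replicated when filtering, and a 5×5 non-maximum-suppression window. The function names (`filter2d`, `harris_corners`) are hypothetical, and step 6 is left out.

```python
import numpy as np


def filter2d(img, kernel):
    """Correlate img with a small odd-sized kernel, replicating border pixels."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def harris_corners(image, threshold, sigma=1.0, nms_size=5):
    """Return ([(x, y), ...], harris_values) following steps 1-5 of the pseudocode."""
    # 1. Grayscale (luma weights are an assumption; 2-D input is used as-is).
    if image.ndim == 3:
        gray = image[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    else:
        gray = image.astype(float)

    # 2. Gaussian smoothing with a 5x5 kernel.
    r = np.arange(-2, 3)
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    gray = filter2d(gray, np.outer(g, g))

    # 3. Sobel gradients: ix = d/dx (along columns), iy = d/dy (along rows).
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ix = filter2d(gray, sobel_x)
    iy = filter2d(gray, sobel_x.T)

    # 4. Sum the entries of H over a 3x3 window, then c(H) = det(H) / trace(H).
    box = np.ones((3, 3))
    sxx, syy, sxy = filter2d(ix * ix, box), filter2d(iy * iy, box), filter2d(ix * iy, box)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    harris = np.where(trace > 1e-12, det / np.maximum(trace, 1e-12), 0.0)

    # 5. Keep pixels above the threshold that are the maximum of their nms_size window.
    half = nms_size // 2
    padded = np.pad(harris, half, mode="constant", constant_values=-np.inf)
    corners = []
    for y in range(harris.shape[0]):
        for x in range(harris.shape[1]):
            window = padded[y:y + nms_size, x:x + nms_size]
            if harris[y, x] > threshold and harris[y, x] == window.max():
                corners.append((x, y))
    return corners, harris


# Usage on a synthetic image: a bright 10x10 square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
_, values = harris_corners(img, threshold=np.inf)
corners, _ = harris_corners(img, threshold=0.1 * values.max())
print(corners)
```

On the synthetic square, detections cluster at its four corners, while pixels along its straight edges score (near) zero because the gradients in their windows all point the same way. The hand-rolled filtering loops are for clarity; a library convolution routine would be much faster.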

Step 6 is itself a topic of much discussion that is out of scope for these notes. The simplest approach is to use the 5 x 5 window of pixel intensities around the feature as its descriptor. For feature matching, such a descriptor is invariant to translation, but nothing else. Better feature descriptors are invariant to rotation, illumination, and scale.
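A sketch of the simplest descriptor described above, the raw 5 x 5 patch around a feature, matched by SSD. The helper names are hypothetical, and the second image is just the first one translated with `np.roll`, so the matching patch has an SSD of exactly 0; a rotated or brighter copy would break this descriptor, which is the limitation noted above.

```python
import numpy as np


def patch_descriptor(gray, x, y, size=5):
    """Simplest descriptor: the raw size x size intensities around (x, y), flattened.
    Assumes (x, y) is at least size // 2 pixels away from the image border."""
    half = size // 2
    return gray[y - half:y + half + 1, x - half:x + half + 1].astype(float).ravel()


def best_match(desc, candidates):
    """Index of the candidate descriptor with the smallest SSD to desc."""
    ssd = [float(np.sum((desc - c) ** 2)) for c in candidates]
    return int(np.argmin(ssd))


rng = np.random.default_rng(0)
img1 = rng.random((30, 30))
img2 = np.roll(img1, shift=(3, 4), axis=(0, 1))  # img1 moved 4 px right and 3 px down

d1 = patch_descriptor(img1, 10, 12)
cands = [patch_descriptor(img2, x, y) for (x, y) in [(10, 12), (14, 15), (20, 5)]]
print(best_match(d1, cands))  # the translated location (14, 15) should win
```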



Interview questions: Sorting a terabyte of integers with a gigabyte of memory (part 2)

In the previous article, I mentioned that the mergesort algorithm could be modified to solve this problem. The usual mergesort implementation assumes that all the unsorted elements fit into main memory. The algorithm divides the unsorted elements into two halves and recursively calls mergesort on each half before merging them together. The merge operation requires a temporary array to hold the merged result, which makes it a non-starter here. Worse, the merging would be horribly inefficient on disk, since swapping two elements on disk is orders of magnitude slower than swapping two elements of an in-memory array; disk I/O is optimized for sequential, not random, access. The recursion itself is less of a concern than it might seem, since its depth grows only logarithmically with the size of the input.

We will need an external sorting algorithm to address these issues, and the basic idea behind mergesort can be used to come up with a workable solution. Let N be the number of unsorted integers residing on hard drive D1, and M the number of records that can be sorted in memory at any given time (M takes into account any overhead needed to perform the sort itself). Assume also that we have three hard drives in addition to D1 (or some other storage medium optimized for sequential read/write access, such as tape): D2, D3, and D4. We can sort the data in multiple passes. The initial pass generates N/M sorted subsets of size M. Subsequent passes merge these subsets together until we are left with one final set of sorted integers.

Let’s use a simple example with N = 18 and M = 3. Here is the unsorted data before we apply our sorting algorithm:

D1    15 4 1 20 19 3 100 80 8 12 10 11 55 40 31 39 67 88           

The first pass reads M elements at a time from D1 and sorts them in memory (any performant algorithm with a good asymptotic runtime such as quicksort will do). The resulting subsets will be written to disk, with the destination alternating between D3 and D4.

D3    1  4  15 | 8  80 100 | 31 40 55
D4    3  19 20 | 10 11 12  | 39 67 88

D3 and D4 now each contain three (half of N/M) sorted subsets of size M. The next step will be to combine two subsets at a time, one from D3 and one from D4, into a set of size 2*M. The output will be written to disk, this time with the destination alternating between D1 and D2.

D1    1  3  4  15 19 20  | 31 39 40 55 67 88           
D2    8  10 11 12 80 100 

This method of reading in the data from two disks and writing the merged output to the other two disks is repeated until we end up with one disk which contains all the data in sorted order.

D3    1  3  4  8  10 11 12 15 19 20 80 100
D4    31 39 40 55 67 88
D1    1 3 4 8 10 11 12 15 19 20 31 39 40 55 67 80 88 100
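The passes above can be reproduced with a small in-memory simulation. Each "disk" is modelled as a Python list of sorted runs, which is an assumption for illustration only (a real implementation streams records from files), and the standard library's `heapq.merge` stands in for the merge step. The function name `external_sort_trace` is hypothetical.

```python
import heapq


def external_sort_trace(data, m):
    """Simulate the four-disk balanced merge sort on a list of integers.

    Each 'disk' is a list of sorted runs (a hypothetical model for illustration).
    Returns the sorted data and a list recording the runs written by each pass.
    """
    if not data:
        return [], []
    disks = {"D1": [], "D2": [], "D3": [], "D4": []}
    # Initial pass: sort m records at a time in memory, alternating D3 and D4.
    for i in range(0, len(data), m):
        target = "D3" if (i // m) % 2 == 0 else "D4"
        disks[target].append(sorted(data[i:i + m]))
    trace = [{"D3": disks["D3"], "D4": disks["D4"]}]

    src, dst = ("D3", "D4"), ("D1", "D2")
    while len(disks[src[0]]) + len(disks[src[1]]) > 1:
        runs_a, runs_b = disks[src[0]], disks[src[1]]
        disks[dst[0]], disks[dst[1]] = [], []
        # Merge the i-th run of each input disk; outputs alternate between dst disks.
        for i in range(max(len(runs_a), len(runs_b))):
            a = runs_a[i] if i < len(runs_a) else []
            b = runs_b[i] if i < len(runs_b) else []
            disks[dst[i % 2]].append(list(heapq.merge(a, b)))
        disks[src[0]], disks[src[1]] = [], []  # input disks are now free for reuse
        trace.append({d: disks[d] for d in dst if disks[d]})
        src, dst = dst, src

    # The single remaining run always lands on the first disk of the last output pair.
    return disks[src[0]][0], trace


data = [15, 4, 1, 20, 19, 3, 100, 80, 8, 12, 10, 11, 55, 40, 31, 39, 67, 88]
result, trace = external_sort_trace(data, m=3)
for n, step in enumerate(trace):
    print(f"pass {n}:", step)
print(result)
```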

The final detail is how we perform the merge operation itself, given the limited amount of memory available. It turns out we can reuse the merge logic from mergesort. Recall that in the mergesort algorithm, the merge operation takes two sorted subarrays, S1 and S2 (since the sort happens in memory, these are typically passed in as start and end indexes into the array being sorted), and merges them in sorted order into a temporary array T1. The contents of T1 are then written back to the array being sorted. The merge logic is fairly simple. Let A1, A2, and A3 be indexes into S1, S2, and T1, respectively. At each step we compare the current integer of each subarray, write the smaller one to the temporary array, and advance the appropriate indexes. For example, if S1[A1] <= S2[A2], then S1[A1] is written to T1[A3], and A1 and A3 are incremented. Otherwise, S2[A2] is written to T1[A3], and A2 and A3 are incremented. Once one subarray is exhausted, the remaining elements of the other are copied to T1 in order.
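The merge described above, written as a short Python sketch that uses the same names (S1, S2, T1, and the indexes A1, A2, A3). For simplicity S1 and S2 are separate lists here rather than index ranges of one array.

```python
def merge(s1, s2):
    """Merge sorted lists S1 and S2 into the temporary list T1 using indexes A1, A2, A3."""
    t1 = [0] * (len(s1) + len(s2))
    a1 = a2 = a3 = 0
    while a1 < len(s1) and a2 < len(s2):
        if s1[a1] <= s2[a2]:  # taking ties from S1 keeps the merge stable
            t1[a3] = s1[a1]
            a1 += 1
        else:
            t1[a3] = s2[a2]
            a2 += 1
        a3 += 1
    # One subarray is exhausted; copy whatever remains of the other.
    while a1 < len(s1):
        t1[a3] = s1[a1]
        a1 += 1
        a3 += 1
    while a2 < len(s2):
        t1[a3] = s2[a2]
        a2 += 1
        a3 += 1
    return t1


print(merge([1, 4, 15], [3, 19, 20]))
```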

The important takeaway is that only two integers need to be compared at any given time, which means the merge operation requires very little memory. The merge logic of our external sorting algorithm will therefore be almost identical, except that instead of reading from and writing to arrays, we perform disk I/O: rather than incrementing an index into an array, we advance to the next record on the disk. Because we read from and write to each disk in sequential order, we avoid the penalty of random disk access.
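The same merge over files, as a minimal sketch: each "disk" is a text file with one integer per line (an assumed format; a real system would more likely use fixed-size binary records), and the helper names are hypothetical.

```python
import os
import tempfile


def write_ints(path, values):
    """Write one integer per line (the assumed on-disk format for this sketch)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_ints(path):
    with open(path) as f:
        return [int(line) for line in f]


def merge_files(path_a, path_b, path_out):
    """Two-way merge of two sorted files into a third.

    Apart from the file objects' own I/O buffers, only the current record of each
    input is held in memory, and every file is read or written strictly in order.
    """
    with open(path_a) as fa, open(path_b) as fb, open(path_out, "w") as out:
        line_a, line_b = fa.readline(), fb.readline()  # "" signals end of file
        while line_a and line_b:
            if int(line_a) <= int(line_b):
                out.write(line_a)
                line_a = fa.readline()
            else:
                out.write(line_b)
                line_b = fb.readline()
        # One input is exhausted: copy the rest of the other sequentially.
        while line_a:
            out.write(line_a)
            line_a = fa.readline()
        while line_b:
            out.write(line_b)
            line_b = fb.readline()


with tempfile.TemporaryDirectory() as tmp:
    d3, d4, d1 = (os.path.join(tmp, name) for name in ("D3", "D4", "D1"))
    write_ints(d3, [1, 4, 15])
    write_ints(d4, [3, 19, 20])
    merge_files(d3, d4, d1)
    print(read_ints(d1))
```

Python's file objects already buffer reads and writes in blocks, which matches the sequential access pattern the algorithm relies on; a production external sort would typically size those buffers explicitly.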

As with mergesort, each pass over the data halves the number of sets. After the initial pass there are N/M subsets, so ceil(log2(N/M)) subsequent merge passes are required (three in the example above, going from 6 sets to 3, then 2, then 1). Note that if additional external storage devices were available, fewer passes would be required. For example, with six disks in total (three read from and three written to in each pass), each pass would merge three sets into one, cutting the number of sets to one third. Given 2k disks, the number of merge passes required would be ceil(log_k(N/M)). There is additional complexity in the merge logic, however. Finding the smaller of two integers is trivial, since we can just compare them directly. Finding the smallest of k integers, however, requires additional work. A data structure such as a priority queue (for example, a min-heap) can be used to return the minimum. We also need to keep track of which disk the minimum came from, so that we can read the next integer from that disk. Each step therefore removes the current minimum from the priority queue and inserts the next integer read from the same disk.
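A sketch of the k-way merge using Python's standard `heapq` module as the priority queue. Each heap entry is a (value, run index) pair, so popping the minimum also tells us which run (disk) to read from next. Runs are plain Python lists standing in for disks, and `k_way_merge` and `merge_passes` are hypothetical names; `merge_passes` counts passes by repeated integer division rather than floating-point logarithms.

```python
import heapq


def k_way_merge(runs):
    """Merge k sorted runs with a min-heap of (value, run index) pairs.
    Each run is consumed sequentially through its own iterator."""
    iters = [iter(r) for r in runs]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx = heapq.heappop(heap)  # current minimum and the run it came from
        out.append(value)
        nxt = next(iters[idx], None)      # read the next record from that same run
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out


def merge_passes(n, m, k):
    """Merge passes after the initial pass: ceil(log_k(ceil(n / m))), in integers."""
    runs = (n + m - 1) // m
    passes = 0
    while runs > 1:
        runs = (runs + k - 1) // k
        passes += 1
    return passes


print(k_way_merge([[1, 4, 15], [3, 19, 20], [8, 80, 100]]))
print(merge_passes(18, 3, 2), merge_passes(18, 3, 3))
# Assuming 4-byte integers, binary units, and no sort overhead: 2**38 ints, 2**28 per run.
print(merge_passes(2**38, 2**28, 2), merge_passes(2**38, 2**28, 3))
```

Under those assumptions, a terabyte splits into 1024 initial runs, which takes 10 merge passes with four disks and 7 with six.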