Interview practice: Racing horses

This one is a puzzle question. I’m personally not the biggest fan of these types of problems and would never ask something like this in an interview. That being said, they are kind of fun, and the following problem is no exception. It’s an entertaining logic question that can give insight into a candidate’s thought process and how they approach an optimization problem.

There are 25 race horses and you want to find the fastest ones. You can only race five horses at a time, and you can’t write down their times. How many races does it take to find the three fastest horses?

Here’s how I approached it. The first thing to do is figure out all the assumptions inherent in any puzzle question. The first and most crucial assumption is that the finish times for each horse remain the same across races. Without this assumption, the task is impossible, as the results could change between races: horse #1 could beat horse #2 in the first race, but lose to it in the second. The second assumption is that there are no ties; there is a clear ranking of finish times for the horses.

Now with those out of the way, we can approach the problem. The most obvious first step is to divide the 25 horses into five groups and race each group. This identifies the fastest horse in each heat. Next, we race the winners of the five heats against each other. The winner of that race is the fastest of all 25 horses, so we have identified the fastest horse in just six races. Let’s call this winner horse #1, and assume he came from group #1. That’s the easy part. The question now is how many more races we need to find the second fastest horse. Should we just pick the second place finisher from race #6? Should we race all the second place finishers from the five groups as well?

At this point it would be helpful to do some whiteboarding. Let’s write down some concrete race times.

Race #1:
Horse #1 1 minute
Horse #2 2 minutes
Horse #3 3 minutes
Horse #4 4 minutes
Horse #5 5 minutes

Race #2:
Horse #6 1.6 minutes
Horse #7 1.7 minutes
Horse #8 1.8 minutes
Horse #9 1.9 minutes
Horse #10 1.95 minutes

By writing out the first two races and assigning some times, we can see that it is entirely possible for all the horses in the second heat to be faster than all of the horses in the first heat save for horse #1. Let’s write down some more race times for the other heats.

Race #3:
Horse #11 1.1 minutes
Horse #12 1.2 minutes
Horse #13 1.3 minutes
Horse #14 1.4 minutes
Horse #15 1.5 minutes

Again, we see that it is entirely possible for all the horses in the third heat to be faster than all of the horses in the first heat save for horse #1. At this point it may seem necessary to race all the second place finishers from each heat (a seventh race), and then race the winner of that race against the second place finisher of the sixth race (an eighth race). But that’s not quite true.

Race #4:
Horse #16 1.01 minutes
Horse #17 1.02 minutes
Horse #18 1.03 minutes
Horse #19 1.04 minutes
Horse #20 1.05 minutes

Race #5:
Horse #21 1.001 minutes
Horse #22 1.002 minutes
Horse #23 1.003 minutes
Horse #24 1.004 minutes
Horse #25 1.005 minutes

Race #6:
Horse #1 comes in first place at 1 minute
Horse #21 comes in second at 1.001 minutes
Horse #16 comes in third at 1.01 minutes
Horse #11 comes in fourth at 1.1 minutes
Horse #6 comes in fifth at 1.6 minutes

From this we can quickly see that the second place finisher of race #6 (horse #21 from heat 5 in our whiteboard example) is faster than the first place finishers of heats 2, 3, and 4. This means it is also faster than every other horse in heats 2, 3, and 4, so we can eliminate all of those horses from consideration for second fastest. The only remaining candidates for second fastest are the second place finisher of race #6 and the second place finisher from the heat the fastest horse came from (horse #2 from heat 1 in this case). So we race them. So far, we’ve had to do seven races.

So what about identifying the third fastest horse? By the same logic as above, we know that the third place finisher of race #6 (horse #16 from heat 4 in our whiteboard example) is faster than all the horses in heats 2 and 3. We do not know whether this third place finisher is faster than the second or third place finisher from the winning heat (horses #2 and #3 from heat 1), or the second place finisher from the heat that the second fastest horse came from (horse #22 from heat 5 in this case). Note for the latter heat, we don’t need to consider its third place finisher, because at best it could only be the fourth fastest horse overall (by its placement it must be slower than the first and second place finishers in its heat, and slower than the fastest horse overall). Since we don’t care about the fourth fastest horse, we can eliminate it from consideration. That leaves only four candidates for third fastest, and we would need an eighth race to determine which of them it is.

Or do we? In race #7, we raced horse #2 from heat 1 against horse #21 from heat 5. In race #8, we would race horse #2 from heat 1 again. There are only five horses we need to consider for second and third fastest, so we can combine races #7 and #8 into one. The first and second place finishers of that combined seventh race are the second and third fastest horses, respectively. In total, we need just seven races.

The general algorithm is:
1. Divide the 25 horses into groups of five.
2. Race each group.
3. Race the winners of the five groups against each other.
4. The winner of this race is the fastest horse. Let heat A denote the heat this horse came from, heat B the heat that the second place finisher of this race came from, and heat C the heat that the third place finisher came from.
5. Race horse #2 and #3 from heat A, horse #1 and #2 from heat B, and horse #1 from heat C (where #1, #2, and #3 denote the place each horse finished in its heat) in a final seventh race.
6. The first and second place finishers of this final race are the second and third fastest horses.
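To make the procedure concrete, here is a small simulation sketch in C#. Everything about it is illustrative: the hidden finish times are randomly generated, horses are numbered 0 to 24 rather than #1 through #25, and the Race helper simply sorts a group by its hidden times, standing in for an actual race.

using System;
using System.Collections.Generic;
using System.Linq;

class HorseRacing
{
    static int raceCount = 0;
    static double[] times;

    // "Races" a group of at most five horses by sorting on their hidden times;
    // returns the horses in finishing order (fastest first).
    static List<int> Race(IEnumerable<int> horses)
    {
        raceCount++;
        return horses.OrderBy(h => times[h]).ToList();
    }

    static void Main()
    {
        // Hidden, hypothetical finish times; assumed constant across races.
        var rng = new Random(42);
        times = Enumerable.Range(0, 25).Select(_ => rng.NextDouble()).ToArray();

        // Races 1-5: five heats of five horses each.
        var heats = new List<List<int>>();
        for (int g = 0; g < 5; g++)
            heats.Add(Race(Enumerable.Range(g * 5, 5)));

        // Race 6: the five heat winners. The winner is the fastest overall.
        var winnersRace = Race(heats.Select(h => h[0]));
        int fastest = winnersRace[0];

        // Heat A produced the fastest horse, heat B the second place finisher
        // of race 6, and heat C the third place finisher of race 6.
        List<int> HeatOf(int horse) => heats[horse / 5];
        var heatA = HeatOf(winnersRace[0]);
        var heatB = HeatOf(winnersRace[1]);
        var heatC = HeatOf(winnersRace[2]);

        // Race 7: #2 and #3 from heat A, #1 and #2 from heat B, #1 from heat C.
        var final = Race(new[] { heatA[1], heatA[2], heatB[0], heatB[1], heatC[0] });

        Console.WriteLine($"Fastest three: {fastest}, {final[0]}, {final[1]} ({raceCount} races)");
    }
}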

Interview questions: Sorting a terabyte of integers with a gigabyte of memory (part 2)

In the previous article, I mentioned that the mergesort algorithm could be modified to solve this problem. The usual mergesort implementation assumes that all the unsorted elements can fit into main memory. The algorithm divides the unsorted elements into two halves, and then recursively calls mergesort on each half before merging them together. The merge operation requires a temporary array to hold the merged result, making it a non-starter here. Worse, the merging would be horribly inefficient if performed directly on disk: swapping two elements on disk is orders of magnitude slower than swapping two elements of an in-memory array, because disk I/O is optimized for sequential, not random, access. Furthermore, due to the large size of the unsorted input, there will likely not be enough memory on the stack for all the recursive calls required.

We will need an external sorting algorithm to address these issues, and the basic idea behind mergesort can be used to come up with a workable solution. Let us define N as the total number of unsorted integers residing on hard drive D1, and M as the number of records that can be sorted in memory at any given time (M takes into account any overhead needed to perform the sort itself). Assume also that we have three additional hard drives besides D1 (or some other storage medium optimized for sequential read/write access, such as tape): D2, D3, and D4. We can sort the data in multiple passes. The initial pass generates N/M sorted subsets of size M. Subsequent passes merge these subsets together until we are left with one final merged set of sorted integers.

Let’s use a simple example with N = 18 and M = 3. Here is the unsorted data before we apply our sorting algorithm:

D1    15 4 1 20 19 3 100 80 8 12 10 11 55 40 31 39 67 88           
D2
D3
D4

The first pass reads M elements at a time from D1 and sorts them in memory (any performant algorithm with a good asymptotic runtime such as quicksort will do). The resulting subsets will be written to disk, with the destination alternating between D3 and D4.

D1              
D2
D3    1  4  15 | 8  80 100 | 31 40 55
D4    3  19 20 | 10 11 12  | 39 67 88

D3 and D4 now each contain three (half of N/M) sorted subsets of size M. The next step will be to combine two subsets at a time, one from D3 and one from D4, into a set of size 2*M. The output will be written to disk, this time with the destination alternating between D1 and D2.

D1    1  3  4  15 19 20  | 31 39 40 55 67 88           
D2    8  10 11 12 80 100 
D3    
D4    

This method of reading in the data from two disks and writing the merged output to the other two disks is repeated until we end up with one disk which contains all the data in sorted order.

D1
D2
D3    1  3  4  8  10 11 12 15 19 20 80 100
D4    31 39 40 55 67 88

D1    1  3  4  8  10 11 12 15 19 20 31 39 40 55 67 80 88 100
D2
D3
D4

The final detail is how we perform the actual merge operation itself given the limited amount of memory available. It turns out we can re-use the merge logic from mergesort. Recall that in the mergesort algorithm, the merge operation takes in two sorted subarrays, S1 and S2 (since the sort happens in memory, these are typically passed in as start and end indexes into the array being sorted), which are then merged in sorted order into a temporary array T1. The contents of T1 are then written back to the array being sorted. The logic for the merge is fairly simple. Let A1, A2, and A3 be indexes into S1, S2, and T1, respectively. One integer is read from each subarray at a time, and the smaller of the two is written out to the temporary array, after which the appropriate indexes are advanced. That is, if S1[A1] <= S2[A2], then S1[A1] is written to T1[A3], and A1 and A3 are incremented; otherwise, S2[A2] is written to T1[A3], and A2 and A3 are incremented.
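Here is a minimal sketch of that merge step in C#. The names s1, s2, and t1 mirror the description above; for simplicity the two sorted subarrays are passed in as separate arrays rather than as index ranges into a larger array.

static int[] Merge(int[] s1, int[] s2)
{
    int[] t1 = new int[s1.Length + s2.Length];
    int a1 = 0, a2 = 0, a3 = 0;

    // Compare one integer from each subarray at a time and copy the smaller one.
    while (a1 < s1.Length && a2 < s2.Length)
    {
        if (s1[a1] <= s2[a2])
            t1[a3++] = s1[a1++];
        else
            t1[a3++] = s2[a2++];
    }

    // One subarray is exhausted; copy whatever remains from the other.
    while (a1 < s1.Length) t1[a3++] = s1[a1++];
    while (a2 < s2.Length) t1[a3++] = s2[a2++];

    return t1;
}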

The important takeaway is that only two integers need to be compared at any given time, which means the merge operation requires very little memory. The logic for the merge operation of our external sorting algorithm is therefore almost identical, only instead of reading from and writing to arrays, we are reading from and writing to disk. Instead of incrementing an index into an array, we advance the read position on the disk. Because we read and write the disks in sequential order, we avoid being penalized by random disk I/O.
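The sketch below adapts the same logic to disk. It assumes each sorted run is stored as a text file with one integer per line; the file name parameters are placeholders standing in for runs on D3 and D4 and for the output destination. Only the two integers currently being compared are held in memory, and every read and write is sequential.

using System.IO;

static void MergeRuns(string runOnD3, string runOnD4, string outputRun)
{
    using (var left = new StreamReader(runOnD3))
    using (var right = new StreamReader(runOnD4))
    using (var output = new StreamWriter(outputRun))
    {
        string a = left.ReadLine();
        string b = right.ReadLine();

        // Compare the current integer from each run and write out the smaller one.
        while (a != null && b != null)
        {
            if (int.Parse(a) <= int.Parse(b))
            {
                output.WriteLine(a);
                a = left.ReadLine();
            }
            else
            {
                output.WriteLine(b);
                b = right.ReadLine();
            }
        }

        // One run is exhausted; drain the rest of the other.
        while (a != null) { output.WriteLine(a); a = left.ReadLine(); }
        while (b != null) { output.WriteLine(b); b = right.ReadLine(); }
    }
}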

As with mergesort, each pass over the data halves the number of sets. After the initial pass, there are N/M subsets, so log2(N/M) subsequent passes through the data are required. Note that if additional external storage devices were available, fewer passes would be required. For example, with six disks in total, each pass would perform a three-way merge and cut the number of sets to a third. Given 2k disks, the number of passes required would be logk(N/M). There is additional complexity in the merge logic, however. Finding the smaller of two integers is trivial, since we can just compare them directly. Finding the smallest element among k integers requires additional work; a data structure such as a priority queue could be used to return the minimum. Note that we also need to keep track of which disk the minimum came from, so that we can read in the next integer from that disk: after the previous minimum is removed from the priority queue, the next integer read from that disk is inserted in its place.
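As a rough illustration of that k-way merge, here is a sketch that assumes .NET 6’s built-in PriorityQueue. Each sorted run is represented as an IEnumerator<int>; in practice each enumerator would wrap sequential reads from one of the disks.

using System.Collections.Generic;

static IEnumerable<int> MergeKRuns(List<IEnumerator<int>> runs)
{
    // Element = the index of the run, priority = the current integer from it.
    var pq = new PriorityQueue<int, int>();

    // Seed the queue with the first integer from each run.
    for (int i = 0; i < runs.Count; i++)
        if (runs[i].MoveNext())
            pq.Enqueue(i, runs[i].Current);

    // Repeatedly emit the smallest current integer, then replace it with the
    // next integer from the run it came from.
    while (pq.TryDequeue(out int runIndex, out int value))
    {
        yield return value;
        if (runs[runIndex].MoveNext())
            pq.Enqueue(runIndex, runs[runIndex].Current);
    }
}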

Interview questions explained: Sorting a terabyte of integers with a gigabyte of memory

Given a terabyte of unsorted 32-bit unsigned integers on disk and only one gigabyte of memory, sort the integers. At first glance, mergesort would be the first solution that comes to mind. However, the typical mergesort implementation assumes that all the elements can fit into main memory. It is still possible to modify the mergesort algorithm to solve this problem, but there is another alternative: counting sort. The basic idea is to keep track of how many times each integer appears. A hash table would be an ideal data structure for this purpose. From there, it’s just a simple matter of iterating through each possible integer value in order, looking up how many times it appears in the hash table, and writing the occurrences out to disk. The end result is a sorted list of integers. This sort has an O(n) runtime: one linear pass to count all the integers, and a second to write them out in sorted order. A counting sort can only be used in special cases where the set of possible values of the unsorted elements is known ahead of time and finite; it would not work, for example, if we were dealing with arbitrary floats or strings.
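Here is a minimal in-memory sketch of that idea, assuming for the moment that the counts fit in memory; maxValue is a hypothetical upper bound on the values being sorted.

using System;
using System.Collections.Generic;

static void CountingSort(IEnumerable<uint> input, uint maxValue)
{
    var counts = new Dictionary<uint, long>();

    // First linear pass: count the occurrences of each integer.
    foreach (uint value in input)
    {
        counts.TryGetValue(value, out long count);
        counts[value] = count + 1;
    }

    // Second pass: emit each value, in order, as many times as it appeared.
    for (ulong v = 0; v <= maxValue; v++)
        if (counts.TryGetValue((uint)v, out long count))
            for (long i = 0; i < count; i++)
                Console.WriteLine((uint)v);
}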

However, because we are dealing with integers, we can apply the counting sort in this scenario. Here are some quick numbers. There are 1,099,511,627,776 bytes in a terabyte. Each integer takes up four bytes of storage, which means there are a total of 274,877,906,944 unsorted integers. There are 1,073,741,824 bytes in a gigabyte. Assume we use some kind of hash table to keep track of the number of occurrences of each integer, and that each key/value pair consists of an integer key and an integer count. Without even considering the overhead of the hash table metadata itself, we would need eight bytes per key/value pair, so at most we could hope to store the counts for 134,217,728 integers in memory at a time. It quickly becomes clear that there is not enough memory to do the counting sort in one pass. One workaround would be to write the integer counts out to disk, reading and counting the integers in one gigabyte chunks at a time (allowing for the additional memory overhead used by the program itself). The cost is a great deal of extra disk I/O. To make the system more robust, we can also write out additional metadata to disk to keep track of what has been read so far, which would allow the program to resume from where it left off in the case of a fatal error.

This solution can be further optimized based on the distribution of the unsorted integers, which is why asking for clarification during the interview process is so important. If the range of possible integer values is bounded by a known minimum and maximum, that reduces the amount of memory required to store the keys. Likewise, if there is a maximum number of times each integer can appear in the unsorted list, that reduces the amount of memory required to store the count values.

The first chapter of the book Programming Pearls provides one such example of a problem that can be solved entirely in memory with an optimized counting sort. The problem involves a list of unique nine-digit phone numbers that need to be output in sorted order. Because each number appears at most once in the unsorted set, only one bit of memory is needed to store its count. Thus, a bit vector can be used to store the key/value pairs for the counting sort: each bit in the vector corresponds to one of the possible nine-digit numbers between 0 and 999,999,999. If the bit corresponding to a number is flipped on, the number appears in the unsorted list; otherwise it does not. A bit vector with 1,000,000,000 bits requires only 125,000,000 bytes, or roughly 119 megabytes of memory. Not bad. But even in a scenario where the set of key/value pairs cannot fit into main memory, this limitation can be bypassed altogether.
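Here is a rough sketch of that bit vector approach, assuming the phone numbers arrive as a stream of integers between 0 and 999,999,999, each appearing at most once.

using System;
using System.Collections;
using System.Collections.Generic;

static void SortPhoneNumbers(IEnumerable<int> phoneNumbers)
{
    // One bit for every possible nine-digit value (roughly 119 megabytes).
    var seen = new BitArray(1_000_000_000);

    // First pass: flip on the bit corresponding to each number that appears.
    foreach (int number in phoneNumbers)
        seen[number] = true;

    // Second pass: walk the bits in order and emit the numbers that were seen.
    for (int i = 0; i < seen.Length; i++)
        if (seen[i])
            Console.WriteLine(i);
}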

The advantage of the counting sort is that it can scale with the size of the unsorted input by using a distributed system to do the counting. Here is a high-level overview of one possible design. The range of possible values is divided into buckets across a number of worker machines, with each machine storing the counts for its own range of values. For example, machine A stores all the integers between 1 and 1,000,000, machine B stores all the integers between 1,000,001 and 2,000,000, and so on. Each time an integer is read by the master, a mapping function determines which machine owns the range the integer falls in, and a notification service alerts that machine, which increments the count for that integer accordingly. Once all the integers have been read and counted, each machine can return its set of ordered key/value pairs by exposing it via a web service interface, or some functional equivalent, omitting keys whose count is zero. The master can then aggregate this data and print out the final result in sorted order.
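A hypothetical mapping function for this partitioning might look like the following; the block size of one million values per machine is just an illustrative choice.

static int MachineFor(uint value)
{
    // Each machine owns a contiguous block of one million values, so integer
    // division by the block size picks the owning machine.
    const uint rangePerMachine = 1_000_000;
    return (int)(value / rangePerMachine);
}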

The potential downside of a distributed approach is the increased complexity. Each machine is now a potential point of failure, so additional error checking logic is needed to make the system more robust. For example, every time a machine increments a count, we want it to notify the master that the work is done, and if an error is encountered, we want to retry the increment operation. Ideally, this should all be done asynchronously to improve performance, using some sort of event-driven notification system. The easiest solution here would be to use a third-party component; message queues such as RabbitMQ, MSMQ, IBM MQ, etc. were built from the ground up with these kinds of issues in mind. We can add redundancy and even more error checking by having a cluster of machines be responsible for each range of numbers. Once all the unsorted integers have been counted, we can take an MD5 hash of the data structure containing the key/value pairs on each machine; by checking that this hash is identical across all the machines in a given cluster, we can help ensure that the counts were performed correctly. The end result of this additional complexity is a highly scalable system that can sort an arbitrarily large number of elements, given enough hardware.

When given a sorting problem in an interview, it is important to ask lots of questions and try to break the problem down into something simpler. What exactly is being sorted? By identifying the various constraints on the possible set of values, an easier solution such as a counting sort may present itself. Typically, the interviewer will follow up by asking you to scale your solution, so you’ll want to think about how to divide the work in parallel across a large distributed system. The map/reduce class of algorithms was designed to do exactly that, and a distributed counting sort is one such example. It may not be the right solution every time, but it’s a good one to bring up during the interview process. The right mindset for a scalability question is to have a conversation with the interviewer, discussing the various possible solutions and their design tradeoffs. A good discussion goes a long way, and at the very least, you may come out of it having learned a lot.

Interview questions explained: Fun with multiplication!

Write a method that takes as its input an array of integers. For each element in the array, print the product of all the other elements in the array. For example, given the following array:
[1,2,3,4,5]
The output would be:
120, 60, 40, 30, 24

Let’s start with the most literal solution. For each element in the array, loop through all the other elements and multiply them together:

int[] j = { 1, 2, 3, 4, 5 };

for (int outerIndex = 0; outerIndex < j.Length; outerIndex++)
{
    int product = 1;
    for (int innerIndex = 0; innerIndex < j.Length; innerIndex++)
    {
        if (innerIndex != outerIndex)
        {
            product *= j[innerIndex];
        }
    }

    Console.WriteLine(product);
}

This solution requires an outer for loop to iterate through each element in the array and then an inner for loop to iterate through the other n-1 elements in the array. The asymptotic complexity of this is O(n^2).

Obviously, this isn’t a very good solution, so we can do better. The key observation is that for any given element with value x in the array, the product of all the other elements is simply the total product of all the elements divided by x (assuming x is not zero). We only need to calculate the total product once.

This solution only requires two passes through the array. The first calculates the total product, and the second divides the total product by each element in the array:

int[] j = { 1, 2, 3, 4, 5 };

int totalProduct = 1;
for (int i = 0; i < j.Length; i++)
{
    totalProduct *= j[i];
}

for (int i = 0; i < j.Length; i++)
{
    //assume we check for zeroes beforehand to prevent divide by zero errors
    Console.WriteLine(totalProduct / j[i]);
}

This solution has O(n) complexity, since it requires only two linear scans of the array.

Now let’s make the problem a little bit more challenging. What if we cannot use division? Assume that the operation is too prohibitively expensive and that we need to find a workaround (a not uncommon scenario in the real world).

We can use an algorithm design technique known as dynamic programming: break the problem up into smaller sub-problems, solve and store the solution to each one, and then combine the solutions as necessary to arrive at the answer to the original problem. One difference between dynamic programming and the divide and conquer class of algorithms is that dynamic programming stores the solutions to the sub-problems, which can then be retrieved at a later point in time. This is an optimization that prevents the same sub-problems from being solved multiple times. One example of this approach is calculating the nth element of the Fibonacci sequence, which starts with 0, 1, 1, 2 and in which every subsequent number is the sum of the previous two. Typically, the solution involves using either recursion or iteration. However, we can use dynamic programming by precomputing the sequence up to some number j. Then, for all n ≤ j, instead of having to use recursion or iteration, we can do a simple lookup. For n > j, we can still reduce the number of computations by summing forward from the stored values at j and j-1 instead of starting from 0 and 1.
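Here is a small sketch of the idea using a memoization cache: each Fibonacci value is computed once, stored, and simply looked up on any subsequent request.

using System.Collections.Generic;

static readonly Dictionary<int, long> fibCache =
    new Dictionary<int, long> { { 0, 0 }, { 1, 1 } };

static long Fib(int n)
{
    // Already computed (or one of the base cases): just look it up.
    if (fibCache.TryGetValue(n, out long cached))
        return cached;

    // Otherwise compute it from the two previous values and store the result.
    long result = Fib(n - 1) + Fib(n - 2);
    fibCache[n] = result;
    return result;
}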

With respect to the problem of determining the products without using division, we can take a similar approach. Note that the inefficiency of the first solution is due to the same multiplication operations being carried out multiple times; we can eliminate these redundant calculations with dynamic programming. For any given index k in the array j of size n, we want to multiply all the numbers to the left of k with all the numbers to the right of k. By precomputing the running product of the elements in both directions (left to right and right to left), we know the product of all the numbers to the left and to the right of any index k. The problem can then be solved:

int[] j = { 1, 2, 3, 4, 5 };

int[] runningProductLeft = new int[j.Length];
int[] runningProductRight = new int[j.Length];  

int product = 1;
//there is no element to the left of the start of the array, so set it to 1
runningProductLeft[0] = 1;
            
//since we already set the first element of runningProductLeft
//start populating it from index 1
for (int i = 1; i < j.Length; i++)
{
    product = product * j[i - 1];
    runningProductLeft[i] = product;
}

//we want to populate runningProductRight in reverse by starting from the end of the array

//there is no element to the right of the end of the array, so set it to 1
runningProductRight[j.Length - 1] = 1;
product = 1;

//since we already populated the last element of runningProductRight, start populating from the second to last element in the array
for (int i = j.Length - 2; i >= 0; i--)
{ 
    product = product * j[i + 1];
    runningProductRight[i] = product;
}

//now that the running products have been precomputed, printing out the solution becomes trivial
for (int i = 0; i < j.Length; i++)
{
    product = runningProductLeft[i] * runningProductRight[i];
    Console.WriteLine(product);
}

This solution requires three linear scans through the array, so the runtime complexity is still O(n).