Interview questions: Sorting a terabyte of integers with a gigabyte of memory (part 2)

In the previous article, I mentioned that the mergesort algorithm could be modified to solve this problem. The usual mergesort implementation assumes that all the unsorted elements can fit into main memory. The algorithm divides the unsorted elements into two halves, and then recursively calls mergesort on each half before merging them together. The merge operation requires a temporary array to hold the merged result, which makes it a non-starter when the data cannot fit in memory. Worse, the merging would be horribly inefficient on disk: swapping two elements on disk is orders of magnitude slower than swapping two elements of an in-memory array, because disk I/O is optimized for sequential, not random, access. Furthermore, due to the large size of the unsorted input, there will likely not be enough memory on the stack for all the recursive calls required.

We will need to use an external sorting algorithm to address these issues, and the basic idea behind mergesort can be used to come up with a workable solution. Let us define N as the total number of unsorted integers residing on hard drive D1, and M as the number of records that can be sorted in memory at any given time (M takes into account any overhead needed to perform the sort itself). Assume also that we have three additional hard drives besides D1 (or some other storage medium optimized for sequential read/write access, such as tape): D2, D3, and D4. We can sort the data in multiple passes. The initial pass generates N/M sorted subsets of size M. Subsequent passes merge these subsets together until we are left with one final merged set of sorted integers.

Let’s use a simple example with N = 18 and M = 3. Here is the unsorted data before we apply our sorting algorithm:

D1    15 4 1 20 19 3 100 80 8 12 10 11 55 40 31 39 67 88           
D2
D3
D4

The first pass reads M elements at a time from D1 and sorts them in memory (any performant algorithm with a good asymptotic runtime such as quicksort will do). The resulting subsets will be written to disk, with the destination alternating between D3 and D4.

D1              
D2
D3    1  4  15 | 8  80 100 | 31 40 55
D4    3  19 20 | 10 11 12  | 39 67 88

D3 and D4 now each contain three (half of N/M) sorted subsets of size M. The next step will be to combine two subsets at a time, one from D3 and one from D4, into a set of size 2*M. The output will be written to disk, this time with the destination alternating between D1 and D2.

D1    1  3  4  15 19 20  | 31 39 40 55 67 88           
D2    8  10 11 12 80 100 
D3    
D4    

This method of reading in the data from two disks and writing the merged output to the other two disks is repeated until we end up with one disk which contains all the data in sorted order.

D1               
D2     
D3    1  3  4  8  10 11 12 15 19 20 80 100
D4    31 39 40 55 67 88
D1    1 3 4 8 10 11 12 15 19 20 31 39 40 55 67 80 88 100            
D2     
D3   
D4   

The final detail is how we perform the actual merge operation itself, given the limited amount of memory available. It turns out we can re-use the merge logic from mergesort. Recall that in the mergesort algorithm, the merge operation takes in two subarrays, S1 and S2 (since the sort happens in memory, these are typically passed in as start and end indexes into the array being sorted), which are then merged in sorted order into a temporary array T1. The contents of T1 are then written back into the array being sorted. The logic for the merge is fairly simple. Let A1, A2, and A3 be indexes into S1, S2, and T1, respectively. One integer is read from each subarray at a time, and the smaller of the two is written out to the temporary array, after which the appropriate indexes are advanced. That is, if S1[A1] < S2[A2], then S1[A1] is written to T1[A3], and A1 and A3 are incremented; otherwise, S2[A2] is written to T1[A3], and A2 and A3 are incremented.

The important takeaway here is that only two integers need to be compared at any given time, which means the merge operation requires very little memory. The logic for the merge operation of our external sorting algorithm is thus almost identical; the only difference is that instead of reading from and writing to arrays, we are doing disk I/O, and instead of incrementing an index into an array, we advance the read position on the disk. Because we read and write the disks in sequential order, we avoid the penalty of random disk access.
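
To make this concrete, here is a minimal C# sketch of the two-way merge step, reading two sorted runs of 32-bit integers through BinaryReader streams and writing the merged run through a BinaryWriter. The ExternalMerge class name and the TryRead helper are illustrative assumptions, not code from the original article:

using System.IO;

static class ExternalMerge
{
    // Merge two sorted runs of 32-bit integers into one sorted output stream.
    // Only two integers are held in memory at any time, and all I/O is sequential.
    public static void MergeRuns(BinaryReader left, BinaryReader right, BinaryWriter output)
    {
        bool hasLeft = TryRead(left, out int l);
        bool hasRight = TryRead(right, out int r);

        while (hasLeft && hasRight)
        {
            if (l <= r)
            {
                output.Write(l);                 // write out the smaller value
                hasLeft = TryRead(left, out l);  // advance only the run we consumed from
            }
            else
            {
                output.Write(r);
                hasRight = TryRead(right, out r);
            }
        }

        // one run is exhausted; copy the remainder of the other run sequentially
        while (hasLeft) { output.Write(l); hasLeft = TryRead(left, out l); }
        while (hasRight) { output.Write(r); hasRight = TryRead(right, out r); }
    }

    private static bool TryRead(BinaryReader reader, out int value)
    {
        if (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            value = reader.ReadInt32();
            return true;
        }
        value = 0;
        return false;
    }
}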

As with mergesort, each pass over the data halves the number of sets. After the initial pass, there are N/M sorted subsets, so log2(N/M) subsequent passes through the data are required. Note that if additional external storage devices were available, fewer passes would be needed. For example, if there were six disks in total, then each pass would cut the number of sets to one third. Given 2k disks, the number of passes required would be logk(N/M). There is additional complexity in the merge logic, however. Finding the smaller of two integers is trivial, since we can just compare them directly. Finding the smallest element in a set of k integers requires a bit more work. A data structure such as a priority queue could be used to return the minimum. Note that we also need to keep track of which disk the minimum came from, so that we can read in the next integer from that disk: each time we remove the current minimum from the priority queue, we insert the next integer read from the disk it came from.
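
Here is a hedged sketch of the k-way merge, using the PriorityQueue<TElement, TPriority> class available in .NET 6 and later to track the current head of each sorted run. The KWayMerge class name, the TryRead helper, and the use of BinaryReader streams for the runs are illustrative assumptions:

using System.Collections.Generic;
using System.IO;

static class KWayMerge
{
    // Merge k sorted runs with a priority queue keyed on the current head of each run.
    // Each queue entry stores the index of the run it came from, so after emitting the
    // minimum we know which run to read the next integer from.
    public static void Merge(BinaryReader[] runs, BinaryWriter output)
    {
        var heads = new PriorityQueue<int, int>(); // element = run index, priority = head value

        // prime the queue with the first integer of each run
        for (int i = 0; i < runs.Length; i++)
        {
            if (TryRead(runs[i], out int value))
                heads.Enqueue(i, value);
        }

        while (heads.TryDequeue(out int runIndex, out int smallest))
        {
            output.Write(smallest);                    // emit the current global minimum
            if (TryRead(runs[runIndex], out int next)) // refill from the same run
                heads.Enqueue(runIndex, next);
        }
    }

    private static bool TryRead(BinaryReader reader, out int value)
    {
        if (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            value = reader.ReadInt32();
            return true;
        }
        value = 0;
        return false;
    }
}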

Interview questions explained: Sorting a terabyte of integers with a gigabyte of memory

Given a terabyte of unsorted 32-bit unsigned integers on disk and only one gigabyte of memory, sort the integers. At first glance, mergesort would be the first solution that comes to mind. However, the typical mergesort implementation assumes that all the elements can fit into main memory. It is still possible to modify the mergesort algorithm to solve this problem, but there is another alternative: counting sort. The basic idea is to keep track of how many times each integer appears. A hash table would be the ideal data structure for this purpose. From there, it’s just a simple matter of iterating through the possible integer values in order, looking up how many times each one appears in the hash table, and writing them out to disk. The end result is a sorted list of integers. This sort has an O(n) runtime, as it requires one linear pass to count all the integers and a second one to write them out in sorted order. The catch is that a counting sort can only be used in special cases where the set of possible values of the unsorted elements is either known ahead of time or finite. It would not work, for example, if we were dealing with arbitrary floats or strings.
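
As a rough illustration of those two passes, here is a minimal in-memory C# sketch. The method signature and the minValue/maxValue bounds are assumptions made for the example; as the next paragraph shows, a count table for a full terabyte of integers will not actually fit in memory:

using System.Collections.Generic;

static class CountingSortExample
{
    // First pass: count how many times each value appears.
    // Second pass: walk the key range in ascending order and emit each value count times.
    public static IEnumerable<uint> CountingSort(IEnumerable<uint> unsorted, uint minValue, uint maxValue)
    {
        var counts = new Dictionary<uint, long>();

        foreach (uint value in unsorted)
        {
            counts.TryGetValue(value, out long count);
            counts[value] = count + 1;
        }

        for (long key = minValue; key <= maxValue; key++)
        {
            if (counts.TryGetValue((uint)key, out long count))
            {
                for (long i = 0; i < count; i++)
                    yield return (uint)key;
            }
        }
    }
}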

However, because we are dealing with integers, we can apply the counting sort in this scenario. Here are some quick numbers. There are 1,099,511,627,776 bytes in a terabyte. Each integer takes up four bytes of storage, which means there are a total of 274,877,906,944 unsorted integers. There are 1,073,741,824 bytes in a gigabyte. Assume we use some kind of hash table to keep track of the number of occurrences of each integer, and that each key/value pair consists of an integer key and an integer count. Without considering the additional overhead of storing the hash table metadata in memory, we would need eight bytes for each key/value pair. This means that, at most, we could hope to store the counts for 134,217,728 distinct integers in memory at a time. It quickly becomes clear that there is not enough memory to do the counting sort in one pass. One workaround would be to write the integer counts out to disk, reading and counting the integers in one gigabyte chunks at a time (allowing for additional memory overhead used by the program itself), at the cost of a lot of expensive disk I/O. To make the system more robust, we can also write out additional metadata to disk to keep track of what has been read so far. This would allow the program to resume from where it left off in the case of a fatal error.

This solution can be further optimized based on the distribution of the unsorted integers, which is why asking for clarification during the interview process is so important. If the range of possible integer values is bounded by a minimum and maximum value, then this reduces the amount of memory required for storing the keys. Likewise, if there is a maximum number of times each integer can appear in the unsorted list, this reduces the amount of memory required for storing the count values.

The first chapter of the book Programming Pearls provides one such example of a problem that could be solved entirely in memory with an optimized counting sort. The problem involves a list of unique nine digit phone numbers that need to appear in sorted order. Because each number appears at most once in the unsorted set, only one bit of memory is needed to store the count for each one. Thus, a bit vector can be used to store the key/value pairs for the counting sort. Each bit in the vector represents one of the possible nine digit numbers between 0 and 999,999,999. If the bit corresponding to a number is flipped on, the number appears in the unsorted list; otherwise it does not. A bit vector with 1,000,000,000 bits requires only 125,000,000 bytes, or roughly 119 megabytes of memory. Not bad. But even if we had a scenario where the set of key/value pairs cannot fit into main memory, this limitation can be bypassed altogether.
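
A minimal C# sketch of the bit vector idea, assuming the phone numbers arrive as plain integers (the class and method names are illustrative, and BitArray is used here for convenience):

using System.Collections;
using System.Collections.Generic;

static class PhoneNumberSort
{
    // One bit per possible nine digit number (0 through 999,999,999) is roughly 119 MB.
    public static IEnumerable<int> Sort(IEnumerable<int> phoneNumbers)
    {
        var seen = new BitArray(1_000_000_000);

        foreach (int number in phoneNumbers)
            seen[number] = true;               // flip on the bit for each number present

        for (int number = 0; number < seen.Length; number++)
        {
            if (seen[number])
                yield return number;           // emit the numbers present, in ascending order
        }
    }
}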

The advantage of the counting sort is that it can scale with the size of the unsorted input by using a distributed system to do the counting. Here is a high level overview of one possible design. The range of possible values is divided into buckets across various slave machines; each machine stores the counts for its respective range of values. For example, machine A stores the counts for all the integers between 1 and 1,000,000, machine B for all the integers between 1,000,001 and 2,000,000, and so on. Each time an integer is read on the master, a mapping function is applied to determine which machine's range the integer falls in. A notification service then alerts the appropriate machine, which increments the count for that integer accordingly. Once all the integers have been read and counted, each machine in the system can return its set of ordered key/value pairs by exposing it via a web service interface, or some functional equivalent. Keys in the range that have a count of zero can be omitted from the response. The master can then aggregate this data and print out the final result in sorted order.
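
The mapping function itself can be very simple. Here is a hypothetical sketch assuming contiguous, zero-based ranges of one million values per machine; the BucketRouter name and the fixed range size are assumptions for illustration only:

static class BucketRouter
{
    private const uint RangeSize = 1_000_000;

    // machine 0 counts 0..999,999, machine 1 counts 1,000,000..1,999,999, and so on
    public static int MachineIndexFor(uint value)
    {
        return (int)(value / RangeSize);
    }
}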

The potential downside of a distributed approach is the increased complexity. Each machine is now a potential point of failure. To address this, additional error checking logic is needed to make the system more robust. For example, every time a machine increments a count, we want it to notify the master that the work is done, and if an error is encountered, we want to retry the increment operation. Ideally, this should all be done asynchronously to improve performance, using some sort of event driven notification system. The easiest solution here would be to simply use a third party component. Messaging queues such as RabbitMQ, MSMQ, IBM MQ, etc. were built from the ground up with these kinds of issues in mind. We can add redundancy and provide even more error checking by having clusters of machines be responsible for each range of numbers. Once all the unsorted integers have been counted, we can take an MD5 hash of the data structure containing the key/value pairs on each machine. By checking that this value is identical across all the machines in a given cluster, we can help ensure that the counts were performed correctly. The end result of this additional complexity is a highly scalable system that can sort an effectively unlimited number of elements, given enough hardware.

When given a sorting problem in an interview, it is important to ask lots of questions and try to break the problem down into something simpler. What exactly is being sorted? By identifying the various constraints on the possible set of values, an easier solution such as a counting sort may present itself. Typically, the interviewer will follow up by asking you to scale your solution. You’ll want to think about how to divide the work in parallel across a large distributed system. The map/reduce class of algorithms was designed to do exactly that; a distributed counting sort is one such example. It may not be the right solution every time, but it’s a good one to bring up during the interview process. The right mindset to have when asked a scalability question is to have a conversation with the interviewer, discussing the various possible solutions and their design tradeoffs. A good discussion goes a long way, and at the very least, you may come out of it having learned a lot.

Interview questions explained: Fun with multiplication!

Write a method that takes as its input an array of integers. For each element in the array, print the product of all the other elements in the array. For example, given the following array:
[1,2,3,4,5]
The output would be:
120, 60, 40, 30, 24

Let’s start with the most literal solution. For each element in the array, loop through all the other elements and multiply them together:

int[] j = { 1, 2, 3, 4, 5 };

for (int outerIndex = 0; outerIndex < j.Length; outerIndex++)
{
    int product = 1;
    for (int innerIndex = 0; innerIndex < j.Length; innerIndex++)
    {
        if (innerIndex != outerIndex)
        {
            product *= j[innerIndex];
        }
    }

    Console.WriteLine(product);
} 

This solution requires an outer for loop to iterate through each element in the array and then an inner for loop to iterate through the other n-1 elements in the array. The asymptotic complexity of this is O(n^2).

Obviously, this isn’t a very good solution, so we can do better. The key observation to make here is that for any given element with value x in the array, the product of all the other elements in the array is simply the total product of all the elements divided by x. We only need to calculate the total product once.

This solution only requires two passes through the array. The first calculates the total product, and the second divides the total product by each element in the array:

int[] j = { 1, 2, 3, 4, 5 };

int totalProduct = 1;
for (int i = 0; i < j.Length; i++)
{
    totalProduct *= j[i];
}

for (int i = 0; i < j.Length; i++)
{
    //assume we check for zeroes beforehand to prevent divide by zero errors
    Console.WriteLine(totalProduct / j[i]);
}

This solution has O(n) complexity, since it requires only two linear scans of the array.

Now let’s make the problem a little bit more challenging. What if we cannot use division? Assume that the operation is too prohibitively expensive and that we need to find a workaround (a not uncommon scenario in the real world).

We can use an algorithm design technique known as dynamic programming: break the problem up into smaller subproblems, solve and store the solution for each one, and then combine the solutions as necessary to arrive at the answer to the original problem. One difference between dynamic programming and the divide and conquer class of algorithms is that dynamic programming stores the solutions to the subproblems, which can then be retrieved at a later point in time. This optimization prevents the same subproblems from being solved multiple times. One example of this approach is calculating the nth element of the Fibonacci sequence. The Fibonacci sequence starts with 0, 1, 1, 2, and every subsequent number is the sum of the previous two numbers in the sequence. Typically, the solution involves using either recursion or iteration. However, we can use dynamic programming by precomputing the sequence up to some number j. Then, for all n < j, instead of having to use recursion or iteration, we can do a simple lookup. For n > j, we can still reduce the number of computations by starting from the precomputed values at positions j-1 and j instead of from 0 and 1.
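
Here is a small C# sketch of that idea, caching the sequence as it is computed so that later calls can either look the answer up directly or extend the cache from its last two entries (the Fibonacci class and Get method names are illustrative):

using System.Collections.Generic;

static class Fibonacci
{
    // Precomputed prefix of the sequence; grows as larger values are requested.
    private static readonly List<long> cache = new List<long> { 0, 1 };

    public static long Get(int n)
    {
        // cache hit: the nth number was already computed by an earlier call
        if (n < cache.Count)
            return cache[n];

        // cache miss: extend the sequence from the last two stored values
        // instead of recomputing from 0 and 1
        for (int i = cache.Count; i <= n; i++)
            cache.Add(cache[i - 1] + cache[i - 2]);

        return cache[n];
    }
}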

With respect to the problem of determining the products without the use of division, we can take a similar approach. Note that the inefficiency of the first solution is due to the same multiplication operations being carried out multiple times. We can eliminate these redundant calculations by using dynamic programming. For any given index k in the array j of size n, we want to multiply all the numbers to the left of k by all the numbers to the right of k. By precomputing the running product of all the elements in the array in both directions (from left to right and vice versa), we know the product of all the numbers to the left and to the right of any index k. The problem can then be solved:

int[] j = {1,2,3,4,5};

int[] runningProductLeft = new int[j.Length];
int[] runningProductRight = new int[j.Length];  

int product = 1;
//there is no element to the left of the start of the array, so set it to 1
runningProductLeft[0] = 1;
            
//since we already set the first element of runningProductLeft
//start populating it from index 1
for (int i = 1; i < j.Length; i++)
{
    product = product * j[i - 1];
    runningProductLeft[i] = product;
}

//we want to populate runningProductRight in reverse by starting from the end of the array

//there is no element to the right of the end of the array, so set it to 1
runningProductRight[j.Length - 1] = 1;
product = 1;

//since we already populated the last element of runningProductRight, start populating from the second to last element in the array
for (int i = j.Length - 2; i >= 0; i--)
{ 
    product = product * j[i + 1];
    runningProductRight[i] = product;
}

//now that the running products have been precomputed, printing out the solution becomes trivial
for (int i = 0; i < j.Length; i++)
{
    product = runningProductLeft[i] * runningProductRight[i];
    Console.WriteLine(product);
}

This solution requires three linear scans through the array, so the runtime complexity is still O(n).

JSON Schema: how to reference other schemas in jsonschema2pojo

I’ve been using the jsonschema2pojo utility at work to automatically generate Java classes from JSON schemas, which simplifies a lot of the tedious code that needs to be written for the marshalling and unmarshalling of JSON objects. JSON Schema is a developing standard that is to JSON what XSD is to XML, providing schema definition and validation for JSON documents. Just like with XSD, a JSON schema can reference other schemas. This is done by using the special object property $ref.

The following is a simple JSON schema for an address object. Assume that it is stored in the file Address.json:

{
    "id":"Address",
    "title": "Address",
    "description":"This is the json schema for an address",
    "type": "object",
    "properties": {
        "streetAddress": {
            "type": "string"
        },
        "city": {
            "type": "string"
        },
        "state/province": {
            "type": "string"
        },
        "zipcode": {
            "type": "string"
        }
    }
}

Here is a simple JSON schema for a person object. A person object contains an address. To reference the previously defined address schema, add an address object that contains a $ref property pointing to the filepath of the Address.json file:

 
{
    "id": "Person",
    "title": "Person",
    "description":"This is the json schema for a person",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        },
        "address": {
            "$ref": "Address.json"
        }
    }
}

Note that both relative and absolute pathing work in jsonschema2pojo.

To add a reference to a list of addresses, add an “addresses” object with a “type” property of array and an “items” object. Add a $ref property under “items” that points to the filepath of Address.json:

{
    "id": "Person",
    "title": "Person",
    "description":"This is the json schema for a person",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        },
        "addresses": {
            "id" : "addresses",
            "type": "array", 
            "items": { "$ref": "Address.json" },
            "uniqueItems": true
        }
    }
}

jsonschema2pojo will generate a Java class that contains a List of Address objects. Incidentally, adding the property “uniqueItems”: true makes it a HashSet of Address objects instead.

A free schema generation tool is available at http://www.jsonschema.net/. When paired with utilities like jsonschema2pojo, it makes life for developers that much easier.

vim tutorial – using the s command to replace text on the fly

The vim text editor comes with a powerful :s (substitution) command that is more versatile and expressive than the search and replace functionality found in GUI based text editors.

The general form of the command is:
:[g][address]s/search-string/replace-string[/option]

The address specifies which lines vim will operate on. If none is provided, it defaults to the current line only. You can enter a single line number, or specify an inclusive range by entering the lower and upper bounds separated by a comma. For example, an address of 1,10 means lines one through ten inclusive.

You can also provide a string value for the address by enclosing it in forward slashes. vim will then operate on the next line that matches this string. If the address string is preceded by “g”, vim will operate on all lines that match the string: /hello/ matches the next line that contains hello, whereas g/hello/ matches every line that contains hello.
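
To make the address forms concrete, here are a few illustrative commands (foo and bar are placeholder strings):

:5s/foo/bar/            substitute on line five only
:1,10s/foo/bar/         substitute on lines one through ten
:/hello/s/foo/bar/      substitute on the next line that contains hello
:g/hello/s/foo/bar/     substitute on every line that contains hello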

The search-string is a regular expression and the replace-string can reference the matched string by using an ampersand (&).

[option] allows even more fine-grained control over the substitution. One of the more common options is “g”, not to be confused with the “g” that precedes the address. The “g” option, which appears at the end of the command, replaces every occurrence of the search-string on the line. Normally, the substitute command only replaces the first occurrence and then stops.

s/ten/10/g

run on the following line:
ten + ten = 20

results in:

10 + 10 = 20

as opposed to:

10 + ten = 20

without the global option.

Given all this versatility, the :s command comes in quite handy. Consider the following scenario. There is a comma delimited file that is missing trailing commas on some lines and not others. In order to normalize the text file so that all lines end with a comma, you could run:

1,$s/[^,]$/&,

The address range 1,$ spans the entire file ($ in the address means the last line in the file). The search-string “[^,]$” is a regular expression that matches every line that ends with any character except a comma ($ in a regex indicates the end of the line). The replace-string contains an &, which refers to the text matched by the search-string, in this case the last character on the line. By setting the replace-string to “&,” we are telling vim to take that final non-comma character and append a comma after it.

[^,]$ won’t match blank lines, because [^,] expects at least one character to be on the line. To get around this, you would normally use a negative lookbehind; however, the vim regex syntax does not seem to support them. The easiest way around this is to use a second substitute command for the blank lines:
1,$s/^$/,

This adds a comma to any line that contains nothing at all (^ in a regex indicates the start of the line and $ the end, so ^$ matches an empty line).

This is just one example of course. By coming up with the right regex in the search-string, you can automate all sorts of normally tedious tasks with succinct commands. The best part is, unlike those cumbersome GUI based editors that often require the use of a pesky mouse, your hands never have to leave the keyboard! For even more control and flexibility, you could use sed, but :s can handle most day to day tasks quite easily.

Linux tutorial: Searching all files of a specific type for a string

Let’s say you want to search all the *.txt files in a given parent directory and all its children for the word “hello”.

Something like grep -rl "hello" *.txt would not work because the shell will expand “*.txt” for the current directory only. The -r flag for recursion would essentially be ignored. For example, if the parent directory contained:

a.txt

and the child directory contained:

a.txt b.txt c.txt d.txt

grep -rl "hello" *.txt would only search a.txt in the parent directory. This is because the shell will only evaluate the * wildcard for the parent directory from which the command is run.

What we actually want to do is use the find command to recursively list all the text files in the directory and its children, and then pass each of these files as arguments into grep, which will then search each argument for any instances of the string “hello”.

The find command to locate all the text files looks like this:
find ./ -name "*.txt"

In order to pass each file as an argument into grep, we use xargs. The xargs utility reads in items from standard input (the default delimiters are whitespace and newlines). For each item read from standard input, xargs will then execute the given command with that item passed in as an argument.

Essentially what it is doing is this:

foreach item in stdin
{
execute "[command] [initial arguments] [arg]"
}

In our example, we want to run xargs grep "hello" (grep being [command] and “hello” being [initial arguments]), with stdin coming from the output of the find command. Putting this all together, we get the following:

find ./ -name "*.txt" | xargs grep "hello"

Combining commands is the strength of the UNIX design philosophy. The various command line utilities are designed to play well with one another, using the output from one as the input to another. Think of each utility as a puzzle piece that can fit together with any other puzzle piece, combining in interesting ways to solve complex problems. Often there will be many possible solutions to a given problem; such is the versatility of the platform!

Does Javascript pass by value or pass by reference?

Javascript exhibits quite interesting behavior when it passes object parameters. At first glance, it would appear that it passes everything by reference. Consider the following HTML page.

<html>
<body>
<script type="text/javascript">

    function MyObject() {
        this.value = 5;
    }

    var o = new MyObject(); 
    alert(o.value); // o.value = 5

    function ChangeObject(oReference) {
        oReference.value = 6;
    }

    ChangeObject(o);
    alert(o.value); // o.value is now equal to 6

</script>

</body>
</html>

We initialize an object o by calling the MyObject() constructor. The constructor sets the value member to 5. We then pass o into the ChangeObject() function, where it sets the value member to 6. If o were passed by value, that would mean we are passing a copy of o, and that its value member would not be changed once the function exits. However, if we open up this HTML page in a browser, the final alert(o.value) indeed prints 6. This would lead us to conclude that Javascript passes by reference.

However, things get really interesting when we look at this HTML page.

<html>
<body>

<script type="text/javascript">

    function MyObject() {
        this.value = 5;
    }

    var o = new MyObject();    
    alert(o.value); // o.value = 5

    function ChangeObjectWithNew(oReference) {
        oReference = new MyObject();
        oReference.value = 6;
    }

    ChangeObjectWithNew(o);
    alert(o.value); // o.value is still 5

</script>

</body>
</html> 

This page is identical to the last, with the slight modification that we call the ChangeObjectWithNew() function instead. ChangeObjectWithNew creates a new instance of MyObject, assigns it to oReference, and then sets this new instance’s value member to 6. If Javascript passed by reference, o would now point to this new instance, and its value member would be 6 after ChangeObjectWithNew ran. However, if you run this page in a browser, the final alert(o.value) still prints 5. How is this possible?

What is happening behind the scenes is that Javascript is passing a reference to o by value. This can be a little confusing, so think of it this way. When o is first declared and initialized, it points to the memory location of the object that we instantiated. When the ChangeObject function is called, it receives a copy of this pointer, one that points to the exact same memory location. Thus, any changes to oReference’s member variables are reflected in o, since both point to the same spot in memory. However, when we call ChangeObjectWithNew, the line “oReference = new MyObject()” causes oReference to point to an entirely different spot in memory. Any further changes to oReference’s member variables will therefore no longer be reflected in o.

A picture helps clarify things: before ChangeObjectWithNew is called, both o and oReference point to the same object in memory. After ChangeObjectWithNew has run, oReference points to a new memory location, while o still points to the original object.

Since Javascript hides the implementation details of how it handles its references, the following C program makes the distinction explicit. ChangeObject is called with a pointer to MyObject and modifies the object through it; this is passing by reference. ChangeObjectWithNew receives a copy of the original MyObject pointer and reassigns that copy to a new object before setting its value member. As a result, modifying the new object’s value member does not modify the original object’s value member.

#include <stdio.h>
#include <stdlib.h>

typedef struct 
{
    int value;
} MyObject;

void ChangeObject(MyObject * objReference)
{
    (*objReference).value = 6;
}

void ChangeObjectWithNew(MyObject * objReference)
{
    //objReference is only a copy of the caller's pointer, so pointing it at a new
    //object has no effect on the caller (the new allocation is simply leaked here)
    objReference = (MyObject *)malloc(sizeof(MyObject));
    objReference->value = 6;
}

int main(int argc, char* argv[])
{
    MyObject * o = (MyObject *)malloc(sizeof(MyObject));
    MyObject * oCopy;  //this will be a copy of the original object pointer
    oCopy = o;

    printf("Setting o->value to 5\n");
    o->value = 5;
    printf("MyObject.value: %d\n", o->value);

    //now pass by reference
    printf("Calling ChangeObject with original pointer\n");
    ChangeObject(o);  //this will change o->value to 6
    printf("MyObject.value: %d\n", o->value);  //prints 6

    //reset o->value to 5
    printf("Resetting o->value to 5\n");
    o->value = 5;
    printf("MyObject.value: %d\n", o->value);  //prints 5

    //now pass a COPY of the original pointer
    //this is how Javascript behaves, minus the memory leak
    printf("Passing in a copy of the pointer o to ChangeObjectWithNew\n");
    ChangeObjectWithNew(oCopy);
    printf("MyObject.value: %d\n", o->value); //still prints 5

    //free the memory (o and oCopy point to the same object, so free it only once)
    free(o);
    o = NULL;
    oCopy = NULL;

    printf("Press enter to continue\n");
    getchar();

    return 0;
}

The fact that Javascript passes a “reference” by value is important to keep in mind. Assuming it passes by reference can lead to some difficult to track down bugs later on.

Dynamically modifying client endpoints in WCF

I was doing some contract work for a client, building out a SOA backend for them. It consisted of some WCF services that all talked to one another. Due to how the client had set up their deployment process, we had to dynamically load the client endpoint addresses from an external XML file at runtime, instead of setting them inside the web.config.
In other words, we needed a way to dynamically change the address attribute:

<system.serviceModel>
    <client>
        <endpoint address="http://someaddress.com/webservice"
          binding="ws2007HttpBinding" bindingConfiguration="WebService_ws2007HttpBinding"
          contract="WebService.ServiceContracts.IWebService" name="ws2007HttpBinding_IWebService" /> 
    </client> 
</system.serviceModel> 

WCF allows you to programmatically configure any system.serviceModel setting from the web.config, so the challenge was to inject the endpoint address before the call was actually made. Normally, you can pass the endpoint address into the channel factory constructor and be done with it. However, the service we needed to modify was a simple passthrough WCF router. There was no code to speak of, so in order to modify the client endpoints we decided to do so in an endpoint behavior.

The first task was to figure out how to access the endpoints themselves. At first, I tried:

    ClientSection clientSection = ConfigurationManager.GetSection("system.serviceModel/client") as ClientSection;
    ChannelEndpointElementCollection endpointCollection = clientSection.Endpoints;

However, when I actually tried to edit the endpoints inside ChannelEndpointElementCollection, I got an error: “The configuration is read only”. After searching online, I tried using WebConfigurationManager.OpenWebConfiguration instead:

       Configuration webConfig = WebConfigurationManager.OpenWebConfiguration("~");
       ClientSection clientSection = webConfig.GetSection("system.serviceModel/client") as ClientSection;
       ChannelEndpointElementCollection endpointCollection = clientSection.Endpoints;

       //dynamically load the URI here
       Uri serviceAddress = new Uri("http://sometempuri.org");

       endpointCollection[0].Address = serviceAddress;
       webConfig.Save();

This option was a non-starter because webConfig.Save() literally saves the actual web.config file itself. This causes the application pool to recycle, and since it edits the physical file, the changes won’t apply to the current request.

Ultimately, we ended up implementing the IEndpointBehavior interface. IEndpointBehavior has an ApplyClientBehavior method that takes a client service endpoint as a parameter. This method fires only once per client endpoint for the lifetime of the application, which is exactly what we wanted. The following sample code demonstrates how this endpoint behavior can be used to dynamically set the EndpointAddress for the client endpoint.

public class WebServiceEndpointBehavior : IEndpointBehavior
{
    public void Validate(ServiceEndpoint endpoint)
    {
           
    }

    public void AddBindingParameters(ServiceEndpoint endpoint, BindingParameterCollection bindingParameters)
    {
    
    }

    public void ApplyDispatchBehavior(ServiceEndpoint endpoint, EndpointDispatcher endpointDispatcher)
    {
           
    }

    public void ApplyClientBehavior(ServiceEndpoint endpoint, ClientRuntime clientRuntime)
    {           
        Uri serviceAddress = new Uri("http://sometempuri.org");   //dynamically load the URL here
        endpoint.Address = new EndpointAddress(serviceAddress);
    }                                                
}        

From here, it was just a matter of coding the behavior extension element that loads the behavior, as well as wiring this extension element into the web.config. Here are the relevant snippets:

namespace WebService
{
    public class WebServiceEndpointBehaviorExtensionElement: BehaviorExtensionElement
    {
        protected override object CreateBehavior()
        {
            return new WebServiceEndpointBehavior ();
        }

        public override Type BehaviorType
        {
            get { return typeof (WebServiceEndpointBehavior ); }
        }
    }
}

<extensions>
    <behaviorExtensions>
        <add name="webserviceEndpointBehavior" type="WebService.WebServiceEndpointBehaviorExtensionElement, WebService" />
    </behaviorExtensions>
</extensions>

<behaviors>
    <endpointBehaviors>
        <behavior name="updateEndpointAddress">
            <webserviceEndpointBehavior/>
        </behavior>
    </endpointBehaviors>
</behaviors>

CredentialCache.DefaultNetworkCredentials is empty and so is Thread.CurrentPrincipal.Identity.Name

I was working on a simple console application in C# that issued HTTP DELETE requests to WebDAV to delete expired documents from the file system. Once completed, this was to run periodically as a job. However, I kept getting back 401 Unauthorized on the DELETE requests. While troubleshooting the issue, I went down the rabbit hole and learned some interesting things. I was passing in CredentialCache.DefaultNetworkCredentials as my HttpRequest credentials. So as a sanity check, I tried viewing it in the debugger to make sure the program was passing in my credentials, only to find that CredentialCache.DefaultNetworkCredentials.UserName was blank.

Well, it turns out that you can’t actually view the credentials unless you set them manually yourself. According to the MSDN documentation:

“The ICredentials instance returned by DefaultCredentials cannot be used to view the user name, password, or domain of the current security context.”

So I tried checking the value of Thread.CurrentPrincipal.Identity.Name instead. This was blank as well. Upon further reading, I determined that this was because the principal policy had not been explicitly set to WindowsPrincipal. Once I did so, Thread.CurrentPrincipal.Identity.Name correctly displayed my Windows login ID:

AppDomain.CurrentDomain.SetPrincipalPolicy(PrincipalPolicy.WindowsPrincipal);
Console.WriteLine(Thread.CurrentPrincipal.Identity.Name);

It is helpful to review what an application domain is before proceeding. In Windows, every application runs as its own process, complete with its own set of resources. By isolating applications via processes, this minimizes the risk that one badly coded application can negatively impact others. In the Common Language Runtime, application domains provide an even more granular level of isolation. A single process (the application host) can run multiple application domains, each with the same level of isolation that separate processes would have, minus any of the overhead.
Because app domains are isolated from one another, each has its own Principal object. This object represents the current security context that the code is running as. PrincipalPolicy is an enum that indicates how this principal object is to be created for a given app domain. Setting it to WindowsPrincipal maps the principal object to the Windows user that the application host is executing as. By default, the PrincipalPolicy is UnauthenticatedPrincipal, which sets Name to an empty string.

After doing some more digging, I also found out that I could use WindowsIdentity.GetCurrent().Name to determine what user the program was executing as:

Console.WriteLine(WindowsIdentity.GetCurrent().Name); 

Having finally proven to myself that my program was running as the correct user, I eventually figured out the issue. It was completely unrelated to the code of course; I had simply forgotten to enable Windows Authentication in IIS. I didn’t mind the time sink, as it proved to be quite educational.

Retrieving a return value from a stored procedure using the Entity Framework DBContext.Database class

I was trying to figure out how to call a stored procedure and then retrieve the return value using the Entity Framework in .NET. Entity Framework is designed as an ORM layer, so it doesn’t have particularly strong support for stored procedures. My problem seemed to be fairly common, as I found a decent number of results on Google. It seems that plenty of places that use Entity Framework still rely on stored procedures, perhaps because they don’t trust the performance or scalability of Entity Framework, or because the particular procedure in question encapsulates some complex and highly optimized SQL logic.

Whatever the reason, the code had to call a stored procedure and retrieve a return value. The stored procedure in question could not be modified to use an output parameter instead of a return value. This also seemed to be a common requirement; typically the proc is really old legacy T-SQL, and changing it would require too much bureaucracy to be worth it.

There are a couple of ways to solve the problem. Probably the simplest and easiest is not to use Entity Framework at all and just use plain old ADO.NET instead. However, in my case, everything else in the solution was already using Entity Framework, so I wanted the code to be consistent. After doing some investigation and testing, I finally figured it out. The trick is to set the direction of the SqlParameter to ParameterDirection.Output. This is what tripped me up initially, as in ADO.NET you would declare the SqlParameter with direction ParameterDirection.ReturnValue. Another interesting thing to note is that Database.ExecuteSqlCommand returns an integer, but this int does not correspond to the return value of the stored procedure. The MSDN documentation seemed to indicate otherwise; it states that the return value is “The result returned by the database after executing the command.” When I tried storing the result, I got back -1, and I’m not really sure why.

Everything I found online consisted of just code snippets, so here is a complete code sample that deletes a row from a database table and checks the return value to see if the delete was successful.

public bool DeleteRowFromDB(int rowId)
{
    bool success = false;
    var retVal = new SqlParameter("@Success", SqlDbType.Int) { Direction = ParameterDirection.Output };

    object[] parameters =
    {
         new SqlParameter("@RowID", SqlDbType.BigInt) { Value = rowId}
        ,retVal
    };

    string command = "exec @Success = dbo.spDeleteTableRow @RowID";

    //the result is -1 regardless of what the stored procedure returns
    //note that ExecuteSqlCommand maps to sp_executesql in SQL Server 
    //which will append the type and value for @RowID  
    int result = _DBContext.Database.ExecuteSqlCommand(command, parameters);

    if ((int)retVal.Value == 1)
    {
        success = true;
    }

    return success;
}