Thursday, May 22, 2008

Alex Papadimoulis on the Mechanics of Quitting

Head over to WordPress for the full story.

Wednesday, May 14, 2008

Functional Programing 101: List Comprehensions and Closures

For a bunch of reasons I've detailed here, I decided to shift my blog to WordPress. But don't worry, in order to serve you better, I will make sure I post a link here every time I make a new entry over there. :-)

Anyway, the functional programming series continues and this time we will look at list comprehensions and a simple closure. Head over to WordPress for the action!

Monday, May 05, 2008

A Question of Two Exchanges

Let me interrupt the functional programming 101 series to pose an interesting question.

Question: Which of these two methods of swapping a list of integers is faster?

Method 1: Plain C Code
for(int i = 0; i <  array_size; i++)
{
    int temp = a[i];
    a[i] = b[i];
    b[i] = temp;
}

Method 2: Inline Assembly using the XCHG instruction.
__asm
{
    mov ecx, array_size;
    mov esi, a;
    mov edi, b;

loopstart:
    mov eax, [esi];
    xchg [edi], eax;
    mov [esi], eax;

    add esi, 4;
    add edi, 4;
    sub ecx, 1;
    jnz loopstart;
}
(Vague?) Hint: The Intel engineers who designed the XCHG instruction decided that it would primarily be used to implement synchronization primitives like semaphores and mutexes.

Comment to let me know what you're thinking about the two snippets of code and why any one of the two should be faster or slower than the other. In my next entry I will post the answer and also dig deeper in the mechanics of the exchanges.

Saturday, April 12, 2008

Functional Programming 101: Lambda Forms

I got into an argument with Ashwin the other day about lambdas and closures in C++ 0x. Unfortunately, if you're jetlagged and feverish you tend not make a whole lot of sense, and that is what happened with me. At the moment, I am not very jetlagged and a little less feverish, so hopefully, the following makes a little more sense. [The good thing is even if it doesn't make any sense I already have a solid excuse :-)].

In the Beginning
Lets use a simple example: I have a list of integers and I want to create another list that contains only those elements from this list that are greater than 5. The straightforward way to write this is like below:
def filter_above_5(l):
o = [] # initialize empty list
for num in l: # iterate over list
if num > 5: o.append(num)
return o # return filtered list
I wrote the snippet above in python, but hopefully, it is easy enough to read even if you haven't seen the language before. [1]

Shown below is example of how you'd use the filter_above_5 function.
print "original list:", l
print "filtered list:", filter_above_5(l)
Simple enough, isn't it? For reasons that we will come to later, people like to call this an imperative implementation.

Lets Make This Functional
If you think about it, filter_above_5 is a special case of a generic filter function which iterates over an input list and produces an output list that contains all the elements from the input list that satisfy some condition. [If you wanted to make things sound complicated, you'd use predicate instead of "some condition" and sequence and sub-sequence instead of input and output list. :-)] Jokes apart, it would be cool if we could write a general function like this:
def filter_list(predicate, l):
o = []
for element in l:
if predicate(element): o.append(element)
return o
filter_list is a function that takes a list and a function (our predicate) as parameters and iterates over each element of the input list, and adds only those elements to the output list that satisfy the predicate. Then you'd call filter_list like below:
def greater_than_5(x): return x > 5
print "original list:", l
print "filtered list:", filter_list(greater_than_5, l)
It turns out that python already has a builtin function filter that does exactly what our filter_list function does. So our implementation just boils down to calling the that function with our custom predicate:
def greater_than_5(x): return x > 5
print "original list:", l
print "filtered list:", filter(greater_than_5, l)
Notice the difference between our initial implementation and new one. filter_above_5 did not just tell the interpreter what to do, it also specified exactly how to do it using a series of commands. For that reason, we call it an imperative implementation.

Our newest implementation that uses the builtin filter function does not specify how the filtering should be done. For instance, with all this ado about multi-core, it is not far fetched to think of a version of filter that spawns a number of threads and splits up the work of filtering among these threads. The cool thing is, if, say, Python 3.1 implemented such a multi-threaded filtering function, all our code which used the filter function would get the performance improvement for free.

Plus, notice that this version contains much lesser code, and we're programming at a much higher level of abstraction, not to mention the fact that the new version is more elegant.

All put together, I think it is fair to say, that the implementation that is based on filter is "better" than the imperative implementation.

Notice an important thing about filter - we are passing a function as a parameter to another function. Programming languages which allow this kind of thing - passing and returning functions from other functions, construction of functions at runtime etc. - are said to have first class functions. Now, exactly what set of attributes are required for a language to be deemed to have first class function support is a controversial topic, and so we will side-step that issue here. [2] The important point is passing functions to other functions doing so is a key aspect of functional programming. And since we're doing this in our filtering snippet, we're going to call it a functional implementation.

Enter the Lambda
Nothing we've done so far can't be done from, say, C/C++. In C for instance, we could've created a filter function that operated on array of void*, and used a function pointer to pass the predicate around. [Similar in spirit to the qsort library function]. In C++, we could have created a templatized version of filter that was allegedly more elegant and worked in more or less the same way, but would claim to be "safer" because it would be type-checked. [3]

Recognize that functions like greater_than_5 are use and throw functions. They are created only for the purpose of being called from the filtering function, and then they are never used again. Unfortunately, despite the fact that these functions are going to be called from exactly one place, they will still be sitting in source code polluting our namespace. What would be cool is a way of creating the function greater_than_5 specifically for the purpose of passing it to filter and then "forgetting about it".

The solution to this problem is called a lambda function. In python it looks like this:
print "original list:", l
print "filtered list:", filter(lambda x: x>5, l)
Pay attention to the first argument to filter: "lambda x: x>5". What it means is: create a function that takes one parameter as input - "x", and returns the value of the expression x>5 and then return that function that just got created. Note that the created function is anonymous. The key benefit now is that if I had to call filter in a hundred different places with 50 different filtering predicates, I needn't create a 50 predicates similar to greater_than_5 just to pass them to filter. I can create each of these functions at the point at which filter is called.

This means my code will look much cleaner; there will be no namespace pollution and most importantly I will have to write lesser code.

So, in summary lambda forms allow the functional programmer to create anonymous functions that can be passed around to other functions. Or even be stored for later use [although this is somewhat uncommon].

Isn't this just syntactic sugar?
The thing is, besides the lack of namespace pollution, and some fewer keystrokes to type [not that these aren't important things, but lets play the devil's advocate here], we still haven't gained much by using the lambda. And the fact is, if these were the only advantages of the lambda, lambda's would just be cute syntactic sugar.

But there is more - the key advantage of lambda's is when they are combined with a language that supports closures. We will look at closures in the next post.

In the meantime, if any of the above didn't make sense, or you have some feedback, please comment below.


[1] If you still run into problems understanding the syntax, please do let me know using the comments.

[2] Point is that it is is not correct to call python a functional programming language. If you want to be PC, it is a multi-paradigm language with some functional programming features. Anyway, lets not get into these religious wars and concentrate on the fun stuff instead. :-)

[3] Ensuring const-correctness on your elegant function, making it behave like a STL function so that it works correctly for everything that implements forward_iterators and/or const_forward_iterators and fun times spent debugging the wonderfully cryptic error messages that compilers spit out for templates are but minor prices to pay for that elegance. [Sorry, I couldn't resist that crack :-)]


EDIT (16-Apr-07): Added missing argument to the call to filter. Thanks Psygote for pointing this out!

weblog.rename("Exploring Computation");

When I started off blogging, I wasn't sure where this was going and it seemed that I was going to be blogging about more or less random things. (Hence the lousy title :-) But as I blogged more and more, two things happened - (1) my own thoughts became a lot clearer about what I wanted to do and (2) the agenda for the blog itself became clearer.

So, moral of the story: this new tagline for the blog is that it will be about exploring the art, science, hardware and software of computation. For that to make any sense, I think it is important to define the meaning of the term computation. Here is what wikipedia has to say:
Computation is a general term for any type of information processing that can be represented mathematically. This includes phenomena ranging from human thinking to calculations with a more narrow meaning. Computation is a process following a well-defined model that is understood and can be expressed in an algorithm, protocol, network topology, etc.
The content of the blog is going to remain similar. The name change is an attempt to bring the documentation in sync with codebase, it is not a modification of the project scope. :-D

I hope you like the name change. If you don't, please let me know why. (If you do, you can still take this opportunity to compliment my naming skills :-)

Sunday, March 23, 2008

Paul Graham on Programmers and Their Bosses

At last, Mr. Graham has written something I agree with! :-)

The restrictiveness of big company jobs is particularly hard on programmers, because the essence of programming is to build new things. Sales people make much the same pitches every day; support people answer much the same questions; but once you've written a piece of code you don't need to write it again. So a programmer working as programmers are meant to is always making new things. And when you're part of an organization whose structure gives each person freedom in inverse proportion to the size of the tree, you're going to face resistance when you do something new.

This seems an inevitable consequence of bigness. It's true even in the smartest companies. I was talking recently to a founder who considered starting a startup right out of college, but went to work for Google instead because he thought he'd learn more there. He didn't learn as much as he expected. Programmers learn by doing, and most of the things he wanted to do, he couldn't—sometimes because the company wouldn't let him, but often because the company's code wouldn't let him. Between the drag of legacy code, the overhead of doing development in such a large organization, and the restrictions imposed by interfaces owned by other groups, he could only try a fraction of the things he would have liked to. He said he has learned much more in his own startup, despite the fact that he has to do all the company's errands as well as programming, because at least when he's programming he can do whatever he wants.

An obstacle downstream propagates upstream. If you're not allowed to implement new ideas, you stop having them. And vice versa: when you can do whatever you want, you have more ideas about what to do. So working for yourself makes your brain more powerful in the same way a low-restriction exhaust system makes an engine more powerful.

Tuesday, March 18, 2008

Intel Interesting

Apologies for the corny title, but I couldn't stop myself! :-)

The most immediate news is the 6-core Dunnington, expected to release in the second half of this year. It looks like it has a 3-way split 9MB L2 cache, 16 MB of L3 cache and micro-architecture based on the Penryn. The kicker is that it only has a TDP of 80W!

There is more and more information about the Nehalem now becoming available. We'd already heard about the integrated DDR3 memory controller and simultaneous multi-threading (the successor to hyperthreading). The latest update is that the Nehalem will feature a QuickPath interconnect capable of transfer rates of upto 25.6 GBps. And it is not just more memory bandwidth and multiple cores that will improve performance, it turns out that the Nehalem will increase the OOO execution window to 128 uOps (from 96 on the Penryn) - so individual cores are also going to get faster!

In the longer term Sandy Bridge - the second of Intel's 32 nm processors - is going to have a 256-bit wide SSE execution unit. The only complaint I have about this is that it is 2 years away from shipping. :-)

See here, here and here for more details.

Wednesday, March 05, 2008

Random Link Clearance

Links I've accumulated over the last week or so:
  1. 50 Cent vs Windows XP. In Da Club mixed with error sounds from XP. I like this better than the original!
  2. Scott Meyer's wonderful comic with Basic Instructions on How to Seem Smart.
  3. On a slightly more serious note, Jeff Atwood explains how to hack your progress bars to make your code seem faster!

Saturday, March 01, 2008

SSE Tutorial : Part III

After fighting with blogger for about half-an-hour to make it get the damn formatting right, I've given up. So I've published it on Google Docs instead.

Click here to see the document, and please leave your comments below.

Thursday, February 21, 2008

Finding Which Core Your Thread is Running On

How do you figure which core of a multi-core processor your thread in running on?

At first, I figured that there might be a Win32 function to do this. So, I looked around unsuccessfully in MSDN hoping for a "get" equivalent of SetThreadAffinityMask or SetThreadIdealProcessor.

It turns out the correct way to find this information is to ask the processor. Seems logical if you think about it, but unfortunately the answer not easily googleable because of the weird terminology!

After a little digging around, I found that the CPUID instruction has this interesting beast called an APIC ID. Intel documentation says:
"Each logical processor has a unique APIC ID,which is initially assigned by the hardware at system reset and can be later reprogrammed by the BIOS or the operating system. On a processor that supports Hyper-Threading Technology, the CPUID instruction also provides the initial APIC ID for a logical processor prior to any changes by the BIOS or operating system."
Simplified, this just means that in a multi-processor/multi-core system, the APIC ID is a unique identifier for each logical processor.

What's a logical processor? A hyperthreaded system capable of running two threads "in parallel" has two logical processors. A quad-core system has 4 logical processors. A simple way of looking at this is that you have as many logical processors as graphs in the performance tab of the task manager!

So, all you have to do is use the CPUID function 0x0001 and look in bits 31-24 of the EBX register, which is where the APIC ID lives. This looks like this:

int apic_id;
__asm
{
mov eax, 1;
cpuid;
shr ebx, 24;
mov apic_id, ebx;
}

So that's that!
Let me know if you have questions/comments.