Alieniloquent


Interview Question: Combinations

April 15th, 2010

So I had a job interview yesterday with a great company, and I met a lot of awesome people. A question that was asked, presumably to test my approach to algorithm design, caught me off guard, and I didn’t give the best answer I feel I was capable of. It bugged me the whole flight home, so, I whipped up a better answer. It seemed like the perfect kind of thing to post here on my blog.

The problem: generate all combinations (without repetiton) with length len of the numbers from 1 to max.

Hearing that problem, my mind immediately jumped to all of my combinatorics background, and I started thinking about how to count all of those combinations. I really just could not get my mind out of that little rat hole, but my interviewers coaxed me out. By the time we were done with that part of the interview, we had come up with something that was on the right track, but still wouldn’t work.

A few simplifying assumptions can be made. Since these are combinations, order doesn’t matter (e.g. [1,2,3] is the same as [3,2,1]). Also, since they do not have repetition, we can place an ordering constraint on the individual elements. So, for the 3-element combinations, we can say we want all combinations [z,y,x] where 1 ≤ x < y < z ≤ max (the tuple is backwards for convenience of implementation, you could do it the other way just as easy).

So with those two assumptions in mind, I stubbed out my function.

combinations :: Int -> Int -> [[Int]]
combinations len max = undefined

I had gotten three-quarters of the way toward a working implementation during my interview, so I was already leaning toward a recursive solution here. But, when I keyed in what I’d come up with earlier, it wasn’t working right. I was getting things like [2,2] which should just not show up. I also could not do something like combinations 3 3 and get anything back. I clearly had some boundary issues. So, I decided to actually write out the sets I was expecting and see if I saw any patterns.

combinations 0 3 => [[]]
combinations 1 3 => [[1],[2],[3]]
combinations 2 3 => [[2,1],[3,1],[3,2]]
combinations 3 3 => [[3,2,1]]
combinations 4 3 => [[]]

The most obvious thing is that there’s a clear relationship between the length of the combination and the number of combinations available, which is pretty basic combinatorics. There’s only one way to choose 3 items from a set of 3 items. But looking at this, I’m trying to conceive of some way to devise a recursive algorithm to produce those lists. So I rewrite the output to show how I would expect those to get built recursively.

combinations 0 3 => [[]]
combinations 1 3 => [1:[]] ++ [2:[]] ++ [3:[]]
combinations 2 3 => [2:[1]] ++ [3:[1], 3:[2]]
combinations 3 3 => [3:[2,1]]
combinations 4 3 => [[]]

Now it might be apparent why I chose the ordering constraint I did. It makes it easy to build these lists with conses. The most imporant observation to make from this data is what numbers actually get selected to be consed. At each level of recursion we’re selecting only the numbers between len and max to be added onto lists, and then we recurse with all the numbers less than those.

Here is the final implementation:

combinations 0 _ = [[]]
combinations len max = foldr reduce [] [len..max]
  where reduce x ys = recurse x ++ ys
        recurse x = prepend x (combinations (len - 1) (x - 1))
        prepend x = map (\xs -> x:xs)

you.would? like(:fries).with(that)

January 27th, 2010

I was recently linked to an insightful article about how software is hard. He makes several good points, but the one that sticks out most in my mind is the following:

Every software engineer has a low opinion of the way we develop software. Even the term “software engineering,” Rosenberg writes, is a statement of hope, not fact. He quotes the 1968 NATO Software Engineering Conference that coined the term: “We undoubtedly produce software by backward techniques.” “We build systems like the Wright brothers built airplanes–build the whole thing, push it off the cliff, let it crash, and start over again.” Certainly statements that could still be made forty years later.

That is so true, especially when it comes to the programming languages we use. Most of the languages with widespread use in industry today are essentially thirty years old, and even those are just the latest in a long lineage that leads back to FORTRAN and Lisp in the late sixties. Languages in today’s generation don’t do anything for you that their ancestors didn’t do forty years ago.

What’s worse is that more modern languages exist–they have existed for decades–yet they don’t find mainstream adoption. For the last twenty years the main thrust of innovation in the software industry has been focused on people and process rather than languages and tools. Modern languages are dismissed as academic, and industry experts are struggling to find new and better ways to build useful systems with ancient technology.

We are rapidly approaching a singularity after which the descendants of FORTRAN and Lisp will be grossly inadequate for writing programs. The current rise of concurrent programming is just the beginning. As time plods on and concurrency and distributed programming become more commonplace, we are going to have to develop new ways of writing software.

This really isn’t news. Software is hard. Concurrent software is harder. We know that. The solutions offered today are primitive, and not much different from the thinking over thirty years ago. Whether you use some shared-memory model or some message-passing model, writing concurrent programs is a mentally exhausting process with a lot of work centered around getting the parallelism right.

Another problem we face is the increasing ubiquity of software. In the modern, Western world it is nearly impossible to take ten steps without walking past at least a hundred semiconductors. Many of those little bits of manufactured silicon are running some sort of program (whether it is firm or software). We passed the point in time where our lives are computerized a long time ago, and many people didn’t even notice. Your life relies on software in vehicles, hospitals and banks.

With software becoming so commonplace, the burden to produce quality software is greater than ever. Software verification is a critical area of the development process that has been shit on and forgotten for over forty years. Today’s state of the art in quality assurance is a combination between exploratory testing and developer-written regression suites. In other words, if you don’t think about verification, then none happens at all.

These problems are both huge. We must solve them, and wetware solutions will be woefully inadequate. We need to adopt and develop new languages and tools that directly address these issues. This is the area where we have regressed from forty years ago. Back then programmers quickly grew impatient with wetware solutions and would write software to automate things and introduce abstractions: programming languages, parser and scanner generators, build automation tools, build configuration tools, stream editors, and the list just goes on.

Somewhere in the last forty years, we just gave up. As an industry, we decided that languages and tool-chains were fundamentally solved problems, and all we needed to do was evolve the syntax and libraries. We have eschewed the virtuous path of software development in favor of a naively pragmatic alternative. Rather than working smarter, we try to work faster. Rather than finding smarter people, we just find more people. These days, there’s an alternative to working in food service: programming.

Layout, design, graphics, photography and text all © 2005-2010 Samuel Tesla unless otherwise noted.

Portions of the site layout use Yahoo! YUI Reset, Fonts & Grids.