Alieniloquent


Broken Window in ActiveRecord: ActiveRecord::StatementInvalid

March 3rd, 2008

I love Ruby, and I love Rails, but in some ways it really is a ghetto. It has a lot of broken windows that only serve to encourage bad coding from developers who should know better. Today I ran into an example of one of those broken windows and I was beside myself. I could not believe what I was reading.

One of the projects I work on for my employer is an import process that takes a long time. In order to make it resilient to database fail-overs, I wanted to catch the exception that is raised when the connection dies, wait a few seconds, and then try to reconnect. The idea is simple, and it works once I account for the broken window, but I am not pleased with the code I had to write.
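A minimal sketch of that catch, wait, and reconnect loop, not the code from the import process itself. The `StatementInvalid` class is a stand-in for `ActiveRecord::StatementInvalid` so the example runs on its own, and `with_reconnect`, `CONNECTION_LOST`, and the `reconnect` callable are hypothetical names. The message fragments it matches are assumptions about what the driver reports.

```ruby
# Hypothetical stand-in for ActiveRecord::StatementInvalid so the sketch runs alone.
class StatementInvalid < StandardError; end

# Message fragments assumed to mean "the connection died". Brittle on purpose:
# this is the kind of string matching the rest of the post complains about.
CONNECTION_LOST = /server has gone away|Lost connection/i

# Runs the block, retrying after a pause when the connection appears to be gone.
# `reconnect` is any callable that re-establishes the connection.
def with_reconnect(attempts: 3, delay: 5, reconnect: -> {})
  tries = 0
  begin
    yield
  rescue StatementInvalid => e
    tries += 1
    # Re-raise anything that is not a lost connection, or when out of attempts.
    raise unless e.message =~ CONNECTION_LOST && tries < attempts
    sleep delay
    reconnect.call
    retry
  end
end
```

A caller would wrap each unit of import work in `with_reconnect { ... }` and pass whatever reconnect logic its adapter needs.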

When the database connection disappears, the database driver throws an exception. ActiveRecord’s abstract adapter catches that exception and does this:

# Find this in Rails 2.0.2
# active_record/connection_adapters/abstract_adapter.rb:121

rescue Exception => e
  # Log message and raise exception.
  # Set last_verfication to 0, so that connection gets verified
  # upon reentering the request loop
  @last_verification = 0
  message = "#{e.class.name}: #{e.message}: #{sql}"
  log_info(message, name, 0)
  raise ActiveRecord::StatementInvalid, message
end

This is the exception handler that catches every exception raised during a query run by ActiveRecord. As you can see, it pulls the class name and the message off the exception, throws the object away, and reraises as ActiveRecord::StatementInvalid. So if your database driver provides hundreds of error codes so that you can tell specifically what error occurred, as Mysql::Error does, you lose them.

So ActiveRecord provides one exception that covers everything from primary key violations to database connection errors, and the only way to distinguish them is by inspecting the message. Surely, that can’t be true, right? I dig further and find this:

# Find this in Rails 2.0.2
# active_record/connection_adapters/mysql_adapter.rb:244
#
# Note: I snipped the error message because it is very long

rescue ActiveRecord::StatementInvalid => exception
  if exception.message.split(":").first =~ /Packets out of order/
    raise ActiveRecord::StatementInvalid, snipped_error_message
  else
    raise
  end
end

That is just completely unacceptable. I can find it in my heart to forgive the abstract adapter for doing something that throws away implementation-specific information, but the Mysql adapter should remedy that. It willingly lets its exception information be cast aside and goes about inspecting what the abstract adapter had the decency to keep around.

“But that information is good enough to tell what the exception is,” you might say.

Until the Mysql folks change the error message. The Mysql API exposes numeric constants, and I’m sure they’re very careful to keep them the same, but do you think they take the same approach to error messages? I doubt it. They provide a function that will give you an error message given the numeric constant, and encourage you to use it. That’s what the Mysql bindings for ruby do.

Expecting developers to inspect the exception message is essentially promoting programming with magic numbers. Sure, they’re string literals, but they’re still duplicated information, and extremely brittle.

All I’d want is an inner_exception attribute available on ActiveRecord::StatementInvalid or maybe its parent, and then assign it when doing reraises. Is that too much to ask for?
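Here is a rough sketch of what such an attribute could look like. Every class and method name below is hypothetical and nothing here is ActiveRecord's API: `DriverError` stands in for something like Mysql::Error, and `WrappedStatementInvalid` stands in for ActiveRecord::StatementInvalid. The number 2006 is used as the MySQL client error for a server that has gone away.

```ruby
# Hypothetical driver error carrying a numeric code, loosely modeled on Mysql::Error.
class DriverError < StandardError
  attr_reader :errno

  def initialize(message, errno)
    super(message)
    @errno = errno
  end
end

# Hypothetical replacement for ActiveRecord::StatementInvalid that keeps the original.
class WrappedStatementInvalid < StandardError
  attr_reader :inner_exception

  def initialize(message, inner_exception = nil)
    super(message)
    @inner_exception = inner_exception
  end
end

# Mimics the adapter's rescue clause, but passes the caught exception along.
def run_query(sql)
  yield
rescue StandardError => e
  raise WrappedStatementInvalid.new("#{e.class.name}: #{e.message}: #{sql}", e)
end

SERVER_GONE = 2006 # MySQL client error number for "server has gone away"

# Decides on the numeric code instead of parsing the message text.
def connection_lost?(error)
  inner = error.inner_exception
  inner.respond_to?(:errno) && inner.errno == SERVER_GONE
end

begin
  run_query("SELECT 1") { raise DriverError.new("MySQL server has gone away", SERVER_GONE) }
rescue WrappedStatementInvalid => e
  puts "connection lost? #{connection_lost?(e)}"
end
```

Ruby 2.1 and later record the rescued exception automatically as `Exception#cause` when a new exception is raised inside a rescue clause, which covers much the same need without a custom attribute.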

String transforms using Enumerable#inject

February 15th, 2008

I love functional programming, and I love Ruby. One of the most awesome things about Ruby is how much it borrows from the functional programming mindset. One of the most powerful concepts that functional programming brings to the table is higher-order functions. Ruby’s Enumerable module is a great example of how it embraces the idea of higher-order functions to abstract out the various things you do with a collection and let you focus on the operation for each item.

One of the most mysterious methods on Enumerable is Enumerable#inject. The example that’s always given is this:

irb> [1, 2, 3, 4].inject(0) {|sum, i| sum + i}
10

That’s fine, and usually makes sense. But when you try to branch out into more esoteric uses of inject, it can get confusing. So I’m going to give an example of accomplishing something with inject that you will hopefully find useful.

I always find myself doing a sequence of substitutions on a string. For example, when I implement a Telnet client, I like to normalize the line endings I’m sending so that they’re sane. I accomplish that by translating “\r\n” to “\n”, then translating “\r” to “\n”, then translating “\n” to “\r\n”. It’s a simple thing to do, and I could do it like this:

string.gsub("\r\n", "\n").gsub("\r", "\n").gsub("\n", "\r\n")

But that’s not very extensible. I’d like to apply this idea of a sequence of substitutions in an abstract way so that I can do it dynamically. And while I could do something with Object#send, that’s like cheating. This is where inject comes to the rescue.

def normalize_line_endings(string)
  transforms = [proc {|s| s.gsub("\r\n", "\n")},
                proc {|s| s.gsub("\r", "\n")},
                proc {|s| s.gsub("\n", "\r\n")}]
  transforms.inject(string) {|s, transform| transform.call(s)}
end

Kernel#proc (or Kernel#lambda if you prefer) is Ruby’s way of making higher-order functions. It wraps a block in a Proc object which you can then call with an argument. In the above code, I make an array of transforms that take a string and return a string. The call to inject at the end is where the magic happens. It calls the first transform with string, which was provided as the argument to inject. Then it calls the second transform with the result of the first, and it calls the third transform with the result of the second. That list could be as big as you want. It could even be dynamically generated.
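To make that threading concrete, here is a small sketch that writes out by hand what inject does with the three transforms and checks that the result is the same. The sample input string is made up for illustration.

```ruby
transforms = [proc { |s| s.gsub("\r\n", "\n") },
              proc { |s| s.gsub("\r", "\n") },
              proc { |s| s.gsub("\n", "\r\n") }]

input = "a\r\nb\rc\n"

# What inject does, written out by hand:
step1 = transforms[0].call(input) # "a\nb\rc\n"
step2 = transforms[1].call(step1) # "a\nb\nc\n"
step3 = transforms[2].call(step2) # "a\r\nb\r\nc\r\n"

# The same thing as a fold: each result becomes the next accumulator.
folded = transforms.inject(input) { |s, transform| transform.call(s) }
puts folded == step3 # true
```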

That’s nice, but it’s still a little verbose. I like to hide my use of Kernel#proc behind a declarative interface when I’m doing this sort of thing with it. So here’s how we can rewrite the method.

def transform(string, specifications = [])
  transforms = specifications.collect do |spec|
    proc {|s| s.gsub(spec[:from], spec[:to])}
  end
  transforms.inject(string) {|s, transform| transform.call(s)}
end

def normalize_line_endings(string)
  transform(string, [{:from => "\r\n", :to => "\n"},
                     {:from => "\r", :to => "\n"},
                     {:from => "\n", :to => "\r\n"}])
end

Of course, at that point, we don’t really need to create the procs. We can just use inject right on the specifications array, so the final code I came up with for this was:

def transform(string, specifications = [])
  specifications.inject(string) do |s, spec|
    s.gsub(spec[:from], spec[:to])
  end
end

def normalize_line_endings(string)
  transform(string, [{:from => "\r\n", :to => "\n"},
                     {:from => "\r", :to => "\n"},
                     {:from => "\n", :to => "\r\n"}])
end

Now that can be used with any list of transformations. Those transformations can be dynamically generated, and it’s a very clean implementation. That is the power of Enumerable#inject.
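As one illustration of a dynamically generated list, the sketch below builds the specifications from a lookup table. The `ESCAPES` table and the input string are hypothetical, and the example relies on hashes preserving insertion order, which holds on Ruby 1.9 and later.

```ruby
def transform(string, specifications = [])
  specifications.inject(string) do |s, spec|
    s.gsub(spec[:from], spec[:to])
  end
end

# Hypothetical escape table; "&" comes first so later entities are not re-escaped.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }

# Build the specification list from the table instead of writing it by hand.
specs = ESCAPES.map { |from, to| { :from => from, :to => to } }
escaped = transform("a < b & c", specs)
puts escaped # a &lt; b &amp; c
```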

Living In the House That Rails Built

January 29th, 2008

I wanted to share a snippet of code. This code will print a call stack to STDOUT every time a Ruby class definition is evaluated. It is particularly useful when you find that class constants are being mysteriously redefined.

class Foo
  puts "\nRequired from:\n  #{Kernel.caller.join("\n  ")}"
  # ...
end

What inspired me to write that code? Rails did. The key to writing Ruby on Rails is that you’re writing Ruby on Rails. You don’t follow the Rails best practices because they’re convenient. You follow the Rails best practices because your program won’t work unless you do. Just like trains, you stay on the track and everything is great. If you try to take your train off-track, then it’s gruesome enough to make the nightly news.

How did I derail my application such that I cared how and where a file was being required? I wrote a unit test that explicitly required a model object. Oops. Remember that the semantics of require are load-once, based on the name you pass. So:

require "foo"

and:

require "models/foo"

are very different to require. Rails is super helpful and requires everything that it makes for you. So it requires models for you, even when you run your unit tests.

So take this code:

class Foo < ActiveRecord::Base
  RAILS_IS_A_GHETTO = true
end

And then write a test for something that Rails didn’t generate (such as something in the lib directory like I did):

# Require some other stuff
require "foo"

class TestTruth < Test::Unit::TestCase
  def test_truth
    assert true
  end
end

If you rake test you will get an error complaining that RAILS_IS_A_GHETTO was reinitialized, and that’s because Rails loads it for you as “models/foo” and you load it as “foo” so it gets loaded twice.
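The sketch below checks, outside Rails, how many times a file is evaluated when it is reachable under two names. The file name `demo_foo.rb` and the global counter are made up for the example. On the Ruby 1.8 of this post's era the behavior described above implies a count of 2; later Rubies record expanded paths in `$LOADED_FEATURES`, so as far as I know they should report 1.

```ruby
require "tmpdir"
require "fileutils"

Dir.mktmpdir do |root|
  models = File.join(root, "models")
  FileUtils.mkdir_p(models)
  # The file bumps a global counter every time it is evaluated.
  File.write(File.join(models, "demo_foo.rb"),
             "$demo_foo_loads = ($demo_foo_loads || 0) + 1\n")

  # Make the same file reachable as "demo_foo" and as "models/demo_foo".
  $LOAD_PATH.unshift(root, models)
  begin
    require "demo_foo"
    require "models/demo_foo"
  ensure
    $LOAD_PATH.delete(root)
    $LOAD_PATH.delete(models)
  end
end

puts "demo_foo.rb was evaluated #{$demo_foo_loads} time(s)"
```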

The moral of the story is: let Rails load the things it built, and you load the things you built.

Base32 0.1.1 Released

June 29th, 2007

Quickly on the heels of the initial release of my Base32 library, I have an update. I should have tried to compile it on Linux, as the GCC settings on my Gentoo box caught some silly things I had done.

It’s all better now, and the gem can install on both Mac OS X and Gentoo Linux. I assume other Linuxes are probably fine, as are BSDs and other *NIXes.

To download it go here.

Base32 0.1.0 Released

June 28th, 2007

As you may know, I’ve been working with base32 encoding. Well, I decided to share my work with the world in the form of a library.

This first release simply contains the code I needed for my original project, but I’ve packaged it up as a nice Ruby extension.

You can visit the project page here.
You can download the release here.

Base32 Encoded Freedom

June 5th, 2007

So I’m writing the license-key generation code for the store-front for a shareware program my friend Tyler and I are preparing to release (more about that later). We’ve decided to use cryptography to reduce the likelihood that our licensing scheme will be compromised (for relatively little effort on our part). We also decided to base32 encode the actual keys to make them easier to read.

Well, the store-front is going to be a Rails app, of course. Ruby has a module to base64 encode, but it doesn’t have one to base32 encode. So, I wrote one, and I did it test first (of course).

The first four tests were easy. Really short strings, but they worked out most of the kinks. But, I wanted something that would boost my confidence further. So I wrote the following test which ended up being quite patriotic.

def test_constitution_preamble
  plaintext =<<-EOT
    We the people of the United States, in order to form a more perfect union,
    establish justice, insure domestic tranquility, provide for the common
    defense, promote the general welfare, and secure the blessings of liberty
    to ourselves and our posterity, do ordain and establish this Constitution
    for the United States of America.
  EOT
  encoded = %W(
    EAQCAIBAEBLWKIDUNBSSA4DFN5YGYZJAN5TCA5DIMUQFK3TJORSWIICTORQXIZLTFQQGS3RA
    N5ZGIZLSEB2G6IDGN5ZG2IDBEBWW64TFEBYGK4TGMVRXIIDVNZUW63RMBIQCAIBAEAQGK43U
    MFRGY2LTNAQGU5LTORUWGZJMEBUW443VOJSSAZDPNVSXG5DJMMQHI4TBNZYXK2LMNF2HSLBA
    OBZG65TJMRSSAZTPOIQHI2DFEBRW63LNN5XAUIBAEAQCAIDEMVTGK3TTMUWCA4DSN5WW65DF
    EB2GQZJAM5SW4ZLSMFWCA53FNRTGC4TFFQQGC3TEEBZWKY3VOJSSA5DIMUQGE3DFONZWS3TH
    OMQG6ZRANRUWEZLSOR4QUIBAEAQCAIDUN4QG65LSONSWY5TFOMQGC3TEEBXXK4RAOBXXG5DF
    OJUXI6JMEBSG6IDPOJSGC2LOEBQW4ZBAMVZXIYLCNRUXG2BAORUGS4ZAINXW443UNF2HK5DJ
    N5XAUIBAEAQCAIDGN5ZCA5DIMUQFK3TJORSWIICTORQXIZLTEBXWMICBNVSXE2LDMEXAU===).join
  assert_equal(encoded, Base32.encode(plaintext))
end
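For readers who want to see the encoding itself, here is a pure-Ruby sketch of base32 encoding with the RFC 4648 alphabet and padding. It is not the implementation behind the test above. The module name `Base32Sketch` is hypothetical.

```ruby
module Base32Sketch
  ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".freeze

  # Encodes a binary string: split the bits into 5-bit groups, map each group
  # to a letter, then pad with "=" to a multiple of 8 characters.
  def self.encode(data)
    bits = data.unpack("B*").first             # e.g. "f" => "01100110"
    bits += "0" * ((5 - bits.length % 5) % 5)  # zero-fill the last group
    encoded = bits.scan(/.{5}/).map { |group| ALPHABET[group.to_i(2)] }.join
    encoded + "=" * ((8 - encoded.length % 8) % 8)
  end
end

puts Base32Sketch.encode("foobar") # MZXW6YTBOI======
```

Working on a string of bit characters is slow compared with shifting integers, but it keeps the grouping step easy to read.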

Three little, two little, one little-endian

April 24th, 2007

I recently found myself wanting a Cocoa class that represents a set of 8-bit bytes. Cocoa has NSCharacterSet, but that is for unichar, not uint8_t. So I wrote one. It was easy enough: I gave it an array of UINT8_MAX + 1 booleans, one for each possible byte value, and said that if a particular element in the array was YES then that byte was in the set, and not if the element was NO.

Initially the class only knew how to answer questions of membership: is a byte in the set or not? But then I found a number of places where I was enumerating all possible values and testing for membership, so I figured adding a method that would return an NSData with just the bytes included in the set would be useful.

So I wrote this:

- (NSData *) dataValue
{
  NSMutableData *result = [NSMutableData data];
  for (unsigned i = 0; i <= UINT8_MAX; ++i)
  {
    if (contains[i])
      [result appendBytes: &i length: 1];
  }
  return result;
}

I had unit tests that proved it worked, and they all passed, so I checked in. All was good in the world.

Five days later, I flip open my laptop and decide to use the program this code is part of. I always try to eat my own dog food, and I prefer the freshest dog food I can get. So, whenever I want to use this application, I delete it, update from our Subversion repository, and build it.

Much to my surprise, when I built it on my laptop, some of those tests did not pass. I was expecting the NSData returned from -dataValue to have certain bytes in it. The NSData I actually got back did have the correct number of bytes, but they were all zeroes.

I banged my head against it for about twenty minutes, until I had a flash of insight. My desktop machine at home is an iMac, and inside it is an Intel Core Duo processor. My laptop is a PowerBook, and inside it is a Motorola G4 processor. The Core Duo, like most other Intel processors, stores numbers in the little-endian format, whereas the G4 stores them in big-endian format.

Endianness is a computer topic that makes a lot of programmers’ heads hurt. Unfortunately, Cocoa programmers do have to think about this now. Since Apple switched from their old, big-endian, Motorola platform to their new, little-endian, Intel platform, applications that are meant to run on both have to be aware of byte-order issues.

Computers store data in bytes, which are eight bits long. However, eight bits is only enough to store a number up to 255. In order to store larger numbers, computers just concatenate bytes together. A 16-bit number is comprised of two bytes, and a 32-bit number is comprised of four. The endianness of a system determines what order those bytes are stored in.

When you read a decimal number like 4242, you read it from left to right. The most significant digit is the left-most digit. Similarly, when you read a binary number like 1000010010010, the most significant digit is the left-most digit. If we divide that number into bytes, 00010000 10010010, the left-most byte is called the most significant byte, or the high-order byte. The right-most byte is called the least significant byte, or the low-order byte.

A big-endian processor, like the G4, stores numbers exactly like you’d read them. So if you read a 16-bit integer in big-endian order, the first byte you read is the high-order byte. Now, if the number is no more than 255, for example 42, you’ll get this: 00000000 00101010.

A little-endian processor, like the Core Duo, stores numbers just the opposite of how you’d expect. The first byte you read is the least significant byte, followed by the next least significant byte, and so on. So when we read our binary number in we’ll get 10010010 00010000 instead of what we expected. Now, if we look at that small number again, you’d get this: 00101010 00000000.
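The two byte orders can be checked from Ruby, whichever processor you happen to be on, because Array#pack lets you choose the order explicitly. This is only a sketch for inspecting the layouts described above: the "n" directive packs a 16-bit unsigned integer big-endian and "v" packs it little-endian.

```ruby
# Pack 4242 and 42 as 16-bit integers in each byte order, then show each byte's bits.
big_4242    = [4242].pack("n").unpack("B8B8")
little_4242 = [4242].pack("v").unpack("B8B8")
big_42      = [42].pack("n").unpack("B8B8")
little_42   = [42].pack("v").unpack("B8B8")

p(big_4242)    # ["00010000", "10010010"]
p(little_4242) # ["10010010", "00010000"]
p(big_42)      # ["00000000", "00101010"]
p(little_42)   # ["00101010", "00000000"]
```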

So, to bring this back to my bug. The unsigned type is actually an unsigned 32-bit integer. Since my code was manipulating a set of 8-bit numbers, every single number would fit into the low-order byte of that unsigned, thus leaving the other three bytes all zero.

The line of code where I do this:

[result appendBytes: &i length: 1];

Is a clever little trick I’ve used to avoid having to actually declare a one-byte array when I want to append just one byte. It works great if i is actually a uint8_t. It also works great if i is an unsigned and stored in little-endian format, since the first byte happens to be the byte I’m interested in. However, on a big-endian processor, that will reference the most significant byte of the number instead, and since i never gets any bigger than UINT8_MAX (which is 11111111 in binary), that byte will always be zero.
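The effect of taking only the first byte of a 32-bit value can be imitated in Ruby. This is a sketch of the memory layout, not of the Objective-C code: "N" packs a 32-bit unsigned integer big-endian, "V" packs it little-endian, and "C" packs a single unsigned byte. The value 200 is arbitrary.

```ruby
value = 200 # any byte value from the set

# The four bytes of a 32-bit unsigned integer in each byte order.
big_endian    = [value].pack("N").bytes
little_endian = [value].pack("V").bytes

p(big_endian)    # [0, 0, 0, 200]
p(little_endian) # [200, 0, 0, 0]

# Appending "one byte starting at &i" amounts to taking the first element.
p(big_endian.first)    # 0, the bug seen on the PowerBook
p(little_endian.first) # 200, which is why the tests passed on the iMac

# Converting to an 8-bit value first gives the same single byte everywhere.
single = [value].pack("C").bytes
p(single) # [200]
```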

So now the code looks like this:

- (NSData *) dataValue
{
  NSMutableData *result = [NSMutableData data];
  uint8_t byte[1];
  for (unsigned i = 0; i <= UINT8_MAX; ++i)
  {
    if (contains[i])
    {
      byte[0] = i;
      [result appendBytes: byte length: 1];
    }
  }
  return result;
}

The compiler knows to do the correct conversion between the 32-bit and 8-bit types when assigning from one to another, so the new code now works on both of my machines.

Update: The title is a joke that Erica made up when I told her about this bug. All blame for its terribleness should go to her, I just recognized how apropos it was for the post.

Podcast

September 4th, 2006

So I decided to start a podcast. Check it out: The Agile Mac.

J3Testing 1.0

September 1st, 2006

I decided to make my J3TestCase code into an actual framework. This way I don’t have to copy the files each time I want to use the class.

I’ve put up a disk image with binaries and one with source.
The source is worth looking at, especially to see how I made targets to automatically build those disk images.

These replace the old J3TestCase code I had posted, and I’ve removed that tarball. So, sorry if I broke that link. This is a much better way to deploy anyway.

Quirky Behavior in String#gsub

August 31st, 2006

At my office I develop in Delphi. We use Delphi 2006. As far as IDEs go, it’s not that great. For example, when you tell the Delphi 2006 IDE to do a build all (something you’d think developers do quite frequently), it has a very annoying behavior: it eats up scads of memory. In fact when the build all operation completes on our project group, Delphi has laid claim to over 1GB of memory, and it won’t let it go until you quit the application. But, this post isn’t about Delphi or its buggy IDE. It’s about ruby. More specifically, it’s about a quirk (read: bug) in ruby.

The String class in ruby has a method called gsub. This method takes two parameters and each can take two types of object. The first parameter can either be a Regexp or a String, and it represents what is to be replaced. The second can either be a String or a block, and supplies the value with which to replace it. This seems perfectly natural.

Now, if you’ve ever used regular expressions, you probably know about back-references. When you use the grouping operator in a regular expression (e.g. ^a(ab)b$) it stores a numbered back-reference to the matched value of each group. In ruby you can reference these with the special variables $1, $2, and so on. But, if you are passing a string as the replacement, it will only be interpolated once and those back-references won’t be correct. So, what gsub does is let you put in \1 and \2 instead.

That behavior is awesome, and exactly what you want, if you’re matching a regular expression. But if you’re just matching a string literal, there is absolutely no reason to do it. In fact, if all you’re doing is matching a string literal those back-references will all be the empty string.
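A short sketch of the three cases just described, using the example pattern from above. The square brackets in the replacement are only there to make the substituted text visible.

```ruby
# Regexp pattern, string replacement: \1 refers to the first group.
with_backref = "aabb".gsub(/^a(ab)b$/, '[\1]')

# Regexp pattern, block replacement: $1 is set for each match.
with_block = "aabb".gsub(/^a(ab)b$/) { "[#{$1}]" }

# String pattern, string replacement: \1 is still processed, but there is
# no group, so it becomes the empty string.
with_literal = "aabb".gsub("ab", '[\1]')

puts with_backref # [ab]
puts with_block   # [ab]
puts with_literal # a[]b
```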

So, how do I know all this? Well, because Delphi 2006’s build all operation bites, we wrote a ruby script to replace it. This script has to do file-name manipulation and all sorts of other string manipulation in order to get all of the correct compiler options. One of the things it does is replace strings like $(CodeBase) with a path such as c:\svn\trunk. Well, we have separate code bases for our branches, and they have names like c:\svn\2006. You see that \2 there? Yeah, that one, right in the middle of the path. Even though the script was matching a string literal, gsub was replacing back-references. Since the path happened to have a \2 in it, it would end up coming out of gsub as c:\svn006, and that certainly wasn’t right.

Thankfully, there is a simple workaround. Instead of providing a string for the replacement, we can provide a block. That block gets called every time and the value that it returns is exactly what gets used as the replacement.
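Here is a sketch of both the quirk and the workaround, using the path from above. The template string and variable names are made up for the example, not taken from the build script.

```ruby
template  = "$(CodeBase)\\src"  # the text: $(CodeBase)\src
code_base = "c:\\svn\\2006"     # the text: c:\svn\2006

# String replacement: the \2 inside the path is treated as a back-reference
# and replaced with nothing.
broken = template.gsub("$(CodeBase)", code_base)

# Block replacement: the block's return value is used exactly as it is.
fixed = template.gsub("$(CodeBase)") { code_base }

puts broken # c:\svn006\src
puts fixed  # c:\svn\2006\src
```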

Layout, design, graphics, photography and text all © 2005-2007 Samuel Tesla unless otherwise noted.
