Content Tagged with essays + rants

FOSS Sin: Pointless Duplication of Effort | Fred Trotter

Detailed explanation of why two open source efforts working on the same thing is a pointless duplication of effort, and how to tell the difference between pointless and pointful parallel efforts. Uses FOSS electronic health/medical record software as exa

When duplication is not duplication

I was looking through some C code today, and stumbled across this lovely little gem:

tmp = "\"#";
while (*tmp) {
  FD_SET(*tmp, url_encode_map);
  tmp++;
}

Now, be honest. I don’t care how good you are at C, it takes you a few brain cycles to process that and figure out that it is just setting two bits in a bit field. It really should have been written like this:

FD_SET('"', url_encode_map);
FD_SET('#', url_encode_map);

This raises the question: why wasn’t it? I’ll tell you why:

Programmers have this burning desire to avoid code duplication. We’re taught, almost since the cradle, to abhor duplicated code and to avoid it at all costs. Duplicating code is evil, it leads to unmaintainable code, and propagates bugs. Never, ever, do it!!!

Allow me to let you in on a little secret.

Calling the same function twice is NOT duplicating code. Not if the arguments change between calls.

Even calling the same function three times in a row is kosher. Four times, even. At some point, you might want to consider a loop, if the arguments can be determined functionally, but only do so when the list of similar function calls is harder to read and understand than the loop is. This is often when the loop takes fewer lines of code than the function calls do:

for (i = 127; i < 256; i++) {
  FD_SET(i, hdr_encode_map);
  FD_SET(i, url_encode_map);
}

There. Had to get that off my chest. Now, back to work.

Capistrano: the { buckblogs :here } - Home

Never. Ever. Cargo-cult.

I was told today on a mailing list that some people have been justifying their coding decisions by saying things like “but that’s how Jamis does it!”

And I was mortified. Because someday a time will come (and likely already has!) when the things I’ve written will be surpassed by a better way, and I will wilt with embarrassment if anyone uses “that’s how Jamis does it” to justify continuing with the antiquated style.

I’m learning, constantly. Every project I undertake teaches me something new. Every programmer I’ve ever worked with has shown me a better way to do things. “How X does it” (for absolutely any mortal value of X) is a moving target, and if you’re blindly basing your designs on something I (or anyone else) wrote a year or two ago, then you should step cautiously.

Never. Ever. Cargo-cult. If someone writes about something that you find clever, understand why you think it is clever. If someone preaches a better algorithm, understand why the algorithm is better. And if someone asks why you do something a certain way, argue it on its own merits, without resorting to an appeal to someone’s (supposed) authority. If you can argue that something is better than something else solely by contrasting its pros and cons against the alternative, you’ll be taken much more seriously. And you’ll have a much better chance of recognizing a better way when it is presented to you.

I’ll say it again. Never. Ever. Cargo-cult. Ever.

That said, I’ve been very, very quiet lately, and I apologize. I’ve been rethinking some priorities and experimenting with some new interests. Also, I’ve been trying to finish up (finally) Net::SSH v2 and Net::SFTP v2. Hopefully this year I’ll climb out of the hole I dug for myself last year and have more to blog about again.

Method visibility in Ruby

A common point of confusion to even experienced Ruby programmers is the visibility of public, protected, and private methods in Ruby classes. This largely stems from the fact that the behavior of those keywords in Ruby is different from what you might have learned from Java and C++.

To demonstrate these differences, let’s set up a little script:

class Foo
  def a; end

  # call 'a' with explicit 'self' as receiver
  def b; self.a; end

  # call 'a' with implicit 'self' as receiver
  def c; a; end
end

def safe_send(receiver, method, message)
  # can't use 'send' because it bypasses visibility rules
  eval "receiver.#{method}"
rescue => e
  puts "#{message}: #{e}"
else
  puts "#{message}: succeeded"
end

visibility = ARGV.shift || "public"
Foo.send(visibility, :a)

foo = Foo.new
safe_send(foo, :a, "explicit receiver       ")
safe_send(foo, :b, "explicit 'self' receiver")
safe_send(foo, :c, "implicit 'self' receiver")

Basically, the script just creates a class “Foo” with three methods: a, which we’ll invoke directly with an explicit, non-self receiver; b, which invokes a with self as receiver; and c, which invokes a with an implicit receiver of self. We’ll use the safe_send method to call each of those methods and log the result.

So, first: the public keyword. In Ruby, public means that the method may be invoked just about any way you please; in technical terms, the receiver of the message may be either explicit (“foo.bar”), self (“self.bar”) or implicit (“bar”).

$ ruby demo.rb public
explicit receiver       : succeeded
explicit 'self' receiver: succeeded
implicit 'self' receiver: succeeded

The protected keyword puts a straitjacket around the method. Any method declared protected may only be called if the receiver is self, explicitly or implicitly. (Update: protected methods may actually be called any time the receiver is of the same class as ‘self’...and an explicit self as receiver is just a specific case of that. Modifying the script to demonstrate this condition is left as an exercise for the reader.)

$ ruby demo.rb protected
explicit receiver       : protected method `a' called for #<Foo:0x3fc18>
explicit 'self' receiver: succeeded
implicit 'self' receiver: succeeded
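
The condition described in the update above (a protected method called on another object of the same class) can be sketched as follows. This is only an illustration: the Account class and its method names are hypothetical and are not part of the demo script.

```ruby
# Hypothetical class, used only to illustrate the "same class" rule.
class Account
  def initialize(balance)
    @balance = balance
  end

  # A public method that calls the protected method on *another* instance.
  def richer_than?(other)
    balance > other.balance
  end

  protected

  def balance
    @balance
  end
end

a = Account.new(100)
b = Account.new(50)

# Allowed: inside an Account method, 'other' is an Account as well.
puts "same-class receiver: #{a.richer_than?(b)}"

# Not allowed: here the caller is not inside an Account method.
begin
  a.balance
rescue NoMethodError => e
  puts "outside receiver: #{e.class}"
end
```

The explicit receiver inside richer_than? is not self, yet the call goes through, because the calling context is itself an instance of the class.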

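Lastly, the private keyword is the tightest setting of all. A private method cannot be called with an explicit receiver at all, even if that receiver is “self”. (Update: since Ruby 2.7, a literal “self” receiver is permitted for private methods, so on a modern Ruby the second case below succeeds; the output shown is from an older interpreter.)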

$ ruby demo.rb private
explicit receiver       : private method `a' called for #<Foo:0x3fc18>
explicit 'self' receiver: private method `a' called for #<Foo:0x3fc18>
implicit 'self' receiver: succeeded

Note that, unlike languages such as Java, inheritance plays absolutely no part in determining method visibility in Ruby. Subclasses can access both protected and private methods of the superclass without trouble, so long as they abide by the rules laid out above.
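
A minimal sketch of that last point, using hypothetical Base and Derived classes: the subclass reaches the superclass’s private method simply by calling it with an implicit receiver.

```ruby
# Hypothetical classes for illustration only.
class Base
  private

  def secret
    "base secret"
  end
end

class Derived < Base
  # The implicit receiver is 'self', so the private method is reachable
  # even though it was defined in the superclass.
  def reveal
    secret
  end
end

puts Derived.new.reveal

# From outside the object, the same method is still off limits.
begin
  Derived.new.secret
rescue NoMethodError => e
  puts "outside call: #{e.class}"
end
```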

The difference between protected and private is very subtle, as you can see, which explains why protected is rarely used by most Rubyists. If it is used at all, it is generally as a convention, to document methods that are internal to the class, but which lie closer to the public interface than others. In Rails, for instance, you might declare your controller filter methods and model validation methods as “protected” (because the framework will call those methods) and reserve the “private” designation for those methods that are only ever called from within your own model or controller code.

Scaffolding's place

Scaffolding, scaffolding, scaffolding… In a recent article I said that “I have lots of issues with scaffolding”. Why would that be? I mean, what’s not to like about scaffolding, really? It’s all about rapid application development, and prototyping, and getting real, isn’t it? Isn’t it?? WELL????

Specifically, the issue I have with scaffolding is this: it puts the emphasis on the application’s model, instead of the user interface. It assumes that you know the domain of the application before you know how the user is going to interact with it. It assumes that the user interface can successfully follow your conjured domain. It assumes, frankly, far too much.

Now, don’t get me wrong: as a pedagogical aid, scaffolding is great. It lets newcomers to Rails quickly get a skeletal app up and running, giving them a platform from which to begin learning Rails without stumbling over too many details. That’s great. But scaffolding is not for building real applications.

Your users don’t care about the data model. Face it, they just don’t care. They will never interact with the data model. They will never interact with your carefully crafted schema. They interact with the UI. Therefore, it is very important that when you start an application, you start with what the users will care about. Get the UI right. Sketch it out, mock it up, get it real. Once you have a “real” UI to work from, it is amazing how much it can tell you about the application’s domain.

A single screen can tell you more about what models you need and the relationships between them than a hundred-page written specification. A picture really is worth a thousand words. And the remarkable thing is this: the model you infer from the UI is often not what you would have created had you gone for the model first.

Furthermore, working with scaffolding makes it nigh impossible to do test-driven development, whereas working from a UI makes it very, very easy. With scaffolding, what tests would you write first? What is the behavior you want your final product to have? That’s not a very easy question to answer when all you know is the set of models you think your application needs.

When working from a UI, though, you can look at all the elements and data on the page and immediately start seeing what tests you need. “If the user is an administrator and they view the page, they ought to see this link, but otherwise that link is hidden.” BAM, instant test case. And you immediately know you’re going to need (at the very least) “users”, some of whom can be “administrators”.
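
As a rough sketch of how that sentence turns into a check, here is the rule in plain Ruby. Everything here is hypothetical (the User struct, the links_for helper, and the link labels); a real Rails app would express this as a functional or view test instead.

```ruby
# Hypothetical model: a user who may or may not be an administrator.
User = Struct.new(:name, :admin) do
  def admin?
    admin
  end
end

# Hypothetical helper returning the links a given user should see.
def links_for(user)
  links = ["Dashboard"]
  links << "Manage accounts" if user.admin?
  links
end

admin   = User.new("amy", true)
regular = User.new("bob", false)

# The test case read straight off the UI mockup.
puts "admin sees link:   #{links_for(admin).include?("Manage accounts")}"
puts "regular sees link: #{links_for(regular).include?("Manage accounts")}"
```

Note that the “users” and “administrators” concepts fall out of the test, not out of a schema drawn up beforehand.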

I’ll say it again, scaffolding is a great learning tool, like training wheels or parachuting in tandem with an instructor. But when you do the real thing, those training wheels come off. You jump from the plane alone. You design the UI first.

Prolog in Ruby

About a month ago, I began experimenting with Prolog. (If you’re a Mac user wanting to tinker with Prolog, I’d recommend SWI-Prolog. I couldn’t get any other Prolog implementation to build or run on my MacBook Pro.) I’m certainly not an expert now, and I’m not leaving Ruby for Prolog, but I did learn enough to appreciate the power of logic programming. (Curiously, I found that logic programming is very similar to functional programming in some respects.)

How timely, then, was Mauricio Fernandez’s article today about Logic Programming in Ruby.

It is cool stuff, to be sure! Prolog, in Ruby. You could just drop Mauricio’s library into your app and have a logic engine available for you, using a Prolog-esque DSL. (A previous article on a similar topic, but which only described a possible DSL, is here.)

That Prolog DSL in Ruby is an excellent first step. It opens all kinds of doors. The next step, I think, is a way to do logic programming in Ruby, using a Rubyish syntax. Prolog is nice and all, and its syntax (intentionally) mirrors the mathematical syntax of formal logic, but admit it: unless you’re familiar with that formal syntax, the meaning of a Prolog program is about as transparent as a two-year-old Perl program. Consider the following example from Mauricio’s article:

sibling[:X,:Y] <<= [ parent[:Z,:X], parent[:Z,:Y], noteq[:X,:Y] ]
parent[:X,:Y] <<= father[:X,:Y]
parent[:X,:Y] <<= mother[:X,:Y]

father["matz", "Ruby"].fact
mother["Trude", "Sally"].fact
father["Tom", "Sally"].fact
father["Tom", "Erica"].fact
father["Mike", "Tom"].fact

query sibling[:X, "Sally"]

Wouldn’t it be cool if you could define that with something closer to natural language? (Natural language, I know, introduces all kinds of ambiguities, which is why mathematicians use a more rigorous formal language for describing things like logic, but just follow along for a minute.) The following has not been implemented (at least by me), but wouldn’t it be nifty if it worked?

:X.sibling_of(:Y).if :Z.parent_of(:X).and(:Z.parent_of(:Y)).and(:X.noteq(:Y))
:X.parent_of(:Y).if :X.father_of(:Y)
:X.parent_of(:Y).if :X.mother_of(:Y)

"matz".father_of "Ruby"
"Trude".mother_of "Sally"
"Tom".father_of "Sally"
"Tom".father_of "Erica"
"Mike".father_of "Tom"

# returns an Enumerable of the possible solutions
result = :X.sibling_of("Sally").solutions
result.each { |solution| p solution }

Maybe that’s too verbose, or too much syntax. I’m sure it’s a little naive. (The Towers of Hanoi example, for instance, is hard to convert to this kind of syntax.) It’s pretty much off the top of my head, and could no doubt be made better. Nevertheless, I think it reads more naturally than Prolog, and feels more like Ruby.

Perhaps I’ll tinker on this…I’ve got at least one side project that could use a logic engine, and I’d love to use one with a clean, Ruby-esque syntax. If anyone beats me to the punch, though, I won’t be disappointed.
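
To make concrete what the sibling query is asked to compute, here is the same family tree worked out by hand in plain Ruby. This is not a logic engine and not the DSL imagined above: it hard-codes the one rule, and the constant and method names are made up for the illustration.

```ruby
# The facts from the example, as [parent, child] pairs.
FATHERS = [["matz", "Ruby"], ["Tom", "Sally"], ["Tom", "Erica"], ["Mike", "Tom"]]
MOTHERS = [["Trude", "Sally"]]

# parent(X, Y) holds if father(X, Y) or mother(X, Y).
PARENTS = FATHERS + MOTHERS

# sibling(X, person): X shares some parent Z with person, and X != person.
def siblings_of(person)
  parents = PARENTS.select { |_, child| child == person }.map(&:first)
  PARENTS.select { |parent, child| parents.include?(parent) && child != person }
         .map(&:last)
         .uniq
end

p siblings_of("Sally")
```

A real logic engine earns its keep by answering such queries in any direction from the rules alone, without a hand-written method per question.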

Indexing for DB performance

Isn’t Rails great? It makes interacting with your database so easy, and removes almost every vestige of SQL from the development process. You can build and mutate your entire database schema (thanks to ActiveRecord::Migration and ActiveRecord::Schema), go crazy shoving data into your database (with ActiveRecord::Base.create and friends) and query your data in a very friendly Ruby DSL (ActiveRecord::Base#find).

Wonderful! But I think most of us have experienced the puzzlement and frustration of wondering why our application, which ran so beautifully during testing and for the first few days or weeks after launch, is suddenly running slower and slower, and why our database is being so incredibly overworked. What happened?

Chances are, you forgot to add indexes to your tables. Rails won’t (and, honestly, can’t) do it for you. In fact, Rails doesn’t even try to tell you where those indexes might be needed. And without those indexes, the only recourse the database has when fulfilling your query is to do a “full table scan”, basically looking at each row in the table, one at a time, to find all matching records. That’s not too bad when there are only a few tens (or even thousands, on a fast machine) of rows, but when you start getting tens of thousands, hundreds of thousands, or even millions of rows, just imagine how hard your database has to work to satisfy those queries!

So you may be wondering, “alright, I need indexes…how do I know what indexes to create?”

Here are a few general tips. My experience is primarily with MySQL, so that’s where my advice is directed, but I believe most of these tips apply regardless of your DBMS:

  • If you have a foreign key on a table (or, phrased another way, you have a belongs_to, has_many, has_one, or has_and_belongs_to_many association on a model), then you almost certainly need to add an index for it, because any time you access those associations, Rails is generating SQL under the covers that queries based on those foreign keys.
  • If you find yourself frequently doing queries on a non-foreign-key column (like user_name or permalink), you’ll definitely want an index on that column.
  • If you frequently sort on a column or combination of columns, make sure the index that is being used for the query includes those sort columns, too (if at all possible). Indexes store the data in sorted order, so if your index includes the sort column, the database can return the sorted data at almost no extra cost.
  • Many databases (like MySQL, or Postgres prior to 8.1) will only use a single index per table, per query, so make sure you have indexes defined for the column combinations that you will query on frequently. A common mistake is to define an index on “user_name” and an index on “account_id”, and then expect the database to use both indexes to satisfy a query that references both columns. (Some databases will use both indexes, though; be sure to understand how your DBMS uses indexes.)
  • Don’t go crazy defining indexes. It is tempting to just add an index on every column that could conceivably be queried on, just to preemptively destroy any possible DB performance problems that may arise. This is bad. Too many indexes can be just as bad as too few, since the DB has to try and determine which of the myriad indexes to use to satisfy a particular query. Also, indexes consume disk space, and they have to be kept in sync every time an insert, delete, or update statement is executed. Lots of indexes means lots of overhead, so try to strike a good balance. Start with only the indexes you absolutely need, and try to use only those. As problem queries surface, see if they can be rewritten to use existing indexes, and only if they can’t should you go ahead and add indexes to fix them.
  • EXPLAIN (MySQL) or EXPLAIN / EXPLAIN ANALYZE (Postgres) (or whatever means your DB provides) are your best friends. Get to know them. Learn how to read their output. They will tell you what indexes (if any) a query will use, and how the database expects to be able to fulfil the query. It is a good idea to play with these commands during testing, to try and locate problem spots before they become problems. Note, though, that the number of rows in a table can affect how the database chooses indexes, so just because your query looks fine with only a handful of test rows in the database, don’t expect it to perform well when there are thousands of rows. In a perfect world, you could test your app with a large corpus of real data. In an imperfect world, you just have to make do.
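
The difference between a full table scan and an indexed lookup can be felt even in a toy model. The sketch below is only an analogy: real databases keep B-tree indexes on disk, not Ruby hashes, and the Row struct and row counts are invented for the illustration.

```ruby
# Toy "table": an array of rows, in insertion order.
Row = Struct.new(:id, :user_name, :account_id)
rows = (1..50_000).map { |i| Row.new(i, "user#{i}", i % 100) }

# "Full table scan": look at rows one at a time until one matches,
# counting how many rows had to be examined along the way.
def scan(rows, name)
  examined = 0
  match = rows.find do |row|
    examined += 1
    row.user_name == name
  end
  [match, examined]
end

# "Index": a lookup structure keyed on user_name. It is built once, costs
# memory, and must be updated whenever the table changes.
index = rows.each_with_object({}) { |row, h| h[row.user_name] = row }

match, examined = scan(rows, "user49999")
puts "scan found id #{match.id} after examining #{examined} rows"
puts "index found id #{index["user49999"].id} with a single lookup"
```

The comment on the index is the other half of the advice above: every extra index is one more structure to keep in sync on each insert, update, and delete.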

In short, know your database. As convenient as ActiveRecord makes things, never assume you can get along with zero knowledge of SQL and how your database will work. Find a good book about your DBMS of choice. Read up on it. Take the time to educate yourself—it will pay off handsomely in the long run.

Just say "no" to certification

Pat Eyler is looking into designing a certification program, in conjunction with a university course. This really got me thinking.

As a general rule, I believe certifications are a joke. Plain and simple. When I was at BYU, and the mandate came from the suits that we had to drop everything and become Java certified, I saw firsthand what a joke it was. The very idea that a test can, in any way, imply competence is laughable.

Now, I know and respect Pat. He’s got more planned for this than just a test, and that’s great. I certainly commend the idea of a Ruby course. But I have to plead against the introduction of “certification.”

Can certification produce competent programmers? I say “no”. If you are certified and are competent, then you were competent before you were certified. The two have no relation, except insofar as the certification process might ignite the passion of a competent programmer to improve themselves. The problem is that you don’t have to be passionate or competent to take and pass these tests. You just have to be good at memorizing and cargo culting.

Certifications are used primarily by ignorant decision makers as a discriminator. Thus, if someone wants to get noticed by said decision makers, they need to take and pass the test. It’s certification for certification’s sake. This encourages anything but learning. It encourages large-scale mediocrity, caused by people memorizing exactly what the test demands, and nothing more. It encourages learning out of context. It encourages cargo culting, rather than original thinking.

And what happens to the community when this happens? It becomes diluted. The passion gets leeched away. The language becomes inundated by people with little concern for the language itself, or for what they will use it for. They have little care for the community, except insomuch as the community can help them solve their own problems. They take. They demand. They question. They do not give. And the community suffers.

So please, Pat, and anyone else out there that is contemplating a certification program of any sort: don’t do it. By all means, educate, teach, spread the word, and encourage passionate programmers. But don’t certify.

Don't be afraid of harnessing SQL

Even after ten years of working with SQL, I still find myself tickled by how powerful it is, in spite of its warts.

In Basecamp, users can create to-do list “templates”. Each template is essentially just a name, an optional description, and a bunch of items. Once defined, users can create new to-do lists based on one of these templates.

We used to do this entirely via the ActiveRecord helper methods. First, we’d create a new list, and then create the items for the list one at a time, for each item in the template. It looked something like this:

class TodoListTemplate < ActiveRecord::Base
  has_many :todo_item_templates

  def instantiate
    list = TodoList.create(:name => name, :description => description)
    todo_item_templates.each do |item|
      list.todo_items.create :content => item.content
    end
    list
  end
end

This worked, but was very inefficient. It results in a lot of SQL statements being sent down the pipe, mostly because we’ve got some before_create hooks and observers set up that perform work for each new to-do item that is created. As our traffic grew, we started running into deadlock issues. All those hooks and observers, so convenient at the time, were now wreaking havoc on the database.

The problem was easily solved. First of all, a little thought helped me see that those hooks and observers were either not needed in this case, or could be done slightly differently. Secondly, instead of copying each item template to an item, one at a time, we could do it all in SQL, as a single statement. Here’s more or less how we rewrote it:

def instantiate
  list = TodoList.create(:name => name, :description => description)

  TodoItem.connection.insert <<-SQL, "Populating items"
    INSERT INTO todo_items (todo_list_id, content, position, created_at)
      SELECT #{list.id}, content, position, UTC_TIMESTAMP()
        FROM todo_item_templates
       WHERE todo_list_template_id = #{id}
  SQL

  list.todo_items.reset
  list
end

Basically, the INSERT takes the associated SELECT statement, and inserts the results of each returned row into the todo_items table. Not only is this blazing fast, but it is much nicer to the database.

Once everything has been inserted, we call todo_items.reset, to force the todo_items association on the list to be unloaded, and then we return the list.
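
To see the difference in round trips without a database at hand, here is a small sketch with a stand-in connection that only records the statements it is given. The RecordingConnection class, the sample items, and the ids 1 and 7 are all hypothetical; the point is just the statement count.

```ruby
# Hypothetical stand-in for a database connection: it records, never executes.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

template_items = ["Buy milk", "Call plumber", "Ship release"]

# One INSERT per template item, as the first version of instantiate did
# (before counting anything the hooks and observers added on top).
per_row = RecordingConnection.new
template_items.each do |content|
  per_row.execute("INSERT INTO todo_items (todo_list_id, content) VALUES (1, '#{content}')")
end

# A single INSERT ... SELECT, regardless of how many items the template has.
bulk = RecordingConnection.new
bulk.execute(<<-SQL)
  INSERT INTO todo_items (todo_list_id, content)
    SELECT 1, content FROM todo_item_templates WHERE todo_list_template_id = 7
SQL

puts "per-row statements: #{per_row.statements.size}"
puts "bulk statements:    #{bulk.statements.size}"
```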

Your own situation may require more or less logic than this. You may even be completely fine doing everything via ActiveRecord. But if you find your application beginning to flounder in places where you are doing lots of database queries, consider rethinking those areas to consolidate some of that work.

Don’t be afraid of harnessing SQL.

I’ll probably begin publishing these kinds of “best practices” articles to The Rails Way, instead of to this blog. If you want to follow along, be sure to subscribe to that feed, too.
