Thoughts on Libraries and Optimization

NOTE: I have reconsidered large parts of this post since it was written ~4 years ago. Some of these topics are revisited in later posts. I’m leaving this post as-is, essentially, as proof that I am occasionally an idiot.

I’ve been tinkering with C++ lately. I knew it once upon a time, but that was in 1998 and college was teaching a version that wasn’t terribly removed from C. C++14 has been a bit of a learning curve.

An interesting thing I’ve seen has been a tendency for examples to be unreadable messes. This example from Stack Overflow I found particularly curious:

// trim from end
static inline std::string &rtrim(std::string &s) {
	s.erase(std::find_if(s.rbegin(), s.rend(), std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
	return s;
}

Say What?

That’s an anti-pattern. Jamming everything on one line is a cheaty way of making a function look short. It hides complexity and makes the code difficult to interpret. It’s bad. People will claim it helps with compilation and optimization, but we’ll get to the “optimization” excuse later.

I’ve seen that sort of thing a lot (hell, I used to be guilty of it). I actually saw it a lot in the Ruby community, which really turned me off of the language. This particular example took only a few seconds of poking through a particularly gnarly Rails project to find:

pick.members = pick.members.uniq.sort_by{|name| [name.split(' ')[1].downcase, name.split(' ')[0].downcase]}.join(', ')

Again… What? Why not this:

def sorted_members(pick)
	members = pick.members.uniq.sort_by do |name|
		last_then_first(name)
	end
	members.join(', ')
end
 
def last_then_first(name)
	split = name.downcase.split(' ')
	[split[1], split[0]]
end
 
...
pick.members = sorted_members(pick)

Is it more lines? Yep. The programming language doesn’t care about your line count. Is it more readable? Absolutely. Other benefits? Unraveling the single-line bullshit reveals a potential bug: Say you had “Jean Claude Van Damme” in your system. You just sorted him by “Claude Jean.” Good job.
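
To make that bug concrete, here is a rough C++ port of the sort key from the one-liner (C++ only because that is where the rest of this post is headed). The names lower_words and second_then_first are made up for this sketch, and it assumes names are whitespace-separated and have at least two words.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split on whitespace and lowercase each word,
// roughly what name.downcase.split(' ') does in the Ruby version.
inline std::vector<std::string> lower_words(const std::string &name) {
	std::istringstream in(name);
	std::vector<std::string> words;
	std::string word;
	while (in >> word) {
		std::transform(word.begin(), word.end(), word.begin(),
		               [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
		words.push_back(word);
	}
	return words;
}

// The key the one-liner builds: second word, then first word.
// Hypothetical port; at() throws std::out_of_range for names with fewer than two words.
inline std::pair<std::string, std::string> second_then_first(const std::string &name) {
	auto words = lower_words(name);
	return {words.at(1), words.at(0)};
}
```

Calling second_then_first("Jean Claude Van Damme") gives the key ("claude", "jean"). "Van Damme" never enters into it, which is the mis-sort described above.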

Why would you DO that?

The problem with the above examples is that they are not readable. Also, they’re kind of gross. Unreadable code hides bugs. Readability should trump efficiency, and readability should trump line count. Excessive line count is a sign that you should refactor that function, not that you should compress the code.

Let’s fix the C++ example:

#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

static inline void rtrim(std::string &s) {
	auto non_space = std::not1(std::ptr_fun<int, int>(std::isspace));
	auto erasure_start = std::find_if(s.rbegin(), s.rend(), non_space);
	s.erase(erasure_start.base(), s.end());
}

(disclaimer: I did the above in a text editor without checking compilation.)

Okay, we can kind of see what’s going on here now. There’s a unary predicate that matches any non-space character. The function finds the last non-space character by searching the string backwards. Then it erases everything after that character, which is the trailing whitespace, through the end of the string.
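
One caveat on the helpers used here: std::ptr_fun was removed in C++17 and std::not1 in C++20, so the same three steps on a newer compiler need a different predicate. A minimal sketch using a lambda instead (the lambda is my substitution, not part of the Stack Overflow answer):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Trim trailing whitespace in place.
inline void rtrim(std::string &s) {
	// std::isspace expects a value representable as unsigned char,
	// so take the character as unsigned char to avoid undefined
	// behavior on negative char values.
	auto is_non_space = [](unsigned char c) { return !std::isspace(c); };

	// Search backwards for the last character that is not whitespace.
	auto last_non_space = std::find_if(s.rbegin(), s.rend(), is_non_space);

	// base() turns the reverse iterator into a forward iterator pointing
	// one past that character, which is where the trailing whitespace starts.
	s.erase(last_non_space.base(), s.end());
}
```

If the string is all whitespace, the search runs off the front, base() lands on s.begin(), and the whole string is erased.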

It’s better, but it’s still a bit cryptic. You know what’s better?

#include <boost/algorithm/string.hpp>
...
boost::trim_right(your_string);

OMG LIBRARIES

Libraries exist for a reason: They make common tasks faster because someone has already encountered your problem and solved it. This is not a problem you need to solve yourself. You don’t need to write your own trim function.

The “big hammer for a small problem” argument is a fallacy. Unless you have very restrictive performance / memory requirements, the above is fine. You know exactly what it’s doing, you don’t have to roll your own, life is good.

Avoid Premature Optimization

You should avoid nit-picking CPU cycles until you actually have some need to do so. The same goes for memory. In this case, throwing in the Boost library makes things super simple. If it turns out to be a problem, fine, come back and write your own. The developer who sits there writing their own versions of library functions will slow the whole project down, and possibly introduce weird side-effect bugs (particularly if everything’s jammed on one line where nobody will be able to read it).

Do you know how many instructions your CPU can handle in a second? A quick Google and a glance at the Wikipedia article on instructions per second suggest that the CPU in my laptop does something on the order of 120,000 MIPS. That’s 120 billion instructions per second. It has 16GB of RAM, so a 1MB binary isn’t going to bother anyone. Complaining about using a library, or about a string copy instead of a reference here and there, is silly.

Don’t get me wrong, it’s good to care about your code, but there’s taking care to use references instead of copies, and then there’s reinventing the wheel because “eww, that’s a big library.”

An Intel 486DX can do 11.1 MIPS. That’s 11,100,000 instructions per second. Think about that. We’re about 4 orders of magnitude beyond a system where this might be a problem.
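
For anyone who wants to check the “4 orders of magnitude” arithmetic, here is a quick sketch. The two figures are the ones quoted in this post. The constant and function names are just for illustration.

```cpp
#include <cmath>

// Figures quoted in the post, in millions of instructions per second.
constexpr double laptop_mips = 120000.0;
constexpr double i486dx_mips = 11.1;

// How many times more instructions per second the laptop handles.
inline double speedup() { return laptop_mips / i486dx_mips; }

// The same ratio expressed in powers of ten.
inline double orders_of_magnitude() { return std::log10(speedup()); }
```

The ratio comes out a bit under 11,000, so the base-10 logarithm lands just above 4.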

The clarity is worth it, at least until it objectively isn’t. ;)