Saturday, December 22, 2007

Strings and StringBuilder optimizations

I keep running across code like this:
public String THIS = "this ";
...
StringBuilder sb = new StringBuilder();
sb.append(THIS);
sb.append("is ").append("a ");
sb.append("test: ");
sb.append(5);

It is hard to read and maintain, and it wastes the developer's time, because there is essentially no benefit to writing it this way. The (Sun) Java compiler performs some optimizations that let coders write nicer-looking code and avoid a lot of extra typing or copy and paste. Let's rewrite the same code the way we would if we were not trying to over-optimize.
  String test = THIS + "is " + "a " + "test: " + 5;
That's easier to read, and the compiler turns it into a StringBuilder anyway. This is easily proven by running the code through a debugger and tracing it.
There is another related trick that the compiler will do for Strings.
  String test = "this " + "is " + "a " + "test: " + 5;
Because every operand is a literal, the whole expression is a compile-time constant, so it becomes "this is a test: 5" in the bytecode (this is in the language spec). Very easy to read and zero cost at runtime.
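Here is a small sketch that makes the difference between the two cases visible. The class and method names are made up for illustration. It relies on the language spec's guarantee that compile-time constant strings are interned, while a concatenation involving a non-constant variable produces a newly created String. The == comparisons are there only to expose that difference; normal code should compare strings with equals().

```java
class ConstantFoldingDemo {
    // Non-final field, mirroring the THIS field from the first example.
    public String THIS = "this ";

    // Every operand is a literal, so this is a compile-time constant.
    static String allLiterals() {
        return "this " + "is " + "a " + "test: " + 5;
    }

    // THIS is not a constant, so the concatenation happens at runtime.
    String withVariable() {
        return THIS + "is " + "a " + "test: " + 5;
    }

    public static void main(String[] args) {
        String literal = "this is a test: 5";
        String folded = allLiterals();
        String built = new ConstantFoldingDemo().withVariable();

        System.out.println(folded.equals(built)); // prints "true": same contents
        System.out.println(folded == literal);    // prints "true": same interned constant
        System.out.println(built == literal);     // prints "false": new String built at runtime
    }
}
```

Either way the resulting text is identical. The only difference is when the work is done.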

As always, the old rule of thumb applies: always optimize last (or never). Readable and maintainable code is far more important than a minor optimization which possibly saves a few nanoseconds (and in this case saves none).

Before anyone gets the wrong impression, there are perfectly viable cases for writing StringBuilders in your code. Creating a string from a large number of smaller strings within a loop is the most common use case. For example:
StringBuilder sb = new StringBuilder();
sb.append("Counting: ");
for (int i = 0; i < count; i++) { // the loop bound was lost from the original post; "count" is a placeholder
sb.append(" ").append(i);
}
String test = sb.toString();

In this case, using a single string and writing += would be more costly, because each += copies everything built so far into a new String. It should be avoided for any code that will run often.
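To make the comparison concrete, here is a rough sketch of both versions side by side. The class name, method names and the count parameter are my own placeholders. Both methods return the same text; the difference is only in how much copying happens along the way.

```java
class LoopConcatDemo {
    // Each += creates a new String and copies the accumulated text into it,
    // so the total copying grows quadratically with the number of iterations.
    static String withPlusEquals(int count) {
        String s = "Counting: ";
        for (int i = 0; i < count; i++) {
            s += " " + i;
        }
        return s;
    }

    // One StringBuilder is reused, so characters are appended in place.
    static String withBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("Counting: ");
        for (int i = 0; i < count; i++) {
            sb.append(" ").append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(3));                           // prints "Counting:  0 1 2"
        System.out.println(withPlusEquals(3).equals(withBuilder(3))); // prints "true"
    }
}
```

For a handful of iterations the difference is not worth worrying about. It only starts to matter when the loop runs many times or the code is called often.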

NOTE: There is a pitfall here that one can fall into fairly easily. This example of what NOT to do shows how easy this is:
StringBuilder sb = new StringBuilder();
sb.append("Counting: ");
for (int i = 0; i < count; i++) { // the loop bound was lost from the original post; "count" is a placeholder
sb.append(" " + i + " ");
}
String test = sb.toString();

In this case, a new StringBuilder is created on every iteration of the loop. Do not use "+" concatenation with variables inside the loop (as in the argument to append above). Whenever a variable is involved, the compiler generates another StringBuilder (concatenating only constants is fine). The cost of creating all those additional StringBuilder objects offsets the gains from using the outer StringBuilder.
sb.append(" " + i + " ");
is more efficiently written (when inside a loop) as:
sb.append(" ").append(i).append(" ");
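The sketch below spells out roughly what the "+" version expands to on a compiler that implements concatenation with StringBuilder, next to the chained version. The class and method names and the count parameter are hypothetical, and the expansion is hand-written to approximate what the compiler does, not copied from real bytecode. Newer JDKs may use a different mechanism for "+" concatenation, but a temporary String is still produced on each pass.

```java
class LoopAppendDemo {
    // The pitfall: "+" with a variable inside append().
    static String withInnerConcat(int count) {
        StringBuilder sb = new StringBuilder("Counting: ");
        for (int i = 0; i < count; i++) {
            sb.append(" " + i + " ");
        }
        return sb.toString();
    }

    // Hand-written approximation of the expansion: an extra builder and an
    // extra temporary String are created on every iteration.
    static String expandedByHand(int count) {
        StringBuilder sb = new StringBuilder("Counting: ");
        for (int i = 0; i < count; i++) {
            String tmp = new StringBuilder().append(" ").append(i).append(" ").toString();
            sb.append(tmp);
        }
        return sb.toString();
    }

    // The chained form appends directly to the one outer builder.
    static String chained(int count) {
        StringBuilder sb = new StringBuilder("Counting: ");
        for (int i = 0; i < count; i++) {
            sb.append(" ").append(i).append(" ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expected = chained(3);
        System.out.println(expected.equals(withInnerConcat(3))); // prints "true"
        System.out.println(expected.equals(expandedByHand(3)));  // prints "true"
    }
}
```

All three produce the same text, so switching to the chained form is a safe mechanical change.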

The rule of thumb I use is that if I am concatenating strings on a single line of code then I use "+", otherwise I use a StringBuilder. I sometimes break that rule if I am just concatenating on 2 or 3 lines and the code would be more readable if I used "+" (since the savings are so small in that case).

For those of you running Java 1.4, the optimization is still there but uses StringBuffer instead.

The key takeaway here is that writing readable and maintainable code should always take priority over writing optimized code, and this is one case where the compiler makes that a little bit easier on the developer.

A few helpful related links for more in depth study:
http://nicklothian.com/blog/2005/06/09/on-java-string-concatenation/
http://java.sun.com/developer/JDCTechTips/2002/tt0305.html
http://www.precisejava.com/javaperf/j2se/StringAndStringBuffer.htm

1 comment:

Patrick said...

Nice post. This is something I've known for a few years, but when enough people tell you of the evil of using '+', you start to doubt yourself. I'm glad I came across your post to reaffirm the reality of the situation.