Performance history about StringBuffer/String with JDKs
Recently in Apache Ant, we had bug #37169: a massive slowdown of the sql task with JDK 5 (36 minutes) where JDK 1.4 took a few seconds. The SQL commands were extremely long, and the bug reporter was nice enough to do some quick debugging and provide an immediate and useful report about the problem. It was due to an innocent use of StringBuffer.toString().endsWith() within a loop.
I have been following the implementation of the JDK classes StringBuffer and String rather closely over the years, and I'm very surprised at how it evolves in each major or even minor release. What is interesting is that the fastest way to do things can become the worst in the next release.
In JDK 1.2, there was the notion of data sharing between a StringBuffer and a String. A StringBuffer.toString() call only wrapped a String object around the buffer's data array, so no memory was copied as long as the String was used within a limited scope. This was a performance necessity, since it was the only way to use methods such as indexOf, startsWith, endsWith, etc.: no memory copying occurred, and the temporary String object created was very small for the VM.
StringBuffer sb = new StringBuffer(10240);
sb.append("something");
String s1 = sb.toString();
// here sb, s1 share the same 10240-character array
sb.setLength(0);
// here s1 still holds a reference to the old array
// but sb just created a new 10240-character array
sb.append("hello");
The caveat, as you may have noticed: if you were reusing a buffer to generate String objects of very heterogeneous lengths, EVERY String created via toString() was holding an array the size of your buffer. So with a buffer size of 10K, as in our example, EVERY String created actually occupied 10K, even if it held only 1 character. This could create massive memory use very quickly. The idiom was to call StringBuffer.substring(0) instead, to get only the real content:
...
String s1 = sb.substring(0); // copies only the real content
sb.setLength(0);
...
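As a minimal sketch of that idiom, the helper below takes the content out of a reused buffer with substring(0) and then resets the buffer. The class name SubstringIdiom and the method drain are hypothetical, chosen only for illustration. The memory benefit described above applies to the old sharing JDKs; on later JDKs the code behaves the same but toString() would copy as well.

```java
public class SubstringIdiom {

    /**
     * Returns the current content of the buffer as an independent String,
     * then clears the buffer so it can be reused.
     * (Hypothetical helper, not part of any JDK or Ant API.)
     */
    static String drain(StringBuffer sb) {
        // substring(0) copies only the characters actually in use,
        // so the returned String does not depend on the buffer's array.
        String content = sb.substring(0);
        // Reset the length; the buffer is never marked as shared,
        // because toString() was not called.
        sb.setLength(0);
        return content;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer(10240);
        sb.append("something");
        String s1 = drain(sb);
        sb.append("hello");
        String s2 = drain(sb);
        System.out.println(s1 + " " + s2); // prints "something hello"
    }
}
```

The point of the design is simply that the String handed out never references the large pre-sized array, whatever the JDK does internally with toString().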
JDK 1.3 introduced a new allocation logic to work around this potentially massive memory use of toString() when reusing a StringBuffer via setLength(0) while keeping String references. The choice was not to create a new char array the same size as the previous one, but rather to create a new 16-character array, defeating the purpose of pre-sizing your StringBuffer. So StringBuffer reuse after calling toString() became virtually useless. The advantage of this logic is that if you kept references to Strings created via toString(), memory use was potentially lower; but if you were storing large content, you had to go through the transparent reallocation mechanism all over again, doing unnecessary memory copying.
String s1 = sb.toString(); // s1 holds a reference to the 10240-char array, even if its content is empty
sb.setLength(0); // unshare the array and create a new 16-char array internally
So here again, the workaround was to use StringBuffer.substring(0) to avoid the sharing mechanism that was getting in your way. (Yes, setLength does its 16-character magic only if the content is shared, so you'll want to avoid toString() calls, which set the status to shared.)
In JDK 1.4 and 1.4.1rc this led to bugs #4524848 and #4724129. (One should note that the allocation mechanism of 1.4.0 was reverted to something similar to JDK 1.2 in JDK 1.4.1rc, but reverted again for 1.4.1-05.)
In JDK 5, the sharing strategy changed again: it is gone entirely, probably wiped out by the massive refactoring that came with the creation of StringBuilder, the introduction of new methods and bug fixing. Sharing matters less now, because the new StringBuffer API provides methods that used to exist only on String, but its removal creates a potential bottleneck for older code. Each and every call to toString() copies the whole data into a new String object, just like substring(0). So if you have a massive StringBuffer and naively use toString().endsWith() to cut it piece by piece, you are copying massive amounts of data in your JVM. It annoyed a couple of people, and there is bug #6219959 open at Sun.
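To illustrate cutting a large buffer piece by piece without calling toString() on the whole content, here is a small sketch that relies on StringBuffer.indexOf(String, int) and StringBuffer.substring(int, int), both available on StringBuffer itself since JDK 1.4. The class name BufferScan, the method split and the ";" delimiter are assumptions made for the example; this is not the code of the Ant sql task.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferScan {

    /**
     * Splits the content of the buffer on a delimiter without ever
     * copying the whole buffer into a String.
     * (Hypothetical helper for illustration only.)
     */
    static List<String> split(StringBuffer sb, String delimiter) {
        if (delimiter.length() == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<String>();
        int start = 0;
        int pos;
        // indexOf searches the buffer in place, starting at 'start'.
        while ((pos = sb.indexOf(delimiter, start)) >= 0) {
            // Only the characters of this piece are copied.
            parts.add(sb.substring(start, pos));
            start = pos + delimiter.length();
        }
        // Keep any trailing content that has no closing delimiter.
        if (start < sb.length()) {
            parts.add(sb.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("select 1;select 2;select 3");
        System.out.println(split(sb, ";")); // prints "[select 1, select 2, select 3]"
    }
}
```

Each piece is copied once, so the total copying stays proportional to the size of the buffer instead of growing with the number of toString() calls.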
In Apache Ant, we do our best to be compatible with a wide range of environments and take care of backward compatibility. So the best solution I found to fix the StringBuffer.toString().endsWith() problem and make it work with every JDK was... to code it myself, by adding another method to our beloved StringUtils class, and of course there is an associated test case.
Apparently our Xerces friends made some fixes for this recently too.
It's interesting that JDK releases can have such a major impact. We're talking about a basic case, a string created from a buffer, while we spend our time building distributed, asynchronous, multithreaded applications on top of many protocols and frameworks, and everything is supposed to work on a wide range of computers with no update problem and we need to provide a good estimate of performance at design time. Wow!
Seriously, sometimes I feel that well-designed and well-developed software is more voodoo than engineering. More certainly it's a craft... and you can be damn proud of your achievement when you make readable, fast and reliable software.
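For readers who want to see what such a method can look like, here is a minimal sketch of an endsWith check that compares characters directly on the buffer. It is an illustration written for this post's argument, not a copy of the Ant source; the class name BufferSuffix is hypothetical. It uses only length() and charAt(), which exist on StringBuffer in every JDK discussed here.

```java
public class BufferSuffix {

    /**
     * Tests whether the buffer ends with the given suffix without
     * creating a String from the buffer.
     * (Illustrative sketch; not the actual Ant implementation.)
     */
    static boolean endsWith(StringBuffer buffer, String suffix) {
        if (suffix.length() > buffer.length()) {
            return false;
        }
        // Compare from the end, since a mismatch on the last
        // character is the most common case in a scanning loop.
        int offset = buffer.length() - suffix.length();
        for (int i = suffix.length() - 1; i >= 0; i--) {
            if (buffer.charAt(offset + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringBuffer sql = new StringBuffer("select * from foo;");
        System.out.println(endsWith(sql, ";"));  // prints "true"
        System.out.println(endsWith(sql, "go")); // prints "false"
    }
}
```

The cost of each call depends only on the length of the suffix, so calling it inside a loop over a huge buffer no longer copies anything, whatever the JDK does in toString().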







Very interesting, Stephane, thanks for the informative post. I’m just working through adding some new StringBuffer and String checks to PMD; for example, there’s this one:
http://sourceforge.net/tracker/index.php?func=detail&aid=1274198&group_id=56262&atid=479924
that finds code like:
String foo = rs.getString("somefield");
if (foo.toUpperCase().equals("GENERIC")) {
}
rather than using String.equalsIgnoreCase(), and this one:
http://sourceforge.net/tracker/index.php?func=detail&aid=1295534&group_id=56262&atid=479924
which finds code like this:
void foo(StringBuffer sb) {
    if (sb.toString().equals("")) { // just use sb.length() == 0 !!
    }
}
Never a dull moment…
Comment by Tom Copeland — November 24, 2005 @ 6:35 am
Also, we’re adding “migration” rules to aid in migrating from one JDK version to another:
http://pmd.sourceforge.net/current/rules/migrating.html
Only a few simple ones in there so far (replace Vector with List, etc), but maybe I can add some of the things you’re talking about here…
Comment by Tom Copeland — November 24, 2005 @ 6:36 am
Tom, that’s cool. Thanks for your informative comment. Keep up the good job on PMD !
Comment by stephane — November 25, 2005 @ 12:32 pm
Thanks!
Comment by Tom Copeland — November 26, 2005 @ 5:14 am