Perl, Python, Ruby, PHP, C, C++, Lua, tcl, javascript and Java benchmark/comparison.

Understanding the differences between programming languages is crucial. If the wrong language is chosen for a project, it takes a lot of time and effort to change course and re-implement the project, or part of it, in a different language. Typically that means years of effort, misery and dissatisfaction for everyone: yourself, your colleagues, your clients and your systems administrators. Needless to say, it can be dangerous for the business.
Knowing how languages differ from each other is the key to making the right decisions. Environments may have different demands - for example, which language is the best choice for a VPS with limited RAM? Questions like this are not always easy to answer, given the many false beliefs and rumours so common among developers.
This test is designed to demonstrate the differences between popular programming languages.
I hope you find the results of this little research interesting.

Method

The test code grows a text string by appending another string in a loop until it reaches 4 MiB. Each iteration also substitutes some text. Every time the string becomes 256 KiB larger, the program prints the number of seconds elapsed since the beginning of the test. The program's output is piped to a script that captures memory usage (using memstat) for every line printed.
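The capture script itself is not listed in this article. As a rough illustration of the idea only, here is a minimal Python sketch that pairs each progress line with a memory sample. It is an assumption-laden stand-in: it reads the VmRSS field of /proc/<pid>/status on Linux instead of calling memstat, and the names rss_kib and annotate are made up for this example.

```python
import re

def rss_kib(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def annotate(lines, read_status):
    """Pair every progress line with a memory sample taken when the line arrives.

    read_status is any callable returning the current status text, so the
    sampling source (here assumed to be /proc) can be swapped out.
    """
    for line in lines:
        yield line.rstrip("\n"), rss_kib(read_status())
```

In real use, lines would be sys.stdin and read_status would be something like lambda: open("/proc/%d/status" % pid).read(), with pid being the process id of the test program (how that id is obtained is left open here).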

String manipulation is core functionality in all of these languages, so it allows a fair comparison. Processing large strings puts reasonable stress on memory, which brings out the differences in the languages' efficiency.
Because the test case is very simple, it is easy to implement in a similar way in different languages. Obviously the code itself should not be considered practical, because its only purpose is to create some computational load for measurement. Code samples are available for review. All implementations are reasonably accurate and straightforward. Again, a similar amount of work done in a similar way should be considered fair for comparison.

String processing was chosen for a number of reasons.
Most applications don't do long calculations. For serious math the core functionality of any language is not good enough, and using 3rd party math libraries would make the comparison unfair: we want to compare languages, not math libraries.
Moreover, integer calculations are not a good test subject because integer sizes may differ. The accuracy of floating point calculations is affected by default precision and may be even more hardware-dependent. String processing is essential in every application because strings are just data. More data means more stress for garbage collection and so on. Processing large strings is easy to compare because all the languages in this test do the same amount of work.
String processing is also very common - XML(-RPC), HTML, logs, messages, GUI - all of these process strings at a low level, even when the details are hidden from the developer behind an API. String processing is not accelerated by hardware. When processing a large string, languages perform many memory (re)allocations and, if necessary, copy data in memory. The efficiency of these operations is the subject of this test because it shows well enough how the languages differ.

The tests run long enough to compare performance and memory usage without runtime startup time mattering. That's why running each test once is good enough for comparison. During my experiments I ran every test many times and noticed only a small deviation between results. I considered that deviation negligibly small (statistically insignificant), so the final comparison is made from a single run of the test case for each language rather than from the average of multiple runs. Remember that precise numbers are not too important in this test because the relative differences show up very clearly.

During the test I compared speed, memory usage, and performance degradation as the processed data grows. When an application struggles with more data, processing speed suffers, which is an important characteristic to understand.

Only core language functionality has been used for testing.

Originally I wanted to compare only mainstream cross-platform interpreted languages - namely PHP, Perl5, Python, Ruby and Java (Sun's and OpenJDK). Then curiosity made me include C, C++, Javascript ("spidermonkey", Mozilla), Javascript ("V8", Chrome), tcl, Lua and Java GCJ.

Whilst it is interesting to compare languages with each other, Javascript, tcl and Lua fall outside the scope, so I will not compare their features.
Technically C and C++ do not belong here because they are very different in nature from interpreted languages; however, their results are an important baseline to match against.

All tests were conducted on an Intel Core2 Duo T7500 @ 2.20GHz CPU; 2 GB RAM; OS Debian GNU/Linux 2.6.32 i686.
During the tests there was always enough free memory to fully accommodate the running test without swapping, and no resource-hungry applications were running. More accurate results could be gathered by stopping the X server and most other processes for the duration of the testing. The difference between running the same test with or without swap, or with higher priority, was negligibly small, if any. CPU power management was disabled during the tests, so both cores were running at maximum speed.

Defaults were used for all languages except PHP. By default PHP restricts maximum memory usage and maximum execution time (the memory_limit and max_execution_time settings). To complete the test those parameters had to be changed in the PHP runtime configuration.

Compilation time for C, C++ and Java was not counted in this test.

This comparison consists of three parts:
Part 1: Speed.
Part 2: Memory usage.
Part 3: Language features.

October 2011 update: Python v3 added to comparison.

Speed

Execution speed is obviously important for understanding a language. I would say that if you don't consider performance at all, you simply don't care about your application. However, performance alone is not the most important characteristic, so other aspects should be taken into consideration as well.

This table shows the number of seconds taken to complete each stage of the test.
Line size (KiB) Perl5 PHP Ruby Python C++ (g++) C (gcc) Javascript (V8) Javascript (sm) Python3 tcl Lua Java (openJDK) Java (Sun) Java (gcj)
256 2 6 7 7 7 2 3 30 17 33 49 39 38 451
512 7 23 29 32 26 8 21 131 81 141 203 162 157 1783
768 16 54 75 78 60 19 51 300 201 324 480 381 371 3937
1024 27 96 141 144 107 34 91 535 373 583 886 711 696 6952
1280 43 153 225 232 167 53 144 842 598 921 1423 1161 1145 10744
1536 62 227 328 342 242 76 208 1220 877 1334 2090 1751 1739 15372
1792 84 318 452 476 329 104 283 1672 1211 1823 2886 2489 2478 20819
2048 109 424 597 634 431 136 370 2203 1598 2387 3856 3370 3358 27132
2304 139 549 758 815 546 173 469 2799 2039 3030 4963 4453 4448 34302
2560 171 691 941 1019 675 214 578 3463 2533 3753 6198 5710 5719 42330
2816 206 849 1143 1248 817 259 700 4198 3070 4553 7568 7146 7186 51118
3072 245 1022 1366 1497 972 309 834 4997 3659 5422 9084 8852 8983 60779
3328 288 1211 1607 1771 1142 363 979 5875 4300 6378 10759 10784 10916 71275
3584 334 1414 1869 2064 1324 423 1136 6825 4992 7409 12594 12696 12867 82619
3840 384 1634 2150 2381 1522 487 1304 7848 5729 8503 14564 14861 15053 94686
4096 437 1869 2455 2720 1731 555 1484 8928 6534 9680 16674 17262 17426 107887

 

This table has the same results in a more human-readable format (h:mm:ss).
Line size (KiB) Perl5 PHP Ruby Python C++ (g++) C (gcc) Javascript (V8) Javascript (sm) Python3 tcl Lua Java (openJDK) Java (Sun) Java (gcj)
256 0:00:02 0:00:06 0:00:07 0:00:07 0:00:07 0:00:02 0:00:03 0:00:30 0:00:17 0:00:33 0:00:49 0:00:39 0:00:38 0:07:31
512 0:00:07 0:00:23 0:00:29 0:00:32 0:00:26 0:00:08 0:00:21 0:02:11 0:01:21 0:02:21 0:03:23 0:02:42 0:02:37 0:29:43
768 0:00:16 0:00:54 0:01:15 0:01:18 0:01:00 0:00:19 0:00:51 0:05:00 0:03:21 0:05:24 0:08:00 0:06:21 0:06:11 1:05:37
1024 0:00:27 0:01:36 0:02:21 0:02:24 0:01:47 0:00:34 0:01:31 0:08:55 0:06:13 0:09:43 0:14:46 0:11:51 0:11:36 1:55:52
1280 0:00:43 0:02:33 0:03:45 0:03:52 0:02:47 0:00:53 0:02:24 0:14:02 0:09:58 0:15:21 0:23:43 0:19:21 0:19:05 2:59:04
1536 0:01:02 0:03:47 0:05:28 0:05:42 0:04:02 0:01:16 0:03:28 0:20:20 0:14:37 0:22:14 0:34:50 0:29:11 0:28:59 4:16:12
1792 0:01:24 0:05:18 0:07:32 0:07:56 0:05:29 0:01:44 0:04:43 0:27:52 0:20:11 0:30:23 0:48:06 0:41:29 0:41:18 5:46:59
2048 0:01:49 0:07:04 0:09:57 0:10:34 0:07:11 0:02:16 0:06:10 0:36:43 0:26:38 0:39:47 1:04:16 0:56:10 0:55:58 7:32:12
2304 0:02:19 0:09:09 0:12:38 0:13:35 0:09:06 0:02:53 0:07:49 0:46:39 0:33:59 0:50:30 1:22:43 1:14:13 1:14:08 9:31:42
2560 0:02:51 0:11:31 0:15:41 0:16:59 0:11:15 0:03:34 0:09:38 0:57:43 0:42:13 1:02:33 1:43:18 1:35:10 1:35:19 11:45:30
2816 0:03:26 0:14:09 0:19:03 0:20:48 0:13:37 0:04:19 0:11:40 1:09:58 0:51:10 1:15:53 2:06:08 1:59:06 1:59:46 14:11:58
3072 0:04:05 0:17:02 0:22:46 0:24:57 0:16:12 0:05:09 0:13:54 1:23:17 1:00:59 1:30:22 2:31:24 2:27:32 2:29:43 16:52:59
3328 0:04:48 0:20:11 0:26:47 0:29:31 0:19:02 0:06:03 0:16:19 1:37:55 1:11:40 1:46:18 2:59:19 2:59:44 3:01:56 19:47:55
3584 0:05:34 0:23:34 0:31:09 0:34:24 0:22:04 0:07:03 0:18:56 1:53:45 1:23:12 2:03:29 3:29:54 3:31:36 3:34:27 22:56:59
3840 0:06:24 0:27:14 0:35:50 0:39:41 0:25:22 0:08:07 0:21:44 2:10:48 1:35:29 2:21:43 4:02:44 4:07:41 4:10:53 26:18:06
4096 0:07:17 0:31:09 0:40:55 0:45:20 0:28:51 0:09:15 0:24:44 2:28:48 1:48:54 2:41:20 4:37:54 4:47:42 4:50:26 29:58:07
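The conversion between the two tables is plain arithmetic. A small Python sketch (the function name hms is only illustrative) that turns a value from the seconds table into the h:mm:ss form used above:

```python
def hms(seconds):
    """Format a whole number of seconds as h:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)

# Final-stage values taken from the seconds table above.
print(hms(437))     # Perl5
print(hms(107887))  # Java (gcj)
```

For the final stage this gives 0:07:17 for Perl5 and 29:58:07 for Java (gcj), matching the last row of the table.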

[Figure: Speed (seconds)]

The speed results fall into four categories:
Slowest: Java gcj (native executable)
Slow: Java (openJDK); Java (Sun); Lua
Not-so-fast: tcl; Javascript (spidermonkey)
Fastest: Python; Ruby; PHP; C++; Javascript V8; C; Perl5

As you can see from the performance graph, processing slows down as the test string grows. The more a curve bends upwards, the more performance degrades. The graph reveals that the performance of Java and Lua degrades dramatically.
All tested languages are good at manipulating small strings, but as the processed data grows the difference shows itself.
The Slow group [Java, Lua] suffers from severe performance degradation.
There is almost no difference in performance between OpenJDK Java and Sun Java. Lua's performance is very close to Java's.
Initially the GCJ Java interpreter crashed during the test; however, GCJ can compile Java code to a native executable, which completed the test, albeit awfully slowly. Here and below, unqualified "Java" means only mainstream Sun/OpenJDK Java.

Let's have a closer look at Fastest group:
[Figure: Speed (seconds), closer look at the Fastest group]

Python, Ruby and PHP are slightly slower than C++. This is not a surprise because those languages are optimised well enough.
Javascript V8 completed the test slightly faster than C++.
This group of languages shows a moderate slowdown, while the performance of C and Perl5 is almost a flat line on the graph, indicating very little degradation. It means that C and Perl5 process an increasing amount of data at (almost) constant speed.
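The tables list cumulative times. To see how much each additional 256 KiB stage cost, the running totals can be differenced. A minimal Python sketch using the Perl5 and Java (openJDK) columns from the seconds table; the function name stage_times is made up for illustration:

```python
def stage_times(cumulative):
    """Turn cumulative timings into the time spent on each 256 KiB stage."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Columns copied from the seconds table above.
perl5 = [2, 7, 16, 27, 43, 62, 84, 109, 139, 171, 206, 245, 288, 334, 384, 437]
openjdk = [39, 162, 381, 711, 1161, 1751, 2489, 3370, 4453, 5710,
           7146, 8852, 10784, 12696, 14861, 17262]

print(stage_times(perl5))
print(stage_times(openjdk))
```

The last stage took 53 seconds for Perl5 and 2401 seconds for Java (openJDK), against 2 and 39 seconds respectively for the first stage.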

Unexpected result: somehow Perl5 managed to finish faster than C. This came as a surprise which I found difficult to explain. Probably Perl does fewer memory reallocations to accommodate string growth.
I haven't done serious coding in C since 1995, but the implementation is quite simple and straightforward, so the test result stands.

Perl5 is a clear winner, needing just a little more than 7 minutes to finish the test, against Java's worst result of nearly 5 hours for the same work. (The worst result of all, GCJ Java at almost 30 hours, isn't worth comparing against.)
Perl5 is not only superior in performance, it also shows very little slowdown on larger data. This is as close to C (compiled to machine code) as a scripting language can get. Absolutely amazing!
It is interesting to note that with "use strict;" Perl completed the same test ~6 seconds quicker.

In the table below Perl5 is taken as 1 and the other languages' performance is measured in Perls, so you can see how many times slower a particular language is than Perl5 in this test. Because of performance degradation it would be incorrect to say something like "This is twice as fast as That". Some languages' performance degrades faster than others', so at the beginning of this test Java is about 20 times slower than Perl5 and at the end Java is about 40 times slower (for the same amount of data).
Clearly this is an important characteristic - size matters! This corresponds with observations of some Java applications which behave well under light load and degrade sharply as the load increases.

Relative speed: Perl5 (fastest) taken as 1.
Line size (KiB) Perl5 PHP Ruby Python C++ (g++) C (gcc) Javascript (V8) Javascript (sm) Python3 tcl Lua Java (openJDK) Java (Sun) Java (gcj)
256 1 3.00 3.50 3.50 3.50 1.00 1.50 15.00 8.50 16.50 24.50 19.50 19.00 225.50
512 1 3.29 4.14 4.57 3.71 1.14 3.00 18.71 11.57 20.14 29.00 23.14 22.43 254.71
768 1 3.38 4.69 4.88 3.75 1.19 3.19 18.75 12.56 20.25 30.00 23.81 23.19 246.06
1024 1 3.56 5.22 5.33 3.96 1.26 3.37 19.81 13.81 21.59 32.81 26.33 25.78 257.48
1280 1 3.56 5.23 5.40 3.88 1.23 3.35 19.58 13.91 21.42 33.09 27.00 26.63 249.86
1536 1 3.66 5.29 5.52 3.90 1.23 3.35 19.68 14.15 21.52 33.71 28.24 28.05 247.94
1792 1 3.79 5.38 5.67 3.92 1.24 3.37 19.90 14.42 21.70 34.36 29.63 29.50 247.85
2048 1 3.89 5.48 5.82 3.95 1.25 3.39 20.21 14.66 21.90 35.38 30.92 30.81 248.92
2304 1 3.95 5.45 5.86 3.93 1.24 3.37 20.14 14.67 21.80 35.71 32.04 32.00 246.78
2560 1 4.04 5.50 5.96 3.95 1.25 3.38 20.25 14.81 21.95 36.25 33.39 33.44 247.54
2816 1 4.12 5.55 6.06 3.97 1.26 3.40 20.38 14.90 22.10 36.74 34.69 34.88 248.15
3072 1 4.17 5.58 6.11 3.97 1.26 3.40 20.40 14.93 22.13 37.08 36.13 36.67 248.08
3328 1 4.20 5.58 6.15 3.97 1.26 3.40 20.40 14.93 22.15 37.36 37.44 37.90 247.48
3584 1 4.23 5.60 6.18 3.96 1.27 3.40 20.43 14.95 22.18 37.71 38.01 38.52 247.36
3840 1 4.26 5.60 6.20 3.96 1.27 3.40 20.44 14.92 22.14 37.93 38.70 39.20 246.58
4096 1 4.28 5.62 6.22 3.96 1.27 3.40 20.43 14.95 22.15 38.16 39.50 39.88 246.88
Average: 1 3.84 5.21 5.59 3.89 1.23 3.23 19.66 13.92 21.35 34.36 31.16 31.12 247.32
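Each figure in the relative table is simply a language's time divided by Perl5's time for the same line size. A short Python sketch of that calculation, using the first and last rows of the seconds table for Perl5 and Java (openJDK); the function name relative_to is illustrative only:

```python
def relative_to(baseline, other):
    """Express each timing in `other` as a multiple of the baseline timing."""
    return [round(o / b, 2) for b, o in zip(baseline, other)]

# First (256 KiB) and last (4096 KiB) rows of the seconds table.
perl5 = [2, 437]
openjdk = [39, 17262]

print(relative_to(perl5, openjdk))
```

This reproduces the 19.50 and 39.50 entries in the Java (openJDK) column, which is the "about 20 times" to "about 40 times" spread mentioned above.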

Memory usage

During testing, memory usage was captured at every completed step.

Memory usage
Line size (KiB) C (gcc) C++ (G++) Perl5 Python Python3 Ruby Lua tcl PHP Javascript (sm) Javascript (V8) Java (gcj) Java (OpenJDK) Java (Sun)
0 1,668 2,932 4,776 5,352 10,328 11,040 2,416 1,236 36,752 7,720 39,272 49,156 724,832 658,560
256 1,928 3,444 5,052 6,384 13,404 9,620 3,960 13,696 38,040 50,664 47,236 68,320 725,852 661,056
512 2,184 3,956 5,308 5,876 16,476 11,672 5,404 14,720 39,064 29,672 47,636 76,200 725,852 661,056
768 2,440 3,956 5,564 7,676 19,548 7,328 6,428 18,052 40,088 16,872 49,404 84,392 725,852 661,056
1024 2,696 4,980 5,820 6,388 14,420 12,704 7,820 14,716 41,112 53,224 46,540 92,584 725,852 661,056
1280 2,952 4,980 6,076 9,212 15,444 8,604 6,104 15,228 42,136 44,520 47,044 110,072 725,852 661,056
1536 3,208 4,980 6,332 6,900 16,468 11,164 10,572 18,816 43,160 21,480 50,124 118,264 725,852 662,080
1792 3,464 4,980 6,588 7,156 17,492 8,856 11,812 16,252 44,184 38,376 51,916 126,976 725,852 662,080
2048 3,720 7,028 6,844 11,516 18,516 13,724 10,908 16,764 45,208 51,176 47,540 126,976 725,852 662,080
2304 3,976 7,028 7,100 7,668 19,540 12,700 6,644 17,276 46,232 38,376 46,252 161,824 725,852 662,080
2560 4,232 7,028 7,356 7,924 20,564 11,160 15,592 22,912 41,876 41,960 44,452 161,824 725,852 662,080
2816 4,488 7,028 7,612 8,180 21,588 14,748 16,848 18,300 42,388 79,336 50,612 161,824 725,852 662,080
3072 4,744 7,028 7,868 8,436 22,612 15,772 15,716 18,812 49,304 73,704 51,636 161,824 725,852 662,080
3328 5,000 7,028 8,124 8,692 23,636 16,796 19,492 19,324 50,328 39,400 55,996 170,536 725,852 662,080
3584 5,256 7,028 8,380 12,536 24,660 17,820 17,072 19,840 43,924 27,624 46,500 170,536 725,852 662,080
3840 5,512 7,028 8,636 9,204 25,684 18,844 23,276 20,348 44,436 29,160 58,556 170,536 725,852 662,080
4096 5,768 11,124 8,892 9,460 26,708 15,768 20,200 20,860 44,948 96,232 59,836 170,536 725,852 662,080

Memory usage graph - "mainstream" Java is not shown because of its constantly high usage.
[Figure: Memory usage]

The results fall into five categories:
Highest: Java OpenJDK, Java Sun
High: Java GCJ
Medium: Javascript V8, Javascript sm., PHP
Low: tcl, Lua, Ruby
Lowest: Python, Perl5, C++, C

Highest group - by default mainstream Java pre-allocates a fairly big chunk of memory (a certain percentage) and does its memory management inside this chunk. During this test its memory usage didn't change and was constantly high, so it is not shown on the graph: if included, it would make all other results appear as flat lines well below.
To capture internal memory usage I added print statements to the Java code to show internal memory usage as the string grows. (This doesn't affect performance.) Unfortunately the printed numbers show no correspondence with string growth. This shows that Java garbage collection works completely independently of application code. The numbers appeared to be random, sometimes as high as 95% of pre-allocated memory. Even though internal memory usage did not correspond with the string size, it seems that Java sometimes uses nearly all of its memory before garbage collection (GC) releases some of it.
Java memory management appears to be extremely inefficient, which seems to be the primary cause of the poor performance. I leave further investigation with specific Java-monitoring tools to those who might find it interesting. Java professionals may also try to improve the results by fine-tuning miscellaneous GC parameters.

High group - Java GCJ compiled to a native executable. Thanks to this special feature GCJ Java demonstrated predictable behaviour, with memory allocation growing together with the data processed. Compared with other non-Java runtimes, its memory utilisation is huge.

Medium group: Javascript demonstrates more or less consistent growth in memory usage as the data grows. PHP shows very little growth, but its heavy runtime uses a lot of memory from the very beginning. Despite its initial requirements, PHP uses memory pretty wisely. High memory usage upon startup is not necessarily a bad thing: for a long-running process it may be OK to pre-load common libraries. However, this may limit PHP's use on a VPS, i.e. when available memory is limited.

Let's have a closer look at the Low and Lowest groups:
[Figure: Memory usage, magnified]

The Lua and tcl runtimes are tiny, but their memory management is not very effective. Ruby used more memory than Python. Python utilises memory almost as well as Perl5 - perhaps their runtimes are almost the same size. Once again Perl5 performed amazingly well, demonstrating behaviour very similar to C - the best among scripting languages. As expected, C++ memory usage is roughly between C and Perl5.

As we did in the speed test, let's take Perl5 as 1 and see how the other languages' memory usage compares at every step and on average.

Memory usage in Perls + average
Line size (KiB) C (gcc) C++ (G++) Perl5 Python Python3 Ruby Lua tcl PHP Javascript (sm) Javascript (V8) Java (gcj) Java (OpenJDK) Java (Sun)
0 0.35 0.61 1 1.12 2.16 2.31 0.51 0.23 7.70 1.62 8.22 10.29 151.77 137.89
256 0.38 0.68 1 1.26 2.65 1.90 0.78 2.15 7.53 10.03 9.35 13.52 143.68 130.85
512 0.41 0.75 1 1.11 3.10 2.20 1.02 2.51 7.36 5.59 8.97 14.36 136.75 124.54
768 0.44 0.71 1 1.38 3.51 1.32 1.16 2.35 7.20 3.03 8.88 15.17 130.46 118.81
1024 0.46 0.86 1 1.10 2.48 2.18 1.34 2.30 7.06 9.15 8.00 15.91 124.72 113.58
1280 0.49 0.82 1 1.52 2.54 1.42 1.00 1.65 6.93 7.33 7.74 18.12 119.46 108.80
1536 0.51 0.79 1 1.09 2.60 1.76 1.67 2.73 6.82 3.39 7.92 18.68 114.63 104.56
1792 0.53 0.76 1 1.09 2.66 1.34 1.79 2.27 6.71 5.83 7.88 19.27 110.18 100.50
2048 0.54 1.03 1 1.68 2.71 2.01 1.59 1.46 6.61 7.48 6.95 18.55 106.06 96.74
2304 0.56 0.99 1 1.08 2.75 1.79 0.94 2.25 6.51 5.41 6.51 22.79 102.23 93.25
2560 0.58 0.96 1 1.08 2.80 1.52 2.12 2.89 5.69 5.70 6.04 22.00 98.67 90.01
2816 0.59 0.92 1 1.07 2.84 1.94 2.21 2.24 5.57 10.42 6.65 21.26 95.36 86.98
3072 0.60 0.89 1 1.07 2.87 2.00 2.00 2.23 6.27 9.37 6.56 20.57 92.25 84.15
3328 0.62 0.87 1 1.07 2.91 2.07 2.40 2.22 6.19 4.85 6.89 20.99 89.35 81.50
3584 0.63 0.84 1 1.50 2.94 2.13 2.04 1.58 5.24 3.30 5.55 20.35 86.62 79.01
3840 0.64 0.81 1 1.07 2.97 2.18 2.70 2.21 5.15 3.38 6.78 19.75 84.05 76.67
4096 0.65 1.25 1 1.06 3.00 1.77 2.27 2.21 5.05 10.82 6.73 19.18 81.63 74.46
Average: 0.53 0.85 1 1.20 2.79 1.87 1.62 2.09 6.45 6.28 7.39 18.28 109.87 100.13

The environment where applications run may have certain memory limits. This is true not only for popular Virtual Private Servers (VPS), where the amount of RAM can sometimes be as little as 128 MB for the OS and all applications/services, but also for embedded devices and heavily loaded servers.
A good understanding of memory utilisation is just as important to consider as speed.

Read more after the code section below.

Source code and test results

C (source); Result: C gcc (Debian 4.4.4-1) 4.4.4

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(){

setbuf(stdout,NULL); //disable output buffering

char *str=malloc(17);       //16 characters plus the terminating NUL
strcpy(str,"abcdefgh");

strcat(str,"efghefgh");     //sprintf(str,"%s%s",str,"efghefgh");

int imax=1024/strlen(str)*1024*4;

printf("%s","exec.tm.sec\tstr.length\n"); //fflush(stdout);

time_t starttime=time(NULL);
char *gstr=calloc(1,1);     //empty NUL-terminated string, so the first strcat() is well defined
int i=0;
char *pos;
int lngth;

char *pos_c=gstr;
int str_len=strlen(str);

    while(i++ < imax+1000){
        lngth=strlen(str)*i;
        gstr=realloc(gstr,lngth+str_len);
        strcat(gstr,str);    //sprintf(gstr,"%s%s",gstr,str);
        pos_c+=str_len;

        pos=gstr;
        while(pos=strstr(pos,"efgh")){
            memcpy(pos,"____",4);
        }

        if(lngth % (1024*256)==0){
            printf("%ldsec\t\t%dkb\n",(long)(time(NULL)-starttime),lngth/1024); //fflush(stdout);
        }
    }
//printf("%s\n",gstr);

}

C++ (source) Result: C++ g++ (Debian 4.4.3-7) 4.4.3


#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main ()
{
  string str = "abcdefgh";
    str += "efghefgh";
  int imax = 1024 /str.length() * 1024 *4;
  time_t currentTime = time(NULL);
  cout << "exec.tm.sec\tstr.length" << endl;

  string find= "efgh";
  string replace ="____";
  string gstr;
  int i=0;
  int length;
//  int end=0; //  size_t end=0;

  while(i++ < imax +1000){
    gstr += str;
    gstr = gstr;
    size_t start, sizeSearch=find.size(), end=0;

    while((start=gstr.find(find,end))!=string::npos){
        end=start+sizeSearch;
        gstr.replace(start,sizeSearch,replace);
    }
    length = str.length()*i;
    if((length%(1024 * 256))==0){
        cout << time(NULL) - currentTime << "sec\t\t" << length/1024 << "kb" <<  endl;
    }
  }
// cout << gstr << endl;

return 0;
}

Javascript (source); Results: Javascript (Spidermonkey - Mozilla) 1.8.0 pre-release 1 2007-10-03,
Javascript (V8 - Chrome)

#!/usr/local/bin/js

var str = "abcdefgh"+"efghefgh";
var imax = 1024 / str.length * 1024 * 4;

var time = new Date();
print("exec.tm.sec\tstr.length");

var gstr = "";
var i=0;
var lngth;

while (i++ < imax+1000) {
    gstr += str;
    gstr = gstr.replace(/efgh/g, "____");
        lngth=str.length*i;
        if ((lngth % (1024*256)) == 0) {
                var curdate=new Date();
                print(parseInt(((curdate.getTime()-time.getTime())/1000))+"sec\t\t"+lngth/1024+"kb");
        }
}

Java (source) Results: Java (OpenJDK) "1.6.0 18",
Java (Sun) "1.6.0 16",
Java (gcj) (Debian 4.4.3-1) 4.4.3

public class java_test {

    public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;

        long time = System.currentTimeMillis();
        System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
        Runtime runtime = Runtime.getRuntime();
        System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);

        String gstr = "";
        int i=0;
        int lngth;

        while (i++ < imax+1000) {
            gstr += str;
            gstr = gstr.replaceAll("efgh", "____");
            lngth=str.length()*i;
                if ((lngth % (1024*256)) == 0) {
                        System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
                }
        }
    }
}

Perl5 (source); Result: This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi

#!/usr/bin/perl
$|=1;    #disable output buffering, this is necessary for proper output through pipe

my $str='abcdefgh'.'efghefgh';
my $imax=1024/length($str)*1024*4;               # 4mb

my $starttime=time();
print "exec.tm.sec\tstr.length\n";

my $gstr='';
my $i=0;

while($i++ < $imax+1000){   #adding 1000 iterations to delay exit. This will allow to capture memory usage on last step

        $gstr.=$str;
        $gstr=~s/efgh/____/g;
        my $lngth=length($str)*$i;   ##     my $lngth=length($gstr);        # Perhaps that would be a slower way
        print time()-$starttime,"sec\t\t",$lngth/1024,"kb\n" unless $lngth % (1024*256); #print out every 256kb
}

PHP (source); Result: PHP 5.3.1-5 with Suhosin-Patch (cgi-fcgi) (built: Feb 22 2010 17:38:41)

<?php


$str="abcdefgh"."efghefgh";
$imax=1024/strlen($str)*1024*4;      # 4mb

$starttime=time();
print("exec.tm.sec\tstr.length\n");

$gstr='';
$i=0;

while($i++ < $imax+1000){

        $gstr.=$str;
        $gstr=preg_replace('/efgh/','____',$gstr);
        $lngth=strlen($str)*$i;
        if($lngth % (1024*256)==0){
                print (time()-$starttime."sec\t\t".($lngth/1024)."kb\n");
        }
}

?>

Python (source); Result: Python 2.5.5

#!/usr/bin/python -u
import re
import time
import sys

str='abcdefgh'+'efghefgh'
imax=1024/len(str)*1024*4   # 4mb

starttime=time.time();
print "exec.tm.sec\tstr.length"
sys.stdout.flush()

gstr=''
i=0

while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
                print int(time.time()-starttime),"sec\t\t",(lngth/1024),"kb"
                sys.stdout.flush()

Python3 (source); Result: Python 3.1.3

#!/usr/bin/python3 -u
import re
import time
import sys

str='abcdefgh'+'efghefgh'
imax=1024//len(str)*1024*4   # 4mb

starttime=time.time();
print("exec.tm.sec\tstr.length")
sys.stdout.flush()

gstr=''
i=0

while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
                print(int(time.time()-starttime),"sec\t\t",(lngth//1024),"kb")
                sys.stdout.flush()

Ruby (source); Result: ruby 1.8.7 (2010-01-10 patchlevel 249) i486-linux

#!/usr/bin/ruby
$stdout.sync=true;

str='abcdefgh'+'efghefgh';
imax=1024/str.length*1024*4;       # 4mb

starttime=Time.new;
print("exec.tm.sec\tstr.length\n");

gstr='';
i=0;

while i < imax+1000
        i=i+1;
        gstr+=str;
        gstr=gstr.gsub(/efgh/, "____")

        lngth=str.length*i;
        if(lngth % (1024*256)==0)
                print(((Time.new-starttime).ceil).to_s+"sec\t\t",(lngth/1024).to_s,"kb\n");
        end
end

#puts gstr;

Lua (source); Result: Lua 5.1.4

#!/usr/bin/lua

io.stdout:setvbuf "no";             --  io.flush();

str='abcdefgh'..'efghefgh';
imax=1024/string.len(str)*1024*4;         -- 4mb

starttime=os.time();
print "exec.tm.sec\tstr.length";

gstr='';
i=0;

while i < imax+1000 do
        i=i+1;
        gstr=gstr..str;
        gstr=string.gsub(gstr,"efgh","____");
        lngth=string.len(str)*i;
        if(math.mod(lngth,1024*256)==0) then
                print(os.time()-starttime.."sec\t\t"..(lngth/1024).."kb");
        end
end



tcl (source); Result: tcl 8.4.19

#!/usr/bin/tclsh

set str "abcdefgh"
append str "efghefgh"

set imax [expr {1024/[string length $str]*1024*4}]

set starttime [clock clicks -milliseconds]
puts "exec.tm.sec\tstr.length";

set gstr ""
set i 0

while {$i<[expr {$imax+1000}]} {
        incr i
        append gstr $str;
        regsub -all {efgh} $gstr ____ gstr
        set lngth [expr {[string length $str]*$i}]
        if {[expr {$lngth % (1024*256)}] == 0} {
                puts "[expr int([expr [clock clicks -milliseconds] - $starttime] / 1000)]sec\t\t[expr {$lngth/1024}]kb"
        }
}

exit


June 2011 Update: One bright Java developer felt that I was bashing Java, so he decided to optimise the Java test. Initially I was sceptical about it because two other Java programmers had failed to do so.
As you may have noted from the source code, for high-level languages I use a regular expression to substitute a substring on each iteration.
However, when I decided to include C and C++ in the test case, the regex was replaced with the traditional "moving window" technique, where the search for the substring starts from a position calculated on the previous step instead of scanning the whole growing string every time.
This approach was chosen because regular expressions are not part of the core functionality of C/C++, and also because for low-level languages this seems to be the natural way to do substitution.
Unfortunately this affected the fairness of the comparison. (Perhaps all tests should have used indexed substitutions.)
The fact that the C++ example uses "moving window" substitution instead of a regular expression allows the Java code to be rewritten as in the following example:

public class java_test_optm {

    public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;

    long time = System.currentTimeMillis();
    System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
    Runtime runtime = Runtime.getRuntime();
    System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);

    final StringBuilder gstr = new StringBuilder();
    int i=0;
    int lngth;

        while (i++ < imax+1000) {
            gstr.append(str);

            int startIndx = gstr.indexOf("efgh");
            while(startIndx != -1){
                gstr.replace(startIndx, startIndx + 4, "____");
                startIndx = gstr.indexOf("efgh", startIndx + 4);
            }

        lngth=str.length()*i;
        if ((lngth % (1024*256)) == 0) {
            System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
        }
        }
    }
}

/*
exec.tm.sec     str.length      allocated memory:free memory:memory used
0               0               32320:32103:216
2sec            256kb           32320:29420:2899
9sec            512kb           32320:29033:3286
21sec           768kb           32320:28250:4069
38sec           1024kb          32320:26692:5627
59sec           1280kb          32320:23612:8707
85sec           1536kb          32320:22116:10203
116sec          1792kb          32320:23647:8672
153sec          2048kb          32320:22101:10218
194sec          2304kb          32000:14067:17932
240sec          2560kb          32000:12571:19428
292sec          2816kb          32192:14283:17908
348sec          3072kb          32192:12713:19478
410sec          3328kb          32064:14356:17707
477sec          3584kb          32064:12827:19236
549sec          3840kb          32128:14615:17512
626sec          4096kb          32128:13095:19032
*/

Surprisingly, this took the stress off garbage collection, allowing Java to finish the test in only 626 seconds. (Thanks, Brian Bason!)
However, IMHO this rather proves that Java is ineffective and overcomplicated: even with all the expertise and effort required to optimise the Java test case, Perl code modified to use moving-window substitution completed the test in less than 2 seconds, some 300+ times faster than Java.
Once again, to achieve reasonable performance Java requires a low-level approach, which is not only labour-intensive but still can't compete with the speed of other languages.
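
The modified Perl code is not shown here, so the following is only one possible reading of "moving window substitution", sketched in Python: each round scans just the newly appended chunk plus the few trailing characters that could begin a match across the boundary, instead of rescanning the whole string. The function name and parameters are hypothetical. The sketch assumes, as in this test, that the search string cannot overlap itself and that the replacement cannot form a new match.

```python
def grow_with_moving_window(chunk, needle, replacement, iterations):
    """Append `chunk` repeatedly, substituting `needle` only in the newest text."""
    overlap = len(needle) - 1   # a match can straddle the boundary by at most this much
    done = []                   # finished pieces that are never scanned again
    tail = ""                   # trailing characters carried into the next round
    for _ in range(iterations):
        window = (tail + chunk).replace(needle, replacement)
        split_at = max(len(window) - overlap, 0)
        done.append(window[:split_at])
        tail = window[split_at:]
    done.append(tail)
    return "".join(done)


chunk = "abcdefghefghefgh"
result = grow_with_moving_window(chunk, "efgh", "____", 1000)
# Same text as substituting over the whole string at once
print(result == (chunk * 1000).replace("efgh", "____"))
```

The work per round depends on the chunk size rather than on the total string length, which is why this kind of approach scales so differently from a full rescan.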

Language features

Sometimes comfort and speed of development may outweigh performance and memory usage. In other words, performance and memory usage may sometimes be sacrificed in favour of quicker or easier development. For example, it is understandable if a higher-level language is chosen over C in order to benefit from automatic memory management. In this section I'm going to briefly scratch the surface of comparing language features.
While it's quite a philosophical statement, language features play an important role in development.
Let's see how easily we can parse an integer value from a text string in popular languages. This task only looks straightforward. In fact there are plenty of caveats.

In Java we could do something like

//Java
    int val;
    val = Integer.parseInt("10000000000");
 
But there are problems. The example above will not only fail to parse the correct value but will actually crash the entire application because of an unhandled exception. Gotchas like this may bite you when you do not expect them. In this Java example
//Java
    val = Integer.parseInt("-10");   //this will work
    val = Integer.parseInt("+10");   //but not this - silly!
 
parsing an integer from "+10" crashes the application (in the Java version tested; Java 7 and later accept a leading "+"). To emulate this behaviour in PHP or Perl we have to create the point of failure explicitly:
$val=intval($str) or die("it didn't work");
In Java pretty much any call that does something can be a failure point unless it is enclosed within ugly try-catch statements. So to avoid a crash we have to wrap "dangerous" operations like this:
//Java
 try {
        val = Integer.parseInt(str);
 } catch (NumberFormatException nx) {
        //it didn't work, do something about it here
 }
 
In effect, try-catch is a fancy syntax for if-else. A similar operation in PHP will not crash, but we can wrap it with if-else to make sure the number was parsed successfully.
#PHP
 if($val=intval($str)){     # please note the "zero case" caveat: in PHP and Perl 0 is 'false',
    print $val;             # so this branch is skipped when the input string is '0' (zero)
 }
 
Python and Ruby share Java's fatal behaviour. Is that good? Perhaps sometimes. However, in many cases returning something is better than returning nothing. The application may not do exactly what is expected, but that may be considered better than a crash. Perhaps you want your application to keep running despite a minor error instead of terminating. Maybe a particular part of the application is not important enough to try-catch absolutely everything. I've seen many examples of this in web applications, where a seemingly innocent operation is in fact a fatal failure point leading to an application crash. Several times I had to troubleshoot Java and Python web apps made by different teams, in different companies, at different times, and all of them used to crash on string transformations because of uncaught exceptions when an unexpected character came from the database. Needless to say, this caused a great deal of frustration for the users of those applications.
You may argue that the developers who created those applications were incompetent. Could be. However, a development approach dictated by the necessity of catching all possible exceptions is troublesome, difficult and slow. It clutters the code with 'noise' and imposes a routine not strictly related to the application's logic. I think the forgiving nature of Perl is a better match for Test Driven Development: the developer is not distracted by try-catch and can therefore concentrate on making the code better, creating more tests, checking input values etc.
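For readers who want the forgiving style in a language that raises by default, a small wrapper is one option. The sketch below uses Python; the helper name parse_int_or_none is made up for illustration and is not part of any library.

```python
def parse_int_or_none(text):
    """Return int(text), or None when text is not a valid integer literal."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


for raw in ("-10", "+10", "0", "asdasd"):
    value = parse_int_or_none(raw)
    if value is not None:      # explicit None check, so a parsed 0 still counts
        print(raw, "->", value)
    else:
        print(raw, "-> not a number")
```

Testing against None rather than truthiness sidesteps the "zero case" caveat mentioned for the PHP example.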
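ParseInt comparison

Methods compared (str is the input string):
  Java      Integer.parseInt(str) or Integer.valueOf(str)
  PHP       intval($str)
  Python    int(str)
  Ruby (a)  str.to_i
  Ruby (b)  Integer(str)
  Perl      int($str)
  C++ (a)   istringstream buffer(str); double val; buffer >> val;
  C++ (b)   istringstream buffer(str); int val; buffer >> val;
  C++ (c)   double val=atoi(str)
  C++ (d)   int val=atoi(str)

Input          | Java      | PHP                        | Python    | Ruby (a)     | Ruby (b)  | Perl | C++ (a)    | C++ (b)              | C++ (c)                | C++ (d)
---------------+-----------+----------------------------+-----------+--------------+-----------+------+------------+----------------------+------------------------+----------------------
" 1111"        | exception | OK                         | OK        | OK           | OK        | OK   | OK         | OK                   | OK                     | OK
"10.0"         | exception | OK                         | exception | OK           | exception | OK   | OK         | OK                   | OK                     | OK
"10000000000"  | exception | incorrect: 2147483647      | OK        | OK           | OK        | OK   | OK (1e+10) | incorrect: 134520252 | incorrect: 2.14748e+09 | incorrect: 2147483647
"2e+2" (1)     | exception | incorrect: 2               | exception | incorrect: 2 | exception | OK   | OK (200)   | incorrect: 2         | incorrect: 2           | incorrect: 2
"-10"          | OK        | OK                         | OK        | OK           | OK        | OK   | OK         | OK                   | OK                     | OK
"+10"          | exception | OK                         | OK        | OK           | OK        | OK   | OK         | OK                   | OK                     | OK
"asdasd"       | exception | 0                          | exception | 0            | exception | 0    | 0          | incorrect: 134520248 | 0                      | 0
"0.0"          | exception | incorrect: No value parsed | exception | OK           | exception | OK   | OK         | OK                   | OK                     | OK
"00"           | OK        | incorrect: No value parsed | OK        | OK           | OK        | OK   | OK         | OK                   | OK                     | OK
"2+3"          | exception | 2                          | exception | 2            | exception | 2    | 2          | 2                    | 2                      | 2

(1) 2e+2 = 2*10^2 = 200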

Java throws the most exceptions. Of course you may handle them all as one but, as this example demonstrates, a usable value can be parsed in most cases, so if you want to do a good job you have to do it yourself, for every case. Java is the only language which couldn't extract a value from "+10".
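
As an illustration of what "doing it yourself" can look like in a strict language, here is a minimal Python sketch that extracts a leading integer instead of raising. The helper name leading_int and its default argument are hypothetical. It roughly mimics the str.to_i column of the table, so it inherits the same "2e+2" gives 2 limitation.

```python
import re

# Optional leading whitespace, optional sign, then one or more digits
_LEADING_INT = re.compile(r"\s*([+-]?\d+)")


def leading_int(text, default=0):
    """Return the integer at the start of text, or default if there is none."""
    match = _LEADING_INT.match(text)
    return int(match.group(1)) if match else default


for sample in (" 1111", "10.0", "10000000000", "-10", "+10", "asdasd", "00", "2+3"):
    print(repr(sample), "->", leading_int(sample))
```

Whether silently returning a default is acceptable depends on the application; passing default=None lets the caller tell "no number found" apart from a genuine 0.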

Python is slightly smarter at recognising numbers in strings.

Ruby has two different methods for the job, and it is confusing which one is better.

PHP silently parses incorrect values.

The complexity and power of C++ are vividly manifested in this example: you can choose from 4 different ways to parse a value from a string, but once you know which of them is right, the results are nearly perfect.
Since the return value has to be a number, it returns 0 for non-numeric strings, so a 0 result has to be treated as a special case and somehow checked to determine whether it was an error or an actual value.

Perl demonstrated a perfect result. At first glance it looks almost the same as C++: it returns 0 for a non-numeric string. However, with the standard

use warnings;
a non-fatal warning will be issued: "Argument "asdasd" isn't numeric in int at ./tst.pl line 8." This warning can be converted to fatal with
use warnings FATAL=>'numeric';
Now we have an exception to catch like in the following example:
#!/usr/bin/perl
{ use warnings FATAL=>'numeric';
    my $str="asdasd";
    my $num=eval {int $str};
    if(defined $num){
        print "we got it - it's $num";
    }else{
        print "error: $@";
        # with "use English;" the line above could look like: print "error: $EVAL_ERROR";
    }
}
There are some important things to note:
  • Fatal exception is enabled by developer's decision
    • Only for particular problem;
    • Only for particular block, so exception scope is strictly defined
  • Only core language functionality used
  • It works perfectly, including "zero case" and "2+3"
  • It provides human-readable explanation of failure
  • It extracts all usable values
  • With minimal effort
So with just core Perl functionality it is possible to do the job a lot more easily than in other languages. Not only that: the great flexibility of Perl is that you can use modules to introduce different styles of exception handling, so you're not bound to the example above. With Try::Tiny (not the only such module) you can use almost "traditional" Java-style try-catch syntax in Perl:
#!/usr/bin/perl
use Try::Tiny;
use warnings FATAL=>'numeric';

    my $str="asdasd";
    my $num = try {
                    int $str;
              } catch {
                    die "error: $_";
              };
    print q{we got it - it's },$num;

Some links below might be interesting in order to compare languages' syntax:
Compare structure of Perl, Ruby, Python, Java and PHP
Wikipedia: Exception handling syntax

Notes (per language)

PHP

PHP is not a universal language. Perhaps it should be considered for web development only.
Another problem with PHP is the administration needed to configure the runtime for different applications. Some PHP applications have different expectations regarding the notorious "Magic quotes" runtime parameter. Read more in Wikipedia: Magic quotes criticism.

The runtime is fast but not very compact. PHP has a reputation as a lightweight and fast language. The first turns out to be false (PHP's memory usage is quite large compared with Python, Ruby and Perl5), but it is a close second after Perl5 in performance.
In some situations PHP functions cannot be trusted, as demonstrated in the "parsing integer from string" example.

Ruby

Ruby is a universal but relatively young language. Its availability on different platforms is still limited, and its history of introducing backward-incompatible changes makes development and maintenance unnecessarily complicated. The performance and memory usage of Ruby and Python are close to each other. While Ruby is slightly faster, Python utilises memory better.

Python

Python is a mature and universal language. It stood up well in this test. However, Python interprets spaces and tabs. This particular 'feature' looks unnecessary and silly, especially after so much has been said about the importance of separating presentation from logic. Presentation is logic in Python. Python enforces a certain way of formatting code in the rudest way I can imagine. Unless it makes your eyes bleed you may find peace in Python, especially after Java. Its "whitespace as constraint" could make reading/writing code harder. To my understanding the only explanation for such a strange feature is that you can literally see the code flow pretty much the way the interpreter sees it. I doubt that good coding style can be effectively enforced - readable code formatting can easily be achieved in other languages by following best-practice guidelines.
In a way Python uses a military dress code - all applications should wear the same uniform.
How can this make a programming task easier? I believe the more freedom a programming language gives you, the better.

"There is no programming language - no matter how structured - that will prevent programmers from making bad programs."
-- Larry Flon
Read more about Python's white spacing in The hard edges of Python.

Perl5

Perl5 demonstrated amazing performance and memory usage, far ahead of all other languages tested. It proved to be the most optimised, mature and stable language. Some people believe it to be the most advanced programming language in the world; either way, it is clearly a very good choice.

  • Perl proved to be an extremely effective, highly optimised language.
  • Perl has a massive library of reusable code.
  • Perl is mature: it's 23 years old; (Perl5 is 17 years old).
  • Perl is very portable.
  • Perl is elegant and flexible.
Unfortunately Perl is often misunderstood because of widespread myths misrepresenting the language's capabilities. Typically those myths are a product of ignorance and/or lack of knowledge.
Some of those myths:
Myth: Perl is UNIX shell on steroids.
This is really an insult to Perl, which is much more than that. In 2010 Perl is a very mature and universal language with perhaps the largest library of reusable code available. In Perl you can write GUI applications, web applications, system daemons etc. It is possible to pack a Perl application, its runtime and libraries into a Windows executable and distribute it as a single .EXE file. Perl's object-oriented features and flexibility are perhaps far beyond those of any other language. Learn more about Modern Perl (presentation).
Myth: Perl is "write once - read never"
Perl is often falsely accused of lacking readability. I confess - sometimes I have problems reading my own poor handwriting in notes I took weeks ago. However, it has nothing to do with the language I use. With a certain discipline you can develop clear, understandable and maintainable code in any language. It's all a matter of learning good habits like commenting the code (especially if you're not the only developer) or choosing meaningful long names for variables etc. It comes with experience. You can't blame a programming language for the lack of clarity in your code, just as you can't blame a natural language for its inappropriate use. If your Perl code is not beautiful you're doing it wrong - there is another, nicer way.
Most people that complain about syntax have none or very little experience in Perl
-- YAPC::EU::2009 - How Opera Software uses Perl presentation.
There is a very good presentation Perl Myths 2009 where Tim Bunce is explaining some common Perl misunderstandings and revealing some of Perl's powers.

Perl is truly a language of freedom. It gives amazing power and has features that do not exist in other languages. Those powers can be used to create nice, tidy, clean and yet effective and concise code. Of course the same powers can be used to write obfuscated code but, again, this is not a language problem because it is also possible in other languages. This is best explained by the creator of Perl himself (emphasis added):

Let me state my beliefs about this in the strongest possible way. The very fact that it's possible to write messy programs in Perl is also what makes it possible to write programs that are cleaner in Perl than they could ever be in a language that attempts to enforce cleanliness. The potential for greater good goes right along with the potential for greater evil. A little baby has little potential for good or evil, at least in the short term. A President of the United States has tremendous potential for both good and evil.
I do not believe it is wrong to aspire to greatness, if greatness is properly defined. Greatness does not imply goodness. The President is not intrinsically "gooder" than a baby. He merely has more options for exercising creativity, for good or for ill.
True greatness is measured by how much freedom you give to others, not by how much you can coerce others to do what you want.
Larry Wall http://www.wall.org/~larry/pm.html
Reasons for using Perl are summarised in Why Perl?

Java

Just like Perl, Java is a subject of numerous myths misrepresenting its real position.
Despite commercial popularity there are multiple problems with the language:

* Poor memory management (garbage collection):

IMHO Java suffers from a garbage collection problem. If you don't allocate objects and maybe use only static methods, Java can be quite fast. But when you start creating huge amounts of objects (like required when working with Java's String class) its memory use and performance are getting worse and worse.

In theory GCs should be at least as fast as manual memory management or reference counting (which Python uses). Instead of wasting time for memory management while the program is working, it defers the memory management until the program is idle or it runs out of memory. Unfortunately on today's systems, memory is extremely slow and CPU cycles are cheap, and this is why the GC theory does not work. The Java VM constantly trashes the cache because it does not re-use memory fast enough. Instead it takes new (usually uncached) memory for new objects and defers freeing the unused memory of old objects (that are in the cache). This is probably the worst thing that you can do to the cache. A good VM would try to re-use memory as soon as possible, to increase the chances that it is still in cache (like Python's refcounter). Java does the opposite.

To make things worse, the VM seems to lack any coordination with the kernel. When the system is running out of RAM and needs to swap, the logical action for the VM would be to start the garbage collector. It doesn't however, and instead it starts allocating the new memory, forcing the kernel to move the old (unused) memory into the swap space! And when the VM finally decides to start the GC it will go through all the unused memory that is now in the swap, causing it be reloaded and possibly moving more frequently used memory back in the swap, only to re-load it again later. How much worse can it get?

-- Java has a GC problem, posted 10 Feb 2003 at 16:55 UTC by tjansen

Historically Java was successful partly because developers found it attractive compared to C due to its "automatic" memory management. It turned out to be Java's greatest weakness. In C, memory is managed by the developer, in contrast to Java, where memory is usually managed by the systems administrator. In numerous papers explaining sophisticated garbage collection you may find dozens(!) of parameters for memory tuning. And trust me, because Java developers usually cannot predict an application's behaviour under load, the only reliable way to configure memory management for a particular application is to test, change parameter(s) and test again and again. Sometimes it helps. But the defaults are often not good enough, and it's too easy to make a mistake. Apart from configuring Java's "automatic" memory usage, developers can do very little. Java applications are handicapped by default.

* Verbosity:

Consider the following HTTP POST example:

Java
import java.net.URL;
import java.net.HttpURLConnection;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;

public class java_post {

public static void main (String args[]) throws Exception {
    System.out.println(
        executePost("http://www.smh.com.au/execute_search.html",
                    "text=fluoride")
    );
}

public static String executePost(String targetURL, String urlParameters){
    URL url;
    HttpURLConnection connection = null;
    try {
        //Create connection
        url = new URL(targetURL);
        connection = (HttpURLConnection)url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type",
                                      "application/x-www-form-urlencoded");
        connection.setRequestProperty("Content-Length", "" +
               Integer.toString(urlParameters.getBytes().length));
        connection.setRequestProperty("Content-Language", "en-US");
        connection.setUseCaches (false);
        connection.setDoInput(true);
        connection.setDoOutput(true);

        //Send request
        DataOutputStream wr = new DataOutputStream (
                  connection.getOutputStream ());
        wr.writeBytes (urlParameters);
        wr.flush ();
        wr.close ();

        //Get Response    
        InputStream is = connection.getInputStream();
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        String line;
        StringBuffer response = new StringBuffer();
        while((line = rd.readLine()) != null) {
            response.append(line);
            response.append('\r');
        }
        rd.close();
        return response.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    } finally {
        if(connection != null) {
            connection.disconnect();
        }
    }
  }
}
Perl
#!/usr/bin/perl

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $res=$ua->post(  'http://www.smh.com.au/execute_search.html',
                    {
                         text=>'fluoride',
                    }
                 );

print $res->is_success ? $res->content : $res->status_line;
Ruby
#!/usr/bin/ruby

require "uri"
require "net/http"

x = Net::HTTP.post_form(URI.parse('http://www.smh.com.au/execute_search.html'),
                            {
                                'text' => 'fluoride',
                            }
                        )
puts x.body
Python
#!/usr/bin/python -u

import urllib, urllib2

data = urllib.urlencode({
                'text' : 'fluoride',
        })
req = urllib2.Request('http://www.smh.com.au/execute_search.html', data)
response = urllib2.urlopen(req)

print response.read()
PHP
<?php

$postdata = http_build_query(
    array(
        'text' => 'fluoride',
    )
);

$opts = array('http' =>
    array(
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $postdata
    )
);

$context  = stream_context_create($opts);
print file_get_contents('http://www.smh.com.au/execute_search.html', false, $context);

?>
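
The Python listing above targets Python 2 (urllib2). For readers on Python 3, an equivalent might look like the sketch below, which uses only the standard library. The function names build_post and send are illustrative, and the URL is simply the one used in the other listings, so it may no longer respond. The request is built separately from sending so that it can be inspected without any network traffic.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET
    return Request(url, data=body)


def send(request):
    """Send the request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


request = build_post("http://www.smh.com.au/execute_search.html",
                     {"text": "fluoride"})
print(request.get_method(), request.full_url, request.data)
# print(send(request))   # uncomment to actually perform the POST
```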

June 2011 Update: Greg McLaghlan made a good point:

I think the Java verbosity example is a little misleading. If we compare it to the Perl example, you are loading Perl module which handles the http post whereas in the Java example you actually code that. It could be argued that that part of the code could have been packaged up and loaded just like the Perl module. It's a minor point I guess.
Yes, this is true, but I first chose the job to do, and only then did it turn out that the standard Java distribution does not come with a ready-made HTTP POST method.
Other languages have tools to help with a similar task in their standard distribution.
I believe it would be incorrect to involve 3rd-party libraries in the comparison; however, here you may find a Java example of HTTP POST using Apache libraries. It is 33 lines long (no empty lines) - about 40% shorter than the original Java example but nowhere near as compact as the other languages: the 2nd-longest HTTP POST code is PHP, at only 14 lines.

Here is how one person expressed his frustration with Java's verbosity in his blog:

Whenever I write code in Java I feel like I'm filling out endless forms in triplicate.
"Ok, sir, I'll just need your type signature here, here, and ... here. Now will this be everything, or..."
"Well, I might need to raise an exception."
The compiler purses its lips."An exception? Hmmm... let's see.... Yes, I think we can do that... I have the form over here... Yes, here it is. Now I need you to list all the exceptions you expect to raise here. Oh, wait, you have other classes? We'll have to file an amendment to them. Just put the type signature here, here, ... yes, copy that list of exceptions....
And one of the comments from above blog's discussion (there are some other comments worth reading):
I think the problem with Java is not it's verbosity, but as someone else said "infrastructure framework". I have to go through so many classes, through so much leaps and bounds, to do anything.
I need a factory, to create a manager. Then I factor anther factory, to create a stream, then assign that stream to the manager. Afterwords, I give the manager to a dispatcher.
Then there is an uncaught exception and I have to sift through 50 lines of junk to actually find out what went wrong.
Verbosity is bad because code is read more often than it is written, so verbosity increases the effort needed to maintain code.
Java's verbosity hurts both maintenance and development.

Java developers usually claim that Java code is easier to develop and maintain. I failed to discover any particular Java language feature to support that claim. Java makes developers do a lot of work even for the simplest tasks.
Java's bad performance and memory usage are not compensated for by any particular language feature(s).
It is far behind other languages in both performance and memory usage/management.
The time needed to tweak and test memory management, together with the maintenance and troubleshooting effort, is horrifying.

Personal experience:

The results of this testing are consistent with my personal experience.
Over the years I was involved in several projects where all the Java applications demonstrated miserable performance while having tremendous system requirements.

Once, on a public-facing web site, I found a problematic Java servlet of ~1000+ lines. Unable to fix it, I couldn't think of a better solution than to rewrite it from scratch in a different language.
In several days I produced a ~200-line Perl application running up to 10 times faster than the original Java application. Numerous bugs were fixed in the process, and the new version was easier to debug and had some improvements and new features.

I can't recall a single Java application server which doesn't degrade. Apparently they all leak memory, so sooner or later they have to be restarted. (I'd like to believe there are exceptions somewhere.)
As a matter of fact, restarting Java application servers is common practice in the industry; however, it appears that only Java really needs it. It seems unnecessary for stable software like the Apache web server, which can run for years without a restart. I rarely let a busy Java application run longer than a week, while web-facing Java application servers are restarted nightly.

Another example vividly demonstrates the problems with Java's memory management: once I found that a particular web-facing Java application could handle no more than 24 simultaneous requests. (You may suspect it was running on an old or virtualized server, but it was a relatively up-to-date machine: 8 x Intel(R) Xeon(R) CPU L5420 @ 2.50GHz, 6 GiB RAM, CentOS 5.5 GNU/Linux.) After days of tweaking and testing we found that capacity could be increased (doubled) by allocating more memory, but this negatively affected response time. Too little memory is not enough; too much and garbage collection chokes.
A ridiculous solution was found: to farm Java application servers on the very same hardware, give each just enough memory and restrict the maximum number of simultaneous connections per backend on the load balancer. Needless to say, this "solution" cost a great deal of effort - to set up, test, tweak memory parameters, test again etc.
Later the developers managed to optimise the application a little, but two or more Java application servers per physical server still work better than one.
Because of the history of degradation, each Java application server in the farm runs for no longer than 24 hours - they are all restarted overnight in round-robin manner. (Believe me, it's much better than waking up at 3:00 just to do the monkey's job of restarting another Java application server which has stopped responding.) That much effort is needed only to ensure the system's normal functioning.
Remarkably, this service is hosted on 8 HP Proliant G6 servers with two quad-core Intel(R) Xeon(R) CPUs each - 64 CPUs (cores) and 72 GiB of RAM in total. With a database of only 1 GB, the whole system can barely respond to ~180 simultaneous HTTP requests (fewer than 3 requests per CPU core, and about 2.5 requests per GiB of RAM) - a tremendous waste of resources.
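The ratios quoted for this farm can be checked with a few lines of arithmetic. The sketch below (in Python) assumes that the 72 GiB figure is the total across all 8 servers, as the text reads; the variable names are illustrative only.

```python
servers = 8
cores_per_server = 2 * 4            # two quad-core CPUs per server
ram_gib_total = 72                  # assumed to be the total for the whole farm
simultaneous_requests = 180

total_cores = servers * cores_per_server
requests_per_core = simultaneous_requests / total_cores
requests_per_gib = simultaneous_requests / ram_gib_total
gib_per_request = ram_gib_total / simultaneous_requests

print(total_cores)                   # 64
print(round(requests_per_core, 2))   # 2.81
print(round(requests_per_gib, 2))    # 2.5
print(round(gib_per_request, 2))     # 0.4
```

So the 2.5 figure corresponds to requests per GiB of RAM, which is about 0.4 GiB of RAM for each simultaneous request.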
I remember several cases when a new Java application release reduced backend capacity (surprisingly, the release/QA team wasn't aware), so during peak hours servers were collapsing, unable to sustain the load, because the load balancer was configured to allow more connections to the backends than they could handle. Sometimes allowing just 20 fewer requests makes a difference.
Another interesting problem was discovered when about 2500 MiB were allocated to the JVM on the x86 platform: Resin (a Java application server) was crashing under load, sometimes every hour if there was enough load. Apparently that was because of a lack of addressable space (memory), not for the application, which had its 2500 MiB pre-allocated, but for the Java runtime itself, which on some occasions tried to allocate memory for its internal needs and failed.

Java - summary

As you may see from this research, in all three categories Java behaves extremely badly, like no other language.
Java applications cannot match a fraction of other languages' performance.
Java applications are truly the most expensive in development and administration.
Java needs more system resources, i.e. more memory and more processing power. Usually more servers and therefore more electricity are needed, i.e. Java is not environment-friendly.
Fragile Java application servers need to be restarted periodically.
Unnecessary sophistication creates more points of failure, so the availability of Java web applications is usually not impressive.

To write high-quality Java code and run it in a well-optimised environment requires tremendous effort and experience. Even then, performance and capacity will be a fraction of those of a similar system implemented in a different language. By simply using a different programming language the same result can be achieved with less effort in development, debugging, maintenance and administration. Fortunately there is a good choice of mature languages to use - nowadays, in 2011, there is nothing you can do in Java that cannot be done in other languages.
No matter which other *mainstream* language you choose, your applications and experience will benefit from switching.
Even if your Java skills are profoundly good, your only excuse to use Java is personal convenience. Lack of experience with other languages should be a motivation to learn rather than an excuse for using Java. Everyone will benefit from better applications written in other language(s).
Java is a disaster. A disease. It is rooted deeply in the industry, and it is hard to escape while ignorant architects keep pushing it. Java is a trap for system architects and managers who know no other languages. Typically they do not understand Java's weaknesses and tend to overuse it because it is "the only tool" for the job. Those people should learn. The blind belief that Java is universal and good for any job simply can't be more wrong. Java is not suitable for *anything*.
When starting a new project you can hardly seriously consider writing it in Lua or tcl. However, those languages beat Java in speed/RAM usage. Saying that Java is as suitable for a job as tcl/Lua would be a compliment to Java. The gap between Java and other languages is so huge that it would be a good idea to avoid Java whenever possible, regardless of how familiar with the language you are.

More information about Java's problems and weaknesses can be found in Sean Kelly's excellent videos:
Recovery from Addiction
Better Web App Development

Java Quotes:

"If Java had true garbage collection, most programs would delete themselves upon execution."
-- Robert Sewell
"Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test, it introduces security challenges, and it causes end-user and administrator frustration."
-- Ray Ozzie
Java is the SUV of programming tools. A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl. ... But the programmers and managers using Java will feel good about themselves because they are using a tool that, in theory, has a lot of power for handling problems of tremendous complexity. Just like the suburbanite who drives his SUV to the 7-11 on a paved road but feels good because in theory he could climb a 45-degree dirt slope.
-- Greenspun, Philip
Java: write once, run away!
-- Cinap Lenrek
Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.
-- Steve Yegge (2007, Code's Worst Enemy)
JAVA truly is the great equalizing software. It has reduced all computers to mediocrity and buggyness.
-- NASA's J-Track web site
Using Java for serious jobs is like trying to take the skin off a rice pudding wearing boxing gloves.
-- Tel Hudson

Conclusion

To pick the right tool for a job it is important to understand how programming languages compare to each other. A tricky decision is easier to make if you consider the right things while avoiding irrelevant ones.

Some things are irrelevant to a good decision:

Your favourite language at the moment.
You may be very good and comfortable with the language you already know, but this is not a good enough excuse for not considering alternatives.
Learning is important.
Language creator(s) personality.
It simply doesn't matter whether you like them or not, or even who they are.
Your expectations regarding language features.
It always takes time to get used to new things, especially if they are quite different.
Speed of learning.
Some languages have a short *startup* learning curve. However, in reality it is more like "a minute to learn, a lifetime to master".
This idea is best explained by Peter Norvig in his Teach Yourself Programming in Ten Years essay.

There are some things to avoid:

Considering one single language feature alone.
Considering only one language feature, like speed or memory usage, will inevitably lead to wrong decision.
Narrow purpose languages.
Specialised languages like PHP may be good for web development only. When you need to do something different or simply extend the task's scope, a language for particular use only may not be good enough.
Non-portable languages.
Cross-platform portability matters. Too many people are locked in, stuck with Windows-only technologies with little hope of escaping.
Non-free license.
Non-free licenses come with risks and restrictions.

There are some valuable things to consider:

Availability of reusable code
Even the best language in the world is worth little without good free libraries.
Free license.
Freedom is very important, even if you don't fully understand why.
Universal languages.
Universal languages like Perl5 are generally good for pretty much any task. Universal languages are more powerful by definition which makes your skills universal.
Well-portable languages
Later on, software may need to be ported to a different platform or operating system. Portability guarantees choice. Choice is good.
"Feels good" feeling
Essentially, how you feel about a language is the ultimate measure of how good it is for you. For example, not everyone is comfortable with Python, but if you are OK with it you can tell from how comfortable it feels. Coding is fun if you like the language. Fun helps to make better programs.

FAQ.

You deliberately made this test tough for Java! Java is not optimised for strings.
The key words here are "not optimised". (Apparently it was tough only for Java.) OK, if Java is not optimised for strings, please let me know what exactly Java is optimised for.
You deliberately chose string manipulation to show Java's weakness.
Not quite... As I explained in the beginning, I believe strings are a good test subject for comparison. I did expect that Java wouldn't be the winner, but I certainly did not expect such miserable performance. Initially there was no Java in this testing - it was added later.
That's no surprise Java is slow.
Even if you already knew it's slow, did you know about performance degradation and garbage collection problems? Did you know HOW slow it is? Honestly?
Java is so slow because strings are immutable in Java.
Immutable strings are not unique to Java. For example, strings are also immutable in Python. Python performed very well in this testing.
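
The Python half of this point can be checked in a few lines. This is a minimal sketch, not part of the test code, and the variable names are made up for illustration: it shows that a Python string rejects in-place modification and that substitution returns a new string.

```python
# Python str objects are immutable: in-place modification is rejected.
text = "abcdefgh"

try:
    text[4:8] = "____"          # slice assignment is not supported on str
    in_place_allowed = True
except TypeError:
    in_place_allowed = False

# Substitution therefore builds a new string and leaves the old one alone.
replaced = text.replace("efgh", "____")

print(in_place_allowed)
print(text)
print(replaced)
```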
Java's internal string representation in memory is UTF-16, so Java has to do more work compared to a single-byte representation.
This may be the case for other languages as well. However, it does not explain why Java's performance is so much worse. If it does affect Java's test results, it may be one of those differences I'm trying to emphasise.
Please note that only Latin characters were used in this test. Other languages support Unicode as well. The test case is based on defaults, so no encoding was explicitly chosen, and UTF support was neither explicitly disabled nor enabled.
What's wrong with Java?
Well, everything. :( Read the gory details above. In short, Java's biggest problems are inefficient garbage collection and verbosity. Unfortunately those problems are not compensated for by any language features. Java's language features look poor compared to other languages. Java development and maintenance require a great deal of effort.
You shouldn't write real code like this.
True, but that's test code, remember? It is deliberately made slow to produce computational load for comparison. The job can be done a hundred times faster if optimised. Pretty much any artificial test is quite different from reality. However, even if the test code doesn't look like a real application, it clearly reflects problems that manifest in real applications.
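
For readers who skipped the method description, the shape of the deliberately slow loop looks roughly like this. It is a scaled-down sketch and not the article's actual Python test file: the function name `run_test`, its parameters and the 32 KiB target are assumptions chosen so that it finishes quickly.

```python
import re
import time

def run_test(target_kib=32, report_kib=8):
    """Grow a string in 16-byte steps, substituting after every append.

    Returns the final string and a list of (elapsed_seconds, size_KiB) checkpoints.
    """
    piece = "abcdefgh" + "efghefgh"          # 16-byte seed, as in the Java sample
    iterations = target_kib * 1024 // len(piece)
    pattern = re.compile("efgh")
    start = time.time()
    grown = ""
    checkpoints = []
    for _ in range(iterations):
        grown += piece
        grown = pattern.sub("____", grown)   # rescans the whole string every time
        if len(grown) % (report_kib * 1024) == 0:
            checkpoints.append((time.time() - start, len(grown) // 1024))
    return grown, checkpoints

grown, checkpoints = run_test()
for elapsed, size_kib in checkpoints:
    print(f"{elapsed:.2f}sec\t\t{size_kib}kb")
```

Because every iteration rescans everything accumulated so far, the total work grows much faster than the string itself, which is exactly what makes the test a useful load generator.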
Java works for some companies.
We may disagree on the definition of "works". Sometimes the definition is quite loose - I was once told that for a production web site 2% of requests timing out is acceptable for business. (Yes, it was a web-facing Java application, of course.) I believe any number of timeouts on a public-facing web site is intolerable. If Java is not expected to perform well, we may have a double standards problem.
If you look at companies that successfully maintain sophisticated Java services, you may find that most of them are big companies with virtually unlimited resources. If you can have as many servers as you want, as much staff as you want and as much time as you want, you can make anything work, but at what cost?
Big companies may have the luxury of being inefficient.
Java may work for you if your survival doesn't depend on your effectiveness.
Why do you devote so much attention to Java and Perl and so little to Python and Ruby?
I work in an environment where Java dominates. At the same time, Java and Perl are the most misunderstood languages around. In the minds of many developers, managers and system architects Java stands undeservedly high while Perl is usually treated badly. Because the industry in general has already made up its mind, I believe some explanation is necessary.
There are not as many myths about Python and Ruby, and their features are not so controversial. Perhaps if I were more competent with Python and Ruby I would have more to add.
Java is good, I know how to make great applications with Java.
Great, you must be very talented, because an ordinary developer needs a great deal of effort and experience to overcome the numerous problems of developing in Java. If you have to be a genius to create good and reliable Java applications, it is simply too difficult for mere mortals, i.e. for most developers. (The author of this article considers Java too difficult for himself.)
Unfortunately Java's problems, like garbage collection, exist even for well-written programs. Problems aside, Java requires considerably greater effort than some other languages to achieve the same result. You may have better productivity with a different language.
My Java application works well.
Probably it barely does anything or is not loaded enough to show performance degradation. This is typical when no more than a few people use the application at the same time or when the application is extremely simple.
Should we choose Java for our new project?
By all means if
  • you want to sabotage project
  • you want it to be as expensive as possible
  • speed of development doesn't matter
  • product quality doesn't matter
  • developers refuse to learn
  • you didn't read/understand this article.
Seriously, there is simply no reason and no excuse for choosing Java for a new project.
What about .NET ?
.NET (dot net) is not very portable, so it doesn't satisfy the criteria for choosing languages. Because it has so much to do with Windows and Microsoft, I see no reason to consider .NET regardless of its features or performance. Quoting Oktal: "I think Microsoft named .Net so it wouldn't show up in a Unix directory listing."
.NET's license is not free, which raises an ethical issue as well. There is no reason whatsoever to work with a non-free language. As a matter of fact, its proprietary nature is a strong argument against .NET.
You've just started another flame war.
No, I have not. The results of testing speak for themselves, even without examples from my personal experience. I have no intention of softening Java's embarrassing performance to make Java users feel better. If your favourite language wasn't the best in this testing, perhaps you may benefit from learning something else, and this article aims to encourage such learning. Learning, if done right, leads to better decisions. We need better decisions because the industry will benefit from them. Sadly, too many people who were taught Java at university know too little about other languages to make good decisions. From my experience I know that Java professionals sometimes take the results of this testing personally. That is good, because it is natural to feel outrage on learning how poor their programming language is compared to others. It is good because this outrage may encourage learning, which eventually helps to create better applications.

Credits

  • I'm indebted to patient colleagues of mine who kindly provided important feedback and criticism for this research
  • I'm grateful to my family - numerous times they had to go out without me when they couldn't separate me from computer
  • I'm obliged to my manager who tolerated discussions related to this research and somehow partially inspired it
  • Finally, I'm thankful to Cityrail for providing reasonable comfort, which makes it possible to work on trains while travelling to and from the city

If you found this essay interesting please donate to support the author.

Great benchmark. From my own experience I never understood why Java was considered as "fast". I was forced to learn Java in my computer science school and I always considered it as an abomination. If you're not too busy I think it would be worth benchmarking Python 3.2 too because version 3 is out for 3 years now and there was a complete rewrite of strings functionalities (unicode and stuff). Results might be totally different from Python 2.

Best, Laurent from Paris.

Comment by Rudy 2011-08-17

Hello, I fully agree with you on the performance of java.

In the PHP test, if you use str_replace instead of preg_replace, the speed of execution increases so much.

Good benchmark!

Comment by Anonymous 2011-09-02
Hi! I've rerun the Tcl benchmark on Tcl 8.6 beta and it shows quite an improvement in speed (but with a price of significant rise of memory used). See the results at http://sgolovan.nes.ru/tmp/tcl8.6.txt (and at http://sgolovan.nes.ru/tmp/tcl8.4.txt for comparison with 8.4 on my workstation).
Comment by Anonymous 2011-09-30

Really good benchmark.

It would be nice to have a comparison of different tasks - not only string substitutions. Because I guess everybody can agree that Perl is the best for dealing with strings (at least with regular expressions). So I would like to see tests of working with the network, e.g. fetching a source over HTTP, using sockets (e.g. we can implement a simple server), acting as a server, dealing with maths and so on.

Anyway. Thanks for your time and sharing results with the world :))

@Anonymous, str_replace is not a regexp based, and as we use regexp based substitutions in other tests it would be invalid comparison.

Comment by Aleksey V Zapparov AKA ixti 2011-09-30

Just before I forgot :)) Really interesting would be testing of working with IO as well. I can promise that JavaScript will be on the latest position :))

As you were comparing Ruby, Perl and others, I assume you was testing "server-side" JavaScript, so it would be really interesting to take a look on other comparisons.

We can talk closer and work on tests together if you want, my email is ixti at members of fsf.org ;)) Or we can catch up on Freenode (my nick is the same)

Comment by Aleksey V Zapparov AKA ixti 2011-10-02

To Rudy:

Thank you for the first comment. Nothing encourages further testing like a modest donation, but anyway I updated the graphs and tables with Python 3 data. Indeed the results are quite different: note the speed degradation. To me it is not a surprise, because features get introduced first and optimised only later. Another thing I noticed is that the more object-oriented the language, the worse its performance.

To second (anonymous) commenter:

Feel free to redesign the test for str_replace. Yes it improves performance and probably I should have done testing without regular expressions. However you need to test longer to clearly demonstrate the difference. (See update regarding similar redesign for Java for "moving window" substitution)
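
The str_replace versus preg_replace distinction (literal versus regular-expression substitution) exists in most of the tested languages. As a rough illustration in Python rather than PHP, the sketch below assumes `str.replace` as the literal counterpart and `re.sub` as the regex counterpart, and checks that both produce the same output for this pattern. Timings depend on the machine and interpreter version, so they are only printed, not claimed.

```python
import re
import timeit

sample = ("abcdefgh" + "efghefgh") * 4096   # 64 KiB built from the test's seed string

literal_result = sample.replace("efgh", "____")   # plain substring replacement
regex_result = re.sub("efgh", "____", sample)     # regular-expression replacement

# Both approaches give identical output for a pattern without metacharacters.
print(literal_result == regex_result)

# Timings vary by machine and interpreter version, so they are only printed.
literal_time = timeit.timeit(lambda: sample.replace("efgh", "____"), number=20)
regex_time = timeit.timeit(lambda: re.sub("efgh", "____", sample), number=20)
print(f"literal: {literal_time:.4f}s  regex: {regex_time:.4f}s")
```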

To third (anonymous) commenter:

Thanks for bringing your Tcl test results. I tried an updated PHP some time ago and was surprised to see up to a 10% performance increase - quite an impressive optimisation. But it didn't change the position of PHP relative to other languages. As we try different versions we get slightly different results, but that does not change "the big picture" much.

To Aleksey:

Thank you for your kind comment. But please stop repeating the dodgy stereotype that "Perl is good with strings". The question is what makes Perl so good. My answer is memory management. Perl is so good because of (among other things) amazingly effective garbage collection and memory reuse. This is more or less universal, because when the interpreter operates on memory chunks at a low level it doesn't make much difference whether those byte sequences are strings or object representations. Probably for some object-oriented languages there is no difference at all. Testing IO may be interesting, but we can't clearly separate the language under analysis from the underlying OS, so I'm not too interested. I already commented on math tests - integer math is not a good test subject, while floating-point math depends heavily on implementation and hardware acceleration. To me the most important language features are demonstrated well enough here.

Comment by onlyjob 2011-10-06
Well said about the performance of the JAVA with the best analysis.U have made it worthy to ur blog.But I share that if we use str_replace instead of preg_replace, it would result much favorly rite.
Comment by web design company 2011-10-13

Anyone that actually uses Java knows that the + operator is to be avoided for extensive String manipulation:

http://kaioa.com/node/59

It's the reason StringBuffer and StringBuilder were created.

public class JavaTest {

public static final void main(String[] args) throws Exception {
String str = "abcdefgh"+"efghefgh";
int imax = 1024 / str.length() * 1024 * 4;

long time = System.currentTimeMillis();
System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
Runtime runtime = Runtime.getRuntime();
System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);

StringBuilder gstr = new StringBuilder("");
int i=0;
int lngth;

while (i++ < imax+1000) {
gstr.append(str);
int index = gstr.indexOf("efgh");
while (index >= 0){
gstr = gstr.replace(index, index+4, "____");
index = gstr.indexOf("efgh");
}
lngth=str.length()*i;
if ((lngth % (1024*256)) == 0) {
System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
}
}
}
}

Granted that's a bit uglier than it should be, but you have picked a known wart with Java (that should put the performance near the top of the heap). If you performed just about anything Math related or measured transactions per second in a similarly configured web-app, most of the dynamically typed languages are going to be decimated by Java.

There's a reason it's the #1 language (tiobe.com).

Comment by Anonymous 2011-10-24

dude, seriously, do you know Java at all?

u r using String? do u know Stringbuffer at all?

if u don't know a language, don't pretend an expert and do a 3rd-rate benchmark.

ur code sucks.

Comment by Anonymous 2011-10-30
What about add an well written C# code?
Comment by Junior Mayhe 2011-11-05

Hi there, I agree with those who say the test should extend to other areas besides strings. Regarding PHP, you should not include Suhosin since it slows things down by almost 8%. Regards Ricardo

Comment by ricardok1 2011-12-06

I ran your C & Perl codes, but the result was significantly different from yours. I tested on 2 different linux machines (up-to-date arch on my small server, a little out-of-date ubuntu on a huge server), and C was definitely better. All I did was "gcc test.c && ./a.out" and "perl test.pl". Can I know the exact configuration and how you ran your codes?

Comment by 07dosa 2011-12-14
Very biased remarks on Perl and Python. To say Perl is easy to read relative to other languages is plain wrong. Of course if your experienced with it you can read it...
Comment by Anonymous 2011-12-15

^--- To say Python is easy to read relative to other languages is plain wrong. :) Don't spread inexperienced programmers superstition.

Comment by Anonymous 2011-12-16
Lua : use locals + don't concatenate strings, use table concat.
Comment by Anonymous 2011-12-18
Just to point out to all of you who say "Don't use + in X language, use " I think you get my point. The most efficient concatenation operation should be easy to do, and intuitive.
Comment by TaIř sj Ořit 2011-12-20

The following lines are wrong in your c-code:

char *str=malloc(8);
strcpy(str,"abcdefgh");

You need to allocate memory for the terminating null character:

char *str=malloc(9);
Comment by Anonymous 2012-01-06

The reason Java is so slow, is that you are using read-only String instead of using a modifiable string, namely StringBuilder. If you have some experience with Java you know how important StringBuilders are for performance of string operations. In every other language you use modifiable strings, so the comparison is not fair here. The reason String is read-only in Java is for security: you don't want another thread to change the string after the policy decided that it is a valid filename you are allowed to open.

Fixing the code to use StringBuilder is easy: declare gstr as StringBuilder and instead of

gstr+=str;
gstr = gstr.replaceAll("efgh","____")

you can use:

gstr.append(str);
java.util.regex.Matcher matcher = Pattern.compile("efgh").matcher(gstr);
while(matcher.find())
gstr.replace(matcher.start(), matcher.end(), "____");

Yes, the API could be improved, one wishes to have a replaceAll method in StringBuilder that wraps the three last lines. But still it is very similar to the C++ code and has roughly the same speed.

Still perl is faster, but Java is not worse than other languages like python.

It is not the performance of the garbage collection that matters here. If you would use read-only strings in C (i.e. use malloc and strcpy instead of realloc) you get the same bad performance even without a garbage collection.

Comment by Anonymous 2012-01-13
Really? can you look at that code and keep a straight face? Java supposedly prides itself on it's clarity and simplicity.
Comment by TaIř sj Ořit 2012-01-20

Hmm, Java version should look like this:

StringBuffer gstr = new StringBuffer();
int i = 0;
int lngth;
while (i++ < imax + 1000) {
gstr.append(str);
int index = gstr.indexOf("efgh");
while (index >= 0) {
gstr = gstr.replace(index, index + 4, "____");
index = gstr.indexOf("efgh", index);
}
Comment by Anonymous 2012-01-22
Please remove this flawed analysis from the web or correct it. It can give a beginner who can't see the obvious problem with the Java code used the wrong impression about Java's performance.
Comment by Anonymous 2012-02-10
This article proves only that you cannot use java, so don't... that's better for everyone!!!
Comment by Anonymous 2012-02-15
I think that the author wants only to show that Java stinks!! But he is the only one stinking here!!
Comment by Anonymous 2012-02-15
How did you measure the memory consumption for all of languages? I need to have that code.
Comment by Anonymous 2012-03-08

To all:

Thanks for your comments. Please remember this is not just about speed. Less than 30% of this article is about benchmark.

Re: comment 08:

Thanks for the sample code. The update to this article, which you obviously missed, discusses and tests the StringBuilder approach. Even though it took the stress away from GC, in my view it loses to other languages even more, performance-wise. As I explained, I believe a math comparison is not convincing and is very difficult to design.

In my everyday job I observe the same pattern of exponential slowdown in real Java web applications. It starts with multiple HTTP requests, which stress GC enough to cripple an 8-CPU server with more than 8 GiB of RAM. The backend doesn't do many string operations (if any), but whatever initialisation is necessary for every new session is enough to provoke aggressive GC, which utilises only one CPU while the other 7 wait for GC to finish.

GC quickly becomes the bottleneck for most multi-threaded workloads in Java. Ultimately it makes Java a single-user language - that's why it is not too bad on Android, but for web applications where more than one request is expected you won't find anything more miserable than Java, whether string operations are used or not.

Re: comment 09:

Yes my Java code sucks. However to me any Java code is like this - not beautiful to say the least.

Re: comment 10:

Applications in C# face a potential patent threat from Microsoft. It is just silly to work with C# unless you work with Windows, which makes it doubly silly. If you're interested in comparing C#, you could at least contribute sample code (or a donation, which would be more convincing).

Re: comment 11:

Thanks, very interesting about Suhosin! For your information, Debian no longer ships PHP with Suhosin by default.

Re: comment 12:

If you test with a different compiler/interpreter version, which is likely the case given that you tried about a year later, you can expect a performance difference within 10%. I'm not sure if this matches your definition of 'significantly'. I have no explanation for the poor C performance in my test. It could be related to default optimisations that are not optimal for a notebook processor, or even to a performance regression in a particular compiler version. In this test I deliberately use defaults, so I compile the test simply with 'gcc c_test.c -o c_test' and then run

./c_test | ./runtest.sh c_test c_test.data.txt

I have just run the C and Perl tests again on a different machine (amd64; Linux kernel 3.2) with all recent updates: C completed the test in 271 sec. while Perl finished in 560 sec. (gcc 4.6.3; Perl v5.14.2). Please note that the original testing for this article was done on a notebook with the 'i386' architecture. See the particular compiler versions in the *.data.txt files.

Re: comment 13:

I never claimed that any Perl code is easy to read. My arguments have nothing to do with experience or bias. Poorly written Python code can be as hard to read as anything else.

Re: comment 15:

Your suggestion would be easier to understand if you provide a code example. Please do not assume all readers to be profoundly competent in Lua to understand what you're saying.
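
For readers unfamiliar with the idiom, here is what the "don't concatenate, collect and join" advice usually means. The sketch is written in Python rather than Lua, on the assumption that Lua's `table.concat` plays the same role as Python's `"".join`; the function names are made up for illustration and this is not the commenter's own code.

```python
def build_by_concatenation(piece, count):
    """Grow the result one piece at a time with the + operator."""
    result = ""
    for _ in range(count):
        result += piece
    return result

def build_by_join(piece, count):
    """Collect pieces in a list (a table in Lua) and join them once at the end."""
    parts = []
    for _ in range(count):
        parts.append(piece)
    return "".join(parts)

piece = "abcdefgh" + "efghefgh"
by_concat = build_by_concatenation(piece, 1000)
by_join = build_by_join(piece, 1000)
print(by_concat == by_join, len(by_join))
```

Note that this idiom helps when the string is only assembled. The test in this article also substitutes text in the whole string after every append, so the pieces would have to be joined on each iteration anyway.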

Re: comment 16:

Exactly! Methodology-wise it also makes sense to compare code written pretty much alike. Java people argue for a very special workaround that applies to a particular case, but then the comparison could be called unfair because the sample code diverges more from the other examples.

Re: comment 17:

Thanks, I didn't try your suggestion as I doubt it would affect test results. I verified the output produced by all code samples before testing so this minor flaw is unlikely to affect comparison in any way.

Re: comment 18:

I don't find your argument about read-only (aka immutable) strings convincing. You suggest comparing a special case for Java with the generic case for other languages. That has already been done in the update to this article, and Java's results were not as good in the context of other languages. It might be interesting to see how badly the C results would be affected by changing realloc to malloc, but Java's GC is obviously stressed very much unless StringBuilder is used.

In multi-threaded applications Java is much worse than Python and other languages because GC is choking and freezing other threads.

Re: comment 20:

We already tried pretty much what you're suggesting in June 2011. I'm surprised how many commenting readers didn't read the whole article.

Re: comment 21 and 23:

I seriously considered removing your comments. In the future I won't tolerate remarks that are worthless to other readers.

This article is especially useful for beginners because it demonstrates the consequences of sloppy Java coding and provides possible solutions kindly contributed by readers. Java is very unforgiving and demanding - this is exactly the point of this article, because it shows the amount of effort needed to make something barely work in Java compared to other languages.

Maybe Java is not too bad for you, but hopefully others may get real about it.

Re: comment 24:

All the hard work is done by memstat provided by 'memstat' package. You can pipe the output of sample code to runtest.sh which takes two arguments: <executable to monitor> and <file name to save data>

Comment by onlyjob 2012-03-17

Hi!

Sorry for my mistakes, I'm not a native speaker.

Being a Java/test automation developer, I was shocked to see the bad performance of Java! I started to debug and profile your code and finally I ended up with almost the same code that was already proposed by another Java-guy (I did not notice it at first).

Let me explain what's going on the background.

First and foremost, using pure strings instead of StringBuilder is not the main problem here - the Java compiler converts normal String concatenations to StringBuilder calls as an optimization technique (at least it did on my computer). The main issue is with the implementation of String/StringBuffer classes.

There is a class Matcher, which handles regular expression matching. When you call String.replaceAll(), it in turn calls Matcher.replaceAll(). In replaceAll(), a StringBuffer instance is used as a buffer. When the replacement is done, a String object is returned, so the contents of the StringBuffer must be converted to a String (returning the StringBuffer is not an option, because it does not extend String). So a new String instance is created - but the complete char[] array inside the StringBuffer is deep-copied with System.arraycopy() every time you call replaceAll(), causing a huge performance hit on the memory subsystem!

That's why the StringBuilder.replace() is necessary to avoid moving big chunks of data between StringBuffer and String.

Also, note that the code above by the "brilliant Java developer" can be further optimized, because once the string "efgh" is no longer found, the index must be calculated again with indexOf(), which is an extremely costly operation (it uses the naive substring search algorithm instead of an efficient one, like Knuth-Morris-Pratt):

gstr.append(str);

        int startIndx;
        if (savedLastIndex == -1) {
            startIndx = gstr.indexOf("efgh");
            savedLastIndex = startIndx;
        } else {
            startIndx = gstr.indexOf("efgh", savedLastIndex);
            savedLastIndex = startIndx;
        }

        while(startIndx != -1){
            gstr.replace(startIndx, startIndx + 4, "____");
            startIndx = gstr.indexOf("efgh", startIndx + 4);
        }

If you save the index, it gives a huge performance boost:

exec.tm.sec str.length  allocated memory:free memory:memory used
0       0       15872:15590:281
0sec        256kb       15872:12921:2950
0sec        512kb       15872:13284:2587
1sec        768kb       15872:11580:4291
1sec        1024kb      15872:10034:5837
2sec        1280kb      15872:3881:11990
2sec        1536kb      15872:6766:9105
3sec        1792kb      15872:5212:10660
3sec        2048kb      15872:3727:12144
4sec        2304kb      25156:11130:14025
4sec        2560kb      25156:9625:15530
4sec        2816kb      25156:8119:17036
5sec        3072kb      29316:15215:14100
5sec        3328kb      29316:13675:15640
6sec        3584kb      29316:12135:17180

I know it's almost like cheating... But this is the unfortunate effect of how Strings are handled in the platform library (not in the JVM!).

Comment by peter 2012-07-24
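
The saved-index idea from the comment above carries over to other languages. The following Python sketch uses a `bytearray` as a stand-in for Java's StringBuilder; the function name `append_and_replace` is hypothetical. It rests on two assumptions: everything before the previous end of the buffer has already been cleaned, and the replacement text can never form a new match.

```python
def append_and_replace(buffer, piece, pattern=b"efgh", filler=b"____"):
    """Append piece to buffer and replace pattern only in the unscanned tail."""
    if len(pattern) != len(filler):
        raise ValueError("pattern and filler must have the same length")
    # Back up slightly so a match straddling the old end is not missed.
    start = max(0, len(buffer) - (len(pattern) - 1))
    buffer.extend(piece)
    index = buffer.find(pattern, start)
    while index != -1:
        buffer[index:index + len(pattern)] = filler   # same length, so in place
        index = buffer.find(pattern, index + len(pattern))
    return buffer

gstr = bytearray()
for _ in range(4096):
    append_and_replace(gstr, b"abcdefghefghefgh")
print(len(gstr), bytes(gstr[:16]))
```

Each call now scans only the newly appended bytes, so the total work grows linearly with the string instead of quadratically. As the commenter admits, this changes the workload being measured rather than making the same workload faster.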

Hi!

Very nice post! It's good to have comparative data on so many languages in one place.

I have one suggestion: your speed graphs would really be more readable and easier to interpret if you plotted them in log-log scale. For starters, you would not need separate graphs for the slow and fast languages. But more importantly, some features of your data set would stand out so more clearly.

The most striking feature of such a log-log plot is that the influence of string length on execution time is very consistent across languages: every language gives a straight line in the plot, and all the lines are parallel! This is because the execution time scales as O(n²), where n is the size of the string, irrespective of the language. This is no big surprise, as that's what one would expect from examination of the algorithm you used for the test, but at least it shows that none of those languages is doing anything weird, like optimizing your code to O(n), or failing catastrophically to O(exp(n)).

Here is a plot of (time/size²) vs. size, base on your data: http://edgar-bonet.org/misc/lang-speeds.png

Compared to a plain time vs. size plot, this plot emphasizes the differences between languages, as well as the deviations from a pure O(n²) law. Here one can see that most languages do actually very slightly worse than O(n²), while two of them (Java gcj and Perl 5) do slightly better.

Based on this plot, some of your conclusions could be reexamined:

Speed tests fall into 4 categories: [...]

Here your "Fastest" group is over-broad: there is a 6-fold performance gap between Python and Perl. On the other hand, you put tcl and Lua in different groups, although Lua is only 1.72 times slower than tcl. A better classification would probably be:

  • Java gcj
  • Java (Sun and OpenJDK), Lua, tcl, JavaScript (sm) and Python 3
  • Python, Ruby, PHP, C++ and JavaScript (V8)
  • C and Perl 5

All tested languages are good with manipulation of little strings but as the processed data grow the difference manifests itself.

On the contrary: your data clearly shows that not all languages are good at manipulating small strings. Actually, the performance spread for small strings (a factor 225 between fastest and slowest) is huge, and only marginally smaller than the spread for big strings (a factor 247).

performance of C and Perl5 is almost a flat line on graph indicating very little degradation. It means that C and Perl5 process increasing amount of data at (almost) constant speed.

C++, JavaScript (sm and also V8 beyond 768 KiB) and tcl also provide constant speed. Java on gcj even shows a slight speedup with longer strings! It goes from 6.88 to 6.43 ms/KiB². But obviously, constant speed does not mean good speed.

Some language's performance degrade faster than others so in beginning of this test Java somewhat 20 times slower than Perl5 and in the end Java is about 40 times slower (for same amount of data). Clearly this is an important characteristic - size matters!

Size matter barely. It is true that the curves are not strictly parallel, but deviations are small. And even though you can spot some crossings (like Lua outperforming the Javas beyond 3328 KiB, or C++ beating PHP beyond 2304 KiB), the languages that cross are of comparable speed on the whole range of tested sizes.

I realize this post sounds a little bit too critical of your study. That was not my intention. The main conclusions of your article (that string manipulations are an important performance metric, and the relative performances of the contenders) seem solid. Or at least, should I say, you convinced me. ;-)

Regards,

Edgar.

Comment by Edgar 2013-02-12
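
The time/size² normalisation described in the comment above is easy to reproduce. The sketch below does not use the article's original data set behind the linked plot. Instead it takes four points from the Perl timings posted at the end of this thread and expresses them in the same ms/KiB² unit; the function name `normalised` is made up for illustration.

```python
# (size in KiB, seconds) taken from the Perl timings posted later in this thread.
perl_timings = [(1024, 24), (2048, 96), (3072, 218), (4096, 391)]

def normalised(size_kib, seconds):
    """Return time per squared size in ms/KiB^2."""
    return seconds * 1000.0 / size_kib ** 2

ratios = [normalised(size, secs) for size, secs in perl_timings]
for (size, _), ratio in zip(perl_timings, ratios):
    print(f"{size} KiB: {ratio:.4f} ms/KiB^2")
```

If the run time really scales as O(n²), these ratios stay close to constant across sizes, which is the "flat line" in a time/size² plot.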

C correction:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main() {
setbuf(stdout, NULL); //disable output buffering

char *str = malloc(9);
strcpy(str, "abcdefgh");
str = realloc(str, 8 + 8 + 1);
strcat(str, "efghefgh");
int str_len = strlen(str);

printf("%s", "exec.tm.sec\tstr.length\n");

time_t start = time(NULL);
char *gstr = malloc(1);
*gstr = '\0';
int gstr_len = 0;

int imax = 1024 / str_len * 1024 * 4;
int i = 0;
while (i++ < imax + 1000) {
    gstr_len += str_len;
    gstr = realloc(gstr, gstr_len + 1);
    strcat(gstr, str);

    char *pos = gstr;
    while (pos = strstr(pos + 4, "efgh")) {
        memcpy(pos, "____", 4);
    }

    if (gstr_len % (1024 * 256) == 0) {
        printf("%lisec\t\t%dkb\n", time(NULL) - start, gstr_len / 1024);
    }
}

}

C Benchmark:

exec.tm.sec str.length
0sec        256kb
1sec        512kb
3sec        768kb
5sec        1024kb
8sec        1280kb
12sec       1536kb
16sec       1792kb
22sec       2048kb
29sec       2304kb
38sec       2560kb
48sec       2816kb
60sec       3072kb
73sec       3328kb
89sec       3584kb
108sec      3840kb
129sec      4096kb

C++ correction:

#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
string str("abcdefgh");
str += "efghefgh";

time_t start = time(NULL);
cout << "exec.tm.sec\tstr.length" << endl;

string gstr;

int imax = 1024 / str.length() * 1024 * 4;
int i = 0;
while (i++ < imax + 1000) {
    gstr += str;
    for (size_t pos = gstr.find("efgh"); 
         pos != string::npos; 
         pos = gstr.find("efgh", pos + 4)) {

        gstr.replace(pos, 4, "____");
    }

    if ((gstr.length() % (1024 * 256)) == 0) {
        cout << time(NULL) - start << "sec\t\t" << gstr.length() / 1024 << "kb" <<  endl;
    }
}

return 0;

}

C++ Benchmark:

exec.tm.sec str.length
2sec        256kb
9sec        512kb
21sec       768kb
37sec       1024kb
58sec       1280kb
83sec       1536kb
113sec      1792kb
148sec      2048kb
188sec      2304kb
232sec      2560kb
281sec      2816kb
335sec      3072kb
393sec      3328kb
457sec      3584kb
525sec      3840kb
598sec      4096kb

Perl Benchmark (for comparison):

exec.tm.sec str.length
2sec        256kb
6sec        512kb
14sec       768kb
24sec       1024kb
38sec       1280kb
54sec       1536kb
73sec       1792kb
96sec       2048kb
122sec      2304kb
150sec      2560kb
182sec      2816kb
218sec      3072kb
256sec      3328kb
298sec      3584kb
343sec      3840kb
391sec      4096kb

Comment by Troy 2013-04-24