On Mindcraft's April 1999 Benchmark



  I just compiled [Apache] with almost all of the modules disabled, and I'm using the highperformance-conf.dist config file from the distribution. Karthik also posted to linux-kernel; see its followups. This sounds rather like the behavior Mindcraft reported ("After the restart, Apache performance climbed back to within 30% of its peak from a low of about 6% of the peak performance").

Kernel issue #2: Wake-One vs. The Thundering Herd

(Note: According to the Linux Scalability Project's paper on the thundering herd problem, a task-exclusive wake-one patch is now integrated into the 2.3 kernel. However, according to Andrea, as of 2.4.0-test10 it still wakes up processes in the same order they were put to sleep, which is not optimal from a caching point of view; it would be better to wake them in reverse order. See also the Nov 2000 measurements by Andrew Morton (andrewm@uow.edu.au): post 1, post 2, and Linus' reply.)

- Phillip Ezolt, 5 May 1999, in linux-kernel (Overscheduling DOES happen with high web server load.): When running a SPECWeb96 strobe run on Alpha/linux, I found that when the CPU is pegged, 18% of the time is spent in the scheduler.

This is what Russowitzich mentioned in his critique of Linux. This post started a very lively thread in linux-kernel (now in its second week). It looks like the scheduler (and possibly Apache) are in for some changes.

- Rik van Riel, 6 May 1999, in linuxperf (Re: [linuxperf] Possible fix for Mindcraft Apache problem): ... The main bug with the web benchmark remains: the way Apache and Linux 'cooperate' is a problem. When a signal is received, all processes get woken up, and the scheduler must then choose one of the dozens of runnable processes. The real solution is to switch from wake-all semantics to a wake-one style, to avoid the huge runqueues that Phillip Ezolt, the DEC guy, experienced. The good news is that it's a simple patch that can probably be done within a few days...
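To make the wake-all versus wake-one difference concrete, here is a toy counting model (not kernel code; the function name and the one-connection-at-a-time arrival pattern are assumptions for illustration). With wake-all, every process sleeping in accept() is woken for each connection, and all but one go straight back to sleep. The optional `lifo` flag models the reverse-order wakeup Andrea suggests, which keeps reusing the most recently active, cache-warm process.

```python
from collections import deque


def simulate_wakeups(n_sleepers, n_connections, wake_one, lifo=False):
    """Toy model of n_sleepers processes blocked in accept() on one socket.

    Assumes connections arrive one at a time and each is fully handled
    before the next arrives. Returns (useful, wasted, served): wakeups that
    led to an accepted connection, wakeups that found nothing to do, and
    which sleeper handled each connection.
    """
    sleepers = deque(range(n_sleepers))  # left end = asleep the longest
    useful = wasted = 0
    served = []
    for _ in range(n_connections):
        if wake_one:
            # Exactly one sleeper is woken. FIFO picks the one that has slept
            # longest; LIFO picks the most recently active (cache-warm) one.
            winner = sleepers.pop() if lifo else sleepers.popleft()
        else:
            # Wake-all: every sleeper becomes runnable, one wins accept(),
            # and the others find no connection and go back to sleep.
            wasted += len(sleepers) - 1
            winner = sleepers.popleft()
        useful += 1
        served.append(winner)
        sleepers.append(winner)  # the winner sleeps again once it is done
    return useful, wasted, served
```

With 10 sleepers and 100 connections, wake-all produces 900 wasted wakeups while wake-one produces none; with `lifo=True` the same process (id 9) handles every connection, which is the cache-friendly ordering described in the note above.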
- Tony Gale, 6 May 1999, in linuxperf (Re: [linuxperf] Possible fix for Mindcraft Apache problem): Apache uses file locking to serialise access to the accept call. This can be expensive on some systems. I have not yet had the time to run the Linux numbers for the various server models, which will let me determine which one is the most efficient. Check Stevens, UNPv1 2nd Edition, Chapter 27 for details.

- Andrea Arcangeli, 12 May 1999, in linux-kernel ([patch] wake_one for accept(2) [was Re: Overscheduling DOES happen with high web server load.] and 2.2.8_andrea1.bz2): I released a new andrea-patch against 2.2.8. This new one has my new straightforward wake-one on accept(2) code (but to get the improvement you must make sure that your apache tasks are sleeping in accept(2); a strace -p `pidof apache` should tell you that). The patch can be accessed from this link.

David Miller's answer to the above question: ... on every TCP connection, there are 2 spurious and unsolicited wakeups. These wakeups originate in the write_space socket callback, because we free up SYN frames and wake up listening socket sleepers. I have been working on this exact issue today.

- Ingo Molnar, 13 May 1999, in linux-kernel (Re: [RFT] 2.2.8_andrea1 wake-one [Re: Overscheduling DOES happen with high web server load.]): note that pre-2.3.1 already has a wake-one implementation for accept() ... and more coming up.

- Phillip Ezolt (ezolt@perf.zko.dec.com), 14 May 1999, in linux-kernel (Great News!! Was: [RFT] 2.2.8_andrea1 wake-one): I've been doing some more SPECWeb96 tests, and with Andrea's patch to 2.2.8 (ftp://ftp.suse.com/pub/people/andrea/kernel/2.2.8_andrea1.bz) **On identical hardware, I get web-performance nearly identical to Tru64!** ...

    Tru64      4 ms
    2.2.5    100 ms
    2.2.8      9 ms
    2.2.8_a    4 ms

... I get web-performance almost identical to Tru64, according to this Iprobe data. The number of SPECWeb96 maxOps per second has increased as well.
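Tony Gale's point about Apache serializing accept() with a lock, and Andrea's advice to make sure the Apache tasks are actually sleeping in accept(2), can be illustrated with a small pre-threaded server. This is a sketch under stated assumptions: threads stand in for Apache's pre-forked processes, a `threading.Lock` stands in for Apache's accept file lock, and the function name `run_pool` is hypothetical. It shows the two structures only; Python will not reproduce the kernel's wakeup costs. `serialize_accept=False` corresponds to what SINGLE_LISTEN_UNSERIALIZED_ACCEPT allows for a single listening socket.

```python
import socket
import threading


def run_pool(n_workers, n_clients, serialize_accept):
    """Serve n_clients loopback connections with n_workers threads that all
    wait on one listening socket; return the worker id that served each one."""
    accept_mutex = threading.Lock()  # stand-in for Apache's accept lock
    stop = threading.Event()
    served, served_lock = [], threading.Lock()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(128)
    listener.settimeout(0.1)  # so idle workers periodically check `stop`
    port = listener.getsockname()[1]

    def worker(worker_id):
        while not stop.is_set():
            try:
                if serialize_accept:
                    # Only the mutex holder sleeps in accept(); the others
                    # queue on the mutex, as with Apache's accept lock.
                    with accept_mutex:
                        conn, _ = listener.accept()
                else:
                    # Every worker sleeps in accept() directly.
                    conn, _ = listener.accept()
            except socket.timeout:
                continue
            with conn:
                conn.sendall(b"ok")
            with served_lock:
                served.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    try:
        for _ in range(n_clients):
            with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
                reply = b""
                while True:
                    chunk = client.recv(16)
                    if not chunk:  # server closed the connection
                        break
                    reply += chunk
                if reply != b"ok":
                    raise RuntimeError(f"unexpected reply {reply!r}")
    finally:
        stop.set()
        for t in threads:
            t.join()
        listener.close()
    return served
```

Both modes serve every connection; the difference that matters on a real server is how many processes the kernel has to wake and schedule per connection, which is what the wake-one patches address.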
Ezolt closed with a request: **Please add the wake-one patch to the 2.2.X kernel.**

Larry Sendlosky tried this patch, and says: Your 2.2.8 patch really helps apache performance on a single cpu system, but there is really no performance improvement on a 2 cpu SMP system.

Also see:
- Dimitris Michailidis (dimitris@sgi.com), 14 May 1999, in linux-kernel.
- Andrea Arcangeli (andrea@suse.de), 21 May 1999, in linux-kernel: Re: andrea buffer codes (2.2.9-C.gz). An update of the same patch; it might have some SMP bottleneck fixes, too.

Kernel issue #3: SMP bottlenecks in the 2.2 kernel

- Juergen Schmidt, 19 May 1999, in linux-kernel (Bad apache perfomance wtih linux SMP), asked what could make Apache do poorly under SMP. Andi Kleen responded: It is most likely that TCP sending data copies are completely serialized. This can be fixed by replacing

    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);

in tcp.c:tcp_do_sendmsg with

    unlock_kernel();
    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
    lock_kernel();

The patch does not violate any locking requirements in the kernel... [To fix your connection refused errors,] try:

    echo 32768 > /proc/sys/fs/file-max
    echo 65536 > /proc/sys/fs/inode-max

Overall it should be clear that the current Linux kernel doesn't scale to multiple CPUs for system load (user load is fine). I blame the Linux vendors for promoting the claim that it does, although it is false. ... The work to fix all these issues is underway. [2.3 will be fixed first, then the changes will be backported to 2.2.]

[Note: Andi's TCP unlocking fix appears to be in 2.2.9_ac3.] Andrea Arcangeli responded, describing his own version of this fix (ftp://ftp.suse.com/pub/people/andrea/kernel/2.3.3_andrea2.bz2) as less cluttered: If you look at my patch (the second one; in the first one I missed the reacquire_kernel_lock done before returning from schedule, woops :) then you'll see my approach to address the unlock-during-uaccess.
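Andi's remark that system load doesn't scale across CPUs can be quantified with Amdahl's law. This is a textbook model swapped in for illustration, not something from the thread: if a fraction s of the work is effectively serialized (for example, copies done under the single global kernel lock), the speedup on N CPUs is 1/(s + (1 - s)/N). Inverting it turns a measured speedup into the serial fraction it implies. The function names are hypothetical, and the model ignores lock-contention overhead and interrupt costs.

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Speedup on n_cpus when serial_fraction of the work cannot overlap,
    for example because it runs under a single global kernel lock."""
    if not 0.0 <= serial_fraction <= 1.0 or n_cpus < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and n_cpus >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


def implied_serial_fraction(speedup, n_cpus):
    """Invert Amdahl's law: from 1/S = s + (1 - s)/N it follows that
    s = (N/S - 1) / (N - 1)."""
    if n_cpus < 2 or not 1.0 <= speedup <= n_cpus:
        raise ValueError("need n_cpus >= 2 and 1 <= speedup <= n_cpus")
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)
```

Under this simplified model, the 1.5x four-CPU speedup that ZD measured with 2.2.6 (cited later on this page) implies that about 56% of the work was effectively serialized, while a 2x speedup on four CPUs implies about 33%. Conversely, a workload that is half serialized can never exceed 1.6x on four CPUs.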
Andrea continued: My patch doesn't change tcp/ip, ext2, etc.; it only touches uaccess.h.c. I don't like unlock_kernel being all over the place.

- Juergen Schmidt, 26 May 1999, on linux-kernel and new-httpd (Linux/Apache and SMP - my fault), retracted his earlier problem report: I reported disastrous performance for Linux and Apache on a SMP system. To double-check, I downloaded clean kernel sources (2.2.8 and 2.2.9); these do not have the reported penalty for running on an SMP system. After seeing the first very poor results, I made the mistake of using the kernel sources that were already installed, but these sources had been modified long before my machine was born and should have been thrown away in the first place. Please accept my apologies for this confusion.

Others have reported modest performance increases (20% or so) with Andrea's SMP fix, but only when large files (100 kilobytes) were being served.

Juergen has now finished his testing. Unfortunately, he neglected to compile Apache with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT, which (according to Andrea) significantly hurt Apache performance. If Juergen did not notice this, it is probably too hard for most people to figure out. To make it easier to get good performance in the future, we need the wake-one patch added to a stable kernel (say, 2.2.10), and we need Apache's configuration script to notice that it is being compiled for 2.2.10 or later and automatically select SINGLE_LISTEN_UNSERIALIZED_ACCEPT.

Other Apache users can help solve performance problems

- Mike Whitaker (mike@altrion.org), 22 May 1999, in linuxperf (High load under Apache1.3.3/mod_perl 1.16/Linux 2.2.7 SMP), described an interesting performance problem: Our typical webserver is a dual PII450 with 1G, and split httpd's, typically 300 static to serve the pages and proxy to 80-100 dynamic to serve the mod_perl adverts.
Unneeded modules are disabled and hostname lookups are turned off, as any sensible person would do. There's typically between one and three mod_perl hits/page on top of the usual dozen or so inline images... The kernel (2.2.7) has MAX_TASKS upped to 4090, and the unlock_kernel/lock_kernel around csum_and_copy_from_user() in tcp_do_sendmsg that Andi Kleen suggested. Performance is... interesting. Load fluctuates between 10-12, while user CPU swings from 0 (80% idle) up to 180% (0% idle, machine *crawling*), around once every minute and a third. Vmstat shows the number of runnable processes ranging from 0 (when load has been low) to 30-40. The static servers can manage 60-70 peak hits/sec. Without the dynamic httpd, everything *flies*.

He was advised to try a kernel with wake-one support, and reported back: Identical systems: dual PII450, 1G and two disk controllers. As far as I can *tell*, the wake-one patch is definitely doing its stuff: the 2.2.7 machine still has cycles of load into three figures, and the 2.3.3 machine hasn't actually managed a load of 1 yet. Unfortunately, observation suggests that the 2.3.3/Apache combination drops/ignores about one connection in ten (Network error: Connection reset by peer).

His next update, on 25 May: More progress from the bleeding edge. (Remember: the config here is split static/mod_perl httpds, with a CPU-intensive mod_perl script serving ads as an SSI being the probable bottleneck.) Linux kernel 2.2.9 plus the 2.2.9_andrea3 (wake-one) patch seems to work. It handles hits at a rate that suggests it's pushing the ad server close to its observed maximum. (As I mentioned in a previous note, avoid 2.2.8 like the plague: it trashes hard disks. See the threads on linux-kernel.) However... when it *does* get overstressed, BOY does it get overstressed. Idle CPU can drop to zero during a spike in demand.
Once you're in this situation, it can be difficult to get back out from under the load of progressively higher backlogged requests. Counterintuitively, you can *REDUCE* MaxClients and hope that the TCP listen queue absorbs the load surge; experience suggests this works. This is an excellent case for Eddieware's load-balancing DNS.

- Eric Hicks, 26 May 1999, in linux-kernel (Apache/kernel problem?): ... I'm having some big problems in which it appears that a single PII 400 MHz or a single AMD 400 will outrun a dual PII 450 at http requests from Apache. ... HTTP Server Tests
  Data: 100 1MByte MPEG files stored on local drives.
  Results:
  - AMD K6 400 MHz, 128MB, Linux 2.0.36: handles 1000 simultaneous clients @ 57.6 Kbits/sec.
  - PII 400 MHz, 512MB, Linux 2.0.36: handles 1000 simultaneous clients @ 57.6 Kbits/sec.
  - Dual PII 450 MHz, 512MB, Linux 2.2.8 and Linux 2.0.36: handles far fewer than 300 simultaneous clients @ 57.6 Kbits/sec.

I advised him to use 2.2.9_andrea3; he said that he would try it and report back.

Kernel issue #4: Interrupt Bottlenecks

According to Zach, the Mindcraft benchmark's use of four Fast Ethernet cards and a quad SMP system exposes a bottleneck in Linux's interrupt processing; the kernel spent a lot of time in synchronize_bh(). (A single Gigabit Ethernet card would lessen this bottleneck.) Mingo claims that TCP throughput scales better with more CPUs in 2.3.9 than in 2.2.10, although he hasn't yet tried it with multiple Ethernet cards. Steven Guo and Steve Underwood also commented on the issue of interrupts under heavy load. See also Linus's State of Linux talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability, and SCT's Jan 2000 comments about progress in scalability.

Softnet is coming! Kernel 2.3.43 adds the new softnet networking changes. Softnet changes the interface to the network card drivers, so every driver must be updated.
However, network performance should be much better on large SMP systems. (For more details, see Alexey's readme.softnet, or his softnet-howto of 15 February, which explains how to convert old drivers.) The Feb '00 thread Gigabit Ethernet Blockages (especially its second week) has lots of interesting tidbits about which interrupt and other bottlenecks are still present, as well as how they will be addressed in the 2.3 kernel. Ingo Molnar's post of 27 February 2000 explains the improvements to interrupt handling in the IA32 code in great detail; it seems these improvements will be integrated into the core kernel in 2.5.

Kernel issue #5: Mysterious network slowdown

This is a bug, not a scaling issue. Several users of 2.2 reported that their networking performance sometimes drops to 1% to 10% of normal, often along with high ping times. They were able to fix the problem temporarily by cycling the interface.

- Oystein Sigsen reported: After upgrading to 2.2, we experienced occasional slowdowns in TCP performance. The performance goes back to normal when I take down the interface and reinsert the eepro100 module into the kernel. After I've done that, the performance is fine for a couple of days or maybe weeks.
- David Stahl reported on 29 June 1999: I have 3 machines running 2.2.10 with multiple 3COM 905/905b PCI [cards]... After approximately 2 days of uptime, I start to notice ping times on my local lan jump up to 7-20 seconds. As others have noted, there is no loss. There is just some damn high latency. ... It seems to depend on network load -- lighter loads mean longer periods between problems. The problem ALSO is gradual -- it'll start at 4 second pings, then 7 second pings about 20 minutes later, then 30 minutes later it's up to 12-20 seconds.
- Another eepro100 report. A tulip report. Less likely to happen again.
- David Stahl wrote on 13 July 1999: What DID fix the problem was a private reply from someone else (sorry about the credit, but i'm not in the mood to sieve 10k emails right now), to try the alpha version of the latest 3c59x.c driver from Donald Becker (http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html). 3c59x.c:v0.99L 5/28/99 is the version that fixed it, from ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/test/3c59x.c
- On 23 Sep 1999, Alexey posted a one-line patch that clears up a similar mysterious slowdown; 2.2.13 and Red Hat 6.1 already include it. I know of three Red Hat 6.0 systems, with masquerading support compiled in and connected to cable modems, where this patch fixed very high ping times after TCP transfers to distant hosts.
- Around 21 October, Rickard Cedergren and Michael Brown reported on linux-kernel that although Alexey's patch greatly improved the problem, it is not totally gone.
- Tony Hoyle is also experiencing long delays with 2.2.13.
- Jeremy Fitzhardinge reports another delay; the replies say it is likely caused by the Tulip driver.

Kernel issue #6: 2.2.x/NT TCP slowdown

Petru Paler, 10 July 1999, in linux-kernel ([BUG] TCP connections among Linux and NT), reported that any kind of TCP connection between Linux 2.2.10 and an NT Server 4 Service Pack 5 machine slows to a crawl. With 2.0.37, the problem was much less severe (6 kbytes/sec). A tcpdump log of a slow connection helped Andi Kleen see that NT was taking a long time to ACK packets, which caused Linux to throttle back.

Solved: false alarm! It wasn't Linux's fault at all. It turned out that NT needed to be told not to use full-duplex mode on its Ethernet card.
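Why slow ACKs from NT throttled Linux follows from a basic property of TCP, sketched below with hypothetical helper names: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT, and anything that delays ACKs inflates the effective RTT. The window sizes used below are assumptions for illustration; the report does not say what window was in use.

```python
def max_tcp_throughput(window_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput in bytes/second.

    At most one window of unacknowledged data can be in flight per round
    trip, so anything that delays ACKs (inflating the effective RTT)
    lowers the ceiling proportionally.
    """
    if window_bytes <= 0 or rtt_seconds <= 0:
        raise ValueError("window_bytes and rtt_seconds must be positive")
    return window_bytes / rtt_seconds


def effective_rtt(window_bytes, observed_bytes_per_second):
    """Round-trip time that would explain an observed rate, assuming the
    connection is purely window-limited."""
    if window_bytes <= 0 or observed_bytes_per_second <= 0:
        raise ValueError("arguments must be positive")
    return window_bytes / observed_bytes_per_second
```

For example, with an assumed 32 KiB window, a 10 ms round trip allows about 3.3 MB/s, but if ACKs take half a second to come back, the ceiling drops to 64 KiB/s. Read in reverse, a rate of 6 kbytes/sec with that assumed window would correspond to an effective round trip of roughly 5.5 seconds.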
Kernel issue #7: Scheduler

Phil Ezolt, 22 January 2000, in linux-kernel (Re: Interesting analysis by IBM Linux kernel threading): I see both a large number of running processes and a great number of context switches when I run SPECWeb96 tests here. ... Here's a sample of the vmstat data:

    procs                     memory                swap       io         system        cpu
     r  b  w   swpd     free    buff    cache  si  so  bi    bo     in     cs  us  sy  id
    ...
    24  0  0   2320  2066936  590088  1061464   0   0   0     0   8680   7402   3  96   1
    24  0  0   2320  2065752  590664  1061464   0   0   0  1095  11344  10920   3  95   1

Notice: 24 running processes and ~7000 context switches. This is a lot of overhead: each second, 7000*24 goodnesses can be calculated, not the (20*3) that a desktop system sees. This is a scalability issue; a better scheduler means better scalability. Don't tell us that benchmark data is ineffective. If you cannot provide me data from a real computer system and its faults, benchmark data is what we have. SPECWeb96 pushes Linux to the limit until it bleeds. I'm going to tell you where it is bleeding. You have two options: fix it or bury your head in the sand. It might not be what your system sees today, but it will in the future. Would you rather fix it right away or wait until someone else does? ... Here's an interesting fact: during my runs I see 98% contention for the [2.2.14] kernel lock, and it's accessed A LOT. I don't have much memory support so I don't know how it compares to 2.3.40. Andrea will probably be kind enough to give me a patch, and I'll be able to see if things have improved.

[Phil's data pertains to the web server under the SPECWeb96 test: an ES40 with 4 Alpha EV6 CPUs running Red Hat 6.0 with kernel v2.2.14 and SGI speed patches; the interfaces taking the load are 2 ACEnic gigabit Ethernet cards.]
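Phil's "7000*24 goodnesses" estimate can be reproduced directly from the vmstat rows above. The sketch assumes the 2.2-era column layout shown in the sample, and the helper names are hypothetical. The estimate multiplies runnable tasks (r) by context switches per second (cs), on the reasoning that the 2.2 scheduler evaluates goodness() for each runnable task every time it picks the next one to run.

```python
# Column names of the 2.2-era vmstat output quoted above.
VMSTAT_COLUMNS = "r b w swpd free buff cache si so bi bo in cs us sy id".split()


def parse_vmstat_row(line):
    """Turn one vmstat data line into a {column: value} dict."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(f"expected {len(VMSTAT_COLUMNS)} fields, got {len(values)}")
    return dict(zip(VMSTAT_COLUMNS, values))


def goodness_evaluations_per_second(row):
    """Ezolt's estimate: runnable tasks (r) times context switches per
    second (cs), i.e. one goodness() call per runnable task per
    scheduling decision."""
    return row["r"] * row["cs"]


samples = [
    "24 0 0 2320 2066936 590088 1061464 0 0 0 0 8680 7402 3 96 1",
    "24 0 0 2320 2065752 590664 1061464 0 0 0 1095 11344 10920 3 95 1",
]
for line in samples:
    row = parse_vmstat_row(line)
    print(row["r"], row["cs"], goodness_evaluations_per_second(row))
```

The two sample rows give 177,648 and 262,080 goodness() evaluations per second, against the 60 (20*3) that Phil attributes to a desktop system.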
Kernel issue #8: SMP bottlenecks in the 2.4 kernel

Manfred Spraul, 21 April 2000, in linux-kernel ([PATCH] f_op->poll() without lock_kernel()): kumon@flab.fujitsu.co.jp noticed that select() caused high contention for the kernel lock, so here is a patch that removes lock_kernel() from poll(). [Tested] with 2.3.99. There was some discussion about whether this was wise at such a late date, but Linus and David Miller were enthusiastic. It looks like one more bottleneck is on its way out. On 26 April 2000, kumon@flab.fujitsu.co.jp posted benchmark results to linux-kernel with and without the lock_kernel() in poll().

A kernel patch was also released to improve checksum performance, and Apache 1.3 was patched to align its buffers on 32-word boundaries. Linus praised Dean Gaudet for the latter patch, and also reported rumors that this could speed up SPECWeb results by 3%. This thread was very interesting.

Kernel issue #9: csum_partial_copy_generic

kumon@flab.fujitsu.co.jp, 19 May 2000, in linux-kernel ([PATCH] Fast csum_partial_copy_generic and more), reports a 3% reduction in total CPU time compared to 2.3.99-pre8 on i686 from optimizing the cache behavior of csum_partial_copy_generic. The workload was ZD's WebBench. He adds: The benchmark we used has almost the same settings as the MINDCRAFT ones, but the apache setting is [changed] slightly not to use symlink checking. We used only 24 clients independent of each other, and there were 16 apache processes. A four-way XEON system performs more than twice as well as a single CPU.

Note that in ZD's benchmarks with 2.2.6, a 4 CPU system only achieved a 1.5x speedup over a single CPU; Kumon reports a >2x speedup. This appears to be about the same speedup NT 4.0sp3 achieved with 4 CPUs at that number of clients (24). It's encouraging to hear that things might have improved in the 11 months since the 2.2.6 testing.
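Kumon's csum_partial_copy_generic patch works on a routine that copies data and computes the TCP/IP checksum in the same pass, so each byte is touched once instead of twice. The sketch below illustrates that single-pass idea with the Internet checksum from RFC 1071 (the ones'-complement sum of 16-bit big-endian words). It is not the kernel's code: the real routine is hand-tuned assembly that returns a partial 32-bit sum and leaves folding and complementing to later steps, while this version folds and complements at the end for clarity. The name `csum_and_copy` is hypothetical.

```python
def csum_and_copy(src):
    """Copy src into a new buffer while accumulating the Internet checksum
    (RFC 1071: ones'-complement sum of 16-bit big-endian words) in the
    same loop. Returns (copy, checksum)."""
    dst = bytearray(len(src))
    total = 0
    for i in range(0, len(src) - 1, 2):
        dst[i] = src[i]
        dst[i + 1] = src[i + 1]
        total += (src[i] << 8) | src[i + 1]
    if len(src) % 2:  # an odd trailing byte is padded with a zero low byte
        dst[-1] = src[-1]
        total += src[-1] << 8
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return dst, ~total & 0xFFFF
```

For the bytes 00 01 f2 03 f4 f5 f6 f7 used as the worked example in RFC 1071, the checksum is 0x220d, and checksumming the data with those two checksum bytes appended yields 0, which is how a receiver verifies it.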
Kumon stated that there was a significant improvement between pre3 and post5, which is the poll optimization: Until pre4 (I forget the exact version), the kernel lock prevents performance improvement. If you can retrieve all l-k mails between Apr 20-25, the following mails should help you understand the background:
- subject: namei() query
- subject: [PATCH] f_op->poll() without lock_kernel()
- subject: lockless poll() (was Re: namei() query)
- subject: movb for spin-unlock (was Re: namei() query)

On 4 Sept 2000, kumon posted again, noting that his change still hadn't made it into the kernel.

Kernel issue #10: getname() and poll() optimizations

Manfred Spraul posted a patch to linux-kernel on 22 May 2000 that optimizes kmalloc() and getname() a bit. It speeds up apache by 1.5% on 2.3.99-pre8.

Kernel issue #11

Alexander Viro posted a fix on 30 May 2000 to get rid of a lock in close_filp(). Kumon ran a benchmark and reported: I measured viro's ac6D patch using WebBench on a 4cpu Xeon computer. I applied it to 2.4.0-test1, not ac6. The patch decreased stext_lock time by 50% and OS time by 4%. ... Some part of the kmalloc/kfree overhead comes from do_select, and it is easily eliminated using a small array on the stack. kumon then posted a patch that avoids kmalloc/kfree in select() and poll() when fewer than 64 fds are involved.

Kernel issue #12: Poor disk seek behavior in 2.2, new elevator code in 2.4

On 20 July 2000, Robert Cohen (robert@coorong.anu.edu.au) posted a report to linux-kernel listing netatalk (AppleTalk file sharing) benchmarks comparing 2.0, 2.2, and several versions of 2.4.0-pre. The elevator code in 2.4 seems to help (some versions of 2.4 can handle 5 benchmark clients instead of 2), but... The test4 and test5pre2 versions aren't as good. They handle 2 clients on a 128 Meg server fine, so they're doing better than 2.2, but they choke and go seek bound with 4 clients. So something has definitely taken a turn for the worse since test1-ac22.
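The point of an elevator is to reorder pending disk requests so the head sweeps across the disk instead of seeking back and forth in arrival order. The sketch below uses a textbook SCAN-style sweep as a stand-in; it is not the 2.4 kernel's elevator, which also merges adjacent requests and limits how long a request can be starved. The cylinder numbers are made up for illustration, and the function names are hypothetical.

```python
def seek_distance(start, requests):
    """Total head movement (in cylinders) when servicing requests in order."""
    total, pos = 0, start
    for cyl in requests:
        total += abs(cyl - pos)
        pos = cyl
    return total


def elevator_order(start, requests):
    """SCAN-style sweep: service everything at or above the head position
    in ascending order, then sweep back down through the rest."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down


queue = [82, 17, 43, 140, 24, 16, 190]  # hypothetical pending requests
head = 50
print("arrival order:", seek_distance(head, queue))
print("elevator order:", elevator_order(head, queue),
      seek_distance(head, elevator_order(head, queue)))
```

For this queue, servicing requests in arrival order moves the head 518 cylinders, while the sweep order [82, 140, 190, 43, 24, 17, 16] needs only 314. A workload that defeats this reordering ends up seek bound, which is the symptom Robert Cohen describes.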
Robert Cohen later posted an update: The *only* 2.4 kernel versions that could handle 5 clients were 2.4.0-test1-ac22-riel and 2.4.0-test1-ac22-class 5+; everything before and after (up to 2.4.0-test5pre4) can only handle 2.

Robert Cohen posted a patch on 26 Sept 2000, including a simple program that demonstrates the problem. Jens Axboe (axboe@suse.de) replied that he and Andrea had a fix for the problem. On 4 October 2000, Robert Cohen posted an update with benchmark results for many kernels, showing that the issue still exists in 2.4.0-test9.

Kernel issue #13: Fast forwarding / hardware flow control

Jamal (hadi@cyberus.ca) posted a note to linux-kernel on 18 September 2000 describing proposed changes to the network driver interface of the 2.4 kernel, including hardware flow control and other refinements. He wrote that he and Robert Olson had decided after OLS to try to reach the 100Mbps routing maximum (148.8Kpps) by year end: I am afraid the bar has been raised. Robert is already hitting 148Kpps with 2.4.0-test7, using an ASUS CUBX motherboard with a PIII 700MHz Coppermine at about 65% CPU utilization. I was able to get a consistent value in the 110Kpps range with a single PII-based Dell computer. So the new goal is to go to about 500Kpps ;-> (maybe not by year end, but surely by the next random Linux hacker conference). A sample modified tulip driver (hacked by Alexey for 2.2 and mod'ed by Robert and myself over a period of time) is supplied as an example of how to use the feedback values. ... I believe we could have done better on the mindcraft tests with these changes in 2.2 (and HW FC turned on). [update] BTW, I have been informed that Linux people were not allowed to alter the hardware for these tests, so I don't believe they could have used these changes even if they had been available.

Kernel tuning issue: hitting TIME_WAIT

On 30 March 2000, Takashi Richard Horikawa posted a report to linux-kernel listing SPECWeb96 results for both 2.2.14 and 2.3.41.
Performance between a 2.1.4 client and a 2.14 server was poor because not enough ports were available: ports had not yet left TIME_WAIT by the time the port number was needed again for a new connection. The moral of the story may be to tune clients and servers to use as large a port range as possible, e.g. with

    echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

to avoid bumping into this situation when simulating large numbers of clients with a small number of client machines. On 2 April 2000, Mr. Horikawa confirmed the solution.

Suggestions on future benchmarks

Become familiar with the linux-kernel and Apache mailing lists, as well as the Linux newsgroups on Usenet (try DejaNews power searches in forums matching '*linux*'). Post your proposed configuration and see whether people agree with it. Also, be open about the benchmark: post intermediate results and ask for suggestions. Expect to spend about a week mulling over ideas with these mailing list members during your tests.

If possible, use a modern benchmark like SPECWeb99 rather than the simple ones used by Mindcraft. To make the test more like the real Internet, it might be worth injecting latency into the network path between the clients and the server.

If possible, benchmark both single and multiple CPUs. Note that networking performance in the 2.2.x Linux kernels does not scale well as you add Ethernet cards or CPUs. This applies mostly to static pages and cached dynamic pages; uncached dynamic pages usually take a fair bit of CPU time and should scale very well as you add CPUs. Caching frequently generated pages can speed up dynamic content considerably.

If you are testing dynamic content, don't use the old model that runs a separate process for every request; that is too slow. Always use a modern dynamic content generation interface (e.g.
Apache mod_perl).

Configuring Linux

Tuning problems probably caused less than a 20% performance decrease in Mindcraft's test, so as of 3 October 1999, most people will be happy with a stock 2.2.13 kernel or whatever comes with Red Hat 6.1. The 2.4 kernel will help SMP performance when it becomes available. If you're interested in seeing what people were doing in June, here are some notes:
- Linux kernel 2.2.9 and 2.2.9_andrea3 have been praised for their performance on dual-processor tasks as of June 1 (see above). (2.2.9_andrea3 seems to include both a wake-one scheduler fix and an SMP unlock_kernel fix.) andrea3 works only on x86; PPCs and Alphas will need another wake-one or TCP-copy unlock_kernel fix. Jan Gruber writes: The 2.2.9_andrea3 patch does not compile with SMP support disabled. Andrea told me to use ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.9_andrea-VM4.gz instead.
- On 7 June, Andrea Arcangeli asked: If you are going to do bench I would like if you would bench also the patch below. ftp://e-mind.com/pub/andrea/kernel-patches/2.2.9_andrea-perf1.gz
- On 11 Oct 1999, Andrea Arcangeli posted his list of pending 2.2.x patches, waiting to go into 2.2.13 or so. These include several that may improve performance for SMP systems and for systems under heavy I/O, and might be worth considering if you encounter bottlenecks.
- For the truly adventurous, you might consider using the kernel-mode web server khttpd as a front end for Apache. It speeds up static web page fetches tremendously, but it's at version 0.1, so use caution.
- linux-kernel (week 1, week 2) is currently (8 June 1999) discussing benchmarking Apache. Linus Torvalds, who is generally supportive of khttpd or a similar program, points out that NT is doing essentially the same thing.
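The proposal earlier on this page, that Apache's configuration script should notice a 2.2.10-or-later kernel and select SINGLE_LISTEN_UNSERIALIZED_ACCEPT automatically, could look roughly like the sketch below. It is hypothetical throughout: the function names are invented, and the 2.2.10 threshold is the page's "say, 2.2.10", not a confirmed release in which wake-one landed.

```python
import re


def kernel_version(release):
    """Parse the leading major.minor.patch of a `uname -r` style string,
    e.g. '2.2.10' -> (2, 2, 10) or '2.2.9_andrea3' -> (2, 2, 9)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def apache_accept_cflags(release, minimum=(2, 2, 10)):
    """Extra CFLAGS a hypothetical configure check would add: unserialized
    accept() on kernels at or above the assumed wake-one threshold."""
    if kernel_version(release) >= minimum:
        return "-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT"
    return ""
```

On a build host the release string could come from `platform.release()`. One design caveat: a version check cannot see patched kernels such as 2.2.9_andrea3, which has wake-one support but reports itself as 2.2.9, so a real configure script would also want a manual override.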
Configuring Apache

- The usual optimizations should be applied: leave all unused modules out when compiling, disable host name lookups, and allow symbolic links to be followed (see http://www.apache.org/docs/misc/perf-tuning.html).
- Apache should be compiled to block in accept, e.g.
      env CFLAGS='-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT' ./configure
- The http://www.arctic.org/~dgaudet/apache/1.3/top_fuel.patch may be worth applying; PC Week used top_fuel in their recent benchmarks. (See also Dean Gaudet's interesting comments in new-httpd and linux-kernel.) According to top_fuel.patch, using mod_mmap_static for a set of documents can reduce request times from 18 to 9.
- For static file benchmarks, try compiling mod_mmap_static into Apache (see http://www.apache.org/docs/mod/mod_mmap_static.html) and configuring Apache to memory-map the static documents, e.g. by generating a config file that lists every file under /www/htdocs with an mmapfile directive in front of it.
- Several people have mentioned that using Squid as a front end to Apache would greatly accelerate static web page fetches.

Related reading

- Usenet posts showing slow Apache and Linux connections:
  - Apache isn't as fast as people claim?, 1999/04/05, comp.infosystems.www.servers.unix: ...when we run WebBench to test the requests/sec and total throughput, Microsoft IIS 4 is 3 times faster for both Linux and Mac OS X.
  - Re: Apache vs IIS 4: IIS 4 3 times faster, 1999/04/02, comp.infosystems.www.servers.unix: Why are you surprised? It was well-known that Apache is slow. I haven't tested IIS, but I did compare Apache with a number of other servers last summer and found that some were three to five times faster.
- Methods to profile the kernel:
  - Kernel Spinlock Metering for Linux IA32: tools to measure SMP spinlock contention. See also some test results comparing 2.2 to 2.3, and a spinlock metering example that identifies and fixes a kernel bottleneck in 2.3.39.
  - Andrea Arcangeli's ikd
  - SGI's Gprof Kernel Profiling Patch (original announcement)
  - Ingo Molnar's ktracer for 2.1.x; example ktracer usage
  - Christoph Lameter's perfstat patches, at Captech's Linux Performance, Stability and Scalability Project
- How to profile user programs: gprof.out with gprof.
- Mikael Pettersson's x86 performance-monitoring counters patch. Supports 2.3.22 and 2.2.13. Includes a list of other related tools.
- David Mentre's PCL (Performance Counter Library): how to use hardware performance counters with Linux.
- Stephan Meyer's MSR patch: only supports up to 2.2.6; no longer actively developed.
- Richard Gooch's MSR/PTC patch: only supports version 2.2; requires devfs.
- A few linux-kernel posts:
  - 2.2.5 Optimizations For Web Benchmarks?, 16 Apr 1999: Karthik Prabhakar, about to do serious SPECWeb96 benchmarking, asks the right questions. The followups can be very interesting.
  - Re: 2.2.5 optimizations for web benchmarks?: Dean Gaudet's reply, 16 April 1999. An Apache insider offers some interesting insights.
  - [patch] new scheduler, 9 May 1999: the thread started by Rik van Riel about possible scheduler changes.
- The smbtorture benchmark, which lets you test an SMB server like the big boys
- Rik van Riel's Linux Performance Tuning site
- The Linux Scalability Project
- The C10K problem: Why can't Johnny serve 10000 clients?
- Banga and Druschel's paper on web server benchmarking
- Linus's State of Linux talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability
- my NT vs. Linux Server Benchmark Graphs page
- A post on comp.unix.bsd.freebsd.misc from June '99 which mentions that FreeBSD has SMP scaling properties similar to Linux's on tests like those run by Mindcraft.
- Mike Abbott from SGI has posted Apache performance patches for 1.3.9.
