Async IO on Linux

trunk/ just got support for Linux native AIO.

I implemented async IO based on libaio, which is a minimal wrapper around the AIO syscalls of the 2.6.x kernels.

Implementation

It was a bit tricky to get it working as libaio is basically undocumented, but hey … that’s why we are hackers :)

The async file IO support is part of Linux 2.6.9 and later and should be on every recent Linux box. A separate library called libaio provides very simple wrappers and is used as the base for the new network backend.

The idea is:

  1. create a buffer in /dev/shm and mmap() it
  2. start an async read() from the source file to the mmap() buffer
  3. wait until the data is ready
  4. use sendfile() to send the data from /dev/shm to the network socket

Important for the performance: the data is never copied into user space. We only move it from one side of the kernel to the other side.
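
The four steps can be sketched in C for a single chunk. This is only a rough illustration, not the code in trunk/: it calls the raw AIO syscalls through syscall(2) instead of going through libaio so that it builds without the extra library, and the helper name aio_sendfile_chunk() and the buffer-file template argument are made up for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers around the raw AIO syscalls (assumption: used here in
 * place of libaio, which wraps the same syscalls). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx) {
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs) {
    return syscall(SYS_io_submit, ctx, n, iocbs);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *ts) {
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, ts);
}
static long sys_io_destroy(aio_context_t ctx) {
    return syscall(SYS_io_destroy, ctx);
}

/* Hypothetical helper: send `len` bytes of src_fd, starting at `offset`,
 * to out_fd. buf_template is a writable mkstemp() template such as
 * "/dev/shm/aio-buf-XXXXXX". Returns the bytes sent, or -1 on error. */
ssize_t aio_sendfile_chunk(int src_fd, int out_fd, off_t offset, size_t len,
                           char *buf_template) {
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    void *buf = MAP_FAILED;
    off_t pos = 0;
    ssize_t sent = -1;
    int buf_fd;

    /* 1. create a buffer file and mmap() it */
    buf_fd = mkstemp(buf_template);
    if (buf_fd < 0) return -1;
    unlink(buf_template);               /* the open fd keeps the file alive */
    if (ftruncate(buf_fd, (off_t)len) < 0) goto out;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd, 0);
    if (buf == MAP_FAILED) goto out;

    /* 2. start an async read from the source file into the mapping */
    if (sys_io_setup(1, &ctx) < 0) goto out;
    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)src_fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    if (sys_io_submit(ctx, 1, list) != 1) goto out;

    /* 3. wait until the data is ready (blocks until one event arrives) */
    if (sys_io_getevents(ctx, 1, 1, &ev, NULL) != 1 || ev.res < 0) goto out;

    /* 4. sendfile() from the buffer file to the socket */
    sent = 0;
    while (sent < (ssize_t)ev.res) {
        ssize_t n = sendfile(out_fd, buf_fd, &pos,
                             (size_t)ev.res - (size_t)sent);
        if (n <= 0) { sent = -1; break; }
        sent += n;
    }

out:
    if (ctx) sys_io_destroy(ctx);
    if (buf != MAP_FAILED) munmap(buf, len);
    close(buf_fd);
    return sent;
}
```

A caller would pass a writable template, for example char t[] = "/dev/shm/aio-buf-XXXXXX", and loop over the file chunk by chunk. A real backend would keep the AIO context and the buffers around between requests rather than setting them up for every chunk.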

Hack ahead

Sadly I had to add pthread to the dependencies. Having threads in a single-threaded server is a bit strange, but it is necessary.

fdevent_poll() was waiting for fd-events for up to 1s, and while it was waiting the whole server was waiting. Handling the async notifications is blocking as well, and we can’t make either call return as soon as the other one has something to report.

If necessary we start an io-getevent thread which runs in parallel to the fdevent_poll() call. Whichever call returns first interrupts the other one by sending a SIGUSR1 to the process. This makes the waiting call (poll() or io_getevents()) return with an EINTR, and we can continue handling the result of one of the two calls.
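
The wake-up trick can be shown in isolation. The sketch below is not the lighttpd code: the function names are invented, the helper thread only pretends that an AIO request has completed, and it uses ppoll() instead of poll() (an assumption of this example) so that a signal sent before the main thread starts waiting is not lost.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to make blocking calls fail with EINTR. */
static void on_sigusr1(int sig) { (void)sig; }

/* Stand-in for the io-getevent thread (hypothetical name). A real server
 * would block in io_getevents() here; this one signals right away. */
static void *getevents_thread(void *arg) {
    (void)arg;
    kill(getpid(), SIGUSR1);            /* wake up whoever sits in ppoll() */
    return NULL;
}

/* Returns the errno seen by the interrupted ppoll() (expected: EINTR),
 * 0 if ppoll() simply timed out, or -1 if the thread could not start. */
int wait_interrupted_by_thread(void) {
    struct sigaction sa;
    sigset_t block, during_poll;
    struct timespec timeout = { 1, 0 }; /* same 1s wait as fdevent_poll() */
    pthread_t tid;
    int rc, err;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    /* Block SIGUSR1 in this thread; the new thread inherits the mask, so
     * only the ppoll() below can ever receive the signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &during_poll);
    sigdelset(&during_poll, SIGUSR1);

    if (pthread_create(&tid, NULL, getevents_thread, NULL) != 0) return -1;

    /* No fds are watched here; ppoll() unblocks SIGUSR1 only while waiting. */
    rc = ppoll(NULL, 0, &timeout, &during_poll);
    err = (rc < 0) ? errno : 0;

    pthread_join(tid, NULL);
    return err;
}
```

Build it with -pthread. In the server the direction can also be reversed: when poll() returns first, the main thread signals so that the thread blocked in io_getevents() gets its EINTR.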

Benchmarks

The testbed:

  • RAID1 (Linux md) across two ST3160827AS (SATA, 120Mb each)
  • nVidia Corporation CK8S as SATA controller
  • AMD Athlon™ 64 Processor 3000+
  • Linux 2.6.16.21-0.25-xen (SuSE 10.1)

siege, 700Mb

I’ll compare linux-sendfile vs. linux-aio-sendfile.

$ siege --reps=1 -c 1 --benchmark http://127.0.0.1:1025/file-700M
conc  non-aio                  aio [512k]               aio [1M]
1     52.38 MB/sec [9% idle]   89.85 MB/sec [70% idle]  107.50 MB/sec [67% idle]
2     39.94 MB/sec [8% idle]   94.52 MB/sec [70% idle]  92.74 MB/sec [70% idle]
5     35.45 MB/sec [7% idle]   31.81 MB/sec [86% idle]  72.84 MB/sec [70% idle]
10    ..                       25.22 MB/sec [82% idle]  32.87 MB/sec [90% idle]

More important than the throughput is the CPU time that can now be spent on other tasks.

What’s next?

Next is bug fixing, load testing (more parallel connections), random load, ...

lighttpd: lighty's life