In an earlier post, I had talked about scaling out the service layer of your application using actors and asynchronous processing. This can buy you some more donuts over and above your current throughput. With extra processing in the form of n actors pounding the cores of your CPU, the database will still be the bottleneck and SPOF. As long as you have a single database, there will be latency and you will have to accept it.
Exploit the virtues of immutability. Design your application around stateless abstractions interacting with each other through asynchronous message passing. These are some of the mantras that I have been trying to grok recently. In a typical Java EE application, we design the service layer to be maximally stateless. What this means is that each individual service has localized mutability interacting with other services on a shared-nothing basis. Asynchronous message passing offers some interesting avenues towards scaling up the throughput of such service layers in an application.
In this presentation from QCon London 2008, Brian Goetz discusses the difficulties of creating multithreaded programs correctly, incorrect synchronization, race conditions, deadlock, Software Transactional Memory, the history of concurrency, alternatives to threads, Erlang, Scala, and recommendations for concurrency in Java.