Randall Stewart of Cisco Systems gave a talk titled "SCTP, what it is and how to use it", discussing the Stream Control Transmission Protocol (SCTP). A paper that was displayed on the overhead projector before the talk began summarized:
"Integrated into FreeBSD 7.0 -- first standardized by the Internet Engineering Task force (IETF) in October of 2000, in RFC 2960 and later updated by RFC 4960. SCTP is a message oriented protocol providing reliable end to end communication between two peers in an IP network."
Randall explained that SCTP is an alternative to TCP and UDP. To describe SCTP, he suggested starting with TCP's features: reliable retransmission, congestion control, flow control, connection-oriented operation, and selective acknowledgements. You then add more features on top: an "association" set up by a 4-way handshake, framing and ordered service, multistreaming, multihoming, and reachability.
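To make the multistreaming idea concrete, here is a minimal sketch in Python. It is a toy model rather than SCTP itself: the StreamReceiver class and its fields are hypothetical names invented for illustration, and nothing here touches a real socket. It only shows the property that ordering is enforced per stream, so a gap on one stream does not hold up delivery on another.

```python
from collections import defaultdict


class StreamReceiver:
    """Toy receiver (hypothetical): messages are ordered within a stream only."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per stream
        self.pending = defaultdict(dict)   # out-of-order messages held per stream
        self.delivered = []                # (stream_id, payload) in delivery order

    def receive(self, stream_id, seq, payload):
        self.pending[stream_id][seq] = payload
        # Deliver everything that is now in order on this stream.
        while self.next_seq[stream_id] in self.pending[stream_id]:
            msg = self.pending[stream_id].pop(self.next_seq[stream_id])
            self.delivered.append((stream_id, msg))
            self.next_seq[stream_id] += 1


rx = StreamReceiver()
rx.receive(0, 1, "b")   # stream 0: message 0 has not arrived, so "b" waits
rx.receive(1, 0, "x")   # stream 1 is unaffected and delivers immediately
rx.receive(0, 0, "a")   # gap on stream 0 filled: "a" then "b" are delivered
print(rx.delivered)
```

With a single ordered byte stream, as in TCP, the message on stream 1 would have had to wait behind the missing data as well.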
Pawel Dawidek first ported ZFS to FreeBSD from OpenSolaris in April of 2007. He continues to actively port new ZFS features from OpenSolaris, and focuses on improving overall ZFS stability. During the introduction to his talk at BSDCan, he explained that his goal was to offer an accessible view of ZFS internals. His discussion was broken into three sections: a review of the layers ZFS is built from and how they work together, a look at unique features found in ZFS and how they work internally, and a report on the current status of ZFS in FreeBSD.
The BSDCan website notes that Pawel is a FreeBSD committer, adding:
"In the FreeBSD project, he works mostly in the storage subsystems area (GEOM, file systems), security (disk encryption, opencrypto framework, IPsec, jails), but his code is also in many other parts of the system. Pawel currently lives in Warsaw, Poland, running his small company."
BSDCan 2008 officially started this morning at 9AM with an opening talk by the event's organizer, Dan Langille. However, in reality the event has already been running for two days, with the FreeBSD tutorials having started on the 14th. After arriving in Ottawa yesterday afternoon and finding my room in a 20-story University of Ottawa residence, I wandered down to the Royal Oak Pub for early registration, meeting several dozen BSD hackers from all over the world.
This morning's opening talk was well attended, with the room filling up first in clusters of laptop users around the power outlets along both walls. By 15 minutes after the hour, the room was completely full, and Dan started with a humorous slideshow of example letters he's been receiving ever since posting the words "letter of invitation" somewhere on the BSDCan website two years back. Coming primarily from Nigeria, the letters' authors often claim to represent large groups of developers, yet always write from "disposable" email addresses. After some laughs, he launched into his opening keynote.
"I'd like to send a small update on my progress on the Performance Tracker project," noted Erik Cederstrand on the FreeBSD -current mailing list. He continued, "I now have a small setup of a server and a slave chugging along, currently collecting data. I'm following CURRENT and collecting results from super-smack and unixbench." The project performs regular benchmarks of the FreeBSD -current source tree using Unixbench and Super Smack, allowing you to chart the results over time. Erik highlighted an example of a visible change in performance when the generic kernel moved from the 4BSD scheduler to the ULE scheduler on October 19th, 2007.
Kris Kennaway responded favorably, then noted, "one suggestion I have is that as more metrics are added it becomes important for an 'at a glance' overview of changes so we can monitor for performance improvements and regressions among many workloads." He went on to suggest, "at some point the ability to annotate the data will become important (e.g. 'We understand the cause of this, it was r1.123 of foo.c, which was corrected in r1.124. The developer responsible has been shot.')." Erik agreed with both recommendations, and noted that he would continue to work in that direction.
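As a rough illustration of the "at a glance" overview and annotations Kris asks for, the sketch below flags runs whose score moves by more than a threshold relative to the previous run, then attaches a note where one exists. Everything in it is an assumption made for illustration: the flag_changes function, the build labels, the scores, and the 10% threshold are hypothetical and are not taken from Erik's project or its data.

```python
def flag_changes(results, threshold=0.10):
    """Return (label, relative_change) for runs that moved more than threshold.

    results is a list of (label, score) pairs in chronological order.
    """
    flagged = []
    for (_, prev), (label, cur) in zip(results, results[1:]):
        change = (cur - prev) / prev
        if abs(change) > threshold:
            flagged.append((label, round(change, 3)))
    return flagged


# Hypothetical benchmark scores, higher is better.
runs = [("build-1", 100.0), ("build-2", 101.0), ("build-3", 125.0), ("build-4", 124.0)]

# Hypothetical annotation explaining a known shift.
annotations = {"build-3": "default scheduler changed"}

for label, change in flag_changes(runs):
    print(label, f"{change:+.1%}", annotations.get(label, "unexplained"))
```

Comparing each run only with its predecessor is the simplest possible choice; a real tracker would probably want to smooth over noisy runs before raising a flag.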
A recent thread on the FreeBSD -current mailing list discussed the stability of ZFS on FreeBSD. Scott Long noted that ZFS requires proper tuning to be stable:
"I guess what makes me mad about ZFS is that it's all-or-nothing; either it works, or it crashes. It doesn't automatically recognize limits and make adjustments or sacrifices when it reaches those limits, it just crashes. Wanting multiple gigabytes of RAM for caching in order to optimize performance is great, but crashing when it doesn't get those multiple gigabytes of RAM is not so great, and it leaves a bad taste in my mouth about ZFS in general."
ZFS was committed in April of 2007 by Pawel Dawidek, who notes that he is using ZFS quite successfully on all of his systems. He then cautioned, "of course all this doesn't mean ZFS works great on FreeBSD. No. It is still an experimental feature." In response to some negative comments about ZFS on FreeBSD, Pawel noted, "in my opinion people are panicing in this thread much more than ZFS:) Let try to think how we can warn people clearly about proper tunning and what proper tunning actually means. I think we should advise increasing KVA_PAGES on i386 and not only vm.kmem_size. We could also warn that running ZFS on 32bit systems is not generally recommended."
Marcel Moolenaar has been very busy with GDB code as of late, having imported gdb version 6.1.1 in late June and now supplying a patch to freebsd-arch@ that adds kernel debugging and helpful features to FreeBSD's gdb and ddb code, including thread awareness. Other interesting additions include optimizations for the 64-bit platforms, compression for remote gdb, and improved symbol handling. Marcel is looking to commit the patch in a week, barring any major issues, but the code requires testing on most supported platforms:
"The patch applies to alpha, amd64, i386, ia64 and sparc64. amd64 is known to compile but I can't test this stuff yet due to lack of hardware."
Completion of this work will satisfy a required feature on the 5.3 Open Issues list. Read on for the diff and Marcel's full announcement.
FreeBSD core team member Scott Long posted the latest bi-monthly status report, covering FreeBSD development for May and June of 2004. Scott begins:
"May and June were yet again busy months; the Netperf project passed major milestones and can now be run with the debug.mpsafenet tunable turned on from sources in CVS. The ARM, MIPS, and PPC ports saw quite a bit of progress, as did several other SMPng and Netgraph projects. FreeBSD 5.3 is just around the corner, so don't hesitate to grab a snapshot and test the progress!"
The code freeze for FreeBSD 5.3 [story] is scheduled to begin on August 15. The current todo list can be found on the FreeBSD project website. Read on for the latest status report, covering a wide range of FreeBSD projects.
While FreeBSD -current is still moving toward more stable footing, many users have posted issues with panics and deadlocks in recent kernel builds. Bjoern A. Zeeb has kindly compiled a running list of lock order reversals, links to relevant threads, PRs, and existing patches. Lock order reversal messages are the result of FreeBSD's lock validation facility, witness(4), notifying the system of potential deadlocks as a means for developers to isolate bugs. Robert Watson explains in a Dec. 2003 thread:
"[...]Among other things, Witness performs run-time lock order
verification using a combination of hard coded lock orders, and run-time
detected lock orders, and generates console warnings when lock orders are
violated. The intent of this is to detect the potential for deadlocks due
to lock order violations; it's worth observing that Witness is actually
slightly conservative, and so it's possible to get false positives...."
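The run-time half of what Robert describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how witness(4) is implemented: the LockOrderChecker class is a hypothetical name, it tracks a single thread, and it only compares direct pairs of locks rather than building a full ordering. It records the order in which locks are taken and warns when the same two locks are later taken in the opposite order.

```python
class LockOrderChecker:
    """Toy lock order verifier (hypothetical), single-threaded, direct pairs only."""

    def __init__(self):
        self.order = set()    # (first, second): second was acquired while first was held
        self.held = []        # locks currently held, in acquisition order
        self.warnings = []

    def acquire(self, name):
        for h in self.held:
            # If we previously saw name taken before h, this is a reversal.
            if (name, h) in self.order:
                self.warnings.append(
                    f"lock order reversal: {h} -> {name} (previously {name} -> {h})"
                )
            self.order.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


w = LockOrderChecker()
w.acquire("A"); w.acquire("B")      # establishes the order A -> B
w.release("B"); w.release("A")
w.acquire("B"); w.acquire("A")      # B -> A reverses the recorded order
print(w.warnings)
```

No deadlock actually happens in this run; the checker reports only that one could occur if two threads interleaved these sequences, which is why a report of this kind can be a false positive.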
Read on for relevant links.
Recent development patches committed to FreeBSD's HEAD (-current) and RELENG_5 branches, as well as recent patches submitted to the -current, -arch, -threads, -hackers and -net lists, need testing in a variety of system configurations. As many know, the quality of future FreeBSD releases is in no small part dependent on user feedback, and FreeBSD developers need feedback on their patches to ensure feature stability and reliability in core subsystems. At present, many of the most important changes in -current involve removing Giant remnants from various subsystems to make them multi-processor safe, and the continued integration and improvement of fine-grained locking and ACPI/APIC code. Those users running -current may be interested in stress testing some recent code and reporting findings back to the respective mailing lists.
Jeff Roberson summarizes in his recent VFS patch:
"This patch removes Giant from the read(), write(), and fstat() syscalls,as well as page faults, and bufdone (io interrupts) when using FFS. It adds a considerable amount of locking to FFS and softupdates. You may also use non ffs filesystems concurrently, but they will be protected by Giant. If you are using quotas you should not yet run this patch. I have done some buildworlds, but any heavy filesystem activity would be appreciated."
Read on for a quick round-up of recent patches, the authors' original posts and links to their respective threads. This is not a complete list, so please comment if we've missed any!
Scott Long recently submitted a list of "nice-to-have" projects for the next 12 months and future releases in general. Various developers and committers have suggested many comments and addenda. Topics include the adoption of DragonFly BSD's extensible bsdinstaller (among others) as a worthy sysinstall replacement, ongoing development of ReiserFS4 support for FreeBSD, interesting discussion concerning clustered filesystem support, comments on SANs and much more.
"Most of these tasks are
not trivial, but I hope that talking about them will encourage some
interest. [...]While this is just my personal list, I'd welcome
other additions to it (in the sense of significant projects, not just
individual PRs or bug fixes that one might be interested in)."
Sam Leffler has recently committed a major 802.11 update to FreeBSD's -current tree, providing full support for Wi-Fi Protected Access (WPA), 802.11i, and 802.1x among other new features such as the QoS WME/WMM protocols. Sam suggests the new code is stable and tested, but advises: "[...] I expect that wi cards operating in hostap with wep will need some fixup. If you encounter problems please post here as I've been promised other folks will assist in handling issues." Sam will also be committing revisions to ifconfig to support the new 802.11 code, as well as new dhclient code that understands 802.11 layer events.
Apropos of these changes, Leffler has also updated the ath driver to take advantage of the new 802.11 layer functionality, among other things. See the links section below.
Andre Oppermann has posted a patch which completely revises FreeBSD's TCP reassembly code, increasing the efficiency and scalability of reassembling out-of-order segments, with impressive initial test results. Oppermann writes:
"I've totally rewritten the TCP reassembly function to be a lot more
efficient. In tests with normal bw*delay products and packet loss
plus severe reordering I've measured an improvment of at least 30% in
performance. For high and very high bw*delay product links the
performance improvement is most likely much higher."
Those with spare cycles may want to give it a try. Read on for the entirety of Oppermann's educational post and a link to the patch.
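For readers unfamiliar with what a TCP reassembly function does, the following Python sketch shows the basic job: hold segments that arrive ahead of a gap, merge overlapping or adjacent ones, and hand data to the application once it is contiguous. It is a deliberately naive illustration and not Oppermann's algorithm; the insert_segment and deliver functions and the list-of-tuples queue are hypothetical, and re-sorting the queue on every insert is exactly the sort of inefficiency a real implementation would avoid.

```python
def insert_segment(queue, seq, data):
    """Add a segment to the queue, merging overlapping or adjacent segments.

    queue is a list of (start_seq, bytes); a new merged, sorted list is returned.
    """
    merged = []
    for start, payload in sorted(queue + [(seq, data)]):
        if merged and start <= merged[-1][0] + len(merged[-1][1]):
            prev_start, prev_payload = merged[-1]
            prev_end = prev_start + len(prev_payload)
            if start + len(payload) > prev_end:
                # Keep the bytes already queued and append only the new tail.
                prev_payload += payload[prev_end - start:]
            merged[-1] = (prev_start, prev_payload)
        else:
            merged.append((start, payload))
    return merged


def deliver(queue, rcv_nxt):
    """Remove in-order data starting at rcv_nxt; return (data, rcv_nxt, queue)."""
    queue = list(queue)
    out = b""
    while queue and queue[0][0] <= rcv_nxt:
        start, payload = queue.pop(0)
        fresh = payload[rcv_nxt - start:]   # skip bytes that were already delivered
        out += fresh
        rcv_nxt += len(fresh)
    return out, rcv_nxt, queue


queue, rcv_nxt = [], 0
queue = insert_segment(queue, 5, b"fghij")     # arrives early, must be held
queue = insert_segment(queue, 8, b"ijkl")      # overlaps the held segment
early, rcv_nxt, queue = deliver(queue, rcv_nxt)
queue = insert_segment(queue, 0, b"abcde")     # fills the gap at the front
data, rcv_nxt, queue = deliver(queue, rcv_nxt)
print(early, data, rcv_nxt)
```

Under heavy reordering the queue grows and every arrival has to find its place in it, which is where the cost Oppermann set out to reduce comes from.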
Andrew Doran posted some threading benchmark results to NetBSD's tech-kern mailing list, following up to some benchmarks he'd posted earlier. The results compared NetBSD -current with FreeBSD -current, and the Linux 2.6.21 kernel. Kris Kennaway was surprised by the results, and ran his own benchmarks with minimal configuration changes, summarizing, "this measurement shows that FreeBSD is performing 70-80% better than NetBSD in this 4 CPU configuration. This is in contrast to Andrew's findings which seem to show NetBSD performing 10% better than FreeBSD on a 4 CPU system (a very old one though)." He added, "the drop-off above 8 threads on FreeBSD is due to non-scalability of mysql itself. i.e. it comes from pthread mutex contention in userland."
Kris ran additional benchmarks with PostgreSQL instead of MySQL, which showed much better scalability above 8 threads: "postgresql is much more scalable than mysql on this workload and doesn't have silly scaling bottlenecks inside the application (cf the tail of the FreeBSD curve for mysql which is where pthread mutex contention kicked in)." He continued his testing and found that on older 4-CPU P3 hardware NetBSD did outperform FreeBSD, "but only by 3-4% (in particular I am not seeing the ~10% difference that Andrew observes on his 4*p3 700MHz). Given the age of the hardware and the fact that I am not seeing it on other workloads or on modern hardware it might just be due to a small scheduling difference on this configuration."
"Congratulations to the successful students and their FreeBSD Project mentors for participating in another productive Google Summer of Code," Murray Stokely noted on the -announce FreeBSD mailing list. He offered an interesting summary of all of this year's student projects, adding:
"This program encourages students to contribute to an open source project over the summer break with generous funding from Google. We have had a total of over 50 successful students working on FreeBSD as part of this program in 2005, 2006, and 2007. These student projects included security research, improved installation tools, filesystems work, new utilities, and more. Many of the students have continued working on their FreeBSD projects even after the official close of the program. We have gained many new FreeBSD committers from previous summer of code projects already, and more are in the process."
Since the decision to demote ULE [story] in favor of the 4BSD scheduler as the default for FreeBSD's 5.3-Release, many improvements to both schedulers have been committed. At the time it was marked broken, ULE was especially needy in light of the status of its maintainership, performance issues, and its unreliable nature in conjunction with threading and kernel preemption. Having resolved these problems, Jeff Roberson announces to -current that the ULE code is now in working order:
"ULE works again with preemption and kse and so on. As far as I know, the only two problems with ULE currently are these:[...]"
Note that Jeff follows up his initial post with a correction (there are three current problems), and later, that he's fixed the second listed issue. Also discussed are plans to MFC this work to the RELENG_5 stable branch and future direction of ULE and 4BSD scheduler development.
Early last week, Scott Long publicly requested heavy testing of the latest revisions to -current code, especially the new vnode layer [story], net80211 [story], and wireless LAN drivers. Now, with the revival of the ULE scheduler [story], stress testing is likely even more urgent. Scott explains:
"There has been an incredible amount of work going into 6-CURRENT in the last 5 weeks. With the holidays quickly approaching, now seems like a good time to step back and start really testing the system to shake out the bugs and catch the developers before they disappear for the holidays."
Read Scott's full message below.
Jeff Roberson, the primary developer of FreeBSD's revived ULE scheduler [story], has committed a new Python/Tkinter tool, schedgraph, to the FreeBSD -current tree. Schedgraph will assist with scheduler testing and refinement as well as help developers study application load and corresponding system behavior. In the initial commit message, Jeff gives a simple description:
"Schedgraph takes input from files produces by ktrdump -ct when KTR_SCHED is compiled into the kernel. The output represents the states of each thread with colored line segments as well as colored points for non-state scheduler events. Each line segment and point is clickable to obtain extra detail."
Jeff includes a screenshot and sample data. Robert Watson follows up with a link to pointers on getting KTR working. Jeff has also been very busy committing more fixes to the ULE scheduler, a couple of which solve long-standing bugs. Read on for details.
FreeBSD core team member Scott Long posted the latest project status report, covering the last 6 months of development. Scott begins:
"The FreeBSD status report is back again after another small break. The second half of 2004 was incredibly busy; FreeBSD 5.3 was released [story], the 6-CURRENT development branch started [story], and EuroBSDCon 2004 was a huge success, just to name a few events. This report is packed with an impressive 44 submissions, the most of any report ever!"
Read on for the latest status report, conveniently divided into several sections discussing various FreeBSD projects, documentation changes, kernel changes, new and updated architectures, the status of various ports, vendor or 3rd party software news, and other miscellaneous news.
Scott Long posted the latest status report, covering FreeBSD development for the first three months of 2005. Scott begins:
"The first quarter of 2005 has been extremely active in both FreeBSD-CURRENT and -STABLE. With FreeBSD 5.4 in the final RC stage and an anticipated branch of FreeBSD-6 this summer we have seen a lot of performance improvements in 5 and a couple of exciting new features in 6."
FreeBSD 5.4 [forum] is expected to be released by the end of this month, focusing primarily on minor feature and performance improvements. As for 6.0 [forum], the status report explains, "in contrast to FreeBSD 5.0, the goal is to take a more incremental approach to major changes, and not wait for years to get as many features in as possible. FreeBSD 6.0 will largely be an evolutionary change from the 5.x series, with the largest changes centered around multi-threading and streamlining the filesystem and device layers."
Colin Percival, a FreeBSD committer and security team member, has found a local exploit against the current implementation of Intel's Hyper-Threading Technology. "Hyper-Threading, as currently implemented on Intel Pentium Extreme Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from a serious security flaw," Colin explains. "This flaw permits local information disclosure, including allowing an unprivileged user to steal an RSA private key being used on the same machine. Administrators of multi-user systems are strongly advised to take action to disable Hyper-Threading immediately."
Colin will present the details behind the attack at BSDCan 2005 at 10:00 AM EDT on May 13th. "At the conclusion of my talk I will also be releasing a paper describing the attack and possible mitigation strategies," Colin explains. The flaw affects all operating systems, and a secure multi-user environment essentially requires that Hyper-Threading be disabled. More information can be found on Colin's web page on the topic. The aforementioned paper is available for download in PDF format.
Earlier this month, Alexander Leidinger unveiled a set of FreeBSD kernel source documentation generated with the help of Doxygen. The Doxygen documentation system extracts code structure from source files, producing file/directory indexes, struct reference and function descriptions, include dependency graphs, function call graphs, and other useful information. While the output is not perfect, it can be a very helpful visual aid when browsing through source code. Among other notable FreeBSD subsystems, Alexander has generated HTML linked PDF documentation for GEOM, crypto, virtual memory, netgraph, CAM and net80211. Read on for the original announcement and additional comments from Poul-Henning Kamp.
Robert Watson of FreeBSD core has provided an up-to-date list of remaining non-MPSAFE system calls present in FreeBSD. Asynchronous I/O, extended attributes, and mount related calls appear to need some looking over. This review is a necessary step for the improvement and merging of new code for FreeBSD's POSIX.1e compliant auditing code; work that will eventually allow comprehensive real-time system event detection and monitoring features. In reference to the required work to get these straggling system calls multi-processor safe, Robert offers:
"There's probably quite a bit of low-hanging fruit in the compatibility ABI system call tables. In particular, quite a few calls can probably be marked MPSAFE on the basis that they call MPSAFE code and do little or no work. In many cases we are probably unnecessarily acquiring or recursing Giant as things stand. In other places, more work will be required, where the compat code actually implements substantial services or calls, such as additional file system system calls not present in the FreeBSD interface."
Read below for specifics and the surprisingly short list of remaining non-MPSAFE calls.
Colin Percival continues the discussion regarding the shared-cache vulnerability inherent in multi-core processors [story], offering potential mitigation techniques in the form of fixes to the FreeBSD schedulers. Based on Percival's original discovery, information leakage between threads which share a processor core, and the resulting opportunity to monitor memory access patterns, can be prevented by eliminating the co-scheduling of threads that have differing privileges. Additionally, Percival advises that a currently scheduled in-kernel thread should be able to tell its siblings (which would likely run with the same privileges) to sleep while it is handling sensitive data in a "non-oblivious" manner, IPsec being a good example. This would further protect sensitive data from monitoring. For these two solutions, he suggests the use of p_candebug(9) for the first and an as yet unimplemented IPI (Interprocessor Interrupt) mechanism for the second: