The srv(3) device seems like the ugly stepchild within the Plan 9 system. In a system focused on distributed resources, it is a node-specific service registry. Within Plan 9, it's a node-global resource and doesn't allow for multiple instances which can be placed in different dynamic private name spaces. Device resources are treated separately, having separate mount shortcuts instead of being integrated into the srv(3) registry. In common practice users must rely on static configuration files to access network resources and post them in the local srv(3) instance. Because the registry is this static, reliability and scalability for resource access have to be provided by other layers.
In order to explore potential solutions to these issues and inconsistencies, I've been thinking through some ideas for extending the srv(3) concept to be more useful in a distributed environment and to allow better integration with core Plan 9 principles. In all cases I'll likely implement it as a synthetic user space file server which will use the normal srv(3) as a back-end.
The first idea is to add a concept of scope to the service registry -- right now there is a single level of scope on Plan 9 (node-level). Inferno has multiple srv instances via attach arguments, but nothing that really amounts to scope.
The idea is to allow several automatic levels of scope as well as user-defined scopes:
- Program Group Level
- User level
- User group level
- Node Level
- Tree Pset
- Immediate Neighbors (in Torus)
- Cluster level
- Social Network (Community Level)
- etc. (you get the idea)
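One way to picture how a scoped registry might behave is a lookup that tries the narrowest scope first and falls back to broader ones. The sketch below is only an illustration: the `Scope` and `ScopedRegistry` names are hypothetical, the addresses are made-up examples, and treating the list above as a narrow-to-broad search order is an assumption rather than something settled in the design.

```python
from enum import IntEnum


class Scope(IntEnum):
    # Order mirrors the list above; reading it as narrow-to-broad is an assumption.
    PROGRAM_GROUP = 0
    USER = 1
    USER_GROUP = 2
    NODE = 3
    PSET = 4
    NEIGHBORS = 5
    CLUSTER = 6
    COMMUNITY = 7


class ScopedRegistry:
    """Hypothetical in-memory registry keeping one table of services per scope."""

    def __init__(self):
        self._services = {scope: {} for scope in Scope}

    def post(self, scope, name, address):
        # Register a service name at exactly one scope.
        self._services[scope][name] = address

    def lookup(self, name):
        # Enum iteration follows definition order, so narrower scopes win.
        for scope in Scope:
            address = self._services[scope].get(name)
            if address is not None:
                return scope, address
        return None


registry = ScopedRegistry()
registry.post(Scope.CLUSTER, "fs", "tcp!cluster-fs!564")  # example address only
registry.post(Scope.NODE, "fs", "/srv/fs")
```

With both entries posted, `registry.lookup("fs")` resolves to the node-level entry, and removing that entry would expose the cluster-level one. User-defined scopes would need something more flexible than a fixed enum.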
The idea of automatically propagating and discovering neighbor services and cluster services grew out of our port to Blue Gene as part of the FastOS work, but it is also related to previous ideas I've had about collaboration services.
When one registers services at scopes larger than node-level, the local srv² communicates with appropriate neighbors to make sure their registry for that scope is updated. In this way, services are immediately available in the other registries. I'd like to explore both push and pull methods for this update; a lazy pull/poll may be more appropriate than a push. One thought was to support a plug-in style architecture for this, allowing control of different scopes to be maintained by different applications. This could allow for different membership/authentication schemes as well as different discovery mechanisms. In such an approach srv² would only be responsible for overall management of the hierarchy.
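To make the push versus lazy-pull trade-off concrete, here is a minimal in-process sketch. Everything in it is hypothetical: `ScopeHandler` stands in for the plug-in that decides who belongs to a scope, peers are plain Python objects rather than remote srv² instances reached over 9P, and the addresses are examples only.

```python
class ScopeHandler:
    """Hypothetical plug-in interface: decides which registries share a scope."""

    def members(self):
        raise NotImplementedError


class StaticMembers(ScopeHandler):
    """Simplest possible plug-in: a fixed list of peer registries."""

    def __init__(self, peers):
        self._peers = list(peers)

    def members(self):
        return list(self._peers)


class Registry:
    def __init__(self, name):
        self.name = name
        self.handlers = {}  # scope name -> ScopeHandler
        self.local = {}     # scope name -> {service: address} posted here
        self.cache = {}     # scope name -> {service: address} learned from peers

    def attach(self, scope, handler):
        self.handlers[scope] = handler

    def _peers(self, scope):
        handler = self.handlers.get(scope)
        return handler.members() if handler else []

    def post(self, scope, service, address, push=False):
        self.local.setdefault(scope, {})[service] = address
        if push:
            # Push: update every peer's view at registration time.
            for peer in self._peers(scope):
                peer.cache.setdefault(scope, {})[service] = address

    def lookup(self, scope, service):
        for table in (self.local, self.cache):
            address = table.get(scope, {}).get(service)
            if address is not None:
                return address
        # Lazy pull: only ask peers when the name is actually wanted.
        for peer in self._peers(scope):
            address = peer.local.get(scope, {}).get(service)
            if address is not None:
                self.cache.setdefault(scope, {})[service] = address
                return address
        return None
```

Push makes a new service visible everywhere at once but costs one message per peer for every registration; lazy pull defers that cost to the first lookup and leaves open how cached entries get invalidated, which this sketch does not address.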
To keep things tidy, there are some additional nit-picks with srv(3) which must be fixed. To organize the various scopes and the larger potential set of services, I'd like to make the srv² registry a deep hierarchy. This may just be used for scope or may be used for additional levels of organization. I'm going to start simple, with the top layer identifying scope and the second level being the services associated with that scope. However, I'm going to structure the implementation to allow nested scopes and organizational constructs.
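A small tree is enough to sketch that layout: directories act as scopes (or nested groupings) and leaves carry the service address. The `Node` and `Hierarchy` names and the example paths are hypothetical; a real implementation would presumably expose this tree as a synthetic file server rather than as Python objects.

```python
class Node:
    def __init__(self, name, address=None):
        self.name = name
        self.address = address  # None marks a directory (scope or grouping)
        self.children = {}

    def is_dir(self):
        return self.address is None


class Hierarchy:
    """Hypothetical registry tree: scope at the top level, services below."""

    def __init__(self):
        self.root = Node("/")

    @staticmethod
    def _parts(path):
        return [p for p in path.split("/") if p]

    def post(self, path, address):
        parts = self._parts(path)
        if len(parts) < 2:
            raise ValueError("path needs at least scope/service")
        node = self.root
        for part in parts[:-1]:
            # Intermediate directories are created on demand, so nesting is free.
            node = node.children.setdefault(part, Node(part))
            if not node.is_dir():
                raise ValueError(part + " is a service, not a scope")
        existing = node.children.get(parts[-1])
        if existing is not None and existing.is_dir():
            raise ValueError(parts[-1] + " is a scope, not a service")
        node.children[parts[-1]] = Node(parts[-1], address)

    def walk(self, path):
        node = self.root
        for part in self._parts(path):
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def ls(self, path=""):
        node = self.walk(path)
        if node is None or not node.is_dir():
            return []
        return sorted(node.children)
```

Posting `node/fs` gives the simple two-level case, while `cluster/storage/fs0` shows a nested grouping under the cluster scope without any change to the code.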
Another nit-pick I've had with srv(3) is the lack of the ability to use group permissions. One of the scope concepts is to allow group scopes, so in order to support this we'll need to incorporate group management into the srv² infrastructure. There is an unfortunate amount of complexity here. Groups should have scopes as well, allowing node-local groups as well as multiple levels of cluster groups. It would be nice to allow user-defined groups instead of just system-defined groups -- and this sounds more like ACLs than traditional UNIX groups. I'm not confident about which path to pursue and would welcome suggestions.
The requirements of our FastOS research include reliability and scalability. For cluster-based services it would be nice to integrate support for load-balancing and fail-over directly into the srv² registry. If we use srv² in a similar way to how we use srv(3) then we'd only be able to load-balance or fail-over at mount time (at least without adding an additional layer). An alternative would be to provide an auto-mount feature which would attach to the service on crossing the mount point and automatically load-balance or fail-over the service as necessary. There is a large degree of complexity here, and an inherent danger in doing anything automatically. Still, for a certain class of resource and application, this form of access may be preferable.
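The mount-time variant is the easier one to sketch: pick the next replica round-robin and skip any that fail to answer. The `ReplicatedService` class, the `dial` callback, and the addresses are all hypothetical, and `fake_dial` simply simulates one replica being down; nothing here covers the harder auto-mount case where an established connection fails later.

```python
class ReplicatedService:
    """Hypothetical registry entry backed by several equivalent replicas."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = list(replicas)
        self._next = 0  # index of the replica to try first on the next mount

    def mount(self, dial):
        """Return (address, connection) for the first replica that answers.

        dial(address) must return a connection or raise OSError.
        """
        n = len(self.replicas)
        for i in range(n):
            idx = (self._next + i) % n
            address = self.replicas[idx]
            try:
                conn = dial(address)
            except OSError:
                continue  # fail-over: try the next replica in the ring
            self._next = (idx + 1) % n  # load-balance: rotate the start point
            return address, conn
        raise OSError("no replica of %s reachable" % self.name)


# Simulated transport for the example; a real dial would open a connection.
down = {"tcp!a!564"}


def fake_dial(address):
    if address in down:
        raise OSError("connection refused")
    return "conn:" + address


svc = ReplicatedService("fs", ["tcp!a!564", "tcp!b!564", "tcp!c!564"])
```

Successive calls to `svc.mount(fake_dial)` rotate across the replicas that are up and step over the one that is down. Because the choice is made once per mount, a client stays pinned to its replica until it mounts again, which is exactly the limitation described above.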
A final area I'd like to explore is discovery. ndb/local has always bothered me as being far too static. Inferno had the concept of local discovery via virgild, which allows discovery on the local network. With cluster functionality built into srv², we just need a way for the various srv² instances to discover each other. Locally we could use a mechanism like virgild or utilize zeroconf. For broader discovery we could use central registries, perhaps utilizing Chord concepts to make it a bit more scalable. Within our FastOS work we'd also like to explore hierarchical or toroidal organizations, scope, and discovery for our service registries.
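As one reading of what "Chord concepts" could mean here, the sketch below shows only the consistent-hashing placement rule: registries and service names are hashed onto the same ring, and a name is looked after by the first registry at or after its position. The `Ring` class and registry names are hypothetical, and the finger tables and stabilization that make real Chord lookups scale are deliberately left out.

```python
import bisect
import hashlib


def ring_id(key):
    # Hash a name to a position on the ring (SHA-1, read as a 160-bit integer).
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hash ring of registry names."""

    def __init__(self, registries=()):
        self._ids = []    # sorted ring positions
        self._owner = {}  # ring position -> registry name
        for registry in registries:
            self.join(registry)

    def join(self, registry):
        pos = ring_id(registry)
        if pos in self._owner:
            raise ValueError("ring position already taken")
        bisect.insort(self._ids, pos)
        self._owner[pos] = registry

    def successor(self, service):
        # First registry whose position is >= the key's, wrapping past the end.
        if not self._ids:
            return None
        i = bisect.bisect_left(self._ids, ring_id(service))
        return self._owner[self._ids[i % len(self._ids)]]


ring = Ring(["registry-1", "registry-2", "registry-3"])
```

The useful property is that when another registry joins, only names that now hash to the newcomer move; every other name stays with the registry that already held it.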
This will be ongoing work that I plan on implementing and exploring during the first quarter of 2009. I'll update this blog with my experiences, and hopefully compile a paper towards mid-year with the results of the experiment.