
srv² - next generation service registry

The srv(3) device seems like the ugly stepchild within the Plan 9 system. In a system focused on distributed resources, it is a node-specific service registry. Within Plan 9, it's a node-global resource and doesn't allow for multiple instances which can be placed in different dynamic private name spaces. Device resources are treated separately, having separate mount shortcuts instead of being integrated into the srv(3) registry. In common practice users must rely on static configuration files to access network resources and post them in the local srv(3) instance. This static nature relies on alternate layers to provide reliability and scalability for resource access.

In order to explore potential solutions to these issues and inconsistencies I've been thinking through some ideas for extending the srv(3) concept to be more useful in a distributed environment and allow better integration with core Plan 9 principles. In all cases I'll likely implement it as a synthetic user space file server which will use the normal srv(3) as a back-end. The first idea is to add a concept of scope to the service registry -- right now there is a single level of scope on Plan 9 (node-level). Inferno has multiple srv instances via attach arguments, but nothing that really amounts to scope.

The idea is to allow several automatic levels of scope as well as user-defined scopes:
      1. Program Group Level
      2. User level
      3. User group level
      4. Node Level
      5. Tree Pset
      6. Immediate Neighbors (in Torus)
      7. Cluster level
      8. Social Network (Community Level)
      9. etc. (you get the idea)
The idea of automatically propagating/discovering neighbor services and cluster services is linked to ideas resulting from our port to Blue Gene as part of the FastOS work -- but also is related to previous ideas I've had related to collaboration services.
When one registers services at scopes larger than node-level, the local srv² communicates with appropriate neighbors to make sure their registry for that scope is updated. In this way, services are immediately available in the other registries. I'd like to explore both push and pull methods for this update; a lazy pull/poll may be more appropriate than a push. One thought was to support a plug-in style architecture for this, allowing control of different scopes to be maintained by different applications. This could allow for different membership/authentication schemes as well as different discovery mechanisms. In such an approach srv² would actually only be responsible for overall management of the hierarchy.

In order to keep things tidy, there are some additional nit-picks with srv(3) which must be fixed. In order to organize the various scopes and the larger potential set of services I'd like to make the srv² registry a deep hierarchy. This may just be used for scope or may be used for additional levels of organization. I'm going to start simple with the top layer identifying scope and the second level being the services associated with that scope. However, I'm going to structure the implementation to allow nested scopes and organizational constructs.
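As a rough illustration of the two-level layout described above (scope on top, services underneath, with room for nesting), here is a minimal in-memory sketch. The `Scope` class, its method names, and the example addresses are all hypothetical; a real srv² would be a 9P file server layered over srv(3), not a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One directory in a hypothetical srv²-style tree: services plus nested scopes."""
    services: dict = field(default_factory=dict)   # service name -> address string
    children: dict = field(default_factory=dict)   # scope name -> Scope

    def _walk(self, parts, create=False):
        # Descend through nested scopes, optionally creating missing ones.
        node = self
        for name in parts:
            if name not in node.children:
                if not create:
                    raise KeyError(name)
                node.children[name] = Scope()
            node = node.children[name]
        return node

    def post(self, path, address):
        """Register a service at 'scope/.../service', creating scopes as needed."""
        *scopes, service = path.strip("/").split("/")
        self._walk(scopes, create=True).services[service] = address

    def lookup(self, path):
        """Return the address posted at 'scope/.../service'."""
        *scopes, service = path.strip("/").split("/")
        return self._walk(scopes).services[service]

    def listing(self, path=""):
        """List nested scopes first, then services, as a directory read might."""
        parts = [p for p in path.strip("/").split("/") if p]
        node = self._walk(parts)
        return sorted(node.children) + sorted(node.services)
```

Posting `cluster/rack1/venti` creates the `cluster` and `rack1` scopes on the way down, which is the "nested scopes and organizational constructs" case; the simple starting point only ever uses one scope level.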

Another nit-pick I've had with srv(3) is the lack of the ability to use group permissions. One of the scope concepts is to allow group scopes, so in order to support this we'll need to incorporate group management into the srv² infrastructure. There is an unfortunate amount of complexity here. Groups should have scopes as well, allowing node local groups as well as multiple levels of cluster groups. It would be nice to allow user defined groups instead of just system defined groups -- and this sounds more like ACL's rather than traditional UNIX groups. I'm not confident on a path to pursue and would welcome suggestions.

The requirements of our FastOS research include reliability and scalability. For cluster based services it would be nice to integrate support for load-balancing and fail-over directly into the srv² registry. If we use srv² in a similar way to how we use srv(3) then we'd only be able to load-balance or fail-over at mount time (at least without adding an additional layer). An alternative would be to provide an auto-mount feature which would attach to the service on crossing the mount point and automatically load-balance or fail-over the service as necessary. There is a large degree of complexity here, and an inherent danger in doing anything automatically. Still, for a certain class of resources and applications, this form of access may be preferable.
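To make the mount-time case concrete, here is a small sketch of round-robin selection with fail-over across several replicas of one service. `ReplicaSet` and the `dial` callback are made-up names for illustration; the sketch only covers choosing a replica when attaching, not the harder auto-mount case where an already attached service fails.

```python
class ReplicaSet:
    """Hypothetical registry entry holding several addresses for one service."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self._next = 0   # index of the replica to try first on the next attach

    def attach(self, dial):
        """Try each replica once, round-robin; return (address, connection).

        dial is any callable that takes an address and either returns a
        connection or raises OSError when the replica is unreachable.
        """
        n = len(self.addresses)
        errors = []
        for i in range(n):
            idx = (self._next + i) % n
            addr = self.addresses[idx]
            try:
                conn = dial(addr)
            except OSError as e:
                errors.append((addr, e))   # fail over to the next replica
                continue
            self._next = (idx + 1) % n     # spread later attaches across replicas
            return addr, conn
        raise ConnectionError(f"no replica reachable: {errors}")
```

Successive attaches rotate through the healthy replicas, and a dead one costs a single failed dial before the next is tried.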

A final area I'd like to explore is discovery. ndb/local has always bothered me as being far too static. Inferno had the concept of local discovery via virgild, which would allow discovery on the local network. With cluster functionality built into srv² we just need a way for the various srv² instances to discover each other. Locally we could use a mechanism like virgild or utilize zeroconf. For broader discovery we could use central registries, perhaps utilizing Chord concepts to make it a bit more scalable. Within our FastOS work we'd also like to explore hierarchical or toroidal organizations, scope, and discovery for our service registries.

This will be ongoing work that I plan on implementing and exploring during the first quarter of 2009. I'll update this blog with my experiences, and hopefully compile a paper towards mid-year with the results of the experiment.

v9fs: Grave Robber's from Outerspace

Is OP just CRUD? Give it a REST.

Background

 

Octopus Protocol
(Non-technical observation is that we need to get better at filming/editing our presentations, our production values suck!)

Observation

So - what is being done in the Octopus Protocol is very similar to the current approach being taken by many web developers.  The primary difference between OP and the HTTP approach is that the OP Put method provides additional metadata.  You get a bit of a tradeoff in that 9P provides an authenticated persistent session, but this may make caching and other intermediary operations more difficult.

Discussion

Protocol differences aside, there are some interesting opportunities here -- for one, why hasn't anyone tried to write web applications using a synthetic file server as the "active" backing?  I alluded to this a bit earlier in my Service-Oriented File Systems post, but understanding more about how web applications use REST, I don't see any reason why an httpd server couldn't be constructed which interpreted HTTP PUT operations against a path as write operations against a file, followed by a read of that file to get the result (or error) to send back to the user.  I'm not sure if it would make sense to have the service documents generated by httpd or by the synthetic file system itself. It is probably better done in the synthetic file system so you could specify types of data as appropriate.
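The PUT-as-write idea can be sketched in a few lines. This is only an illustration under stated assumptions: `put_as_write` is a hypothetical helper, the status codes are my choice, and the "synthetic file system" is assumed to be mounted as an ordinary directory tree under `root`. Against a regular file the read simply returns what was written; a synthetic file would return whatever result the server generates.

```python
import os


def put_as_write(root, uri_path, body):
    """Treat an HTTP PUT as a file write followed by a read of the same file.

    Returns (status, reply_bytes). The file must already exist, as a control
    file in a mounted synthetic file system would.
    """
    root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root, uri_path.lstrip("/")))
    # Refuse paths that climb out of the exported tree.
    if os.path.commonpath([root, target]) != root:
        return 403, b"path escapes exported tree"
    try:
        with open(target, "r+b") as f:
            f.write(body)      # the PUT body becomes the write
            f.seek(0)
            reply = f.read()   # the read-back becomes the HTTP response
        return 200, reply
    except FileNotFoundError:
        return 404, b"no such file"
    except OSError as e:
        # A synthetic server can reject the write; pass its error text back.
        return 500, str(e).encode()
```

Wiring this into an actual httpd is left out on purpose; the point is just that the handler needs no knowledge of the application behind the file.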

Another interesting aspect is how Google App Engine works with these concepts, essentially you have a template file which maps URI paths to application handlers.  This essentially is like a namespace specification, but it might also be useful as a method to compose synthetic file systems out of simple pieces where each component knows how to read/write (maybe just via stdin/stdout) and more advanced components could handle things like dynamic directory structures.

Finally, you could think of a new multi-faced model for UI development which in some ways is similar to the O/mero approach.  In one "view" of the namespace you create directories and files representing the various UI elements -- included in their descriptions are attributes defining their characteristics, types, and position (most likely hierarchical and not absolute) -- another "view" of the same namespace would export html (and javascript?) elements comprising the form which satisfies the specification made in the other view (this could then be skinned with CSS to make it pretty), and the final view of the namespace is an active synthetic file system which would accept the file reads/writes from the HTTP put operations against particular elements of the form.  What you end up with is essentially a shell/namespace/filesystem SDK for building web-apps.


Venti and Linux

I've been playing around with the plan9port's version of Venti in a bit more depth lately and thinking about Linux-specific optimization opportunities. I've got more thoughts than I'm prepared to write down right now, but I figured I should start somewhere.

First, there seem to be a number of shortcomings for large systems. We've actually just installed a new 36TB storage system here at work with 3 file server nodes connected to it with 64GB, 64GB, and 12GB of DRAM respectively. One of the first things I ran into is Venti seems to behave poorly if you try to give it too big of a mem, icmem or bcmem (I was going for 4GB icmem and 2GB bcmem and mem) -- I'm fairly certain something is failing silently, but haven't had a chance to track it down. Another potential shortcoming on these 16-way, 16-way, and 4-way systems is that venti appears to be single threaded at its core, so most of the processing power of these systems is idle while performance suffers.

Most of my current experimentation revolves around looking at using venti to backup and back block devices. vbackup does a reasonable job of this, but the fact that it scans the entire volume is a little problematic (my 80 GB home directory takes 45 minutes to go through even if I haven't modified anything). I'm currently looking at using lvm2/device-mapper to take snapshots to make backups more coherent and to track changes since the last venti backup so I only have to have venti operate on changed blocks. This should allow much tighter granularity on dumps than nightly.
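The changed-block idea looks roughly like the sketch below. It assumes some external mechanism (device-mapper snapshots, in the plan above) can supply the list of dirty block numbers; `ScoreStore`, `incremental_backup`, and the 8K block size are hypothetical names and values, and the flat list of scores stands in for the tree of scores a real Venti backup would use. The one thing borrowed from Venti itself is that a block's address is its SHA-1 hash.

```python
import hashlib

BLOCK_SIZE = 8192  # assumed block size for this sketch


class ScoreStore:
    """Toy content-addressed store: each block is keyed by its SHA-1 score."""

    def __init__(self):
        self.blocks = {}

    def write(self, data):
        score = hashlib.sha1(data).hexdigest()
        self.blocks.setdefault(score, data)   # identical blocks are stored once
        return score


def incremental_backup(volume, dirty, previous_scores, store):
    """Return the new per-block score list, hashing only the dirty blocks.

    volume is the raw bytes of the device, dirty is an iterable of block
    numbers changed since the last backup, previous_scores is the list
    returned by that last backup ([] for the first run).
    """
    scores = list(previous_scores)
    nblocks = (len(volume) + BLOCK_SIZE - 1) // BLOCK_SIZE
    scores.extend([None] * (nblocks - len(scores)))
    for i in dirty:
        block = volume[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        scores[i] = store.write(block)
    return scores
```

The first backup passes every block number as dirty; later runs touch only what changed, so the cost scales with the amount of modification rather than the size of the volume.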

Of course, there is little need to keep 5-minute, 15-minute, or even hour-level granularity snapshots forever. I've started to think through a hierarchical venti approach which would use a local 'transient' venti with a recycled arena for the transient snapshots, and then use venti/copy to send coarser granularity snapshots to the next level of the hierarchy (which could be another cache server or be the central venti server).
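One possible way to decide which transient snapshots move up a level is to keep the last snapshot in each coarser time bucket. The `promote` function below is a hypothetical sketch of that selection only; actually transferring the chosen scores would be the job of venti/copy, which is not invoked here.

```python
def promote(snapshots, period):
    """Pick the last snapshot in each period-second bucket.

    snapshots is an iterable of (timestamp_seconds, score) pairs from the
    transient level; the result is what would be sent to the next level.
    """
    latest = {}
    for ts, score in sorted(snapshots):
        latest[ts // period] = (ts, score)   # later snapshots overwrite earlier ones
    return [latest[bucket] for bucket in sorted(latest)]
```

With 5-minute snapshots and `period=3600`, each hour contributes exactly one snapshot to the next level, and the same function with `period=86400` would feed a daily level above that.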

Besides using device mapper to help take snapshots, it could also be used to create caches and/or cow devices on top of a venti score -- assuming I can get something that looks like a disk or a disk image and actually serves a vbackup score as a disk.

More later...there's a bunch of stuff I need to play with to understand how the various pieces I'm thinking of interact.


Plan 9 Authentication in Linux paper available

Ashwin's paper on implementing Plan 9 authentication and capability device for Linux is now available for free from the ACM Archives along with the rest of the special issue on the Linux kernel that I helped co-edit:

http://portal.acm.org/toc.cfm?id=1400097


Service Oriented File Systems

It's fun to abstract and think about web2.0 concepts in terms of Plan 9 concepts. Abstract away the gorp that makes up their implementations and think about some of the fundamental concepts. When interfaces are files, mashups are binds.

The larger picture here is about keeping this simple, language independent, and client independent. I'd like an infrastructure that I could build simple web services out of that can be viewed with a browser, or with a rich client (like acme plug-ins). I'd like the front-end of these services to be defined in the same way the Octopus approach defines normal GUIs (except from a web browser, the widgets are implemented by JavaScript). I'd like the back-end of these to be akin to Bigtable and the Chubby lock service, with a back-back-end that is simply Venti versus a database or XML.

I'll admit I'm probably naive about all of this, not having really worked on client-facing applications for some time, but I think it would be a fun thought exercise to go through and perhaps a nice refresher to actually try and implement. While part of the goal is to keep things language independent, I'll probably do things under Inferno to keep them portable for me and open up the potential for interaction with the Octopus crowd.

Of course, this all falls under the copious spare time which I have a definite deficit of -- however, I think dedicating one day a week is doable and I should be able to combine this with my constant desire to get back and look at collaborative facilities (aka warren) under Plan 9 and Inferno.


Blue Gene Project Pages

I've set up a more official website to put publications and presentations relating to the DOE sponsored Plan 9 on Blue Gene work. You can get to it here and I'll be updating it over the next few days.


Torus and tree networks working on Plan 9 BG/L port

We finally worked out the remaining infrastructure issues and are now able to cpu(1) to both I/O nodes (over the Ethernet) and cpu nodes (over the tree network). The cpu nodes are also able to talk to each other over the torus network.

We will be attempting a live demo during the USENIX poster session, so drop by and play with Plan 9 running on a Blue Gene.


Service Oriented Synthetic File Systems

I was doing some thinking about how larger synthetic file systems are structured and implemented under Plan 9 (or using 9P in general).

My toy-use-case for thinking about it was developing a synthetic-file-server based service similar to SourceForge -- where you have many different projects, each with different component services (bug-tracking, version-control, blogs, wiki, forums, etc.)


You could do it by running lots of different servers (per project) and binding them into a name space and then exporting, but that seems kinda awful (particularly if you have dozens of projects you are using the same tool to track). Much better to have a single server running which is able to accommodate multiple projects, each with multiple file services underneath.

Obsession with qid-spaces ends up being distracting. With the way fids work, we shouldn't need to segment the space so rigidly. It seems to me that qids only really need to be unique per directory; outside of that you can track which "module" of the hierarchy you are in by saving some state in the server-side fid tracking structure as you traverse the hierarchy.

dot-dot makes things a little more complicated, but the solution used inside the kernel can be used within file servers as well (namely, keep the whole path in the server-side fid tracking structure).
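Here is a minimal sketch of that fid-tracking scheme, following the post's premise that qids only need to be unique within a directory. `Module` and `Fid` are hypothetical names; each module numbers its own entries without coordinating with any other module, and the fid keeps the whole walked path so that dot-dot is just a pop.

```python
class Module:
    """A pluggable directory; it numbers its own entries independently."""

    def __init__(self, name):
        self.name = name
        self.entries = {}   # entry name -> (qid local to this module, sub-module or None for a file)

    def add(self, entry, submodule=None):
        self.entries[entry] = (len(self.entries) + 1, submodule)
        return self


class Fid:
    """Server-side fid state: the full path walked so far, root first."""

    def __init__(self, root):
        self.path = [("/", 0, root)]   # (name, local qid, module serving its children)

    def walk(self, name):
        if name == "..":
            if len(self.path) > 1:     # dot-dot at the root stays at the root
                self.path.pop()
            return self
        directory = self.path[-1][2]
        if directory is None:
            raise NotADirectoryError(self.path[-1][0])
        qid, sub = directory.entries[name]
        self.path.append((name, qid, sub))
        return self

    def current(self):
        """Return (owning module name, element name, local qid)."""
        name, qid, _ = self.path[-1]
        owner = self.path[-2][2] if len(self.path) > 1 else self.path[0][2]
        return owner.name, name, qid
```

Two modules can both hand out qid 1 without conflict, because the fid's saved path says which module that number belongs to.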

What's really nice is you can develop plug in file server modules which allow you to compose much more complex single-server-services out of many different file system components -- without worrying about how the qid space is split up and managed.



GSoC 2008 Ideas

My Plan 9/Inferno summer of code ideas (otherwise known as projects I wish I had more time for) - projects should take no longer than 5 weeks to complete (including ramp up, debug, and packaging) -- conservatively projects should be completable in 3 weeks by experienced P9/Inferno developers and 2 weeks by folks unfamiliar with p9/inferno development environments.
  • mount.9P helper program for v9fs including packaging for debian and fedora -
    • heuristic for determining transport type
    • DNS support for resolving hostname <->ip address
    • ssh-based tunneling support (will require server work as well)
    • man pages
    • debian packaging
    • rpm packaging
    • (stretch) authentication support (w/p9p)
    • (stretch) authentication support (w/o p9p)
    • (stretch) integration with idmap solution
  • since this is relatively straightforward it will really need to be "super" version including support for ssh tunneling, authentication, etc.
  • idmap solution for v9fs - maps local uids/gids/error-codes to strings
    • userspace daemon to provide mapping
    • hooks in v9fs to use mapping
    • (stretch) synthetic file server approach to update mappings
  • OLPC Inferno Environment
    • fontfs - on-demand ttf solution
    • metafs - abstract metadata from underlying file system
    • (stretch) Integrate OLPC translation solution for Inferno
    • (stretch) OLPC oriented GUI toolkit, window manager, and toolbar
    • (stretch) Inferno approach to collaboration using file systems
  • Source Control Management based on Venti backend
    • single branch version tracking with associated log file
      • add new files/directories
      • commit changes
      • checkout specific version #
    • (stretch) support multiple branches
    • (stretch) support repository sync (push/pull)
    • (stretch) support three-way merge
  • wrapper for p9p vbackup to make it more user friendly
    • wrapper which tracks venti scores based on volume being backed up
    • GUI admin tool
      • which sets up venti in partition(s) or with files
      • which assists in configuring backup intervals and volumes
    • (stretch) time-traveler like GUI for navigating/searching backups
    • (stretch) support for pruning and/or merging snapshots
  • GSoCFS for managing future community involvement with GSoC
    • synthetic file system and web interface
    • posting project ideas
    • voting for projects
    • registering student interest in projects
    • project milestone tracking
    • project blogs and wikis
    • post-summer project success metrics (subjective and objective)
    • (stretch) syndication points for community monitoring
    • (stretch) potential integration with SCM
    • (stretch) potential integration with some form of chat
    • (stretch) potential integration with name space sharing
(more to come...)


CPU working on Blue Gene

It took quite a bit of fussing about to gateway between the IBM internal Plan 9 cluster and the Blue Gene VLAN, but with the help of Forsyth's new Ethernet driver for BG/L, Inferno as an intermediary on the front-end node, and a few other bits and pieces... we were able to cpu(1) into Blue Gene I/O nodes. The CRN-Tree network is pushing packets back and forth and the Torus is being debugged. Overall it's been a very productive 80-hour week.


More Plan 9 on Blue Gene

For folks who can't read the memory dump in my last post, I've got an rdbfs(4)-like interface now that gives me access to the BG/L machine state and memory and includes a more readable console interface:

criswell% con -r /usr/ericvh/bgd/0/con0

Plan 9 bgl
cpu0: 0x5202

BG/l Personality
Block: R000-N60_32_4
Memory Size: 536870912 bytes
clockHz: 700000000
Torus Addr: -1 -1 -1
Tree Hops to Top: 255
EMAC h/w Address: 0:d:60:e9:10:9f
Assigned IP Addr: ac186439

Kernel Status Cpus 0 & 1: 3 3
512M memory: 160M kernel data, 352M user, 977M swap
boot...
--rw-rw-r-- c 0 24 Apr 11 2007 /dev/bintime
--rw-rw---- c 0 0 Apr 11 2007 /dev/cons
---w--w---- c 0 0 Apr 11 2007 /dev/consctl
--r--r--r-- c 0 72 Apr 11 2007 /dev/cputime
--r--r--r-- c 0 0 Apr 11 2007 /dev/drivers
--rw-rw-r-- c 0 48 Apr 11 2007 /dev/hostdomain
--rw-rw-r-- c 0 0 Apr 11 2007 /dev/hostowner
--r--r----- c 0 0 Apr 11 2007 /dev/kmesg
-lr--r----- c 0 0 Apr 11 2007 /dev/kprint
--rw-rw-rw- c 0 0 Apr 11 2007 /dev/null
--r--r--r-- c 0 0 Apr 11 2007 /dev/osversion
--r--r--r-- c 0 12 Apr 11 2007 /dev/pgrpid
--r--r--r-- c 0 12 Apr 11 2007 /dev/pid
--r--r--r-- c 0 12 Apr 11 2007 /dev/ppid
--r--r--r-- c 0 0 Apr 11 2007 /dev/random
--rw-rw-r-- c 0 0 Apr 11 2007 /dev/reboot
--rw-rw-r-- c 0 0 Apr 11 2007 /dev/swap
--rw-rw-r-- c 0 0 Apr 11 2007 /dev/sysname
--rw-rw-rw- c 0 0 Apr 11 2007 /dev/sysstat
--rw-rw-r-- c 0 78 Apr 11 2007 /dev/time
--rw-rw-rw- c 0 0 Apr 11 2007 /dev/user
--r--r--r-- c 0 0 Apr 11 2007 /dev/zero
/boot/bind
/boot/boot
/boot/echo
/boot/ls
/boot/ps
/boot/rc
/boot/rcmain
/boot/sleep
Hello Squidboy
sleep 10
hi hi hi hi hihihihihihihihihihihhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
%

(Picture photoshopping credit: Andrey)


Virtual Systems

Bit the bullet and got Plan 9 running under Xen on my "victim" laptop. It can now run Plan 9 natively, under Xen, or under Qemu. I still am toying with the idea of a bridged network mode where I can mount an Inferno devip from a loopback cable to my windows laptop in case I can't find hard-lines to plug into at Watson. This also means my victim laptop is squared away to be a testbed for x86/Libra -- I might even get to a basic "shot-in-the-dark" (i.e. no Xen devices, just devshm) configuration sometime during the trip.


Collab

In lamenting not really knowing the various states of a distributed team working on Plan 9 related projects, I was thinking through expedient mechanisms for loosely keeping in touch. There are a bunch of tools that seem to help with distributed teams: phone calls, IM, IRC, VNC, peer-to-peer and centralized file sharing, SCM, etc. I was thinking of architecting a simple suite of tools to provide basic functionality within a Plan 9 paradigm.

The cornerstone would be a modified version of faces, which checked against a centralized file system (or maybe just a centralized place with file systems bound from several different servers). People currently active ('logged into') within the collaboration framework would have their face appear. Mouse-over the face gives a quick bit on their current status (i.e. working on /sys/src/collab.c), right click pops up a menu of different options (chat, blog, view, share, mount, etc.) -- each of these would plumb an action to an appropriate tool (either in ACME or raw).
  • chat - simple point-to-point IM facility, I'm toying with the idea of walkie-talkie style audio chat as well.
  • blog - pops up the person's "blog" which really is just news(1), but personalized for that person, maybe work out something more complicated with wikifs later, but news(1) will do for now
  • view - If the person has shared a window with you, view will plumb it to a viewer -- I haven't quite decided how to handle different types of views, could just do VNC, but I'd rather have the ability to share windows -- or really what I want is the ability to share ACME windows and keep the sides in sync (like a distributed Zerox) -- ultimately, there's all sorts of granularity to sharing, but for now it'll be a single thing (kinda works like snarf) and it'll have to be integrated into ACME and other tools which use it.
  • share - really mostly described above -- this basically grants the person access to your current share
  • mount - pops up a window with the person's public namespace mounted -- this could just be their /usr directory, or it could be something more. Haven't quite decided how to handle this completely.
There will be a variety of command-line tools that allow you to post shares, export portions of your namespace into your public namespace, and allow you to easily update your blog or status.

I think the simple approach works for 1-to-1 communication, not sure how to handle 1-to-many or many-to-many without things getting complicated. There are also various ways you can build on this to add additional context (like maybe per-project collaboration spaces). The key thing is that everything works through file system mounts.

A big missing item is SCM. I have been toying with some ideas around a git-like mechanism with a venti back-end. For the most part this isn't an immediate requirement though; a centralized file repository backed by venti is fine -- although it would be nice to have a formula or script for setting up replicas of such centralized repositories.
