The Big Green Button: Turning Plone into a dynamic site factory
Brandon Craig Rhodes has done the impossible, and made explaining the Zope Component Architecture an entertaining experience. I sat with Brandon and Mark Ramm-Christensen at lunch, and they were chatting about Shakespearean plays and actors, and Brandon's appreciation of acting has translated well into giving technical presentations :)
Delivering egg-based applications with zc.buildout, using a distributed model
As primarily a Zope/Plone/Grok developer, I thought I'd share some of my experiences in working with Python packaging, dependencies and builds. From my perspective, packaging and deploying Python applications is one of the areas of Python that could use the most improvement - there is a tonne of good and interesting work out there already, but this ecosystem can be fairly hard to approach. I've attempted to teach new Python developers about project and dependency management, but there are a lot of things to learn and a lack of good beginner documentation, so at some point after I've rattled on about setuptools, virtualenv, buildout, easy_install, PyPI and eggs they get a glazed-over look and say, "symlinks and shell scripts work for me", to which I answer, "yes, for your requirements, I think you're right." Even many very talented, experienced developers have made mistaken assumptions about this ecosystem at some point; there are threads on the distutils-sig where people get stuck or confused on what should be fairly straightforward things. Note: any mistakes in the following text are merely proof that I'm a "very talented, experienced developer" :)
Everyone knows what PyPI is (hopefully!). A small point to note is that it was originally called The Cheeseshop; there was a long thread started by people who didn't like the name, it was renamed to PyPI, and use of the name Cheeseshop is now politely discouraged. While there is only one PyPI, tools that pull packages from it (easy_install and buildout) can easily pull from additional package repositories. These can be mirrors, local caches, private repos, or whatever. Creating your own repo is as easy as dropping a Python-packaged .tar.gz file into a directory and exposing it using Apache's default directory listing feature. There are also cool projects recently started such as z3c.pypimirror, which lets you create a local PyPI mirror, but rather than mirror all 4+ GB of PyPI, it mirrors only a chosen subset of packages. Very cool, and probably something that would be quite neat in a shared hosting environment.

Django really needs to update their PyPI page - it hasn't been updated since 0.91! And it's not classified under the Frameworks::Django trove either.
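As a rough illustration of the private-repo idea, here is a minimal sketch that exposes a directory of tarballs as a browsable listing. It uses Python's standard http.server module as a stand-in for Apache, which is a substitution made for this example and not what is described above. The function names serve_directory and fetch_listing are made up for illustration.

```python
import functools
import http.server
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` as a plain HTML file listing (hypothetical helper).

    Port 0 asks the operating system for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Run the server in the background so the caller keeps control.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch_listing(server):
    """Return the HTML directory listing a packaging tool would scan for links."""
    host, port = server.server_address[:2]
    # Bypass any proxy settings, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open("http://%s:%d/" % (host, port)) as response:
        return response.read().decode("utf-8")
```

Any .tar.gz dropped into the served directory shows up as a link in the listing, which is the kind of page a tool can be pointed at as an extra download location. Call server.shutdown() when finished.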
As far as "flooding" PyPI with packages, Zope2+Zope3+Plone has 1,236 packages on PyPI right now. There have been one or two grumbles about this (it would seem people will grumble about just about anything), but I'm pretty sure the consensus is that if the number of packages becomes unwieldy, the solution is to improve PyPI itself and not "start more PyPIs". The only valid grumbles about heavy PyPI usage are: packages with really sparse classification and documentation, releasing -dev packages onto PyPI (don't do this!), and, especially, removing older packages.

Plone add-ons were released to the Plone Software Center before PyPI existed. Tarek Ziade has done awesome work at making it possible to seamlessly release Plone add-ons to both plone.org and PyPI. This way a package can have additional Plone-specific metadata, but still have a PyPI presence without creating a bunch of extra maintenance work for the add-on developer.
VirtualEnv is fairly new, but it has become popular quickly, since it's a very handy tool. It's basically a "Python symlink utility" for cloning an existing Python installation so that you can easily install packages into a different location than the site-packages directory of your Python installation. With the --no-site-packages switch you can also use a Python install even if you've already hosed it with a million packages :P Often people will use one virtualenv per project; the place these packages get installed into is called your "project-global" space. VirtualEnv is awesome for "trying out" a collection of packages, especially higher level packages that depend upon a number of other packages, when you don't want to pull a big gnarly mess into your site-packages - as anyone who's ever had the unfortunate experience of doing a "sudo easy_install Plone" or "sudo easy_install grok" knows.
Distutils is the original Python module installation tool. It is typically invoked with "python setup.py install", which installs packages into the Python installation's global location (site-packages). Great for beginners since they don't have to worry about PYTHONPATHs, eggs or any of that stuff, but eventually you can wind up with a huuuuge ball of files in your site-packages directory. Distutils is so old that the documentation still reads, "Distributing Python Modules" and not "Distributing Python Packages". Note that distutils' scope was installing Python modules and packages - it's not a very good tool for managing the installation of a complete web app! So while this venerable tool is a Python standard, please don't ask it to do too much. It's very annoying when you run "python setup.py install" and the package does more than just install itself, but also tries to act as an "application installer" and puts files in unexpected places on your OS.
Eggs are a packaging format for Python that requires the setuptools package to be installed.
It's a common misconception that if you are using eggs, you have to use easy_install. Breaking setuptools into a core pkg_resources library and a separate easy_install tool is something that people will likely use the Python time machine for at some point :) The egg format is an extension of the distutils format; the changes between the two formats are documented here:
install_requires = ['django >= 1.0']
If the Django devs later create a concrete road map that makes it explicit that your package will break in Django 1.1, you can even write:
install_requires = ['django >= 1.0, < 1.1']
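To make the effect of such a version range concrete, here is a minimal sketch of picking the newest release that falls inside it. This is not how setuptools or buildout actually resolve versions; the helper names and the list of releases are made up for illustration, and only plain dotted numeric versions are handled.

```python
def parse_version(text):
    """Turn '1.0.2' into (1, 0, 2). Only dotted numeric versions are supported."""
    return tuple(int(part) for part in text.split("."))


def newest_matching(available, minimum, below):
    """Return the newest version v with minimum <= v < below, or None."""
    low, high = parse_version(minimum), parse_version(below)
    candidates = [v for v in available if low <= parse_version(v) < high]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical list of published releases, for illustration only.
releases = ["0.96", "1.0", "1.0.2", "1.1"]
print(newest_matching(releases, "1.0", "1.1"))  # prints 1.0.2
```

With the range '>= 1.0, < 1.1', the 1.1 release is skipped even though it is the newest one published, and the latest bug-fix release in the 1.0 line is chosen instead.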
This is useful, because if you're using a tool such as buildout, it will use this information to fetch from PyPI the newest release that doesn't break your package.

A bit of a red-herring field is the 'zip_safe' field. This lets you specify whether a package will still work while zipped. However, all packages work when unzipped! Zipped packages can be pretty annoying to deal with, and the resources saved by zipping packages are pretty minimal and not a concern to most people.

The egg format is also not yet an "official Python standard". This isn't entirely bad, since it's easier right now to lobby to extend this format somewhat (e.g. install_recommends has been suggested in addition to install_requires).
-- Tim Peters

Zope 3 started life as a single source tree that looked something like this:

zope/
    __init__.py
    testing/
    interface/
    app/
As the project grew, some people just wanted to be able to use a few core libraries and not have to install the whole damn kitchen sink. Suppose you are distributing zope.testing and zope.interface as separate packages, so that you have:
/opt/zope/testing
/opt2/zope/interface
There is no way to unify these two packages with symlinks or PYTHONPATHs so that you can import from both. Namespace packages simply solve this problem. Culturally, namespaces are also used to reduce the chance of name collisions. In the Zope world there are:
zope.app.* : Zope uses "app" as a nested namespace for packages which are concerned with application code (e.g. Views n' URLs n' Models).

z3c.* : Short for the "Zope 3 Community", and typically these packages are all managed in the main Zope svn repo.

plone.* : Namespace for code written by the Plone project that can be used outside Plone.

plone.app.* : Namespace for code written by the Plone project which

collective.* : Very early on in the Plone project, in the dark days of the internet when we only had SourceForge, it was far too much hassle to create an entire SourceForge project for a very simple Plone add-on. So they created an umbrella project called the Collective and were free with allowing anyone who requested it access to this svn repo. As a testament to the FOSS community in general, it's interesting just how rarely open access to a source tree is abused.

lovely.* : Namespace for packages written by Lovely Systems. Various other Zope-centric companies will release code that they contributed under the name of their organization.
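To make the namespace mechanism concrete, here is a minimal sketch of two separately installed directories that both contribute to one top-level package. It uses the standard library's pkgutil.extend_path; setuptools offers its own way of declaring namespaces, so treat this as an illustration of the idea and not as the exact mechanism the Zope packages use. The package name demo_ns and the helper names are made up for the example.

```python
import importlib
import os
import sys
import tempfile

# Each copy of the namespace's __init__.py asks pkgutil to merge in every
# directory on sys.path that holds a package of the same name.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, namespace, subpackage, body):
    """Create root/namespace/subpackage/ as one separately installed package."""
    package_dir = os.path.join(root, namespace, subpackage)
    os.makedirs(package_dir)
    with open(os.path.join(root, namespace, "__init__.py"), "w") as handle:
        handle.write(NAMESPACE_INIT)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write(body)


def import_from_both_roots():
    """Install two halves of demo_ns in different roots and import both."""
    first_root = tempfile.mkdtemp()   # plays the role of /opt
    second_root = tempfile.mkdtemp()  # plays the role of /opt2
    make_distribution(first_root, "demo_ns", "testing", "NAME = 'testing'\n")
    make_distribution(second_root, "demo_ns", "interface", "NAME = 'interface'\n")
    sys.path[:0] = [first_root, second_root]
    importlib.invalidate_caches()
    testing = importlib.import_module("demo_ns.testing")
    interface = importlib.import_module("demo_ns.interface")
    return testing.NAME, interface.NAME
```

Without the extend_path line, only the first demo_ns directory found on sys.path would be used and the second import would fail.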
By default easy_install installs into a Python global location. People new to this tool often "shoot themselves in the foot" by doing "sudo easy_install package_name" and making a right mess of their site-packages directory. At which point they decide they hate easy_install. However, easy_install can be used to install to other locations! Or, more commonly, it is combined with VirtualEnv to install into a project-global location. This is the recommended install process for TurboGears 2, where they are publishing a PyPI-like index for each release of TurboGears:
$ easy_install -i http://www.turbogears.org/2.0/downloads/current/index tg.devtools
The phrase "repeatable deployments" gets used a lot with Buildout. This just means that you can take an application, check out the project on a clean system, run ./bin/buildout, and the same configuration and installation actions will be performed. Think of it as maintaining a buildout.cfg file for your project instead of an INSTALL.txt file. This is similar to Ruby's Capistrano tool - although the two tools really differ quite a bit in what they do. Capistrano puts a large focus on describing your deployment infrastructure and managing commands remotely. Buildout is smaller in scope in that it is only concerned with installing parts of your web application locally.

Buildout is a configuration-driven build tool. You define a buildout.cfg file that describes the parts that compose your project or application. Only when the configuration of a part changes does buildout re-build that part. This is different from build tools focused on source code recompilation, such as SCons. If you are working in a compiled language, buildout is not the tool for you.

One of Buildout's unique innovations is that the recipes that describe how a part gets installed are themselves Python eggs. These recipes can be included as part of your project, or they can be published on PyPI and shared with other buildout users.
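For a feel of what a recipe looks like, here is a minimal sketch. It follows the usual recipe shape: a class constructed with the buildout, the part name and the part's options, plus install() and update() methods. The recipe itself (WriteFileRecipe) and its 'content' and 'path' options are hypothetical, invented for this example, and a plain dict stands in for the objects buildout would normally pass in.

```python
import os


class WriteFileRecipe(object):
    """Hypothetical recipe: writes the part's 'content' option to a file."""

    def __init__(self, buildout, name, options):
        self.name = name
        self.options = options
        # 'parts-directory' is where buildout keeps each part's files.
        parts_dir = buildout["buildout"]["parts-directory"]
        options.setdefault("path", os.path.join(parts_dir, name, "output.txt"))

    def install(self):
        """Called when the part is new or its configuration has changed."""
        path = self.options["path"]
        directory = os.path.dirname(path)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        with open(path, "w") as handle:
            handle.write(self.options.get("content", ""))
        # Returning the created paths lets buildout remove them later.
        return [path]

    def update(self):
        """Called when the configuration is unchanged: nothing to rebuild."""
        pass
```

A recipe like this would be packaged as an egg and referenced from a part with a line such as "recipe = my.recipe.package", where the package name is again only a placeholder.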
[buildout]
parts = oldervirtualpython newervirtualpython

[oldervirtualpython]
recipe = zc.recipe.egg
interpreter = oldpython
eggs = zope.interface == 3.3

[newervirtualpython]
recipe = zc.recipe.egg
interpreter = newpython
eggs = zope.interface
When buildout is run with this configuration, it will create two scripts (named oldpython and newpython) that act as wrappers around the Python interpreter. The 'oldpython' wrapper will have zope.interface version 3.3 installed, while the 'newpython' wrapper will have the latest version of zope.interface installed (currently 3.4.1). In this way you can address more complex use cases, such as a web site where you have one section of the site relying on an older version of Django, and another section of the site using the latest version.
Buildout has a bit of a reputation for being "opinionated", although this isn't entirely true. Buildout is completely agnostic as to what and how it builds, but there are certain conventions within existing recipes: for example, the recipes in the zc.recipe.* namespace are very egg-centric, and they follow the policy that if the configuration for a part changes, the part is rebuilt by blowing away the existing part and reinstalling from scratch. Note that recipes such as zc.recipe.egg will work in a shared hosting environment. At my organization I like it because it lets less experienced developers install Python packages into their home directories without needing to either give them root or teach them how to compile Python.

It's worth noting that there are a few more competitors in the Python-based project build space. Tools such as Fassembler, Vellum and Paver all have interesting ideas and could be a reasonable replacement for Buildout (or Rake or Capistrano if you like that Ruby junk :P). However, I'm only going to talk about Buildout since it is the only actively developed Python "repeatable deployment" project to date to have passed the 1.0 mark.

So why go to the hassle of learning a build tool and maintaining a build configuration for your projects? Tarek's recent OSCON slides touch on one point. See slide 6, which describes a Zope-based application install as "5 hours in 2006: install python extra packages, get zope, install zope, create an instance, get extra products, read extra products docs, install extra products dependencies, install extra products, doesn't work, ahh right, install python-ldap, checkout products in development, doesn't work, ahh right wrong python-ldap version ... start to work." then the next slide, "5 minutes in 2008".

There is a second ancillary motivation, and that's to enable the Django community to "play nicer" (note I said "nicer", I ain't saying you folks are mean ... :P) in the Python ecosystem.
A good way to do this is to use more non-Django Python packages and to contribute back to them, and to spin off the non-Django-specific parts of a project so that others can more easily use them. However, in order to do this, your project's install is going to require an ever-growing list of package dependencies. If you don't have a tool for automatically managing these dependencies, you will drive yourself batty doing manual installs, and the impetus to "just keep it all together in a single, monolithic package" will be much greater. Since adopting Buildout I've even found a couple of instances where I'll take some internally developed code and split it into a separate egg only intended for internal re-use, when in the past I would have just employed cut-n-paste re-use.

Zope 3 started life as a single source tree, and it wasn't broken into eggs until 2006. As a result of not wanting to break backwards compatibility, even though Zope 3 was split into some 80+ individual Python packages, many packages have unfortunate couplings between each other. This has the effect that, often, by pulling one specific Zope 3 package into your project you "pull in the world" and end up getting all of Zope 3 :(
A very cool side-effect of having an egg-heavy project is being able to visualize the dependency tree. Marius Gedminas used the tl.eggdeps package to visualize the dependency tree expressed in the Zope 3 'install_requires' fields:
Another cool side-effect is the ability to easily determine what has changed between two releases of a framework (yeah, I know, that's what CHANGELOGs are for). Say I've got a web app using Grok 0.11 and want to upgrade to Grok 0.13. The list of packages and versions that compose these two releases are here:
http://grok.zope.org/releaseinfo/grok-0.11.cfg http://grok.zope.org/releaseinfo/grok-0.13.cfg
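Comparing two such version lists can be sketched in a few lines of Python. This assumes each file holds a [versions] section of "package = version" lines. The helper names are made up, and the OLD and NEW samples are invented stand-ins, not the real contents of the two Grok release files.

```python
import configparser


def read_versions(cfg_text):
    """Parse the [versions] section of a buildout-style file into a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep package names exactly as written
    parser.read_string(cfg_text)
    return dict(parser["versions"])


def changed_packages(old, new):
    """Map each changed package to (old version, new version, big change?).

    The flag is True when the package was added or removed, or when one of
    the first two version components differs.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        big_change = (
            before is None
            or after is None
            or before.split(".")[:2] != after.split(".")[:2]
        )
        changes[name] = (before, after, big_change)
    return changes


# Invented sample data, for illustration only.
OLD = "[versions]\ngrok = 0.11\nzope.interface = 3.4.0\nzope.testing = 3.5.1\n"
NEW = "[versions]\ngrok = 0.13\nzope.interface = 3.4.1\nzope.testing = 3.5.1\n"

for name, change in changed_packages(read_versions(OLD), read_versions(NEW)).items():
    print(name, change)
```

Packages whose version is unchanged are left out, so the output is a short list of what actually moved between the two releases.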
I can take these two files and run a diff between them. I also started (but didn't get very far due to lack of time and motivation) to create a web app which would allow you to better visualize the differences between two lists of eggs, highlighting packages whose major or medium version numbers had incremented. This would for example highlight 'django.form' in bright red between the 0.96 and 1.0 release. Even cooler would be for developers to create "playlists" of interesting sets of packages (such as a suite of testing and code analysis tools), and share them with each other.

Note that Buildout is often "developer centric". This means that it's not always intended to replace your operating system's existing packaging system. Your SysAdmins may really like Debian or Red Hat and insist upon production deployments using their OS packaging system - especially if they are security conscious. However, it's often a PITA to develop using the same environment used in production (and with Google App Engine it's not even possible at all), so Buildout is often used to repeat the install only for development environments. For example my own small web-based LDAP management tool has a buildout.cfg file that only bothers to configure LDAP on Mac OS X, since that's what I use for development ( platform/bioinfo/software/gum). In no way did I suggest to our sysadmins that they use Buildout to configure our production LDAP servers, since they'd rightfully just look at me like I was nuts.
In the Grok framework (yay Grok!) (and also zopeproject for Zope 3), the framework does not have an installer built into it! Instead the framework installation experience is as follows:
Run grokproject and answer the prompts to create a sample project. This will generate a starting buildout.cfg file and run Buildout against that file.
Run ./bin/buildout to install the sample app, as well as the Grok framework.
Yes, it's possible to easy_install the Grok framework inside a VirtualEnv, but since the Grok framework is composed of 107 (zoiks!) Python packages, running easy_install at any given point can give you a different set of package versions. TurboGears 2 also works around this by publishing their own "TurboGears PyPI repo" for every release of TG2. Since installing that many packages means a lot of network connections, grokproject now downloads a zip file containing all the packages (there are also other tools for packaging up all the files fetched over the net when using buildout so that purely off-line installs are possible). The beauty of this approach is that you can also have a user-specific config file that states that you'd like to have a local cache of all eggs, so if you already have a bunch of apps installed, installing new ones is very quick and requires very little disk space. For example, my ~/.buildout/default.cfg has:
[buildout]
eggs-directory = /Users/kteague/buildouts/shared/eggs
download-cache = /Users/kteague/buildouts/shared/cache
The important thing to note, though, is that the framework doesn't have an installer. In the olden days, we would run the Zope 2 installer, and then we'd have a location where we could put our own Python code. This is totally bass-ackwards. Instead you write your own Python code, and then either in setup.py or buildout.cfg, you state that your code depends upon the framework. For example, in one Grok app that I'm working on I have in the setup.py file:
install_requires=['setuptools', 'grok', ],
The Grok line is a bit of a red herring, since I'm supplying a list of packages and versions to "pin" my app to a specific release of Grok. But when I wanted to use the Grok add-on 'megrok.form', I changed the install_requires to read:
install_requires=['setuptools', 'grok', 'megrok.form', ],
install_requires=['setuptools', 'grok', 'zc.resourcelibrary == 1.0.1', 'z3c.widget == 0.1.6', 'zc.datetimewidget == 0.5.2', 'collective.namedfile == 1.1', 'collective.namedblobfile == 0.3', ],
However, I do not have to install megrok.form manually, nor do I have to track down and manually install the correct versions of zc.resourcelibrary, z3c.widget, zc.datetimewidget, collective.namedfile and collective.namedblobfile. Instead I just add one line to my project's install_requires, re-run './bin/buildout' to freshen my project's install, and huzzah! the packages are automagically installed (and not in a global location!). Then I check that change into Mercurial, ping the developer I'm collaborating with, and he pulls my changes to his workstation, runs './bin/buildout' - and we've seamlessly shared the exact same install process and have the same working set of Python packages installed :) If a new developer starts on the project, they clone the Mercurial project, run "python bootstrap.py; ./bin/buildout" and a few minutes later they also have a running application that they can begin hacking on. Beautiful!
Well, I've not even touched on anything Django-specific, and this e-mail is already head-spinningly long. As Malcolm pointed out, reaching a consensus on some of these issues is going to take more than a few hours. As you can tell, I'm pretty happy with the switch from the distutils/framework install process to the eggs/buildout install process. But there are other tools and approaches to these problems, so I'd encourage people to try out different things, and let others know what worked, what didn't work, and where you're getting stuck, and I wouldn't worry too much about deciding upon a "standard" just yet. And most of all, hopefully some people will contribute back in the form of beginner-friendly documentation so that we can get to the point where Python gets a reputation as being "easy to deploy" :)