Buildout Performance Improvements
I've been baffled by the amount of time running buildout takes. In particular, it seemed like a lot of duplicate work was being done when multiple parts had identical or very similar sets of required eggs/distributions. After a bunch of profiling and timing, I've found there are a number of ways to increase your buildout performance.
- Use buildout.dumppickedversions version 0.5 or later:
- If you're using buildout.dumppickedversions, use version 0.5 or later. This release includes a fix I contributed which yields a 3-4 fold decrease in runtime when buildout is run with just the -N option and no -v options. If you're using buildout-versions instead of buildout.dumppickedversions, it unfortunately suffers from the same problem. The author knows about this and may address it, but until then you may want to switch to buildout.dumppickedversions.
- Use zc.buildout version 1.5 or later:
- This seems to yield a 2-3 fold decrease in buildout runtime when run with -N, and it addresses a hot loop similar to the one found in buildout.dumppickedversions. In fact, its resolution of required distributions seems to be much more efficient in general. If you're stuck with 1.4, I've contributed the same fix as for buildout.dumppickedversions to the 1.4 branch, so it should make it into the next 1.4 release if someone makes one.
- Use the -N option:
- With the latest releases of zc.buildout and buildout.dumppickedversions mentioned above, you can now exercise a lot of control over your run times through the command-line options you use. In particular, make sure newest is false and that verbosity is not increased. Both of these can be influenced by options in the buildout configuration, so keep an eye out for that, but if no such options are interfering then this means just invoking buildout like $ bin/buildout -N -c buildout.cfg. IOW, use one -N option and no -v options. Before the improvements in zc.buildout 1.5 and the fix to buildout.dumppickedversions, a lot of time was wasted doing work that would only actually be used if -v had been given.
- Clean out your egg cache and use virtualenv --no-site-packages:
- Some of the remaining inefficiencies in zc.buildout are proportional to the number of distributions available on sys.path. IOW, if you reduce the number of distributions to be scanned, you can increase performance. If you're using a shared eggs-directory or have a buildout that has been around for a long time, there may be a lot of different versions of the same packages in eggs-directory. By cleaning those out you reduce the number of distributions that buildout needs to scan. I recommend just emptying your eggs-directory and then letting buildout re-download all the eggs it actually needs now. It takes a while once, but then it's done. Similarly, using virtualenv --no-site-packages can reduce the number of dists buildout needs to scan.
- Help me get zc.buildout/branches/env-cache merged and released:
- Having said the above about cleaning out the egg cache, buildout, distribute, and setuptools shouldn't be scanning these paths multiple times anyway. Along with package indexes, these paths are global in nature, so the dists found there should only be scanned once and cached globally. I've started this work on the env-cache branch of zc.buildout. In my timing tests, this branch seems to yield another ~60% decrease in run time. The zc.buildout tests, however, are a PITA and Jim Fulton is a very busy man with little time to evaluate this branch. If anyone else can help me shepherd this branch through to merge and then release, that would be great. I've written up how to use this branch with your existing buildouts.
With all this in effect I took a production buildout and another development buildout from 2 minute run times down to 20 second run times. This really makes re-running buildout something I no longer feel an urge to avoid. So use zc.buildout 1.5.2, buildout.dumppickedversions 0.5, -N and do what you can to help get the env-cache branch merged.
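If you want a rough idea of how cluttered an eggs-directory is before emptying it, a small Python sketch like the following can list the projects present in more than one version. The function names are made up for this example. It assumes the usual name-version-pyX.Y[-platform].egg naming and only looks at entries ending in .egg, so develop eggs and anything named differently are ignored.

```python
import os
from collections import defaultdict


def versions_by_project(eggs_dir):
    """Map each project name to the sorted list of versions found in eggs_dir."""
    found = defaultdict(set)
    for entry in os.listdir(eggs_dir):
        if not entry.endswith(".egg"):
            continue
        # Assumed layout: "name-version-pyX.Y[-platform].egg"
        parts = entry[:-len(".egg")].split("-")
        if len(parts) >= 2:
            found[parts[0]].add(parts[1])
    # Versions are sorted as plain strings, which is fine for a quick look.
    return {name: sorted(versions) for name, versions in found.items()}


def duplicated(eggs_dir):
    """Return only the projects that are present in more than one version."""
    return {
        name: versions
        for name, versions in versions_by_project(eggs_dir).items()
        if len(versions) > 1
    }
```

Calling duplicated("eggs") from the buildout directory (assuming the eggs-directory is called eggs there) returns a dict of project names to version lists. A long result is a hint that the cleanup described above is worth doing.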