This document is meant to gather resources for the scientist interested in starting to use the Python programming language for scientific computing. Most of the information here should be of general use, though a few pointers are specific to resources at UC Berkeley. Please email me with feedback, corrections or suggestions.
The landscape of Python tools for scientific computing is varied and rapidly growing. Python wasn’t originally designed for numerical computing, but rather as a general-purpose, high-level language. For this reason, as a scientist you will need to install some extra tools on top of the basic language download to provide support for array manipulation, numerical algorithms and data visualization. All of the tools mentioned here are free and developed as open source software in a collaborative manner by other scientists; I encourage you not only to use these tools but also to get involved with the groups that develop them. You will find not only help with questions and problems, but likely also the opportunity to shape the development of the major tools in a way that improves them for your own research.
Here are quick instructions on what to download to get started, especially if you will soon be attending a class or workshop I may be teaching. At the end of this page there is a longer description of the various tools and distributions available, with some context to inform your decision.
For a basic verification that you have a functioning installation of the core tools on your system, simply download and run this checklist script as per the instructions at the top of the file.
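The checklist script itself is linked rather than reproduced here. As a rough sketch of the same idea, using only the standard library, the code below tries to import a handful of packages and reports the version of each one it finds. The `CORE_PACKAGES` list and the `check_packages` helper are hypothetical names. The list is only an assumption about what "core tools" means, so adjust it to your needs.

```python
import importlib

# Hypothetical list of "core" packages; edit it to match your own needs.
CORE_PACKAGES = ["numpy", "scipy", "matplotlib", "IPython", "sympy", "pandas"]


def check_packages(names):
    """Try to import each package; map its name to a version string, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every module defines __version__, so fall back to a placeholder.
        report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, version in check_packages(CORE_PACKAGES).items():
        status = version if version is not None else "NOT FOUND"
        print(f"{name:12s} {status}")
```

Running it as a script prints one line per package, and any package reported as NOT FOUND still needs to be installed.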
On a reasonably recent Linux distribution, all the tools you need are available via the package management system. On Ubuntu or other Debian-based distributions, type the following at the shell (tested on Ubuntu 9.10 Karmic):
sudo apt-get install ipython ipython-notebook ipython-qtconsole \
    python-scipy python-matplotlib mayavi2 python-pandas \
    python-sympy cython python-networkx python-pexpect python-nose \
    python-setuptools python-sphinx python-pygments \
    python-tk build-essential
sudo apt-get build-dep python python-scipy python-matplotlib mayavi2 cython
These two commands give you all the core packages to get started with scientific Python work, including development tools like compilers. On Fedora, the equivalent commands are (tested on Fedora 12):
sudo yum install yum-utils
sudo yum install python-ipython-notebook \
    scipy python-matplotlib Mayavi sympy Cython \
    python-networkx pexpect python-nose python-setuptools \
    python-sphinx python-pygments python-pandas
sudo yum-builddep python scipy python-matplotlib Mayavi Cython
On Windows and Mac OS X, install the Enthought Python Distribution (I’m assuming here you are an academic user who can use the free license). This has all of the above, and much more, in a single installer.
On the Mac, you will also want to have:
Python is a programming language, so at some point you’ll need to type code. Learning how to use a good, powerful text editor is one of the best investments of time you can make in terms of computing-related skills. I’m a lifelong Emacs user, but vi is equally sophisticated (in a very different style). These editors, however, aren’t the easiest to get started with, although if you’re serious about computing I strongly recommend that you learn to use one of them.
If you want something with a slightly easier learning curve to begin with, the following are all free, good options:
In all of these, the markers that you see as >>> are the prompts generated by Python, which you do not type. Similarly, IPython prompts look like In [1]:, with the number increasing for each new input.
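The same `>>>` convention is what Python’s standard `doctest` module understands. If you paste an interactive transcript into a docstring, `doctest` runs each line that follows a `>>>` prompt and checks that the result matches the line below it. Here is a small illustration in which the `mean` function is just an example.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    The prompt below is part of the transcript; you would type only the code after it.

    >>> mean([1, 2, 3, 4])
    2.5
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Runs every >>> example in this module; silent on success, pass -v for details.
    doctest.testmod()
```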
In addition to these two minimal requirements, the following links can also be useful:
With a slightly broader view, I very strongly recommend you spend some time with Greg Wilson’s excellent Software Carpentry materials. As of early 2010 he is restructuring them, and I’m sure the new version will be even better, but even the archives have a lot of value. Greg addresses the real problems that exist at the intersection of software engineering and scientific computing and tries to offer not only practical solutions, but more importantly, a set of approaches that hopefully lead to the creation of a more robust computational culture in scientific work.
These are a few good links about how to write good Python code:
Quick reference: use Richard Gruet’s excellent Python Quick Reference, available in html and pdf formats for several Python versions.
At some point you’ll need to debug your code, and this page is the cleanest introduction to the Python debugger I’ve read.
In IPython, you can run scripts under the control of the debugger by typing %run -d script.py, and you can debug post-mortem by typing %debug after any exception (or type %pdb to make this happen automatically anytime there is an exception). The IPython debugger is an extended version of the one described in this page, with syntax highlighting and tab completion, but otherwise works identically.
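Outside IPython, you can get a rough plain-Python approximation of `%pdb` by replacing `sys.excepthook`, so that an uncaught exception drops you into `pdb.post_mortem`. This is a common recipe, not how IPython itself implements `%pdb`. The function names below are hypothetical. The hook falls back to Python’s default behavior when there is no terminal to talk to, so it never blocks on a prompt that nobody can answer.

```python
import pdb
import sys
import traceback


def debug_excepthook(exc_type, exc_value, tb):
    """Print the traceback, then start pdb at the frame where the exception was raised."""
    # In an interactive interpreter (sys.ps1 exists) or without a terminal,
    # keep Python's default behavior instead of waiting at a debugger prompt.
    if hasattr(sys, "ps1") or not sys.stderr.isatty():
        sys.__excepthook__(exc_type, exc_value, tb)
        return
    traceback.print_exception(exc_type, exc_value, tb)
    pdb.post_mortem(tb)


def install_post_mortem_hook():
    """Make every uncaught exception in this script open the debugger."""
    sys.excepthook = debug_excepthook
```

Calling `install_post_mortem_hook()` near the top of a script gives it roughly the behavior of `%pdb`. For one-off use, `python -m pdb script.py` runs the whole script under the debugger and also enters post-mortem mode if the script exits with an uncaught exception.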
In terms of books for scientists, I recommend the following:
These books are of general value and freely available online, though they can also be purchased in paper form:
The following Python books (except for David Beazley’s) are freely available to UC Berkeley users via the O’Reilly Safari system. These are books I have personally found to be useful and can recommend; they are general-purpose books without content specific to scientific use.
UC Berkeley users can access Safari for free; to do so, you need to be either on campus or browsing through the Berkeley Library Proxy.
In late 2008 I taught an intensive two-day workshop introducing Python to scientific users at UC Berkeley. While this was a very hands-on course and thus probably not the best thing to watch as a recording, a number of people have told me that they find the lectures useful, and all of the video is available. The sessions were kindly videotaped and put online by Jeff Teeters.
Enthought offers a webinar series that is open to the public, and recordings of past ones are available as well.
MIT’s famous 6.00 Introduction to Computer Science and Programming course is now using Python and the whole course is available online on their OpenCourseware system. In particular, lecture 18 covers Matplotlib.
And there is a series of basic Python tutorials on YouTube.
These are a few extra video lectures you may find useful:
General Python lectures
All of the projects linked above have mailing lists that are very welcoming; I have personally learned much from the discussions on these lists. You will find that very knowledgeable people are surprisingly generous with their time, if you ask questions carefully and provide sufficient information to clearly delineate your problem. Simply click on each project’s main page and you will typically find an up-to-date link to its mailing lists.
The Planet SciPy blog aggregator is a useful way to keep in touch with what many projects are doing.
Another excellent way to get in touch with what the developers of all these tools are doing is to attend the annual SciPy conference, which combines teaching tutorials, formal presentations and development sprints.
If you are at UC Berkeley (or elsewhere in the Bay Area and able to get to campus), I encourage you to stop by any of the regular Py4Science meetings on campus. This informal group meets to discuss tools, problems and solutions regarding the use of Python in scientific research; we have a very low-traffic mailing list for meeting announcements that anyone can subscribe to.
If you think of Python as a ‘Matlab/IDL replacement’, you probably want at the very least (before you download any of these individually, continue reading below):
These are probably the raw basics, and a community maintained page at the SciPy site lists a vast array of other tools you may find useful in your specific problem domain, all of them free.
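To give a flavor of the ‘Matlab replacement’ style of work, here is a small sketch that assumes NumPy is installed. It builds arrays, applies elementwise math to them, and solves a linear system with `np.linalg.solve`, which plays the role of Matlab’s `A\b`. The matrix and vectors are arbitrary illustrative values.

```python
import numpy as np

# Elementwise operations act on whole arrays at once, much as in Matlab or IDL.
t = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points in [0, 1]
signal = np.sin(2 * np.pi * t) + 0.5 * t

# A small linear system A x = b; np.linalg.solve is the analogue of Matlab's A\b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Unlike Matlab, '*' is elementwise on NumPy arrays; matrix products use '@'.
residual = A @ x - b
```

Here `x` comes out as `[2., 3.]` (3·2 + 3 = 9 and 2 + 2·3 = 8), and `residual` is zero up to rounding. The elementwise meaning of `*` is the difference that most often surprises people coming from Matlab.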
In terms of actually downloading and installing tools, there are a few alternatives, partly depending on your operating system of choice:
As an alternative approach, the Sage project also ships most of these tools, and then adds others (like GMP and Pari) to provide a new numerical foundation, as well as its own original libraries for many tasks. It also extends the Python language syntax and replaces Python’s core numerical types with a system based on more structured mathematical abstractions (all integer arithmetic is performed over the rationals, floating-point numbers can always be arbitrary-precision ones, etc.). Sage provides a web-based interactive notebook environment (as well as a customized IPython command-line one), but by default it does not build the graphical user interface components for Matplotlib and Mayavi. Note that since Sage has its own numerical type system and matrix classes, most ordinary NumPy/SciPy examples will not, by default, behave in exactly the same way in Sage. Depending on your needs, you can either use the Sage notebook in ‘pure Python mode’, where it will not load Sage’s native types, or use ‘Sage mode’, where its objects provide mathematical computing capabilities not available in Python or NumPy.
Whether you choose to use the integrated Sage environment or the individual libraries is up to you; I personally do most of my development on top of ‘bare’ Python, using only the libraries I need for each problem, but I always keep an updated Sage installation available and use it as needed. Sage is available in source and binary form for many different Unix-like operating systems, and can be used on Windows via a VMware Linux image.
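You don’t need Sage itself to see the kind of difference described above. The sketch below uses only Python’s standard `fractions` and `decimal` modules, not Sage’s own types, to contrast ordinary floating point with exact rationals and with a user-chosen precision. That is roughly the distinction Sage builds into its default number types.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Plain Python 3: dividing two ints gives a binary float (Sage would give the rational 1/2).
half = 1 / 2

# Literals like 0.1 are binary floats, so rounding error creeps in.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Exact rational arithmetic, closer in spirit to Sage's default behavior.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))  # True

# Decimal arithmetic with 50 significant digits, set only inside this block.
with localcontext() as ctx:
    ctx.prec = 50
    one_third = Decimal(1) / Decimal(3)
print(one_third)
```

Code that mixes these types with NumPy arrays needs care for the same reason that NumPy examples can behave differently under Sage: the array machinery expects ordinary floats and ints.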
Thanks to Chris Burns from UC Berkeley for a useful set of links and resources, to Stefan van der Walt from U. Stellenbosch for notes on Sage and numerics, and to Gokhan Sever for a number of useful links.
One point that may be of importance to you in making this decision, depending on your context, is licensing. Most of the tools I link to here are licensed in a BSD or similar manner, except for Sage, which is GPL licensed. Since Sage builds on a large foundation of other code that includes a mix of BSD and GPL tools, the combined Sage entity is necessarily also a GPL’d project.