Improving execution times
Problem: You’re trying to work with epydemic and it’s sooo sloooooow.
Solution: Welcome to scientific computing. Speed is always a problem.
epydemic is a Python library. The advantages of using Python are ease of programming, ease of integration, and the availability of lots of libraries (including epydemic) that can massively reduce the amount of programming effort required to address a problem. Using it is “standing on the shoulders of giants” in the code space, just as we do in science.
The disadvantage of Python is that it’s an interpreted language, meaning that it’s slow relative to languages such as C, C++, or Fortran that might otherwise be used for scientific computing. Some measurements suggest that Python is about 100x slower than C. Essentially you trade off speed of development and integration against speed of experimental execution.
That said, there are definitely ways of speeding up your use of epydemic.
Don’t be too ambitious
epydemic will happily deal with processes over networks of the order of \(N = 10^5\) nodes, and can stretch in many cases to \(10^6\) nodes. However, bear in mind that a lot of network algorithms have poor time complexity: they often scale linearly with the number of edges, which is \(O(N^2)\) in the number of nodes, and even the more well-behaved ones seldom drop below \(O(N)\). This means that the move to \(10^7\) nodes may be painful, especially if you also want to do lots of repetitions.
Having said that, the most important internal data structures within epydemic (the loci of nodes and edges at which events happen) have \(O(\log N)\) operations, which helps keep things linear. See Major implementation challenges for a longer discussion.
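To get a feel for why edge-bound algorithms hurt, a quick back-of-the-envelope calculation helps: in the worst (fully connected) case a network of \(N\) nodes has \(N(N-1)/2\) edges, so a tenfold increase in nodes means roughly a hundredfold increase in edges. The helper below is purely illustrative and not part of epydemic.

```python
def complete_edges(n):
    '''Number of edges in a complete (fully connected) network of n nodes.

    This is n(n - 1)/2, which grows as O(n^2): each extra order of
    magnitude in nodes costs roughly two orders of magnitude in edges.'''
    return n * (n - 1) // 2

# edge counts for increasingly large complete networks
for n in [10**3, 10**4, 10**5]:
    print(n, complete_edges(n))
```

Real networks are usually much sparser than this, but any algorithm whose cost tracks the number of edges will still feel the squeeze as \(N\) grows.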
Use PyPy
The PyPy project describes itself as “a fast, compliant alternative implementation of Python”. It uses a number of alternative approaches to compiling Python code that claim to make a 4.4x improvement in speed. Our experience with running epydemic on PyPy suggests around a 5x speedup over “normal” Python, which is well worth having.
Note however that PyPy is a work in progress and doesn’t support all Python libraries – and annoyingly this includes matplotlib, which means you can’t have a single program that both simulates an epidemic and generates the graphs of the results. You may be able to split the work into two programs linked through a common lab notebook, but this makes coding (and more especially code maintenance) more delicate. It also doesn’t play well with Jupyter.
The easiest way of using PyPy is to build a virtual environment and then run your epydemic code in that.
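A minimal sketch of that setup, assuming pypy3 is already installed and on your PATH (the environment name venv-pypy is arbitrary):

```shell
# create a virtual environment whose interpreter is PyPy
pypy3 -m venv venv-pypy

# activate it and install epydemic into it
source venv-pypy/bin/activate
pip install epydemic

# sanity check: this should report 'pypy', not 'cpython'
python -c "import sys; print(sys.implementation.name)"
```

Any simulation script you run with this environment active will then execute under PyPy, with no changes to the code itself.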
Use a multicore workstation
Most modern workstations (and indeed many laptops) are multicore. Since epydemic uses epyc for experimental management, you can use a ParallelLab (see ParallelLab: Running experiments locally in parallel) that will make use of the cores you have available. If you have a 16-core machine you might get a (somewhat less than) 16x performance improvement this way.
The use of multicore doesn’t speed up individual simulations: they’re all single-threaded. But it does mean that multiple simulations can run simultaneously.
Use a compute cluster
The use of epyc means that epydemic can run the same experiment in parallel on a compute cluster, using ClusterLab: Flexible, parallel, asynchronous experiments. Again, this doesn’t speed up each individual experiment but does allow multiple experiments to run simultaneously – and since we’re often doing lots of repetitions of experiments, this is a useful way of getting speedup overall.
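As a rough setup sketch: epyc’s ClusterLab sits on top of ipyparallel, so a cluster of engines needs to be running before experiments can be submitted to it. The profile name mycluster and the engine count below are assumptions – use whatever matches your own cluster configuration.

```shell
# install the cluster machinery (sketch; versions and profiles will vary)
pip install ipyparallel

# start a set of worker engines under a named profile, in the background
ipcluster start -n 16 --profile=mycluster --daemonize
```

With the engines up, a ClusterLab pointed at the same profile can farm repetitions out to them asynchronously, collecting results into a notebook as they complete.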