Implementing generating functions

The method of generating functions is now ubiquitous in the network science literature, having been introduced to a wide audience by Newman, Strogatz, and Watts in 2001 [NSW01]. Despite this, code to support the creation and use of generating functions is still hard to come by, and the functions themselves can be tricky to implement. In this section we’ll briefly discuss the challenges they pose, and the way in which epydemic addresses them in its generating functions library.

Generating functions in overview

First, a very swift overview. (Wilf [Wil94] provides considerably more detail.) A generating function is a formal power series of the form:

\[G_0(x) = \sum_{k = 0}^\infty f(k) \, x^k\]

where \(f(k)\) is a function that defines the coefficient of the series at \(k\) and \(x^k\) is a formal power. (\(x\) should not appear free in \(f\).) The generating functions we’re most interested in are probability generating functions, and specifically probability generating functions of degree distributions, which take the slightly more specialised form

\[G_0(x) = \sum_{k = 0}^\infty p_k \, x^k\]

where \(p_k\) is the probability that a node chosen at random will have degree \(k\) (or, equivalently, the fraction of nodes in the network with degree \(k\)), and where the \(p_k\) form a valid probability distribution, all non-negative and summing to 1.

The significance of the powers of \(x\) is that they associate each coefficient with a given value of \(k\). Evaluating the generating function at \(x = 0\) returns the coefficient associated with \(k = 0\) (since \(x^0 = 1\) while \(x^k = 0\) for every other value of \(k\)). Evaluating at \(x = 1\) sums all the coefficients, and so returns 1 for a valid probability distribution.
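To make this concrete, here’s a minimal sketch in plain Python (not epydemic’s API) of a generating function built from an explicit list of coefficients, showing both evaluations:

```python
# A generating function defined directly from a list of coefficients p_k.
def make_gf(pks):
    """Return G_0(x) = sum_k p_k x^k for the given coefficients."""
    def G0(x):
        return sum(pk * x**k for k, pk in enumerate(pks))
    return G0

# Example: half the nodes have degree 1 and half have degree 2.
G0 = make_gf([0.0, 0.5, 0.5])
print(G0(0))   # 0.0: the fraction of nodes with degree 0
print(G0(1))   # 1.0: all the probabilities sum to 1
```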

As a series, generating functions can also be differentiated:

\[\frac{d}{dx} G_0(x) = G'_0(x) = \sum_{k = 0}^\infty k p_k \, x^{k - 1}\]

Evaluating \(G_0'(1)\) then yields the mean degree of the network whose degree distribution is represented by the original generating function.

Evaluating \(G_0'(0)\) extracts the coefficient of the \(k = 1\) term, which will be \(1 \times p_1\). Differentiating twice and then evaluating at 0 gives \(2 \times p_2\) (that is, \(2! \, p_2\)), and so forth, meaning it’s possible to extract all the coefficients using differentiation and evaluation: the \(k\)th coefficient is given by:

\[p_k = \frac{1}{k!} \, \frac{d^k}{dx^k} G_0(x) \bigg|_{x = 0}\]
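As an illustrative check, here’s a sketch using sympy (not part of epydemic) that recovers the coefficients of a small, explicitly-defined generating function in exactly this way, and confirms that \(G_0'(1)\) gives the mean degree:

```python
import sympy

x = sympy.Symbol('x')
G0 = 0.5 * x + 0.3 * x**2 + 0.2 * x**3         # an explicit example PGF

def coefficient(G, k):
    """Return p_k = (1/k!) d^k G / dx^k evaluated at x = 0."""
    return sympy.diff(G, x, k).subs(x, 0) / sympy.factorial(k)

print([coefficient(G0, k) for k in range(4)])  # 0, 0.5, 0.3, 0.2
print(sympy.diff(G0, x).subs(x, 1))            # mean degree: 1.7
```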

Generating function definition

This sounds very artificial to a computer scientist: why use such a baroque encoding?

The obvious way to define a generating function is to specify all the coefficients manually – and if you have those, why bother with all the extra machinery?

However, for several degree distributions of interest the entire series can be captured in closed form by a “special” function. For example, consider the Poisson degree distribution of an ER network. Its generating function is:

\[G_0(x) = e^{\langle k \rangle (x - 1)}\]

and this can be manipulated using the generating function properties and operations even though the coefficients aren’t explicitly stated. Similarly, the powerlaw degree distribution has the generating function:

\[G_0(x) = \frac{\mathrm{Li}_\alpha(x)}{\zeta(\alpha)}\]

making use of the polylogarithm and Riemann zeta functions. Again, it’s not at all obvious what the coefficients of the terms are, but they can be extracted by differentiation.
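As a sketch of how these closed forms might be written directly in Python (epydemic ships its own built-in versions), using mpmath for the polylogarithm and zeta functions:

```python
from mpmath import exp, polylog, zeta

def gf_poisson(kmean):
    """G_0 for a Poisson degree distribution with mean degree kmean."""
    return lambda x: exp(kmean * (x - 1))

def gf_powerlaw(alpha):
    """G_0 for a powerlaw degree distribution with exponent alpha."""
    return lambda x: polylog(alpha, x) / zeta(alpha)

G0 = gf_powerlaw(2.5)
print(G0(1))    # approximately 1: the probabilities sum to 1
```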

Differentiating generating functions

There are three main operations we want to perform on generating functions:

  1. Evaluating them;

  2. Extracting the coefficients for particular terms; and

  3. Differentiating them to some (potentially high) degree.

The second operation relies on the third; the first should be straightforward.

How do we differentiate a generating function? For maximum generality we probably want to use numerical rather than symbolic methods. If we have a list of coefficients then differentiation is simply a per-element operation. But what about the series representation? Numerical differentiation is commonplace and provided by a number of libraries, notably scipy with its scipy.misc.derivative function. It turns out, however, that there’s a problem: this form of numerical differentiation works fine for a few orders of derivative (say up to 10) but then suffers from numerical instability. Since we might want to differentiate hundreds (or even thousands) of times when analysing large networks, this is a serious limitation.
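For illustration, a naive attempt using scipy.misc.derivative (available in older scipy releases) might look like this sketch; it behaves well at low orders, but pushing the order much higher quickly runs into the instability just described:

```python
import math
from scipy.misc import derivative   # deprecated in recent scipy releases

# The Poisson generating function with mean degree 5, as above.
G0 = lambda x: math.exp(5 * (x - 1))

# First derivative at x = 1: recovers the mean degree, approximately 5.
print(derivative(G0, 1.0, dx=1e-3, n=1))

# A tenth derivative needs a larger stencil ('order' sample points);
# pushing n much higher than this rapidly becomes numerically unstable.
print(derivative(G0, 0.0, dx=1e-2, n=10, order=11))
```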

The solution, suggested in a throwaway comment by Newman [New02], is to change the differentiation algorithm to one developed by Cauchy, involving a contour integral in the complex plane. That is, we compute:

\[f^{(n)}(z) = \frac{n!}{2 \pi i} \oint_C \frac{f(w)}{(w - z)^{n + 1}} \, dw\]

Getting from here to a robust implementation takes some care, but the core of the Cauchy procedure can be written in Python quite compactly (see Dobson [Dob21] for the details).
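Here’s a minimal sketch of the idea (not the implementation epydemic actually uses): approximate the contour integral by sampling the integrand at \(m\) equally-spaced points on a circle of radius \(r\) around \(z\).

```python
import cmath
from mpmath import factorial     # mpmath's factorial copes with very large n

def cauchy_derivative(f, z, n, r=0.5, m=2048):
    """Approximate the n-th derivative of f at z by a discretised
    contour integral over a circle of radius r around z."""
    total = 0 + 0j
    for j in range(m):
        theta = 2 * cmath.pi * j / m
        # With w = z + r e^{i theta}, the integrand f(w)/(w - z)^{n+1} dw
        # simplifies to f(w) e^{-i n theta} / r^n (up to constants).
        w = z + r * cmath.exp(1j * theta)
        total += f(w) * cmath.exp(-1j * n * theta)
    return factorial(n) * total / (m * r**n)

# Sanity check: the mean degree of an ER network with <k> = 5.
G0 = lambda x: cmath.exp(5 * (x - 1))
print(cauchy_derivative(G0, 1.0, 1))    # approximately 5, plus rounding noise
```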

Providing a custom series

epydemic has several common degree distribution generating functions built-in, but there may sometimes be a need to define new ones. There are some caveats if you need to do this.

Complex numbers. The series are manipulated in the complex plane. This implies that the function that defines the series must accept complex arguments. Not all of Python’s mathematical functions do this, but there are generally analogues available, for example using cmath instead of math. numpy is usually also safe, as are the special functions in scipy.
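A small example of the difference:

```python
import math
import cmath

z = 0.5 + 0.1j                   # a point on the integration contour
# math.exp(5 * (z - 1))          # TypeError: math functions reject complex arguments
print(cmath.exp(5 * (z - 1)))    # fine: cmath accepts complex arguments
```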

Large values. A related problem comes from the numerical precision needed for high-degree differentiation, where a lot of factorials are computed that rapidly overflow standard floating-point precision. The solution this time is to use the factorial function from the mpmath package: not a standard package, but a dependency installed along with epydemic.
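A quick illustration, assuming mpmath is available:

```python
import math
from mpmath import factorial

n = 1000
exact = math.factorial(n)   # an exact (2568-digit) Python integer
# float(exact)              # OverflowError: int too large to convert to float
print(factorial(n))         # mpmath keeps a usable floating-point value, ~4.02e+2567
```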