Exploring network micro- and meso-structure

Problem: You’re interested in exploring how small-scale features of a network affect the way processes behave.

Solution: So are we! We can use network generators to create ensembles of networks to draw from whose elements have the structures we want to explore.

Many phenomena in network science are largely indifferent to the fine structure of the network over which they occur. For example, the degree distribution of nodes is enough to characterise the presence and location of an epidemic threshold. However, there are some assumptions made about the networks to which these general results apply. Typically they need to be selected randomly from an ensemble of possible networks with given characteristics. This sounds innocuous, but it hides a lot of fascinating behaviour.

Suppose we take as out network ensemble to set of all networks with a Poisson degree distribution and given parameters for size and mean degree (\(N\) and \(\langle k \rangle\)). These are the ER networks, and can be generated by the ERNetwork network generator which will return an element selected at random from the ensemble.

However, some elements of the ensemble are special. We might get a network that happens to be bipartite, for example: the nodes are divided into two sub-sets, and all edges link a node in one to a node in the other. Or we might get a clustered network, where the edges form lots of triangles. These are both still an ER network, but they’ve got extra structure beyond what we expect. We refer to these extra properties as microstructure if they affect a very small number of nodes, or mesostructure if they’re larger.

The ensemble of possible ER networks is enormous, and if we select an element at random – a particular ER network – we’re unlikely to get one that’s clustered or bipartite. So a “typical element” of the ensemble won’t exhibit these micro- or meso-structures.

Nonetheless these networks can be fascinating in their own right. Lots of processes’ behaviours are affected to some degree by the micro- and meso-structure. We know, for example, that clustered networks can have different percolation (and other) properties [MSMD21]. Such structure can also be more frequent in real-world networks than in random networks. A social network will typically be quite highly clustered, for example: two people are significantly more likely to be friends if they have a friend in common.

It’s obviously far better to create an ensemble of just the kind of networks you want rather than drawing from the more general ensemble and discarding networks that don’t fit: the latter could be excruciatingly slow for rare networks.

We can use a NetworkGenerator to define ensembles of networks with the structures of interest. That way we’re still creating a random network – but one with the structures we want. These generators often require significant care to create.

epydemic has two examples of generators with mesostructure. Both were defined by Hebert-Dufresne and Allard [HebertDA19] in their study of “smeared” phase transitions. They define two variants on the ER network:

  1. a “core-periphery” network (CorePeripheryNetwork) where two ER networks, one (the core) more dense than the other (the periphery), are attached together with a sparse set of edges; and

  2. a “modular” ER network (ModularNetwork) consisting of a dense “core” and some less dense “satellites”, where each satellite has exactly one edge connecting it to the core.

These are described as “toy” networks, but they’re actually anything but: they’re exemplars of sub-ensembles of the ER networks that have fine structure (and in the case of the modular networks, extremely tenuous connectivity). They can thus be used to study the extremal behaviour of processes.

As an example let’s look at the core-periphery construction, a slightly simplified version of the CorePeripheryNetwork class. For this class of network we want two ER networks with different sizes and edge densities, which we then combine into a single network and link together. The class defines four parameters for the sizes and densities of the core and periphery. It then overrides the _generate method to construct the network:

def _generate(self, params: Dict[str, Any]) -> Graph:
    # generate the core network
    N_core = params[self.N_core]
    phi_core = params[self.PHI_per]
    g_core = fast_gnp_random_graph(N_core, phi_core)

    # generate the periphery network
    N_per = params[self.N_per]
    phi_per = params[self.PHI_per]
    g_per = fast_gnp_random_graph(N_per, phi_per)

    # relabel the nodes into a single sequence
    g_core = convert_node_labels_to_integers(g_core, first_label=0)
    g_per = convert_node_labels_to_integers(g_per, first_label=N_core)

    # form both networks into a a single network, currently disconnected
    g = compose(g_core, g_per)

    # join the periphery to the core using the same density
    # as within the periphery itself
    for n in g_core.nodes():
       for m in g_per.nodes():
          if rng.random() <= phi_per:
             g.add_edge(n, m)             # edges added to composed network

    # restrict to the LCC
    lcc = g.subgraph(max(connected_components(g), key=len)).copy()
    lcc = convert_node_labels_to_integers(lcc, first_label=0)

    return lcc

The core and periphery ER networks are first generated independently using networkx. (We could have used ERNetwork instead, of course.) This will result in two networks with overlapping labels, so we re-label them into a common sequence, then use networkx’s compose function to join them together. This yields a network with two disconnected sub-networks: we then look at each possible edge between the two and add each edge with the same probability as we used to construct the peripheral network. Finally we remove any disconnected sub-networks and tidy-up the labels.

The important point here is that we have a very clear idea of the description of the ensemble we’re interested in, which we can then transcribe directly into code as a network generator: a network generator simply samples the ensemble of networks it defines. We can easily define unit tests to make sure that the networks we generate have the topological properties we want. In this case that might include checking that the numbers of nodes and edges are as expected.