# A.6 – Graph Theory: Measures and Indices

#### Authors: Dr. Cesar Ducruet and Dr. Jean-Paul Rodrigue

Graph theory relies on several measures and indices that assess the efficiency of transportation networks.

# 1. Measures at the Network Level

Transportation networks are composed of many nodes and links, and as they rise in complexity, their comparison becomes challenging. For instance, it may not be at first glance evident to assess which of two transportation networks is the most accessible or the most efficient. Several measures and indices can be used to analyze network efficiency, with many initially developed by Kansky in the 1960s:

• Expressing the relationship between values and the network structures they represent.
• Comparing different transportation networks at a specific point in time.
• Comparing the evolution of a transport network at different points in time.

Outside the description of a network size by the number of nodes and edges, and its total length and traffic, several measures are used to define the structural attributes of a graph; the diameter, the number of cycles, and the order of a node.

Diameter (d). The length of the shortest path between the most distanced nodes of a graph. It measures the extent of a graph and the topological length between two nodes.

The diameter enables us to measure the development of a network in time. A high diameter implies a less linked network. In the case of a complex graph, the diameter can be found with a topological distance matrix (Shimbel distance), which computes for each node pair its minimal topological distance. Graphs whose extent remains constant, but with higher connectivity, have lower diameter values. Planar networks often have a large diameter due to many intermediate stops between two distant nodes.

Number of Cycles (u). The maximum number of independent cycles in a graph. This number (u) is estimated through the number of nodes (v), links (e) and of sub-graphs (p). Trees and simple networks have a value of 0 since they have no cycles. The more complex a network is, the higher the number of cycles, so it can be used as an indicator of the level of development and complexity of a transport system. $\large u = e-v+p$

# 2. Indices at the Network Level

Indices are more complex methods to represent the structural properties of a graph since they involve the comparison of one measure over another. Some indices consider spatial features (distance, surface) and the level of activity (traffic), while others solely rest on the topological dimension of the network.

Cost. Represents the total length of the network measured in real transport distances where aij is the presence (1) or absence (0) of a link between i and j and lij the length of the link. This measure can also be calculated based on two other dimensions of the network; the Minimum Spanning Tree (MST) and the Greedy Triangulation (GT). The MST represents the shortest and/or lowest cost subtree of the network; it can be obtained by applying shortest path algorithms, the Kruskal algorithm, which allows finding the lowest cost route connecting all nodes in the network. The GT refers to the maximal connected planar graph keeping the same number of nodes than in the original network but adding all possible links without breaking its planarity. Such operations consider both the topology and the geography of the network, while comparing the latter with its optimal configurations. More efficient networks have relative costs near to 1, while less efficient networks are closer to 0. $\large Cost = \displaystyle\sum_{i,j} a_{ij} l_{ij} \newline$ $\large Cost_{rel} = \frac {Cost-Cost^{\text{MST}}}{Cost^{GT}-Cost^{\text{MST}}}$

Detour Index. A measure of the efficiency of a transport network in terms of how well it overcomes distance or the friction of distance. The closer the detour index gets to 1, the more the network is spatially efficient. Networks having a detour index of 1 are rarely, if ever, seen and most networks would fit on an asymptotic curve getting close to 1, but never reaching it. For instance, the straight distance, D(S), between two nodes may be 40 km but the transport distance, D(T); real distance, is 50 km. The detour index is thus 0.8 (40 / 50). The complexity of the topography is often a good indicator of the level of detour since rugged areas are associated with higher detour indexes. $\large DI= \frac {D(S)}{D(T)}$

In order to derive a measure of relative efficiency, the Detour Index Relative Efficiency is the ratio between the Detour Index calculated from the original network and the Detour Index calculated either from the MST (minimum spanning tree) or the GT (greedy triangulation). $\large E_{rel}= \frac {DI-DI^{\text{MST}}}{DI^{GT}-DI^{\text{MST}}}$

Network Density. Measures the territorial occupation of a transport network in terms of km of links (L) per square kilometers of surface (S). The higher it is, the more a network and an economy is developed. $\large ND= \frac {L} {S}$

Pi Index. The relationship between the total length of the graph L(G) and the distance along its diameter D(d). It is labeled as Pi because of its similarity with the real Pi value, which is expressing the ratio between the circumference and the diameter of a circle. A high index shows a developed network. It is a measure of distance per units of diameter and an indicator of the shape of a network.

Eta Index. Average length per link. Adding new nodes will cause a decrease of Eta as the average length per link declines. Complex networks tend to have a low eta value.

Theta Index. Measures the function of a node, which is the average amount of traffic per intersection. The higher theta is, the greater the load of the network. The measure can also be applied to the number of links (edges) where it represents the average load per link.

Beta Index. Measures the level of connectivity in a graph and is expressed by the relationship between the number of links (e) over the number of nodes (v). Trees and simple networks have Beta value of less than one. A connected network with one cycle has a value of 1. More complex networks have a value greater than 1. In a network with a fixed number of nodes, the higher the number of links, the higher the number of possible paths in the network. Complex networks have a high value of Beta. The rich-club coefficient is the Beta index applied to relations among larger order (degree) nodes; it verifies whether the connectivity is higher among larger degree nodes than for the whole network.

Alpha Index. A measure of connectivity which evaluates the number of cycles in a graph in comparison with the maximum number of cycles. The higher the alpha index, the more a network is connected. Trees and simple networks will have a value of 0. A value of 1 indicates a completely connected network. Measures the level of connectivity independently of the number of nodes. It is very rare that a network will have an alpha value of 1, because this would imply very serious redundancies. This index is also called Meshedness Coefficient in the literature on planar networks.

Gamma Index. A measure of connectivity that considers the relationship between the number of observed links and the number of possible links. The value of gamma is between 0 and 1 where a value of 1 indicates a completely connected network and would be extremely unlikely. Gamma is an efficient value to measure the progression of a network in time.

Being solely based on the number of nodes and links, Alpha, Beta, and Gamma indices remain limited in revealing structural differences between networks of equal size. More robust measures have thus been proposed by physics, which considers the internal complexity of the graph.

Hierarchy (h). The exponent of the slope for the power-law line drawn in a bi-log plot of node frequency over degree distribution. Networks characterized by strong hierarchical configurations, such as scale-free networks (few large degree nodes and many small degree nodes), often have values over 1 or 2. A value lower than 1 indicates the absence of scale-free properties and a limited hierarchy among nodes.

Transitivity (t). Also called clustering coefficient, it is the overall probability for the network to have adjacent nodes interconnected, thus revealing the existence of tightly connected communities (or clusters, subgroups, cliques). It is calculated by the ratio between the observed number of closed triplets and the maximum possible number of closed triplets in the graph. Another way calculating transitivity is to calculate the average clustering coefficient of all nodes. Complex networks and notably small-world networks often have a high transitivity and a low diameter. Because triplets are not the only way for looking at neighborhood density among nodes, this measure can be extended to cycles of length 4 and 5.

Average shortest path length (s). A measure of efficiency that is the average number of stops needed to reach two distant nodes in the graph. The lower the result, the more efficient the network in providing ease of circulation. In comparison, the diameter is the maximum length of all possible shortest paths.

Assortative coefficient (r). This coefficient is the Pearson correlation between the order (degree) of nodes at both ends of each link (edge) in the network. The result ranges from -1 (low degree nodes often connect high degree nodes) to 1 (nodes of equal or similar degree are often connected). Disassortative networks (r is significantly negative) are often those with strong hierarchical configurations with large nodes connecting smaller nodes, as in scale-free networks, while regular networks are often assortative.

# 3. Measures and Indices at the Node Level

Numerous measures exist for highlighting the situation of a node in a network. Some are made at the “local level” based on links with adjacent nodes, while others on the “global level” consider the node’s situation in the whole network.

Order (degree) of a Node (o). The number of its attached links and is a simple, but effective measure of nodal importance. The higher its value, the more a node is important in a graph as many links converge to it. Hub nodes have a high order, while terminal points have an order that can be as low as 1. A perfect hub would have its order equal to the summation of all the orders of the other nodes in the graph and a perfect spoke would have an order of 1. The percentage of nodes directly connected in the entire graph is thus a measure of reachability. An isolate is a node without connections (degree equals to 0). The difference between in-degree and out-degree in a directed graph (digraph) may underline interesting functions of some nodes as attractors or senders. The order may be calculated at different depths: adjacent nodes (depth 1), adjacent nodes of adjacent nodes (depth 2), etc. The weighted degree is simply the total of values associated with links.

Koenig number (or associated number, eccentricity). A measure of farness based on the number of links needed to reach the most distant node in the graph.

Shimbel Index (or Shimbel distance, nodal accessibility, nodality). A measure of accessibility representing the sum of the length of all shortest paths connecting all other nodes in the graph. The inverse measure is also called closeness centrality or distance centrality.

Betweenness centrality (or shortest-path betweenness). A measure of accessibility that is the number of times a node is crossed by shortest paths in the graph. Anomalous centrality is detected when a node has a high betweenness centrality and a low order (degree centrality), as in air transport.

Hub Dependence (hd). A measure of node vulnerability that is the share of the highest traffic link in total traffic (weighted degree). Weak nodes depending on few links will have a high hub dependence, especially if they locate in the neighborhood of a large node, while hubs will have a more even traffic distribution among their connections. It indicates to what extent removing the largest traffic link would affect the node’s overall activity. The measure can be extended to more links (2, 3 … 10 largest flow links).

Average nearest neighbors degree (knn). A measure of neighborhood indicating the type of environment in which the node situates. A node with low order (degree) may be surrounded by a variety of other nodes, small or large, which has a direct influence on its own centrality and growth potential. A network is assortative or disassortative depending on the similarity of the order (degree) among neighboring nodes, which can be tested by means of Pearson correlation (assortativity coefficient). Neighbor connectivity is the correlation between the order (degree) of nodes and the average order (degree) of their neighbors.

Cohesion index (ci). For a given (link) edge ij, this index measures the ratio between the number of common neighbors connected to nodes i and j and the total number of their neighbors. Links with highest values typically connect dense communities (or clusters) in the graph, and can be removed in order to bisect the graph and reveal such subgroups. Multiplying this index by the weight (e.g. traffic) of links allows coupling topology and flows. This cohesion index is also called Strength index and it corresponds to the observed number of cycles of length 3 and 4 to which the edge belongs divided by the maximum number of such cycles.

Within-module degree (Zi; or z-score). Shows how well connected a node is to other nodes in the same module (or cluster, community), where Ki is the order (degree) of node i in the cluster si, Ksi is the average order (degree) of all nodes in the cluster si, and δKsi is the standard deviation of K in si. Because two nodes having same z-score may play different roles within the cluster, this measure is often compared with the Participation Coefficient (Pi). Both measures are applied to nodes once the clusters in the network are known.

Participation coefficient (Pi). Compares the number of links (order, degree) of node i to nodes in all clusters with its number of links within its own cluster. Zi and Pi reveal whether nodes are truly hubs in the network, while others are bound to local links and therefore do not act as connectors between clusters.

Several critiques have been made towards such indexes as they do not always consider the real length, quality, and weight of the links; networks of equal size may exhibit contrasted topological forms. However, they remain useful for describing the changing structure of a given network.

## Bibliography

• Arlinghaus, S.L., W.C. Arlinghaus, and F. Harary (2001) Graph Theory and Geography: An Interactive View. New York: John Wiley & Sons.
• Jiang B. and C. Claramunt (2004) “Topological analysis of urban street networks”, Environment and Planning B, Vol. 31, pp. 151-162.
• Kansky, K. (1963) Structure of transportation networks: relationships between network geography and regional characteristics, University of Chicago, Department of Geography, Research Papers 84.
• Waters, N.M. (2006) Network and Nodal Indices: Measures of Complexity and Redundancy: A Review. In A. Reggiani & P. Nijkamp (eds) Spatial Dynamics, Network and Modelling, Cheltenham, UK & Northampton, MA, USA: Edward Elgar.
• Watts, D.J., Strogatz, S.H. (1998) “Collective dynamics of ‘small-world’ networks”, Nature 393 (6684): 440–442.