image.png

Community Detection using Louvain

Louvain is an algorithm that can quickly find non-overlapping communities within large networks. It is a hierarchical clustering algorithm, that recursively merges communities into a single node and executes the modularity clustering on the condensed graphs.

Our graph will model a social network in which people have friends and their friendship can be of different strength. We will first apply the algorithm to an unweighted undirected graph, then add friendship strength as weights to the edges and look at the difference in the results.

First, we start by importing the relationalai library and define our model, which we call MyLouvainGraph. We also create two types called Person and Friendship.

In [ ]:
%pip install relationalai
In [2]:
import relationalai as rai
from relationalai.std import alias
from relationalai.std.graphs import Graph

model = rai.Model("MyLouvainGraph")

# Graph will hold these types:

# Represents nodes
Person = model.Type("Person")

# Represents connections between nodes
Friendship = model.Type("Friendship")

Let's add some data to our model

We first create a dictionary of people names, who are friends with each other, and a number representing the strength of their friendship. Then we iterate over the dictionary to create objects of type Person with a name property. We also create Friendship objects with strength property, representing connection between two people. We will use it to create weighted edges between nodes in our graph.

In [3]:
data = {
    ("Mary", "Bob"): 1.0,
    ("Mary", "Jessy"): 0.5,
    ("Mary", "Erica"): 0.1,
    ("Mary", "Tom"): 0.1,
    ("Bob", "Frank"): 0.1,
    ("Jessy", "Erica"): 1.0,
    ("Erica", "Tom"): 1.0,
    ("Tom", "Jessy"): 1.0,
    ("Frank", "Anna"): 1.0,
    ("Frank", "Mark"): 1.0,
    ("Frank", "Peter"): 0.5,
    ("Anna", "Peter"): 0.1,
    ("Anna", "Mark"): 1.0,
    ("Peter", "Mark"): 0.1,
}

with model.rule(dynamic = True):
    for ((name, friend_name), strength) in data.items():
        person = Person.add(name = name)
        friend = Person.add(name = friend_name)
        Friendship.add(person = person, friend = friend).set(strength = strength)

Creating the graph

Let's start by creating a graph with Node and Edge types. We add all Person instances as nodes, and assign the name property so that we can use them in our queries and for visualization purposes. The Friendship instances are then used to form the edges in our graph.

In [4]:
# Create graph
graph = Graph(model, undirected = True)
Node, Edge = graph.Node, graph.Edge

# add all Person instances as Nodes and assign `name` property (for displaying)
Node.extend(Person, name = Person.name)

# add all Friendship instances as Edges from `person` to `friend` Nodes
with model.rule():
    f = Friendship()
    Edge.add(f.person, f.friend)

Running the algorithm

Let's add a rule that calculates for every node of the graph the community that it belongs_to. To derive this value for each node using the Louvain algorithm, we can simply use Graph(model).compute.louvain().

In [5]:
# run the Louvain algorithm on the Graph,
# for every Person node assign the community it belongs to
with model.rule():
    p = Person()
    community = graph.compute.louvain(p)
    p.set(belongs_to = community)
    Node(p).set(community_id = community)

Querying the Graph

Graph Nodes, Edges and their properties are queried using model.query() context and graph.Node() or graph.Edge() types.

Alternatively, the entire graph representation can be fetched using Graph(model).fetch() (not recommended to be used for large graphs). This returns a dictionary with two keys, nodes and edges, that represents the entire graph.

In [6]:
# query all Edges with `name` and `community_id` properties of the Nodes they connect
with model.query() as select:
    e = Edge()
    n1 = Node(e.from_)
    n2 = Node(e.to)
    response = select(alias(n1.name,"from"),alias(n1.community_id,"community (from)"), alias(n2.name,"to"),alias(n2.community_id,"community (to)"))

response
Out[6]:
from community (from) to community (to)
Anna 1 Mark 1
Anna 1 Peter 1
Bob 2 Frank 1
Erica 2 Tom 2
Frank 1 Anna 1
Frank 1 Mark 1
Frank 1 Peter 1
Jessy 2 Erica 2
Mary 2 Bob 2
Mary 2 Erica 2
Mary 2 Jessy 2
Mary 2 Tom 2
Peter 1 Mark 1
Tom 2 Jessy 2

How many communities were found? And to which community do people belong to?

In [7]:
# For each Person, fetch the name and the community it belongs to
with model.query() as select:
    p = Person()
    response = select(p.name, p.belongs_to)

response
Out[7]:
name belongs_to
Anna 1
Bob 2
Erica 2
Frank 1
Jessy 2
Mark 1
Mary 2
Peter 1
Tom 2
In [8]:
# Group the response from the above query by the belongs_to property to separate the communities
communities = response.results.groupby("belongs_to").name.apply(list)

# Print the number of communities
count_communities = len(communities)
print(f"Number of communities: {count_communities}")

# Print communities and people that belong to them
for i, c in enumerate(communities):
    print(f"Community N{i+1}: {c}")
Number of communities: 2
Community N1: ['Anna', 'Frank', 'Mark', 'Peter']
Community N2: ['Bob', 'Erica', 'Jessy', 'Mary', 'Tom']

Let's find out who is in the same community as Frank

In [9]:
person_name = "Frank"

with model.query() as select:
    p1 = Person(name = person_name)
    p2 = Person()
    # We want to skip the person itself
    p1 != p2
    p1.belongs_to == p2.belongs_to
    response = select(p2.name)

response
Out[9]:
name
Anna
Mark
Peter

Visualizing the results

Let's visualize our graph, to better understand the results. We use Graph(model).visualize() to visualize our graph. We define a color for each of the communities.

In [10]:
graph.visualize(three = False, style = {
    "node": {
        "color": lambda n : ['blue', 'pink'][ list(communities.keys()).index(n['community_id']) % 2 ],
        "label": lambda n : n['name'],
        "hover": lambda n: n['community_id'],
        "size": 30
    },
    "edge": {
        "color": "green"
    }
}).display(inline = True)
Details for selected element
General
App state
Display mode
Export
Data selection
Graph
Node label text
Edge label text
Node size
Minimum
Maximum
Edge size
Minimum
Maximum
Nodes
Visibility
Size
Scaling factor
Position
Drag behavior
Hover behavior
Node images
Visibility
Size
Scaling factor
Node labels
Visibility
Size
Scaling factor
Rotation
Angle
Edges
Visibility
Size
Scaling factor
Form
Curvature
Hover behavior
Edge labels
Visibility
Size
Scaling factor
Rotation
Angle
Layout algorithm
Simulation
Algorithm
Parameters
Gravitational constant
Spring length
Spring constant
Avoid overlap
Central gravity

Adding weights to the graph edges

As a final step, let's see how adding weights to the graph edges affects the result of the algorithm. Let's create a new graph with the same Nodes and Edges, but now using Friendship.strength property as a weight of an edge.

In [11]:
# Create weighted graph
weighted_graph = Graph(model, undirected = True, weighted = True)

# add all same Person instances as Nodes and assign `name` property
weighted_graph.Node.extend(Person, name = Person.name)

# add all Friendship instances as Edges from `person` to `friend` Nodes with weight property
with model.rule():
    f = Friendship()
    weighted_graph.Edge.add(f.person, f.friend, weight = f.strength)

# run Louvain algorithm on the new Graph
with model.rule():
    n = weighted_graph.Node()
    n.set(community_id = weighted_graph.compute.louvain(n))

How many communities were found now?

In [12]:
# query weighted graph nodes names and communities they were assigned to
with model.query() as select:
    n = weighted_graph.Node()
    response = select(n.name, n.community_id)

# group names by community_id
communities = response.results.groupby("community_id").name.apply(list)

# print communities and names of people that belong to them
for i, c in enumerate(communities):
    print(f"Community N{i+1}: {c}")
Community N1: ['Anna', 'Frank', 'Mark', 'Peter']
Community N2: ['Erica', 'Jessy', 'Tom']
Community N3: ['Bob', 'Mary']

Visualizing the weighted graph

Let's make the size of the Edges proportional to the weight. Additionally, we'll define more colors since we now have more communities.

In [13]:
weighted_graph.visualize(three = False, style = {
    "node": {
        "color": lambda n : ['yellow', 'orange', 'blue', 'pink'][ list(communities.keys()).index(n['community_id']) % 4 ],
        "label": lambda n : n['name'],
        "hover": lambda n: n['community_id'],
        "size": 30
    },
    "edge": {
        "color": "green",
        "size": lambda e: e['weight'] * 3.5
    }
}).display(inline = True)
Details for selected element
General
App state
Display mode
Export
Data selection
Graph
Node label text
Edge label text
Node size
Minimum
Maximum
Edge size
Minimum
Maximum
Nodes
Visibility
Size
Scaling factor
Position
Drag behavior
Hover behavior
Node images
Visibility
Size
Scaling factor
Node labels
Visibility
Size
Scaling factor
Rotation
Angle
Edges
Visibility
Size
Scaling factor
Form
Curvature
Hover behavior
Edge labels
Visibility
Size
Scaling factor
Rotation
Angle
Layout algorithm
Simulation
Algorithm
Parameters
Gravitational constant
Spring length
Spring constant
Avoid overlap
Central gravity