
In this article, I’d like to preview our recent ICML paper.

Shota Saito and Mark Herbster. Multi-class Graph Clustering via Approximated Effective p-Resistance. In Proc. ICML, pp. 29697–29733, 2023.

To Exploit p More for Multi-Class Clustering

We work towards multi-class clustering that better exploits “p.” This is the motto.

For clustering problems, we often consider a cut over a graph, defined as follows. For a graph $G=(V,E)$ with edge weights $a_{ij}$, the cut $S_{G,2}$ is defined as

$$
S_{G,2}(\mathbf{x}) := \sum_{i,j \in V} a_{ij} |x_i - x_j|^2.
$$

Sometimes, aiming for a generalization of this cut, we consider the p-energy

$$
S_{G,p}(\mathbf{x}) := \sum_{i,j \in V} a_{ij} |x_i - x_j|^p.
$$

Note that these energies can be written via a seminorm, which we call the graph p-seminorm $\|\mathbf{x}\|_{G,p}$, defined by

$$
\|\mathbf{x}\|_{G,p}^{p} := S_{G,p}(\mathbf{x}).
$$
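As a concrete illustration, here is a minimal NumPy sketch of these quantities (the function names and the exact summation convention are only illustrative; constant factors may differ from the paper):

```python
import numpy as np

def p_energy(A, x, p=2.0):
    """Graph p-energy S_{G,p}(x) = sum_{i,j} a_ij |x_i - x_j|^p.

    A is a symmetric weighted adjacency matrix, x a vector of vertex values.
    With p = 2 this reduces to the usual quadratic cut energy S_{G,2}.
    """
    diff = np.abs(x[:, None] - x[None, :])
    return float(np.sum(A * diff ** p))

def p_seminorm(A, x, p=2.0):
    """Graph p-seminorm ||x||_{G,p} = S_{G,p}(x)^(1/p)."""
    return p_energy(A, x, p) ** (1.0 / p)
```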

The most standard way to incorporate this p-energy is spectral clustering with the graph p-Laplacian. However, the graph p-Laplacian has a major drawback for multi-class clustering: obtaining higher-order eigenpairs of the graph p-Laplacian is practically difficult. The existing papers on the p-Laplacian therefore resort to a workaround for multi-class clustering.

This is difficult. In a difficult situation, there are two ways to go: push forward or take an alternative path. Pushing forward is always the hero’s move. However, the difficulty of the p-Laplacian is a road paved to hell; the graph p-Laplacian has a much broader research community, spanning math and physics, and a far longer history than ML, and yet the problem remains hard. Thus the hero’s move leads straight into a solid unsolved problem, and the rational decision is to take an alternative path.

We want to exploit p more for multi-class clustering than the existing workarounds on the p-Laplacian do. For this purpose, we consider an alternative: the p-resistance, defined as

$$
r_{G,p}(i,j) := 1 \Big/ \Big( \min_{\mathbf{x}} \big\{ S_{G,p}(\mathbf{x}) \ \text{s.t.}\ x_i - x_j = 1 \big\} \Big). \tag{1}
$$
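As a rough illustration of definition (1), the sketch below solves the minimization numerically for a single pair with SciPy. This is a naive generic solver, not the method used in the paper; since the energy depends only on potential differences, we may fix $x_i = 1$ and $x_j = 0$.

```python
import numpy as np
from scipy.optimize import minimize

def p_resistance(A, i, j, p=2.0):
    """Effective p-resistance r_{G,p}(i, j) via direct minimization of (1)."""
    n = A.shape[0]
    free = [v for v in range(n) if v not in (i, j)]  # coordinates left to optimize

    def energy(z):
        x = np.zeros(n)
        x[i], x[j] = 1.0, 0.0      # enforce x_i - x_j = 1 (up to an additive constant)
        x[free] = z
        diff = np.abs(x[:, None] - x[None, :])
        return np.sum(A * diff ** p)

    # L-BFGS-B is adequate for p >= 2; for p close to 1 the objective is
    # non-smooth and a more careful solver would be needed.
    res = minimize(energy, x0=np.full(len(free), 0.5), method="L-BFGS-B")
    return 1.0 / res.fun
```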

The benefit of the p-resistance is that its $1/(p-1)$-th power satisfies a triangle inequality:

$$
r_{G,p}^{1/(p-1)}(i,j) \le r_{G,p}^{1/(p-1)}(i,k) + r_{G,p}^{1/(p-1)}(k,j).
$$

Seeing this distance-like behavior, a natural idea for multi-class graph clustering arises: use the $1/(p-1)$-th power of the p-resistance as a distance and apply distance-based clustering methods, such as k-center and k-medoids.
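For instance, the classic greedy (farthest-first) k-center routine over a precomputed distance matrix looks like the sketch below; in our setting the matrix would hold the $1/(p-1)$-th powers of the p-resistances. This is a textbook routine, not the exact implementation from the paper.

```python
import numpy as np

def greedy_k_center(D, k, seed=0):
    """Greedy 2-approximation for k-center on a symmetric distance matrix D.

    Returns the chosen center indices and, for every vertex, the index
    (into the center list) of its nearest center.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    centers = [int(rng.integers(n))]                    # start from a random vertex
    for _ in range(k - 1):
        dist_to_centers = D[:, centers].min(axis=1)
        centers.append(int(dist_to_centers.argmax()))   # farthest-first selection
    assignment = D[:, centers].argmin(axis=1)
    return centers, assignment
```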

However, a problem remains: it is computationally expensive to compute the p-resistance for all pairs, since for each pair we need to solve the optimization problem in (1) from scratch. To address this problem, we propose an approximation of the p-resistance:

$$
r_{G,p}(i,j) \approx \big\| L^{+} e_i - L^{+} e_j \big\|_{G,q}^{p}, \tag{2}
$$

where $q$ satisfies $1/p + 1/q = 1$, $L^{+}$ denotes the pseudoinverse of the graph Laplacian, and $e_i$ is the $i$-th standard basis vector. We show a bound on the quality of this approximation. Moreover, if the graph is a tree, the approximation becomes exact, which indicates that this representation is promising.

Equation (2) allows us to compute the p-resistance faster for many pairs: we compute $L^{+}$ once, obtain each approximate p-resistance via Equation (2), and recycle $L^{+}$ across all pairs.
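A minimal sketch of this recipe, assuming $A$ is a symmetric weighted adjacency matrix and $p > 1$; the helper name `approx_p_resistances` is illustrative, and the paper's actual pipeline may differ in details and constant factors.

```python
import numpy as np

def approx_p_resistances(A, pairs, p=2.0):
    """Approximate p-resistances via Equation (2), reusing L^+ for all pairs."""
    q = p / (p - 1.0)                          # Holder conjugate: 1/p + 1/q = 1
    L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
    L_pinv = np.linalg.pinv(L)                 # computed once, recycled below

    def q_seminorm(x):
        diff = np.abs(x[:, None] - x[None, :])
        return np.sum(A * diff ** q) ** (1.0 / q)

    out = {}
    for i, j in pairs:
        z = L_pinv[:, i] - L_pinv[:, j]        # L^+ e_i - L^+ e_j
        out[(i, j)] = q_seminorm(z) ** p       # ||L^+ e_i - L^+ e_j||_{G,q}^p
    return out
```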

Now we have a way to compute the p-resistance quickly. However, the triangle inequality by itself does not guarantee a good clustering result.

By varying p, the p-resistance captures different topologies of the graph. In the following example, we use the k-center algorithm on the distance obtained from the p-resistance.

Since the p-resistance corresponds to the shortest path as $p \to \infty$, the clustering result focuses more on path-based topology, that is, it prefers smaller shortest-path distances between vertices in a cluster. On the contrary, since the p-resistance corresponds to the st-cut as $p \to 1$, the clustering result is biased towards clusters with high internal connectivity, like a min-cut. Using this characteristic, we expect that varying p tunes the clustering result to somewhere suitable between cut-based and path-based.
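Putting the sketches above together, here is a toy usage example that varies $p$; the graph and parameter values are only illustrative, not an experiment from the paper.

```python
import numpy as np

# Toy weighted graph: two small cliques joined by a path of bridge vertices.
n = 12
A = np.zeros((n, n))
A[:4, :4] = 1.0            # clique on vertices 0-3
A[8:, 8:] = 1.0            # clique on vertices 8-11
for u in range(3, 8):      # bridge path 3-4-5-6-7-8
    A[u, u + 1] = A[u + 1, u] = 1.0
np.fill_diagonal(A, 0.0)

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
for p in (1.1, 2.0, 8.0):
    r = approx_p_resistances(A, pairs, p=p)              # sketch from above
    D = np.zeros((n, n))
    for (i, j), val in r.items():
        D[i, j] = D[j, i] = val ** (1.0 / (p - 1.0))     # distance with triangle inequality
    centers, labels = greedy_k_center(D, k=2)            # sketch from above
    print(f"p = {p}: cluster labels = {labels}")
```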

We also provide some theoretical justification for exploiting the p-resistance for clustering.

Collection of “Clips”

That was a quick wrap-up of what is going on in our ICML paper, but the above is only a preview. Here, I have a confession to make.

Yes, I’ve never really gotten ideas from this kind of “paper blog.” So, instead, I tried to make each article like a “clip” of a stream on Twitch or YouTube: kept to a reasonable length and highlighting a point that can stand on its own as interesting.

I have written four of these “clip” articles.

1. Limitation of Graph p-Laplacian

The first article discusses why obtaining higher-order eigenpairs of the graph p-Laplacian is difficult.

2. Effective Resistance and Graph

The second article covers the introductory material: how the effective resistance is formulated.

3. p-Resistance and its Approximation

The third article briefly explains the source of the idea behind our recent paper’s approximation of the p-resistance, how we obtain the guarantee, and how p works in multi-class clustering.

4. Limitation of Effective Resistance and How p Overcomes its Limitation

While people in graph ML have heard of the effective resistance, I assume that many, especially younger graph researchers, do not know it well. This might be because the effective resistance does not dance at center stage; instead, spectral clustering and then GNNs take the spot. The fourth article explains the limitation of the effective resistance, which I speculate is why it has not been at the center, and how p overcomes this limitation.


Citation:

If you find this piece useful, please cite our paper.

@inproceedings{saito2023multi,
 title={Multi-class Graph Clustering via Approximated Effective $p$-Resistance},
 author={Saito, Shota and Herbster, Mark},
 booktitle={Proc. ICML},
 pages={29697--29733},
 year={2023}
}