Tutorial

Dimension Reduction - IsoMap

Updated on October 24, 2024
author

Ashwini Kumar Pal

Dimension Reduction - IsoMap
This tutorial is from a 7 part series on Dimension Reduction, and is based on outdated Python code from the original release in Oct. 2018. The theory within is still valid
  1. Understanding Dimension Reduction with Principal Component Analysis (PCA)
  2. Diving Deeper into Dimension Reduction with Independent Components Analysis (ICA)
  3. Multi-Dimension Scaling (MDS)
  4. LLE
  5. t-SNE
  6. IsoMap
  7. Autoencoders

(A jupyter notebook with math and code(spark) is available on github repo )

Introduction

Isomap stands for isometric mapping. Isomap is a non-linear dimensionality reduction method based on the spectral theory which tries to preserve the geodesic distances in the lower dimension. Isomap starts by creating a neighborhood network. After that, it uses graph distance to the approximate geodesic distance between all pairs of points. And then, through eigenvalue decomposition of the geodesic distance matrix, it finds the low dimensional embedding of the dataset. In non-linear manifolds, the Euclidean metric for distance holds good if and only if neighborhood structure can be approximated as linear. If neighborhood contains holes, then Euclidean distances can be highly misleading. In contrast to this, if we measure the distance between two points by following the manifold, we will have a better approximation of how far or near two points are. Let’s understand this with an extremely simple 2-D example. Suppose our data lies on a circular manifold in a 2-D structure like in the image below.

Prerequisites

In order to follow along with this article, you will need experience with Python code, and a beginners understanding of Deep Learning. We will operate under the assumption that all readers have access to sufficiently powerful machines, so they can run the code provided.

If you do not have access to a GPU, we suggest accessing it through the cloud. There are many cloud providers that offer GPUs. DigitalOcean GPU Droplets are currently in Early Availability, learn more and sign up for interest in GPU Droplets here.

For instructions on getting started with Python code, we recommend trying this beginners guide to set up your system and preparing to run beginner tutorials.

Why geodesic distances are better than the Euclidean distances in nonlinear manifolds?

We will reduce the data to 1-D using the Euclidean distances and approximate geodesic distances. Now, if we look at the 1-D mapping based on the Euclidean metric, we see that for points which are far apart(a & b) have been mapped poorly. Only the points which can be approximated to lie on a linear manifold(c & d) give satisfactory results. On the other hand, see the mapping with geodesic distances, it nicely approximates the close points as neighbors and far away points as distant. The geodesic distances between two points in the image are approximated by graph distance between the two points. Thus, Euclidean distances should not be used for approximating the distance between two points in non-linear manifolds while geodesic distances can be used.

Isomap uses the above principle to create a similarity matrix for eigenvalue decomposition. Unlike other non-linear dimensionality reduction like LLE & LPP which only use local information, isomap uses the local information to create a global similarity matrix. The isomap algorithm uses euclidean metrics to prepare the neighborhood graph. Then, it approximates the geodesic distance between two points by measuring shortest path between these points using graph distance. Thus, it approximates both global as well as the local structure of the dataset in the low dimensional embedding.

Let’s have a basic understanding of few concepts which we need to implement the Isomap algorithm.

Pregel API

Pregel is a distributed programming model developed by Google for processing large-scale graphs. It is the inspiration behind the Apache giraph project and GraphX library of spark. Pregel is basically a message-passing interface based on the idea that a vertex’s state should depend on its neighbors. A pregel computation takes as input a graph and a set of vertex states. At every iteration which is called superstep, it processes messages received at a vertex and updates the vertex state. After that, it decides which of its neighbors should receive the message at next superstep and what should be the message. Thus, messages are passed along edges and computation happens only at the vertices. The graph is not passed across the network only messages. Computation stops at maximum iterations or when no messages are left to pass. Let’s understand it using a simple example. Suppose, We need to find the degree of each vertex for the graph given below. Image shown below represents a single iteration of pregel model.

At initialization, every vertex’s degree is 0. We can send an empty message as an initial message to start the computation. At the end of superstep 1, Each vertex sends message 1 through each of its edges. At next superstep, each vertex sums the messages received and update its degree.

Classical MDS

Isomap is closely related to the original multidimensional scaling algorithm proposed by the Torgerson and Gower. In fact, it is an extension of the classical multidimensional scaling. The Classical multidimensional algorithm gives a closed form solution to the dimensionality reduction problem. Classical MDS uses the euclidean distances as the similarity metric while isomap uses geodesic distances. Steps of classical MDS are

  1. Create a matrix of squared dissimilarities Δ2(X) from the given X.
  2. Obtain the matrix B by double centring the dissimilarity matrix B = −0.5*(J*Δ2*J)
  3. Compute the eigenvalue decomposition of matrix B, BΔ=QΛQ‘
  4. Choose the K eigenvectors having K highest eigenvalues.

Steps of IsoMap

Isomap differs from classical MDS in initial few steps only. Instead of using euclidean metric for dissimilarity, it uses graph distances. Steps of the Isomap algorithm are:

  1. Neighbourhood graph: Create a neighborhood graph and adjacency matrix from the dataset.

  2. Dissimilarity Matrix: After neighborhood search, we will use spark’s graphX library for calculating the geodesic distances between the points. While creating our neighborhood network, we have to make sure that the resulting graph is a single connected component. If not, then our similarity matrix will remain incomplete and results will be incoherent. We need to iterate over the different values of neighborhood selection parameter to get the fully connected graph. As of now, spark does not have the shortest path function for the weighted graph. We will have to implement it. The code below presents a shortest path algorithm using pregel like api of graphX. The code is from the shortest path function for unweighted graphs of graphX lib. Functions addMaps and sendMessage have been modified to support weighted graphs.

def ShortestPath(Verts: RDD[(VertexId, imMap[Long, Double])],
          Edges: RDD[Edge[Double]], landmarks: Seq[Long] = Seq()):
                                                 Graph[imMap[Long,Double],Double] = {

     val g = Graph(Verts, Edges)

     type SPMap = Map[VertexId, Double]

     def makeMap(x: (VertexId, Double)*) = Map(x: _*)

     def incrementMap(spmap1: SPMap, spmap2: SPMap, d: Double): SPMap = {
             spmap1.map { case (k, v) =>
                 if (v + d < spmap2.getOrElse(k, Double.MaxValue)) k -> (v + d)
                 else -1L -> 0.0

             }

         }

     def addMaps(spmap1: SPMap, spmap2: SPMap): SPMap = {
         (spmap1.keySet ++ spmap2.keySet).map {
              k => k -> math.min(spmap1.
               getOrElse(k, Double.MaxValue),
               spmap2.getOrElse(k,
               Double.MaxValue))
              }(collection.breakOut)
          }

     var spGraph: Graph[imMap[Long,Double],Double]  = null

     if (landmarks.isEmpty){
         spGraph = g.mapVertices { (vid, attr) => makeMap(vid -> 0)}
     }
     else{
         spGraph = g.mapVertices { (vid, attr) =>
                  if (landmarks.contains(vid)) makeMap(vid -> 0) else makeMap()}
         }

     val initialMessage = makeMap()

     def vertexProgram(id: VertexId, attr: SPMap, msg: SPMap): SPMap = {
             addMaps(attr, msg)
     }

     def sendMessage(edge: EdgeTriplet[SPMap, Double]):
            Iterator[(VertexId, SPMap)] = {

         val newAttr = incrementMap(edge.srcAttr, edge.dstAttr, edge.attr) - (-1)

         if (!newAttr.isEmpty) Iterator((edge.dstId, newAttr))
     else Iterator.empty

     }

     val h = Pregel(spGraph, initialMessage)(vertexProgram, sendMessage, addMaps)

     return(h)
 }

Eigenvalue decomposition

Before eigenvalue decomposition, we have to square the distance and double center the squared similarities matrix. After eigenvalue decomposition select the first K eigenvectors with K highest eigenvalues.

What we have implemented is the vanilla version isomap. It requires a lot of time and computing power. It has two bottlenecks first Calculation of dissimilarity matrix requires O(N2) operations where N is the number of the samples and the second calculation of pairwise graph distances. If N is huge, which is true generally in case of big datasets, it becomes impractical. Solution to this problem is Landmark Isomap. Landmark isomap is based on landmark MDS. Landmark MDS selects a group of points termed as Landmarks and implements classical MDS on them. Based on the mapping obtained from classical MDS, remaining points are mapped in the low dimensional embedding using distance-based triangulation. Steps for Landmark classical scaling

  1. Selects landmarks points Xlandmarks
  2. Apply classical MDS on landmarks points and obtain low dimensional emebedding Lk
  3. calculate δu where δui is mean of ith row of dissimilarity matrix of landmark points.
  4. Given a vector xa calculate δa where δai is the squared distance between the point xa and the landmark point i
  5. Low dimensional embedding for the xa is given by ya=0.5*L-1k(δa−δu) where L-1k is the penrose moore inverse of the Lk Selection of landmark points can be random or through a specific method. For obtaining a K-dimensional embedding at least K+1 landmark points are needed. For reasons related to the stability of the algorithm, number of landmark points chosen should be more than strict minimum. The accuracy of isometric mapping in landmark isomap does not suffer much due to approximation in the algorithm.

Drawbacks of Isomaps

Isomap performs poorly when manifold is not well sampled and contains holes. As mentioned earlier neighborhood graph creation is tricky and slightly wrong parameters can produce bad results.

Conclusion

In this article, we discussed another manifold learning algorithm IsoMap(Isometric Mapping).

In the beginning of the post, we talked about what is isometric mapping and how it is different from other dimension reduction algorithms. Then, we had a brief discussion on pregel API. Later on, we implemented an isomap algorithm in scala using spark’s GraphX library. For people wishing to go deeper in distributed graph processing GraphX is a good starting point.

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about our products

About the authors
Default avatar
Ashwini Kumar Pal

author

Still looking for an answer?

Ask a questionSearch for more help

Was this helpful?
 
Leave a comment


This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

Try DigitalOcean for free

Click below to sign up and get $200 of credit to try our products over 60 days!

Sign up

Join the Tech Talk
Success! Thank you! Please check your email for further details.

Please complete your information!

Featured on Community

Get our biweekly newsletter

Sign up for Infrastructure as a Newsletter.

Hollie's Hub for Good

Working on improving health and education, reducing inequality, and spurring economic growth? We'd like to help.

Become a contributor

Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

Welcome to the developer cloud

DigitalOcean makes it simple to launch in the cloud and scale up as you grow — whether you're running one virtual machine or ten thousand.

Learn more