Sunday, February 24, 2013

Implementation of DBSCAN Algorithm for Density-based clustering in Python

DBSCAN (for density-based spatial clustering of applications with noise) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of corresponding nodes.

The given code implements the DBSCAN Algorithm as specified in the book - Introduction to Data Mining by Tan, Steinbach and Kumar. You can study more about the algorithm in the book or other online materials.

For implementing the algorithm, I have generated around 100 random cartesian coordinates and created density-based clusters. Here is the sample output after implementing the algorithm:


To run this code you require: Python 2.7 with matplotlib library.

 

from matplotlib.pyplot import *  
 from collections import defaultdict  
 import random  
   
 #function to calculate distance  
 def dist(p1, p2):  
   return ((p1[0]-p2[0])**2+ (p1[1]-p2[1])**2)**(0.5)  
   
 #randomly generate around 100 cartesian coordinates  
 all_points=[]  
   
 for i in range(100):  
   randCoord = [random.randint(1,50), random.randint(1,50)]  
   if not randCoord in all_points:  
     all_points.append(randCoord)  
   
   
 #take radius = 8 and min. points = 8  
 E = 8  
 minPts = 8  
   
 #find out the core points  
 other_points =[]  
 core_points=[]  
 plotted_points=[]  
 for point in all_points:  
   point.append(0) # assign initial level 0  
   total = 0  
   for otherPoint in all_points:  
     distance = dist(otherPoint,point)  
     if distance<=E:  
       total+=1  
   
   if total > minPts:  
     core_points.append(point)  
     plotted_points.append(point)  
   else:  
     other_points.append(point)  
   
 #find border points  
 border_points=[]  
 for core in core_points:  
   for other in other_points:  
     if dist(core,other)<=E:  
       border_points.append(other)  
       plotted_points.append(other)  
   
   
 #implement the algorithm  
 cluster_label=0  
   
 for point in core_points:  
   if point[2]==0:  
     cluster_label+=1  
     point[2]=cluster_label  
   
   for point2 in plotted_points:  
     distance = dist(point2,point)  
     if point2[2] ==0 and distance<=E:  
       print point, point2  
       point2[2] =point[2]  
   
   
 #after the points are asssigned correnponding labels, we group them  
 cluster_list = defaultdict(lambda: [[],[]])  
 for point in plotted_points:  
   cluster_list[point[2]][0].append(point[0])  
   cluster_list[point[2]][1].append(point[1])  
   
 markers = ['+','*','.','d','^','v','>','<','p']  
   
 #plotting the clusters  
 i=0  
 print cluster_list  
 for value in cluster_list:  
   cluster= cluster_list[value]  
   plot(cluster[0], cluster[1],markers[i])  
   i = i%10+1  
   
 #plot the noise points as well  
 noise_points=[]  
 for point in all_points:  
   if not point in core_points and not point in border_points:  
     noise_points.append(point)  
 noisex=[]  
 noisey=[]  
 for point in noise_points:  
   noisex.append(point[0])  
   noisey.append(point[1])  
 plot(noisex, noisey, "x")  
   
 title(str(len(cluster_list))+" clusters created with E ="+str(E)+" Min Points="+str(minPts)+" total points="+str(len(all_points))+" noise Points = "+ str(len(noise_points)))  
 axis((0,60,0,60))  
 show()  

4 comments:

  1. Hey there! Quick question that's completely off topic.
    Do you know how to make your site mobile friendly? My site looks weird when viewing from my iphone 4.
    I'm trying to find a template or plugin that might be able to resolve this issue.
    If you have any recommendations, please share. With thanks!


    Visit my web site: marketing

    ReplyDelete
  2. The people with adverse credit history for example
    CCJs, IVAs, arrears, late payments, skipped installments, defaults etc.
    You have no avenue or resolution should they decide to stop making the payments.
    we organizing package tours for Delhi, package tours for Agra,
    package tours for Jaipur, Udaipur, Bikkaner (Rajasthan), Shimla, Manali, Amritsar, Dharmshala
    and Nepal also.

    my webb blog: rental mobil murah depok

    ReplyDelete
  3. Hey this is kinda of off topic but I was wondering if
    blogs use WYSIWYG editors or if you have to manually code with HTML.
    I'm starting a blog soon but have no coding skills so I wanted
    to get guidance from someone with experience. Any help would be enormously appreciated!



    My homepage Fancy Food

    ReplyDelete