I had to illustrate a k-means algorithm for my thesis, but I could not find any existing examples that were both simple and looked good on paper. See below for Python code that does just what I wanted.
```python
#!/usr/bin/env python3
# Adapted from http://hackmap.blogspot.com/2007/09/k-means-clustering-in-scipy.html

import numpy
import matplotlib
matplotlib.use('Agg')  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2

# generate 3 sets of normally distributed points around
# different means with different variances
pt1 = numpy.random.normal(1, 0.2, (100, 2))
pt2 = numpy.random.normal(2, 0.5, (300, 2))
pt3 = numpy.random.normal(3, 0.3, (100, 2))

# slightly move sets 2 and 3 (for a prettier output)
pt2[:, 0] += 1
pt3[:, 0] -= 0.5

xy = numpy.concatenate((pt1, pt2, pt3))

# kmeans for 3 clusters; xy is already an (N, 2) array, so pass it directly
# (minit='++' is k-means++ seeding and needs a reasonably recent SciPy)
res, idx = kmeans2(xy, 3, minit='++')

# one colour per cluster label
palette = numpy.array([[0.4, 1, 0.4], [1, 0.4, 0.4], [0.1, 0.8, 1]])
colors = palette[idx]

# plot colored points
plt.close('all')
plt.scatter(xy[:, 0], xy[:, 1], c=colors)

# mark centroids as (X): an empty black circle with an x inside
plt.scatter(res[:, 0], res[:, 1], marker='o', s=500, linewidths=2,
            facecolors='none', edgecolors='black')
plt.scatter(res[:, 0], res[:, 1], marker='x', s=500, linewidths=2)

plt.savefig('/tmp/kmeans.png')
```

The output looks like this (also available in vector format):
The X’s mark cluster centers. Feel free to use any of these files for whatever purpose you like. An attribution would be nice but is not required.
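If you want to see roughly what `kmeans2` is doing behind the scenes, here is a minimal sketch of plain Lloyd's k-means in NumPy: pick k starting centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing moves. The names `assign` and `lloyd_kmeans` are hypothetical helpers written for illustration, and the initialisation (k distinct random data points) is an assumption; this is not SciPy's actual implementation.

```python
import numpy as np


def assign(points, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for every point.
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Hypothetical sketch of Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step.
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment so labels always match the returned centroids.
    return centroids, assign(points, centroids)


# Usage: two obvious blobs of three points each.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centroids, labels = lloyd_kmeans(pts, 2)
print(centroids, labels)
```

Like any Lloyd-style k-means, this only finds a local optimum that depends on the starting centroids, which is one reason smarter seeding such as k-means++ is popular.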