We want to use cosine similarity with hierarchical clustering and we have cosine similarities already calculated. In the sklearn.cluster.AgglomerativeClustering documentation it says:
"A distance matrix (instead of a similarity matrix) is needed as input for the fit method."
So we converted the cosine similarities to distances as follows:
distance = 1 - similarity
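For concreteness, here is a minimal sketch of that conversion on a small made-up similarity matrix. The values are illustrative only, chosen so the result matches the toy X used in the code further down. The clipping and the diagonal reset are extra precautions against floating-point rounding that I am assuming are harmless; the documentation does not ask for them.

```python
import numpy as np

# Hypothetical 3x3 cosine similarity matrix (illustrative values only).
similarity = np.array([
    [1.0, 0.7, 0.6],
    [0.7, 1.0, 0.3],
    [0.6, 0.3, 1.0],
])

# Convert similarities to distances: distance = 1 - similarity.
distance = 1.0 - similarity

# Assumed precautions: remove tiny negative values caused by rounding
# and force the diagonal to be exactly zero.
distance = np.clip(distance, 0.0, None)
np.fill_diagonal(distance, 0.0)

print(distance)
```

The result is symmetric with a zero diagonal, which is the shape of input we pass to fit().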
Our Python code raises an error at the fit() call at the end. (I am not including the real value of X in the code, since it is very large. X is just a cosine similarity matrix with its values converted to distances as described above. Notice that the diagonal is all 0.) Here is the code:

    import pandas as pd
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Small stand-in for the real (much larger) precomputed distance matrix
    X = np.array([[0, 0.3, 0.4], [0.3, 0, 0.7], [0.4, 0.7, 0]])

    cluster = AgglomerativeClustering(affinity='precomputed')
    cluster.fit(X)
The error is:
    runfile('/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py', wdir='/Users/stackoverflowuser/Desktop/4.2/Pr')
    Traceback (most recent call last):
      File "", line 1, in
        runfile('/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py', wdir='/Users/stackoverflowuser/Desktop/4.2/Pr')
      File "/anaconda2/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile
        execfile(filename, namespace)
      File "/anaconda2/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 100, in execfile
        builtins.execfile(filename, *where)
      File "/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py", line 84, in
        cluster.fit(X)
      File "/anaconda2/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py", line 795, in fit
        (self.affinity, ))
    ValueError: precomputed was provided as affinity. Ward can only work with euclidean distances.
Is there any additional information I can provide? Thanks in advance.