1 names Jaccard, binary, Reyssac, Roux
4 PREFUN pr_Jaccard_prefun
11 formula a / (a + b + c)
12 reference Jaccard, P. (1908). Nouvelles recherches sur la
13 distribution florale. Bull. Soc. Vaud. Sci. Nat., 44, pp.
15 description The Jaccard Similarity (C implementation) for binary data.
16 It is the proportion of (TRUE, TRUE) pairs, but not
17 considering (FALSE, FALSE) pairs. So it compares the
18 intersection with the union of object sets.
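A minimal Python sketch of the Jaccard entry. The section does not spell out the 2 x 2 convention, so it is assumed here: a counts (TRUE, TRUE) pairs, b (TRUE, FALSE), c (FALSE, TRUE) and d (FALSE, FALSE). The function names are illustrative, not the package's API.

```python
def binary_counts(x, y):
    """Return (a, b, c, d) for two equal-length boolean sequences.

    Assumed convention: a = (TRUE, TRUE), b = (TRUE, FALSE),
    c = (FALSE, TRUE), d = (FALSE, FALSE).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def jaccard(x, y):
    """a / (a + b + c): (FALSE, FALSE) pairs do not enter the ratio."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c)


x = [True, True, False, False, True]
y = [True, False, True, False, True]
print(binary_counts(x, y))  # (2, 1, 1, 1)
print(jaccard(x, y))        # 0.5
```

Here a = 2 of the a + b + c = 4 positions where at least one vector is TRUE; the single (FALSE, FALSE) position is ignored. Two all-FALSE vectors give a zero denominator, which the sketch does not guard against.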
31 reference Kurczynski, T.W. (1970). Generalized distance and discrete
32 variables. Biometrics, 26, pp. 525--534.
33 description Kulczynski Similarity for binary data. Relates the (TRUE,
34 TRUE) pairs to discordant pairs.
46 formula [a / (a + b) + a / (a + c)] / 2
47 reference Kurczynski, T.W. (1970). Generalized distance and discrete
48 variables. Biometrics, 26, pp. 525--534.
49 description Kulczynski Similarity for binary data. Relates the (TRUE,
50 TRUE) pairs to the discordant pairs.
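A short sketch of the second Kulczynski formula, assuming a = (TRUE, TRUE), b = (TRUE, FALSE) and c = (FALSE, TRUE) pair counts; `kulczynski2` is a hypothetical name:

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def kulczynski2(x, y):
    """[a / (a + b) + a / (a + c)] / 2."""
    a, b, c, _ = binary_counts(x, y)
    # a / (a + b): share of x's TRUE positions that are also TRUE in y;
    # a / (a + c): the same share seen from y. The measure averages both.
    return (a / (a + b) + a / (a + c)) / 2


print(kulczynski2([1, 1, 1, 0], [1, 0, 0, 1]))  # (1/3 + 1/2) / 2
```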
62 formula 2a / (ab + ac + 2bc)
63 reference Mountford, M.D. (1962). An index of similarity and its
64 application to classificatory problems. In P.W. Murphy
65 (ed.), Progress in Soil Zoology, pp. 43--50. Butterworth,
67 description The Mountford Similarity for binary data.
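A sketch of the Mountford formula under the assumed convention a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE). Note that the denominator vanishes for identical vectors (b = c = 0), so the sketch raises an error there; `mountford` is an illustrative name.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def mountford(x, y):
    """2a / (ab + ac + 2bc)."""
    a, b, c, _ = binary_counts(x, y)
    denominator = a * b + a * c + 2 * b * c
    if denominator == 0:
        # e.g. identical vectors (b = c = 0): the index is undefined.
        raise ZeroDivisionError("ab + ac + 2bc is zero")
    return 2 * a / denominator


print(mountford([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 2*2 / (2 + 2 + 2)
```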
79 formula a / sqrt((a + b)(a + c)) - 1 / (2 sqrt(a + c))
80 reference Fager, E. W. and McGowan, J. A. (1963). Zooplankton species
81 groups in the North Pacific. Science, N. Y. 140: 453-460
82 description The Fager / McGowan distance.
95 reference Russell, P.F., and Rao, T.R. (1940). On habitat and
96 association of species of anopheline larvae in
97 southeastern Madras. J. Malaria Inst. India 3, pp.
99 description The Russell/Rao Similarity for binary data. It is just the
100 proportion of (TRUE, TRUE) pairs.
102 names simple matching, Sokal/Michener
103 FUN pr_SimpleMatching
107 convert pr_simil2dist
113 reference Sokal, R.R., and Michener, C.D. (1958). A statistical
114 method for evaluating systematic relationships. Univ.
115 Kansas Sci. Bull., 39, pp. 1409--1438.
116 description The Simple Matching Similarity for binary data. It is the
117 proportion of concordant pairs.
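A sketch of the Russell/Rao and simple matching entries. The Russell/Rao description ("the proportion of (TRUE, TRUE) pairs") is read as a / n, simple matching as (a + d) / n, with n = a + b + c + d and the assumed convention a = (TRUE, TRUE), d = (FALSE, FALSE). Function names are illustrative.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def russell_rao(x, y):
    """a / n: share of (TRUE, TRUE) pairs among all n pairs."""
    a, b, c, d = binary_counts(x, y)
    return a / (a + b + c + d)


def simple_matching(x, y):
    """(a + d) / n: share of concordant pairs (both TRUE or both FALSE)."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
print(russell_rao(x, y), simple_matching(x, y))  # 0.4 0.6
```

Unlike Jaccard, both keep (FALSE, FALSE) pairs in n; simple matching additionally counts them as agreement in the numerator.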
124 convert pr_simil2dist
129 formula ([a + d] - [b + c]) / n
130 reference Hamann, U. (1961). Merkmalbestand und
131 Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum
132 System der Monokotyledonen. Willdenowia, 2, pp. 639-768.
133 description The Hamann Matching Similarity for binary data. It is the
134 difference between the proportions of concordant and discordant pairs.
142 convert pr_simil2dist
147 formula (a + d/2) / n
148 reference Belbin, L., Marshall, C. & Faith, D.P. (1983). Representing
149 relationships by automatic assignment of colour. The
150 Australian Computing Journal 15, 160-163.
151 description The Faith Similarity
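For the Hamann and Faith entries above, a minimal sketch assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), d = (FALSE, FALSE) and n = a + b + c + d; the names are hypothetical.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def hamann(x, y):
    """([a + d] - [b + c]) / n: concordant minus discordant share, in [-1, 1]."""
    a, b, c, d = binary_counts(x, y)
    return ((a + d) - (b + c)) / (a + b + c + d)


def faith(x, y):
    """(a + d/2) / n: (FALSE, FALSE) pairs count half as much as (TRUE, TRUE)."""
    a, b, c, d = binary_counts(x, y)
    return (a + d / 2) / (a + b + c + d)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
print(hamann(x, y), faith(x, y))  # (3 - 2)/5 and (2 + 0.5)/5
```

Since (a + d) / n is the simple matching value, Hamann equals 2 * (simple matching) - 1.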
153 names Tanimoto, Rogers
154 FUN pr_RogersTanimoto
158 convert pr_simil2dist
163 formula (a + d) / (a + 2b + 2c + d)
164 reference Rogers, D.J, and Tanimoto, T.T. (1960). A computer program
165 for classifying plants. Science, 132, pp. 1115--1118.
166 description The Rogers/Tanimoto Similarity for binary data. Similar to
167 the simple matching coefficient, but putting double weight
168 on the discordant pairs.
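A sketch comparing Rogers/Tanimoto with simple matching, assuming the a/b/c/d convention a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), d = (FALSE, FALSE); the doubled weight on b + c pulls the value down whenever discordant pairs exist. Names are illustrative.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def rogers_tanimoto(x, y):
    """(a + d) / (a + 2b + 2c + d)."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + 2 * b + 2 * c + d)


def simple_matching(x, y):
    """(a + d) / n, for comparison."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
print(rogers_tanimoto(x, y), simple_matching(x, y))  # 3/7 vs 3/5
```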
170 names Dice, Czekanowski, Sorensen
175 convert pr_simil2dist
180 formula 2a / (2a + b + c)
181 reference Dice, L.R. (1945). Measures of the amount of ecologic
182 association between species. Ecology, 26, pp. 297--302.
183 description The Dice Similarity
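A sketch of the Dice formula, again assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE). It also checks the algebraic identity Dice = 2J / (1 + J), where J is the Jaccard value a / (a + b + c); the function names are hypothetical.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def dice(x, y):
    """2a / (2a + b + c)."""
    a, b, c, _ = binary_counts(x, y)
    return 2 * a / (2 * a + b + c)


def jaccard(x, y):
    """a / (a + b + c)."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
j = jaccard(x, y)
print(dice(x, y), 2 * j / (1 + j))  # both 2/3, up to rounding
```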
190 convert pr_simil2dist
195 formula (ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]
196 reference Sokal, R.R., and Sneath, P.H.A. (1963). Principles of
197 numerical taxonomy. W.H. Freeman and Company, San
199 description The Phi Similarity (= Product-Moment-Correlation for binary variables).
207 convert pr_simil2dist
212 formula log(n(|ad-bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)])
213 reference Stiles, H.E. (1961). The association factor in information
214 retrieval. Communications of the ACM, 8, 1, pp. 271--279.
215 description The Stiles Similarity (used for information retrieval).
216 Identical to the logarithm of Krylov's distance.
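For the Phi and Stiles entries above, a sketch assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), d = (FALSE, FALSE). It checks that Phi coincides with the product-moment correlation of the 0/1-coded vectors. The Stiles entry does not state the base of the logarithm, so the natural log used here is an assumption. Names are illustrative.

```python
import math


def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def phi(x, y):
    """(ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]."""
    a, b, c, d = binary_counts(x, y)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson(u, v):
    """Product-moment correlation of two numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    ss_u = sum((ui - mu) ** 2 for ui in u)
    ss_v = sum((vi - mv) ** 2 for vi in v)
    return cov / math.sqrt(ss_u * ss_v)


def stiles(x, y):
    """log(n(|ad - bc| - n/2)^2 / [(a + b)(c + d)(a + c)(b + d)]).

    Natural log is an assumption; the entry does not state the base.
    """
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return math.log(numerator / ((a + b) * (c + d) * (a + c) * (b + d)))


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
print(phi(x, y), pearson(x, y))  # both 1/6, up to rounding
print(stiles(x, y))              # log(11.25 / 36)
```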
223 convert pr_simil2dist
228 formula 4(ad - bc) / [(a + d)^2 + (b + c)^2]
229 reference Cox, T.F., and Cox, M.A.A. (2001). Multidimensional
230 Scaling. Chapman and Hall.
231 description The Michael Similarity
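A sketch of the Michael formula, assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), d = (FALSE, FALSE). Because (a + d)^2 >= 4ad and (b + c)^2 >= 4bc, the value lies in [-1, 1]: 1 when all pairs are concordant, -1 when all are discordant. `michael` is a hypothetical name.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def michael(x, y):
    """4(ad - bc) / [(a + d)^2 + (b + c)^2]."""
    a, b, c, d = binary_counts(x, y)
    return 4 * (a * d - b * c) / ((a + d) ** 2 + (b + c) ** 2)


print(michael([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 4 * 1 / (9 + 4)
print(michael([1, 0, 1, 0], [0, 1, 0, 1]))        # -1.0: all pairs discordant
```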
233 names Mozley, Margalef
234 FUN pr_MozleyMargalef
238 convert pr_simil2dist
243 formula an / [(a + b)(a + c)]
244 reference Margalef, D.R. (1958). Information theory in ecology. Gen.
245 Systems, 3, pp. 36--71.
246 description The Mozley/Margalef Similarity
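A sketch of the Mozley/Margalef formula with the assumed convention a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), n = a + b + c + d. Since (a + b)(a + c) / n is the (TRUE, TRUE) count expected if x and y were independent, the value is observed a over expected a and can exceed 1. The name is illustrative.

```python
def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def mozley_margalef(x, y):
    """an / [(a + b)(a + c)]."""
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    # (a + b)(a + c) / n is the expected (TRUE, TRUE) count under independence,
    # so this returns observed a divided by expected a.
    return a * n / ((a + b) * (a + c))


print(mozley_margalef([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 2 * 5 / (3 * 3)
```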
253 convert pr_simil2dist
258 formula (ad - bc) / (ad + bc)
259 reference Yule, G.U. (1912). On measuring associations between
260 attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
261 description Yule Similarity (Yule's Q)
268 convert pr_simil2dist
273 formula (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))
274 reference Yule, G.U. (1912). On measuring associations between
275 attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
276 description Yule Similarity (Yule's Y, the coefficient of colligation)
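A sketch of the two Yule entries above (commonly called Yule's Q and Yule's Y), assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE), d = (FALSE, FALSE). Both are undefined when ad = bc = 0. The check at the end uses the identity Q = 2Y / (1 + Y^2), which follows by substituting s = sqrt(ad), t = sqrt(bc). Names are hypothetical.

```python
import math


def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def yule_q(x, y):
    """(ad - bc) / (ad + bc)."""
    a, b, c, d = binary_counts(x, y)
    return (a * d - b * c) / (a * d + b * c)


def yule_y(x, y):
    """(sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))."""
    a, b, c, d = binary_counts(x, y)
    s, t = math.sqrt(a * d), math.sqrt(b * c)
    return (s - t) / (s + t)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
q, yv = yule_q(x, y), yule_y(x, y)
print(q, yv, 2 * yv / (1 + yv ** 2))  # the last value reproduces Q
```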
283 convert pr_simil2dist
288 formula a / sqrt[(a + b)(a + c)]
289 reference Sokal, R.R., and Sneath, P.H.A. (1963). Principles of
290 numerical taxonomy. W.H. Freeman and Company, San
292 description The Ochiai Similarity
299 convert pr_simil2dist
304 formula a / min{(a + b), (a + c)}
305 reference Simpson, G.G. (1960). Notes on the measurement of faunal
306 resemblance. American Journal of Science 258-A: 300-311.
307 description The Simpson Similarity (used in Zoology).
314 convert pr_simil2dist
319 formula a / max{(a + b), (a + c)}
320 reference Braun-Blanquet, J. (1964). Pflanzensoziologie. Springer
321 Verlag, Wien and New York.
322 description The Braun-Blanquet Similarity (used in Biology).
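The Ochiai, Simpson and Braun-Blanquet entries divide a by the geometric mean, the minimum and the maximum of the margins a + b and a + c, respectively, so Braun-Blanquet <= Ochiai <= Simpson always holds. A sketch assuming a = (TRUE, TRUE), b = (TRUE, FALSE), c = (FALSE, TRUE); the names are illustrative.

```python
import math


def binary_counts(x, y):
    """(a, b, c, d) = counts of (T, T), (T, F), (F, T), (F, F) pairs (assumed convention)."""
    pairs = [(bool(xi), bool(yi)) for xi, yi in zip(x, y)]
    return (pairs.count((True, True)), pairs.count((True, False)),
            pairs.count((False, True)), pairs.count((False, False)))


def ochiai(x, y):
    """a / sqrt[(a + b)(a + c)]."""
    a, b, c, _ = binary_counts(x, y)
    return a / math.sqrt((a + b) * (a + c))


def simpson(x, y):
    """a / min{(a + b), (a + c)}."""
    a, b, c, _ = binary_counts(x, y)
    return a / min(a + b, a + c)


def braun_blanquet(x, y):
    """a / max{(a + b), (a + c)}."""
    a, b, c, _ = binary_counts(x, y)
    return a / max(a + b, a + c)


x = [1, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1]
print(braun_blanquet(x, y), ochiai(x, y), simpson(x, y))  # 1/3, 1/sqrt(6), 1/2
```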
323 #########################################################################
324 names cosine, angular
329 convert pr_simil2dist
334 formula xy / sqrt(xx * yy)
335 reference Anderberg, M.R. (1973). Cluster Analysis for Applications.
337 description The cosine Similarity (C implementation)
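A plain-Python sketch of the cosine formula, where xy, xx and yy denote the inner products sum_i x_i y_i, sum_i x_i^2 and sum_i y_i^2; `cosine` is an illustrative name, not the package's C routine.

```python
import math


def cosine(x, y):
    """xy / sqrt(xx * yy) for two equal-length numeric sequences."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    yy = sum(yi * yi for yi in y)
    return xy / math.sqrt(xx * yy)


print(cosine([1, 2, 3], [2, 0, 1]))  # 5 / sqrt(70)
print(cosine([1, 2, 3], [2, 4, 6]))  # parallel vectors give 1.0
```

The value depends only on the angle between x and y, so multiplying either vector by a positive constant leaves it unchanged.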
338 names eJaccard, extended_Jaccard
341 PREFUN pr_eJaccard_prefun
343 convert pr_simil2dist
348 formula xy / (xx + yy - xy)
349 reference Strehl A. and Ghosh J. (2000). Value-based customer
350 grouping from large retail data-sets. In Proc. SPIE
351 Conference on Data Mining and Knowledge Discovery, Orlando,
352 volume 4057, pages 33-42. SPIE.
353 description The extended Jaccard Similarity (C implementation; yields
354 Jaccard for binary x,y).
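A sketch of the extended Jaccard formula using plain inner products (`ejaccard` is a hypothetical name). For 0/1 vectors, xy = a, xx = a + b and yy = a + c, so the denominator becomes a + b + c and the value reduces to Jaccard, as the description states.

```python
def ejaccard(x, y):
    """xy / (xx + yy - xy) for two equal-length numeric sequences."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    yy = sum(yi * yi for yi in y)
    return xy / (xx + yy - xy)


print(ejaccard([1, 2, 3], [2, 0, 1]))              # 5 / (14 + 5 - 5)
print(ejaccard([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 2 / 4 = 0.5, the Jaccard value
```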
355 names fJaccard, fuzzy_Jaccard
358 PREFUN pr_fJaccard_prefun
360 convert pr_simil2dist
365 formula sum_i min{x_i, y_i} / sum_i max{x_i, y_i}
366 reference Miyamoto S. (1990). Fuzzy sets in information retrieval and
367 cluster analysis, Kluwer Academic Publishers, Dordrecht.
368 description The fuzzy Jaccard Similarity (C implementation).
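A sketch of the fuzzy Jaccard measure, read as the ratio of the summed minima (fuzzy intersection) to the summed maxima (fuzzy union). On 0/1 data this reduces to the ordinary Jaccard coefficient. Inputs are assumed to be membership degrees in [0, 1], `fuzzy_jaccard` is a hypothetical name, and two all-zero vectors are not handled.

```python
def fuzzy_jaccard(x, y):
    """sum_i min{x_i, y_i} / sum_i max{x_i, y_i} for memberships in [0, 1]."""
    numerator = sum(min(xi, yi) for xi, yi in zip(x, y))
    denominator = sum(max(xi, yi) for xi, yi in zip(x, y))
    return numerator / denominator


print(fuzzy_jaccard([0.2, 0.8, 0.5], [0.4, 0.6, 0.5]))  # 1.3 / 1.7
print(fuzzy_jaccard([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 2 / 4 = 0.5, the Jaccard value
```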
374 convert pr_simil2dist
379 formula xy / sqrt(xx * yy) for centered x,y
380 reference Anderberg, M.R. (1973). Cluster Analysis for Applications.
382 description correlation (taking n instead of n-1 for the variance)
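A sketch of the correlation entry as the cosine formula applied to mean-centered vectors; `correlation` is an illustrative name.

```python
import math


def correlation(x, y):
    """xy / sqrt(xx * yy) applied to mean-centered x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [xi - mx for xi in x]
    cy = [yi - my for yi in y]
    xy = sum(u * v for u, v in zip(cx, cy))
    xx = sum(u * u for u in cx)
    yy = sum(v * v for v in cy)
    return xy / math.sqrt(xx * yy)


print(correlation([1, 2, 3], [2, 0, 1]))  # -0.5
```

Whether the variances use n or n - 1 does not change this value, because the same divisor appears in the covariance and cancels in the ratio.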
383 ######################################################################
389 convert pr_simil2dist
394 formula sum_ij (o_ij - e_ij)^2 / e_ij
395 reference Anderberg, M.R. (1973). Cluster Analysis for Applications.
397 description Sum of squared deviations of observed from expected
398 counts, each divided by the expected count, in a cross-tab of x and y.
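A sketch of the Chi-Squared entry: cross-tabulate the categories of x and y, take e_ij = (row total i) * (column total j) / n as the count expected under independence, and sum (o_ij - e_ij)^2 / e_ij. It uses `collections.Counter`; the name `chi_squared` is illustrative.

```python
from collections import Counter


def chi_squared(x, y):
    """sum_ij (o_ij - e_ij)^2 / e_ij over the cross-tab of x and y."""
    n = len(x)
    observed = Counter(zip(x, y))      # o_ij, missing cells count as 0
    rows, cols = Counter(x), Counter(y)  # marginal totals
    chi = 0.0
    for i in rows:
        for j in cols:
            expected = rows[i] * cols[j] / n  # e_ij under independence
            chi += (observed[(i, j)] - expected) ** 2 / expected
    return chi


x = ["a", "a", "a", "b", "b", "b"]
y = ["u", "u", "v", "v", "w", "w"]
print(chi_squared(x, y))  # 4.0
```

In this example every expected count is 3 * 2 / 6 = 1, and four of the six cells deviate from it by 1.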
405 convert pr_simil2dist
410 formula [sum_ij (o_ij - e_ij)^2 / e_ij] / n
411 reference Anderberg, M.R. (1973). Cluster Analysis for Applications.
413 description Standardized Chi-Squared (= Chi / n).
420 convert pr_simil2dist
425 formula sqrt{[sum_ij (o_ij - e_ij)^2 / e_ij] / n / sqrt((p - 1)(q - 1))}
427 reference Tschuprow, A.A. (1925). Grundbegriffe und Grundprobleme der
428 Korrelationstheorie. Springer.
429 description Tschuprow-standardization of Chi-Squared.
436 convert pr_simil2dist
441 formula sqrt{(Chi / n) / min[(p - 1), (q - 1)]}
442 reference Cramer, H. (1946). The elements of probability theory and
443 some of its applications. Wiley, New York.
444 description Cramer-standardization of Chi-Squared.
446 names Pearson, contingency
451 convert pr_simil2dist
456 formula sqrt{Chi / (n + Chi)}
457 reference Anderberg, M.R. (1973). Cluster Analysis for Applications.
459 description Contingency Coefficient. Chi is the Chi-Squared statistic.
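A sketch of the Chi-Squared-based entries above (standardized Chi-Squared, Tschuprow, Cramer and the contingency coefficient). It assumes that p and q are the numbers of distinct categories of x and y, which this section does not define explicitly; the function names are hypothetical.

```python
import math
from collections import Counter


def chi_squared(x, y):
    """sum_ij (o_ij - e_ij)^2 / e_ij over the cross-tab of x and y."""
    n = len(x)
    observed, rows, cols = Counter(zip(x, y)), Counter(x), Counter(y)
    chi = 0.0
    for i in rows:
        for j in cols:
            expected = rows[i] * cols[j] / n
            chi += (observed[(i, j)] - expected) ** 2 / expected
    return chi


def chi_measures(x, y):
    """Standardized Chi-Squared, Tschuprow, Cramer and contingency values."""
    n = len(x)
    chi = chi_squared(x, y)
    p, q = len(set(x)), len(set(y))  # assumed: numbers of categories
    phi2 = chi / n                   # Chi / n
    return {
        "phi2": phi2,
        "tschuprow": math.sqrt(phi2 / math.sqrt((p - 1) * (q - 1))),
        "cramer": math.sqrt(phi2 / min(p - 1, q - 1)),
        "contingency": math.sqrt(chi / (n + chi)),
    }


x = ["a", "a", "a", "b", "b", "b"]
y = ["u", "u", "v", "v", "w", "w"]
print(chi_measures(x, y))
```

With Chi = 4, n = 6, p = 2 and q = 3, this gives Chi / n = 2/3, Cramer = sqrt(2/3) and contingency = sqrt(4/10).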
464 PREFUN pr_Gower_prefun
466 convert pr_simil2dist
471 formula Sum_k (s_ijk * w_k) / Sum_k (d_ijk * w_k)
472 reference Gower, J.C. (1971). A general coefficient of similarity and
473 some of its properties. Biometrics, 27, pp. 857--871.
474 description The Gower Similarity for mixed variable types. w_k are
475 variable weights. d_ijk is 0 for missing values or a pair of
476 FALSE logicals, and 1 otherwise. s_ijk is 1 for a pair of TRUE
477 logicals or matching factor levels, and the absolute
478 difference for metric variables. Each metric variable is
479 scaled by its range, provided the range is
480 not 0. Ordinal variables are converted to ranks r_i and the
481 scores z_i = (r_i - 1) / (max r_i - 1) are taken as metric
482 variables. Note that in the latter case, unlike the
483 definition of Gower, just the internal integer codes are
484 taken as the ranks, and not what rank() would return. This
485 is for compatibility with daisy() of the cluster package,
486 and will make a slight difference in case of ties. The
487 weights w_k can be specified by passing a numeric vector
488 (recycled as needed) to the 'weights' argument. Ranges for
489 scaling the columns of x and y can be specified using the
490 'ranges.x'/'ranges.y' arguments (or simply 'ranges' for