Details of the similarity indices
The similarity indices offered in iramuteq are those available in the proxy library, written by David Meyer and Christian Buchta. The descriptions of the indices below are taken from that library's documentation.
Introduction
id | var1 | var2 | var3 |
---|---|---|---|
1 | 1 | 0 | 1 |
2 | 0 | 1 | 1 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 0 | 1 | 1 |
Table 1
var2 \ var1 | 1 | 0 |
---|---|---|
1 | a | b |
0 | c | d |
Table 2
n = a + b + c + d = number of rows in the data table
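As an illustration, the four counts can be obtained for the pair (var1, var2) of Table 1. This is only a sketch: the helper name `contingency` is hypothetical, and it assumes the reading of Table 2 in which b counts the rows with var1 = 0 and var2 = 1, and c the rows with var1 = 1 and var2 = 0.

```python
# Columns var1 and var2 copied from Table 1 (rows id 1 to 5).
var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]


def contingency(x, y):
    """Return (a, b, c, d) laid out as in Table 2.

    x plays the role of var1 and y the role of var2
    (hypothetical helper, not part of iramuteq or proxy).
    """
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)  # both present
    b = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)  # only y present
    c = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)  # only x present
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)  # both absent
    return a, b, c, d


a, b, c, d = contingency(var1, var2)
n = a + b + c + d
print(a, b, c, d, n)  # 1 2 1 1 5
```

Here n equals 5, the number of rows of Table 1, as expected.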
Indices
cooccurrence
a
percentage of cooccurrence
see Russel
Jaccard
names: Jaccard, binary, Reyssac, Roux
type: binary
loop: FALSE
formula: a / (a + b + c)
reference: Jaccard, P. (1908). Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat., 44, pp. 223-270.
description: The Jaccard Similarity (C implementation) for binary data. It is the proportion of (TRUE, TRUE) pairs, not counting the (FALSE, FALSE) pairs, so it compares the intersection with the union of the object sets.
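The "intersection over union" reading can be checked on var1 and var2 of Table 1. The sketch below is an illustration in plain Python, not the C implementation mentioned above; the variable names are chosen here, and the counts a = 1, b = 2, c = 1 were read by hand from Table 1.

```python
# Row ids of Table 1 in which each variable equals 1.
rows_var1 = {1, 3}
rows_var2 = {2, 3, 5}

# Set view: size of the intersection over size of the union.
jaccard_sets = len(rows_var1 & rows_var2) / len(rows_var1 | rows_var2)

# Count view: a / (a + b + c); d (both absent) plays no role.
a, b, c = 1, 2, 1
jaccard_counts = a / (a + b + c)

print(jaccard_sets, jaccard_counts)  # 0.25 0.25
```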
Kulczynski1
names: Kulczynski1
type: binary
loop: TRUE
formula: a / (b + c)
reference: Kurczynski, T.W. (1970). Generalized distance and discrete variables. Biometrics, 26, pp. 525-534.
description: Kulczynski Similarity for binary data. Relates the (TRUE, TRUE) pairs to the discordant pairs.
Kulczynski2
names: Kulczynski2
type: binary
loop: TRUE
formula: [a / (a + b) + a / (a + c)] / 2
reference: Kurczynski, T.W. (1970). Generalized distance and discrete variables. Biometrics, 26, pp. 525-534.
description: Kulczynski Similarity for binary data. Relates the (TRUE, TRUE) pairs to the discordant pairs.
Mountford
names: Mountford
type: binary
loop: TRUE
formula: 2a / (ab + ac + 2bc)
reference: Mountford, M.D. (1962). An index of similarity and its application to classificatory problems. In P.W. Murphy (ed.), Progress in Soil Zoology, pp. 43-50. Butterworth, London.
description: The Mountford Similarity for binary data.
Fager
names: Fager, McGowan
type: binary
loop: TRUE
formula: a / sqrt( (a + b)(a + c) ) - 1 / 2 sqrt(a + c)
reference: Fager, E. W. and McGowan, J. A. (1963). Zooplankton species groups in the North Pacific. Science, N. Y. 140: 453-460.
description: The Fager / McGowan distance.
Russel
names: Russel, Rao
type: binary
loop: TRUE
formula: a / n
reference: Russell, P.F., and Rao, T.R. (1940). On habitat and association of species of anopheline larvae in southeastern Madras. J. Malaria Inst. India, 3, pp. 153-178.
description: The Russel/Rao Similarity for binary data. It is simply the proportion of (TRUE, TRUE) pairs.
simple matching
names: simple matching, Sokal/Michener
type: binary
loop: TRUE
formula: (a + d) / n
reference: Sokal, R.R., and Michener, C.D. (1958). A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bull., 39, pp. 1409-1438.
description: The Simple Matching Similarity for binary data. It is the proportion of concordant pairs.
Hamman
names: Hamman
type: binary
loop: TRUE
formula: ([a + d] - [b + c]) / n
reference: Hamann, U. (1961). Merkmalbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der Monokotyledonen. Willdenowia, 2, pp. 639-768.
description: The Hamman Matching Similarity for binary data. It is the difference between the proportions of concordant and discordant pairs.
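The Russel, simple matching and Hamman formulas all divide by n, so they are easy to compare on one example. The sketch below uses the counts obtained by hand for var1 and var2 of Table 1 (a = 1, b = 2, c = 1, d = 1); the variable names are chosen for this illustration. It also checks that Hamman equals twice the simple matching value minus one, which follows from b + c = n - (a + d).

```python
import math

# Counts for the pair (var1, var2) of Table 1, read by hand.
a, b, c, d = 1, 2, 1, 1
n = a + b + c + d

russel = a / n                          # proportion of (1, 1) rows
simple_matching = (a + d) / n           # proportion of concordant rows
hamman = ((a + d) - (b + c)) / n        # concordant minus discordant

print(russel, simple_matching, hamman)

# Algebraic link: Hamman = 2 * simple matching - 1.
assert math.isclose(hamman, 2 * simple_matching - 1)
```

On this example the three values are 0.2, 0.4 and -0.2: the negative Hamman value reflects that discordant rows outnumber concordant ones.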
Faith
names: Faith
type: binary
loop: TRUE
formula: (a + d/2) / n
reference: Belbin, L., Marshall, C. and Faith, D.P. (1983). Representing relationships by automatic assignment of colour. The Australian Computing Journal, 15, pp. 160-163.
description: The Faith Similarity.
Tanimoto
names: Tanimoto, Rogers
type: binary
loop: TRUE
formula: (a + d) / (a + 2b + 2c + d)
reference: Rogers, D.J., and Tanimoto, T.T. (1960). A computer program for classifying plants. Science, 132, pp. 1115-1118.
description: The Rogers/Tanimoto Similarity for binary data. Similar to the simple matching coefficient, but putting double weight on the discordant pairs.
Dice
names: Dice, Czekanowski, Sorensen
type: binary
loop: TRUE
formula: 2a / (2a + b + c)
reference: Dice, L.R. (1945). Measures of the amount of ecologic association between species. Ecology, 26, pp. 297-302.
description: The Dice Similarity.
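The Dice formula differs from the Jaccard formula a / (a + b + c) only by counting the (TRUE, TRUE) pairs twice, so each can be derived from the other: Dice = 2J / (1 + J). The following sketch checks this on the counts of var1 and var2 from Table 1 (a = 1, b = 2, c = 1, read by hand); the variable names are chosen for the illustration.

```python
import math

# Counts for the pair (var1, var2) of Table 1, read by hand.
a, b, c = 1, 2, 1

jaccard = a / (a + b + c)
dice = 2 * a / (2 * a + b + c)

# Dice expressed from Jaccard: 2J / (1 + J).
dice_from_jaccard = 2 * jaccard / (1 + jaccard)

print(jaccard, dice)
assert math.isclose(dice, dice_from_jaccard)
```

Since the mapping is increasing, the two indices rank pairs of variables in the same order, even though their values differ (0.25 and 0.4 here).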
Phi
names: Phi
type: binary
loop: TRUE
formula: (ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]
reference: Sokal, R.R., and Sneath, P.H.A. (1963). Principles of numerical taxonomy. W.H. Freeman and Company, San Francisco.
description: The Phi Similarity (= product-moment correlation for binary variables).
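The equivalence with the product-moment correlation can be verified numerically. The sketch below computes Phi from the counts of var1 and var2 in Table 1 (a = 1, b = 2, c = 1, d = 1, read by hand) and compares it with a Pearson correlation written out directly; `pearson` is a helper defined here for the illustration, not a function of proxy or iramuteq.

```python
import math

# Columns var1 and var2 copied from Table 1, and their counts.
var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]
a, b, c, d = 1, 2, 1, 1

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson(x, y):
    """Plain product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(var1, var2)
print(phi, r)
assert math.isclose(phi, r)
```

Both values are -1/6 on this example. Note that the denominator vanishes when one of the variables is constant, in which case the index is undefined.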
Stiles
names: Stiles
type: binary
loop: TRUE
formula: log(n(|ad-bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)])
reference: Stiles, H.E. (1961). The association factor in information retrieval. Communications of the ACM, 8, 1, pp. 271-279.
description: The Stiles Similarity (used for information retrieval). Identical to the logarithm of Krylov's distance.
Michael
names: Michael
type: binary
loop: TRUE
formula: 4(ad - bc) / [(a + d)^2 + (b + c)^2]
reference: Cox, T.F., and Cox, M.A.A. (2001). Multidimensional Scaling. Chapman and Hall.
description: The Michael Similarity.
Mozley
names: Mozley, Margalef
type: binary
loop: TRUE
formula: an / [(a + b)(a + c)]
reference: Margalef, D.R. (1958). Information theory in ecology. Gen. Systems, 3, pp. 36-71.
description: The Mozley/Margalef Similarity.
Yule
names: Yule
type: binary
loop: TRUE
formula: (ad - bc) / (ad + bc)
reference: Yule, G.U. (1912). On measuring associations between attributes. J. Roy. Stat. Soc., 75, pp. 579-642.
description: Yule Similarity.
Yule2
names: Yule2
type: binary
loop: TRUE
formula: (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))
reference: Yule, G.U. (1912). On measuring associations between attributes. J. Roy. Stat. Soc., 75, pp. 579-642.
description: Yule Similarity.
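The two Yule formulas are linked: writing Q for Yule and Y for Yule2, one has Q = 2Y / (1 + Y^2). The sketch below checks this on the counts of var1 and var2 from Table 1 (a = 1, b = 2, c = 1, d = 1, read by hand); the variable names are chosen for the illustration.

```python
import math

# Counts for the pair (var1, var2) of Table 1, read by hand.
a, b, c, d = 1, 2, 1, 1

yule = (a * d - b * c) / (a * d + b * c)
yule2 = (math.sqrt(a * d) - math.sqrt(b * c)) / (
    math.sqrt(a * d) + math.sqrt(b * c)
)

print(yule, yule2)

# Q recovered from Y: Q = 2Y / (1 + Y^2).
assert math.isclose(yule, 2 * yule2 / (1 + yule2 ** 2))
```

Both indices have the sign of ad - bc (negative here) and are undefined when ad and bc are both zero, since the denominators then vanish.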
Ochiai
names: Ochiai
type: binary
loop: TRUE
formula: a / sqrt[(a + b)(a + c)]
reference: Sokal, R.R., and Sneath, P.H.A. (1963). Principles of numerical taxonomy. W.H. Freeman and Company, San Francisco.
description: The Ochiai Similarity.
Simpson
names: Simpson
type: binary
loop: TRUE
formula: a / min{(a + b), (a + c)}
reference: Simpson, G.G. (1960). Notes on the measurement of faunal resemblance. American Journal of Science, 258-A, pp. 300-311.
description: The Simpson Similarity (used in Zoology).
Braun-Blanquet
names: Braun-Blanquet
type: binary
loop: TRUE
formula: a / max{(a + b), (a + c)}
reference: Braun-Blanquet, J. (1964). Pflanzensoziologie. Springer Verlag, Wien and New York.
description: The Braun-Blanquet Similarity (used in Biology).
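The Braun-Blanquet, Ochiai and Simpson formulas share the numerator a and differ only in how they combine the two totals (a + b) and (a + c): maximum, geometric mean and minimum respectively. As a consequence Braun-Blanquet <= Ochiai <= Simpson for any pair of variables. The sketch below illustrates this on the counts of var1 and var2 from Table 1 (a = 1, b = 2, c = 1, read by hand); the variable names are chosen for the illustration.

```python
import math

# Counts for the pair (var1, var2) of Table 1, read by hand.
a, b, c = 1, 2, 1

braun_blanquet = a / max(a + b, a + c)          # divide by the larger total
ochiai = a / math.sqrt((a + b) * (a + c))       # divide by the geometric mean
simpson = a / min(a + b, a + c)                 # divide by the smaller total

print(braun_blanquet, ochiai, simpson)
assert braun_blanquet <= ochiai <= simpson
```

When the two totals are equal (b = c), the three indices coincide.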
Chi-squared
Phi-squared
Tschuprow
Cramer
Pearson