# Details of the similarity indices

The similarity indices offered in iramuteq are those available in the R package proxy, written by David Meyer and Christian Buchta. The index descriptions below are taken from that package's documentation.

## introduction

| id | var1 | var2 | var3 |
|----|------|------|------|
| 1  | 1    | 0    | 1    |
| 2  | 0    | 1    | 1    |
| 3  | 1    | 1    | 0    |
| 4  | 0    | 0    | 1    |
| 5  | 0    | 1    | 1    |

Table 1

|          | var1 = 1 | var1 = 0 |
|----------|----------|----------|
| var2 = 1 | a        | b        |
| var2 = 0 | c        | d        |

Table 2

n = a + b + c + d = number of rows in the data table (Table 1)
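As an illustration, the counts a, b, c and d for var1 and var2 of Table 1 could be tallied as in the Python sketch below. The function name `contingency_counts` is hypothetical and belongs to neither iramuteq nor proxy; the cell layout is assumed to follow Table 2 (a: both 1, b: var2 = 1 and var1 = 0, c: var2 = 0 and var1 = 1, d: both 0).

```python
def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary sequences.

    Assumed layout, following Table 2 with x = var1 and y = var2:
    a: x = 1, y = 1    b: x = 0, y = 1
    c: x = 1, y = 0    d: x = 0, y = 0
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    c = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)
    return a, b, c, d


# var1 and var2 columns of Table 1 (rows id 1 to 5)
var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]

a, b, c, d = contingency_counts(var1, var2)
n = a + b + c + d
print(a, b, c, d, n)  # 1 2 1 1 5
```

For this pair of columns, only row 3 has both variables set, so a = 1, and n equals the five rows of Table 1.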

## indices

### cooccurrence

` a `

### percentage of cooccurrence

`see Russel`

### Jaccard

```
names        Jaccard, binary, Reyssac, Roux
type         binary
loop         FALSE
formula      a / (a + b + c)
reference    Jaccard, P. (1908). Nouvelles recherches sur la
             distribution florale. Bull. Soc. Vaud. Sci. Nat., 44, pp.
             223--270.
description  The Jaccard Similarity (C implementation) for binary data.
             It is the proportion of (TRUE, TRUE) pairs, but not
             considering (FALSE, FALSE) pairs. So it compares the
             intersection with the union of object sets.
```
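The "intersection over union" reading of the Jaccard formula can be sketched in Python as follows. This is an illustrative re-implementation, not the C code used by proxy; the function names are hypothetical, the data are the var1 and var2 columns of Table 1, and returning NaN for two all-zero vectors is an assumption.

```python
def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) for two binary sequences."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)      # (TRUE, TRUE) pairs
    union = sum(1 for xi, yi in zip(x, y) if xi or yi)   # a + b + c
    if union == 0:
        return float("nan")  # assumption: undefined when both vectors are all zero
    return a / union


def jaccard_sets(s, t):
    """Same index written on the sets of row ids where each variable is 1."""
    if not s and not t:
        return float("nan")
    return len(s & t) / len(s | t)


var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]
rows1 = {i for i, v in enumerate(var1, start=1) if v}  # {1, 3}
rows2 = {i for i, v in enumerate(var2, start=1) if v}  # {2, 3, 5}

print(jaccard(var1, var2))         # 0.25
print(jaccard_sets(rows1, rows2))  # 0.25
```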

### Kulczynski1

```
names        Kulczynski1
type         binary
loop         TRUE
formula      a / (b + c)
reference    Kurczynski, T.W. (1970). Generalized distance and discrete
             variables. Biometrics, 26, pp. 525--534.
description  Kulczynski Similarity for binary data. Relates the (TRUE,
             TRUE) pairs to discordant pairs.
```

### Kulczynski2

```
names        Kulczynski2
type         binary
loop         TRUE
formula      [a / (a + b) + a / (a + c)] / 2
reference    Kurczynski, T.W. (1970). Generalized distance and discrete
             variables. Biometrics, 26, pp. 525--534.
description  Kulczynski Similarity for binary data. Relates the (TRUE,
             TRUE) pairs to the discordant pairs.
```
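The two Kulczynski formulas can be compared on the counts obtained from var1 and var2 of Table 1 (a = 1, b = 2, c = 1). The sketch below is only an illustration in Python; the function names and the handling of zero denominators are assumptions, not the behaviour of proxy.

```python
import math


def kulczynski1(a, b, c):
    """a / (b + c): (TRUE, TRUE) pairs relative to the discordant pairs."""
    if b + c == 0:
        # assumption: unbounded when no pair is discordant, undefined if a is 0 too
        return math.inf if a > 0 else math.nan
    return a / (b + c)


def kulczynski2(a, b, c):
    """[a / (a + b) + a / (a + c)] / 2: mean of the two conditional proportions."""
    if a + b == 0 or a + c == 0:
        return math.nan  # assumption: undefined when one variable is never 1
    return (a / (a + b) + a / (a + c)) / 2


a, b, c = 1, 2, 1
print(kulczynski1(a, b, c))  # 1/3, about 0.333
print(kulczynski2(a, b, c))  # (1/3 + 1/2) / 2 = 5/12, about 0.417
```

Kulczynski1 has no upper bound, whereas Kulczynski2 stays between 0 and 1 because it averages two proportions.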

### Mountford

```
names        Mountford
type         binary
loop         TRUE
formula      2a / (ab + ac + 2bc)
reference    Mountford, M.D. (1962). An index of similarity and its
             application to classificatory problems. In P.W. Murphy
             (ed.), Progress in Soil Zoology, pp. 43--50. Butterworth,
             London.
description  The Mountford Similarity for binary data.
```

### Fager

```
names        Fager, McGowan
type         binary
loop         TRUE
formula      a / sqrt( (a + b)(a + c) ) - 1 / 2 sqrt(a + c)
reference    Fager, E. W. and McGowan, J. A. (1963). Zooplankton species
             groups in the North Pacific. Science, N. Y. 140: 453-460
description  The Fager / McGowan distance.
```

### Russel

```
names        Russel, Rao
type         binary
loop         TRUE
formula      a / n
reference    Russell, P.F., and Rao T.R. (1940). On habitat and
             association of species of anopheline larvae in
             southeastern, Madras, J. Malaria Inst. India 3, pp.
             153--178
description  The Russel/Rao Similarity for binary data. It is just the
             proportion of (TRUE, TRUE) pairs.
```

### simple matching

```
names        simple matching, Sokal/Michener
type         binary
loop         TRUE
formula      (a + d) / n
reference    Sokal, R.R., and Michener, C.D. (1958). A statistical
             method for evaluating systematic relationships. Univ.
             Kansas Sci. Bull., 39, pp. 1409--1438.
description  The Simple Matching Similarity for binary data. It is the
             proportion of concordant pairs.
```

### Hamman

```
names        Hamman
type         binary
loop         TRUE
formula      ([a + d] - [b + c]) / n
reference    Hamann, U. (1961). Merkmalbestand und
             Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum
             System der Monokotyledonen. Willdenowia, 2, pp. 639-768.
description  The Hamman Matching Similarity for binary data. It is the
             proportion difference of the concordant and discordant
             pairs.
```
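Russel/Rao, simple matching and Hamman all divide by n and differ only in which cells they count. The Python sketch below (hypothetical function names, counts taken from var1 and var2 of Table 1) evaluates the three formulas and checks that the Hamman index is a rescaling of simple matching, since b + c = n - (a + d).

```python
import math


def russel_rao(a, b, c, d):
    """a / n: proportion of (TRUE, TRUE) pairs."""
    return a / (a + b + c + d)


def simple_matching(a, b, c, d):
    """(a + d) / n: proportion of concordant pairs."""
    return (a + d) / (a + b + c + d)


def hamman(a, b, c, d):
    """([a + d] - [b + c]) / n: concordant minus discordant proportion."""
    return ((a + d) - (b + c)) / (a + b + c + d)


a, b, c, d = 1, 2, 1, 1  # var1 versus var2 in Table 1
sm = simple_matching(a, b, c, d)
print(russel_rao(a, b, c, d))  # 0.2
print(sm)                      # 0.4
print(hamman(a, b, c, d))      # -0.2

# Hamman = 2 * simple matching - 1, so it ranges from -1 to 1
print(math.isclose(hamman(a, b, c, d), 2 * sm - 1))  # True
```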

### Faith

```
names        Faith
type         binary
loop         TRUE
formula      (a + d/2) / n
reference    Belbin, L., Marshall, C. & Faith, D.P. (1983). Representing
             relationships by automatic assignment of colour. The
             Australian Computing Journal 15, 160-163.
description  The Faith similarity
```

### Tanimoto

```
names        Tanimoto, Rogers
type         binary
loop         TRUE
formula      (a + d) / (a + 2b + 2c + d)
reference    Rogers, D.J, and Tanimoto, T.T. (1960). A computer program
             for classifying plants. Science, 132, pp. 1115--1118.
description  The Rogers/Tanimoto Similarity for binary data. Similar to
             the simple matching coefficient, but putting double weight
             on the discordant pairs.
```
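The effect of the double weight on discordant pairs can be seen by evaluating the formula next to simple matching. This Python sketch uses hypothetical function names and the counts from var1 and var2 of Table 1; the closed form SM / (2 - SM) follows algebraically from the two formulas and is not quoted from the proxy documentation.

```python
import math


def simple_matching(a, b, c, d):
    """(a + d) / n."""
    return (a + d) / (a + b + c + d)


def rogers_tanimoto(a, b, c, d):
    """(a + d) / (a + 2b + 2c + d): discordant pairs counted twice."""
    return (a + d) / (a + 2 * b + 2 * c + d)


a, b, c, d = 1, 2, 1, 1
sm = simple_matching(a, b, c, d)
rt = rogers_tanimoto(a, b, c, d)
print(sm, rt)  # 0.4 0.25

# The denominator is n + (b + c) = 2n - (a + d), hence RT = SM / (2 - SM)
print(math.isclose(rt, sm / (2 - sm)))  # True
```

Rogers/Tanimoto is therefore never larger than simple matching, and the two coincide only at 0 and 1.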

### Dice

```
names        Dice, Czekanowski, Sorensen
type         binary
loop         TRUE
formula      2a / (2a + b + c)
reference    Dice, L.R. (1945). Measures of the amount of ecologic
             association between species. Ecology, 26, pp. 297--302.
description  The Dice Similarity
```
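Because 2a + b + c is the number of 1s in the first variable plus the number of 1s in the second, the Dice formula can be computed directly from the two binary vectors. The following Python sketch is illustrative only: the function name is hypothetical, the data are var1 and var2 from Table 1, and returning NaN for two all-zero vectors is an assumption.

```python
import math


def dice(x, y):
    """Dice similarity 2a / (2a + b + c) for two binary sequences."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    # (a + c) + (a + b): number of 1s in x plus number of 1s in y
    total = sum(1 for xi in x if xi) + sum(1 for yi in y if yi)
    if total == 0:
        return math.nan  # assumption: undefined when both vectors are all zero
    return 2 * a / total


var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]
d_value = dice(var1, var2)
print(d_value)  # 2 * 1 / (2 + 3) = 0.4

# Relation to the Jaccard index J = a / (a + b + c): Dice = 2J / (1 + J)
j_value = 0.25  # Jaccard for the same pair: 1 / (1 + 2 + 1)
print(math.isclose(d_value, 2 * j_value / (1 + j_value)))  # True
```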

### Phi

```
names        Phi
type         binary
loop         TRUE
formula      (ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]
reference    Sokal, R.R, and Sneath, P.H.A. (1963). Principles of
             numerical taxonomy. W.H. Freeman and Company, San
             Francisco.
description  The Phi Similarity (= Product-Moment-Correlation for binary
             variables)
```
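The equivalence with the product-moment correlation can be checked numerically. The Python sketch below computes Phi from the counts and a plain Pearson correlation from the raw 0/1 columns var1 and var2 of Table 1; both function names are hypothetical and the NaN convention for a zero denominator is an assumption.

```python
import math


def phi(a, b, c, d):
    """(ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return math.nan  # assumption: undefined when a variable is constant
    return (a * d - b * c) / denom


def pearson(x, y):
    """Product-moment correlation computed directly from the data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]

phi_value = phi(1, 2, 1, 1)   # counts a, b, c, d for var1 versus var2
r_value = pearson(var1, var2)
print(phi_value, r_value)     # both about -0.167 (that is, -1/6)
print(math.isclose(phi_value, r_value))  # True
```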

### Stiles

```
names        Stiles
type         binary
loop         TRUE
formula      log(n(|ad-bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)])
reference    Stiles, H.E. (1961). The association factor in information
             retrieval. Communications of the ACM, 8, 1, pp. 271--279.
description  The Stiles Similarity (used for information retrieval).
             Identical to the logarithm of Krylov's distance.
```

### Michael

```
names        Michael
type         binary
loop         TRUE
formula      4(ad - bc) / [(a + d)^2 + (b + c)^2]
reference    Cox, T.F., and Cox, M.A.A. (2001). Multidimensional
             Scaling. Chapman and Hall.
description  The Michael Similarity
```

### Mozley

```
names        Mozley, Margalef
type         binary
loop         TRUE
formula      an / [(a + b)(a + c)]
reference    Margalef, D.R. (1958). Information theory in ecology.
             Gen.Systems, 3, pp. 36--71.
description  The Mozley/Margalef Similarity
```
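One way to read the Mozley/Margalef formula, assuming the denominator is the product (a + b)(a + c), is as the observed number of cooccurrences divided by the number expected if the two variables were independent, (a + b)(a + c) / n. The Python sketch below illustrates this reading with the counts from var1 and var2 of Table 1; the function names are hypothetical and the interpretation is an algebraic observation, not a statement from the proxy documentation.

```python
import math


def mozley_margalef(a, b, c, d):
    """an / [(a + b)(a + c)]."""
    n = a + b + c + d
    denom = (a + b) * (a + c)
    if denom == 0:
        return math.nan  # assumption: undefined when one variable is never 1
    return a * n / denom


def expected_cooccurrence(a, b, c, d):
    """Cooccurrences expected under independence: (a + b)(a + c) / n."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


a, b, c, d = 1, 2, 1, 1
index = mozley_margalef(a, b, c, d)
expected = expected_cooccurrence(a, b, c, d)
print(index)     # 5/6, about 0.833
print(expected)  # 1.2

# Values below 1 mean fewer cooccurrences than independence would give
print(math.isclose(index, a / expected))  # True
```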

### Yule

```
names        Yule
type         binary
loop         TRUE
reference    Yule, G.U. (1912). On measuring associations between
             attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
description  Yule Similarity
```

### Yule2

```
names        Yule2
type         binary
loop         TRUE
reference    Yule, G.U. (1912). On measuring associations between
             attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
description  Yule Similarity
```

### Ochiai

```
names        Ochiai
type         binary
loop         TRUE
formula      a / sqrt[(a + b)(a + c)]
reference    Sokal, R.R, and Sneath, P.H.A. (1963). Principles of
             numerical taxonomy. W.H. Freeman and Company, San
             Francisco.
description  The Ochiai Similarity
```
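For 0/1 vectors, the Ochiai formula gives the same value as the cosine of the angle between the two vectors, since a is their dot product and a + b and a + c are their squared norms. The Python sketch below checks this on var1 and var2 of Table 1; the function names are hypothetical and the NaN convention is an assumption.

```python
import math


def ochiai(a, b, c):
    """a / sqrt[(a + b)(a + c)]."""
    denom = math.sqrt((a + b) * (a + c))
    if denom == 0:
        return math.nan  # assumption: undefined when one variable is never 1
    return a / denom


def cosine(x, y):
    """Cosine similarity computed directly from the two vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]

o_value = ochiai(1, 2, 1)       # counts a, b, c for var1 versus var2
cos_value = cosine(var1, var2)
print(o_value, cos_value)       # both 1 / sqrt(6), about 0.408
print(math.isclose(o_value, cos_value))  # True
```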

### Simpson

```
names        Simpson
type         binary
loop         TRUE
formula      a / min{(a + b), (a + c)}
reference    Simpson, G.G. (1960). Notes on the measurement of faunal
             resemblance. American Journal of Science 258-A: 300-311.
description  The Simpson Similarity (used in Zoology).
```

### Braun-Blanquet

```
names        Braun-Blanquet
type         binary
loop         TRUE
formula      a / max{(a + b), (a + c)}
reference    Braun-Blanquet, J. (1964): Pflanzensoziologie. Springer
             Verlag, Wien and New York.
description  The Braun-Blanquet Similarity (used in Biology).
```
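Simpson and Braun-Blanquet differ only in dividing a by the smaller or the larger of the two marginal totals, while Ochiai divides by their geometric mean, so the three values are always ordered. The Python sketch below illustrates this with the counts from var1 and var2 of Table 1; the function names and the NaN convention are assumptions made for the example.

```python
import math


def simpson(a, b, c):
    """a / min{(a + b), (a + c)}."""
    m = min(a + b, a + c)
    return a / m if m else math.nan  # assumption: NaN when a margin is 0


def braun_blanquet(a, b, c):
    """a / max{(a + b), (a + c)}."""
    m = max(a + b, a + c)
    return a / m if m else math.nan


def ochiai(a, b, c):
    """a / sqrt[(a + b)(a + c)], shown for comparison."""
    denom = math.sqrt((a + b) * (a + c))
    return a / denom if denom else math.nan


a, b, c = 1, 2, 1
bb = braun_blanquet(a, b, c)
oc = ochiai(a, b, c)
si = simpson(a, b, c)
print(bb, oc, si)       # about 0.333, 0.408, 0.5

# min <= geometric mean <= max, hence Braun-Blanquet <= Ochiai <= Simpson
print(bb <= oc <= si)   # True
```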

### Pearson
