
details of the similarity indices

The similarity indices offered in iramuteq are those available in the proxy library written by David Meyer and Christian Buchta. The following description of the indices is taken from that library's documentation.

introduction

 

id  var1  var2  var3
 1     1     0     1
 2     0     1     1
 3     1     1     0
 4     0     0     1
 5     0     1     1

Table 1

          var1
          1  0
var2  1   a  b
      0   c  d

Table 2

n = a + b + c + d = number of rows in the data table (5 in Table 1)
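
As a worked example, the counts a, b, c and d for var1 and var2 of Table 1 can be obtained by tallying the rows. The sketch below is illustrative only: the function name `contingency` is an assumption and is not part of iramuteq or proxy.

```python
# Columns var1 and var2 of Table 1, one entry per row (id 1 to 5).
var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]

def contingency(x, y):
    """Return (a, b, c, d) laid out as in Table 2, with x as var1 and y as var2."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)  # both present
    b = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)  # var2 only
    c = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)  # var1 only
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)  # both absent
    return a, b, c, d

a, b, c, d = contingency(var1, var2)
n = a + b + c + d
print(a, b, c, d, n)
```

For var1 and var2 this gives a = 1, b = 2, c = 1, d = 1 and n = 5.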

indices

cooccurrence

 a 

percentage of cooccurrence

see Russel

Jaccard

      names Jaccard, binary, Reyssac, Roux
       type binary
       loop FALSE
    formula a / (a + b + c)
  reference Jaccard, P. (1908). Nouvelles recherches sur la
            distribution florale. Bull. Soc. Vaud. Sci. Nat., 44, pp.
            223--270.
description The Jaccard Similarity (C implementation) for binary data.
            It is the proportion of (TRUE, TRUE) pairs, but not
            considering (FALSE, FALSE) pairs. So it compares the
            intersection with the union of object sets.
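
The "intersection over union" reading can be checked on var1 and var2 of Table 1. This is a minimal sketch in plain Python, not the C implementation mentioned above; the function name `jaccard` is hypothetical.

```python
def jaccard(x, y):
    """a / (a + b + c) for two equal-length 0/1 sequences."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)          # (TRUE, TRUE) pairs
    discordant = sum(1 for xi, yi in zip(x, y) if xi != yi)  # b + c
    return a / (a + discordant)  # ZeroDivisionError if both sequences are all zero

var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]

# Same quantity computed from the sets of row ids where each variable is 1.
rows1 = {i for i, v in enumerate(var1, start=1) if v}
rows2 = {i for i, v in enumerate(var2, start=1) if v}
by_sets = len(rows1 & rows2) / len(rows1 | rows2)

print(jaccard(var1, var2), by_sets)
```

Both computations give 1/4 here: one shared row (id 3) out of four rows where at least one variable is 1.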

Kulczynski1

      names Kulczynski1
       type binary
       loop TRUE
    formula a / (b + c)
  reference Kurczynski, T.W. (1970). Generalized distance and discrete
            variables. Biometrics, 26, pp. 525--534.
description Kulczynski Similarity for binary data. Relates the (TRUE,
            TRUE) pairs to discordant pairs.

Kulczynski2

      names Kulczynski2
       type binary
       loop TRUE
    formula [a / (a + b) + a / (a + c)] / 2
  reference Kurczynski, T.W. (1970). Generalized distance and discrete
            variables. Biometrics, 26, pp. 525--534.
description Kulczynski Similarity for binary data. Relates the (TRUE,
            TRUE) pairs to the discordant pairs.
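
The two Kulczynski formulas can be compared on the same counts. The sketch below takes the cell counts of Table 2 as arguments; the function names are hypothetical, and how proxy itself handles a zero denominator is not covered here.

```python
def kulczynski1(a, b, c):
    """a / (b + c): (TRUE, TRUE) pairs over discordant pairs."""
    return a / (b + c)  # undefined (ZeroDivisionError) when b + c == 0

def kulczynski2(a, b, c):
    """[a / (a + b) + a / (a + c)] / 2: mean of the two proportions."""
    return (a / (a + b) + a / (a + c)) / 2

# Example counts: a = 1, b = 2, c = 1.
print(kulczynski1(1, 2, 1), kulczynski2(1, 2, 1))

# Kulczynski1 is not bounded by 1, whereas Kulczynski2 stays in [0, 1].
print(kulczynski1(3, 1, 0), kulczynski2(3, 1, 0))
```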

Mountford

      names Mountford
       type binary
       loop TRUE
    formula 2a / (ab + ac + 2bc)
  reference Mountford, M.D. (1962). An index of similarity and its
            application to classificatory problems. In P.W. Murphy
            (ed.), Progress in Soil Zoology, pp. 43--50. Butterworth,
            London.
description The Mountford Similarity for binary data.

Fager

      names Fager, McGowan
       type binary
       loop TRUE
    formula a / sqrt( (a + b)(a + c) ) - 1 / 2 sqrt(a + c)
  reference Fager, E. W. and McGowan, J. A. (1963). Zooplankton species
            groups in the North Pacific. Science, N. Y. 140: 453-460
description The Fager / McGowan distance.

Russel

      names Russel, Rao
       type binary
       loop TRUE
    formula a / n
  reference Russell, P.F., and Rao T.R. (1940). On habitat and
            association of species of anopheline larvae in
            southeastern Madras. J. Malaria Inst. India, 3, pp.
            153--178.
description The Russel/Rao Similarity for binary data. It is just the
            proportion of (TRUE, TRUE) pairs.

simple matching

      names simple matching, Sokal/Michener
       type binary
       loop TRUE
    formula (a + d) / n
  reference Sokal, R.R., and Michener, C.D. (1958). A statistical
            method for evaluating systematic relationships. Univ.
            Kansas Sci. Bull., 39, pp. 1409--1438.
description The Simple Matching Similarity for binary data. It is the
            proportion of concordant pairs.

Hamman

      names Hamman
       type binary
       loop TRUE
    formula ([a + d] - [b + c]) / n
  reference Hamann, U. (1961). Merkmalbestand und
            Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum
            System der Monokotyledonen. Willdenowia, 2, pp. 639-768.
description The Hamman Matching Similarity for binary data. It is the
            proportion difference of the concordant and discordant
            pairs.
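
Since b + c = n - (a + d), the Hamman formula is a rescaling of simple matching: Hamman = 2 * (simple matching) - 1, so it ranges over [-1, 1] instead of [0, 1]. A short check, with hypothetical function names:

```python
def simple_matching(a, b, c, d):
    """(a + d) / n: proportion of concordant pairs."""
    return (a + d) / (a + b + c + d)

def hamman(a, b, c, d):
    """([a + d] - [b + c]) / n: concordant minus discordant proportion."""
    return ((a + d) - (b + c)) / (a + b + c + d)

a, b, c, d = 1, 2, 1, 1          # example counts, n = 5
sm = simple_matching(a, b, c, d)  # (1 + 1) / 5
h = hamman(a, b, c, d)            # (2 - 3) / 5
print(sm, h, 2 * sm - 1)
```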

Faith

      names Faith
       type binary
       loop TRUE
    formula (a + d/2) / n
  reference Belbin, L., Marshall, C. & Faith, D.P. (1983). Representing
            relationships by automatic assignment of colour. The
            Australian Computing Journal 15, 160-163.
description The Faith similarity

Tanimoto

      names Tanimoto, Rogers
       type binary
       loop TRUE
    formula (a + d) / (a + 2b + 2c + d)
  reference Rogers, D.J, and Tanimoto, T.T. (1960). A computer program
            for classifying plants. Science, 132, pp. 1115--1118.
description The Rogers/Tanimoto Similarity for binary data. Similar to
            the simple matching coefficient, but putting double weight
            on the discordant pairs.
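
The effect of the double weight can be seen by computing both coefficients on the same counts. The sketch below uses hypothetical function names; the identity RT = SM / (2 - SM) follows directly from the two formulas.

```python
def simple_matching(a, b, c, d):
    """(a + d) / n."""
    return (a + d) / (a + b + c + d)

def rogers_tanimoto(a, b, c, d):
    """(a + d) / (a + 2b + 2c + d): discordant pairs counted twice."""
    return (a + d) / (a + 2 * b + 2 * c + d)

a, b, c, d = 1, 2, 1, 1
sm = simple_matching(a, b, c, d)
rt = rogers_tanimoto(a, b, c, d)
print(sm, rt, sm / (2 - sm))
```

The Rogers/Tanimoto value is never larger than the simple matching value, and the two agree only when there are no discordant pairs.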

Dice

      names Dice, Czekanowski, Sorensen
       type binary
       loop TRUE
    formula 2a / (2a + b + c)
  reference Dice, L.R. (1945). Measures of the amount of ecologic
            association between species. Ecology, 26, pp. 297--302.
description The Dice Similarity
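
Like Jaccard, the Dice formula ignores (FALSE, FALSE) pairs, but it counts the (TRUE, TRUE) pairs twice. The two are linked by Dice = 2J / (1 + J), where J is the Jaccard value. A minimal check with hypothetical function names:

```python
def jaccard(a, b, c):
    """a / (a + b + c)."""
    return a / (a + b + c)

def dice(a, b, c):
    """2a / (2a + b + c)."""
    return 2 * a / (2 * a + b + c)

a, b, c = 1, 2, 1
j = jaccard(a, b, c)   # 1 / 4
s = dice(a, b, c)      # 2 / 5
print(j, s, 2 * j / (1 + j))
```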

Phi

      names Phi
       type binary
       loop TRUE
    formula (ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]
  reference Sokal, R.R, and Sneath, P.H.A. (1963). Principles of
            numerical taxonomy. W.H. Freeman and Company, San
            Francisco.
description The Phi Similarity (= Product-Moment-Correlation for binary
            variables)
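
The equivalence with the product-moment correlation can be verified on var1 and var2 of Table 1 (a = 1, b = 2, c = 1, d = 1). This is a sketch using only the standard library; the function names are hypothetical.

```python
import math

def phi(a, b, c, d):
    """(ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]
print(phi(1, 2, 1, 1), pearson(var1, var2))
```

Both values equal -1/6 here. The formula is undefined when any marginal total is zero, that is, when one of the variables is constant.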

Stiles

      names Stiles
       type binary
       loop TRUE
    formula log(n(|ad-bc| - 0.5n)^2 / [(a + b)(c + d)(a + c)(b + d)])
  reference Stiles, H.E. (1961). The association factor in information
            retrieval. Communications of the ACM, 8, 1, pp. 271--279.
description The Stiles Similarity (used for information retrieval).
            Identical to the logarithm of Krylov's distance.

Michael

      names Michael
       type binary
       loop TRUE
    formula 4(ad - bc) / [(a + d)^2 + (b + c)^2]
  reference Cox, T.F., and Cox, M.A.A. (2001). Multidimensional
            Scaling. Chapman and Hall.
description The Michael Similarity

Mozley

      names Mozley, Margalef
       type binary
       loop TRUE
    formula an / [(a + b)(a + c)]
  reference Margalef, D.R. (1958). Information theory in ecology.
            Gen. Systems, 3, pp. 36--71.
description The Mozley/Margalef Similarity

Yule

      names Yule
       type binary
       loop TRUE
    formula (ad - bc) / (ad + bc)
  reference Yule, G.U. (1912). On measuring associations between
            attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
description Yule Similarity

Yule2

      names Yule2
       type binary
       loop TRUE
    formula (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))
  reference Yule, G.U. (1912). On measuring associations between
            attributes. J. Roy. Stat. Soc., 75, pp. 579--642.
description Yule Similarity

Ochiai

      names Ochiai
       type binary
       loop TRUE
    formula a / sqrt[(a + b)(a + c)]
  reference Sokal, R.R, and Sneath, P.H.A. (1963). Principles of
            numerical taxonomy. W.H. Freeman and Company, San
            Francisco.
description The Ochiai Similarity
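
For 0/1 vectors, a is the dot product and a + b and a + c are the squared norms, so the Ochiai formula coincides with the cosine of the angle between the two vectors. A sketch on var1 and var2 of Table 1, with hypothetical function names:

```python
import math

def ochiai(a, b, c):
    """a / sqrt[(a + b)(a + c)]."""
    return a / math.sqrt((a + b) * (a + c))

def cosine(x, y):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

var1 = [1, 0, 1, 0, 0]
var2 = [0, 1, 1, 0, 1]
# Counts for var1 and var2: a = 1, b = 2, c = 1.
print(ochiai(1, 2, 1), cosine(var1, var2))
```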

Simpson

      names Simpson
       type binary
       loop TRUE
    formula a / min{(a + b), (a + c)}
  reference Simpson, G.G. (1960). Notes on the measurement of faunal
            resemblance. American Journal of Science 258-A: 300-311.
description The Simpson Similarity (used in Zoology).

Braun-Blanquet

      names Braun-Blanquet
       type binary
       loop TRUE
    formula a / max{(a + b), (a + c)}
  reference Braun-Blanquet, J. (1964): Pflanzensoziologie. Springer
            Verlag, Wien and New York.
description The Braun-Blanquet Similarity (used in Biology).
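
Simpson and Braun-Blanquet differ only in which marginal total divides a: the smaller one or the larger one. The sketch below, with hypothetical function names, shows that Braun-Blanquet can never exceed Simpson and that the two coincide when a + b = a + c.

```python
def simpson(a, b, c):
    """a / min{(a + b), (a + c)}."""
    return a / min(a + b, a + c)

def braun_blanquet(a, b, c):
    """a / max{(a + b), (a + c)}."""
    return a / max(a + b, a + c)

a, b, c = 1, 2, 1
print(simpson(a, b, c), braun_blanquet(a, b, c))   # 1/2 and 1/3

# Equal marginal totals (b == c) give the same value for both.
print(simpson(2, 1, 1), braun_blanquet(2, 1, 1))
```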

Chi-squared

Phi-squared

Tschuprow

Cramer

Pearson
