Multidimensional Scaling

Dominance approach vs. ideal-point approach in item selection

2009-04-23T14:46:00.000-07:00

Dominance approach (Coombs, 1964; Likert, 1932)

It is about measuring people's ability.
It uses items of high internal consistency.
Therefore, if a person scores low on one item, he/she should score low on the total score as well. Likewise, if I score higher on the item than you do, my ability would be dominant over your ability.
In IRT terminology, DIF (Differential Item Functioning) refers to "a difference in the probability of endorsing an item for members of a reference group (e.g., US workers) and a focal group (e.g., Chinese workers) having the same standing on the latent attribute measured by a test." It is related to the dominance approach.

Ideal-point approach (Thurstone, 1928)

It is about measuring people's attitude.
Individuals will endorse an item to the degree that it reflects their own attitude.
More neutral items should be included.
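
To make the contrast concrete, here is a minimal sketch of the two kinds of response curves. The functional forms are my own illustrative assumptions (a logistic curve for the dominance approach and a Gaussian-shaped curve for the ideal-point approach), and the names dominance_prob, ideal_point_prob, b, and delta are hypothetical; they are not taken from the cited papers.

```python
import math

def dominance_prob(theta, b):
    """Dominance approach (assumed logistic form): endorsement keeps
    rising as the person's standing theta exceeds the item location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ideal_point_prob(theta, delta):
    """Ideal-point approach (assumed Gaussian-shaped form): endorsement
    peaks when the person's standing theta matches the item location delta."""
    return math.exp(-((theta - delta) ** 2))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
dominance_curve = [dominance_prob(t, b=0.0) for t in thetas]
ideal_point_curve = [ideal_point_prob(t, delta=0.0) for t in thetas]

for t, dom, ideal in zip(thetas, dominance_curve, ideal_point_curve):
    print(f"theta={t:+.1f}  dominance={dom:.3f}  ideal-point={ideal:.3f}")
```

The dominance curve only goes up with theta, while the ideal-point curve falls off on both sides of the item, which is one way to see why neutral items are informative under the ideal-point approach.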

Metric MDS and software

2009-04-21T14:27:00.000-07:00

Metric MDS includes the following variants (Borg & Groenen, 2005, p. 203):

ratio MDS:

(disparities) = b * (proximities in terms of dissimilarities; short for 'prox' below)

interval MDS:

(disparities) = a + b * (prox)

logarithmic MDS:

(disparities) = log(prox)
(disparities) = b * log(prox)
(disparities) = a + b * log(prox)

exponential MDS:

(disparities) = exp(prox)
(disparities) = b * exp(prox)
(disparities) = a + b * exp(prox)

power MDS (which includes square root with q = 0.5):

(disparities) = (prox)^q
(disparities) = b * (prox)^q
(disparities) = a + b * (prox)^q

polynomial MDS (i.e., spline MDS without interior knots):

(disparities) = a + b * (prox) + c * (prox)^2
(disparities) = a + b * (prox) + c * (prox)^2 + d * (prox)^3
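
As a quick illustration, the transformation families above can be written as one small function. This is only a sketch: the function name disparities and the keyword arguments are my own, and in a real MDS program the coefficients a, b, c, d would be estimated while minimizing stress rather than supplied by hand.

```python
import math

def disparities(prox, kind, a=0.0, b=1.0, c=0.0, d=0.0, q=1.0):
    """Map dissimilarities (prox) to disparities under one metric MDS model."""
    transforms = {
        "ratio": lambda p: b * p,
        "interval": lambda p: a + b * p,
        "log": lambda p: a + b * math.log(p),
        "exp": lambda p: a + b * math.exp(p),
        "power": lambda p: a + b * p ** q,
        "polynomial": lambda p: a + b * p + c * p ** 2 + d * p ** 3,
    }
    f = transforms[kind]
    return [f(p) for p in prox]

prox = [1.0, 4.0, 9.0]
print(disparities(prox, "ratio", b=2.0))
print(disparities(prox, "interval", a=1.0, b=2.0))
print(disparities(prox, "power", q=0.5))  # square-root special case
```

With a = 0 and b = 1, the log, exp, and power entries reduce to the simplest form listed for each family.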

However, software packages are not always clear about which kinds of metric MDS they perform. Based on my own testing as of 04/21/09, here is a comparison table:

Software Package	Program, version, date	Metric MDS supported
MATLAB 7.8.0.347 (R2009a)	mdscale() 1.1.6.9, 12/01/08 Criterion = 'metricstress'	Ratio only
smacof in R 0.9-0 (05/24/08)	smacofSym(), metric = TRUE	Ratio only
SPSS 17.0.0 (08/23/08)	Proxscal version 1.0	Ratio, Interval, Spline
SYSTAT 12.02.00	Multidimensional Scaling Shape = Square (similarities model)	Interval (Linear), Log, Power

To date, no program in any of these software packages provides combinations of two or more transformations, but such combinations could be very helpful. For example, log + polynomial may be of interest, because the log may be used to normalize residuals, while the polynomial may be able to pick up the trend of the data. That is,

(disparities) = a + b * log(prox) + c * (log(prox))^2
(disparities) = a + b * log(prox) + c * (log(prox))^2 + d * (log(prox))^3

Eigendecomposition and Singular Value Decomposition

2009-04-18T06:02:00.000-07:00

An eigenvalue and its eigenvector are a number and a nonzero vector that satisfy the following eigenequation:

matrix(transformation) * eigenvector = eigenvalue * eigenvector

Thus, if we can find such an eigenvector and its eigenvalue, the interpretation is: after being linearly transformed by the matrix, the eigenvector still lies along the same direction (it is only stretched, shrunk, or flipped by a factor equal to the eigenvalue). The eigenvalue can thus be considered some essential part of the matrix, or the characteristic value of the matrix. The eigenvector can be considered a tool to extract that essential part of the matrix.

A nice explanation can be found here; see also Borg and Groenen (2005) Chapter 7.

Eigendecomposition: for a symmetric matrix A, A = QΛQ', where the columns of Q are orthonormal eigenvectors (Q'Q = I)
Thus, AQ = QΛQ'Q = QΛ, where Λ is a diagonal matrix of eigenvalues

Singular Value Decomposition: matrix A = PΦQ'

P is a matrix of left singular vectors, Φ is a diagonal matrix of singular values, and Q is a matrix of right singular vectors. The name "singular" is probably chosen in the same spirit as "eigen": the expressions of the two decompositions are very similar, and both names probably refer to the essential and unique quality of the matrix.
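
Both decompositions can be checked numerically. The sketch below assumes NumPy is available; the small matrices A and B are made-up examples, and A is chosen to be symmetric so that A = QΛQ' holds with an orthonormal Q.

```python
import numpy as np

# A symmetric matrix, so its eigenvectors can be chosen orthonormal.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)        # eigenvalues ascending, eigenvectors in columns
Lam = np.diag(eigvals)

eigen_ok = np.allclose(A @ Q, Q @ Lam)        # A Q = Q Λ
recon_ok = np.allclose(Q @ Lam @ Q.T, A)      # A = Q Λ Q'

# SVD works for any rectangular matrix: B = P Φ Q'.
B = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])
P, phi, Qt = np.linalg.svd(B, full_matrices=False)
svd_ok = np.allclose(P @ np.diag(phi) @ Qt, B)

# Link between the two: squared singular values of B are the eigenvalues of B'B.
link_ok = np.allclose(np.sort(phi ** 2), np.linalg.eigvalsh(B.T @ B))

print(eigvals, eigen_ok, recon_ok, svd_ok, link_ok)
```

The last check is one reason the two decompositions look so alike: the SVD of B is tied to the eigendecomposition of the symmetric matrix B'B.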

Unfolding Models

2009-03-05T21:22:00.001-08:00

On the top of a folded handkerchief is the ideal point, representing the highest degree of preference for a particular individual, i.e., the optimal choice within a given set of items. The closer an item is to the ideal point, the higher this individual's preference for it; thus, the individual prefers choice 1 to choice 2 whenever choice 1 lies closer to the ideal point.

While different individuals have different ideal points on the handkerchief, unfolding the handkerchief will give us a 2D diagram showing all ideal points and all the items on a common space.
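
Here is a minimal sketch of how an unfolded configuration is read: each person's preference order is just the items sorted by distance from that person's ideal point. The coordinates and the names judge_A, judge_B, and choice_1 to choice_3 are made up for illustration; a real unfolding program would estimate the coordinates from preference data.

```python
import math

# Hypothetical 2D coordinates in the common (unfolded) space.
ideal_points = {"judge_A": (0.0, 0.0), "judge_B": (3.0, 1.0)}
items = {"choice_1": (1.0, 0.0), "choice_2": (2.0, 2.0), "choice_3": (4.0, 1.0)}

def preference_order(ideal, items):
    """Rank items from most to least preferred: closer to the ideal point is better."""
    return sorted(items, key=lambda name: math.dist(ideal, items[name]))

orders = {person: preference_order(point, items)
          for person, point in ideal_points.items()}

for person, order in orders.items():
    print(person, "prefers", " > ".join(order))
```

Note that this only reads a given configuration; it does not fit one.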

Some applications of unfolding models (adapted from this):

Application 1: In American Idol, a set of judges rate a set of contestants. Unfolding would display the ideal point of each judge as a point, and each contestant as a point. Three pieces of information will be revealed: (a) Judges with similar ideal points would cluster; (b) Contestants rated similarly would cluster; (c) The closeness between the ideal point of a judge and a contestant indicates how high the judge would rate the contestant.

Application 2: A set of TV brands (e.g., Panasonic, Sony, ...) were rated on a set of attributes (e.g., price, quality, style, ...). In the matrix, the rows are the brands and the columns are the attributes. Unfolding would display (the ideal point of) each brand as a point and each attribute as a point. Three pieces of information: (a) Similar brands (in terms of ideal points) would cluster; (b) Similar attributes would cluster; (c) Brands rated highly on a particular attribute would appear close to that attribute.

Application 3: Unfolding can also be used to display relationships that may not be symmetric, such as desire between people, trade-flows between nations, and journal citation frequency. Each journal would appear as both a row and a column. The matrix would contain the citation frequency of the row-journal by the column-journal. Self-citing is excluded. Unfolding would produce a diagram in which each journal would appear as two points: citing others and being cited by others. Clusters would have the obvious interpretation, and the distance between a journal’s two points would reflect the imbalances in its citation.

Other variants of unfolding models:

External unfolding models. Besides the preference data, we also have a pre-existing coordinate matrix of the choice objects.
Vector model of unfolding. Representing individuals by preference vectors instead of ideal points. Because it is the direction of the vector that matters, the preference vectors are usually scaled to have equal length.
Weighted unfolding.

Some terms and programs:

In marketing, the unfolding model is known as perceptual mapping.
In marketing, MDPREF ("MultiDimensional PREFerence") performs internal unfolding analysis, whereas PREFMAP ("PREFerence MAPping") performs external unfolding analysis.

Procrustes analysis

2009-03-03T09:36:00.002-08:00

The purpose of Procrustes analysis is to fit one MDS solution (configuration, map), B, to another one, A, and eliminate superficial differences between B and A, by means of rotating, mirror-reflecting, dilating/magnifying, shrinking, or shifting/moving B, without changing either's shape.

Application 1. A is the physical location map, whereas B is the travel-time map produced by MDS. In Procrustes analysis, we fit B to A, which allows us to display B on the top of A and to spot differences.

Application 2. Y is easy to interpret, whereas the initial X is not. In Procrustes analysis, we fit X to Y in order to interpret X.

Application 3. F is the result from the female participants, whereas M is that from the male participants. In Procrustes analysis, we fit M to F (or F to M) so that we can compare the results from males and females on the same page (provided that the fitting is satisfactory).

Application 4. CH is the result from Chinese participants, whereas AM is that from American participants. In Procrustes analysis, we fit CH to AM (or AM to CH) so that we can compare the cross-cultural results on the same page (provided that the fitting is satisfactory).
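
For readers who want to see the mechanics, here is a minimal sketch of fitting B to A with shifting, rotation/reflection, and rescaling, assuming NumPy is available. It uses the standard SVD solution of the orthogonal Procrustes problem; the function name procrustes_fit and the toy configurations are my own, and this is not the code of any particular package.

```python
import numpy as np

def procrustes_fit(A, B):
    """Fit configuration B to target A by shifting, rotating/reflecting,
    and uniformly rescaling B (the shape of B is not changed)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mean_A, mean_B = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mean_A, B - mean_B          # remove the shift
    U, s, Vt = np.linalg.svd(B0.T @ A0)
    T = U @ Vt                               # best rotation/reflection
    scale = s.sum() / (B0 ** 2).sum()        # best dilation/shrinkage
    fitted = scale * B0 @ T + mean_A
    return fitted, T, scale

# Toy check: B is A rotated by 60 degrees, doubled in size, and moved.
A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 2.0 * A @ R + np.array([5.0, -3.0])

fitted, T, scale = procrustes_fit(A, B)
print(np.round(fitted, 6), round(scale, 6))
```

In this toy case the fit is perfect because B differs from A only superficially; with real data, the remaining differences after fitting are the interesting part.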

MDS and social psychology

2009-01-29T07:59:00.000-08:00

Searching JPSP by scholar. The 12 results found are categorized as follows:

A. Structure of Emotion

1. Russell (1980) A circumplex model of affect: 28 emotion-denoting adjectives are reduced to a 2D space: pleasure-displeasure and arousal-sleepiness.

In the same year, Russell and Pratt (1980) also talked about the two dimensions on the meaning that persons attribute to environments.
Russell and Bullock (1985) followed up on Russell (1980) to show that the two dimensions reveal a basic property of the human conception of emotions, rather than represent an artifact that is due to semantic relations learned along with the emotion lexicon.
Russell, Weiss, and Mendelsohn (1989) followed up to develop a single-item scale, the Affect Grid, to quickly assess affect along the dimensions of pleasure-displeasure and arousal-sleepiness.
Feldman (1995) interpreted the 2D as valence-focus and arousal-focus and suggested their relation to Positive Affect and Negative Affect.
Barrett (2004) followed up on Feldman (1995) to talk about how valence-focus and arousal-focus are related to cognitive structure of emotion language vs. phenomenological experience.
Extending Russell's model, Larsen, McGraw, and Cacioppo (2001) argued that people can feel happy and sad at the same time; they do not have to experience positive-negative emotions in a bipolar way.

B. Structure of Self-Other Relationship:

2. Falbo (1977) Multidimensional scaling of power strategies: 16 strategies of "How I Get My Way." reduced to a 2D space: (a) rational/nonrational and (b) direct/indirect.

3. Bartholomew and Horowitz (1991) examined a model of individual differences in adult attachment in which two underlying dimensions, the person's internal model of the self (positive or negative) and the person's internal model of others (positive or negative), were used to define four attachment patterns. (as seen in General Discussion)

4. Wiggins, Phillips, and Trapnell (1989) interpersonal circumplex: dominant/submissive and agreeable/cold-hearted.

Gurtman (1992) applied this to plot individuals' profiles of high/low trust and high/low Machiavellianism.

5. Walker and Hennig (2004) studied the underlying 2D for the three exemplars of morality: just, brave, and caring, and found different 2D for each of them.

6. Abele and Wojciszke (2007) found that a large number of trait names can be organized into the 2D space of agency and communion.

7. Grouzet et al. (2005) found that 11 types of goals can be organized into a 2D space of intrinsic (e.g., self-acceptance, affiliation) versus extrinsic (e.g., financial success, image), and self-transcendent (e.g., spirituality) versus physical (e.g., hedonism). This result has cross-cultural validity.

(Incomplete) list of MDS researchers

2009-01-28T14:03:00.000-08:00

Warren S. Torgerson:

former professor at Johns Hopkins
developed MDS while he was a PhD student
known for the classical scaling (a.k.a. Torgerson scaling) in MDS
the solution from Torgerson scaling can be used as the initial configuration; it is a rational starting configuration, but the iterations that follow it can still be prone to local minima

Louis E. Guttman:

former president of the Psychometric Society
developed the Guttman loss function (available in SYSTAT)

Roger N. Shepard:

former president of the Psychometric Society
professor of cognitive psychology at Stanford University (Emeritus)
known for Shepard diagram

Joseph B. Kruskal:

former president of the Psychometric Society
former president of the Classification Society of North America
developed stress formula 1 and formula 2
developed the program of KYST (Kruskal, Young, & Seery, 1973)

Forrest W. Young:

former president of the Psychometric Society
professor of quantitative psychology at the University of North Carolina at Chapel Hill (Emeritus)
developer of ALSCAL (alternating least squares scaling) (available in SPSS)

J. Douglas Carroll:

former president of the Psychometric Society
professor of management and psychology at Rutgers University
developer of INDSCAL (individual differences scaling)

Jan de Leeuw:

former president of the Psychometric Society
developer of smacof package in R

Lawrence J. Hubert:

former president of the Psychometric Society
contributor to combinatorial data analysis
applied dynamic programming to combinatorial data analysis
developer of methods for city-block MDS

Ingwer Borg and Patrick J. F. Groenen:

authors of the "bible" of MDS: Borg, I., & Groenen, P.J.F. (2005). Modern multidimensional scaling. 2nd edition. New York: Springer.

Software

2009-01-28T13:59:00.000-08:00

Package: stats

cmdscale(): classical (metric) MDS (an example can be found here)

Package: proxy

dist(): distance matrix

Package: MASS (Modern Applied Statistics with S)

isoMDS(): Kruskal's non-metric MDS (an example can be found here)
Shepard(): for drawing Shepard diagram
sammon(): Sammon's non-metric MDS (similar to Kruskal's non-metric MDS but independently developed)

Package: smacof (Scaling by MAjorizing a COmplicated Function; a paper is here)

smacofSym(): for symmetric dissimilarity matrices
smacofRect(): for rectangular input matrices, i.e., unfolding
smacofIndDiff(): individual difference MDS
smacofSphere.primal(): projection of the resulting configurations onto spheres
smacofSphere.dual(): indirect function to solve linear problems, sometimes faster than primal
sim2diss(): convert similarity matrix to dissimilarity matrix

Package: labdsv (Laboratory for Dynamic Synthetic Vegephenomenology)

nmds(): application of isoMDS()

Package: vegan (R functions for vegetation ecologists)

metaMDS(): an integration of initMDS(), isoMDS(), postMDS(), and wascores()
procrustes(): for the Procrustes Problem
wcmdscale(): weighted classical (metric) multidimensional scaling

Package: rggobi

ggobi(): interactive multidimensional scaling using ggobi and ggvis for display

Useful Links

Task View of the Comprehensive R Archive Network
Notes on the use of R for psychology experiments and questionnaires: here via here
R site search

SYSTAT:

use EM to estimate missing data in nonmetric unfolding model
power transformation (metric MDS)
log transformation (metric MDS)

PERMAP: a highly entertaining, interactive tool to explore perceptual mapping

SPSS: proxscal, prefscal, alscal

MATLAB: mdscale()

A more complete list of MDS software can be found here.

Internal and external analyses

2009-01-27T00:53:00.000-08:00

To facilitate the interpretation of the dimensions in the reduced space, we may do internal or external analyses.

In internal analysis, we use the same proximities data, run an alternative analysis method (e.g., cluster analysis) on them, and embed the results within the MDS solution. If the different methods all converge on the same interpretation, then we can be more confident in it!

In external analysis ("property fitting"), we use supplementary data. Specifically, we may try to predict the property (collected on the objects) for object_i from the 2D coordinates for the objects through multiple regression.

For example, in a study, the objects are 14 stressful experiences relevant to early parenting, and the two dimensions are labeled as "major vs. minor child problems" and "child welfare vs. self-welfare". The external property is "infuriating", and we want to predict "infuriating" for each of the 14 objects from the 2D coordinates for the 14 objects, which results in a directed line. It is found that infuriating tends to be associated with the problems of self-welfare as opposed to the welfare of the child.

In external analysis, we regress a given external attribute of the objects (e.g., "infuriating") on the 2D coordinates of the objects (i.e., dim 1 and dim 2), and the resulting unstandardized multiple regression coefficients form a point in the 2D space. A directed line is then drawn from the origin to that point. Evidently, the projections of the objects on this line give a set of 2D coordinates, (dim1, dim2), which correspond best to the external attribute (Borg & Groenen, 2005, pp. 77-79).
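
The regression step of property fitting can be sketched as follows, assuming NumPy is available. The coordinates and the attribute values are made up (the attribute is constructed to be exactly linear in the two dimensions so the result is easy to check), and the variable names are my own.

```python
import numpy as np

# Hypothetical 2D MDS coordinates for five objects (dim 1, dim 2).
coords = np.array([[ 1.0,  0.5],
                   [-0.5,  1.0],
                   [ 0.0, -1.0],
                   [-1.0, -0.5],
                   [ 0.5,  0.0]])

# Hypothetical external attribute, built here as 1 + 2*dim1 - 1*dim2.
attribute = 1.0 + 2.0 * coords[:, 0] - 1.0 * coords[:, 1]

# Multiple regression of the attribute on the coordinates (with intercept).
design = np.column_stack([np.ones(len(coords)), coords])
beta, *_ = np.linalg.lstsq(design, attribute, rcond=None)
intercept, b1, b2 = beta

# The coefficients (b1, b2) form the point that the directed line runs to.
direction = np.array([b1, b2])
unit = direction / np.linalg.norm(direction)

# Projections of the objects onto the directed line.
projections = coords @ unit
fit = np.corrcoef(projections, attribute)[0, 1]
print(np.round(direction, 6), round(fit, 6))
```

With real ratings the correlation between the projections and the attribute will be below 1, and its size tells us how well the attribute fits into the configuration.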

The scaling: Basic concepts

2009-01-26T20:17:00.001-08:00

The goal of scaling is to minimize the discrepancy between the proximities in the original space and the distances in the reduced space. Specifically,

p_ij is the proximity (typically, dissimilarity) between object_i and object_j in the original space, whereas d_ij is the Euclidean distance between object_i and object_j in the reduced space

We use a linear regression equation to predict d_ij from p_ij, and dhat_ij is the predicted value of d_ij. Then, we want to minimize the difference between d_ij and dhat_ij, using least squares. Here, we have the raw stress index (which we want to minimize):

(raw stress) = sum over all pairs i < j of (d_ij - dhat_ij)^2

Because the dimensions in the reduced space can be arbitrarily stretched or contracted, we normalize the raw stress index by the sum of squared distances:

(raw stress) / (sum over all pairs i < j of d_ij^2)

Also, a square root places the index in the same unit as d_ij, so we have the normalized stress index (which we want to minimize):

(stress) = sqrt( (sum over all pairs i < j of (d_ij - dhat_ij)^2) / (sum over all pairs i < j of d_ij^2) )

(Note: this is Kruskal's stress formula 1.)
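
As a small numeric sketch, Kruskal's stress formula 1 is the square root of the sum of squared differences between d_ij and dhat_ij, divided by the sum of squared d_ij. The function name stress1 and the toy numbers are my own; the inputs are the distances and their predicted values listed pair by pair.

```python
import math

def stress1(distances, disparities):
    """Kruskal's stress formula 1 for paired lists of d_ij and dhat_ij."""
    raw = sum((d - dhat) ** 2 for d, dhat in zip(distances, disparities))
    norm = sum(d ** 2 for d in distances)
    return math.sqrt(raw / norm)

d = [3.0, 4.0]                 # distances in the reduced space
print(stress1(d, [3.0, 4.0]))  # perfect fit gives zero stress
print(stress1(d, [3.0, 1.0]))  # raw stress 9, normalizer 25
```

Because of the normalization, stretching every distance and its prediction by the same factor leaves the index unchanged.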

Typically, a monotone regression (a.k.a. isotonic regression) is used instead of a linear regression, so that only the rank order of the proximities matters; this is non-metric MDS. If a linear regression is used, it is metric MDS.
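
To show what the monotone regression step does, here is a minimal sketch of the pool-adjacent-violators idea in plain Python. The function name monotone_regression is my own, and I assume the distances have already been sorted by their proximities (smallest dissimilarity first); real programs also have to decide how to handle ties.

```python
def monotone_regression(values):
    """Least-squares non-decreasing fit: whenever a value is smaller than
    the block before it, pool the two and replace them by their mean."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted

# Distances d_ij listed in the order of increasing dissimilarity p_ij.
d_sorted = [1.0, 3.0, 2.0, 4.0]
dhat = monotone_regression(d_sorted)
print(dhat)
```

The fitted values dhat respect the rank order of the proximities while staying as close as possible to the distances, and they are what goes into the stress index in non-metric MDS.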

According to Kruskal and Wish (1978), with non-metric MDS, at least 9 objects are required for a 2D solution, while at least 13 objects are required for a 3D solution.

Degenerate solution:

According to the Merriam-Webster dictionary, degenerate means "being mathematically simpler (as by having a factor or constant equal to zero) than the typical case".

In MDS, a degenerate solution is one with a zero (or very close to zero) stress value but retaining no (or minimal) structural information about the data. For example, the objects cluster into a few (e.g., 2) nodes and the dimensions are uninterpretable.

Why do we need MDS?

2009-01-23T07:12:00.000-08:00

Initially, researchers want to interpret a set of objects in terms of their relationships. However, the proximities (typically, dissimilarities) among them are in a high-dimensional space, which is beyond human comprehension. Being troubled, the researchers think,

Heck! Why don't we try to project the objects into a 2D space and display them on an X-Y plane? As human beings, we are much more familiar with an X-Y plane and such an interpretation will be more exciting!

Thus, dimension reduction and therefore information loss is involved in MDS, and the general purpose of an MDS program is to preserve the proximities between objects in the high-dimensional space as much as possible. An example of MDS in social psychology is that the 11 factors of the Aspiration Index are visually represented in a 2D plane. (And don't you like it more when you are familiar with the way of interpreting the results?!)

Some notes:

1. MDS is a visualization tool. The goal is to reduce the observed complexity in the data matrix to lower dimensions (2 or 3) for humans to visualize.

2. MDS is a descriptive tool, rather than an inferential tool (de Leeuw, 2001). However, a representative sample should be recruited in order to generalize the description to the population.

3. MDS is more flexible than factor analysis: (a) it doesn't require that the underlying data are distributed as multivariate normal, and (b) it can be applied to any kind of distances or similarities, rather than just the computed correlation matrix.

4. MDS is different from cluster analysis. The goal of MDS is not to group/partition objects, but users can still visually cluster objects based on MDS.

5. MDS is related to the self-organizing map (SOM) because they both enable visualizing low-dimensional views of high-dimensional data. However, SOM preserves data neighborhood, whereas MDS does not.

6. Besides dimensional representation (more exploratory), another goal of MDS is configural verification (more confirmatory).

7. The labeling of a dimension in MDS is arbitrary. The only requirement is that the two ends sum to zero at the center. It is similar to, but not the same as, bipolar, because it doesn't say anything about mutual exclusivity of the two ends in reality.

8. The number of dimensions is usually 2 (at most 3). On the one hand, the number should not be just 1; otherwise, gradient-based methods will typically get stuck in local optima. On the other hand, the number should not exceed 3; otherwise, visualization could be very difficult.

9. Another example of MDS would be to visualize the travel-times between cities. In the matrix, each row and each column would correspond to a city. MDS could then recreate a map containing the cities, solely from the matrix. This map would look similar to the actual map of city locations, but would differ in interesting ways. Cities connected by faster than average transportation passageways would appear closer together, while roadblocks would move cities apart.
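
To see how a map can be recreated solely from a matrix, here is a minimal sketch of classical (Torgerson) scaling, assuming NumPy is available. The "city" coordinates are made up, and I use plain straight-line distances instead of travel times so that the result is easy to verify; the function names are my own.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical scaling: double-center the squared distances, then use the
    top eigenvectors (scaled by root eigenvalues) as coordinates."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # scalar-product matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_dims]     # largest eigenvalues first
    lam = np.clip(eigvals[top], 0.0, None)       # guard against tiny negatives
    return eigvecs[:, top] * np.sqrt(lam)

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical city locations; only their distance matrix is given to MDS.
cities = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
D = pairwise_distances(cities)

recovered = classical_mds(D, n_dims=2)
print(np.allclose(pairwise_distances(recovered), D))
```

The recovered map reproduces all the distances, but it may be rotated, reflected, or shifted relative to the original coordinates, which is exactly the kind of superficial difference that Procrustes analysis removes. With travel times instead of straight-line distances, the fit would no longer be perfect, and the distortions would be the interesting part.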