Title: | Mixture Models with Component-Wise Factor Analyzers |
---|---|
Description: | We provide functions to fit finite mixtures of multivariate normal or t-distributions to data with various factor analytic structures adopted for the covariance/scale matrices. The factor analytic structures available include mixtures of factor analyzers and mixtures of common factor analyzers. The latter approach is so termed because the matrix of factor loadings is common to components before the component-specific rotation of the component factors to make them white noise. Note that the component-factor loadings are not common after this rotation. Maximum likelihood estimators of model parameters are obtained via the Expectation-Maximization algorithm. See descriptions of the algorithms used in McLachlan GJ, Peel D (2000) <doi:10.1002/0471721182.ch8>; McLachlan GJ, Peel D (2000) <ISBN:1-55860-707-2>; McLachlan GJ, Peel D, Bean RW (2003) <doi:10.1016/S0167-9473(02)00183-4>; McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007) <doi:10.1016/j.csda.2006.09.015>; Baek J, McLachlan GJ, Flack LK (2010) <doi:10.1109/TPAMI.2009.149>; Baek J, McLachlan GJ (2011) <doi:10.1093/bioinformatics/btr112>; McLachlan GJ, Baek J, Rathnayake SI (2011) <doi:10.1002/9781119995678.ch9>. |
Authors: | Suren Rathnayake, Geoff McLachlan, David Peel, Jangsun Baek |
Maintainer: | Suren Rathnayake |
License: | GPL (>= 2) |
Version: | 2.0.71 |
Built: | 2024-11-07 04:42:04 UTC |
Source: | https://github.com/suren-rathnayake/emmixmfa |
This package provides functions for fitting mixtures of factor analyzers (MFA) and mixtures of common factor analyzers (MCFA) models.
MFA and MCFA models belong to the class of finite mixture models that adopt factor models for the component-covariance matrices. More specifically, under the factor model, the correlations between feature variables can be explained by the linear dependence of these variables on a smaller number q of (unobservable) latent factors. The component distributions can be either from the family of multivariate normals or from the family of multivariate t-distributions. Maximum likelihood estimation of the model parameters is implemented using the Expectation–Maximization algorithm.
The joint distribution of the factors and errors can be taken to be either the multivariate normal or t-distribution. The factor analytic representation of the component-covariance matrices is a way of dimension reduction in that it enables the mixture distributions to be fitted to data with dimension p relatively large compared to the sample size n.
Unlike MFA, MCFA models can be used to display the observed data points in the q-dimensional factor space. MCFA also provides a greater reduction in the number of parameters in the model.
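As a rough illustration of the parameter savings, the base-R sketch below counts free parameters for a g-component normal mixture with unrestricted covariance matrices, an MFA model with component-specific diagonal error matrices, and an MCFA model, following the counts given by McLachlan and Peel (2000) and Baek et al. (2010). The helper functions are hypothetical and not part of EMMIXmfa; an MFA fit with a common error matrix (D_type = 'common') has (g - 1)p fewer parameters than the MFA count used here.

```r
# Hypothetical helpers (not EMMIXmfa functions): number of free parameters
# for g components, p variables and q factors.

# Normal mixture with unrestricted component covariance matrices
n_par_full <- function(g, p) {
  (g - 1) + g * p + g * p * (p + 1) / 2
}

# MFA with component-specific loadings B_i and diagonal error matrices D_i;
# q(q - 1)/2 is subtracted per component for the rotational indeterminacy of B_i
n_par_mfa <- function(g, p, q) {
  (g - 1) + 2 * g * p + g * (p * q - q * (q - 1) / 2)
}

# MCFA with common loadings A and a common diagonal D; q^2 is subtracted
# because A can be post-multiplied by any nonsingular q x q matrix
n_par_mcfa <- function(g, p, q) {
  (g - 1) + p + q * (p + g) + g * q * (q + 1) / 2 - q^2
}

# Example: 50 variables, 3 components, 2 factors
c(full = n_par_full(3, 50), mfa = n_par_mfa(3, 50, 2), mcfa = n_par_mcfa(3, 50, 2))
```

For p = 50, g = 3 and q = 2 these counts are 3977, 599 and 163, respectively.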
Package: | EMMIXmfa |
Type: | Package |
Version: | 2.0.4 |
Date: | 2018-09-17 |
License: | GPL (>= 2) |
Suren Rathnayake, Geoffrey McLachlan, David Peel, Jangsun Baek
Baek J, and McLachlan GJ (2008). Mixtures of factor analyzers with common factor loadings for the clustering and visualisation of high-dimensional data. Technical Report NI08018-SCH, Preprint Series of the Isaac Newton Institute for Mathematical Sciences, Cambridge.
Baek J, McLachlan GJ, and Flack LK (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 2089–2097.
Baek J, and McLachlan GJ (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27, 1269–1276.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
McLachlan GJ and Peel D (2000). Finite Mixture Models. New York: Wiley.
McLachlan GJ, and Peel D (2000). Mixtures of factor analyzers. In Proceedings of the Seventeenth International Conference on Machine Learning, P. Langley (Ed.). San Francisco: Morgan Kaufmann, pp. 599–606.
McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t distribution. Computational Statistics & Data Analysis, 51, 5327–5338.
McLachlan GJ, Peel D, and Bean RW (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis 41, 379–388.
set.seed(1)
Y <- iris[, -5]
mfa_model <- mfa(Y, g = 3, q = 3)
mtfa_model <- mtfa(Y, g = 3, q = 3)
mcfa_model <- mcfa(Y, g = 3, q = 3)
mctfa_model <- mctfa(Y, g = 3, q = 3)
Computes the adjusted Rand index.
ari(cls, hat_cls)
cls |
A numeric or character vector of labels. |
hat_cls |
A numeric or character vector of labels same length as |
Measures the agreement between two sets of partitions. The upper bound of 1 implies perfect agreement. The expected value is zero if the partitions are random.
A scalar specifying how closely two partitions agree.
Hubert L, and Arabie P (1985). Comparing Partitions. Journal of Classification 2, 193–218.
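To make the definition concrete, here is a minimal base-R sketch of the Hubert and Arabie (1985) index computed from the contingency table of the two partitions. The name ari_sketch is hypothetical and this is not the package's ari() source; it returns NaN in the degenerate case where both partitions put every observation in a single group.

```r
# Hypothetical helper: adjusted Rand index of two label vectors.
ari_sketch <- function(cls, hat_cls) {
  stopifnot(length(cls) == length(hat_cls))
  tab <- table(cls, hat_cls)              # cross-tabulation of the two partitions
  pairs <- function(x) sum(choose(x, 2))  # number of pairs placed in the same group
  index <- pairs(tab)                     # pairs grouped together in both partitions
  a <- pairs(rowSums(tab))
  b <- pairs(colSums(tab))
  expected <- a * b / choose(length(cls), 2)  # expected index under random labelling
  max_index <- (a + b) / 2
  (index - expected) / (max_index - expected)
}

ari_sketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))  # same partition up to relabelling
```

Because only the contingency table is used, the labels themselves never need to match; only the grouping matters.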
set.seed(1984)
Y <- scale(iris[, -5])
model <- mfa(Y, g = 3, q = 3, nkmeans = 1, nrandom = 0)
ari(model$clust, iris[, 5])
minmis(model$clust, iris[, 5])
This function computes factor scores for observations. Using factor scores, we can represent the original data points in a q-dimensional reduced space. This is only meaningful in the case of mcfa or mctfa models, as the factor scores for mfa and mtfa are white noise.
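The (estimated conditional expectation of the) unobservable factors $U_{ij}$ given $y_j$ and the component membership can be expressed by

$$\hat{u}_{ij} = E_{\hat{\Psi}}\{U_{ij} \mid y_j, z_{ij} = 1\}.$$

The estimated mean $\hat{u}_j$ (over the component membership of $y_j$) is given as

$$\hat{u}_j = \sum_{i=1}^{g} \tau_i(y_j; \hat{\Psi})\, \hat{u}_{ij},$$

where $\tau_i(y_j; \hat{\Psi})$ is the estimated posterior probability of $y_j$ belonging to the $i$th component.

An alternative estimate of $u_j$, the posterior expectation of the factor corresponding to the $j$th observation $y_j$, is defined by replacing $\tau_i(y_j; \hat{\Psi})$ by $\hat{z}_{ij}$, where $\hat{z}_{ij} = 1$ if $\tau_i(y_j; \hat{\Psi}) \geq \tau_h(y_j; \hat{\Psi})$ for all $h \neq i$ ($h = 1, \ldots, g$), else $\hat{z}_{ij} = 0$.

For MFA, we have

$$\hat{u}_{ij} = \hat{\beta}_i^{T} (y_j - \hat{\mu}_i)$$

and

$$\hat{u}_j = \sum_{i=1}^{g} \tau_i(y_j; \hat{\Psi})\, \hat{\beta}_i^{T} (y_j - \hat{\mu}_i)$$

for $j = 1, \ldots, n$, where $\hat{\beta}_i = (\hat{B}_i \hat{B}_i^{T} + \hat{D}_i)^{-1} \hat{B}_i$.

For MCFA,

$$\hat{u}_{ij} = \hat{\xi}_i + \hat{\gamma}_i^{T} (y_j - \hat{A} \hat{\xi}_i),$$

$$\hat{u}_j = \sum_{i=1}^{g} \tau_i(y_j; \hat{\Psi}) \{\hat{\xi}_i + \hat{\gamma}_i^{T} (y_j - \hat{A} \hat{\xi}_i)\},$$

where $\hat{\gamma}_i = (\hat{A} \hat{\Omega}_i \hat{A}^{T} + \hat{D})^{-1} \hat{A} \hat{\Omega}_i$.

With MtFA and MCtFA, the distributions of $\hat{u}_{ij}$ and of $\hat{u}_j$ have the same form as those of MFA and MCFA, respectively.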
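The MCFA formulas translate fairly directly into matrix code. The following base-R sketch assumes these shapes: Y is n x p, A is p x q, xi holds the component factor means in columns (q x g), omega is a q x q x g array, D is p x p, and tau is a supplied n x g matrix of posterior probabilities. The name mcfa_scores_sketch is hypothetical; this is not the package's factor_scores() implementation, which can also compute tau itself.

```r
# Hypothetical helper: MCFA factor scores (Uscores, Umean, Uclust) in base R.
mcfa_scores_sketch <- function(Y, A, xi, omega, D, tau) {
  Y <- as.matrix(Y)
  n <- nrow(Y); q <- ncol(A); g <- ncol(xi)
  Uscores <- array(NA_real_, c(n, q, g))
  for (i in seq_len(g)) {
    Sigma_i <- A %*% omega[, , i] %*% t(A) + D          # A Omega_i A' + D
    gamma_i <- solve(Sigma_i, A %*% omega[, , i])       # gamma_i, a p x q matrix
    centred <- sweep(Y, 2, drop(A %*% xi[, i]))         # rows y_j - A xi_i
    # row j holds xi_i + gamma_i' (y_j - A xi_i)
    Uscores[, , i] <- sweep(centred %*% gamma_i, 2, xi[, i], "+")
  }
  # Umean: average over components weighted by the posterior probabilities
  Umean <- Reduce(`+`, lapply(seq_len(g), function(i) tau[, i] * Uscores[, , i]))
  # Uclust: tau replaced by the 0/1 indicator of the most probable component
  z <- max.col(tau, ties.method = "first")
  Uclust <- t(sapply(seq_len(n), function(j) Uscores[j, , z[j]]))
  list(Uscores = Uscores, Umean = Umean, Uclust = Uclust)
}
```

For MFA the same loop would use $\hat{\beta}_i$ and the component means instead; the sketch keeps to MCFA because that is the case where the scores are informative.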
factor_scores(model, Y, ...)

## S3 method for class 'mcfa'
factor_scores(model, Y, tau = NULL, clust = NULL, ...)

## S3 method for class 'mctfa'
factor_scores(model, Y, tau = NULL, clust = NULL, ...)

## S3 method for class 'emmix'
plot(x, ...)
model |
An object of class |
x |
An object of class |
Y |
Data matrix with variables in columns in the same order as used in model estimation. |
tau |
Optional. Posterior probabilities of belonging to the components
in the mixture model. If not provided, they will be computed based on
the |
clust |
Optional. Indicators of belonging to the components.
If not provided, will be estimated using |
... |
Not used. |
Factor scores can be used in visualization of the data in the factor space.
Uscores |
Estimated conditional expected component scores of the
unobservable factors given the data and the component membership
( |
Umean |
Means of the estimated conditional expected factors scores over
estimated posterior distributions ( |
Uclust |
Alternative estimate of |
Geoff McLachlan, Suren Rathnayake, Jangsun Baek
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
McLachlan GJ, and Peel D (2000). Finite Mixture Models. New York: Wiley.
# Fit an MCFA model to a subset
set.seed(1)
samp_size <- dim(iris)[1]
sel_subset <- sample(1 : samp_size, 50)
model <- mcfa(iris[sel_subset, -5], g = 3, q = 2,
              nkmeans = 1, nrandom = 0, itmax = 100)

# Plot the data points in the factor space
plot(model)

# Allocate new samples to the clusters
Y <- iris[-c(sel_subset), -5]
Y <- as.matrix(Y)
clust <- predict(model, Y)
fa_scores <- factor_scores(model, Y)

# Visualize the new data in the factor space
plot_factors(fa_scores, type = "Umean", clust = clust)
Performs a matrix factorization on the given data set. The factorization is done using a stochastic gradient descent method.
gmf(Y, q, maxit = 1000, lambda = 0.01, cor_rate = 0.9)
Y |
data matrix containing all numerical values. |
maxit |
maximum number of iterations. |
q |
number of factors. |
lambda |
initial learning rate. |
cor_rate |
correction rate. |
Unsupervised matrix factorization of a data matrix $Y$ can be expressed as

$$Y^{T} \approx A B,$$

where $A$ is a $p \times q$ matrix and $B$ is a $q \times n$ matrix. With this matrix factorization method, one replaces the $j$th row in matrix $Y$ by the $j$th row in matrix $B^{T}$. The matrices $A$ and $B$ are chosen to minimize an objective function $f(Y; A, B)$ under constraints specific to the matrix factorization method.
It is imperative that columns of the data matrix be on the same scale. Otherwise, it may not be possible to obtain a factorization of the data using this approach.
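To illustrate the general idea (and why a common scale matters, since every entry contributes equally to a squared-error loss), here is a generic stochastic gradient descent factorization in base R. It is written under our own assumptions (squared-error loss, small random initial values, learning rate multiplied by cor_rate whenever the error increases) and is not the algorithm of Nikulin et al. (2011) or the gmf source; sgd_mf_sketch is a hypothetical name. It returns A (p x q) and B (q x n) such that t(Y) is approximated by A %*% B.

```r
# Hypothetical helper: plain SGD matrix factorization, t(Y) ~ A %*% B.
sgd_mf_sketch <- function(Y, q, maxit = 100, lambda = 0.01, cor_rate = 0.9) {
  Y <- as.matrix(Y)
  n <- nrow(Y); p <- ncol(Y)
  A <- matrix(rnorm(p * q, sd = 0.1), p, q)   # p x q, loading-like factor
  B <- matrix(rnorm(q * n, sd = 0.1), q, n)   # q x n, score-like factor
  rate <- lambda
  prev_err <- Inf
  err <- NA_real_
  for (it in seq_len(maxit)) {
    for (j in sample(n)) {                    # visit observations in random order
      for (k in seq_len(p)) {
        e <- Y[j, k] - sum(A[k, ] * B[, j])   # residual for entry (j, k)
        a_old <- A[k, ]
        A[k, ] <- A[k, ] + rate * e * B[, j]  # gradient step for row k of A
        B[, j] <- B[, j] + rate * e * a_old   # gradient step for column j of B
      }
    }
    err <- sqrt(mean((t(Y) - A %*% B)^2))     # root mean squared error
    if (err > prev_err) rate <- rate * cor_rate  # damp the step after an increase
    prev_err <- err
  }
  list(A = A, B = B, rmse = err)
}
```

For example, set.seed(1); fit <- sgd_mf_sketch(iris[, -5], q = 2) gives a 4 x 2 matrix fit$A and a 2 x 150 matrix fit$B whose columns are two-dimensional representations of the observations.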
A list containing,
A |
A numeric matrix of size |
B |
A numeric matrix of size |
Nikulin V, Huang T-H, Ng SK, Rathnayake SI, & McLachlan GJ (2011). A very fast algorithm for matrix factorization. Statistics & Probability Letters 81, 773–782.
lst <- gmf(iris[, -5], q = 2, maxit = 100)
Functions for fitting mixtures of common factor analyzers (MCFA) models. MCFA models are mixtures of factor analyzers (belonging to the class of multivariate finite mixture models) with a matrix of factor loadings that is common to the components before the transformation of the latent factors to be white noise. They are designed specifically for the task of displaying the observed data points in a lower (q-dimensional) space, where q is the number of factors adopted in the factor-analytic representation of the observed vector.
The mcfa function fits mixtures of common factor analyzers where the component distributions belong to the family of multivariate normal distributions. The mctfa function fits mixtures of common t-factor analyzers where the component distributions correspond to multivariate t-distributions. Maximum likelihood estimates of the model parameters are obtained using the Expectation–Maximization algorithm.
mcfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20, tol = 1.e-5,
     init_clust = NULL, init_para = NULL, init_method = NULL,
     conv_measure = 'diff', warn_messages = TRUE, ...)

mctfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20, tol = 1.e-5,
      df_init = rep(30, g), df_update = TRUE,
      init_clust = NULL, init_para = NULL, init_method = NULL,
      conv_measure = 'diff', warn_messages = TRUE, ...)
Y |
A matrix or a data frame of which rows correspond to observations and columns to variables. |
g |
Number of components. |
q |
Number of factors. |
itmax |
Maximum number of EM iterations. |
nkmeans |
The number of times the k-means algorithm is to be used to partition the data into |
nrandom |
The number of random |
tol |
The EM algorithm terminates if the measure of convergence falls below this value. |
init_clust |
A vector or matrix consisting of partitions of samples to be used in the EM algorithm. For a matrix of partitions, columns must correspond to individual partitions of the data. Optional. |
init_para |
A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional. |
init_method |
To determine how the initial parameter values are computed. See Details. |
conv_measure |
The default |
df_init |
Initial values of the degrees of freedom parameters for |
df_update |
If |
warn_messages |
With |
... |
Not used. |
With init_method = NULL, the default, model parameters are initialized using all available methods. With init_method = "rand-A", the initialization of the parameters is done using the procedure in Baek et al. (2010), where initial values for the elements of $A$ are drawn from the $N(0, 1)$ distribution. This method is appropriate when the columns of the data are on the same scale. The init_method = "eigen-A" takes the first $q$ eigenvectors of $Y$ as the initial value for the loading matrix $A$.
If init_method = "gmf", then the data are factorized using gmf with $q$ factors and the resulting loading matrix is used as the initial value for $A$.
If specified, the optional argument init_para must be a list or an object of class mcfa or mctfa. When fitting an mcfa model, only the model parameters q, g, pivec, A, xi, omega, and D are extracted from init_para, while one extra parameter, nu, is extracted when fitting mctfa. Everything else in init_para will be discarded.
Object of class c("emmix", "mcfa") or c("emmix", "mctfa") containing the fitted model parameters is returned. Details of the components are as follows:
g |
Number of mixture components. |
q |
Number of factors. |
pivec |
Mixing proportions of the components. |
A |
Loading matrix. Size |
xi |
Matrix containing factor means for components in columns.
Size |
omega |
Array containing factor covariance matrices for components.
Size |
D |
Error covariance matrix. Size |
Uscores |
Estimated conditional expected component scores of the
unobservable factors given the data and the component membership.
Size |
Umean |
Means of the estimated conditional expected factors scores over
estimated posterior distributions. Size |
Uclust |
Alternative estimate of |
clust |
Cluster labels. |
tau |
Posterior probabilities. |
logL |
Log-likelihood at convergence. |
BIC |
Bayesian information criterion. |
warn_msg |
Description of error messages, if any. |
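The fitted MCFA parameters define a normal mixture whose $i$th component has mean $A\xi_i$ and covariance $A\Omega_i A^{T} + D$, and logL is the log of this mixture density summed over the observations. The base-R sketch below evaluates that quantity from parameters with the shapes listed above (pivec of length g, A p x q, xi q x g, omega q x q x g, D p x p). The helper names are hypothetical, and the package's own computation may differ in numerical detail.

```r
# Hypothetical helpers, base R only.

# Log-density of N(mean, Sigma) for each row of Y, via the Cholesky factor
dmvnorm_log <- function(Y, mean, Sigma) {
  R <- chol(Sigma)                                           # Sigma = t(R) %*% R
  Z <- backsolve(R, t(sweep(Y, 2, mean)), transpose = TRUE)  # solves t(R) z = y - mean
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(Y) * log(2 * pi)
}

# Log-likelihood of an MCFA normal mixture
mcfa_loglik_sketch <- function(Y, pivec, A, xi, omega, D) {
  Y <- as.matrix(Y)
  logdens <- sapply(seq_along(pivec), function(i) {
    Sigma_i <- A %*% omega[, , i] %*% t(A) + D
    log(pivec[i]) + dmvnorm_log(Y, drop(A %*% xi[, i]), Sigma_i)
  })                                     # n x g matrix of log(pi_i f_i(y_j))
  m <- apply(logdens, 1, max)            # log-sum-exp for numerical stability
  sum(m + log(rowSums(exp(logdens - m))))
}
```

Working on the log scale with the log-sum-exp trick avoids underflow when p is large and individual component densities are tiny.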
Suren Rathnayake, Jangsun Baek, Geoff McLachlan
Baek J, McLachlan GJ, and Flack LK (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 2089–2097.
Baek J, and McLachlan GJ (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27, 1269–1276.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
mcfa_fit <- mcfa(iris[, -5], g = 3, q = 3, itmax = 25,
                 nkmeans = 5, nrandom = 5, tol = 1.e-5)
plot(mcfa_fit)

mctfa_fit <- mctfa(iris[, -5], g = 3, q = 3, itmax = 500,
                   nkmeans = 5, nrandom = 5, tol = 1.e-5, df_update = TRUE)
Functions for fitting mixtures of factor analyzers (MFA) and mixtures of t-factor analyzers (MtFA) to data. Maximum likelihood estimates of the model parameters are obtained using the Alternating Expectation Conditional Maximization (AECM) algorithm.
In the case of MFA, the component distributions belong to the family of multivariate normal distributions, while with MtFA the component distributions correspond to multivariate t-distributions.
mfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20, tol = 1.e-5,
    sigma_type = 'common', D_type = 'common',
    init_clust = NULL, init_para = NULL,
    conv_measure = 'diff', warn_messages = TRUE, ...)

mtfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20, tol = 1.e-5,
     df_init = rep(30, g), df_update = TRUE,
     sigma_type = 'common', D_type = 'common',
     init_clust = NULL, init_para = NULL,
     conv_measure = 'diff', warn_messages = TRUE, ...)
Y |
A matrix or a data frame of which rows correspond to observations and columns to variables. |
g |
Number of components. |
q |
Number of factors. |
itmax |
Maximum number of EM iterations. |
nkmeans |
The number of times the k-means algorithm is to be used to partition the data into |
nrandom |
The number of random |
tol |
The EM algorithm terminates if the measure of convergence falls below this value. |
sigma_type |
To specify whether the covariance matrices (for |
D_type |
To specify whether the diagonal error covariance matrix is common to all
the components or not. If |
init_clust |
A vector or matrix consisting of partitions of samples to be used in the EM algorithm. For a matrix of partitions, columns must correspond to individual partitions of the data. Optional. |
init_para |
A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional. |
conv_measure |
The default |
df_init |
Initial values of the degrees of freedom parameters for |
df_update |
If |
warn_messages |
With |
... |
Not used. |
Clusters a given data set using mixtures of factor analyzers or mixtures of t-factor analyzers.
Object of class c("emmix", "mfa") or c("emmix", "mtfa") containing the fitted model parameters is returned. Details of the components are as follows:
g |
Number of mixture components. |
q |
Number of factors. |
pivec |
Mixing proportions of the components. |
mu |
Matrix containing estimates of component means (in columns)
of mixture component. Size |
B |
Array containing component dependent loading matrices. Size
|
D |
Estimates of error covariance matrices. If |
v |
Degrees of freedom for each component. |
logL |
Log-likelihood at convergence. |
BIC |
Bayesian information criterion. |
tau |
Matrix of posterior probabilities for the data used, based on the fitted values. Matrix of size |
clust |
Vector of integers 1 to g indicating cluster allocations of the observations. |
Uscores |
Estimated conditional expected component scores of the
unobservable factors given the data and the component membership.
Size |
Umean |
Means of the estimated conditional expected factors scores over
estimated posterior distributions. Size |
Uclust |
Alternative estimate of |
ERRMSG |
Description of messages, if any. |
D_type |
Whether common or unique error covariance is used, as specified in model fitting. |
df_update |
Whether the degree of freedom parameter
( |
Suren Rathnayake, Geoffrey McLachlan
Ghahramani Z, and Hinton GE (1997). The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Toronto.
McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t distribution. Computational Statistics & Data Analysis, 51, 5327–5338.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
McLachlan GJ, Peel D, and Bean RW (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis 41, 379–388.
model <- mfa(iris[, -5], g = 3, q = 2, itmax = 200, nkmeans = 1, nrandom = 5)
summary(model)
model <- mtfa(iris[, -5], g = 3, q = 2, itmax = 200, nkmeans = 1, nrandom = 5)
Given two vectors each corresponding to a set of categories, this function finds the minimum number of misallocations by rotating the categories.
minmis(cls, hat_cls)
cls |
A numeric or character vector of labels. |
hat_cls |
A numeric or character vector of labels same length as |
Rotates the categories for all possible permutations, and returns the minimum number of misallocations. The number of categories in each set of labels does not need to be the same. It may take several minutes to compute when the number of categories is large.
Integer specifying the minimum number of misallocations.
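The exhaustive search over relabellings can be sketched in base R as follows: cross-tabulate the two label sets, try every one-to-one matching of the smaller set of categories onto the larger one, keep the matching with the most agreeing observations, and subtract that from n. The names perms and minmis_sketch are hypothetical; this is an illustration of the brute-force idea, not the package's minmis() source. Its cost grows factorially with the number of categories, which is consistent with the warning above about large numbers of categories.

```r
# Hypothetical helper: all ordered selections of k distinct elements of v
perms <- function(v, k) {
  if (k == 0) return(list(integer(0)))
  out <- list()
  for (x in v) {
    for (rest in perms(setdiff(v, x), k - 1)) out[[length(out) + 1]] <- c(x, rest)
  }
  out
}

# Hypothetical helper: minimum number of misallocations over all relabellings
minmis_sketch <- function(cls, hat_cls) {
  tab <- table(cls, hat_cls)
  if (nrow(tab) > ncol(tab)) tab <- t(tab)  # rows hold the smaller label set
  r <- nrow(tab)
  best <- 0
  for (sel in perms(seq_len(ncol(tab)), r)) {
    # observations agreeing when row i is matched to column sel[i]
    best <- max(best, sum(tab[cbind(seq_len(r), sel)]))
  }
  length(cls) - best
}

minmis_sketch(c(1, 1, 2, 2, 3), c("b", "b", "a", "a", "a"))
```

In the call above the best matching maps "b" to 1 and "a" to 2, leaving the single observation labelled 3 misallocated.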
set.seed(1984)
Y <- scale(iris[, -5])
model <- mcfa(Y, g = 3, q = 3, nkmeans = 1, nrandom = 0, itmax = 200)
ari(model$clust, iris[, 5])
minmis(model$clust, iris[, 5])
Plot functions for factor scores.
plot_factors(scores, type = "Umean", clust=if (exists('clust', where = scores)) scores$clust else NULL, limx = NULL, limy = NULL)
scores |
A list containing factor scores specified by
|
type |
What type of factor scores are to be plotted. See Details. |
clust |
Indicators of belonging to components. If available, they will be
portrayed in plots.
If not provided, looks for |
limx |
Numeric vector. Values in |
limy |
Numeric vector. Values in |
When the factor scores were obtained using mcfa or mctfa, a visualization of the group structure can be obtained by plotting the factor scores. In the case of mfa and mtfa, the factor scores simply correspond to white noise.

The type should be either "Uscores", "Uclust" or the default "Umean". See factor_scores for a detailed description of the factor scores.
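The kind of picture produced can be approximated with base graphics: a scatter of the first two factor-score columns, one colour per group. The sketch below takes an n x 2 (or wider) score matrix such as Umean and an optional grouping vector; plot_scores_sketch is a hypothetical name, and the package's plot_factors may differ in layout and styling.

```r
# Hypothetical helper: scatter plot of 2-D factor scores coloured by group.
plot_scores_sketch <- function(U, clust = NULL, limx = NULL, limy = NULL) {
  U <- as.matrix(U)
  stopifnot(ncol(U) >= 2)
  # one integer colour per group; a single colour when no grouping is given
  col <- if (is.null(clust)) 1 else as.integer(factor(clust))
  plot(U[, 1], U[, 2], col = col, pch = 19,
       xlim = limx, ylim = limy, xlab = "u1", ylab = "u2")
  invisible(col)
}
```

Converting clust with factor() lets numeric cluster labels and species names such as iris[, 5] be used interchangeably.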
Geoffrey McLachlan, Suren Rathnayake, Jangsun Baek
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
McLachlan GJ, and Peel D (2000). Finite Mixture Models. New York: Wiley.
# Visualizing data used in model estimation
set.seed(1)
inds <- dim(iris)[1]
indSample <- sample(1 : inds, 50)
model <- mcfa(iris[indSample, -5], g = 3, q = 2,
              nkmeans = 1, nrandom = 0, itmax = 150)
minmis(model$clust, iris[indSample, 5])

# same as plot_factors(model, type = "Umean", clust = model$clust)
plot(model)

# can provide alternative groupings of samples via plot_factors
plot_factors(model, clust = iris[indSample, 5])

# same as plot_factors(model, type = "Uclust")
plot(model, type = "Uclust")

Y <- iris[-c(indSample), -5]
Y <- as.matrix(Y)
clust <- predict(model, Y)
minmis(clust, iris[-c(indSample), 5])

fac_scores <- factor_scores(model, Y)
plot_factors(fac_scores, type = "Umean", clust = clust)
plot_factors(fac_scores, type = "Umean", clust = iris[-c(indSample), 5])
Given a fitted model of class 'emmix' (or of class 'mfa', 'mcfa', 'mtfa' or 'mctfa'), the predict function predicts clusters for observations.
## S3 method for class 'emmix' predict(object, Y, ...)
object |
An object of class |
Y |
A data matrix with variables in the same column locations as the data used in fitting the model |
... |
Not used. |
A vector of integers of length equal to the number of observations (rows) in the data. The integers range from 1 to g, where g is the number of components in the model.

The variables in Y of the predict function should be in the same order as those used in obtaining the fitted model object.
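A common way to turn a fitted mixture into cluster predictions is the maximum a posteriori rule: compute each component's posterior probability for every new observation and pick the largest. The base-R sketch below does this for MFA-type parameters, assuming mu is p x g with component means in columns, B is a p x q x g array of loadings and D is a common p x p diagonal matrix. Whether predict() uses exactly this rule internally is an assumption; predict_mfa_sketch and dmvnorm_log are hypothetical helpers.

```r
# Hypothetical helper: log-density of N(mean, Sigma) for each row of Y
dmvnorm_log <- function(Y, mean, Sigma) {
  R <- chol(Sigma)                                           # Sigma = t(R) %*% R
  Z <- backsolve(R, t(sweep(Y, 2, mean)), transpose = TRUE)  # solves t(R) z = y - mean
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(Y) * log(2 * pi)
}

# Hypothetical helper: posterior probabilities and MAP clusters for an MFA fit
predict_mfa_sketch <- function(Y, pivec, mu, B, D) {
  Y <- as.matrix(Y)
  logdens <- sapply(seq_along(pivec), function(i) {
    Sigma_i <- B[, , i] %*% t(B[, , i]) + D       # MFA component covariance
    log(pivec[i]) + dmvnorm_log(Y, mu[, i], Sigma_i)
  })
  tau <- exp(logdens - apply(logdens, 1, max))    # rescale rows before exponentiating
  tau <- tau / rowSums(tau)                       # posterior probabilities
  list(tau = tau, clust = max.col(tau, ties.method = "first"))
}
```

As with the package function, the columns of Y must be in the same order as in the data used for fitting, because each column is matched to a row of mu and B by position.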
set.seed(42)
test <- sample(1 : nrow(iris), 100)
model <- mfa(iris[test, -5], g = 3, q = 3, itmax = 500, nkmeans = 3, nrandom = 5)
pred_clust <- predict(model, iris[-test, -5])
minmis(pred_clust, iris[-test, 5])
Prints formatted model parameters of EMMIXmfa objects.
## S3 method for class 'emmix'
print(x, ...)

## S3 method for class 'emmix'
summary(object, ...)
x, object |
An object of class |
... |
Not used. |
Prints the formatted model parameter values to the screen.
set.seed(1984)
Y <- scale(iris[, -5])
model <- mcfa(Y, g = 3, q = 3, nkmeans = 1, nrandom = 0, itmax = 100)
# print(model)
summary(model)
Random number generator for emmix models.
rmix(n, model, ...)
model |
An object of class |
n |
Number of samples to generate. |
... |
Not used. |
This function uses the rmvnorm and rmvt functions from the mvtnorm package to generate samples from the mixture components. The algorithm works by first drawing a component based on the mixing proportions in the model, and then drawing a sample from that component's distribution.
A numeric matrix with samples drawn in rows.
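The two-step scheme can be sketched without mvtnorm for the normal (MCFA) case: draw component labels with sample() using the mixing proportions, then transform standard normal draws with a Cholesky factor of each component covariance $A\Omega_i A^{T} + D$ and shift by the component mean $A\xi_i$. The parameter shapes (pivec, A p x q, xi q x g, omega q x q x g, D p x p) and the name rmix_mcfa_sketch are assumptions for illustration; the package itself delegates the second step to rmvnorm or rmvt.

```r
# Hypothetical helper: simulate n rows from an MCFA normal mixture.
rmix_mcfa_sketch <- function(n, pivec, A, xi, omega, D) {
  p <- nrow(A)
  # step 1: component label for every sample, drawn with the mixing proportions
  comp <- sample(seq_along(pivec), n, replace = TRUE, prob = pivec)
  out <- matrix(NA_real_, n, p)
  for (i in unique(comp)) {
    idx <- which(comp == i)
    Sigma_i <- A %*% omega[, , i] %*% t(A) + D
    R <- chol(Sigma_i)                               # Sigma_i = t(R) %*% R
    Z <- matrix(rnorm(length(idx) * p), length(idx), p)
    # step 2: rows of Z %*% R have covariance Sigma_i; add the mean A xi_i
    out[idx, ] <- sweep(Z %*% R, 2, drop(A %*% xi[, i]), "+")
  }
  attr(out, "clust") <- comp                         # keep the drawn labels
  out
}
```

Drawing all samples of one component at once keeps the number of Cholesky decompositions at g rather than n.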
set.seed(1)
model <- mcfa(iris[, -5], g = 3, q = 2, nkmeans = 1, nrandom = 1, itmax = 25)
dat <- rmix(n = 10, model = model)