Title: | Bayesian Analysis of Finite Mixture of Plackett-Luce Models |
---|---|
Description: | Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. The package provides MAP point estimates via the EM algorithm and posterior MCMC simulations via Gibbs sampling. It also fits the MLE as a special case of a noninformative Bayesian analysis with vague priors. In addition to inferential techniques, the package assists the other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation. The main references on the methods are Mollica and Tardella (2017) <doi:10.1007/s11336-016-9530-0> and Mollica and Tardella (2014) <doi:10.1002/sim.6224>. |
Authors: | Cristina Mollica [aut, cre], Luca Tardella [aut] |
Maintainer: | Cristina Mollica <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.2 |
Built: | 2024-11-05 04:30:33 UTC |
Source: | https://github.com/cmollica/plmix |
Attempt to coerce the input data into a top-ordering dataset.
as.top_ordering( data, format_input = NULL, aggr = NULL, freq_col = NULL, ties_method = "random", ... )
data |
An object containing the partial sequences to be coerced into an object of class top_ordering. |
format_input |
Character string indicating the format of the data input. |
aggr |
Logical: whether the |
freq_col |
Integer indicating the column of the input data containing the observed frequencies. |
ties_method |
Character string indicating the treatment of sequences with ties (not used for data of class |
... |
Further arguments passed to or from other methods (not used). |
The coercion function as.top_ordering tries to coerce the input data into an object of class top_ordering after checking for possible partial sequences that do not satisfy the top-ordering requirements. If none of the supplied sequences satisfies the top-ordering conditions, an error message is returned. NA's in the input data are tacitly converted into zero entries.
An object of S3 class c("top_ordering","matrix").
Cristina Mollica and Luca Tardella
Turner, H., Kosmidis, I. and Firth, D. (2018). PlackettLuce: Plackett-Luce Models for Rankings. R package version 0.2-3. https://CRAN.R-project.org/package=PlackettLuce
Qian, Z. (2018). rankdist: Distance Based Ranking Models. R package version 1.1.3. https://CRAN.R-project.org/package=rankdist
is.top_ordering
, as.rankings
and rankings
## Coerce an object of class 'rankings' into an object of class 'top_ordering'
library(PlackettLuce)
RR <- matrix(c(1, 2, 0, 0,
               4, 1, 2, 3,
               2, 1, 1, 1,
               1, 2, 3, 0,
               2, 1, 1, 0,
               1, 0, 3, 2), nrow = 6, byrow = TRUE)
RR_rank = as.rankings(RR)
RR_rank
as.top_ordering(RR_rank, ties_method = "random")

## Coerce an object of class 'RankData' into an object of class 'top_ordering'
library(rankdist)
data(apa_partial_obj)
d_apa_top_ord = as.top_ordering(data = apa_partial_obj)
data(d_apa)
identical(d_apa, d_apa_top_ord)

## Coerce a data frame from the package prefmod into an object of class 'top_ordering'
library(prefmod)
data(carconf)
carconf_rank = carconf[, 1:6]
carconf_top_ord = as.top_ordering(data = carconf_rank, format_input = "ranking", aggr = FALSE)
data(d_carconf)
identical(d_carconf, carconf_top_ord)

## Coerce a data frame from the package pmr into an object of class 'top_ordering'
library(pmr)
data(big4)
head(big4)
big4_top_ord = as.top_ordering(data = big4, format_input = "ranking", aggr = TRUE, freq_col = 5)
head(big4_top_ord)
Compute BIC value for the MLE of a mixture of Plackett-Luce models fitted to partial orderings.
bicPLMIX(max_log_lik, pi_inv, G, ref_known = TRUE, ref_vary = FALSE)
max_log_lik |
Maximized log-likelihood value. |
pi_inv |
An object of class top_ordering collecting the observed partial orderings. |
G |
Number of mixture components. |
ref_known |
Logical: whether the component-specific reference orders are known (not to be estimated). Default is |
ref_vary |
Logical: whether the reference orders vary across mixture components. Default is |
The max_log_lik and the BIC values can be straightforwardly obtained from the output of the mapPLMIX and mapPLMIX_multistart functions when the default noninformative priors are adopted in the MAP procedure. The bicPLMIX function is thus especially useful for computing the BIC value from the output of alternative MLE methods for mixtures of Plackett-Luce models implemented, for example, in other software.
The ref_known and ref_vary arguments accommodate the more general mixture of Extended Plackett-Luce models (EPL), involving the additional reference order parameters (Mollica and Tardella 2014). Since the Plackett-Luce model is a special instance of the EPL with the reference order equal to the identity permutation, the default values of ref_known and ref_vary are set equal, respectively, to TRUE and FALSE.
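Under the default settings, the BIC reduces to the usual penalized maximized log-likelihood. The following is an illustrative sketch, not the package implementation: it assumes G*(K-1) free support parameters and (G-1) free weights (the standard identifiability constraints), which should be checked against the package output.

```r
## Illustrative sketch (not the package implementation): BIC for a
## G-component Plackett-Luce mixture fitted to N partial orderings,
## assuming G*(K-1) free support parameters and (G-1) free weights.
bic_sketch <- function(max_log_lik, N, K, G) {
  n_par <- G * (K - 1) + (G - 1)
  -2 * max_log_lik + n_par * log(N)
}
bic_sketch(max_log_lik = -1200, N = 100, K = 6, G = 3)
```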
A list of two named objects:
max_log_lik |
The maximized log-likelihood value supplied as input. |
bic |
BIC value. |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Mollica, C. and Tardella, L. (2014). Epitope profiling via mixture modeling for ranked data. Statistics in Medicine, 33(21), pages 3738–3758, ISSN: 0277-6715, DOI: 10.1002/sim.6224.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6(2), pages 461–464, ISSN: 0090-5364, DOI: 10.1214/aos/1176344136.
mapPLMIX
and mapPLMIX_multistart
data(d_carconf)
K <- ncol(d_carconf)
MAP_mult <- mapPLMIX_multistart(pi_inv = d_carconf, K = K, G = 3,
                                n_start = 2, n_iter = 400*3)
bicPLMIX(max_log_lik = MAP_mult$mod$max_objective, pi_inv = d_carconf, G = 3)$bic
## Equivalently
MAP_mult$mod$bic
Construct the binary group membership matrix from the multinomial classification vector.
binary_group_ind(class, G)
class |
Numeric vector of class memberships. |
G |
Number of possible different classes. |
Numeric length(class) x G matrix of binary group memberships.
Cristina Mollica and Luca Tardella
binary_group_ind(class=c(3,1,5), G=6)
Chi-squared index relying on paired comparisons for observed data
chisqmeasureobs(pi_inv, p, weights)
pi_inv |
Numeric data matrix of partial orderings. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
Chi-squared index value.
Chi-squared index relying on top1 preferences for observed data
chisqmeasureobs1dim(pi_inv, p, weights)
pi_inv |
Numeric data matrix of partial orderings. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
Chi-squared index value.
Conditional Chi-squared index relying on paired comparisons for observed data
chisqmeasureobscond(pi_inv, p, weights)
pi_inv |
Numeric data matrix of partial orderings. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
Conditional Chi-squared index value.
Generic term of the conditional Chi-squared index relying on top1 preferences for observed data
chisqmeasureobsmatrix1dim(pi_inv, p, weights)
pi_inv |
Numeric data matrix of partial orderings. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
Numeric matrix of the conditional Chi-squared index.
Chi-squared index relying on paired comparisons for replicated data
chisqmeasuretheo(N, ref_order, p, weights, pi_inv_obs)
N |
Number of sample units. |
ref_order |
Numeric matrix of component-specific reference orders. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
pi_inv_obs |
Numeric data matrix of observed partial orderings. |
Chi-squared index value.
Chi-squared index relying on top1 preferences for replicated data
chisqmeasuretheo1dim(N, ref_order, p, weights, pi_inv_obs)
N |
Number of sample units. |
ref_order |
Numeric matrix of component-specific reference orders. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
pi_inv_obs |
Numeric data matrix of observed partial orderings. |
Chi-squared index value.
Conditional Chi-squared index relying on paired comparisons for replicated data
chisqmeasuretheocond(N, ref_order, p, weights, pi_inv_obs)
N |
Number of sample units. |
ref_order |
Numeric matrix of component-specific reference orders. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
pi_inv_obs |
Numeric data matrix of observed partial orderings. |
Conditional Chi-squared index value.
Generic term of the conditional Chi-squared index relying on top1 preferences for replicated data
chisqmeasuretheomatrix1dim(N, ref_order, p, weights, pi_inv_obs)
N |
Number of sample units. |
ref_order |
Numeric matrix of component-specific reference orders. |
p |
Numeric matrix of component-specific support parameters. |
weights |
Numeric vector of the mixture weights. |
pi_inv_obs |
Numeric data matrix of observed partial orderings. |
Numeric matrix of the conditional Chi-squared index.
The function CompProbZpartial computes the multinomial probabilities of the Multinomial full-conditionals of the latent component memberships for the Gibbs sampling of a Bayesian mixture of Plackett-Luce models.
CompProbZpartial(p, pi_inv, Y, u_bin, n_rank, omega)
p |
Numeric matrix of component-specific support parameters. |
pi_inv |
Numeric data matrix of partial orderings. |
Y |
Numeric matrix of the quantitative latent variables. |
u_bin |
Binary matrix indicating which items have been ranked by each sample unit. |
n_rank |
Numeric vector with the number of items ranked by each sample unit. |
omega |
Numeric vector of the mixture weights. |
Numeric matrix of multinomial probabilities.
The function CompRateP computes the rate parameters of the Gamma full-conditionals of the support parameters for the Gibbs sampling of a Bayesian mixture of Plackett-Luce models.
CompRateP(pi_inv, Y, z, u_bin, n_rank, rate0)
pi_inv |
Numeric data matrix of partial orderings. |
Y |
Numeric matrix of the quantitative latent variables. |
z |
Binary matrix of component memberships. |
u_bin |
Binary matrix indicating which items have been ranked by each sample unit. |
n_rank |
Numeric vector with the number of items ranked by each sample unit. |
rate0 |
Numeric vector of the Gamma rate hyperparameters. |
Numeric matrix of rate parameters.
The function CompRateYpartial computes the rate parameters of the Exponential full-conditionals of the quantitative latent variables for the Gibbs sampling of a Bayesian mixture of Plackett-Luce models.
CompRateYpartial(p, pi_inv, ref_order, z, n_rank)
p |
Numeric matrix of component-specific support parameters. |
pi_inv |
Numeric data matrix of partial orderings. |
ref_order |
Numeric matrix of component-specific reference orders. |
z |
Binary matrix of component memberships. |
n_rank |
Numeric vector with the number of items ranked by each sample unit. |
Numeric matrix of rate parameters.
The popular American Psychological Association dataset (d_apa) contains the results of the voting ballots of the 1980 presidential election. Voters ranked a maximum of five candidates, conventionally classified as research psychologists (candidates 1 and 3), clinical psychologists (candidates 4 and 5) and community psychologists (candidate 2). The winner of the election was candidate 3. The dataset is composed of partial top orderings of varying lengths. Missing positions are denoted with zero entries.
Object of S3 class c("top_ordering","matrix") gathering the partial orderings in a matrix with one row per ballot and five columns. Each row lists the candidates from the most-liked (Rank_1) to the least-liked (Rank_5) in a given voting ballot.
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Diaconis, P. W. (1988). Group representations in probability and statistics. Lecture Notes-Monograph Series, pages 94–96.
Diaconis, P. W. (1987). Spectral analysis for ranked data. Technical Report 282, Dept of Statistics, Stanford University.
data(d_apa)
head(d_apa)
## Subset of complete sequences
d_apa_compl = d_apa[rowSums(d_apa != 0) >= (ncol(d_apa) - 1), ]
head(d_apa_compl)
The Car Configurator dataset (d_carconf) came from a marketing study aimed at investigating customer preferences toward different car features. A sample of customers were asked to construct their car by using an online configurator system and to choose among six car modules in order of preference. The car features are labeled as: 1 = price, 2 = exterior design, 3 = brand, 4 = technical equipment, 5 = producing country and 6 = interior design. The survey did not require a complete ranking elicitation, therefore the dataset is composed of partial top orderings of varying lengths. Missing positions are denoted with zero entries.
Object of S3 class c("top_ordering","matrix") gathering the partial orderings in a matrix with one row per customer and six columns. Each row lists the car features from the most important (Rank_1) to the least important (Rank_6) for a given customer.
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Hatzinger, R. and Dittrich, R. (2012). Prefmod: An R package for modeling preferences based on paired comparisons, rankings, or ratings. Journal of Statistical Software, 48(10), pages 1–31.
Dabic, M. and Hatzinger, R. (2009). Zielgruppenadaequate Ablaeufe in Konfigurationssystemen - eine empirische Studie im Automobilmarkt - Partial Rankings. In Hatzinger, R., Dittrich, R. and Salzberger, T. (eds), Praeferenzanalyse mit R: Anwendungen aus Marketing, Behavioural Finance und Human Resource Management. Wien: Facultas.
data(d_carconf)
head(d_carconf)
## Subset of complete sequences
d_carconf_compl = d_carconf[rowSums(d_carconf != 0) >= (ncol(d_carconf) - 1), ]
head(d_carconf_compl)
The Dublin West dataset (d_dublinwest) contains the results of the voting ballots of the 2002 Irish general election for the Dublin West constituency. The Irish voting system allows voters to rank the candidates in order of preference, rather than only specify the favorite one. In the Dublin West constituency, voters ranked a maximum of nine candidates, labeled as: 1 = Bonnie R., 2 = Burton J., 3 = Doherty-Ryan D., 4 = Higgins J., 5 = Lenihan B., 6 = McDonald M., 7 = Morrissey T., 8 = Smyth J. and 9 = Terry S. The dataset is composed of partial top orderings of varying lengths. Missing positions are denoted with zero entries.
Object of S3 class c("top_ordering","matrix") gathering the partial orderings in a matrix with one row per ballot and nine columns. Each row lists the candidates from the most-liked (Rank_1) to the least-liked (Rank_9) in a given voting ballot.
The 2002 Dublin West data have been downloaded from http://www.preflib.org/ PrefLib: A Library for Preferences. In that repository, preferences with ties are also included. The original source was publicly available from the Dublin County Returning Officer at the following URL: https://dublincountyreturningofficer.com/.
Mattei, N. and Walsh, T. (2013) PrefLib: A Library of Preference Data. Proceedings of Third International Conference on Algorithmic Decision Theory (ADT 2013). Springer, Lecture Notes in Artificial Intelligence, November 13-15, 2013.
Gormley, I. C. and Murphy, T. B. (2009). A grade of membership model for rank data. Bayesian Analysis, 4(2), pages 265–295.
Gormley, I. C. and Murphy, T. B. (2008). Exploring Voting Blocs Within the Irish Electorate: A Mixture Modeling Approach. Journal of the American Statistical Association, 103(483), pages 1014–1027.
data(d_dublinwest)
head(d_dublinwest)
## Subset of complete sequences
d_dublinwest_compl = d_dublinwest[rowSums(d_dublinwest != 0) >= (ncol(d_dublinwest) - 1), ]
head(d_dublinwest_compl)
The Gaming Platforms dataset (d_gaming) collects the results of a survey conducted on a sample of Dutch students, who were asked to rank six gaming platforms in order of preference, namely: 1 = X-Box, 2 = PlayStation, 3 = PSPortable, 4 = GameCube, 5 = GameBoy and 6 = Personal Computer. The dataset is composed of complete orderings.
Object of S3 class c("top_ordering","matrix") gathering the complete orderings in a matrix with one row per student and six columns. Each row lists the gaming platforms from the most-liked (Rank_1) to the least-liked (Rank_6) for a given student.
The Gaming Platforms dataset in .csv format can be downloaded from http://qed.econ.queensu.ca/jae/2012-v27.5/fok-paap-van_dijk/. The .csv file contains the preference data in ranking format and some covariates collected for each student.
Fok, D., Paap, R. and Van Dijk, B. (2012). A Rank-Ordered Logit Model With Unobserved Heterogeneity In Ranking Capabilities. Journal of Applied Econometrics, 27(5), pages 831–846.
data(d_gaming)
head(d_gaming)
The German Sample dataset (d_german) is part of a comparative cross-sectional study on political actions and mass participation involving five Western countries. The dataset regards a sample of German respondents who were asked to rank four political goals in order of desirability, namely: 1 = maintaining order in the nation, 2 = giving people more say in the decisions of government, 3 = fighting rising prices and 4 = protecting freedom of speech. The dataset is composed of complete orderings.
Object of S3 class c("top_ordering","matrix") gathering the complete orderings in a matrix with one row per respondent and four columns. Each row lists the political goals from the most desirable (Rank_1) to the least desirable (Rank_4) for a given respondent.
Croon, M. A. (1989). Latent class models for the analysis of rankings. In De Soete, G., Feger, H. and Klauer, K. C. (eds), New Developments in Psychological Choice Modeling, pages 99–121. North-Holland: Amsterdam.
Barnes, S. H. et al. (1979). Political action. Mass participation in five Western democracies. London: Sage.
data(d_german)
head(d_german)
The NASCAR dataset (d_nascar) collects the results of the 2002 season of stock car racing held in the United States. The 2002 championship consisted of 36 races, with 43 car drivers competing in each race. A total of 87 drivers participated in the 2002 season, taking part in a variable number of races: some of them competed in all the races, others in only one. The results of the entire 2002 season were collected in the form of top-43 orderings, where the position of the non-competing drivers in each race is assumed lower than the 43rd, but undetermined. Missing positions are denoted with zero entries.
Object of S3 class c("top_ordering","matrix") gathering the partial orderings in a matrix with one row per race and 87 columns. Each row lists the car drivers from the top position (Rank_1) to the bottom one (Rank_87) in a given race. Columns from the 44th to the 87th are filled with zeros, because only 43 drivers competed in each race.
The NASCAR dataset in the MATLAB format used by Hunter, D. R. (2004) can be downloaded from http://sites.stat.psu.edu/~dhunter/code/btmatlab/. At the same link, a .xls file with drivers' names is also available.
Caron, F. and Doucet, A. (2012). Efficient Bayesian inference for Generalized Bradley-Terry models. J. Comput. Graph. Statist., 21(1), pages 174–196.
Guiver, J. and Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. In Bottou, L. and Littman, M., editors, Proceedings of the 26th International Conference on Machine Learning - ICML 2009, pages 377–384. Omnipress.
Hunter, D. R. (2004). MM algorithms for Generalized Bradley-Terry models. Ann. Statist., 32(1), pages 384–406.
data(d_nascar)
head(d_nascar)
## Compute the number of races for each of the 87 drivers
table(c(d_nascar[, 1:43]))
## Identify drivers who arrived last (43rd position) in all the races
which(colSums(rank_summaries(d_nascar, format = "ordering")$marginals[1:42, ]) == 0)
## Obscure drivers 84, 85, 86 and 87 to get the reduced dataset
## with 83 racers employed by Hunter, D. R. (2004)
d_nascar_hunter = d_nascar[, 1:83]
d_nascar_hunter[is.element(d_nascar_hunter, 84:87)] = 0
The Occupation dataset (d_occup) came from a survey conducted on graduates of the Technion-Israel Institute of Technology. A sample of graduates were asked to rank ten professions according to their perceived prestige. The occupations are labeled as: 1 = faculty member, 2 = owner of a business, 3 = applied scientist, 4 = operations researcher, 5 = industrial engineer, 6 = manager, 7 = mechanical engineer, 8 = supervisor, 9 = technician and 10 = foreman. The dataset is composed of complete orderings.
Object of S3 class c("top_ordering","matrix") gathering the complete orderings in a matrix with one row per graduate and ten columns. Each row lists the professions from the most-liked (Rank_1) to the least-liked (Rank_10) for a given graduate.
Cohen, A. and Mallows, C. L. (1983). Assessing goodness of fit of ranking models to data. Journal of the Royal Statistical Society: Series D (The Statistician), 32(4), pages 361–374, ISSN: 0039-0526.
Cohen, A. (1982). Analysis of large sets of ranking data. Communications in Statistics – Theory and Methods, 11(3), pages 235–256.
Goldberg, A. I. (1976). The relevance of cosmopolitan/local orientations to professional values and behavior. Sociology of Work and Occupations, 3(3), pages 331–356.
data(d_occup)
head(d_occup)
The Rice Voting dataset (d_rice) collects the results of the 1992 election of a faculty member to serve on the Presidential Search Committee at Rice University. Voters cast their ballots by ranking the five short-listed candidates in a preferential manner. The dataset is composed of partial top orderings of varying lengths. Missing positions are denoted with zero entries.
Object of S3 class c("top_ordering","matrix") gathering the partial orderings in a matrix with one row per ballot and five columns. Each row lists the faculty members from the most-liked (Rank_1) to the least-liked (Rank_5) in a given voting ballot.
Marcus, P., Heiser, W. J. and D'Ambrosio, A. (2013). Comparison of heterogeneous probability models for ranking data, Master Thesis, Leiden University.
Baggerly, K. A. (1995). Visual estimation of structure in ranked data, PhD thesis, Rice University.
data(d_rice)
head(d_rice)
## Subset of complete sequences
d_rice_compl = d_rice[rowSums(d_rice != 0) >= (ncol(d_rice) - 1), ]
head(d_rice_compl)
The PLMIX package for R provides functions to fit and analyze finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via the EM algorithm and posterior MCMC simulations via Gibbs sampling. It also fits the MLE as a special case of a noninformative Bayesian analysis with vague priors.
In addition to inferential techniques, the package assists other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation.
Specific S3 classes and methods are also supplied to enhance the usability and foster exchange with other packages. Finally, to address the issue of computationally demanding procedures typical in ranking data analysis, PLMIX takes advantage of a hybrid code linking the R environment with the C++ programming language.
The Plackett-Luce model is one of the most popular and frequently applied parametric distributions for the analysis of partial top rankings/orderings of a finite set of items. The present package makes it possible to account for unobserved sample heterogeneity in partially ranked data through a model-based analysis relying on Bayesian finite mixtures of Plackett-Luce models. The package provides a suite of functions that covers the fundamental phases of a model-based analysis:
Ranking data manipulation
Binary group membership matrix from the mixture component labels.
From the frequency distribution to the dataset of individual orderings/rankings.
Random completion of partial orderings/rankings data.
Censoring of complete orderings/rankings data.
From rankings to orderings and vice-versa.
From the dataset of individual orderings/rankings to the frequency distribution.
Ranking data simulation
Random sample from a finite mixture of Plackett-Luce models.
Ranking data description
Paired comparison frequencies.
Summary statistics of partial ranking/ordering data.
Model estimation
Bayesian analysis with MCMC posterior simulation via Gibbs sampling.
Label switching adjustment of the Gibbs sampling simulations.
Likelihood evaluation for a mixture of Plackett-Luce models.
Log-likelihood evaluation for a mixture of Plackett-Luce models.
MAP estimation via EM algorithm.
MAP estimation via EM algorithm with multiple starting values.
Class coercion and membership
Coercion into top-ordering datasets.
From the Gibbs sampling simulation to an MCMC class object.
Test for the consistency of input data with a top-ordering dataset.
S3 class methods
Plot of the Gibbs sampling simulations.
Plot of the MAP estimates.
Print of the Gibbs sampling simulations.
Print of the MAP estimation algorithm.
Summary of the Gibbs sampling procedure.
Summary of the MAP estimation.
Model selection
BIC value for the MLE of a mixture of Plackett-Luce models.
Bayesian model selection criteria.
Model assessment
Posterior predictive diagnostics.
Posterior predictive diagnostics conditionally on the number of ranked items.
Datasets
American Psychological Association Data (partial orderings).
Car Configurator Data (partial orderings).
Dublin West Data (partial orderings).
Gaming Platforms Data (complete orderings).
German Sample Data (complete orderings).
NASCAR Data (partial orderings).
Occupation Data (complete orderings).
Rice Voting Data (partial orderings).
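The sequential choice structure underlying all of the tools above can be sketched as follows. Under a Plackett-Luce model with support parameters p, the next item in an ordering is picked with probability proportional to its support among the items not yet ranked. The function below is purely illustrative and is not part of the PLMIX API.

```r
## Illustrative sketch (not a PLMIX function): probability of a complete
## ordering under a Plackett-Luce model with support parameters p.
## At each stage the next item is chosen with probability proportional
## to its support among the items not yet ranked.
pl_prob <- function(ordering, p) {
  prob <- 1
  remaining <- seq_along(p)
  for (item in ordering) {
    prob <- prob * p[item] / sum(p[remaining])
    remaining <- setdiff(remaining, item)
  }
  prob
}
pl_prob(c(2, 1, 3), p = c(0.2, 0.5, 0.3))  # 0.5 * (0.2/0.5) * 1 = 0.2
```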
Data have to be supplied as an object of class matrix, where missing positions/items are denoted with zero entries and Rank = 1 indicates the most-liked alternative. For a more efficient implementation of the methods, partial sequences with a single missing entry should be preliminarily filled in, as they correspond to complete rankings/orderings. In the present setting, ties are not allowed. Some quantities frequently recalled in the manual are the following:
Sample size.
Number of possible items.
Number of mixture components.
Size of the final posterior MCMC sample (after burn-in phase).
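A toy example of the required input format (hypothetical data, with items labeled 1 to 4):

```r
## Ordering format: each row lists the items from most- to least-liked,
## with zero entries marking the missing (unranked) positions.
pi_inv <- rbind(c(2, 4, 1, 3),   # complete ordering of 4 items
                c(3, 1, 0, 0),   # top-2 partial ordering
                c(4, 0, 0, 0))   # top-1 partial ordering
```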
Cristina Mollica and Luca Tardella
Maintainer: Cristina Mollica <[email protected]>
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Mollica, C. and Tardella, L. (2014). Epitope profiling via mixture modeling for ranked data. Statistics in Medicine, 33(21), pages 3738–3758, ISSN: 0277-6715, DOI: 10.1002/sim.6224.
The function Estep updates the posterior component membership probabilities in the EM algorithm for MAP estimation of a Bayesian mixture of Plackett-Luce models.
Estep(p, ref_order, weights, pi_inv)
p |
Numeric matrix of component-specific support parameters. |
ref_order |
Numeric matrix of component-specific reference orders. |
weights |
Numeric vector of the mixture weights. |
pi_inv |
Numeric data matrix of partial orderings. |
Numeric matrix of estimated posterior component membership probabilities.
Construct the dataset of individual rankings/orderings from the frequency distribution of the distinct observed sequences.
freq_to_unit(freq_distr)
freq_distr |
Numeric matrix of the distinct observed sequences with the corresponding frequencies indicated in the last column. |
Numeric data matrix of observed individual sequences.
Cristina Mollica and Luca Tardella
library(gtools)
K <- 4
perm_matrix <- permutations(n = K, r = K)
freq_data <- cbind(perm_matrix, sample(1:factorial(K)))
freq_data
freq_to_unit(freq_distr = freq_data)
Perform Gibbs sampling simulation for a Bayesian mixture of Plackett-Luce models fitted to partial orderings.
gibbsPLMIX(
  pi_inv,
  K,
  G,
  init = list(z = NULL, p = NULL),
  n_iter = 1000,
  n_burn = 500,
  hyper = list(shape0 = matrix(1, nrow = G, ncol = K),
               rate0 = rep(0.001, G),
               alpha0 = rep(1, G)),
  centered_start = FALSE
)
pi_inv |
An object of class top_ordering collecting the observed partial orderings. |
K |
Number of possible items. |
G |
Number of mixture components. |
init |
List of named objects with initialization values: |
n_iter |
Total number of MCMC iterations. |
n_burn |
Number of initial burn-in drawings removed from the returned MCMC sample. |
hyper |
List of named objects with hyperparameter values for the conjugate prior specification: |
centered_start |
Logical: whether the starting values of the support parameters and weights should be centered around the observed relative frequency with which each item has been ranked top. Default is |
The size of the final MCMC sample is equal to n_iter - n_burn.
A list of S3 class gsPLMIX
with named elements:
W |
Numeric matrix of the MCMC samples of the mixture weights. |
P |
Numeric matrix of the MCMC samples of the component-specific support parameters. |
log_lik |
Numeric vector of the log-likelihood values at each MCMC iteration. |
deviance |
Numeric vector of the deviance values at each MCMC iteration. |
objective |
Numeric vector of the objective function values at each MCMC iteration. |
call |
The matched call. |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
data(d_carconf)
GIBBS <- gibbsPLMIX(pi_inv = d_carconf, K = ncol(d_carconf), G = 3,
                    n_iter = 30, n_burn = 10)
str(GIBBS)
GIBBS$P
GIBBS$W
gibbsPLMIX_with_norm
gibbsPLMIX_with_norm( pi_inv, K, G, init = list(z = NULL, p = NULL), n_iter = 1000, n_burn = 500, hyper = list(shape0 = matrix(1, nrow = G, ncol = K), rate0 = rep(0.001, G), alpha0 = rep(1, G)), centered_start = FALSE )
pi_inv |
|
K |
|
G |
|
init |
|
n_iter |
|
n_burn |
|
hyper |
|
centered_start |
Coerce the Gibbs sampling simulations for a Bayesian mixture of Plackett-Luce models into an mcmc class object.
gsPLMIX_to_mcmc(gsPLMIX_out)
gsPLMIX_out |
Object of class gsPLMIX returned by the gibbsPLMIX function. |
gsPLMIX_to_mcmc attempts to coerce its argument by recalling the as.mcmc function of the coda package.
An mcmc
class object.
Cristina Mollica and Luca Tardella
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC, R News, 6, pages 7–11, ISSN: 1609-3631.
data(d_carconf) GIBBS <- gibbsPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_iter=30, n_burn=10) ## Coerce the posterior samples into an mcmc class object gsPLMIX_to_mcmc(GIBBS)
Counts how many items are ranked in a partial ranking matrix
howmanyranked(pi_inv)
pi_inv |
Numeric |
Numeric vector of length N with the number of items ranked by each sample unit.
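A short usage sketch in the style of the other examples (illustrative only):

```r
# Count how many items each sample unit ranked in the d_carconf dataset.
library(PLMIX)
data(d_carconf)
nranked <- howmanyranked(pi_inv=d_carconf)
table(nranked)  # frequency distribution of the number of ranked items
```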
Check the consistency of partial ordering data with a top-ordering dataset.
is.top_ordering(data, ...)
data |
An object containing the partial orderings whose consistency with a top-ordering dataset has to be tested. The following classes are admissible for |
... |
Further arguments passed to or from other methods (not used). |
The argument data
requires the partial sequences expressed in ordering format. When the value of is.top_ordering
is FALSE
, the membership function also returns a message with the conditions that are not met for the data
to be a top-ordering dataset. NA
's in the input data
are tacitly converted into zero entries.
Logical: TRUE
if the data
argument is consistent with a top-ordering dataset (with a possible warning message if the supplied data need a further treatment with the coercion function as.top_ordering
before being processed with the core functions of PLMIX) and FALSE
otherwise.
Cristina Mollica and Luca Tardella
Turner, H., Kosmidis, I. and Firth, D. (2018). PlackettLuce: Plackett-Luce Models for Rankings. R package version 0.2-3. https://CRAN.R-project.org/package=PlackettLuce
Qian, Z. (2018). rankdist: Distance Based Ranking Models. R package version 1.1.3. https://CRAN.R-project.org/package=rankdist
## A toy example of data matrix not satisfying the conditions to be a top-ordering dataset toy_data=rbind(1:5, c(0,4,3,2,1), c(4,3.4,2,1,5), c(2,3,0,0,NA), c(4,4,3,2,5), c(3,5,4,2,6), c(2,-3,1,4,5), c(2,0,1,4,5), c(2,3,1,1,1), c(2,3,0,4,0)) is.top_ordering(data=toy_data) ## A dataset from the StatRank package satisfying the conditions to be a top-ordering dataset library(StatRank) data(Data.Election9) is.top_ordering(data=Data.Election9)
Remove the label switching phenomenon from the MCMC samples of Bayesian mixtures of Plackett-Luce models with a different number of components.
label_switchPLMIX( pi_inv, seq_G, MCMCsampleP, MCMCsampleW, MAPestP, MAPestW, parallel = FALSE )
pi_inv |
An object of class |
seq_G |
Numeric vector with the number of components of the Plackett-Luce mixtures to be assessed. |
MCMCsampleP |
List of size |
MCMCsampleW |
List of size |
MAPestP |
List of size |
MAPestW |
List of size |
parallel |
Logical: whether parallelization should be used. Default is |
The label_switchPLMIX
function performs the label switching adjustment of the MCMC samples via the Pivotal Reordering Algorithm (PRA) described in Marin et al. (2005), by calling the pra
function from the label.switching
package.
A list of named objects:
final_sampleP |
List of size |
final_sampleW |
List of size |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Papastamoulis, P. (2016). label.switching: An R Package for Dealing with the Label Switching Problem in MCMC Outputs. Journal of Statistical Software, 69(1), pages 1–24, DOI: 10.18637/jss.v069.c01.
Marin, J. M., Mengersen, K. and Robert, C.P. (2005). Bayesian modelling and inference on mixtures of distributions. Handbook of Statistics (25), D. Dey and C.R. Rao (eds). Elsevier-Sciences.
data(d_carconf) K <- ncol(d_carconf) ## Fit 1- and 2-component PL mixtures via MAP estimation MAP_1 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=1, n_start=2, n_iter=400*1) MAP_2 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=2, n_start=2, n_iter=400*2) MAP_3 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=3, n_start=2, n_iter=400*3) mcmc_iter <- 30 burnin <- 10 ## Fit 1- and 2-component PL mixtures via Gibbs sampling procedure GIBBS_1 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=1, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_1$mod$P_map, z=binary_group_ind(MAP_1$mod$class_map,G=1))) GIBBS_2 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=2, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_2$mod$P_map, z=binary_group_ind(MAP_2$mod$class_map,G=2))) GIBBS_3 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=3, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_3$mod$P_map, z=binary_group_ind(MAP_3$mod$class_map,G=3))) ## Adjusting the MCMC samples for label switching LS <- label_switchPLMIX(pi_inv=d_carconf, seq_G=1:3, MCMCsampleP=list(GIBBS_1$P, GIBBS_2$P, GIBBS_3$P), MCMCsampleW=list(GIBBS_1$W, GIBBS_2$W, GIBBS_3$W), MAPestP=list(MAP_1$mod$P_map, MAP_2$mod$P_map, MAP_3$mod$P_map), MAPestW=list(MAP_1$mod$W_map, MAP_2$mod$W_map, MAP_3$mod$W_map)) str(LS)
Compute either the likelihood or the log-likelihood of the Plackett-Luce mixture model parameters for a partial ordering dataset.
likPLMIX(p, ref_order, weights, pi_inv) loglikPLMIX(p, ref_order, weights, pi_inv)
p |
Numeric |
ref_order |
Numeric |
weights |
Numeric vector of |
pi_inv |
An object of class |
The ref_order
argument accommodates the more general mixture of Extended Plackett-Luce models (EPL), involving the additional reference order parameters (Mollica and Tardella 2014). A permutation of the first K integers can be specified in each row of the
ref_order
argument. Since the Plackett-Luce model is a special instance of the EPL with the reference order equal to the identity permutation, the ref_order
argument must be a matrix whose rows are all equal to the identity permutation (1, ..., K) when dealing with Plackett-Luce mixtures.
Either the likelihood or the log-likelihood value of the Plackett-Luce mixture model parameters for a partial ordering dataset.
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Mollica, C. and Tardella, L. (2014). Epitope profiling via mixture modeling for ranked data. Statistics in Medicine, 33(21), pages 3738–3758, ISSN: 0277-6715, DOI: 10.1002/sim.6224.
data(d_apa) K <- ncol(d_apa) G <- 3 support_par <- matrix(1:(G*K), nrow=G, ncol=K) weights_par <- c(0.50, 0.25, 0.25) loglikPLMIX(p=support_par, ref_order=matrix(1:K, nrow=G, ncol=K, byrow=TRUE), weights=weights_par, pi_inv=d_apa)
Return complete rankings/orderings from partial sequences relying on a random generation of the missing positions/items.
make_complete( data, format_input, nranked = NULL, probitems = rep(1, ncol(data)) )
data |
Numeric |
format_input |
Character string indicating the format of the |
nranked |
Optional numeric vector of length |
probitems |
Numeric vector with the |
The completion of the partial top rankings/orderings is performed according to the Plackett-Luce scheme, that is, by sampling the unranked items without replacement, using the positive values in the probitems
argument as support parameters (normalization is not necessary).
A list of two named objects:
completedata |
Numeric |
nranked |
Numeric vector of length |
Cristina Mollica and Luca Tardella
## Completion based on the top item frequencies data(d_dublinwest) head(d_dublinwest) top_item_freq <- rank_summaries(data=d_dublinwest, format_input="ordering", mean_rank=FALSE, pc=FALSE)$marginals["Rank_1",] d_dublinwest_compl <- make_complete(data=d_dublinwest, format_input="ordering", probitems=top_item_freq) head(d_dublinwest_compl$completedata)
Return partial top rankings/orderings from complete sequences obtained either with user-specified censoring patterns or with a random truncation.
make_partial( data, format_input, nranked = NULL, probcens = rep(1, ncol(data) - 1) )
data |
Numeric |
format_input |
Character string indicating the format of the |
nranked |
Numeric vector of length |
probcens |
Numeric vector of length |
The censoring of the complete sequences can be performed (i) in a deterministic way, by specifying the number of top positions to be retained for each sample unit in the nranked
argument; or (ii) in a random way, by sequentially specifying the probabilities of the top-1, top-2, ..., top-(K-1)
censoring patterns in the
probcens
argument. Recall that a top-(K-1) sequence corresponds to a complete ordering/ranking.
A list of two named objects:
partialdata |
Numeric |
nranked |
Numeric vector of length |
Cristina Mollica and Luca Tardella
data(d_german) head(d_german) d_german_cens <- make_partial(data=d_german, format_input="ordering", probcens=c(0.3, 0.3, 0.4)) head(d_german_cens$partialdata) ## Check consistency with the nominal censoring probabilities round(prop.table(table(d_german_cens$nranked)), 2)
Perform MAP estimation via EM algorithm for a Bayesian mixture of Plackett-Luce models fitted to partial orderings.
mapPLMIX( pi_inv, K, G, init = list(p = NULL, omega = NULL), n_iter = 1000, hyper = list(shape0 = matrix(1, nrow = G, ncol = K), rate0 = rep(0, G), alpha0 = rep(1, G)), eps = 10^(-6), centered_start = FALSE, plot_objective = FALSE )
pi_inv |
An object of class |
K |
Number of possible items. |
G |
Number of mixture components. |
init |
List of named objects with initialization values: |
n_iter |
Maximum number of EM iterations. |
hyper |
List of named objects with hyperparameter values for the conjugate prior specification: |
eps |
Tolerance value for the convergence criterion. |
centered_start |
Logical: whether a random start should be used, with support parameters and weights centered around the observed relative frequency with which each item has been ranked top. Default is |
plot_objective |
Logical: whether the objective function (that is the kernel of the log-posterior distribution) should be plotted. Default is |
Under a noninformative (flat) prior setting, the EM algorithm for MAP estimation corresponds to the EMM algorithm described by Gormley and Murphy (2006) to perform frequentist inference. In this case, the MAP solution coincides with the MLE and the output vectors log_lik
and objective
coincide as well.
The mapPLMIX
function performs the MAP procedure with a single starting value. To address the issue of local maxima in the posterior distribution, see the mapPLMIX_multistart
function.
A list of S3 class mpPLMIX
with named elements:
W_map |
Numeric vector with the MAP estimates of the |
P_map |
Numeric |
z_hat |
Numeric |
class_map |
Numeric vector of |
log_lik |
Numeric vector of the log-likelihood values at each iteration. |
objective |
Numeric vector of the objective function values (that is the kernel of the log-posterior distribution) at each iteration. |
max_objective |
Maximized objective function value. |
bic |
BIC value (only for the default flat priors, otherwise |
conv |
Binary convergence indicator: 1 = convergence has been achieved, 0 = otherwise. |
call |
The matched call. |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Gormley, I. C. and Murphy, T. B. (2006). Analysis of Irish third-level college applications data. Journal of the Royal Statistical Society: Series A, 169(2), pages 361–379, ISSN: 0964-1998, DOI: 10.1111/j.1467-985X.2006.00412.x.
data(d_carconf) MAP <- mapPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_iter=400*3) str(MAP) MAP$P_map MAP$W_map
Perform MAP estimation via EM algorithm with multiple starting values for a Bayesian mixture of Plackett-Luce models fitted to partial orderings.
mapPLMIX_multistart( pi_inv, K, G, n_start = 1, init = rep(list(list(p = NULL, omega = NULL)), times = n_start), n_iter = 200, hyper = list(shape0 = matrix(1, nrow = G, ncol = K), rate0 = rep(0, G), alpha0 = rep(1, G)), eps = 10^(-6), plot_objective = FALSE, init_index = 1:n_start, parallel = FALSE, centered_start = FALSE )
pi_inv |
An object of class |
K |
Number of possible items. |
G |
Number of mixture components. |
n_start |
Number of starting values. |
init |
List of |
n_iter |
Maximum number of EM iterations. |
hyper |
List of named objects with hyperparameter values for the conjugate prior specification: |
eps |
Tolerance value for the convergence criterion. |
plot_objective |
Logical: whether the objective function (that is the kernel of the log-posterior distribution) should be plotted. Default is |
init_index |
Numeric vector indicating the positions of the starting values in the |
parallel |
Logical: whether parallelization should be used. Default is |
centered_start |
Logical: whether a random start should be used, with support parameters and weights centered around the observed relative frequency with which each item has been ranked top. Default is |
Under a noninformative (flat) prior setting, the EM algorithm for MAP estimation corresponds to the EMM algorithm described by Gormley and Murphy (2006) to perform frequentist inference. In this case, the MAP solution coincides with the MLE. The best model in terms of maximized posterior distribution is returned.
A list of S3 class mpPLMIX
with named elements:
mod |
List of named objects describing the best model in terms of maximized posterior distribution. See output values of the single-run |
max_objective |
Numeric vector of the maximized objective function values for each initialization. |
convergence |
Binary vector with |
call |
The matched call. |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Gormley, I. C. and Murphy, T. B. (2006). Analysis of Irish third-level college applications data. Journal of the Royal Statistical Society: Series A, 169(2), pages 361–379, ISSN: 0964-1998, DOI: 10.1111/j.1467-985X.2006.00412.x.
data(d_carconf) MAP_mult <- mapPLMIX_multistart(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_start=2, n_iter=400*3) str(MAP_mult) MAP_mult$mod$P_map MAP_mult$mod$W_map
Construct the paired comparison matrix for a partial ordering/ranking dataset.
paired_comparisons(data, format_input, nranked = NULL)
data |
Numeric |
format_input |
Character string indicating the format of the |
nranked |
Optional numeric vector of length |
Numeric paired comparison matrix: the (i, j)-th entry indicates the number of sample units that preferred item i to item j.
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
data(d_dublinwest) paired_comparisons(data=d_dublinwest, format_input="ordering")
Random generation from a finite mixture of Plackett-Luce models and subsequent censoring according to a given partial ordering matrix.
PLMIXsim(N, K, G, p, ref_order, weights, rankingFormat, pi_inv)
N |
Number of sample units. |
K |
Number of possible items. |
G |
Number of mixture components. |
p |
Numeric |
ref_order |
Numeric |
weights |
Numeric vector of the |
rankingFormat |
Logical: whether the final simulated data should be expressed in ranking format. |
pi_inv |
Numeric |
Numeric matrix of simulated data (default is in ordering format) with the same missingness patterns as
pi_inv
.
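A minimal simulation sketch (argument values are illustrative; here pi_inv is a fully complete ordering matrix, so no censoring is applied):

```r
# Illustrative sketch: simulate N=100 orderings of K=4 items from a
# 2-component Plackett-Luce mixture; pi_inv supplies the censoring patterns.
library(PLMIX)
set.seed(123)
N <- 100; K <- 4; G <- 2
p <- matrix(runif(G*K), nrow=G, ncol=K)          # positive support parameters
ref <- matrix(1:K, nrow=G, ncol=K, byrow=TRUE)   # identity reference orders (plain PL)
w <- c(0.6, 0.4)                                 # mixture weights summing to 1
pi_inv <- matrix(1:K, nrow=N, ncol=K, byrow=TRUE)  # complete sequences (no censoring)
sim <- PLMIXsim(N=N, K=K, G=G, p=p, ref_order=ref, weights=w,
                rankingFormat=FALSE, pi_inv=pi_inv)
head(sim)
```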
plot
method for class gsPLMIX
. It builds a suite of plots, visual convergence diagnostics and credible intervals for the MCMC samples of a Bayesian mixture of Plackett-Luce models. Graphics can be plotted directly in the current working device or stored in an external file in the current working directory.
## S3 method for class 'gsPLMIX' plot( x, file = "ggmcmc-output.pdf", family = NA, plot = NULL, param_page = 5, width = 7, height = 10, dev_type_html = "png", post_est = "mean", max_scale_radar = NULL, ... )
x |
Object of class |
file |
Character vector with the name of the file to be created in the current working directory. Default is "ggmcmc-output.pdf". When NULL, plots are directly returned into the current working device (not recommended). This option also allows the user to work with an opened pdf (or other) device. When the file has an html file extension, the output is an Rmarkdown report with the figures embedded in the html file. |
family |
Character string indicating the name of the family of parameters to be plotted. A family of parameters is considered to be any group of parameters with the same name but different numerical values (for example |
plot |
Character vector containing the names of the desired plots. Default is |
param_page |
Number of parameters to be plotted in each page. Default is 5. |
width |
Numeric scalar indicating the width of the pdf display in inches. Default is 7. |
height |
Numeric scalar indicating the height of the pdf display in inches. Default is 10. |
dev_type_html |
Character vector indicating the type of graphical device for the html output. Default is |
post_est |
Character string indicating the point estimates of the Plackett-Luce mixture parameters to be computed from the |
max_scale_radar |
Numeric scalar indicating the maximum value on each axis of the radar plot for the support parameter point estimates. Default is |
... |
Further arguments passed to or from other methods (not used). |
Plots of the MCMC samples include histograms, densities, traceplots, running means plots, overlapped densities comparing the complete and partial samples, autocorrelation functions, crosscorrelation plots and caterpillar plots of the 90% and 95% equal-tailed credible intervals. Note that the latter are created for the support parameters (when either family=NA
or family="p"
), for the mixture weights in the case G>1 (when either
family=NA
or family="w"
), for the log-likelihood values (when family="log_lik"
), for the deviance values (when family="deviance"
) and for the objective function values (when family="objective"
). Convergence tools include the potential scale reduction factor and the Geweke z-score. These functionalities are implemented with a call to the ggs
and ggmcmc
functions of the ggmcmc
package (see 'Examples' for the specification of the plot
argument).
By calling the chartJSRadar
function from the radarchart
package and the routines of the ggplot2
package, plot.gsPLMIX
additionally produces a radar plot of the support parameters and, when G>1, a donut plot of the mixture weights based on the posterior point estimates. The radar chart is returned in the Viewer Pane.
Cristina Mollica and Luca Tardella
Ashton, D. and Porter, S. (2016). radarchart: Radar Chart from 'Chart.js'. R package version 0.3.1. https://CRAN.R-project.org/package=radarchart
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Fernandez-i-Marin, X. (2016). ggmcmc: Analysis of MCMC Samples and Bayesian Inference, Journal of Statistical Software, 70(9), pages 1–20, DOI: 10.18637/jss.v070.i09.
ggs
, ggmcmc
, chartJSRadar
and ggplot
# Not run: data(d_carconf) GIBBS <- gibbsPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=5, n_iter=30, n_burn=10) # Not run: # Plot posterior samples supplied as a gsPLMIX class object # plot(GIBBS) # Selected plots of the posterior samples of the support parameters # plot(GIBBS, family="p", plot=c("compare_partial","Rhat","caterpillar"), param_page=6) # Selected plots of the posterior samples of the mixture weights # plot(GIBBS, family="w", plot=c("histogram","running","crosscorrelation","caterpillar")) # Selected plots of the posterior log-likelihood values # plot(GIBBS, family="log_lik", plot=c("autocorrelation","geweke"), param_page=1) # Selected plots of the posterior deviance values # plot(GIBBS, family="deviance", plot=c("traceplot","density"), param_page=1)
plot
method for class mpPLMIX
.
## S3 method for class 'mpPLMIX' plot(x, max_scale_radar = NULL, ...)
x |
Object of class |
max_scale_radar |
Numeric scalar indicating the maximum value on each axis of the radar plot for the support parameter point estimates. Default is |
... |
Further arguments passed to or from other methods (not used). |
By calling the chartJSRadar
function from the radarchart
package and the routines of the ggplot2
package, plot.mpPLMIX
produces a radar plot of the support parameters and, when G>1, a donut plot of the mixture weights and a heatmap of the component membership probabilities based on the MAP estimates. The radar chart is returned in the Viewer Pane.
Cristina Mollica and Luca Tardella
Ashton, D. and Porter, S. (2016). radarchart: Radar Chart from 'Chart.js'. R package version 0.3.1. https://CRAN.R-project.org/package=radarchart
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
chartJSRadar
and ggplot
# Not run: data(d_carconf) MAP <- mapPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3) plot(MAP) # Not run: MAP_multi <- mapPLMIX_multistart(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_start=5) plot(MAP_multi)
Perform posterior predictive check to assess the goodness-of-fit of Bayesian mixtures of Plackett-Luce models with a different number of components.
ppcheckPLMIX( pi_inv, seq_G, MCMCsampleP, MCMCsampleW, top1 = TRUE, paired = TRUE, parallel = FALSE )
pi_inv |
An object of class |
seq_G |
Numeric vector with the number of components of the Plackett-Luce mixtures to be assessed. |
MCMCsampleP |
List of size |
MCMCsampleW |
List of size |
top1 |
Logical: whether the posterior predictive |
paired |
Logical: whether the posterior predictive |
parallel |
Logical: whether parallelization should be used. Default is |
The ppcheckPLMIX
function returns two posterior predictive p-values based on two chi-squared discrepancy variables involving: (i) the top item frequencies and (ii) the paired comparison frequencies. In the presence of partial sequences in the
pi_inv
matrix, the same missingness patterns observed in the dataset (i.e., the number of items ranked by each sample unit) are reproduced on the replicated datasets from the posterior predictive distribution.
A list with a named element:
post_pred_pvalue |
Numeric |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
data(d_carconf) K <- ncol(d_carconf) ## Fit 1- and 2-component PL mixtures via MAP estimation MAP_1 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=1, n_start=2, n_iter=400*1) MAP_2 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=2, n_start=2, n_iter=400*2) MAP_3 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=3, n_start=2, n_iter=400*3) mcmc_iter <- 30 burnin <- 10 ## Fit 1- and 2-component PL mixtures via Gibbs sampling procedure GIBBS_1 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=1, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_1$mod$P_map, z=binary_group_ind(MAP_1$mod$class_map,G=1))) GIBBS_2 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=2, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_2$mod$P_map, z=binary_group_ind(MAP_2$mod$class_map,G=2))) GIBBS_3 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=3, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_3$mod$P_map, z=binary_group_ind(MAP_3$mod$class_map,G=3))) ## Checking goodness-of-fit of the estimated mixtures CHECK <- ppcheckPLMIX(pi_inv=d_carconf, seq_G=1:3, MCMCsampleP=list(GIBBS_1$P, GIBBS_2$P, GIBBS_3$P), MCMCsampleW=list(GIBBS_1$W, GIBBS_2$W, GIBBS_3$W)) CHECK$post_pred_pvalue
Perform conditional posterior predictive check to assess the goodness-of-fit of Bayesian mixtures of Plackett-Luce models with a different number of components.
ppcheckPLMIX_cond( pi_inv, seq_G, MCMCsampleP, MCMCsampleW, top1 = TRUE, paired = TRUE, parallel = FALSE )
pi_inv |
An object of class |
seq_G |
Numeric vector with the number of components of the Plackett-Luce mixtures to be assessed. |
MCMCsampleP |
List of size |
MCMCsampleW |
List of size |
top1 |
Logical: whether the posterior predictive |
paired |
Logical: whether the posterior predictive |
parallel |
Logical: whether parallelization should be used. Default is |
The ppcheckPLMIX_cond
function returns two posterior predictive p-values based on two chi-squared discrepancy variables involving: (i) the top item frequencies and (ii) the paired comparison frequencies. In the presence of partial sequences in the
pi_inv
matrix, the same missingness patterns observed in the dataset (i.e., the number of items ranked by each sample unit) are reproduced on the replicated datasets from the posterior predictive distribution. Unlike the ppcheckPLMIX
function, the conditional discrepancy measures are obtained by summing up the chi-squared discrepancies computed on subsamples of observations with the same number of ranked items.
A list with a named element:
post_pred_pvalue_cond |
Numeric |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
data(d_carconf)
K <- ncol(d_carconf)

## Fit 1-, 2- and 3-component PL mixtures via MAP estimation
MAP_1 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=1, n_start=2, n_iter=400*1)
MAP_2 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=2, n_start=2, n_iter=400*2)
MAP_3 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=3, n_start=2, n_iter=400*3)

mcmc_iter <- 30
burnin <- 10

## Fit 1-, 2- and 3-component PL mixtures via Gibbs sampling
GIBBS_1 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=1, n_iter=mcmc_iter, n_burn=burnin,
                      init=list(p=MAP_1$mod$P_map,
                                z=binary_group_ind(MAP_1$mod$class_map, G=1)))
GIBBS_2 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=2, n_iter=mcmc_iter, n_burn=burnin,
                      init=list(p=MAP_2$mod$P_map,
                                z=binary_group_ind(MAP_2$mod$class_map, G=2)))
GIBBS_3 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=3, n_iter=mcmc_iter, n_burn=burnin,
                      init=list(p=MAP_3$mod$P_map,
                                z=binary_group_ind(MAP_3$mod$class_map, G=3)))

## Check goodness-of-fit of the estimated mixtures
CHECKCOND <- ppcheckPLMIX_cond(pi_inv=d_carconf, seq_G=1:3,
                               MCMCsampleP=list(GIBBS_1$P, GIBBS_2$P, GIBBS_3$P),
                               MCMCsampleW=list(GIBBS_1$W, GIBBS_2$W, GIBBS_3$W))
CHECKCOND$post_pred_pvalue_cond
print
method for class gsPLMIX
. It shows some general information on the Gibbs sampling simulation for a Bayesian mixture of Plackett-Luce models.
## S3 method for class 'gsPLMIX'
print(x, ...)
x |
Object of class |
... |
Further arguments passed to or from other methods (not used). |
Cristina Mollica and Luca Tardella
## Print of the Gibbs sampling procedure
data(d_carconf)
GIBBS <- gibbsPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_iter=30, n_burn=10)
print(GIBBS)
print
method for class mpPLMIX
. It shows some general information on the MAP estimation procedure for a Bayesian mixture of Plackett-Luce models.
## S3 method for class 'mpPLMIX'
print(x, ...)
x |
Object of class |
... |
Further arguments passed to or from other methods (not used). |
Cristina Mollica and Luca Tardella
mapPLMIX
and mapPLMIX_multistart
## Print of the MAP procedure with a single starting point
data(d_carconf)
MAP <- mapPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3)
print(MAP)

## Print of the MAP procedure with 5 starting points
MAP_multi <- mapPLMIX_multistart(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_start=5)
print(MAP_multi)
Weighted sampling without replacement from a finite urn
quickintsample(n, size, prob)
n |
Number of distinct integer-labeled balls in the urn. |
size |
Number of balls sampled without replacement. |
prob |
Numeric vector of length |
Numeric vector of the sampled integer labels.
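A minimal usage sketch (argument semantics assumed from the descriptions above: prob is a length-n vector of possibly unnormalized weights on the integer labels 1, ..., n):

```r
## Draw 3 of 10 integer-labeled balls without replacement,
## with weights favoring larger labels (weights need not sum to 1)
set.seed(123)
quickintsample(n=10, size=3, prob=1:10)
```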
Convert the format of the input dataset from orderings to rankings and vice versa.
rank_ord_switch(data, format_input, nranked = NULL)
data |
Numeric |
format_input |
Character string indicating the format of the |
nranked |
Optional numeric vector of length |
Numeric data matrix of partial sequences with inverse format.
Cristina Mollica and Luca Tardella
## From orderings to rankings for the Dublin West dataset
data(d_dublinwest)
head(d_dublinwest)
rank_ord_switch(data=head(d_dublinwest), format_input="ordering")
Compute rank summaries and censoring patterns for a partial ordering/ranking dataset.
rank_summaries(data, format_input, mean_rank = TRUE, marginals = TRUE, pc = TRUE)
data |
Numeric |
format_input |
Character string indicating the format of the |
mean_rank |
Logical: whether the mean rank vector has to be computed. Default is |
marginals |
Logical: whether the marginal rank distributions have to be computed. Default is |
pc |
Logical: whether the paired comparison matrix has to be computed. Default is |
A list of named objects:
nranked |
Numeric vector of length |
nranked_distr |
Frequency distribution of the |
na_or_not |
Numeric |
mean_rank |
Numeric vector of length |
marginals |
Numeric |
pc |
Numeric |
Cristina Mollica and Luca Tardella
Marden, J. I. (1995). Analyzing and modeling rank data. Monographs on Statistics and Applied Probability (64). Chapman & Hall, ISBN: 0-412-99521-2. London.
data(d_carconf)
rank_summaries(data=d_carconf, format_input="ordering")
Draw a random sample of complete orderings/rankings from a G-component mixture of Plackett-Luce models.
rPLMIX(
  n = 1,
  K,
  G,
  p = t(matrix(1/K, nrow = K, ncol = G)),
  ref_order = t(matrix(1:K, nrow = K, ncol = G)),
  weights = rep(1/G, G),
  format_output = "ordering"
)
n |
Number of observations to be sampled. Default is 1. |
K |
Number of possible items. |
G |
Number of mixture components. |
p |
Numeric |
ref_order |
Numeric |
weights |
Numeric vector of |
format_output |
Character string indicating the format of the returned simulated dataset ( |
Positive values are required for the p and weights arguments (normalization is not necessary).

The ref_order argument accommodates the more general mixture of Extended Plackett-Luce models (EPL), involving the additional reference order parameters (Mollica and Tardella 2014). A permutation of the first K integers can be specified in each row of the ref_order argument to generate a sample from a G-component mixture of EPL. Since the Plackett-Luce model is a special instance of the EPL with the reference order equal to the identity permutation (1, 2, ..., K), the default value of the ref_order argument corresponds to forward orders.

If G=1, a numeric matrix of simulated complete sequences. If G>1, a list of two named objects:
comp |
Numeric vector of |
sim_data |
Numeric |
Cristina Mollica and Luca Tardella
K <- 6
G <- 3
support_par <- matrix(1:(G*K), nrow=G, ncol=K)
weights_par <- c(0.50, 0.25, 0.25)

set.seed(47201)
simulated_data <- rPLMIX(n=5, K=K, G=G, p=support_par, weights=weights_par)
simulated_data$comp
simulated_data$sim_data
Compute Bayesian comparison criteria for mixtures of Plackett-Luce models with a different number of components.
selectPLMIX(
  pi_inv,
  seq_G,
  MCMCsampleP = vector(mode = "list", length = length(seq_G)),
  MCMCsampleW = vector(mode = "list", length = length(seq_G)),
  MAPestP,
  MAPestW,
  deviance,
  post_est = "mean",
  parallel = FALSE
)
pi_inv |
An object of class |
seq_G |
Numeric vector with the number of components of the Plackett-Luce mixtures to be compared. |
MCMCsampleP |
List of size |
MCMCsampleW |
List of size |
MAPestP |
List of size |
MAPestW |
List of size |
deviance |
List of size |
post_est |
Character string indicating the point estimates of the Plackett-Luce mixture parameters to be computed from the MCMC sample. This argument is ignored when MAP estimates are supplied in the |
parallel |
Logical: whether parallelization should be used. Default is |
The selectPLMIX
function privileges the use of the MAP point estimates to compute the Bayesian model comparison criteria, since they are not affected by the label switching issue. By setting both the MAPestP
and MAPestW
arguments equal to NULL, the user can alternatively compute the selection measures by relying on a different posterior summary ("mean"
or "median"
) specified in the post_est
argument. In the latter case, the MCMC samples for each Plackett-Luce mixture must be supplied in the lists MCMCsampleP
and MCMCsampleW
. The drawback when working with point estimates other than the MAP is that the possible presence of label switching has to be previously removed from the traces to obtain meaningful results. See the label_switchPLMIX
function to perform the label switching adjustment of the MCMC samples.
Several model selection criteria are returned. The two versions of the DIC correspond to alternative ways of computing the effective number of parameters: DIC1 was proposed by Spiegelhalter et al. (2002) with penalty named pD, whereas DIC2 was proposed by Gelman et al. (2004) with penalty named pV. The latter coincides with the AICM introduced by Raftery et al. (2007), that is, the Bayesian counterpart of the AIC. BPIC1 and BPIC2 are obtained from the two DICs by simply doubling the penalty term, as suggested by Ando (2007) to counteract the DIC's tendency to overfitting. BICM1 is the Bayesian variant of the BIC, originally presented by Raftery et al. (2007) and entirely based on the MCMC sample. The BICM2, instead, involves the MAP estimate directly, without the need to approximate it from the MCMC sample as for the BICM1.
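As a sketch based on the standard definitions in the cited references (not a formula taken from the package itself), the penalties and criteria above can be written in terms of the posterior distribution of the deviance D(θ):

```latex
% Standard definitions (Spiegelhalter et al. 2002; Gelman et al. 2004; Ando 2007)
\bar{D} = \mathrm{E}\left[ D(\theta) \mid y \right], \qquad
p_D = \bar{D} - D(\hat{\theta}), \qquad
p_V = \tfrac{1}{2} \operatorname{Var}\left[ D(\theta) \mid y \right],

\mathrm{DIC}_1 = \bar{D} + p_D, \qquad
\mathrm{DIC}_2 = \bar{D} + p_V,

\mathrm{BPIC}_1 = \bar{D} + 2\,p_D, \qquad
\mathrm{BPIC}_2 = \bar{D} + 2\,p_V,
```

where D(θ̂) is the deviance evaluated at a posterior point estimate (the MAP by default in selectPLMIX).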
A list of named objects:
point_estP |
List of size |
point_estW |
List of size |
fitting |
Numeric |
penalties |
Numeric |
criteria |
Numeric |
Cristina Mollica and Luca Tardella
Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2), pages 442–458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.
Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika, 94(2), pages 443–458.
Raftery, A. E., Satagopan, J. M., Newton, M. A. and Krivitsky, P. N. (2007). Bayesian Statistics 8. Proceedings of the eighth Valencia International Meeting 2006, pages 371–416. Oxford University Press.
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian data analysis. Chapman & Hall/CRC, Second Edition, ISBN: 1-58488-388-X. New York.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), pages 583–639.
data(d_carconf)
K <- ncol(d_carconf)

## Fit 1- and 2-component PL mixtures via MAP estimation
MAP_1 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=1, n_start=2, n_iter=400*1)
MAP_2 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=2, n_start=2, n_iter=400*2)

mcmc_iter <- 30
burnin <- 10

## Fit 1- and 2-component PL mixtures via Gibbs sampling
GIBBS_1 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=1, n_iter=mcmc_iter, n_burn=burnin,
                      init=list(p=MAP_1$mod$P_map,
                                z=binary_group_ind(MAP_1$mod$class_map, G=1)))
GIBBS_2 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=2, n_iter=mcmc_iter, n_burn=burnin,
                      init=list(p=MAP_2$mod$P_map,
                                z=binary_group_ind(MAP_2$mod$class_map, G=2)))

## Select the optimal number of components
SELECT <- selectPLMIX(pi_inv=d_carconf, seq_G=1:2,
                      MAPestP=list(MAP_1$mod$P_map, MAP_2$mod$P_map),
                      MAPestW=list(MAP_1$mod$W_map, MAP_2$mod$W_map),
                      deviance=list(GIBBS_1$deviance, GIBBS_2$deviance))
SELECT$criteria
The function SimYpsilon simulates from the Exponential full-conditionals of the quantitative latent variables for the Gibbs sampling of a Bayesian mixture of Plackett-Luce models.
SimYpsilon(rate, n_rank)
rate |
Numeric |
n_rank |
Numeric vector of length |
Numeric matrix of posterior samples of the quantitative latent variables.
summary
method for class gsPLMIX
. It provides summary statistics and credible intervals for the Gibbs sampling simulation of a Bayesian mixture of Plackett-Luce models.
## S3 method for class 'gsPLMIX'
summary(
  object,
  quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975),
  hpd_prob = 0.95,
  digits = 2,
  ...
)
object |
Object of class |
quantiles |
Numeric vector of quantile probabilities. |
hpd_prob |
Numeric scalar in the grid of values spanning the interval (0,1) by 0.05, giving the posterior probability content of the HPD intervals. Supplied values outside the grid are rounded. |
digits |
Number of decimal places for rounding the posterior summaries. |
... |
Further arguments passed to or from other methods (not used). |
Posterior summaries include means, standard deviations, naive standard errors of the means (ignoring autocorrelation of the chain) and time-series standard errors based on an estimate of the spectral density at 0. They correspond to the statistics element of the output returned by the summary.mcmc function of the coda package. Highest posterior density (HPD) intervals are obtained by calling the HPDinterval function of the coda package.
A list of summary statistics for the gsPLMIX
class object:
statistics |
Numeric matrix with posterior summaries in each row (see 'Details'). |
quantiles |
Numeric matrix with posterior quantiles at the given |
HPDintervals |
Numeric matrix with 100 |
Modal_orderings |
Numeric |
call |
The matched call. |
Cristina Mollica and Luca Tardella
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC, R News, 6, pages 7–11, ISSN: 1609-3631.
data(d_carconf)
GIBBS <- gibbsPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_iter=30, n_burn=10)

## Summary of the Gibbs sampling procedure
summary(GIBBS)
summary
method for class mpPLMIX
. It provides summaries for the MAP estimation of a Bayesian mixture of Plackett-Luce models.
## S3 method for class 'mpPLMIX'
summary(object, digits = 2, ...)
object |
Object of class |
digits |
Number of decimal places for rounding the summaries. |
... |
Further arguments passed to or from other methods (not used). |
A list of summaries for the mpPLMIX
class object:
MAP_w |
Numeric vector with the MAP estimates of the |
MAP_p |
Numeric |
MAP_modal_orderings |
Numeric |
group_distr |
Numeric vector with the relative frequency distribution of the mixture component memberships based on MAP allocation. Returned only when |
perc_conv_rate |
Numeric scalar with the percentage of MAP algorithm convergence over the multiple starting points. Returned only when |
Cristina Mollica and Luca Tardella
## Summary of the MAP procedure with a single starting point
data(d_carconf)
MAP <- mapPLMIX(pi_inv=d_carconf, K=ncol(d_carconf), G=3)
summary(MAP)

## Summary of the MAP procedure with 5 starting points
MAP_multi <- mapPLMIX_multistart(pi_inv=d_carconf, K=ncol(d_carconf), G=3, n_start=5)
summary(MAP_multi)
Compute paired comparison matrix for a partial ordering dataset
tau(pi_inv)
pi_inv |
Numeric |
The paired comparison matrix, whose entries count the number of times that item i is preferred to item i'.
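A minimal usage sketch, assuming tau accepts the same partial ordering matrix used throughout this manual (e.g. d_carconf):

```r
data(d_carconf)
## K x K matrix: entry (i, i') counts how often item i is preferred to item i'
tau(pi_inv=d_carconf)
```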
Compute top-1 frequencies conditionally on the number of ranked items.
top1freq1dim(pi_inv)
pi_inv |
Numeric |
Numeric matrix of top1 frequencies.
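A minimal usage sketch, assuming the input format matches the other low-level utilities in this manual:

```r
data(d_carconf)
## Top-1 frequencies stratified by the number of ranked items
top1freq1dim(pi_inv=d_carconf)
```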
The function umat returns a binary matrix whose elements indicate which items have been ranked by each sample unit.
umat(pi_inv)
pi_inv |
Numeric |
Binary matrix whose entries indicate whether sample unit s has ranked item i.
Construct the frequency distribution of the distinct observed sequences from the dataset of individual rankings/orderings.
unit_to_freq(data)
data |
Numeric |
Numeric matrix of the distinct observed sequences with the corresponding frequencies indicated in the last ((K+1)-th) column.
Cristina Mollica and Luca Tardella
## Frequency distribution for the APA top-ordering dataset
data(d_apa)
unit_to_freq(data=d_apa)
The function UpPhetpartial
updates the support parameter estimates in the EM algorithm for MAP estimation of a Bayesian mixture of Plackett-Luce models.
UpPhetpartial(p, ref_order, pi_inv, u_bin, z_hat, shape0, rate0, n_rank)
p |
Numeric |
ref_order |
Numeric |
pi_inv |
Numeric |
u_bin |
Binary |
z_hat |
Numeric |
shape0 |
Numeric |
rate0 |
Numeric vector of |
n_rank |
Numeric vector of length |
Numeric matrix of estimated component-specific support parameters.
The function UpWhet
updates the mixture weight estimates in the EM algorithm for MAP estimation of a Bayesian mixture of Plackett-Luce models.
UpWhet(z_hat, alpha0)
z_hat |
Numeric |
alpha0 |
Numeric vector of |
Numeric vector of the estimated mixture weights.