An excellent recent article caught my attention: it solves, in a unified manner, a class of gene array analysis problems using the workhorse methodology of statistical mixed models. The approach is presented for a two-group time-course experiment, and extension to more complex experimental designs is discussed. The featured method is compared with several competing approaches, including by way of a simulation study, and is implemented in SAS software. However, any statistical package supporting spectral decomposition and mixed models may be used.
Epi has been around for a long time, starting back in the days of DOS! Over the years Epi has matured into a suite of tools (Pepi) focused on the fields of public health and epidemiology. WINPEPI software for the Windows platform is free and offers a broad mix of software that general statistics users might consider as an alternative to more costly options. Epi is like the Energizer Bunny: it is the gift that keeps going and going….
All pair-wise Multiple Comparisons (MCA) is a well-known collection of procedures for the stochastic ordering of means, which is a common research task. Classical methods rely on the assumption that the null hypothesis is true. Modern alternatives can be found in the Bayesian statistics paradigm, which abandons the Type 1 error notion. In particular, for problems that can be cast in the hierarchical modeling framework, a principled Bayesian approach relies on partial pooling and shrinkage. Technical arguments supporting this approach have been around for some time. An excellent working paper by Andrew Gelman on the topic presents an overview, simulation results and examples demonstrating the benefits in an applied setting. Suggestions on the use of R and other software for implementation are also given.
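To make the idea of partial pooling concrete, here is a minimal sketch in Python. It is not code from the working paper. It assumes a simple normal hierarchical model, y_j ~ N(theta_j, se_j^2) with theta_j ~ N(mu, tau^2), and it treats mu and tau as known purely for illustration; in a full Bayesian analysis they would be estimated along with everything else. The function name partial_pool is my own.

```python
def partial_pool(y, se, mu, tau):
    """Posterior means of the group effects theta_j.

    Assumed model (illustration only):
        y_j ~ N(theta_j, se_j^2),  theta_j ~ N(mu, tau^2),
    with mu and tau taken as known.
    """
    prior_precision = 1.0 / tau ** 2
    pooled = []
    for y_j, se_j in zip(y, se):
        data_precision = 1.0 / se_j ** 2
        # Precision-weighted average of the raw estimate and the overall mean.
        estimate = (data_precision * y_j + prior_precision * mu) / (
            data_precision + prior_precision
        )
        pooled.append(estimate)
    return pooled


# Three group means with equal standard errors; each is pulled halfway to mu = 0
# because the data precision equals the prior precision.
print(partial_pool([8.0, 0.0, -8.0], [2.0, 2.0, 2.0], mu=0.0, tau=2.0))
# prints [4.0, 0.0, -4.0]
```

Noisy groups (large se_j) are pulled strongly toward the overall mean, while precisely estimated groups barely move. This shrinkage is what makes the resulting comparisons more conservative without an explicit multiplicity correction.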
Spatial scan statistics have been an important class of tools for cluster detection in spatial data. These are often used in support of surveillance and detection activities in public health and other fields. A common limitation of popular spatial scan statistics is that they do not accommodate uncertainty in the measure of interest. In a recent JASA article (Sept. 2009), Weighted Normal Spatial Scan Statistic for Heterogeneous Population Data, the authors offer a solution that addresses this problem in more generality. Weights related to local variance measures, or proxies such as sample size, can be created for use in a weighted likelihood approach. Extensions to non-Gaussian probability models are addressed. The case studies and power simulations provided suggest excellent performance. Their solution has been implemented in the freely available software SaTScan.
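The weighted likelihood idea can be sketched generically in Python. This is not the statistic as defined in the JASA article, nor anything taken from SaTScan. It assumes a weighted normal log-likelihood with a common variance, and compares "one mean everywhere" against "separate means inside and outside a candidate zone". The function names and the in_zone flag list are hypothetical.

```python
import math


def weighted_mean(x, w):
    """Weighted mean, the maximizer of a weighted normal log-likelihood."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_normal_llr(x, w, in_zone):
    """Log-likelihood ratio for a single candidate zone (generic sketch).

    x: observed regional values; w: weights (for example inverse variances
    or sample sizes); in_zone: booleans marking the candidate zone.
    """
    inside = [(xi, wi) for xi, wi, z in zip(x, w, in_zone) if z]
    outside = [(xi, wi) for xi, wi, z in zip(x, w, in_zone) if not z]
    if not inside or not outside:
        raise ValueError("zone must contain some, but not all, regions")

    total_w = sum(w)
    mu_null = weighted_mean(x, w)
    var_null = sum(wi * (xi - mu_null) ** 2 for xi, wi in zip(x, w)) / total_w

    mu_in = weighted_mean(*zip(*inside))
    mu_out = weighted_mean(*zip(*outside))
    var_alt = (
        sum(wi * (xi - mu_in) ** 2 for xi, wi in inside)
        + sum(wi * (xi - mu_out) ** 2 for xi, wi in outside)
    ) / total_w
    if var_alt == 0.0:
        raise ValueError("zero residual variance; ratio is unbounded")

    # Maximized weighted log-likelihoods differ only through the variances.
    return 0.5 * total_w * math.log(var_null / var_alt)


values = [10.0, 11.0, 1.0, 2.0]
weights = [1.0, 1.0, 1.0, 1.0]
print(weighted_normal_llr(values, weights, [True, True, False, False]))
```

In a real scan, a ratio like this would be computed for many candidate zones, with the maximum assessed by Monte Carlo replication; that part is omitted here.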
Cluster analysis (along with other tools) is often deployed to investigate structure (clustering) in multidimensional data sets. One approach to modeling such data is the Gaussian mixture model. mixAK is a new R package for Bayesian estimation of multivariate normal mixtures; it allows for selection of the number of mixture components and for density estimation, and optionally handles interval-censored multivariate data. Author Arnost Komarek’s journal article in Computational Statistics and Data Analysis, Volume 53, Issue 12, October 2009, presents the underlying theory and application of the new approach using RJ-MCMC estimation. The selection of the number of mixture components is aided by the Deviance Information Criterion (DIC) and Penalized Expected Deviance (PED) measures.
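For readers new to Gaussian mixtures, the following Python sketch shows what the model looks like in the simplest univariate case with fixed parameters. It has nothing to do with the mixAK interface or its RJ-MCMC machinery; the function names are mine, and the parameter values are made up for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite Gaussian mixture: sum_k w_k * N(x | m_k, s_k^2)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def responsibilities(x, weights, means, sds):
    """Probability that x came from each component, given the parameters."""
    parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(parts)
    return [p / total for p in parts]


# Two hypothetical clusters centered at -1 and 1 with equal weight.
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
print(mixture_pdf(0.0, weights, means, sds))
print(responsibilities(2.0, weights, means, sds))  # favors the second cluster
```

The responsibilities are what turn a fitted mixture into a clustering: each observation is assigned, softly, to the components most likely to have generated it.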
Many large surveys are structured as complex sample designs that reflect various stratification considerations. Statistics calculated from such designs must be weighted to reflect the general population of interest. A clear discussion and set of recommendations by four prominent researchers for the calculation and implementation of weights using ANES datasets can be found in the Sept. 2009 Technical Report, nes012427, Computing Weights for American National Election Study Survey Data. The report can be found in the Reference Library section of the ANES
Recommendations are given for single-panel cross-sectional, two-wave panel and multi-wave panel designs, along with nonresponse and poststratification weighting. The discussion is general enough to apply to other large studies, such as Census data and similar surveys.
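As a toy illustration of poststratification weighting, here is a short Python sketch. It is not the procedure recommended in the ANES report, which covers far more ground. It assumes a single stratification variable with known population shares, and the function names, strata labels and numbers are all invented.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / sample share."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}


def weighted_mean(values, sample_strata, weights):
    """Mean of values, with each respondent weighted by their stratum weight."""
    w = [weights[s] for s in sample_strata]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)


# Hypothetical sample: stratum "a" is over-represented (75% of the sample,
# 50% of the population), so its respondents are down-weighted.
strata = ["a", "a", "a", "b"]
answers = [1.0, 1.0, 1.0, 0.0]
weights = poststratification_weights(strata, {"a": 0.5, "b": 0.5})

print(weights)
print(sum(answers) / len(answers))              # unweighted mean, 0.75
print(weighted_mean(answers, strata, weights))  # close to 0.5 after weighting
```

The weighted estimate matches what one would get if the sample had the population's stratum mix, which is the whole point of the adjustment.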
Researchers using spatial data are often faced with a mix of data obtained at several levels of scale and aggregation, alongside point-referenced data. Classical geospatial regressions do not deal with this mix very well, and standard ordinary regressions fare even worse. A unified treatment is the topic of a recent article, “Reparameterized and Marginalized Posterior and Predictive Sampling for Complex Bayesian Geostatistical Models” in Volume 18, Number 2 of JCGS. In short, the authors cleverly reparameterized and recast the problem so as to allow efficient MCMC samplers to address the Bayesian estimation task. The article’s supplemental materials provide the R and OpenBUGS code for the efficient estimation tasks outlined.
SPSS software has an extensive tutorial built into the product, and most first-time users will benefit from using it. Additional SPSS resources can be found here.
Elsewhere on this blog I mention various bits and pieces of R software. Now that the fall semester is upon us, we have added many new R bioinformatics packages to the baseline R installation on our research Linux cluster. This option provides a scalable solution for those needing additional computing power.
Historically, Bayesian solutions were computed as needed in formal languages (Fortran, C, Java, etc.) and later in high-level solutions like Matlab, Gauss, SAS/IML and others. Then WinBUGS came along and offered a higher-level interface, similar to what Matlab did for linear algebra syntax and functionality, but closer in spirit to the notation used by statisticians to depict multilevel probability-based models. While all of these still have their pros and cons, we now find an explosion of Bayesian solutions implemented in R with the benefit of object orientation. If one takes a look at the “CRAN Task View: Bayesian Inference” page on the R site maintained by Jong Hee Park, one will find 60+ packages with numerous solutions to many standard statistical modeling problems. Of the many listed, note the package BAS for Bayesian Model Averaging in linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the Liang et al. hyper-g priors. The stochastic search capability makes possible, and easy, model specification searches that would not have been feasible a few years ago.
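To give a feel for what Zellner’s g-prior does, here is a small Python sketch. It does not use or imitate the BAS package. It assumes a single centered predictor and a g-prior with prior mean zero, in which case the posterior mean of the slope for a fixed g is simply the least squares estimate multiplied by g / (1 + g). The function names and the data are made up.

```python
def ols_slope(x, y):
    """Least squares slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def g_prior_posterior_mean(beta_ols, g):
    """Posterior mean of the slope under a zero-centered g-prior, g fixed."""
    return g / (1.0 + g) * beta_ols


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
beta_hat = ols_slope(x, y)  # 2.0 for this exactly linear toy data

# g = n is one common default (an assumption here, not a recommendation).
print(g_prior_posterior_mean(beta_hat, g=len(x)))
# A very large g reproduces the least squares estimate almost exactly.
print(g_prior_posterior_mean(beta_hat, g=1e6))
```

Mixtures of g-priors, such as the Zellner-Siow and hyper-g priors mentioned above, place a prior on g itself rather than fixing it, so the amount of shrinkage is learned from the data.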