The widespread availability of Microsoft Excel has created a less than desirable environment for statistical computing. In my opinion, the Excel statistics add-in leaves much to be desired relative to real statistics packages. One solution for extending the usefulness of Excel is to abandon the Excel stats package in favor of InferenceforR, a product that allows R to be used from within Excel. See the following screencast for a slick presentation.
If you want to save time and improve the accuracy of your programs, don't reinvent the wheel; consider using javanumerics, which offers a large variety of statistical and mathematical classes. Note that not all options are free.
Often researchers need access to functionality that isn't found in commercial statistics packages. The problem takes many forms and is met with specialized solutions from the statistical community, often cutting edge and reflecting new statistical research. Most stats packages allow some form of macro or code authorship, which works to a point and often provides a just-in-time solution; well known examples include Matlab's scripting language, SAS IML, GAUSS, Stata, S-PLUS, and R. Others seek stand-alone solutions in one form or another, ranging from public domain C, C++, Fortran, and Java research subroutines to stand-alone programs with various user interfaces. The goal of this blog is to list references and short descriptions of solutions that may offer additional insight into your research and the statistical methods involved, and maybe even save you some time. About a dozen topics come to mind, and I hope to address them shortly. These posts are not intended as statistical guidance or endorsement; most problems are best addressed with the advice of an experienced practitioner in the relevant field.
Statistical power calculations are often needed at various stages of planning in order to establish sample sizes.
Elsewhere on this blog I mention PiFace as a power calculation tool. However, SAS users may find the following three SAS macros of interest.
UnifyPow is an extensive collection of power calculators implemented as a SAS macro; a SAS proceedings paper about UnifyPow discusses its broad generality. The second macro, rpower, addresses the retrospective aspect of the issue. The third macro, glimmixsamplesize, is designed to exploit the generality of SAS's Proc Glimmix for generalized linear mixed models. Together these macros substantially increase the number of settings in which power calculations can be carried out.
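As a quick sanity check before reaching for these macros, the textbook normal-approximation power formula for a two-sided, two-sample comparison of means is easy to compute directly. The sketch below is plain Python, with the function name and inputs chosen for illustration; it assumes equal group sizes and a known common standard deviation.

```python
from statistics import NormalDist

def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means
    using the large-sample normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # noncentrality: standardized effect times sqrt(n/2)
    ncp = (delta / sigma) * (n_per_group / 2) ** 0.5
    # ignores the negligible probability of rejecting in the wrong tail
    return NormalDist().cdf(ncp - z)

power = two_sample_power(0.5, 1.0, 64)  # roughly 0.81
```

For a standardized effect of 0.5 with 64 subjects per group at alpha = 0.05, this gives power of roughly 0.81, matching the familiar rule of thumb; the macros above handle the many settings where no such closed form exists.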
Sometimes it is important to reinvent the wheel, and sometimes not. Here is a site with a nice collection of contributed R graphics examples from a variety of R packages. Almost all support some statistical method for purposes of summary and presentation.
This is a broad and rich topic, with applications in almost every field. Over the past 30+ years, major theoretical contributions from econometrics, psychometrics, and statistics have established it as a vibrant research area. Most major statistics-oriented software packages provide the basic functionality, but sometimes this doesn't go far enough: real-world models are often defined with just enough complication that the model(s) of interest can't be cast within the user interface most software provides. The solution, of course, is to step outside those constraints and code what is needed. Elsewhere in this blog there are software options that may be of use, but sometimes one needs access to code at a more fundamental level. Another issue is that many researchers are not as familiar with the topic as they would like to be. University of California economics professor Kenneth Train has provided both GAUSS and Matlab code addressing many discrete choice models, and his site also offers 20+ hours of lectures available for streaming.
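Train's code covers estimation of full models (mixed logit and so on), but the common core of the whole family is the logit choice-probability formula: a softmax over the alternatives' deterministic utilities. A minimal sketch in Python, where the function name and the utility values are hypothetical:

```python
import math

def logit_probs(utilities):
    """Multinomial logit choice probabilities: the softmax of the
    deterministic utilities, shifted by the max for numerical stability."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical utilities for three alternatives, e.g. car, bus, walk
probs = logit_probs([1.0, 0.5, 0.0])
```

The probabilities sum to one, and the highest-utility alternative gets the largest share; estimation then amounts to maximizing the likelihood built from these probabilities over the utility parameters.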
Cluster analysis has been a data mining tool for some time. Hundreds of clustering algorithms compete under various statistical notions of performance, and all major statistical software packages offer several solutions for the task. More recently, latent class analysis has found its counterpart in the cluster analysis setting, where the unknown number of classes or groups is treated in either a stochastic or a deterministic manner. The article "Review of Three Latent Class Cluster Analysis Packages: Latent Gold, poLCA, and MCLUST" (The American Statistician, Feb. 2009, Vol. 63) offers yet another discussion and comparison of the ever-expanding software choices. The point of this note is that the solution offered by MCLUST is available as free R software of the highest quality and performance. MCLUST performs model-based clustering with multivariate normal mixtures, and in its Bayesian treatment of the latent class problem the unknown number of classes/groups is treated as a random variable, with the marginal posterior distribution of the number of classes delivered as an outcome!
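To give a feel for what model-based clustering does under the hood, the sketch below fits one-dimensional normal mixtures by EM and compares models via BIC, the criterion MCLUST uses for choosing the number of components. This is an illustrative toy in Python, not MCLUST's actual implementation; all names and the simulated data are my own.

```python
import math
import random

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, k, iters=100):
    """EM for a one-dimensional Gaussian mixture with k components;
    returns the final log-likelihood. Means start at sample quantiles
    so the fit is deterministic."""
    n = len(xs)
    srt = sorted(xs)
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    var0 = sum((x - grand) ** 2 for x in xs) / n
    vars_ = [var0] * k
    pis = [1.0 / k] * k
    ll = 0.0
    for _ in range(iters):
        ll = 0.0
        resp = []
        for x in xs:  # E-step: component responsibilities per point
            dens = [pis[j] * norm_pdf(x, mus[j], vars_[j]) for j in range(k)]
            tot = sum(dens)
            ll += math.log(tot)
            resp.append([d / tot for d in dens])
        for j in range(k):  # M-step: update weights, means, variances
            nj = sum(r[j] for r in resp)
            pis[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2
                               for r, x in zip(resp, xs)) / nj, 1e-6)
    return ll

def bic(ll, k, n):
    # free parameters in 1-D: k means + k variances + (k - 1) weights
    return -2.0 * ll + (3 * k - 1) * math.log(n)

rng = random.Random(42)
xs = ([rng.gauss(0.0, 1.0) for _ in range(100)]
      + [rng.gauss(6.0, 1.0) for _ in range(100)])
scores = {k: bic(fit_gmm_1d(xs, k), k, len(xs)) for k in (1, 2)}
```

On data simulated from two well-separated normals, the two-component model should beat the single Gaussian by a wide BIC margin, which is exactly how the number of clusters gets inferred from the data rather than fixed in advance.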
Spatial smoothing techniques are often employed to estimate mean trends over some spatial and/or temporal domain. An explosion of new estimation methods over the last 15 or so years has improved upon the simple multiple regression and kriging options often found in commercial GIS systems. Among the early additions were spatial regression models in a general linear model setting, with different possible link functions and CAR (conditional autoregressive) or SAR (simultaneous autoregressive) error structures. Extensions to hierarchical models allow additional model complexity at the cost of increased computational burden. The following article by Wheeler and Walker illustrates how Bayesian spatially varying regression coefficient models improved upon older methods such as kriging in estimating the effects of barriers to the transmission of rabies. Their models were estimated in WinBUGS via MCMC sampling, using MCAR (multivariate conditional autoregressive) errors. To see the inferential impact of such models on a per-covariate (spatial) basis, one set of maps in Figure 4 illustrates very nicely what is missing in simpler maps and models. This article presents some statistical background.
A recent article in The International Journal of Biostatistics argues for confidence intervals for the mean derived from Bernstein's inequality. An excellent presentation compares and contrasts the proposed method with standard alternatives. In keeping with the spirit of this post, the authors provide R code for computing the new method.
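Without the authors' code at hand, a generic Bernstein-type interval is easy to sketch. For observations satisfying |X_i - mu| <= b, Bernstein's inequality bounds P(|Xbar - mu| >= t) by 2 exp(-n t^2 / (2 sigma^2 + 2bt/3)); setting the bound equal to alpha and solving the resulting quadratic for t gives a half-width. The Python sketch below plugs in the sample variance for the unknown sigma^2, so it is illustrative only and should not be read as the authors' proposed method.

```python
import math

def bernstein_ci(xs, b, alpha=0.05):
    """Two-sided interval for the mean from Bernstein's inequality,
    assuming |X_i - mu| <= b. The true variance is replaced by the
    sample variance, making this a plug-in sketch rather than an
    exact finite-sample interval."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    c = math.log(2.0 / alpha) / n
    # solve  t^2 - (2bc/3) t - 2 s2 c = 0  for the half-width t
    half = b * c / 3.0 + math.sqrt((b * c / 3.0) ** 2 + 2.0 * s2 * c)
    return xbar - half, xbar + half

# illustrative data on [0, 1], so b = 1 bounds |X - mu|
lo, hi = bernstein_ci([0.1, 0.4, 0.5, 0.2, 0.9, 0.3, 0.6, 0.5] * 25, b=1.0)
```

The interval always contains the sample mean, and its width shrinks roughly like sqrt(log(2/alpha)/n), which is the behavior such concentration-based intervals trade against the usual normal-theory ones.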
Cross-over designs have played a major role in applied settings across many disciplines, and there is a rich history and literature on the topic. I recently acquired Design and Analysis of Cross-Over Trials, Second Edition, by Byron Jones and Michael G. Kenward (Chapman & Hall/CRC Press). In keeping with the spirit of this blog, I would like to mention that the authors provide the SAS code that accompanies their excellent text. As one can see, many solutions are cast in the mixed model framework, offering the covariance structures and estimation options that allow the most flexibility for modeling simple to advanced cross-over designs. The code is available here.