JScience tools are free Java routines developed for the broad scientific community interested in coding scientific applications. Core modules addressing areas such as mathematics, physics, neural networks, biology and others are in active development. Details on what is available and how to register can be found at: JScience.
Educational statistical software has been an area for tinkering for some time. All sorts of approaches have been taken to explore what works and what doesn’t. SOCR is a freely available web resource for statistics education. The online Journal of Statistical Software has an article discussing SOCR.
ARC is a framework for the exploration and graphical display of regression model structure and diagnostics. Focus is placed on understanding the conditional mean and variance functions, model structural dimension, nonlinearity, curvature, smoothing, transformation and model assessment. The user interface is designed to allow interactive choice during all phases of use. Graphical regression, brushing and slicing allow for additional insights related to model building. In addition, extensions of these topics to the Generalized Linear Model framework allow for a larger class of models, such as binomial, logistic, Poisson and gamma models.
Bootstrap options tightly integrated with some statistical methods have become available in recent years in packages such as SAS, Stata, SPSS, S-PLUS and SPSS/AMOS. The degree of support and ease of use vary greatly. Packages that provide a sample-with-replacement method allow one, in principle, to bootstrap an estimator of choice. Why bootstrap? One does so to obtain a better approximation to the sampling distribution of an estimator. Bootstrap methods potentially offer insights into inference matters that might be difficult or impossible to resolve otherwise. Both small and large samples can present complicated data configurations for estimation tasks involving parameters or functions of one or more parameters. For example, in complex surveys bootstrap methods have contributed to challenging estimation and inference tasks. Users of R will find several contributed packages for bootstrap methods. The packages boot, bootstrap, pvclust, rqmcmb2, scaleboot, simpleboot, and Hmisc offer standard and advanced options not found in some of the commercial packages above.
Much has been written in the last 25 years about the bootstrap. Two useful references to consider are:
Efron, B. & Tibshirani, R.J. (1993), An Introduction to the Bootstrap, Chapman and Hall.
Davison, A.C. & Hinkley, D.V. (1997), Bootstrap Methods and their Application, Cambridge University Press.
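The sample-with-replacement idea described above can be illustrated with a short sketch. It is written in Python using only the standard library rather than any of the packages named here, and it computes a simple percentile interval for the median. The function name `bootstrap_ci`, the toy data, and the choice of 2000 replicates are illustrative assumptions; the percentile interval is only one of several bootstrap intervals discussed in the references.

```python
import random
import statistics

def bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for an arbitrary estimator (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate: draw n observations with replacement, then re-estimate.
    replicates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy sample, not taken from any real study.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
low, high = bootstrap_ci(sample, statistics.median)
print(f"sample median = {statistics.median(sample):.2f}")
print(f"95% percentile interval = ({low:.2f}, {high:.2f})")
```

Any function that maps a list of observations to a number can be passed as `estimator`, which is the sense in which one can bootstrap an estimator of choice.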
Spatial point pattern data are common across many areas of research. Software for extensive modeling is sparse and spread out across many disciplines. spatstat is a unified collection of tools developed from a modern perspective on spatial statistics. spatstat is a contributed R package. Like many of these packages, it provides tools for exploratory data analysis, point-process-specific graphical displays, and maximum pseudolikelihood model-fitting methods and diagnostics. Model formulation via Gibbs point processes allows one to address homogeneous and inhomogeneous Poisson, Strauss (hard and soft), Cox processes and others. Covariates and multitype point patterns (groups) can be included. The focus is on the definition and formulation of the conditional intensity function depending upon location, trend and interaction. The standard summary functions, namely the empty space function F and the related functions G, K and J, are available along with their multitype versions.
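To give a sense of what a summary function such as K measures, here is a minimal Python sketch of a naive estimator of Ripley's K on the unit square. It is not spatstat code and does not mirror its interface. It deliberately omits the edge corrections that a serious analysis would apply, so the estimate is biased downward near the boundary. The function name `ripley_k` and the simulated points are assumptions made for illustration.

```python
import math
import random

def ripley_k(points, r, area=1.0):
    """Naive estimate of Ripley's K at distance r, with no edge correction."""
    n = len(points)
    close_pairs = 0
    # Count ordered pairs (i, j), i != j, that lie within distance r.
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

# Simulated pattern: 200 points placed uniformly at random in the unit square.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]

# Under complete spatial randomness K(r) is pi * r^2; print both for comparison.
for r in (0.05, 0.10, 0.15):
    print(f"r={r:.2f}  K_hat={ripley_k(points, r):.4f}  pi*r^2={math.pi * r * r:.4f}")
```

Estimates well above the reference curve would suggest clustering and estimates well below it would suggest regularity, subject to the missing edge correction noted above.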
High dimensional data present many problems related to viewing and navigating. One of the first statistics-oriented software packages to address these issues was AT&T’s interactive visualization system, XGobi http://www.research.att.com/areas/stat/xgobi/. XGobi is not a stats package per se. Instead, XGobi provides various 1D, 2D and 3D displays that use linkage (brushing) between displays, data IDs and various projection methods such as Grand Tours and Projection Pursuit. XGvis is an interactive multidimensional scaling (MDS) package for proximity data as well as graphical networks. Note that the AT&T URL is a historical reference; current development of XGobi continues under the name GGobi, http://www.ggobi.org/.
Graphical networks are those that can be conceptualized as nodes connected by one or more links. Links may be directed or not. Nodes can represent many things, such as concepts, people, tasks, relationships, etc. Some are referred to as Social Networks, Concept Maps or Directed Graphs. In many analyses, the modeling of the node linkage structure is of interest, conditional on the graph. Visualization and descriptive summary measures of network graphs are also required. There are several R based http://www.r-project.org/ modeling packages available to address simple and complex model structures, such as logistic random effects, latent space clusters, exponential random graph (network) models and many more. Two R packages in particular specialize in this area: statnet and latentnet. StatNet http://csde.washington.edu/statnet/ can handle relatively large networks of about 3,000 nodes and provides tools for both model estimation and model-based network simulation. Latentnet is similar but provides access to latent position and cluster model structures. If, on the other hand, your task is to discover what the graph is, conditional on observed node-specific data, then consider some of the methods available in the Weka http://www.cs.waikato.ac.nz/ml/weka/ package addressing Bayes Net classification methods.
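As a small illustration of the descriptive summary measures mentioned above, the following Python sketch stores a directed graph as a list of links and computes in-degree, out-degree and density. It is a toy example with hypothetical node names and does not use statnet, latentnet or Weka.

```python
from collections import Counter

# Hypothetical directed network: each pair is a link (source, target).
edges = [("ann", "bob"), ("bob", "cara"), ("cara", "ann"), ("ann", "dev")]

nodes = sorted({node for edge in edges for node in edge})
out_degree = Counter(source for source, _ in edges)  # links leaving each node
in_degree = Counter(target for _, target in edges)   # links arriving at each node

# Density of a directed graph without self-links: observed / possible links.
density = len(edges) / (len(nodes) * (len(nodes) - 1))

for node in nodes:
    print(f"{node}: in={in_degree[node]} out={out_degree[node]}")
print(f"density = {density:.3f}")
```

Measures like these describe the observed graph; the modeling packages above go further by treating the links themselves as random and fitting a model to them.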
Often in the planning stages researchers will need to consider questions about the effect sizes and sample sizes needed to support their project and its estimation and modeling tasks. Many funding agencies will require justification of sample size planning with statistical power methods. Careful attention to such matters is often not an easy task. Good software is necessary but not sufficient. Asking the right discipline-specific questions concerning useful effect sizes is just as important. Various software solutions for power analysis abound on the internet. Piface is a useful no-cost solution for many power analysis problem settings. Piface and useful commentary can be found at:
There is no one tool that is considered superior for purposes of Data Mining. Data Mining means different things to different disciplines and, as a result, many solutions to different kinds of problems exist. A simple working definition of Data Mining is the use of various tools to uncover structure in large amounts (tens of millions to billions of records) of high dimensional data (100s, 1000s or more variables) obtained as a consequence of natural or human systems under interaction. The explosion of data storage and acquisition over the last 30 years has created datasets from all areas of human investigation. The potential and incentive for understanding these structures present research and business arbitrage opportunities. Weka is a collection of Machine Learning algorithms written in Java. An interactive GUI is provided, as well as a command line invocation capability for running multiple jobs. The tools offered in the base version of Weka are extensive. Data management, database connectivity, clustering, visualization, network modeling, prediction tools and validation methods are among its many features.
Weka is available at http://www.cs.waikato.ac.nz/ml/weka
Bayesian Statistics has evolved over the last 30 years or so with explosive growth and wide reaching theoretical contributions. Widespread adoption by applied researchers has been slow due to the lack of software, computational complexity and the model formulations that real world problems present. The BUGS software project has taken a substantial step toward overcoming these obstacles for small and moderate sized problems. The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. GeoBUGS 1.2 is an extension for spatial analysis and PKBUGS is for pharmacokinetic modelling. For additional info see: http://www.mrcbsu.cam.ac.uk/bgs/welcome.shtml
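To make the idea of Gibbs sampling concrete, here is a minimal Python sketch that samples from a standard bivariate normal distribution with correlation rho by alternately drawing each coordinate from its conditional distribution given the other. It is a textbook toy example, not BUGS code or the BUGS model language, and the function names, chain length and burn-in are illustrative assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws=20000, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    x, y = 0.0, 0.0
    draws = []
    for step in range(n_draws + burn_in):
        # Full conditionals: x | y ~ N(rho * y, 1 - rho^2), and likewise for y.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if step >= burn_in:  # discard early draws taken before the chain settles
            draws.append((x, y))
    return draws

def sample_correlation(draws):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in draws)
    sxx = sum((x - mean_x) ** 2 for x, _ in draws)
    syy = sum((y - mean_y) ** 2 for _, y in draws)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(rho=0.6)
print(f"target correlation = 0.6, sampled = {sample_correlation(draws):.3f}")
```

The same alternating scheme extends to models with many parameters, where each parameter is updated in turn from its conditional distribution given the data and the current values of the others.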