STATISTICA High Performance
StatSoft announces STATISTICA High Performance (HP), the latest release of Version 12 of the STATISTICA analytics platform, designed to leverage information contained in extremely large data sets using massively parallel and in-memory processing.
STATISTICA's HP technology makes it possible for StatSoft customers to bring supercomputing performance to their big data by leveraging the power of multiprocessor servers. Such servers are rapidly becoming more affordable and widely available as part of existing computer infrastructures at large, midsize, and even some small companies. For example, the familiar Microsoft Windows® server-based environment is now available with up to 256 logical processors with Windows Server 2008 R2, and up to 640 logical processors with Windows Server 2012.
“StatSoft has the distinction of being the only analytics and predictive modeling platform specifically optimized for Windows computing platforms,” says Dr. Thomas Hill, StatSoft’s VP for Analytic Solutions. “With the latest release of STATISTICA HP, we have achieved remarkable performance on practically all computational tasks, in particular for in-memory data processing on high-performance servers.”
To illustrate the performance of the STATISTICA analytic system, StatSoft has conducted performance tests on a midrange 64-core server with 256 GB of RAM.
Statistical Computations and Summaries
As discussed in detail in the StatSoft White Paper (The Big Data Revolution And How to Extract Value from Big Data), many of the use cases around big and high-velocity data involve data summarization, aggregation, and the identification of basic relationships.
Shown below is a screenshot of STATISTICA running against a data set with one million records and 1,000 fields, computing one million correlations (the full 1,000 × 1,000 correlation matrix).
The STATISTICA software successfully distributes the required computational load over all of the available CPUs, utilizing 100% of the processing resources available in this system. Computing one million correlations on a data set with one million records completes within seconds (depending on the clock speed and memory access architecture of the system).
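The idea of distributing a correlation matrix over many workers can be illustrated with a minimal sketch. This is not STATISTICA's implementation, which is not described here; the function names (standardize, correlation_row, correlation_matrix) and the toy data are assumptions made for illustration. Each row of the matrix is an independent task, so rows can be handed to a worker pool. A thread pool is used only to keep the example portable; real CPU-bound speedups in Python would require processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def standardize(column):
    """Center a column and scale it to unit length, so that the dot product
    of two standardized columns equals their Pearson correlation."""
    n = len(column)
    mean = sum(column) / n
    centered = [x - mean for x in column]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant column has no defined correlation")
    return [c / norm for c in centered]


def correlation_row(i, zcols):
    """Correlations of field i with every field (one independent task)."""
    zi = zcols[i]
    return [sum(a * b for a, b in zip(zi, zj)) for zj in zcols]


def correlation_matrix(columns, workers=4):
    """Full k x k correlation matrix; each row is computed as a separate task."""
    zcols = [standardize(c) for c in columns]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: correlation_row(i, zcols), range(len(zcols))))


# Hypothetical toy data: three fields, four records each.
fields = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first field
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first field
]
matrix = correlation_matrix(fields)
```

With 1,000 fields the same scheme yields 1,000 independent row tasks, which is what allows the load to be spread evenly over a large number of cores.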
The Power of Parallel Processing for Predictive Modeling
The architecture of STATISTICA HP provides numerous optimizations that involve massive parallelization, during both the model building process and the scoring process.
For example, analytic workspaces such as the one shown below can be run on multicore servers where the competitive evaluation of multiple models is effectively performed in parallel across multiple cores, achieving 100% utilization of the computing resources of the system and yielding remarkable performance.
Building an effective tree-based classification model against one million records on the 64-core, 256 GB RAM server described earlier completes in seconds (depending on the clock speed and memory access architecture of the system).
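The competitive evaluation of multiple models in parallel, as described above, can be sketched in a few lines. This is an illustrative example under stated assumptions, not the STATISTICA workspace engine: the three candidate models, the function names (fit_mean, fit_line, fit_nearest, evaluate, compete), and the synthetic data are all hypothetical. Each candidate is fitted and scored as an independent task, and the one with the lowest holdout error wins. A thread pool keeps the sketch portable; true multicore speedups in Python would need processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_mean(train):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least-squares line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_nearest(train):
    """1-nearest-neighbor: predict the target of the closest training x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def evaluate(item, train, test):
    """Fit one candidate and return (name, mean squared error on the test set)."""
    name, fit = item
    model = fit(train)
    mse = sum((model(x) - y) ** 2 for x, y in test) / len(test)
    return name, mse


def compete(candidates, train, test, workers=4):
    """Score all candidates as independent tasks and pick the lowest error."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = dict(pool.map(lambda c: evaluate(c, train, test), candidates.items()))
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data generated from y = 2x + 1, split into train and holdout sets.
train = [(x, 2.0 * x + 1.0) for x in range(0, 10, 2)]
test = [(x, 2.0 * x + 1.0) for x in range(1, 10, 2)]
candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
best, scores = compete(candidates, train, test)
```

Because the candidates share no state, adding more models adds more independent tasks, which is the property that lets this kind of comparison scale with the number of available cores.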
Also, many of StatSoft’s customers are currently using the STATISTICA Enterprise Server™ platform to enable massively parallel model-scoring in virtual on-demand environments, again highlighting the flexibility and utility associated with STATISTICA’s adherence to and compatibility with modern software standards, interfaces, and emerging computing technologies.
In addition, in STATISTICA HP 12, all advanced modeling algorithms, including the most powerful ensemble models such as random forests, gradient boosted trees, and others, are implemented to take advantage of large numbers of cores and available RAM for efficient in-memory model building against big data.
Computing platforms with large numbers of CPUs and cores and the capability to handle huge data files via in-memory processing are rapidly becoming less expensive and more common, not only in science but also in business use. Too often, however, the bottleneck is the analytic software, which limits the performance that can be achieved with such hardware. According to George Butler, StatSoft’s VP for Platform Development, “StatSoft has accumulated significant expertise over decades on how to optimize the performance of analytic software, and the STATISTICA High Performance platform will fully take advantage of Microsoft’s newest server platforms supporting hundreds of cores.” He continues, “We are a close Microsoft Partner but also an Intel® Software Premier Elite Partner, and our R&D is constantly looking for new and better ways to leverage existing hardware and operating system resources.”