SPEC Announces SPEC95 Benchmark Suites
As New Standard for Measuring Performance
Need to accommodate new technology forces retirement of SPEC92
FAIRFAX, VA, August 21, 1995 -- The Standard Performance
Evaluation Corp. (SPEC) announces the release of the SPEC95
benchmark suites, the latest version of the worldwide
standard for measuring and comparing computer performance
across different hardware platforms. SPEC95 was developed
by SPEC's Open Systems Group (OSG), which includes more than
30 leading computer vendors, systems integrators, publishers
and consultants throughout the world.
"Computer systems technology evolves so rapidly that we must
provide new benchmark suites every two to three years to
ensure a level playing field," says Kaivalya M. Dixit, SPEC
president. "SPEC92 was a great success, but it is time to
make the transition to standardized benchmarks that reflect
the advances in chip technologies, compilers and
applications that have taken place over the last three
years; those benchmarks constitute SPEC95."
SPEC95 comprises two sets (or suites) of benchmarks: CINT95
for compute-intensive integer performance and CFP95 for
compute-intensive floating point performance. The two suites
provide component-level benchmarks that measure the
performance of the computer's processor, memory architecture
and compiler. SPEC benchmarks are selected from existing
application and benchmark source code running across
multiple platforms. Each benchmark is tested on different
platforms to obtain fair performance results across
competing hardware and software systems.
SPEC95 is the third major version of the SPEC benchmark
suites, which in 1989 became the first widely accepted
standard for comparing compute-intensive performance across
various architectures. The new release replaces SPEC92,
which will be gradually phased out between now and June
1996, when SPEC will stop publishing SPEC92 results and stop
selling the benchmark suite. Performance results from SPEC95
cannot be compared to those from SPEC92, since new
benchmarks have been added and existing ones changed.
"Compiler writers have learned to optimize performance for
individual SPEC92 benchmarks," says Jeff Reilly, SPEC95
release manager. "The best way to avoid these
benchmark-specific optimizations is to develop new benchmark
suites." SPEC95 builds on the lessons learned from the
SPEC89 and SPEC92 suites, according to Reilly. The new
benchmarks were analyzed to ensure that they are as
resistant as possible to compiler optimizations that might
not translate into real-world performance gains. Improvements
to the suites include longer run times and larger problems
for benchmarks, more application diversity, greater ease of
use, and standard development platforms that will allow SPEC
to produce additional releases for other operating systems.
A Sun SPARCstation 10/40 with 128 MB of memory was selected
as the SPEC95 reference machine and Sun SC3.0.1 compilers
were used to obtain reference timings on the new benchmarks.
By definition, the SPECint95 and SPECfp95 numbers for the
Sun SPARCstation 10/40 are both "1."
The Metrics for Measurement
SPEC95 rules permit both baseline and optimized results for
CINT95 and CFP95 suites. The baseline rules restrict the
number of optimizations that can be used for performance
testing. In general, SPEC95 rules are more restrictive in
regard to optimizations than the SPEC92 rules. Baseline
metrics are mandatory for all reported results; reporting of
optimized results is optional.
SPEC95 also allows performance to be measured for both speed
and throughput (rate). Speed metrics such as SPECint95
measure how fast a computer completes a single task. Rate
metrics such as SPECint_rate95 measure how many tasks a
computer can accomplish in a certain amount of time. SPEC95
measures rate performance for single processors, symmetric
multiprocessor systems and cluster systems.
The CINT95 suite, written in C language, contains eight
CPU-intensive integer benchmarks. It is used to measure and
calculate the following metrics:
SPECint95 -- The geometric mean of eight normalized ratios
(one for each integer benchmark) when compiled with
aggressive optimization for each benchmark.
SPECint_base95 -- The geometric mean of eight normalized ratios when
compiled with conservative optimization for each benchmark.
SPECint_rate95 -- The geometric mean of eight
normalized throughput ratios when compiled with
aggressive optimization for each benchmark.
SPECint_rate_base95 -- The geometric mean of eight
normalized throughput ratios when compiled with
conservative optimization for each benchmark.
The CFP95 suite, written in FORTRAN language,
contains 10 CPU-intensive floating point
benchmarks. It is used to measure and calculate
the following metrics:
SPECfp95 -- The geometric mean of 10 normalized
ratios (one for each floating point benchmark)
when compiled with aggressive optimization for
each benchmark.
SPECfp_base95 -- The geometric mean of 10 normalized
ratios when compiled with conservative optimization
for each benchmark.
SPECfp_rate95 -- The geometric mean of 10
normalized throughput ratios when compiled with
aggressive optimization for each benchmark.
SPECfp_rate_base95 -- The geometric mean of 10
normalized throughput ratios when compiled with
conservative optimization for each benchmark.
Vendor Reporting
Initial results for systems from six vendor companies
are included with this release. Additional results
will be reported in the next issue of the SPEC
Newsletter, scheduled for publication at the end of
September. SPEC members are being encouraged to
report SPEC95 results on older platforms to provide
an historical perspective for the new results.
[Poster's note: I know that the comp.benchmarks readership will
probably be interested in a table of these 37 initial results,
which were included in the press kit, and will probably appear
in press reports soon. However, I don't have a complete table in
electronically transferable form. It will probably be posted as soon
as it is available. - RW]
Availability
SPEC95 (CINT95 and CFP95) is available on CD-ROM
from SPEC's administrator, the National Computer
Graphics Association (NCGA). The cost is $600 for
new customers, $300 for new university customers,
$300 for current SPEC92 licensees and $150 for
current university licensees.
SPEC is a non-profit corporation formed to
establish, maintain and endorse a standardized set
of relevant benchmarks that can be applied to the
newest generation of high-performance computers.
Included in its membership are the Open Systems
Group (OSG); OSG Associates, consisting of leading
universities and research facilities; the
High-Performance Group (HPG); and HPG Associates.
For more information, contact
Dianne Rice,
SPEC, c/o NCGA,
2722 Merrilee Drive, Ste. 200,
Fairfax, VA 22031;
tel: 703-698-9604, ext. 325;
fax: 703-560-2752;
e-mail: spec-ncga@cup.portal.com.
Press contact:
Bob Cramblitt
Cramblitt & Company
919-481-4599
Cramblitt@cup.portal.com
--------------------------------------------------------------------
Answers to Common Questions About SPEC95 Benchmark Suites
Q1: What is SPEC95?
A1: SPEC95 is a software benchmark product
produced by the Standard Performance Evaluation
Corp. (SPEC), a nonprofit group of computer
vendors, systems integrators, universities,
research organizations, publishers and consultants
throughout the world. It was designed to provide
measures of performance for comparing compute-intensive
workloads on different computer systems.
SPEC95 contains two suites of benchmarks:
CINT95 for measuring and comparing compute-
intensive integer performance, and CFP95 for
measuring and comparing compute-intensive floating
point performance.
Q2: What is a benchmark?
A2: The definition from Webster's II Dictionary states:
"A standard of measurement or evaluation." SPEC is
a non-profit corporation formed to establish and
maintain computer benchmarks for measuring
component- and system-level computer performance.
Q3: What does the "C" in CINT95 and CFP95 stand for?
A3: In its product line, SPEC uses "C" to denote
a "component-level" benchmark and "S" to denote a
"system-level" benchmark. CINT95 and CFP95 are
component-level benchmarks.
Q4: What components do CINT95 and CFP95 measure?
A4: Being compute-intensive benchmarks, they emphasize
the performance of the computer's processor, the
memory architecture and the compiler. It is
important to remember the contribution of the
latter two components; performance is more than
just the processor.
Q5: What component performance is not measured by CINT95 and CFP95?
A5: The CINT95 and CFP95 benchmarks do not stress
other computer components such as I/O (disk
drives), networking or graphics. It might be
possible to configure a system in such a way that
one or more of these components impact the
performance of CINT95 and CFP95, but that is not
the intent of the suites.
Q6: What is included in the SPEC95 package?
A6: SPEC provides the following in its SPEC95 package:
SPEC95 tools for compiling, running and validating the benchmarks,
compiled for a variety of operating systems;
source code for the SPEC95 tools, to allow the tools to be built for
systems not covered by the precompiled tools;
source code for the benchmarks;
tools for generating performance reports;
run and reporting rules defining how the benchmarks should be used to
produce standard results; and
SPEC95 documentation.
The initial offering of SPEC95 will have tools for
most UNIX operating systems. Additional products
for other operating systems (Windows NT, OpenVMS,
etc.) will be released as later products if SPEC
detects enough demand. All of this will be shipped
on a single CD-ROM disk.
Q7: What does the SPEC95 user have to provide?
A7: The user must have a computer system running a UNIX
environment with a compiler installed and a CD-ROM
drive. Approximately 150 MB will be needed on a hard drive
to install SPEC95. It is also assumed that the system
has at least 64 MB of RAM to ensure that the
benchmarks remain compute-intensive (SPEC is
assuming this will be the standard amount of
desktop memory during the life of this suite).
Q8: What are the basic steps in running the benchmarks?
A8: Installation and use are covered
in detail in the SPEC95 User Documentation. The
basic steps are as follows:
Install SPEC95 from media.
Run the installation scripts specifying your operating system.
Compile the tools, if executables are not provided in SPEC95.
Determine what metric you wish to run.
Create a configuration file for that metric. In this file, you specify
compiler flags and other system-dependent information.
Run the SPEC tools to build (compile), run and validate the benchmarks.
If the above steps are successful, generate a report based on the run
times and metric equations.
Q9: What source code is provided? What exactly makes up these suites?
A9: CINT95 and CFP95 are based on compute-
intensive applications provided as source code.
CINT95 contains eight applications written in C
that are used as benchmarks:
099.go -- Artificial intelligence; plays the game of "Go"
124.m88ksim -- Motorola 88K chip simulator; runs test program
126.gcc -- New version of GCC; builds SPARC code
129.compress -- Compresses and decompresses file in memory
130.li -- LISP interpreter
132.ijpeg -- Graphic compression and decompression
134.perl -- Manipulates strings (anagrams) and prime numbers in Perl
147.vortex -- A database program
CFP95 contains 10 applications written in FORTRAN
that are used as benchmarks:
101.tomcatv -- A mesh-generation program
102.swim -- Shallow water model with 1024 x 1024 grid
103.su2cor -- Quantum physics; Monte Carlo simulation
104.hydro2d -- Astrophysics; Hydrodynamical Navier Stokes equations
107.mgrid -- Multi-grid solver in 3D potential field
110.applu -- Parabolic/elliptic partial differential equations
125.turb3d -- Simulates isotropic, homogeneous turbulence in a cube
141.apsi -- Solves problems regarding temperature, wind, velocity and
distribution of pollutants
145.fpppp -- Quantum chemistry
146.wave5 -- Plasma physics; electromagnetic particle simulation
Q10: What metrics can be measured?
A10: The CINT95 and CFP95 suites can be used to
measure and calculate the following metrics:
CINT95:
SPECint95: The geometric mean of eight normalized
ratios (one for each integer benchmark) when
compiled with aggressive optimization for each benchmark.
SPECint_base95: The geometric mean of
eight normalized ratios when compiled with
conservative optimization for each benchmark.
SPECint_rate95: The geometric mean of eight
normalized throughput ratios when compiled with
aggressive optimization for each benchmark.
SPECint_rate_base95: The geometric mean of eight
normalized throughput ratios when compiled with
conservative optimization for each benchmark.
CFP95:
SPECfp95: The geometric mean of ten normalized
ratios (one for each floating point benchmark) when
compiled with aggressive optimization for each
benchmark.
SPECfp_base95: The geometric mean of ten
normalized ratios when compiled with conservative
optimization for each benchmark.
SPECfp_rate95: The geometric mean of ten
normalized throughput ratios when compiled with
aggressive optimization for each benchmark.
SPECfp_rate_base95: The geometric mean of ten
normalized throughput ratios when compiled with
conservative optimization for each benchmark.
The ratio for each of the benchmarks is
calculated using a SPEC-determined reference time
and the run time of the benchmark.
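
The calculation behind a speed metric such as SPECint95 can be
sketched in a few lines of Python. This is only an illustration: it
assumes each ratio is the reference time divided by the measured run
time (consistent with the reference machine scoring 1, but not spelled
out here), and the benchmark names and timings are made up.

```python
import math


def speed_ratio(reference_time, run_time):
    """Normalized ratio for one benchmark (assumed: reference / measured)."""
    return reference_time / run_time


def geometric_mean(values):
    """n-th root of the product of the values, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds: (reference machine, system under test).
timings = {
    "bench_a": (1000.0, 250.0),
    "bench_b": (2000.0, 1000.0),
}

ratios = [speed_ratio(ref, run) for ref, run in timings.values()]
metric = geometric_mean(ratios)
print(round(metric, 3))
```

With these made-up numbers the two ratios are 4 and 2, so the metric is
the square root of 8. A machine timed against itself gets a ratio of 1
on every benchmark and therefore a metric of 1.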
Q11: What is the difference between a "base" metric and a "non-base"
metric?
A11: In order to provide comparisons across
different computer hardware, SPEC had to provide
the benchmarks as source code. Thus, in order to
run the benchmarks, they must be compiled. There
was agreement that the benchmarks should be
compiled the way users compile programs. But how do
users compile programs? On one side, people might
experiment with many different compilers and
compiler flags to achieve the best performance. On
the other side, people might just compile with the
basic options suggested by the compiler vendor.
SPEC recognizes that it cannot exactly match how
everyone uses compilers, but two reference points
are possible. The base metrics (e.g.,
SPECint_base95) are required for all reported
results and have set guidelines for compilation
(e.g., the same flags must be used in the same
order for all benchmarks). The non-base metrics
(e.g., SPECint95) are optional and have less strict
requirements (e.g., different compiler options may
be used on each benchmark).
A full description of the distinctions can be found
in the SPEC95 Run and Reporting Rules available
with SPEC95.
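
As a rough illustration of the one base guideline quoted above (the
same flags in the same order for all benchmarks), the following Python
sketch checks a set of per-benchmark flag lists. The function name and
the flags are placeholders chosen for this example; they are not taken
from the SPEC95 Run and Reporting Rules, which contain further
requirements that this check does not cover.

```python
def uses_uniform_flags(flags_by_benchmark):
    """True if every benchmark has the same flags in the same order."""
    flag_lists = {tuple(flags) for flags in flags_by_benchmark.values()}
    return len(flag_lists) <= 1


# Placeholder flags for illustration only.
base_style = {
    "099.go": ["-O2"],
    "126.gcc": ["-O2"],
}
non_base_style = {
    "099.go": ["-O2", "-funroll-loops"],
    "126.gcc": ["-O3"],
}

print(uses_uniform_flags(base_style))      # same flags everywhere
print(uses_uniform_flags(non_base_style))  # flags differ per benchmark
```

Because the lists are compared as ordered tuples, two benchmarks that
use the same flags in a different order also count as different.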
Q12: What is the difference between a "rate" and a "non-rate" metric?
A12: There are several different ways to measure
computer performance. One way is to measure how
fast the computer completes a single task; this is
a speed measure. Another way is to measure how many
tasks a computer can accomplish in a certain amount
of time; this is called a throughput, capacity or
rate measure.
The SPEC speed metrics (e.g., SPECint95) are used
for comparing the ability of a computer to complete
single tasks. The SPEC rate metrics (e.g.,
SPECint_rate95) measure the throughput or rate of a
machine carrying out a number of tasks.
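
The distinction can be made concrete with a small Python sketch. It is
not the SPEC rate formula; it only contrasts time per task with tasks
per unit time, using two hypothetical machines and made-up timings.

```python
def tasks_per_hour(tasks_completed, elapsed_seconds):
    """Generic throughput: completed tasks scaled to one hour."""
    return tasks_completed * 3600.0 / elapsed_seconds


# Hypothetical measurements, in seconds.
a_single_task = 100.0    # machine A finishes one task
b_single_task = 150.0    # machine B finishes one task
b_four_together = 200.0  # machine B finishes four tasks run concurrently

a_throughput = tasks_per_hour(1, a_single_task)
b_throughput = tasks_per_hour(4, b_four_together)

print(a_single_task < b_single_task)  # A is faster on a single task
print(a_throughput, b_throughput)     # B completes more tasks per hour
```

In this example machine A wins on speed while machine B wins on
throughput, which is why the two kinds of metric are reported
separately.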
Q13: Why and/or when should I use SPEC95?
A13: Typically, the best measure of a system is
your own application with your own workload.
Unfortunately, it is often very difficult and
expensive to get a wide base of reliable,
repeatable and comparable measurements on different
systems for your own application with your own
workload. This might be due to time, money or other
constraints.
Benchmarks exist to act as a reference point for
comparison. It's the same reason that EPA gas
mileage exists, although probably no driver in
America gets exactly the EPA gas mileage. If you
understand what benchmarks measure, they're useful.
It's important to know that CINT95 and CFP95 are CPU-focused and
not system-focused benchmarks. These CPU benchmarks
focus on only one portion of those factors that
contribute to applications performance. A graphics
or network performance bottleneck within an
application, for example, will not be reflected in
these benchmarks. Understanding your own needs
helps determine the relevance of the benchmarks.
Q14: Which SPEC95 metric should be used to determine performance?
A14: It depends on your needs. SPEC provides the
benchmarks and results as tools for you to use. You
need to determine how you use a computer or what
your performance requirements are and then choose
the appropriate SPEC benchmark or metrics.
A single user running a compute-intensive integer
program, for example, might only be interested in
SPECint95 or SPECint_base95. On the other hand, a
person who maintains a machine used by multiple
scientists running floating point simulations might
be more concerned with SPECfp_rate95 or
SPECfp_rate_base95.
Q15: SPEC92 is already an available product. Why create SPEC95
and will it show anything different from SPEC92?
A15: Technology is always improving.
As the technology improves, the benchmarks need to
improve as well. SPEC needed to address the
following issues:
Run-time -- Several of the SPEC92
benchmarks were running in less than a minute on
leading-edge processors/systems. Given the SPEC
measurement tools, small changes or fluctuations in
the measurements were having significant impacts on
the percentage improvements being seen. SPEC chose
to make the SPEC95 benchmarks longer to take into
account future performance and prevent this from
being an issue for the life of the suite.
Application size -- Many comments received by SPEC
indicated that applications had grown in complexity
and size and that SPEC92 was becoming less
representative of what runs on current systems. For
SPEC95, SPEC selected programs with larger resource
requirements to provide a mix with some of the
smaller programs.
Application type -- SPEC felt
that there were additional application areas that
should be included in SPEC95 to increase variety
and representation within the suites. Areas such as
imaging and database have been added.
Portability -- SPEC found that compute-intensive performance was
important beyond the UNIX workstation
arena where SPEC was founded. It was important,
therefore, that the benchmarks and the tools
running the benchmarks be as independent of the
operating system as possible. While the first
release of SPEC95 will be geared toward UNIX, SPEC
has consciously chosen programs and tools that are
dependent only upon POSIX or ANSI standard
development environments. SPEC will produce
additional releases for other operating systems
(such as Microsoft Windows NT) based on demand.
Moving target -- The initial hope for benchmarks is
that improvements in the benchmark performance will
be generally applicable to other situations. As
competition develops, however, improvements in the
test performance can become specific to that test only.
SPEC95 provides updated benchmarks so that general
improvements will be encouraged and test-specific optimizations
become less effective.
Education -- As the computer industry
grows, benchmark results are being quoted more
often. With the release of new benchmark suites,
SPEC has a fresh opportunity to discuss and clarify
how and why the suite was developed.
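
The run-time issue listed above comes down to simple arithmetic: a
fixed timing fluctuation is a much larger share of a short run than of
a long one. The Python sketch below uses made-up numbers (a 1-second
fluctuation, a 30-second run and a 600-second run) purely for
illustration.

```python
def fluctuation_percent(run_time_seconds, fluctuation_seconds):
    """Size of a timing fluctuation as a percentage of the run time."""
    return 100.0 * fluctuation_seconds / run_time_seconds


fluctuation = 1.0  # hypothetical measurement noise, in seconds

short_run = fluctuation_percent(30.0, fluctuation)
long_run = fluctuation_percent(600.0, fluctuation)

print(round(short_run, 2), round(long_run, 2))
```

The same one-second fluctuation moves the short run by a little over
three percent but the long run by well under one percent.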
Q15: What happens to SPEC92 after SPEC95 is released?
A15: SPEC will begin the process of
making SPEC92 obsolete. The results published by
SPEC will be marked as obsolete and by June 1996,
SPEC will stop publishing SPEC92 results and stop
selling the SPEC92 suites.
Q16: Is there a way to translate SPEC92 results to SPEC95 results
or vice versa?
A16: There is no formula for converting from SPEC92
results to SPEC95 results; they are different
products. There might be a high correlation between
SPEC92 and SPEC95 results (i.e., machines with
higher SPEC92 results might have higher SPEC95
results), but there is no universal formula for all
systems.
SPEC is strongly encouraging SPEC licensees to
publish SPEC95 numbers on older platforms to
provide a historical perspective.
Q17: What criteria were used to select the benchmarks?
A17: In the process of selecting applications to use as
benchmarks, SPEC considered the following criteria:
portability to all SPEC hardware architectures (32 and 64 bit including
Alpha, Intel Architecture, PA-RISC, Rxx00, SPARC, etc.);
portability to various operating systems, particularly UNIX, NT and
OpenVMS;
benchmarks should not include measurable I/O;
benchmarks should not include networking or graphics;
benchmarks should run in 64-MB RAM without swapping (SPEC is assuming
this will be a minimal memory requirement for the life of SPEC95 and the
emphasis is on compute-intensive performance and not disk activity);
benchmarks should run at least five minutes on a Digital Equipment Corp.
200-MHz Alpha system; and
no more than five percent of benchmarking time should be spent
processing code not provided by SPEC.
Q18: Weren't some of the SPEC95 benchmarks in SPEC92?
How are they different?
A18: Although some of the benchmarks from SPEC92 are
included in SPEC95, they all have been given
different workloads or modified to improve their
coding style or use of resources. The revised
benchmarks have been assigned different identifying
numbers to distinguish them from
versions in previous suites and to indicate they
are not comparable with their predecessors.
Q19: Why were some of the benchmarks not carried over from SPEC92?
A19: Some benchmarks were not carried over because it
was not possible to create a longer running
workload or to create a more robust workload, or
the benchmarks were too susceptible to benchmark-
specific compiler optimization.
Q20: Why does SPEC use a reference machine for determining performance
metrics? What machine is used for SPEC95 benchmark suites?
A20: SPEC uses a reference machine to normalize the
performance metrics used in the SPEC95 suites.
Each benchmark is run and measured on this machine
to establish a reference time for that benchmark.
These times are then used in the SPEC calculations.
SPEC uses the SPARCstation 10/40 (40-MHz SuperSPARC
with no L2 cache) as the reference machine. It
takes approximately 48 hours to run a
SPEC-conforming execution of CINT95 and CFP95 on
this machine.
Q21: How long does it take to run the SPEC95 benchmark suites?
A21: This depends on the suite and the machine that is
running the benchmarks. As mentioned above, on the
reference machine it takes two days for a SPEC-conforming
run (at least three iterations of each
benchmark to ensure that results can be
reproduced).
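
One way to picture the reproducibility requirement is a check that at
least three run times exist for a benchmark and that they agree
closely. The Python sketch below is an assumption-laden illustration:
the function names, the spread measure and the 2 percent tolerance are
invented for this example and are not part of the SPEC95 rules.

```python
def relative_spread(run_times):
    """(slowest - fastest) / fastest over the measured run times."""
    fastest = min(run_times)
    return (max(run_times) - fastest) / fastest


def looks_reproducible(run_times, min_runs=3, tolerance=0.02):
    """True if there are enough runs and they agree within the tolerance.

    min_runs follows the "at least three iterations" wording; the
    tolerance is a hypothetical threshold chosen for this sketch.
    """
    if len(run_times) < min_runs:
        return False
    return relative_spread(run_times) <= tolerance


print(looks_reproducible([500.0, 503.0, 498.0]))  # three close runs
print(looks_reproducible([500.0, 503.0]))         # too few runs
print(looks_reproducible([500.0, 560.0, 498.0]))  # one outlier
```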
Q22: What if the tools cannot be run or built on a system?
Can they be run manually?
A22: To generate SPEC-compliant results, the tools used
must be approved by SPEC. If several attempts at
using the SPEC tools are not successful for the
operating system for which you purchased SPEC95,
you should contact SPEC for technical support. SPEC
will work with you to correct the problem and/or
investigate SPEC-compliant alternatives.
Q23: What if I don't want to run the benchmarks? Is there any place
where results are available?
A23: Here are the alternatives:
Every quarter, SPEC publishes the SPEC Newsletter, which contains
results submitted to SPEC by SPEC members and licensees. Subscription
information is available from SPEC.
SPEC provides information to the Performance Database Server found at:
http://performance.netlib.org/performance/html/spec.html
This typically lags three months behind the SPEC Newsletter.
SPEC is working on establishing its own Internet presence, although
details are not yet available.
Q24: How do I contact SPEC?
A24: Here is the contact information for SPEC:
Dianne Rice
SPEC
c/o NCGA
2722 Merrilee Drive, Ste. 200
Fairfax, VA 22031
Tel: 703-698-9604, ext. 325
Fax: 703-560-2752
E-mail: spec-ncga@cup.portal.com
Questions and answers were prepared by Kaivalya
Dixit of IBM and Jeff Reilly of Intel Corp. Dixit
is president of SPEC and Reilly is release manager
for SPEC95.