TUNING
Why Your CPU Capacity May Not Match Your Vendor's Estimate
In addressing this issue, I'll cover the following: 1. Definition of terms
This discussion describes considerations for MVS or OS/390 systems running on IBM, Amdahl, or HDS processor models. Most of the issues addressed, however, apply to VM and VSE systems as well.
1 DEFINITION OF TERMS
Throughout this paper, I'll use various terms, and I want to give my definition of each.
1.1 CPU VS. Model VS. CEC VS. Machine
A CPU is a single processor that can execute instructions on behalf of some unit of work. It will have one, and sometimes more, high-speed buffers in which to store data while it is being referenced. A CPU can be dispatched by the operating system to execute one unit of work, such as a TCB or SRB, at a time. A CPU is sometimes referred to as a processor, but I'll avoid that use in this paper, because some people refer to a processor as having multiple CPUs.
A processor model is a combination of one or more CPUs and is distributed with central and expanded storage, an I/O processor (CPU), possible system assist processors, system control processors, and various levels of cache buffer storage. A vendor will normally market many models, such as the IBM 9672-RX4, the HDS Pilot R7, or the Amdahl Millennium GS545. Various authors will refer to these processor models as CPCs (Central Processing Complexes) or CECs (Central Electronic Complexes), machines, or simply processors. I'll use model or machine in this paper.
1.2 Speed
1.3 Capacity
Given a model's single CPU speed, I define capacity as being equal to the effective CPU speed multiplied by the number of CPUs in the model. In a uni-processor, the capacity and speed are the same.
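As a minimal sketch of this definition, the snippet below multiplies an effective per-CPU speed by the number of CPUs. The model names and MIPS figures are hypothetical and chosen only for illustration; they are not taken from any vendor chart.

```python
def capacity(effective_cpu_speed, cpu_count):
    """Capacity as defined above: effective per-CPU speed times number of CPUs."""
    if cpu_count < 1:
        raise ValueError("a model needs at least one CPU")
    return effective_cpu_speed * cpu_count


# Hypothetical models: (effective MIPS per CPU, number of CPUs).
models = {
    "uni": (60, 1),
    "fast 2-way": (55, 2),
    "slow 6-way": (30, 6),
}

for name, (speed, cpus) in models.items():
    print(f"{name}: {speed} MIPS per CPU x {cpus} CPUs = {capacity(speed, cpus)} MIPS")
```

With these made-up figures, the uni-processor's capacity equals its speed, and the 2-way has faster CPUs than the 6-way but less total capacity.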
It is possible for one model to have a faster CPU speed, but a smaller capacity (due to having fewer CPUs) than another model. You can also have one model with a slower CPU speed, but a larger capacity than another model due to a larger number of CPUs.
1.4 MIPS
Most MIPS ratings today are simply based on the vendor's claims of the relative performance of each model. Many analysts will provide these MIPS ratings based on the vendor's claims in order to provide a consistent view of speed and capacity across multiple vendors. Gartner Group, the Meta Group, IDC, and Watson & Walker are among the groups who publish MIPS ratings. We publish ours in Cheryl Watson's TUNING Letter [REF001], and I'll use our MIPS ratings for all references in this document.
The reason for the continued use of MIPS is that customers are more comfortable with MIPS than relative performance numbers. The primary value of MIPS is to provide a starting point to identify a group of processor models that are close to the capacity required. A single number will not provide a good estimate of what you can expect to receive.
There are two types of MIPS to be concerned with. One is the total capacity of the processor model. This provides insight into the total amount of work that can be processed on that particular model. The second MIPS rate to be aware of is the MIPS per CPU. This estimate provides insight into the speed of a single CPU. This is needed since it is possible to have a 400 MIPS model composed of 4 CPUs at 100 MIPS each, 8 CPUs at 50 MIPS each, or 12 CPUs at 33 MIPS each, and your work will perform very differently on each of these configurations. You can see some examples of MIPS ratings from our CPU Chart [REF001] in Figure 1.
2 WHY USE VENDOR CLAIMS?
2.1 Alternative Too Expensive
Vendor ratings are the basis of all comparison charts available on the market today because it's simply too expensive for anyone other than the vendor to purchase or obtain access to every processor that's available. The vendors have access to all of their own processors and must make performance runs on all of their own hardware anyway.
Sometimes the vendors will have access to their competitors' machines and so can make comparisons between the two with their own workloads.
2.2 Vendor Has Market Goal
The hardware design team targets the processor speed as they begin the design work, and they don't stop design, modification of design, and just plain "tweaking" until the model has reached the targeted capacity. This means that you can normally depend on a model matching the vendor's claims prior to its availability.
One of the reasons that a vendor will have a fairly specific goal in the capacity of a model is to provide a full range of capacity relative to software pricing. Software pricing is normally based on either processor group or MSUs (millions of service units), with significant software license charge increases with each higher group or range of MSUs. A vendor wouldn't be wise to provide three models in a series for groups 50, 70, and 70. A better option, which would be attractive to more customers, would be to provide models in groups 50, 60, and 70, even if it meant down-grading one of the models (in this case, taking the smaller group 70 model and downgrading it to fit into the group 60 range). (More about that later.) The same is true of MSU ratings.
Figure 2 shows an extract from our CPU Chart [REF001] that has organized the processors first by software processor group and then by MSUs. Notice that, in some cases, a model will have a higher MIPS rating than a different model in a higher group. A goal of most installations is to obtain the highest MIPS rating for their workloads at the smallest processor group and MSU rating in order to reduce costs. In Figure 2, you can see that the HDS GX8314, the IBM 9672-R35, and the HDS Pilot 37 might be good bargains because they have the highest average capacity within group 60. Each vendor is concerned with having a model that will provide an easy incremental step in possible upgrades for their customers.
2.3 Performance Guarantees
3 HOW DO VENDORS MEET THEIR CLAIMS?
As previously mentioned, the vendors know what capacity they are aiming for in a particular CPU model. As an example, in order to address each area of their target market for the latest Generation 4 models, in June 1997 IBM announced 14 new models ranging from 8 MSUs to 78 MSUs. Based on our analysis, this corresponds to uni-processor speeds of 48 MIPS, 56 MIPS, 62 MIPS, 66 MIPS, and 72 MIPS. Only certain MP (multiprocessor) models are available for each uni-speed, depending on the target market.
Let's take a look at how a vendor can produce a model that provides a specific speed and, therefore, capacity.
3.1 Chip Sorting
As an example, the fastest IBM Generation 4 (G4) chip is rated at 2.7 nano-seconds and is used in their RY5 (10-way based on a 72 MIPS uni-processor). Compare this to the 3.1 nano-second chips in the 66 MIPS uni-processor based models (R55 to RX5) and the 3.3 nano-second chips in the 62 MIPS uni-processor based models (R15 to R45). In the case of the 2.7 nano-second chip for the IBM RY5, they were able to take the fastest chips from the chip sorting process and increase the speed by additional cooling using a refrigerant. (IBM states that it is an environmentally safe refrigerant, R134A.)
3.2 System Structure Changes
There are dozens of other changes that can be made in the hardware and microcode that will affect the effective speed of a CPU. Suffice it to say that the vendors have the knowledge and experience to "tweak" these as needed to achieve a specific speed for a machine. Sometimes a vendor will refer to "degraded" or "down-graded" models (although the labels aren't comforting!) that are needed to fill in a processor range. These might use slower chips or they might contain system structure differences to reduce the effective speed in order to fit the machine into a lower software rating. Likewise, a "turbo-charged" model might contain faster chips or include additional system structure changes to provide the needed increase in speed.
3.3 MP Effect
When IBM moved to CMOS models, however, the MP effect seemed to be more important. As an example, look at the #MIX ITRRs for two ten-ways, the bipolar 9021-9X2 and the CMOS 9672-RX5. The column called MP % shows the percentage of effective MIPS in the MP compared to the total possible MIPS if there were no overhead. From Figure 1, we can see that the 9021-9X2 provides about 465 MIPS, which is 75% of a potential 620 MIPS (10 CPUs at 62 MIPS, the speed of the 711 uni). The 9672-RX5, on the other hand, only provides about 394 MIPS, at only 60% of its potential 660 MIPS (10 CPUs at 66 MIPS).
The Rx5 models show the highest MP effect to date, which is one of the reasons, I think, for the interesting series of models that IBM announced in June 1997. The R55 (5-way) to RX5 (10-way) models are based on a 66 MIPS CPU uni, which is faster than IBM's largest bipolar, the 9021-9X2 (at 62 MIPS uni), but the total capacity of 394 MIPS is far less than the 9X2 (477 MIPS) due to the fact that the RX5 has more MP overhead than the 9X2. So IBM also announced the RY5 10-way at the same time. The RY5 is based on a turbo-charged 72 MIPS uni CPU, and is able to provide a capacity of 439 MIPS, which is much closer to the 9X2. That is, to compensate for the higher MP effect, IBM provided a model with faster CPU chips.
The only time you need to be aware of the MP effect is when you are considering the addition of a CPU to a current configuration. From a capacity planning standpoint, you should be aware of the decrease in capacity of the other CPUs. It's not a pricing issue, since the prices are adjusted by the vendor based on the effective capacity of the machine.
4 IBM'S LSPR RATINGS
To confirm the speed and capacity of their processor models and to help customers understand what to expect from different models processing their workloads, IBM publishes their Large Systems Performance Reference [REF002], as a manual and as a performance tool. You can also find the LSPR numbers on the Web [REF005].
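Returning briefly to the MP effect described in section 3.3, the MP % figures quoted there can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is my own, and the inputs are the MIPS figures quoted in that section.

```python
def mp_percent(effective_mips, uni_mips, cpu_count):
    """Effective MP capacity as a percentage of the no-overhead potential."""
    potential_mips = uni_mips * cpu_count
    return 100.0 * effective_mips / potential_mips


# (effective MIPS, uni-processor MIPS, number of CPUs), as quoted in section 3.3.
ten_ways = {
    "9021-9X2": (465, 62, 10),
    "9672-RX5": (394, 66, 10),
    "9672-RY5": (439, 72, 10),
}

for model, (effective, uni, cpus) in ten_ways.items():
    print(f"{model}: {mp_percent(effective, uni, cpus):.0f}% of {uni * cpus} potential MIPS")
```

The RY5 line is derived from the 72 MIPS uni speed and 439 MIPS capacity given in the text; it shows that the faster chips raise total capacity without changing the MP percentage much.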
Both their techniques and results are published in the manual, and I would strongly recommend that you become familiar with their methodology. This section of this paper provides my summary of their 50-page discussion on the technique.
4.1 Workloads
The sets of workloads consist of: CB84 - Commercial Batch Workload
Since the two workloads don't invoke DB2 sorts, the DB2 sort assist feature available on some models is not exercised.
Two very important items to note are that only one type of workload is run in each test and that the tests are run in totally unconstrained environments. That is, CICS is not tested with TSO and IMS is not tested with batch during the same runs. Also, in order to accurately determine the effect of the processor capacity, IBM must ensure that no other constraints exist on the system. That is, there is virtually no paging due to the abundance of all types of storage, there is no I/O constraint (almost 100% cache), there isn't a lack of VTAM buffers or JES initiators, and even the CPU is not run until it is constrained (it is never run at over 100% busy).
4.2 Measurements
Each workload will have its own ITR. To be able to compare two models, IBM uses an ITRR, Internal Throughput Rate Ratio, which is calculated by taking the ITR for a base model and dividing it into the ITR for the new model. Prior to June 1997, IBM published a list of the ITRRs using their 9021-520 as a base model with the ITR for each workload being set to 1.0. Thus, a model that can process 50% more work in the same amount of CPU time compared to the 520 will have an ITRR of 1.5.
In June 1997, IBM published preliminary LSPR ratings for their newest models using the CMOS 9672-R15 as a base. In August 1997, they republished their LSPR ratings for all models using the R15 as the new base. These new ratings were quite a bit different from the 520 ratings because the operating system and subsystem releases used in the LSPR runs were changed at the same time. This led to more than a little confusion. If we take IBM's statement that the R15 is equivalent to the 9021-711, and we also accept the 711 as a 62 MIPS machine, all other machines would see a corresponding 2% to 6% increase in MIPS ratings based on the LSPR ratings!
Figure 3 shows an extract from IBM's LSPR charts for their three models as compared to the 9672-R15. You can interpret the chart as saying that their TSO workload achieved 4.40 times as many transactions in the same amount of processor busy time on the 9672-R55 as compared to the 9672-R15. This is based on the total capacity of the model, not necessarily the speed of a CPU, as we'll see later.
In order to help people consider the capacity based on a mix of workloads, IBM derives an estimated ITRR called #MIX, which consists of 20% of the ITRR of each of the five workloads: CICS, IMS, DB2, TSO, and CB84. This is a calculated value only; it is not confirmed by actually running a 20% mix of each workload, for which consistency would be next to impossible to achieve.
4.3 How These Are Used
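The ITRR and #MIX calculations described in section 4.2 can be sketched as follows. The ITR values below are hypothetical, invented only to make the arithmetic visible, and the function names are my own rather than anything IBM publishes.

```python
# The five workloads that make up #MIX, per the description in section 4.2.
WORKLOADS = ("CICS", "IMS", "DB2", "TSO", "CB84")


def itrr(itr_new, itr_base):
    """ITRR: the new model's ITR divided by the base model's ITR."""
    return itr_new / itr_base


def mix_itrr(itrr_by_workload):
    """#MIX: 20% of the ITRR of each of the five workloads, summed."""
    return sum(0.20 * itrr_by_workload[w] for w in WORKLOADS)


# Hypothetical ITRs (units of work per processor-busy second).
base_itr = {"CICS": 100.0, "IMS": 80.0, "DB2": 50.0, "TSO": 200.0, "CB84": 40.0}
new_itr = {"CICS": 150.0, "IMS": 112.0, "DB2": 65.0, "TSO": 320.0, "CB84": 64.0}

itrrs = {w: itrr(new_itr[w], base_itr[w]) for w in WORKLOADS}
for w in WORKLOADS:
    print(f"{w}: ITRR {itrrs[w]:.2f}")
print(f"#MIX: {mix_itrr(itrrs):.2f}")
```

In this made-up case the new model processes 50% more CICS work per busy second, giving a CICS ITRR of 1.5, while the calculated #MIX hides the spread between the individual workloads.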
First, the SU/Sec value is published and made available often before final LSPR tests have been completed. While the published ITRRs might change, the SU/Sec values are seldom changed. Second, in older models, the difference in speed between workloads was fairly small. With modern processors, the difference in speed between workloads can be over 30%. As an example, in Figure 3, the FPC1 workload on a 9672-RX5 has an ITRR (9.61) that's over 60% higher than DB2 (5.92), and over 50% higher than #MIX (6.36). It would be very difficult to use a single number to indicate the speed of the RX5 for these differing workloads. There's a 14% variation in just the five workloads used to derive the #MIX.
The published #MIX is also used by most of the industry analysts to determine the relative MIPS ratings of different processors. This is an important concept for people who use published MIPS because it means that there could be a 40% or more variance between the published MIPS and what your workload would see. In our CPU Chart, we list estimated MIPS per workload to help people understand the difference that workloads make in estimating the capacity of a specific model.
4.4 Changes After GA
5 AMDAHL PERFORMANCE CLAIMS
Amdahl has a set of internal benchmark jobs similar to IBM's, but they do not publish a description of their workloads or specific performance claims for each type of workload. They normally publish a range of performance that can be expected for a given model compared to their 5995-4570M. For example, their newly announced CMOS Millennium series contains a model GS745, which is listed as having a performance rating of 1.16 to 1.28 of the Amdahl 5995-4570M.
Since Amdahl does not publish their workloads, we can't be certain which workloads are at which end of the range, although we might expect them to be similar to IBM's workloads. Most analysts take the midpoint of the high and low to be the average and relate that to IBM's #MIX workload. Whether this is valid or not remains to be seen.
Amdahl has always derived their SU/Sec value a little differently, however. Their logic has been to provide consistent TSO response across a hardware change. In order to do this, the same percent of TSO transactions needs to complete in first period. For this to be true, the durations must be adjusted to match the CPU speed. Amdahl assigns a value to the SU/Sec to ensure that the same percent of TSO transactions complete in first period. This has meant that the Amdahl SU/Sec values for bipolars have been higher by 6-8% than corresponding IBM and HDS bipolars. The Amdahl models had SU/Sec values that resulted in calculations of about 52 SU/Sec for each MIPS, while IBM and HDS had closer to 48 SU/Sec for each MIPS. With CMOS models, however, the vendors are getting closer. The IBM CMOS models are now closer to 51 SU/Sec while the Amdahl models vary from 48 to 52 SU/Sec (with a strange anomaly in the GS535 which results in almost 55 SU/Sec per MIPS).
This means two things to you. It is fairly dangerous to try to compare service units between models from different vendors. And it's also dangerous to compare service units between models of widely different ages.
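To see why comparing service units across vendors is risky, here is a rough sketch that converts the same service-unit count into implied work using the two SU/Sec-per-MIPS ratios quoted above. The job's service-unit total is hypothetical, and treating the ratio as "service units per million instructions" is my simplifying assumption.

```python
def implied_millions_of_instructions(service_units, su_per_sec_per_mips):
    """Work implied by a service-unit count, given SU/Sec per MIPS.

    SU/Sec divided by MIPS is service units per million instructions,
    so dividing service units by that ratio yields millions of instructions.
    """
    return service_units / su_per_sec_per_mips


job_service_units = 1_000_000  # hypothetical job

at_52 = implied_millions_of_instructions(job_service_units, 52)  # about 52 SU/Sec per MIPS
at_48 = implied_millions_of_instructions(job_service_units, 48)  # about 48 SU/Sec per MIPS

print(f"At 52 SU/Sec per MIPS: {at_52:,.0f} million instructions")
print(f"At 48 SU/Sec per MIPS: {at_48:,.0f} million instructions")
print(f"Apparent difference: {(at_48 / at_52 - 1) * 100:.1f}%")
```

The same reported service units imply roughly 8% different amounts of work depending on which ratio applies, which is the size of error you risk when comparing service units between such models.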
6 HDS PERFORMANCE CLAIMS
HDS uses two techniques for publishing performance ratings. Two series of HDS models, the GXxxxx series and their CMOS Pilot models, are designed to be directly competitive to corresponding IBM models, and therefore use comparable IBM ratings. The Skyline models, which are based on the fastest CPU speed available today, are not comparable in speed to any IBM or Amdahl CPU, so HDS publishes separate ratings for the Skylines (as well as a few other models that don't have corresponding IBM matches).
The HDS models that are comparable to the IBM models are published by HDS as having "equivalency" to the IBM models and their performance claims are equivalent to IBM's claims. For the few models in these series that do not have a direct equivalent model within the IBM range, HDS publishes a performance range, such as one model might provide 1.2 to 1.4 times the performance of an HDS GX8110.
The Skyline models, which are really combinations of bipolar and CMOS technology, don't relate to an IBM model, but performance claims are published that indicate, for example, that a Skyline is 2.0 times the HDS GX8114. HDS has derived these performance claims by running their own set of benchmark jobs. Neither a description of the jobs nor the resulting measurements are published. I've noticed that Skyline SU/Sec values range from 48 to 52 SU/Sec per MIPS, so the SU/Sec values might appear higher or lower than service units from other vendors.
7 WHY YOU SHOULD USE THESE CLAIMS
7.1 The Bad News
It's important to understand that there is no measurement in existence that can provide a single rating for a processor model that is indicative of its speed and capacity for a variety of workloads. It's similar to buying a car based on expected mileage. A car might be rated for 20 miles to the gallon, but that is seldom what you will find. You will drive the car much differently than the testers who came up with the initial rating.
For example, if you happen to have a lead foot (i.e., drive too fast!), you'll NEVER get the mileage your car is rated for. If you drive it according to their recommended speeds, and in their type of traffic, and on their types of roads, and with the same amount of weight in the car, and with all of the extra equipment turned off, you might be able to come close to their estimate. The same is true of processor models. With that said, however, I strongly recommend that you use the vendor's claims for sizing a machine, because it is as close as you can get initially.
7.2 Performance Guarantees
Therefore, I think you should trust the vendor to provide the right capacity estimates, but get it in writing! The trick in any contract is to identify how you and the vendor will agree on the performance that you're getting. This often requires very knowledgeable people on both sides who can understand the difference in performance, because your workloads may not match the vendor's workloads.
7.3 Industry Charts Based on Vendor's Claims
8 WHY YOUR EXPERIENCE MAY DIFFER
Why wouldn't you get the same performance out of a processor model for your workloads? There are several reasons and I'll address the most common among these:
1. Workloads vary
8.1 Workloads Vary
I think that the following summary made from the LSPR manual [REF002] is enlightening and helps provide some insight into what you might expect to see:
b. When comparing n-way models to their corresponding uni-processor model, the actual capacity will be highest for workloads at the batch end and lowest for workloads at the online end of the spectrum.
c. When comparing models with larger high speed buffer caches to those with smaller ones, the capacity will be highest for workloads at the online end and lowest for workloads at the batch end of the spectrum.
8.2 Your Workloads Don't Match the Vendor's
Here are a few examples where the performance of some workloads might not meet the vendor's expected performance claims:
2. A few customers found that some batch jobs took much longer than expected when they moved to a CMOS processor from a bipolar. It turned out that the problem was due to the fact that the packed decimal instruction set was much slower on the CMOS 9672 models than on the bipolars. Heavy use of packed decimal instructions tends to occur in COBOL programs that use subscripts for heavy table processing and were compiled with a compiler option of 'TRUNC=BIN'. IBM didn't run into this particular combination of heavy packed decimal work because their benchmark programs used indexes rather than subscripts. (I remember teaching students that they should use indexes rather than subscripts back in early 1970, but programmers and even vendors are still using subscripts!) This phenomenon has been significantly improved with some microcode changes, but it still exists in many of the IBM 9672 models and HDS Pilot models. For more information on this, see WSC Flash #9608 and the archives from the Watson & Walker 'Cheryl's List' listserver [REF003].
3. As mentioned earlier in the DB2 workload description, IBM's DB2 transactions don't cause the DB2 Sort Assist facility to be invoked. Since many applications do require a DB2 sort, your workloads could get better or worse performance when moving between processors with or without the sort assist facility.
4. One of the most common problems I've seen recently is a much larger occurrence of work that uses floating point. SAS, for example, uses floating point for most of its work. Any installation with a large percent of SAS in their daily processing should consider the FPC1 workload as being more representative of SAS than other workloads. Since FPC1 isn't used to determine the #MIX from IBM, SAS users can get very surprised, as seen by some quite low ITRRs on FPC1 workloads on some models.
In describing IBM's LSPR technique, I referred to their use of 'processor utilization'. This is all of the captured CPU usage for the measurement interval and includes CPU time consumed by all the system address spaces such as MVS, JES, RACF, VTAM, GRS, CONSOLE, etc., not simply the time recorded by the application in the SMF type 30 (job termination) or type 72 (workload by performance group or service class) records. IBM can obtain all of the measurements because they run in a dedicated, stand-alone environment. It's much harder for an installation to obtain all of the CPU for a specific workload. For example, if you run TSO and CICS at the same time, how much of MVS, RACF, VTAM, etc. is being used by the TSO workload and how much by the CICS workload? You simply can't tell. So if you see a CICS ITRR between two machines of 1.2, does that mean that the 20% increase in speed is seen as reduced CPU time in just CICS, or will part of it be seen in reduced CPU time for MVS? You don't really know, because IBM is really measuring multiple things at one time (that is, the SMF time of the region, MVS, VTAM, initiators, JES, etc.).
8.4 Your Mix Doesn't Match the Vendor's
So you'll need to determine your own mix. For daytime processing, you might want to look at your peak processing period and determine the makeup of the work at that time. For example, let's assume you are moving from a 9672-R53 to a 9672-R83 and you run 50% CICS, 10% TSO, 10% batch, and 30% "other things" like MVS, RACF, VTAM, monitors, operations started tasks, and scheduling programs. When using a variety of work, it's easiest to determine the percent of each type of work during the peak interval (that's when the capacity of the machine is the most important). Simply group MVS and supporting functions with the miscellaneous workloads and use the #MIX ITRR.
Let's assume that you had some work on an 8-way 9021-982 and planned to move it to a 10-way 9672-RX5. Also assume that you were running 70% CICS, 10% TSO, and 20% other (MVS) during the peak intervals. From Figure 3, we can calculate the ITRR for CICS to be .91 (6.61 / 7.23), the ITRR for TSO is 1.11 (6.48 / 6.00), the ITRR for #MIX is .98 (6.36 / 6.48). That's 70% at .91, 10% at 1.11, and 20% at .98 for a combined ITRR of .94.
8.5 The Workloads Vary Throughout the Day
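The combined ITRR from section 8.4 is simply a weighted average of the per-workload ITRRs, weighted by each workload's share of the peak interval. A minimal sketch, using the shares and ITRRs stated in that example (the function name is my own):

```python
def combined_itrr(mix):
    """Weighted ITRR for a workload mix.

    `mix` maps a workload name to (share of peak-interval CPU, ITRR).
    Shares are fractions and must sum to 1.0.
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"workload shares sum to {total_share}, expected 1.0")
    return sum(share * ratio for share, ratio in mix.values())


# 9021-982 to 9672-RX5 example: shares and ITRRs as stated in section 8.4.
peak_mix = {
    "CICS": (0.70, 0.91),
    "TSO": (0.10, 1.11),
    "other (use #MIX)": (0.20, 0.98),
}

print(f"Combined ITRR: {combined_itrr(peak_mix):.2f}")
```

The same function can be run once with a daytime mix and once with a nighttime mix when the two peaks are made up of different work.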
But if you have a tight batch window at night, as many installations do, you will need to calculate a daytime ITRR and a nighttime ITRR to better determine the effect of a processor change. It would be quite possible to find a site with a mix of 70% online during one peak hour only to find the mix has shifted to 70% batch in the nighttime peak hour. As more companies move to international processing windows, the variation between day and night processing is reduced. Even the online workloads will vary dramatically throughout the day.
8.6 The Volume Affects Capacity
Your results will almost certainly vary if you run at different capacities. Frankly, few sites will upgrade to a new machine and immediately run at between 70% and 100% busy. A new machine almost always has excess capacity, and this will affect how much CPU is needed for the workload. For some models, being underutilized will actually produce higher CPU overhead due to their management of high speed cache and how work is dispatched to the CPUs. Other factors, such as LPAR processing, can add several "low utilization" effects.
For most models, however, being underutilized will result in less CPU time per transaction than the work will see as the system gets busier. That means that shortly after moving to a new processor, you will tend to see very good performance. As you get more work on the system, which may be many months later, the CPU usage of the system will increase. In almost every analysis I've made, jobs will take more CPU time when the CPU utilization is at its highest. This is often referred to as the "multi-programming" effect. If you measure the data at 50% CPU busy, it will always be in the vendor's favor, because the machine will be able to get the work done in less time than estimated at higher utilizations.
This phenomenon is seen very frequently. An installation that has been severely constrained for months (running well over 100% busy for long periods of time) might replace their current machine with a model that has a higher capacity, so the entire workload can be processed while only running at 60% busy on the new processor. The jobs have been experiencing excessive CPU overhead due to the high utilization and are then moved to an environment where they take less than the vendor's predictions, and it appears that you easily got what you paid for. Therefore, you will need to wait until you've reached full utilization on your processor before knowing whether you have obtained the processor capacity that you had planned for.
8.7 Constraints in Software Affect Capacity
Unfortunately, this happens quite often when an installation upgrades to a new processor. There are several dozen parameters that should be modified when you upgrade to a larger capacity machine. If these aren't modified, you could be restricting the capacity of your new machine. A simple parameter, such as the domain constraints in the IPS, could cause an increase in the amount of swapping, and therefore, overhead in the new model.
8.8 Constraints in Hardware Affect Capacity
You should be aware that if you have any hardware constraints, such as lack of I/O paths, poor cache hit ratios, poorly performing DASD, storage shortages, or other hardware constraints, you could be impacting the potential capacity of your machine.
8.9 LPAR Affects Capacity
LPAR processing, whether it's from IBM's PR/SM, Amdahl's MDF, or HDS's MLPF, will take additional cycles for processing time. A small portion of LPAR processing may be displayed in the partition data available from RMF and CMF, but that is only the LPAR management time and does not include the bulk of the actual overhead. Most LPAR overhead is actually experienced by the workloads, and their CPU time (TCB or SRB) will increase in an LPAR situation. The amount of increase is quite variable and dependent on several factors. The primary factors are the number of LPARs on the machine, the total number of shared logical CPUs, the ratio of logical to physical CPUs, and the activity in the other LPARs. An increase in any of these four will cause an increase in the CPU time for your work. This CPU time has not been considered in the vendor's announced performance claims (nor can it be). The LPAR overhead could range from as little as 2% (in a production LPAR that's given 95% of the machine) to as much as 25% (in a grossly over-configured, multiple LPAR, multiple CPU environment). You need to take this into consideration if you are running in any type of LPAR environment.
8.10 Dispatch Priorities Affect Capacity
8.11 Software Levels Affect Capacity
What this means is similar to the discussion in 8.2 where your workloads don't match the vendor's. An example of this is in ISPF. ISPF V4 took a lot more cycles than ISPF V3. If the vendor is using ISPF V4 for the base and you are running ISPF V3, you will probably see a difference in how the TSO workload is affected when moving between two models. That is, the vendor did not measure the impact of ISPF V3; it could have been worse or it could have been better, but only you will know (it won't come out of the benchmarks).
8.12 Levels of PTFs Affect Capacity
There have even been cases where IBM has had to apply some PTFs before running their LSPR tests due to some performance improvements that were related to the hardware.
8.13 Different Facilities Invoked
Since the current benchmarks are run on traditional workloads, how will you be able to tell the impact of a new processor model for your new applications such as IBM's Web Server on MVS, their LANServer MVS, object technology with SOM and CORBA, web applications like Java, TCP/IP instead of VTAM, DB2 stored procedures, OpenEdition MVS, MQSeries, and similar new applications? Likewise, consider the applications that are trying to take advantage of some of the facilities that were new as of SP 4 or 5 and still haven't been used, such as SmartBatch, DB2 Sort Assist, CICS storage protection, LPAR automatic recovery, etc. One of the newest applications, parallel sysplex, is yet to be considered for the hardware benchmarks. In a parallel sysplex configuration, how much does the processor model affect the communication and overhead to and from the coupling facility?
8.14 Amount of Storage Affects Capacity
If you have a lot of storage and use it, you will take the least amount of CPU time per transaction. If you are short on storage, you will end up taking more CPU cycles from productive work and spend them on paging activities.
8.15 Level of Tuning
The easiest example of this is good blocking. Yes, you've probably heard for years that good blocksizes (half or full track blocks on disk) are the most efficient. And most installations have ensured that production data sets are well blocked. But in most sites, programmers tend to use a factor of 10 to get blocksizes (80 x 800, 1600 x 16000), which produce very poorly performing jobs. Good blocking could reduce the CPU time by 10% to 20%. If you have many of these in your batch workload, the programs aren't running very efficiently and may not be getting the maximum benefit out of the new processor. A well tuned system will always get the best performance out of a new configuration.
8.16 User's Behavior Changes
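As a small illustration of the blocking advice in section 8.15, the sketch below picks the largest block size that is a multiple of the record length and still fits in half a track. The 27,998-byte half-track limit is my assumption (it is the figure commonly used for 3390 DASD with fixed-length records); substitute the limit for your own device type.

```python
HALF_TRACK_BYTES = 27998  # assumption: half-track limit on 3390 DASD


def half_track_blksize(lrecl, limit=HALF_TRACK_BYTES):
    """Largest multiple of the record length that fits within the limit."""
    if lrecl <= 0 or lrecl > limit:
        raise ValueError("record length must be between 1 and the limit")
    return (limit // lrecl) * lrecl


# The two "factor of 10" cases mentioned in section 8.15: (LRECL, BLKSIZE).
for lrecl, factor_of_ten in ((80, 800), (1600, 16000)):
    better = half_track_blksize(lrecl)
    print(
        f"LRECL {lrecl}: BLKSIZE {factor_of_ten} holds {factor_of_ten // lrecl} "
        f"records per block; BLKSIZE {better} holds {better // lrecl}"
    )
```

Fewer, larger blocks mean fewer I/O operations for the same data, which is where the CPU savings come from.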
Many sites have gotten burned because an improvement in response times caused users to change their behavior. Another common example is seen when TSO users find that the system is so fast that they start doing all of their work in foreground rather than submitting batch jobs. This leads to much longer TSO third period response times and higher CPU consumption.
8.17 The one thing that remains consistent is that you will always have change!
A stable, unchanging environment is seldom what you can expect to see. The only consistency in most production sites is the inconsistency of the workloads. An entire day of processing can be harmed if a batch job from the nightly cycle abends and must be rerun during the day alongside the online workloads. TSO users may all come back from a meeting at the same time and hit the system with double the normal TSO load. The CICS group could change a single CICS parameter and increase the CICS CPU time by 5%. The DB2 group could add some indexes and reduce DB2 time by 15%.
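
To see how changes like these can blur a before-and-after comparison, here is a minimal Python sketch. Only the 5% and 15% figures come from the examples above. The baseline CPU seconds per workload and the function names are hypothetical, chosen purely for illustration.

```python
# Hypothetical baseline: CPU seconds consumed per hour by each workload.
# These numbers are assumptions for illustration, not measurements.
baseline = {"CICS": 1200.0, "DB2": 800.0, "TSO": 400.0, "BATCH": 600.0}

# Fractional changes taken from the examples in the text:
# a CICS parameter change (+5%) and new DB2 indexes (-15%).
changes = {"CICS": 0.05, "DB2": -0.15}


def apply_changes(baseline, changes):
    """Return per-workload CPU seconds after applying fractional changes."""
    return {
        name: cpu * (1.0 + changes.get(name, 0.0))
        for name, cpu in baseline.items()
    }


def total_shift(baseline, changes):
    """Return the fractional change in total CPU seconds."""
    before = sum(baseline.values())
    after = sum(apply_changes(baseline, changes).values())
    return (after - before) / before


shift = total_shift(baseline, changes)
print(f"Total CPU shift from workload changes alone: {shift:+.1%}")
```

With these assumed numbers, total CPU time moves by about 2% even though nothing about the processor has changed. If the same changes happened to fall in the week of a processor upgrade, a simple comparison of total CPU time would attribute that shift to the new model.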
8.18 All of the Above
It is sometimes more difficult to recognize the reasons for poor results than it is to fix the problem.

8.19 Summary
9 WHAT CAN YOU DO?
You can do two things to ensure that you get your money's worth. You can obtain a performance guarantee from your vendor before deciding on a processor model. And you can measure (and understand) the relative change in capacity after you've moved to the new model.

9.1 Performance Guarantee
Part of the performance guarantee is an agreement on the methodology that will be used to confirm that you receive the performance you expect. Generally, this consists of itemizing your important workloads and specifying their current performance along with the expected performance from the vendor. Most performance guarantees require that the analysis be done between two environments where all changes have been frozen. That is, new workloads, changes in the operating system, parameter changes, etc. are not allowed between the two periods of analysis. Be sure that you can handle this period of time without any system or application changes.

9.2 Measure Your Own System
IBM provides one solution for this in "The Complete View" section of chapter 5 of their LSPR manual. As an introduction to their solution, they state that "For a validation to work, there must be a commitment that the workload run on the new processor be the same as that on the old processor. In other words, there should be no shifting of workloads until after the validation is complete." Their technique relates the logical I/Os to the total processor busy over a week of prime-shift data. I've tried this technique and have found that it only worked in a few cases, because users could not make the commitment that the workload would not change during the week directly after a processor upgrade. The workload will almost certainly change after a processor upgrade, and changes will be made by the data center personnel.

I've found more success with the technique of identifying stable job steps and online transactions before the change was made and seeing how they were affected after the change. This technique was first introduced by Joseph B. Major. Though this technique doesn't take operating system differences into account (such as the effect on JES or RACF), it will definitely show the effect on the application.

If you don't have time to write your own programs to find these stable jobs and collect this data, take a look at our latest product, BoxScore. BoxScore identifies and quantifies the effect of any change, such as tuning, Year 2000 conversions, processor upgrades, etc., on stable job steps and transactions. This software is based on research that I've been doing in this area for the past 10 years.

10 SUMMARY
The performance estimates for new processor models from IBM, Amdahl, and HDS provide valuable data to help you understand how much capacity you can expect to see if you move to those new models. This is especially true of the IBM LSPR ratings by workload.
We would hope to see workload performance ratings from the other vendors at some point in the future.

Your workloads may not see exactly the same effects because of several factors, among which are the facts that your workloads don't mimic the vendor's and that most installations run in an LPAR environment, which may not be considered in performance claims. To ensure that the vendor will help you if you don't get the performance you expect, I recommend that you obtain a performance guarantee from the vendor before delivery of a new model. You should define a technique to identify the relative effect of any processor model change based on your own workloads, not on estimates from artificial workloads. Remember that the vendor's claims are almost always provided for the optimum environment: one that is well-tuned and running with no constraints in a non-LPAR environment. If you are running in an LPAR, have any constraints, or are not well-tuned, you can't expect to achieve the same performance results.

Note: This article with some modifications has previously been published in Watson & Walker's BoxScore User's Guide [REF004].

11 BIBLIOGRAPHY
[REF001] Cheryl Watson's Tuning Letter, CPU Chart
[REF002] Large Systems Performance Reference, IBM, SC28-1187
[REF003] Watson, Cheryl, Listserver archives can be found at: http://www.watsonwalker.com/archives.html
[REF004] Watson, Cheryl, BoxScore User's Guide, Watson & Walker, Inc.
[REF005] Web site for LSPR Description and Numbers - http://www.s390.ibm.com/lspr/lspr.html