Cheryl Watson's TUNING
You've been running in SP 5 WLM compatibility mode for a while and you want to move to goal mode soon. But where to start? Here's an easy method of getting there - a "Quickstart" Service Policy. It's a simple, generalized service policy that will work in most installations as a starting point. Our "Quickstart" service policy provides a standard set of service classes and goals that you can easily use or modify for your initial policy.
Once in goal mode, you'll have access to WLM's superior resource management and a wealth of measurement data that makes tuning a real breeze!
INTRODUCTION
The migration to Workload Manager (WLM) goal mode in MVS SP 5 and OS/390 may seem intimidating at first, but it can be quite simple. In this paper, I'll introduce a simple service policy I've designed that can be used by most installations. It should ease your migration to goal mode.
Some installations have used this "Quickstart" policy with little modification and began running in goal mode within a week or so. I think you'll find it an easy way to start your own migration.
This paper assumes you have a basic knowledge of WLM. If you're not familiar with service policies, service classes, and goals (response, velocity, and discretionary), please refer to the bibliography at the end.
KEEP IT SIMPLE
For years, the people maintaining IPS and ICS members have found that simplicity has its rewards. Not only is there the ease in making changes to your IPS/ICS parameters, but there is also the simplicity when reporting or explaining the system, and the reduction in system overhead to process the resulting data. Sites with only a couple of dozen performance groups take much less time and space to process their data than sites with several hundred groups.
The same reasons to keep things simple hold true when defining a service policy, only more so. The performance is considerably better and WLM is much more effective with a simple definition containing a few service class periods than with a definition with hundreds of periods. A simple service definition is also easier for you to maintain. While you should try to keep the total number of service class periods to a minimum for simplicity, you should know that there is little effect on performance if you choose to add several service classes with a discretionary goal or report classes.
Therefore, I have applied a single rule-of-thumb in providing these recommendations: "Use the minimum number of service definitions, policies, workloads, resource groups, service classes, service class periods, and classification rules that you can get away with." In other words, start small and add only when you need to.
You could begin with my "quickstart" service definition and add to it as you find a need. Start with a single service definition and service policy. A single policy may work just fine around the clock. Define a simple set of workloads and service classes and only use multiple period service classes when you have a specific requirement, such as with TSO.
Workloads and service classes for my "quickstart" service policy are shown in Figure 1. The importance (the "Imp" column) is set by each installation based on the workload priorities. The corresponding classification rules are shown in Figure 2, and the classification groups are in Figure 3.
This is a very easy service definition to implement, and it only contains twenty service classes. Most current performance groups should easily fall into one of these classes.
The SYSTEM, SYSSTC, and SYSOTHER service classes are standard service classes defined by WLM. You may indirectly assign any work to SYSSTC and SYSOTHER, but not SYSTEM.
WLM GOAL MODE USING A QUICKSTART SERVICE POLICY
To be able to use this quickstart policy, you'll need to make some initial assignments. The quickstart policy is described below by subsystem, and describes the purpose of each service class and any initial decisions you might need to make.
With service classes using response time goals, I've recommended starting with percentile response time goals. Percentile goals are much better to use since they're easier to achieve and provide a more accurate picture of what the user is experiencing. A single long job or transaction won't affect a percentile response time but can greatly affect an average response time. You may prefer to start with average response time goals since you can obtain average response times directly from your RMF, CMF, or SMF data. After you've been running in goal mode for a short period of time, you'll be able to determine percentile response time goals to use instead of averages.
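To illustrate why one long transaction distorts an average but not a percentile, here is a minimal Python sketch. The response times and the one-second goal are made-up values for illustration, not measurements from any real system.

```python
def average(times):
    """Average response time in seconds."""
    return sum(times) / len(times)


def percent_within(times, goal_seconds):
    """Percentage of transactions that ended within goal_seconds."""
    met = sum(1 for t in times if t <= goal_seconds)
    return 100.0 * met / len(times)


# Hypothetical sample: nine half-second transactions and one 60-second one.
times = [0.5] * 9 + [60.0]

print(f"average: {average(times):.2f} sec")                  # dragged up by the long one
print(f"within 1 sec: {percent_within(times, 1.0):.0f}%")    # barely affected
```

With this sample, a goal such as "90% within 1 second" is still met, while the average is more than ten times what most users actually saw.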
STARTED TASKS (STC)
I defined four service classes for started tasks (ONLPRD, ONLTST, STCMD, and STCLO). The system automatically defines two (SYSTEM and SYSSTC).
Transaction name groups (TNG) and transaction class groups (TCG) are used to easily create lists of names or classes that are to be assigned to the same service classes. I normally use the service class name as the group name for simplicity. Classification groups are assigned in the WLM ISPF dialog.
Since you can't directly assign work to SYSSTC using classification rules, you need to code the rules for STCs in a special manner. SYSSTC can only be obtained as the DEFAULT for the STC subsystem. To assign classification groups to SYSSTC, use the following technique in the WLM ISPF screen for assigning service classes:
STCLO is discussed later.
This technique is described in the WLM Planning manual
listed in the bibliography.
If you're not running in a constrained environment, you
can assign all of these address spaces to the same service class with a
velocity goal. If you're running in a constrained environment, you may
want to monitor your primary online systems during peak intervals and determine
the velocity that they now receive. If there are varying velocities (e.g.
20%, 40%, and 60%), then set up different service classes for each velocity.
You might end up with ONLHI, ONLMD, and ONLLO service classes. In a constrained
system you might want your IMS control region and CICS TOR (terminal owning
region) running with a higher velocity goal than the IMS message processing
regions or the CICS AORs (application owning regions).
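As a rough sketch of the grouping step just described, the following Python computes a velocity for each address space and groups address spaces whose velocities round to the same value. It assumes the usual definition of execution velocity (using samples divided by using plus delay samples, times 100). The region names, sample counts, and the 20-point rounding step are hypothetical.

```python
from collections import defaultdict


def execution_velocity(using_samples, delay_samples):
    """Velocity in percent: using / (using + delay) * 100."""
    total = using_samples + delay_samples
    return 100.0 * using_samples / total if total else 0.0


def candidate_classes(samples, step=20):
    """Group address spaces by velocity rounded to the nearest `step`."""
    groups = defaultdict(list)
    for name, (using, delay) in samples.items():
        velocity = execution_velocity(using, delay)
        goal = int(round(velocity / step) * step)
        groups[goal].append(name)
    # Highest velocity first, e.g. ONLHI, then ONLMD, then ONLLO.
    return dict(sorted(groups.items(), reverse=True))


# Hypothetical peak-interval (using, delay) sample counts per address space.
samples = {
    "CICSTOR1": (600, 400),
    "IMSCTL": (580, 420),
    "CICSAOR1": (400, 600),
    "IMSMPR1": (210, 790),
}

for goal, names in candidate_classes(samples).items():
    print(f"velocity {goal}%: {', '.join(names)}")
```

Each distinct rounded velocity is a candidate service class. If everything lands in one group, a single class with one velocity goal is probably enough.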
The quickstart policy has five service classes defined
for batch jobs, divided between test and production because most installations
tend to manage their batch work in those two categories.
In a constrained system, you could identify critical path
jobs and give them a unique jobclass or accounting code in order to assign
them to this service class. I defined a TCG (Transaction Class Group) of
PRDBATHI for use with unique jobclasses, but you could easily change it
to a TNG (Transaction Name Group). The PRDBATHI service class can run with
a velocity goal, while other production batch runs as discretionary.
PRDBATLO is the default service class for all batch jobs in this Quickstart policy.
In a constrained environment where the critical batch jobs might not complete before morning, you can assign the majority of jobs to PRDBATLO and the jobs on the critical path to PRDBATHI. Most of the jobs (PRDBATLO) would run as discretionary while the critical jobs would run with a low velocity. If WLM needed to steal some resources, it would look at the jobs in PRDBATLO first.
Some sites may need to define a medium importance and velocity service class, PRDBATMD, as discussed later. PRDBATLO is also the service class I've used as a default in the quickstart policy. Depending on your installation, you may want to use test batch as the default class instead of production batch.
One more consideration for production batch is to determine
if there's a need to run production batch during the day while test batch
is running. In that case, you may want to define only test batch with a
discretionary goal and define production batch with a low velocity goal
so that production is always preferred to test.
You can determine the current average response time by
looking at RMF, CMF, or SMF data for these jobs. In an RMF Workload Activity
report, for example, the ACTUAL (or TOTAL for RMF releases prior to V5)
response time is the average response time from job submission to termination.
I've defined this with a percentile response time goal, but you could start
with average and change it after you've collected more response data.
It's possible that your installation doesn't use jobclasses to differentiate test jobs and instead uses multiple periods to separate short, medium, and long jobs. In that case, you can simply define one service class, such as TSTBAT, containing three periods with different goals for each period (similar to the three classes I've just described).
The quickstart policy defines a single TSO service class, TSOPRD, with three periods. It's set up using percentile response time goals, but you could start by using average response times corresponding to the current response times you see reported by performance group period.
The response time goal on third period assumes that the activity is high enough (ten transactions every twenty minutes). Use a velocity goal for third period if the rate is low.
TSOPRD is the default service class for all TSO users in this Quickstart policy.
ASCH
The majority of installations aren't running APPC/MVS workloads but soon will since OpenEdition MVS uses APPC/MVS for its implementation. There are several started tasks that can be assigned to service classes. The APPC and ASCH (scheduler) started tasks (you can change the name, of course) intercept and manage all of the APPC/MVS work. Therefore, these tasks should be run in the SYSSTC service class.
The transaction programs (TPs) are the applications that process the requests coming from APPC/MVS clients. The characteristics of these programs will determine the actual goals. Many are long-lived started tasks with little resource usage and others are short-lived tasks with very high resource usage. You can monitor each one and place them in appropriate service classes, such as STCMD or STCLO. Because of their nature, the TPs are best managed with velocity goals.
The APPC and ASCH address spaces are assigned in the STC subsystem, but the actual APPC/MVS TPs are assigned in the ASCH subsystem.
The quickstart policy has a single service class, ASCH, defined with two periods: a percentile response for short transactions and a velocity goal for longer work. If the volume of ASCH transactions is not high enough, at least 10 transactions in twenty minutes, you can use a single period with a velocity goal.
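The activity rule of thumb used here and for TSO third period can be written as a small check. This is only a sketch: the function name and the idea of scaling a longer measurement interval down to a twenty-minute window are assumptions for illustration, and averaging over a long interval can hide quiet stretches.

```python
MIN_TRANSACTIONS = 10   # rule of thumb from the text
WINDOW_MINUTES = 20


def suggested_goal_type(ended_transactions, interval_minutes):
    """Return 'response' if activity is high enough, else 'velocity'."""
    per_window = ended_transactions * WINDOW_MINUTES / interval_minutes
    return "response" if per_window >= MIN_TRANSACTIONS else "velocity"


# Hypothetical counts from a 60-minute reporting interval.
print(suggested_goal_type(45, 60))   # 15 per 20 minutes
print(suggested_goal_type(12, 60))   # 4 per 20 minutes
```

Check the quietest intervals you care about rather than a daily total, since the goal has to work when activity is at its lowest.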
OMVS
OpenEdition MVS (OMVS) transactions will be new to most installations. The Kernel for UNIX is similar to the master scheduler for MVS, so the started task for the OMVS Kernel should be run at a very high priority. You can assign it to the SYSSTC service class. OMVS daemons are long-lived started tasks that perform continuous or periodic system-wide functions such as network control. These started tasks should be assigned to STCMD in the STC subsystem.
Actual OMVS transactions, the forked children, are classified in the OMVS subsystem. (Don't you just love UNIX terminology?) Due to the characteristics of OMVS transactions, some of which have short response times and high resource usage and others which have long response times and low resource usage, a service class similar to ASCH is the best place to start.
CICS V4 and IMS V5
These subsystems provide additional WLM support to allow you to assign response goals by transaction, subsystem, transaction class, or userid. If you install either of these versions while still running in compatibility mode, turn on collection of average response times. This will give you better data to let you set response time goals when you migrate to goal mode. The sample service classes that I've shown in the quickstart policy can be used as a starting point. Caution: you may see overhead (5%-10%) while collecting this data.
An important item to keep in mind is that there are two possible users of the transaction response time data. WLM will look at the goals and the actual response times of the transactions being processed by an address space. If goals aren't being met, WLM can respond by finding additional resources for the address space. But WLM can't really divide its resources among the transactions -- it only manages address spaces. If you have short transactions with high importance and long transactions with low importance in the same address space, WLM can't help the high importance transactions without giving a free ride to the low importance transactions.
That's where other products come into play. For example, CICSPlex/SM can route transactions to multiple address spaces. If it sees a transaction that needs a short response time, it can route the transaction to the address space that's providing short response times. If it sees a transaction that doesn't have an aggressive response time goal, CICSPlex/SM can route the transaction to an address space that isn't providing wonderful response time.
Thus, it will do you little good to separate your transactions
into multiple service classes if you'll end up letting WLM manage a single
address space. You could still define a single service class for your most
important transactions and set a response goal for them. If they don't
make their goal, WLM can try to find more resources for the address space
to help the short transactions. Other transactions in that address space
will see some benefit as well.
There are several modifications that you'll want to make to the quickstart policy either initially or after you've collected data under goal mode.
Initial Definitions
Here's a list of some other changes you might want to make to the quickstart policy as you need them.
I've just described a service policy that can get you started into goal mode. Just remember the original concept of "keep it simple," and you'll find your migration will be easy and produce a very usable service policy.
BIBLIOGRAPHY
Basics:
IBM, "MVS/ESA Planning: Workload Management," GC28-1493.
C. Watson, "WLM Goal Mode," Cheryl Watson's TUNING Letter, March/April 1995.
P. Enrico and E. Berkel, "Effective Use of MVS WLM Controls," Cheryl Watson's TUNING Letter, March/April 1995 and CMG Trans. 87, 21-38 (1995).
For further reading:
C. Watson, "WLM Classification," "A Quickstart Policy," and "WLM Measurements," Cheryl Watson's TUNING Letter, May/June 1995.
C. Watson, "How to Set Velocities," Cheryl Watson's TUNING Letter, 1996, #3.
IBM, "WLM Performance Studies," GG24-4352.
Figure 1. Workloads and service classes

| Workload | Service Class | Imp | Response Type |
|---|---|---|---|
| SYSTEM | SYSTEM | | System goal |
| SYSTEM | SYSSTC | | System goal |
| SYSTEM | SYSOTHER | | Discretionary |
| ONLINE | ONLPRD | | Velocity |
| ONLINE | ONLTST | | Discretionary |
| STC | STCMD | | Velocity |
| STC | STCLO | | Discretionary |
| TSO | TSOPRD - Period 1 | | Percentile response |
| TSO | TSOPRD - Period 2 | | Percentile response |
| TSO | TSOPRD - Period 3 | | Percentile response or Velocity (35%) |
| PRDBAT | PRDBATHI | | Velocity |
| PRDBAT | PRDBATLO | | Discretionary |
| ONLINE | ONLPRDHI | | Percentile response |
| ONLINE | ONLPRDMD | | Percentile response |
| ONLINE | ONLPRDLO | | Percentile response |
| TSTBAT | TSTBATHI | | Percentile response |
| TSTBAT | TSTBATMD | | Percentile response |
| TSTBAT | TSTBATLO | | Discretionary |
| OMVS | OMVS - Period 1 | | Percentile response |
| OMVS | OMVS - Period 2 | | Velocity |
| ASCH | ASCH - Period 1 | | Percentile response |
| ASCH | ASCH - Period 2 | | Velocity |
| DDF | DDF - Period 1 | | Percentile response |
| DDF | DDF - Period 2 | | Velocity |
Figure 2. Classification rules

| Subsystem | Value | Service Class |
|---|---|---|
| TSO | (default) | TSOPRD |
| STC | * (See STCHI description) | STCLO |
| STC | STCMD | STCMD |
| STC | STCHI | SYSSTC |
| STC | ONLPRD | ONLPRD |
| JES | (default) | PRDBATLO |
| JES | PRDBATHI | PRDBATHI |
| JES | TSTBATHI | TSTBATHI |
| JES | TSTBATMD | TSTBATMD |
| JES | TSTBATLO | TSTBATLO |
| ASCH | (default) | ASCH |
| OMVS | (default) | OMVS |
| DDF | (default) | DDF |
| CICS V4 or IMS V5 | (default) | ONLPRDLO |
| CICS V4 or IMS V5 | ONLPRDHI | ONLPRDHI |
| CICS V4 or IMS V5 | ONLPRDMD | ONLPRDMD |
Figure 3. Classification groups

| Classification Group | Group Entries |
|---|---|
| STCHI | VTAM, NPM |
| | JESx |
| | TSO (TCAS) |
| | RMF, online monitors |
| | RACF/CA-ACF2/TOPSECRET |
| | auto ops package |
| | APPC & ASCH address spaces |
| | DLF, LLA, VLF |
| | IRLM |
| | MIM |
| | SMS, SYSBMAS |
| | TRACE, PCAUTH |
| | OMVS kernel |
| ONLPRD | CICS* |
| | IMS* |
| | DB2* |
| | other online systems |
| STCMD | your scheduler |
| | your spooler programs |
| | your important operations STCs |
| | OMVS daemons |
| ONLTST | your online test regions |
| PRDBATHI | your prod batch jobclasses |
| TSTBATHI | your hot test batch jobclasses |
| TSTBATMD | your medium test batch jobclasses |
| TSTBATLO | your low test batch jobclasses |
| ONLPRDHI | your high importance, short response CICS 4 or IMS 5 transactions |
| ONLPRDMD | your medium importance, medium response CICS 4 or IMS 5 transactions |
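The way the classification rules and classification groups above fit together can be modeled in a few lines of Python. This is only an illustrative model of "match a group, otherwise take the subsystem default"; it is not how WLM itself evaluates rules, and it simply follows the figures in showing the STCHI group mapped to SYSSTC. The specific started task names, the jobclasses H and T, and the SCHED* pattern are hypothetical stand-ins for your own entries.

```python
from fnmatch import fnmatchcase

# Classification groups: group name -> name patterns (entries are examples only).
GROUPS = {
    "STCHI": ["VTAM", "NPM", "JES2", "TCAS", "RMF", "RACF", "LLA", "VLF", "IRLM"],
    "ONLPRD": ["CICS*", "IMS*", "DB2*"],
    "STCMD": ["SCHED*"],        # hypothetical scheduler task name
    "PRDBATHI": ["H"],          # hypothetical critical-path jobclass
    "TSTBATHI": ["T"],          # hypothetical hot test jobclass
}

# Classification rules: subsystem -> ordered (group, service class) pairs + default.
RULES = {
    "TSO": {"rules": [], "default": "TSOPRD"},
    "STC": {
        "rules": [("STCMD", "STCMD"), ("STCHI", "SYSSTC"), ("ONLPRD", "ONLPRD")],
        "default": "STCLO",
    },
    "JES": {
        "rules": [("PRDBATHI", "PRDBATHI"), ("TSTBATHI", "TSTBATHI")],
        "default": "PRDBATLO",
    },
}


def classify(subsystem, qualifier):
    """Return the service class for a task name, jobclass, or userid."""
    entry = RULES[subsystem]
    for group, service_class in entry["rules"]:
        if any(fnmatchcase(qualifier, pattern) for pattern in GROUPS[group]):
            return service_class
    return entry["default"]


for subsystem, qualifier in [("STC", "CICSPRD1"), ("STC", "VTAM"),
                             ("STC", "PAYROLL"), ("JES", "H"), ("TSO", "USER01")]:
    print(subsystem, qualifier, "->", classify(subsystem, qualifier))
```

Walking your current started tasks and jobclasses through a model like this is a quick way to spot work that would fall into a default class by accident.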