22
Tuning Networks

This chapter introduces networking issues that affect tuning.

This chapter contains the following sections:

Understanding Connection Models
Detecting Network Problems
Solving Network Problems

Understanding Connection Models

The techniques used to determine the source of problems vary depending on the configuration. The three types of configurations are:

Multi-Threaded Server (MTS) Configuration
Dedicated Server Configuration
Pre-Spawned Dedicated Server Configuration

Table 22-1 shows how to tell which type of database configuration you have.

Table 22-1 Database Configurations

Configuration	LSNRCTL services output
Multi-Threaded Server	Lists dispatchers.
Dedicated Server	Lists dedicated servers.
Pre-Spawned Dedicated Server	Lists pre-spawned servers.
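As a rough aid, the keyword checks in Table 22-1 can be scripted. The following Python sketch assumes the `lsnrctl services` output format shown in this chapter; `detect_configurations` is a hypothetical helper, and because listener output differs between releases, treat its result as a hint rather than a diagnosis.

```python
import re


def detect_configurations(services_output: str) -> set:
    """Return the connection models suggested by `lsnrctl services` output."""
    text = services_output.upper()
    found = set()
    # MTS: dispatcher handlers, listed as DISPATCHER or by names like D000.
    if "DISPATCHER" in text or re.search(r"\bD\d{3}\b", text):
        found.add("Multi-Threaded Server")
    # Pre-spawned dedicated servers are listed as PRESPAWNED SERVER.
    if "PRESPAWNED" in text:
        found.add("Pre-Spawned Dedicated Server")
    # Plain dedicated handlers are listed as DEDICATED (SERVER).
    if re.search(r"\bDEDICATED\b", text):
        found.add("Dedicated Server")
    return found


sample = """
  ORCL          has 2 service handler(s)
    DEDICATED SERVER established:0 refused:0
      LOCAL SERVER
    DISPATCHER established:0 refused:0 current:0 max:1 state:ready
      D000 <machine: ecdc2, pid: 16011>
"""
print(sorted(detect_configurations(sample)))
```

A database configured for MTS usually reports both models, because the listener can still hand out dedicated connections.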

You can connect to a dedicated server on a database configured for MTS by placing the parameter (SERVER = DEDICATED) in the CONNECT_DATA section of the connect descriptor.
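For illustration, the sketch below builds a connect descriptor with and without (SERVER=DEDICATED). The helper name, host, port, and service name are hypothetical; the placement of SERVER inside CONNECT_DATA follows Net8 connect descriptor syntax, but check Net8 Administrator's Guide for your release.

```python
def connect_descriptor(host: str, port: int, service_name: str,
                       dedicated: bool = False) -> str:
    """Build a Net8 connect descriptor string (hypothetical helper)."""
    connect_data = f"(SERVICE_NAME={service_name})"
    if dedicated:
        # Request a dedicated server even when the database runs MTS.
        connect_data += "(SERVER=DEDICATED)"
    return ("(DESCRIPTION="
            f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
            f"(CONNECT_DATA={connect_data}))")


print(connect_descriptor("ecdc2", 1521, "orcl.example.com", dedicated=True))
```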

Multi-Threaded Server (MTS) Configuration

Registering the Dispatchers

The LSNRCTL utility's services statement lists every dispatcher registered with the listener, including each dispatcher's process ID. You can check the alert log to confirm that the dispatchers have started successfully.

Note:
Remember that PMON may take a minute to register the dispatcher with the listener.

For example:

lsnrctl services: 
LSNRCTL for Solaris: Version 8.1.6.0.0 - Production on 27-MAY-99 13:38:02 
(c) Copyright 1999 Oracle Corporation.  All rights reserved. 
Connecting to (ADDRESS=(PROTOCOL=TCP)(Host=ecdc2)(Port=1521)) 
Services Summary... 
  ORCL          has 2 service handler(s) 
    DEDICATED SERVER established:0 refused:0 
      LOCAL SERVER 
    DISPATCHER established:0 refused:0 current:0 max:1 state:ready 
      D000 <machine: ecdc2, pid: 16011> 
      (ADDRESS=(PROTOCOL=tcp)(DEV=20)(HOST=144.25.216.223)(PORT=55304)) 

The command completed successfully.
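If you check dispatcher registration often, a small script can pull the dispatcher names and process IDs out of the services listing so that you can compare them with ps. This Python sketch assumes the 8.1.6 line format shown above; other listener versions print handlers differently (compare the later services example in this chapter), so the pattern would need adjusting.

```python
import re

# Matches 8.1.6-style handler lines such as "D000 <machine: ecdc2, pid: 16011>".
DISPATCHER_LINE = re.compile(
    r"^\s*(D\d{3})\s+<machine:\s*([^,>]+),\s*pid:\s*(\d+)>")


def registered_dispatchers(services_output: str) -> dict:
    """Map each registered dispatcher name to its operating system PID."""
    dispatchers = {}
    for line in services_output.splitlines():
        match = DISPATCHER_LINE.match(line)
        if match:
            name, _machine, pid = match.groups()
            dispatchers[name] = int(pid)
    return dispatchers


output = """
    DISPATCHER established:0 refused:0 current:0 max:1 state:ready
      D000 <machine: ecdc2, pid: 16011>
"""
print(registered_dispatchers(output))
```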

Configuring the Initialization Parameter File

Make sure that the MTS_DISPATCHERS line is correctly set. For example:
```
MTS_DISPATCHERS =
"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=1492)(queuesize=32)))
          (DISPATCHERS = 1)
          (LISTENER = alias)
          (SERVICE = servicename)
          (SESSIONS = 1000)
          (CONNECTIONS = 1000)
          (MULTIPLEX = ON)
          (POOL = ON)
          (TICK = 5)"
```
Exactly one of the following attributes is required: PROTOCOL, ADDRESS, or DESCRIPTION. ADDRESS and DESCRIPTION let you specify network attributes beyond PROTOCOL. In the example above, the entire DESCRIPTION line could be replaced by (PROTOCOL=TCP). The attributes DISPATCHERS, LISTENER, SERVICE, SESSIONS, CONNECTIONS, MULTIPLEX, POOL, and TICK are all optional.
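The "exactly one of PROTOCOL, ADDRESS, or DESCRIPTION" rule is easy to break when editing long MTS_DISPATCHERS strings. The Python sketch below checks it by counting only outermost attributes, since PROTOCOL and ADDRESS legitimately appear nested inside DESCRIPTION. `check_dispatcher_address` is a hypothetical helper, not an Oracle tool.

```python
import re

ADDRESS_ATTRIBUTES = ("PROTOCOL", "ADDRESS", "DESCRIPTION")


def check_dispatcher_address(value: str) -> str:
    """Return the one address attribute used in an MTS_DISPATCHERS value.

    Raises ValueError unless exactly one of PROTOCOL, ADDRESS, or
    DESCRIPTION appears at the outermost nesting level.
    """
    depth = 0
    top_level = []
    # Each token is either "(NAME=" (opens a level) or ")" (closes one).
    for token in re.finditer(r"\(\s*([A-Za-z_]+)\s*=|\)", value):
        if token.group(0) == ")":
            depth -= 1
        else:
            depth += 1
            if depth == 1:
                top_level.append(token.group(1).upper())
    used = [name for name in top_level if name in ADDRESS_ATTRIBUTES]
    if len(used) != 1:
        raise ValueError(f"need exactly one of {ADDRESS_ATTRIBUTES}, found {used}")
    return used[0]


print(check_dispatcher_address(
    "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=1492)))"
    "(DISPATCHERS=1)(POOL=ON)"))
```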

See Also:
For more information on these parameters, see Oracle8i Reference and Net8 Administrator's Guide.

Make sure that the optional MTS_MAX_DISPATCHERS line is correctly set. For example:
```
MTS_MAX_DISPATCHERS = 4
```
This value should reflect the total number of dispatchers you might want to start.

Make sure that the optional MTS_MAX_SERVERS line is correctly set. For example:
```
MTS_MAX_SERVERS = 5
```
This parameter sets the upper bound on the total number of shared servers PMON can create, based on the peak load of the system. Set it high enough that all requests can be serviced, but not so high that the system swaps if the limit is reached; the purpose of the limit is to prevent the server from swapping. Run the following query to find the highwater mark for the number of shared servers running, and then set MTS_MAX_SERVERS higher than this value.
```
SELECT maximum_connections "MAX CONN", servers_started "STARTED",
       servers_terminated "TERMINATED", servers_highwater "HIGHWATER"
FROM V$MTS;
```
Make sure that the optional MTS_SERVERS line is correctly set. For example:
```
MTS_SERVERS = 5
```
This is the number of shared servers started when the database starts. It is also the minimum number of shared servers PMON tries to keep running. Set it to the number of servers you expect to be in use whenever the database is active; MTS_MAX_SERVERS is intended to handle peak load.


Checking the Connections

Use the LSNRCTL utility's services statement to see whether there are excessive connection refusals, and check the listener's log file to see whether this is a connection problem. For example:

```
LSNRCTL> set displaymode normal
Service display mode is NORMAL
LSNRCTL> services
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=net)(QUEUESIZE=32)))
Services Summary...
Service "net.regress.rdbms.dev.us.oracle.com" has 1 instances.
    Instance "net"
      Status: READY  Total handlers: 3  Relevant handlers: 3
        DEDICATED established:0 refused:0 current:0 max:0 state:ready
            Session: NS
        D001 established:0 refused:0 current:0 max:16383 state:ready
            (ADDRESS=(PROTOCOL=tcp)(HOST=dlsun1013.us.oracle.com)(PORT=52217))
            Session: NS
        D000 established:0 refused:0 current:0 max:16383 state:ready
            (ADDRESS=(PROTOCOL=tcp)(HOST=dlsun1013.us.oracle.com)(PORT=52216))
            Session: NS
```

Under normal conditions, the refused count should be zero. Stop and restart the listener to reset these statistics. If the refused count increases after the restart, then connections are being refused. If the refused count stays at zero while the problem you are troubleshooting still occurs, then the problem is not caused by refused connections.
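Comparing two services snapshots by eye is error-prone when there are many handlers. A minimal Python sketch, assuming the `refused:N` counters shown above appear once per handler (the snapshot text is invented):

```python
import re

REFUSED = re.compile(r"refused:(\d+)")


def total_refused(services_output: str) -> int:
    """Sum the refused:N counters over every handler in the listing."""
    return sum(int(count) for count in REFUSED.findall(services_output))


def refusals_increasing(before: str, after: str) -> bool:
    """True if refusals grew between two snapshots taken after a restart."""
    return total_refused(after) > total_refused(before)


first = "DEDICATED established:4 refused:0\nD000 established:9 refused:0"
later = "DEDICATED established:7 refused:2\nD000 established:15 refused:1"
print(total_refused(later), refusals_increasing(first, later))
```

Take the first snapshot right after restarting the listener and the second while the problem is occurring.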

Checking the Connect/Second Rate

Connection refusals can occur for many reasons. Examine the listener log to find the connect-per-second rate, for example by running the listener log analyzer script.
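The listener log analyzer script itself is not reproduced here. As a stand-in, the Python sketch below counts connect requests per second. It assumes each connect is logged on one line that starts with a `DD-MON-YYYY HH:MM:SS` timestamp and contains `* establish *`, which matches typical Net8 listener logs but should be checked against your own log file; the log lines are invented.

```python
import re
from collections import Counter

# Assumed line shape: "<DD-MON-YYYY HH:MM:SS> * ... * establish * <sid> * <code>"
CONNECT = re.compile(
    r"^(\d{2}-[A-Z]{3}-\d{2,4} \d{2}:\d{2}:\d{2}) \*.*\* establish \*",
    re.IGNORECASE)


def connects_per_second(log_lines):
    """Count connect requests logged in each one-second timestamp."""
    counts = Counter()
    for line in log_lines:
        match = CONNECT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def peak_connect_rate(log_lines) -> int:
    """Highest number of connect requests seen within a single second."""
    return max(connects_per_second(log_lines).values(), default=0)


log = [
    "27-MAY-1999 13:38:02 * (CONNECT_DATA=(SID=ORCL)) * (ADDRESS=(PROTOCOL=tcp)(HOST=h1)) * establish * ORCL * 0",
    "27-MAY-1999 13:38:02 * (CONNECT_DATA=(SID=ORCL)) * (ADDRESS=(PROTOCOL=tcp)(HOST=h2)) * establish * ORCL * 0",
    "27-MAY-1999 13:38:03 * (CONNECT_DATA=(SID=ORCL)) * (ADDRESS=(PROTOCOL=tcp)(HOST=h3)) * establish * ORCL * 0",
]
print(peak_connect_rate(log))
```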

The listener is a queue-based process. It receives connect requests from the lower-level protocol stack and holds them in a queue of limited size (configurable up to the operating system maximum). It can process only one connection at a time, so there is a limit to the number of connections per second it can handle.

If connect requests arrive faster than that limit, then they are queued. The queue is also limited, but you can configure it. If there are more listener processes, then each process receives fewer requests and, therefore, handles them more quickly.
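A back-of-the-envelope model shows why queue size and extra listeners matter. The sketch treats requests as a steady stream (a fluid approximation that ignores burstiness) and uses made-up rates and a hypothetical queue size; it is not a measurement of any real listener.

```python
def backlog_after(arrival_rate: float, service_rate: float,
                  seconds: float, listeners: int = 1) -> float:
    """Requests left waiting per listener after a sustained burst.

    arrival_rate: connect requests per second reaching all listeners.
    service_rate: requests per second one listener can accept.
    Load is assumed to be split evenly across `listeners`.
    """
    per_listener = arrival_rate / listeners
    return max(0.0, (per_listener - service_rate) * seconds)


QUEUESIZE = 20  # hypothetical queue size
one = backlog_after(arrival_rate=60, service_rate=50, seconds=3)
two = backlog_after(arrival_rate=60, service_rate=50, seconds=3, listeners=2)
print(one, one > QUEUESIZE)   # backlog, and whether it exceeds the queue
print(two, two > QUEUESIZE)
```

With one listener the three-second burst leaves more requests waiting than the queue holds; splitting the same load across two listeners keeps each below its service rate.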

You increase the listener queue in the listener.ora file. The listener.ora file can define many listeners, each with a different name; the following assumes that only one of them is having a problem. If more than one is, then apply this method to each of them. To increase the listener queue, add (queuesize = number) to the listener's address in the listener.ora file. For example:

```
listener =
  (address =
    (protocol = tcp)
    (host = sales-pc)
    (port = 1521)
    (queuesize = 20)
  )
```

See Also:
For more information, see Net8 Administrator's Guide.

Stop and restart the listener to initialize this new parameter. If you are not currently running an MTS configuration, then you should consider doing so. It is faster for the listener to handle a client request in an MTS configuration than it is in a dedicated server or a pre-spawned dedicated server configuration.

Note:
MTS dispatchers also receive connect requests and may also benefit from tuning the queuesize.
The maximum queue size is subject to the maximum size possible for a particular operating system.

Pre-Spawned Dedicated Server Configuration

Pre-spawned (pre-started) processes can improve connect time with a dedicated server. This is particularly true of heavily loaded systems not using multi-threaded servers, where connect time is slow. If this is enabled, then the listener can redirect the connection to an existing process with no wait time whenever a connection request arrives. Connection requests do not have to wait for new processes to be started.

Note:
Oracle Corporation recommends that you use a multi-threaded server configuration rather than a pre-spawned dedicated server configuration to solve performance and scalability problems. Use of pre-spawned dedicated servers is recommended only on platforms where MTS is not available.

Checking for the Correct Number of Dedicated Pre-Spawn Servers

Determine if the pre-spawn configuration was properly configured and sized for this system.

Pre-spawning dedicated servers serves two purposes. The first is faster connect times: the server shadow processes are created before the client makes a connect request, and once a client connects, the listener creates the next shadow process for the next connect. The second is controlling access to a resource-starved system: you can cap the number of shadow processes that can be pre-spawned. After this limit is reached, all new connections come in as dedicated.

If there is no activity on the database and there are no users connected, then the number of pre-spawn servers is the number listed in the listener.ora file for POOL_SIZE. Otherwise, depending on the number of connections to the database, they will range from the minimum (POOL_SIZE) to the maximum (PRESPAWN_MAX). For example:

LISTENER = 
    (ADDRESS_LIST =(ADDRESS= (PROTOCOL= TCP)(Host= ecdc2)(Port= 1521))) 
SID_LIST_LISTENER = 
    (SID_LIST = 
        (SID_DESC = 
            (ORACLE_HOME = /u01/oracle/product/oracle/8.1.6) 
            (SID_NAME = ORCL) 
            (PRESPAWN_MAX = 12) 
            (PRESPAWN_LIST = 
                (PRESPAWN_DESC = 
                    (PROTOCOL = TCP) 
                    (POOL_SIZE = 1) 
                    (TIMEOUT = 1)))))

In the above example, with no database activity, there will be one pre-spawn process. During periods of high activity there will be a maximum of 12. After this point, any connect requests that arrive have their shadow processes created in the same manner as a dedicated connection.

To check whether the number of pre-spawned processes is correct, use the LSNRCTL utility's services statement and the operating system command that lists running processes (ps on UNIX). For example:

lsnrctl services: 
LSNRCTL for Solaris: Version 8.1.6.0.0 - Production on 26-MAY-99 18:22:49 
(c) Copyright 1999 Oracle Corporation.  All rights reserved. 
Connecting to (ADDRESS=(PROTOCOL=TCP)(Host=ecdc2)(Port=1521)) 
Services Summary... 
  ORCL          has 2 service handler(s) 
    DEDICATED SERVER established:0 refused:0 
      LOCAL SERVER 
    PRESPAWNED SERVER established:0 refused:0 current:0 max:1 state:ready 
      PID:15587 
      (ADDRESS=(PROTOCOL=tcp)(DEV=8)(HOST=144.25.216.223)(PORT=55221)) 

The command completed successfully 

ps -ef | grep oracle 
oracle 15587     1  0 17:54:21 ?        0:00 oracleORCL / 
        (DESCRIPTION=(COMMAND=prespawn)(PROTOCOL=TCP)(SERVICE_ID=2)(HANDLER_

The LSNRCTL output shows one pre-spawned server process (PID 15587), and the ps command confirms it.

If ps lists more pre-spawned server processes than PRESPAWN_MAX allows, then some of those processes are defunct.

If ps lists other server processes, such as

oracle 15634     1  3 18:31:31 ?        0:01 oracleORCL (LOCAL=NO)

then there may not be enough pre-spawn servers to handle the load for this configuration. Extra processes like this imply that the maximum number of pre-spawn servers needs to be increased.
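The two ps checks above can be combined. The Python sketch below assumes one ps line per process (argument lists not truncated) and the process naming shown in this example: oracle<SID> with (COMMAND=prespawn) or (LOCAL=NO). The function names and the sample lines are hypothetical.

```python
def count_server_processes(ps_lines, sid):
    """Return (pre-spawned, other remote) server process counts."""
    image = f"oracle{sid}"
    prespawned = remote = 0
    for line in ps_lines:
        if image not in line:
            continue
        if "(COMMAND=prespawn)" in line:
            prespawned += 1
        elif "(LOCAL=NO)" in line:
            remote += 1
    return prespawned, remote


def prespawn_findings(prespawned, remote, prespawn_max):
    """Apply the rules of thumb from this section."""
    findings = []
    if prespawned > prespawn_max:
        findings.append("more pre-spawned processes than PRESPAWN_MAX: some may be defunct")
    if remote > 0:
        findings.append("connections bypassed the pool: consider raising PRESPAWN_MAX")
    return findings


ps_output = [
    "oracle 15587 1 0 17:54:21 ? 0:00 oracleORCL (DESCRIPTION=(COMMAND=prespawn)(PROTOCOL=TCP))",
    "oracle 15634 1 3 18:31:31 ? 0:01 oracleORCL (LOCAL=NO)",
]
counts = count_server_processes(ps_output, "ORCL")
print(counts, prespawn_findings(*counts, prespawn_max=12))
```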

There should be at least as many idle pre-spawned servers as the POOL_SIZE parameter specifies. You can check this by using LSNRCTL services to see how many pre-spawned servers have no current connections. For example:

lsnrctl services: 
LSNRCTL for Solaris: Version 8.1.6.0.0 - Production on 26-MAY-99 18:22:49 
(c) Copyright 1999 Oracle Corporation.  All rights reserved. 
Connecting to (ADDRESS=(PROTOCOL=TCP)(Host=ecdc2)(Port=1521)) 
Services Summary... 
  ORCL          has 2 service handler(s) 
    DEDICATED SERVER established:0 refused:0 
      LOCAL SERVER 
    PRESPAWNED SERVER established:0 refused:0 current:0 max:1 state:ready 
      PID:15587 
      (ADDRESS=(PROTOCOL=tcp)(DEV=8)(HOST=144.25.216.223)(PORT=55221)) 

The command completed successfully

Note:
The number of current connections to the pre-spawned server is 0 (current:0), which means that it is part of the free pool of pre-spawned servers. If there are fewer idle pre-spawned servers than POOL_SIZE specifies, then you must be reaching the maximum number of pre-spawned servers (PRESPAWN_MAX).
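The idle-pool check can be scripted the same way. This Python sketch assumes, as in the example above, that each pre-spawned process appears as its own PRESPAWNED SERVER line with a current:N counter; POOL_SIZE is passed in by hand.

```python
import re

PRESPAWN_HANDLER = re.compile(r"PRESPAWNED SERVER\b.*?\bcurrent:(\d+)")


def idle_prespawned(services_output: str) -> int:
    """Count pre-spawned server handlers with no current connections."""
    return sum(1 for current in PRESPAWN_HANDLER.findall(services_output)
               if int(current) == 0)


def below_pool_size(services_output: str, pool_size: int) -> bool:
    """True if fewer idle pre-spawned servers exist than POOL_SIZE asks for,
    which suggests PRESPAWN_MAX is being reached."""
    return idle_prespawned(services_output) < pool_size


listing = ("PRESPAWNED SERVER established:0 refused:0 current:0 max:1 state:ready\n"
           "  PID:15587\n")
print(idle_prespawned(listing), below_pool_size(listing, pool_size=1))
```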

Determining the Problem

To determine if there is a problem with the listener or pre-spawn servers, test to see if the behavior is due to pre-spawn or something else.

Create a listener that is not configured for pre-spawning. Placing (SERVER=DEDICATED) in the connect descriptor is not enough, because the connection still goes to a pre-spawned server process; you need a listener that does not pre-spawn for this test.

See Also:
For more information on setting up tnsnames.ora and listener.ora, see Net8 Administrator's Guide.

Determining if There Is Enough Physical RAM

Determine how much RAM is present so that you do not cause the server to swap when the number of pre-spawn servers is increased.

Find out the physical size of a pre-spawned server process. On some systems, ps reports a misleading size for a process; it may include shared memory, such as the SGA, in the total. Ask the system administrator which utility gives the correct size.

Find out the amount of free RAM in the system. Do not include swap space in this amount.

Divide the amount of free RAM by the size of a pre-spawned server process. The result is an approximate upper limit on the number of pre-spawned servers you can add without the system beginning to swap. The actual number of servers to pre-spawn depends on the expected number of simultaneous connections and the expected connect rate: the first determines the setting for PRESPAWN_MAX, and the second determines the setting for POOL_SIZE.
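The arithmetic is simple, but a sketch makes the units explicit. The figures below are invented; measure free RAM and the real per-process size as described above. Treating the RAM limit as a cap on PRESPAWN_MAX is a simplifying assumption that holds only if free RAM was measured before any pre-spawned servers started.

```python
def prespawn_ram_limit(free_ram_mb: int, process_size_mb: int) -> int:
    """Approximate number of pre-spawned servers that fit in free RAM."""
    if process_size_mb <= 0:
        raise ValueError("process size must be positive")
    return free_ram_mb // process_size_mb


def check_prespawn_plan(pool_size: int, prespawn_max: int, ram_limit: int) -> list:
    """Flag POOL_SIZE/PRESPAWN_MAX choices that conflict with each other or with RAM."""
    problems = []
    if pool_size > prespawn_max:
        problems.append("POOL_SIZE exceeds PRESPAWN_MAX")
    if prespawn_max > ram_limit:
        problems.append("PRESPAWN_MAX could push the system into swapping")
    return problems


limit = prespawn_ram_limit(free_ram_mb=512, process_size_mb=6)
print(limit, check_prespawn_plan(pool_size=4, prespawn_max=100, ram_limit=limit))
```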

Detecting Network Problems

This section encompasses Local Area Network (LAN) and Wide Area Network (WAN) troubleshooting methods.

Using Dynamic Performance Views

Networks entail overhead that adds a certain amount of delay to processing. To optimize performance, you must ensure that your network throughput is fast, and you should try to reduce the number of messages that must be sent over the network. It can be difficult to measure the delay the network adds.

Three dynamic performance views are useful for measuring the network delay: V$SESSION_EVENT, V$SESSION_WAIT, and V$SESSTAT.

In V$SESSION_EVENT, the AVERAGE_WAIT column indicates the amount of time that Oracle waits between messages. You can use this statistic as a yardstick to evaluate the effectiveness of the network.
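AVERAGE_WAIT is TIME_WAITED divided by TOTAL_WAITS; in Oracle8i the time columns are in hundredths of a second and are populated only when TIMED_STATISTICS is TRUE. The Python sketch below recomputes the average in milliseconds from rows you have already fetched, for example with SELECT event, total_waits, time_waited FROM V$SESSION_EVENT WHERE sid = :sid. The row values are invented.

```python
def average_wait_ms(total_waits: int, time_waited_cs: int) -> float:
    """Average wait per event in milliseconds (TIME_WAITED is in 1/100 s)."""
    if total_waits == 0:
        return 0.0
    return time_waited_cs * 10.0 / total_waits


# (event, total_waits, time_waited) rows as fetched from V$SESSION_EVENT
rows = [
    ("SQL*Net message to client", 300, 3),
    ("SQL*Net message from client", 300, 1500),
]
for event, waits, waited in rows:
    print(f"{event}: {average_wait_ms(waits, waited):.1f} ms")
```

Keep in mind that time spent waiting for the client also includes time the user or application spends between calls, so a high value is not by itself proof of a slow network.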

In V$SESSION_WAIT, the EVENT column lists the events for which active sessions are waiting. The "SQL*Net message from client" wait event indicates that the shared or foreground process is waiting for a message from a client. If this wait event has occurred, then you can check whether the message has been sent by the user or received by Oracle.

You can investigate hang-ups by looking at V$SESSION_WAIT to see what the sessions are waiting for. If a client has sent a message, then you can determine whether Oracle is responding to it or is still waiting for it.

In V$SESSTAT you can see the number of bytes that have been received from the client, the number of bytes sent to the client, and the number of calls the client has made.
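V$SESSTAT reports values by statistic number, so the usual approach is to join it with V$STATNAME to get names such as "bytes sent via SQL*Net to client", "bytes received via SQL*Net from client", and "SQL*Net roundtrips to/from client". Given those values, the Python sketch below computes the average bytes per round trip; a small number suggests a chatty application that may benefit from array interfaces. The numbers are invented.

```python
def bytes_per_round_trip(stats: dict) -> float:
    """Average bytes moved per client round trip for one session."""
    sent = stats.get("bytes sent via SQL*Net to client", 0)
    received = stats.get("bytes received via SQL*Net from client", 0)
    trips = stats.get("SQL*Net roundtrips to/from client", 0)
    return (sent + received) / trips if trips else 0.0


session_stats = {
    "bytes sent via SQL*Net to client": 120_000,
    "bytes received via SQL*Net from client": 30_000,
    "SQL*Net roundtrips to/from client": 300,
}
print(bytes_per_round_trip(session_stats))
```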

Understanding Latency and Bandwidth

The most critical aspects of a network that contribute to performance are latency and bandwidth.

The term latency refers to a time delay; for example, the gap between the time a device requests access to a network and the time it receives permission to transmit.

Bandwidth is the throughput capacity of a network medium or protocol. Variations in the network signals can degrade the network. Sources of degradation include cables that are too long or of the wrong type. External noise sources, such as elevators, air handlers, or fluorescent lights, can also cause problems.

Common Network Topologies

Local Area Network Topologies:

Ethernet
Fast Ethernet
1 Gigabit Ethernet
Token Ring
FDDI
ATM

Wide Area Network Topologies:

DSL
ISDN
Frame Relay
T-1, T-3, E-1, E-3
ATM
SONET

Table 22-2 lists the most common ratings for various topologies.

Table 22-2 Bandwidth Ratings

Topology or Carrier	Bandwidth
Ethernet	10 Megabits/second
Fast Ethernet	100 Megabits/second
1 Gigabit Ethernet	1 Gigabit/second
Token Ring	16 Megabits/second
FDDI	100 Megabits/second
ATM	155 Megabits/second (OC3), 622 Megabits/second (OC12)
T-1 (US only)	1.544 Megabits/second
T-3 (US only)	44.736 Megabits/second
E-1 (non-US)	2.048 Megabits/second
E-3 (non-US)	34.368 Megabits/second
Frame Relay	Committed Information Rate, which can be up to the carrier speed, but usually is not.
DSL	This can be up to the carrier speed.
ISDN	This can be up to the carrier speed. It is usually used with slower modems.
Dial Up Modems	56 Kilobits/second. It is usually accompanied with data compression for faster throughput.

Solving Network Problems

This section describes several techniques for enhancing performance and solving network problems.

Finding Bottlenecks

The first step in solving network problems is to understand the overall topology. Gather as much information about the network as you can; this information usually takes the form of a network diagram. Your diagram should show the types of network technology used in the Local Area Network and the Wide Area Network, as well as the addresses of the various network segments.

Examine this information. Obvious bottlenecks include:

Using a dial-up modem (normal modem or ISDN) to access time critical data.
A frame relay link runs on a T-1 but has a 9.6 Kilobits/second CIR, so it reliably transmits only up to 9.6 Kilobits/second; if traffic uses the rest of the bandwidth, then some data may be lost.
Data from high-speed networks is channeled through low-speed networks.
There are too many network hops (a router constitutes one hop).
A 10 Megabit network for a Web site.

There are many problems that can cause a performance breakdown. Follow this checklist:

Get a Network Sniffer trace.
Check the following:
- Is the bandwidth being exceeded on the network, the client, and/or the server?
- Ethernet collisions.
- Token ring or FDDI ring beacons.
- Are there many runt frames?
- The stability of the WAN links.
Get a bandwidth utilization chart for frame relay, and see if CIR is being exceeded.
Is any Quality of Service or packet prioritizing going on?
Is a firewall in the way somewhere?

If nothing is revealed, then find the network route from the client to the data server. Understanding the travel times on a network gives you an idea as to the time a transaction will take. Client-server communication requires many small packets. High latency on a network slows the transaction down due to the time interval between sending a request and getting the response.
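A rough model makes the point concrete: total time is approximately the number of round trips times the round-trip latency, plus the data volume divided by the bandwidth. The Python sketch below ignores protocol overhead and congestion; the round-trip count and payload are invented, and the latencies resemble those in the trace route example in this section.

```python
def transaction_seconds(round_trips: int, rtt_seconds: float,
                        payload_bytes: int, bandwidth_bps: float) -> float:
    """Latency cost of the round trips plus serialization time of the data."""
    return round_trips * rtt_seconds + payload_bytes * 8 / bandwidth_bps


T1_BPS = 1.544e6  # from Table 22-2

# 200 small request/response exchanges moving 100 KB in total.
lan = transaction_seconds(200, 0.010, 100_000, 10e6)     # 10 ms round trips, Ethernet
wan = transaction_seconds(200, 0.220, 100_000, T1_BPS)   # 220 ms round trips, T-1
print(f"LAN {lan:.2f} s, WAN {wan:.2f} s")
```

On the WAN almost all of the time comes from round trips, so halving the number of round trips saves far more than doubling the bandwidth.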

Use trace route (trcroute or equivalent) from the client to the server to get address information for each device in the path. For example:

```
tracert usmail05
Tracing route to usmail05.us.oracle.com [144.25.88.200] over a maximum of 30 hops:
  1   <10 ms   <10 ms    10 ms  whq1davis-rtr-749-f1-0-a.us.oracle.com [144.25.216.1]
  2   <10 ms   <10 ms   <10 ms  whq4op3-rtr-723-f0-0.us.oracle.com [144.25.252.23]
  3   220 ms   210 ms   231 ms  usmail05.us.oracle.com [144.25.88.200]

Trace complete.
```
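When a route has many hops, it helps to compute where the latency jumps. This Python sketch parses Windows tracert output in the format shown above, treats `<10 ms` as 10 ms (an upper bound), uses the worst of the three probes per hop, and skips hops that timed out (`*`).

```python
import re

HOP = re.compile(r"^\s*(\d+)((?:\s+<?\d+ ms){3})\s+(.+?)\s*$")


def parse_tracert(output: str):
    """Return (hop number, worst probe in ms, host) for each complete hop."""
    hops = []
    for line in output.splitlines():
        match = HOP.match(line)
        if match:
            probes = [int(ms) for ms in re.findall(r"(\d+) ms", match.group(2))]
            hops.append((int(match.group(1)), max(probes), match.group(3)))
    return hops


def largest_jump(hops):
    """Return (hop, added ms, host) for the hop that adds the most latency."""
    best, previous = None, 0
    for number, worst_ms, host in hops:
        added = worst_ms - previous
        if best is None or added > best[1]:
            best = (number, added, host)
        previous = worst_ms
    return best


trace = """
  1   <10 ms   <10 ms    10 ms  whq1davis-rtr-749-f1-0-a.us.oracle.com [144.25.216.1]
  2   <10 ms   <10 ms   <10 ms  whq4op3-rtr-723-f0-0.us.oracle.com [144.25.252.23]
  3   220 ms   210 ms   231 ms  usmail05.us.oracle.com [144.25.88.200]
"""
print(largest_jump(parse_tracert(trace)))
```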

Ping each device in turn to get the timings. Use large packets to get the slowest times, and set the "don't fragment" bit so that routers do not spend time fragmenting and reassembling the packets. The examples below use the Windows ping options -l (data size), -n (number of echo requests), and -f (don't fragment). The data size of 1472 bytes is for Ethernet: an Ethernet frame carries at most 1500 octets (8-bit bytes) of IP data, and the IP and ICMP headers (ICMP is the protocol ping uses) take 28 of those octets. Evaluate the area where the slowness seems to occur. For example:

ping -l 1472 -n 1 -f 144.25.216.1 
Pinging 144.25.216.1 with 1472 bytes of data: 
Reply from 144.25.216.1: bytes=1472 time<10ms TTL=255 

ping -l 1472 -n 1 -f 144.25.252.23 
Pinging 144.25.252.23 with 1472 bytes of data: 
Reply from 144.25.252.23: bytes=1472 time=10ms TTL=254 

ping -l 1472 -n 1 -f 144.25.88.200 
Pinging 144.25.88.200 with 1472 bytes of data: 
Reply from 144.25.88.200: bytes=1472 time=271ms TTL=253

These results confirm the trace route output. Ideally, you would ping from the workstation to 144.25.216.1, then from 144.25.216.1 to 144.25.252.23, and then from 144.25.252.23 to 144.25.88.200; this would show the exact latency of each segment.
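Two pieces of arithmetic from this section can be sketched together. The largest ping payload that fits in one Ethernet frame is the 1500-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header, which gives 1472 bytes. And, short of pinging hop to hop, subtracting successive cumulative ping times from the workstation gives a rough per-segment estimate. The sketch treats `time<10ms` as 0 ms, which is an assumption, and the result is approximate because each ping is a separate measurement.

```python
IP_HEADER = 20     # bytes, IPv4 header without options
ICMP_HEADER = 8    # bytes, ICMP echo header


def max_ping_payload(mtu: int = 1500) -> int:
    """Largest ping data size that fits in one frame with don't-fragment set."""
    return mtu - IP_HEADER - ICMP_HEADER


def segment_latencies(cumulative_ms):
    """Estimate each segment's round-trip cost from cumulative ping times."""
    segments, previous = [], 0.0
    for rtt in cumulative_ms:
        segments.append(max(0.0, float(rtt) - previous))
        previous = float(rtt)
    return segments


print(max_ping_payload())
print(segment_latencies([0, 10, 271]))   # times from the three pings above
```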

Dissecting Bottlenecks

This section helps you determine the problem with your bottleneck.

Determine if the Problem is with Net8 or the Network

Net8 tracing reveals whether an error is Oracle-specific or due to conditions that the operating system is passing to the Transparent Network Substrate (Oracle TNS layer).

Enable Net8 tracing at the Oracle server, the listener, and at a client suspected of having the problem you are trying to resolve.

To enable tracing at the server, find the sqlnet.ora file for the server and create the following lines in it:

TRACE_LEVEL_SERVER = 16 
TRACE_UNIQUE_SERVER = ON

To enable tracing at the client, find the sqlnet.ora file for the client and create the following lines in it:

TRACE_LEVEL_CLIENT = 16 
TRACE_UNIQUE_CLIENT = ON

To enable tracing at the listener, find the listener.ora file and create the following line in it:

TRACE_LEVEL_listener_name = 16

Reproduce the problem to generate traces on the client and server, and then analyze the traces.

See Also:
For detailed directions on enabling Net8 tracing, see Net8 Administrator's Guide. For definitions of Net8 errors noted in the trace file, see Oracle8i Error Messages.

If the problem is with the network and not Net8, then you must determine the following:

Does the problem only occur in one location on the local network?
Does the problem only occur in one area on the WAN?

For example, perhaps the system is fine in the building where the Data Center is, but it is slow in other buildings that are several miles away.

Not all Oracle error codes represent pure Oracle troubles. ORA-3113 is the most common error which points to an underlying network problem.

Note:
Enabling tracing on the server can generate a large volume of trace data or a large number of trace files. To prevent this, you can set up a separate, traced environment. This configuration works for dedicated and pre-spawned connections. First, log in to the server's operating system as the Oracle software owner. Create a temporary directory to hold the configuration files and the trace files that will be created. Copy the sqlnet.ora, listener.ora, and tnsnames.ora files to that directory. Edit the copied sqlnet.ora file to enable tracing as described above, and add the following line to it:
TRACE_DIRECTORY_SERVER = path_to_the_temporary_directory

Next, modify the copied listener.ora file to change the listening port to an unused port (this applies to TCP; for other protocols, use a similar technique). Make a similar change to the client's tnsnames.ora file for the connect string you will use for this test.
Set the TNS_ADMIN environment variable to point to the temporary directory, and start the listener. All new connections to this listener now write server traces to this directory. Reproduce the problem.
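The copying and editing steps in this note can be scripted. The Python sketch below is a minimal version under these assumptions: the three files exist in `source_dir`, the trace parameters are simply appended to the copied sqlnet.ora (any trace settings already in it are not removed), and you still change the listener port in listener.ora and tnsnames.ora by hand. The function names and directory prefix are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path

TRACE_SETTINGS = (
    "\nTRACE_LEVEL_SERVER = 16\n"
    "TRACE_UNIQUE_SERVER = ON\n"
    "TRACE_DIRECTORY_SERVER = {trace_dir}\n"
)


def make_trace_tns_admin(source_dir: str) -> Path:
    """Copy Net8 config files into a new directory and enable server tracing."""
    work = Path(tempfile.mkdtemp(prefix="net8trace_"))
    for name in ("sqlnet.ora", "listener.ora", "tnsnames.ora"):
        shutil.copy(Path(source_dir) / name, work / name)
    # Traces go to the same temporary directory as the copied files.
    with open(work / "sqlnet.ora", "a") as sqlnet:
        sqlnet.write(TRACE_SETTINGS.format(trace_dir=work))
    return work


def tns_admin_environment(work: Path) -> dict:
    """Environment in which Net8 tools read configuration from `work`."""
    return dict(os.environ, TNS_ADMIN=str(work))
```

After changing the port, you could start the listener in that environment, for example with `subprocess.run(["lsnrctl", "start"], env=tns_admin_environment(work))`.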

If you are getting an Oracle error message, then look in the trace file to find the error. Fully tracking down a bug through Net8 trace analysis can take some time; however, high-level trace analysis is simple.

On Net8, Determine if the Problem is on the Client or the Server

If the problem is with Net8, then use Net8 tracing to show you where the problem lies. If there are errors in the trace files, then do they appear in only the client traces, only in the server traces, or in both?

Errors Only in the Client Trace

The problem is on the client. However, if you are getting ORA-3113 or ORA-3114 errors, then the problem is on the server.

Errors Only in the Server Trace or Listener Trace

The problem is on the server. However, if you are getting ORA-3113 or ORA-3114 errors, then the problem is on the client.

Errors in All: Client, Server, and Listener Trace

If you are getting ORA-3113 or ORA-3114 errors, then the problem is on the Network. Troubleshoot the server first. If it is fine, then the client is at fault.

Check if the Server is Configured for MTS

The multi-threaded server (MTS) is an advanced solution for many customers, and it can be more complex to troubleshoot. Check the initialization parameter file for any MTS parameters. Look at the operating system to see if any of the MTS processes are present.

Check for dispatchers by looking for names like ora_d000, ora_d001, etc. For example:

ps -ef | grep ora_d

Check for shared servers by looking for names like ora_s000, ora_s001, etc. For example:

ps -ef | grep ora_s
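A plain `grep ora_d` or `grep ora_s` also matches other background processes, such as the database writer (ora_dbw0_SID) or SMON (ora_smon_SID). A stricter pattern helps; the Python sketch below assumes the usual UNIX naming ora_<name>_<SID> with three-digit dispatcher and shared server numbers, and the ps lines are invented.

```python
import re


def mts_processes(ps_lines, sid):
    """Return (dispatchers, shared servers) found in ps output for one SID."""
    suffix = re.escape(sid)
    dispatcher = re.compile(rf"\bora_d\d{{3}}_{suffix}\b")
    shared = re.compile(rf"\bora_s\d{{3}}_{suffix}\b")
    dispatchers, servers = [], []
    for line in ps_lines:
        d_match = dispatcher.search(line)
        s_match = shared.search(line)
        if d_match:
            dispatchers.append(d_match.group(0))
        elif s_match:
            servers.append(s_match.group(0))
    return dispatchers, servers


ps_output = [
    "oracle 16001 1 0 13:37:00 ? 0:00 ora_dbw0_ORCL",
    "oracle 16005 1 0 13:37:00 ? 0:00 ora_smon_ORCL",
    "oracle 16009 1 0 13:37:01 ? 0:00 ora_s000_ORCL",
    "oracle 16011 1 0 13:37:01 ? 0:00 ora_d000_ORCL",
]
print(mts_processes(ps_output, "ORCL"))
```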

See Also:
For more information on tuning the multi-threaded server, see "Multi-Threaded Server (MTS) Configuration". For more information on MTS concepts and parameters, see Oracle8i Concepts and Net8 Administrator's Guide.

Using Array Interfaces

Reduce network calls by using array interfaces. Instead of fetching one row at a time, it is more efficient to fetch ten rows with a single network round trip.
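The saving is easy to estimate: with N rows and an array size of k, a fetch loop needs about N/k round trips (rounded up) instead of N. The sketch ignores the extra fetch some drivers make to detect the end of the data. In Python's DB-API, the corresponding setting is `cursor.arraysize`, which `fetchmany()` uses by default.

```python
def fetch_round_trips(rows: int, array_size: int) -> int:
    """Round trips needed to fetch `rows` rows, `array_size` rows per fetch."""
    if array_size < 1:
        raise ValueError("array_size must be at least 1")
    return -(-rows // array_size)  # ceiling division without floats


for size in (1, 10, 100):
    print(size, fetch_round_trips(1000, size))
```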

Adjusting Session Data Unit Buffer Size

Before sending data across the network, Net8 buffers data into the Session Data Unit (SDU). It sends the data stored in this buffer when the buffer is full or when an application tries to read the data. When large amounts of data are being retrieved and when packet size is consistently the same, it may speed retrieval to adjust the default SDU size.

Optimal SDU size depends on the normal transport size. Use a sniffer to find out the frame size, or set tracing on to its highest level to check the number of packets sent and received and to determine whether they are fragmented. Tune your system to limit the amount of fragmentation.

Use Net8 Assistant to configure a change to the default SDU size on both the client and the server; SDU size should generally be the same on both.
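A rough count shows why SDU size matters for large retrievals. The sketch assumes a default SDU of 2048 bytes and that the client and server settle on the smaller of their two SDU values, which is why the sizes should match; check Net8 Administrator's Guide for the defaults and limits of your release. It also ignores per-packet header overhead.

```python
DEFAULT_SDU = 2048  # bytes; assumed Net8 default


def effective_sdu(client_sdu: int, server_sdu: int) -> int:
    """Assume the session uses the smaller of the two configured sizes."""
    return min(client_sdu, server_sdu)


def sdu_sends(payload_bytes: int, sdu: int = DEFAULT_SDU) -> int:
    """Number of full or partial SDU buffers needed for `payload_bytes`."""
    return -(-payload_bytes // sdu)


print(sdu_sends(100_000))                               # default on both sides
print(sdu_sends(100_000, effective_sdu(8192, 2048)))    # raised on one side only
print(sdu_sends(100_000, effective_sdu(8192, 8192)))    # raised on both sides
```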

Using TCP.NODELAY

When a session is established, Net8 packages data into packets to send between server and client. By default, TCP may briefly hold back small packets so that it can combine them. Setting TCP.NODELAY = YES in the protocol.ora file turns off this buffering, so packets are flushed onto the network more frequently and there is no buffering delay.
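Net8's TCP.NODELAY parameter is generally understood to set the standard TCP_NODELAY socket option, which disables the Nagle algorithm's coalescing of small writes. The Python sketch below only demonstrates that socket option at the operating system level; it does not configure Net8, which reads TCP.NODELAY from protocol.ora.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create a TCP socket that sends small writes immediately (no Nagle delay)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = open_nodelay_socket()
print("TCP_NODELAY enabled:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```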

Although Net8 supports many networking protocols, TCP tends to have the best scalability.

Using Connection Manager

In Net8, you can use the Connection Manager to conserve system resources by multiplexing. Multiplexing means funneling many client sessions through a single transport connection to a server destination. In this way, you can increase the number of sessions that a process can handle. This applies only to MTS configurations.

Alternatively, you can use Connection Manager to control client access to dedicated servers. In addition, Connection Manager provides multiple-protocol support, allowing a client and a server that use different networking protocols to communicate.
