Performance Event
        Monitoring
           
         
       
      
       
        The Performance Event Monitoring feature uses Solaris
        DTrace or Linux SystemTap to "publish" interesting events as
        they occur inside of Berkeley DB. The operating system
        utilities dtrace or stap
        run scripts which select, analyze, and display events. There
        is no need to modify the application. Any application which
        uses that Berkeley DB library can be monitored. For more
        information about these instrumentation tools refer to the
        following pages: 
        
      
      
    
       
        Performance Event Monitoring is available for operating
        systems where DTrace or SystemTap supports static probe points
        in user applications, such as Solaris 10 and OpenSolaris, some
        versions of Linux, and Mac OS X 10.6 and later. By including
        --enable-dtrace to the configuration options, the resulting
        libraries will include probe points for these event
        categories: 
        
      
        
          -  
                database operations: opening a database or
                cursor, get, put, delete.
            
- 
                internal operations regarding disk allocation,
                transactions, concurrency control, and caching. 
            
- 
                the beginning and ending of waiting periods due
                to conflicting transactional consistency locks (for
                example, page locks), mutexes, or shared latches.
            
-  
                timing dependent code paths which are expected
                to be infrequent. These could raise concerns if they
                were to happen too often. 
            
 
       
        These probe points are implemented as
        user-level statically defined traces (USDT's) for DTrace, and
        static userspace markers for SystemTap. 
    
      
        To monitor the statistics values as they are updated
        include the --enable-perfmon-statistics configuration option.
        This option generates probe points for updates to many of the
        counters whose values are displayed by the db_stat utility or
        returned by the various statistics functions. The "cache"
        example script uses a few of these probe points. 
    
      
        Performance Event Monitoring is intended to be suitable for
        production applications. Running Berkeley DB with DTrace or
        SystemTap support built in has little effect on execution
        speed until probes are enabled at runtime by the
        dtrace or stap
        programs.
    
       
        The list of available events may be displayed by running
        'make listprobes' after building the libdb-6.1
        shared library.
    
      
        
          
            
              Using the DTrace Provider
             
           
         
         
            The DTrace probe provider for Berkeley DB is named
            'bdb'. A simple dtrace command to monitor all
            dtrace-enabled Berkeley DB activity in the system is: 
        
        
             dtrace -Zn 'bdb*::: { printf("%s", probename); }'
            
        
        
            DTrace requires elevated privileges in order to run. On
            Solaris you can avoid running as root by giving any users
            who need to run dtrace the dtrace_proc
            or dtrace_user privilege in
            /etc/user_attr. 
        
        
            DTrace works on both 32 and 64 bit applications.
            However, when tracing a 32-bit application on a 64-bit
            processor it might be necessary to pass a "32 bit" option
            to dtrace. Without this option the D
            language might use a pointer size of 8 bytes, which could
            cause pointer values (and structures containing them) to
            be processed incorrectly. Use the -32 option
            on Solaris; on Mac OS X use -arch i386.
        
       
      
        
         
            SystemTap looks up its static userspace markers in the
            library name specified in the stap
            script. A simple stap command to list
            the probes of the default Berkeley DB installation is:
        
        
             stap -l 'process("/usr/local/BerkeleyDB.6.1/lib/libdb-6.1.so").mark("*")'
            
        
        
            Berkeley DB supports SystemTap version 1.1 or later.
            Building with userspace marker support requires
            sys/sdt.h, which is often
            available in the package
            systemtap-sdt-devel. Running
            stap with userspace markers
            requires that the kernel have "utrace" support; see http://sourceware.org/systemtap/wiki/utrace
            for more information. 
        
         
            SystemTap needs elevated privileges in order to run.
            You can avoid running as root by adding the users who need
            to run stap to the group stapdev.
        
       
      
        
         
            Berkeley DB includes several example scripts, in both
            DTrace and SystemTap versions. The DTrace examples end
            with a .d suffix and are located in
            util/dtrace. The SystemTap
            examples have a .stp suffix and can
            found be in util/systemtap. The
            Berkeley DB shared library name, including any necessary
            path, is expected as the first parameter to the SystemTap
            examples. 
            
        
          
            - 
              apicalls
            
- 
                        This script graphs the count of the
                        main API calls. The result is grouped by
                        thread of the target process.
                    
- 
              apitimes
            
- 
                        This script graphs the time spent in
                        the main API calls, grouped by thread.
                    
- 
               apitrace 
            
-  
                        This script displays the entry to and
                        return from each of the main API calls.
                    
- 
               cache 
            
- 
                        This script displays overall and
                        per-file buffer cache statistics every
                        N (default: 1)
                        seconds for M (default:
                        60) intervals. It prints the number of cache
                        hits, misses, and evictions for each file with
                        any activity during the interval. 
                    
- 
               dbdefs
            
-  
                        This contains DTrace-compatible
                        declarations of Berkeley DB data structures
                        returned from probe points. There is no Linux
                        equivalent; SystemTap can obtain type
                        information directly from the debugging
                        symbols compiled into the
                        libdb*-6.1.so shared library.
                    
- 
               locktimes 
            
-  
                        This script graphs the time spent
                        waiting for DB page locks. The result times in
                        nanoseconds are grouped by filename, pgno, and
                        lock_mode. The optional integer maxcount
                        parameter directs the script to exit once that
                        many page lock waits have been measured.
                    
- 
               locktimesid 
            
- 
                        This is similar to the locktimes script
                        above, except that it displays the 20 byte
                        file identifier rather than the file name.
                        This can be useful when there are several
                        environments involved, or when database files
                        are recreated during the monitoring period.
                    
- 
               mutex 
            
-  
                        This script measures mutex wait
                        periods, summarizing the results two ways.
                        -  
                                The first grouping is by mutex,
                                mode (exclusive or shared), and thread
                                id. 
                            
-  
                                The second grouping is by the
                                mutex category and mode (exclusive or
                                shared). The mutex categories are the
                                MTX_XXX definitions in
                                dbinc/mutex.h.
 
- 
               showerror 
            
- 
                        This script displays the application
                        stack when the basic error routines are
                        called. It provides additional information
                        about an error, beyond the string sent to the
                        diagnostic output.
                    
 
        
        
         
            These examples are designed to trace a single process.
            They run until interrupted, or the monitored process
            exits, or the limit as given in an optional 'maximum
            count' argument has been reached. 
        
       
      
        
          
            
              Performance Events Reference
             
           
         
        
            The events are described below as if they were
            functions with ANSI C-style signatures. The values each
            event provides to DTrace or SystemTap are the arguments to
            the functions. 
        
         
        
          
            - 
               alloc 
            
- 
               
                The alloc class covers the allocation of "on disk"
                database pages. 
             
                
                  - 
                    alloc-new (char *file, char *db, unsigned
                        pgno, unsigned type, struct _db_page *pg, int
                        ret); 
                  
- 
                        An attempt to allocate a database page
                        of type 'type' for database 'db' returned
                        'ret'. If the allocation succeeded then ret is
                        0, pgno is the location of the new page, and
                        pg is the address of the new page. Details of
                        the page can be extracted from the pg pointer.
                    
- 
                     alloc-free (char *file, char *db, unsigned
                        pgno, unsigned ret); 
                  
-  
                        An attempt to free the page 'pgno' of
                        'db' returned 'ret'. When successful the page
                        is returned to the free list or the file is
                        truncated.
                    
- 
                    alloc-btree_split (char *file, char *db,
                        unsigned pgno, unsigned parent, unsigned
                        level); 
                  
- 
                        A btree split of pgno in db is being
                        attempted. The parent page number and the
                        level in the btree are also provided.
                    
 
 
- 
               db 
            
- 
               
                These DB API calls provide the name of the file and
                database being accessed. In-memory databases will have
                a NULL (0) file name address. The db name will be null
                unless subdatabases are in use.
             
                
                  - 
                     db-open (char *file, char *db, unsigned
                        flags, uint8_t *fileid); 
                  
- 
                        The database or file name was opened.
                        The 20 byte unique fileid can be used to keep
                        track of databases as they are created and
                        destroyed.                   
                    
- 
                     db-close (char *file, char *db, unsigned
                        flags, uint8_t *fileid); 
                  
-  
                        The database or file name was closed.
                    
- 
                    db-cursor (char *file, char *db, unsigned
                        txnid, unsigned flags, uint8_t *fileid); 
                  
-  
                        An attempt is being made to open a
                        cursor on the database or file.
                    
- 
                    db-get (char *file, char *db, unsigned
                        txnid, DBT *key, DBT *data, unsigned flags); 
                  
-  
                        An attempt is being made to get data
                        from a db. 
                    
- 
                    db-put (char *file, char *db, unsigned
                        txnid, DBT *key, DBT *data, unsigned flags); 
                  
- 
                        An attempt is being made to put data to
                        a db. 
                    
- 
                     db-del (char *file, char *db, unsigned
                        txnid, DBT *key, unsigned flags); 
                  
- 
                        An attempt is being made to delete data
                        from a db. 
                    
 
 
- 
               lock 
            
- 
               
                The lock class monitors the transactional
                consistency locks: page, record, and database. It also
                monitors the non-transactional file handle locks.
             
                
                  - 
                     lock-suspend (DBT *lock, db_lockmode_t
                        lock_mode); 
                  
- 
                        The thread is about to suspend itself
                        because another locker already has a
                        conflicting lock on object 'lock'. The lock
                        DBT's data points to a __db_ilock structure,
                        except for the atypical program which uses
                        application specific locking. 
                    
- 
                     lock-resume (DBT *lock, db_lockmode_t
                        lock_mode); 
                  
-  
                        The thread is awakening from a suspend.
                    
- 
                     lock-put (struct __sh_dbt *lock, unsigned
                        flags); 
                  
-  
                        The lock is being freed.
                    
- 
                     lock-put_reduce_count (struct __sh_dbt
                        *lock, unsigned flags); 
                  
-  
                        The lock would have been freed except
                        that its refcount was greater than 1.
                    
 
  
                These lock counters are included by
                --enable-perfmon-statistics. 
             
                
                  - 
                    lock-deadlock (unsigned st_ndeadlocks,
                        unsigned locker_id, struct __sh_dbt
                        *lock_obj); 
                  
-  
                        The locker_id's lock request in
                        lock_obj is about to be aborted in order to
                        resolve a deadlock. The lock region's
                        st_ndeadlocks has been incremented.
                    
- 
                     lock-nowait_notgranted (unsigned count, DBT
                        *lock, unsigned locker_id); 
                  
- 
                        A DB_LOCK_NOWAIT lock request by
                        locker_id would have had to wait. The lock
                        regions's st_lock_nowait has been incremented
                        and the request returns DB_LOCK_NOTGRANTED.
                    
- 
                     lock-steal (unsigned st_locksteals,
                        unsigned from, unsigned to); 
                  
- 
                        A lock is being stolen from one
                        partition for another one. The 'from' lock
                        partition's st_locksteals has been
                        incremented. 
                    
- 
                     lock-object_steal (unsigned
                        st_objectsteals, unsigned from, unsigned to); 
                  
- 
                        A lock object is being stolen from one
                        partition for another one. The 'from' lock
                        partition's st_objectsteals has been
                        incremented.
                    
- 
                     lock-locktimeout (unsigned
                        st_nlocktimeouts, const DBT *lock); 
                  
- 
                        A lock wait expired due to the lock
                        request timeout. 
                    
- 
                     lock-txntimeout (unsigned st_ntxntimeouts,
                        const DBT *lock); 
                  
- 
                        A lock wait expired due to the
                        transaction's timeout. 
                    
- 
                     lock-nlockers (unsigned active, unsigned
                        locker_id); 
                  
-  The allocation or deallocation of the
                        locker id changed the number of active locker
                        identifiers. 
- 
                     lock-maxnlockers (unsigned new_max_active,
                        unsigned locker_id); 
                  
- 
                        The allocation of the locker id set a
                        new maximum number of active locker
                        identifiers.
                    
 
 
- 
               mpool 
            
- 
              
                The mpool class monitors the allocation and
                management of memory, including the cache.
             
                
                  - 
                     mpool-read (char *file, unsigned pgno,
                        struct __bh *buf); 
                  
- 
                        Read a page from file into buf.
                    
- 
                     mpool-write (char *file, unsigned pgno,
                        struct __bh *buf); 
                  
- 
                        Write a page from buf to file.
                    
- 
                     mpool-env_alloc (unsigned size, unsigned
                        region_id, unsigned reg_type); 
                  
-  
                        This is an attempt to allocate size
                        bytes from region_id. The reg_type is one of
                        the reg_type_t enum values. 
                    
- 
                     mpool-evict (char *file, unsigned pgno,
                        struct __bh *buf); 
                  
- 
                        The page is about to be removed from
                        the cache. 
                    
- 
                    mpool-alloc_wrap (unsigned alloc_len, int
                        region_id, int wrap_count, int put_counter); 
                  
- 
                        The memory allocator has incremented
                        wrap_count after searching through the entire
                        region without being able to fulfill the
                        request for alloc_len bytes. As wrap_count
                        increases the library makes more effort to
                        allocate space.
                    
 
  
                These mpool counters are included by
                --enable-perfmon-statistics.
             
                
                  - 
                     mpool-clean_eviction (unsigned st_ro_evict,
                        unsigned region_id); 
                  
-  
                        The eviction of a clean page from a
                        cache incremented st_ro_evict.
                    
- 
                     mpool-dirty_eviction (unsigned st_rw_evict,
                        unsigned region_id); 
                  
-  
                        The eviction of a dirty page from a
                        cache incremented st_rw_evict. The page has
                        already been written out.
                    
- 
                     mpool-fail (unsigned failure_count,
                        unsigned alloc_len, unsigned region_id); 
                  
- 
                        An attempt to allocate memory from
                        region_id failed. 
                    
- 
                     mpool-hash_search (unsigned
                        st_hash_searches, char *file, unsigned pgno); 
                  
- 
                        A search for pgno of file incremented
                        st_hash_searches. 
- 
                     mpool-hash_examined (unsigned
                        st_hash_examined, char *file, unsigned pgno); 
                  
- 
                        A search for pgno of file increased
                        st_hash_examined by the number of hash buckets
                        examined. 
                    
- 
                     mpool-hash_longest (unsigned
                        st_hash_longest, char *file, unsigned pgno); 
                  
- 
                        A search for pgno of file set a new
                        maximum st_hash_longest value.
                    
- 
                     mpool-map (unsigned st_map, char *file,
                        unsigned pgno); 
                  
-  
                        A file's st_map count was incremented
                        after a page was mapped into memory. The
                        mapping might have caused disk I/O.
                    
- 
                     mpool-hit (unsigned st_cache_hit, char
                        *file, unsigned pgno); 
                  
-  
                        The hit count was incremented because
                        pgno from file was found in the cache.
                    
- 
                     mpool-miss (unsigned st_cache_miss, char
                        *file, unsigned pgno); 
                  
- 
                        The miss count was incremented because
                        pgno from file was not already present in the
                        cache.
                    
- 
                     mpool-page_create (unsigned st_page_create,
                        char *file, unsigned pgno); 
                  
- 
                        The st_page_create field was
                        incremented because the pgno of file was
                        created in the cache. 
                    
- 
                     mpool-page_in (unsigned st_page_in, char
                        *file, unsigned pgno); 
                  
-  
                        The st_page_in field was incremented
                        because the pgno from file was read into the
                        cache. 
                    
- 
                     mpool-page_out (unsigned st_page_out, char
                        *file, unsigned pgno); 
                  
-  
                        The st_page_out field was incremented
                        because the pgno from file was written out.
                    
 
 
- 
               mutex 
            
- 
               
                The mutex category monitors includes shared
                latches. The alloc_id value is one of the MTX_XXX
                definitions from dbinc/mutex.h 
             
                
                  - 
                    mutex-suspend (unsigned mutex, unsigned
                        excl, unsigned alloc_id, struct __db_mutex_t
                        *mutexp); 
                  
- 
                        This thread is about to suspend itself
                        because a thread has the mutex or shared latch
                        locked in a mode which conflicts with the this
                        request.
                    
- 
                    mutex-resume (unsigned mutex, unsigned excl,
                        unsigned alloc_id, struct __db_mutex_t
                        *mutexp); 
                  
-  
                        The thread is returning from a suspend
                        and will attempt to obtain the mutex or shared
                        latch again. It might need to suspend again.
                    
 
  
                These mutex counters are included by
                --enable-perfmon-statistics.
             
                
                  - 
                     mutex-set_nowait (unsigned
                        mutex_set_nowait, unsigned mutex); 
                  
-  
                        Increment the count of times that the
                        mutex was free when trying to lock it.
                    
- 
                     mutex-set_wait (unsigned mutex_set_wait,
                        unsigned mutex); 
                  
-  
                        Increment the count of times that the
                        mutex was busy when trying to lock it.
                    
- 
                     mutex-set_rd_nowait (unsigned
                        mutex_set_rd_nowait, unsigned mutex); 
                  
-  
                        Increment the count of times that the
                        shared latch was free when trying to get a
                        shared lock on it. 
                    
- 
                     mutex-set_rd_wait (unsigned
                        mutex_set_rd_wait, unsigned mutex); 
                  
- 
                        Increment the count of times that the
                        shared latch was already exclusively latched
                        when trying to get a shared lock on it.
                    
- 
                     mutex-hybrid_wait (unsigned hybrid_wait,
                        unsigned mutex); 
                  
-  
                        Increment the count of times that a
                        hybrid mutex had to block on its condition
                        variable. n a busy system this might happen
                        several times before the corresponding
                        hybrid_wakeup. 
                    
- 
                     mutex-hybrid_wakeup (unsigned
                        hybrid_wakeup, unsigned mutex); 
                  
- 
                        Increment the count of times that a
                        hybrid mutex finished one or more waits for
                        its condition variable.
                    
 
 
- 
               txn 
            
- 
               
                The txn category covers the basic transaction
                operations.
             
                
                  - 
                     txn-begin (unsigned txnid, unsigned flags); 
                  
-  
                        A transaction was successfully begun.
                    
- 
                     txn-commit (unsigned txnid, unsigned
                        flags); 
                  
-  
                        A transaction is starting to commit.
                    
- 
                     txn-prepare (unsigned txnid, uint8_t *gid); 
                  
-  
                        The transaction is starting to prepare,
                        flushing the log so that a future commit can
                        be guaranteed to succeed. The global
                        identifier field is 128 bytes long.
                    
- 
                     txn-abort (unsigned txnid); 
                  
- 
                        The transaction is about to abort.
                    
 
 
                These txn counters are included by
                --enable-perfmon-statistics.
             
                
                  - 
                     txn-nbegins (unsigned st_nbegins, unsigned
                        txnid); 
                  
-  
                        Beginning the transaction incremented
                        st_nbegins.
                    
- 
                     txn-naborts (unsigned st_nbegins, unsigned
                        txnid); 
                  
-  
                        Aborting the transaction incremented
                        st_naborts.
                    
- 
                     txn-ncommits (unsigned st_ncommits,
                        unsigned txnid); 
                  
- 
                        Committing the transaction incremented
                        st_ncommits. 
                    
- 
                     txn-nactive (unsigned st_nactive, unsigned
                        txnid); 
                  
-  
                        Beginning or ending the transaction
                        updated the number of active transactions.
                    
- 
                     txn-maxnactive (unsigned st_maxnactive,
                        unsigned txnid); 
                  
-  
                        The creation of the transaction set a
                        new maximum number of active transactions.