Oracle® Collaboration Suite Deployment Guide
10g Release 1 (10.1.2)

Part Number B25492-04

6 Deploying Oracle Content Services

This chapter discusses planning information designed to help you make important decisions about how to configure and deploy Oracle Content Services.


See Chapter 1, "Oracle Content Services Administration Concepts", in Oracle Content Services Administrator's Guide for detailed information on Oracle Content Services architecture and integration with key Oracle technologies.

Understanding the Oracle Content Services Architecture and Functionality

The following sections describe the technology underlying Oracle Content Services, as well as how the nodes and other processes interact. They also provide information about the Oracle Content Services Site model and Oracle Internet Directory.

For more detailed information about the Oracle Content Services architecture, see the Oracle Content Services Administrator's Guide.

Figure 6-1 Oracle Content Services Architecture


The Oracle Content Services application is built using Oracle Content Management SDK (Oracle CM SDK) Java APIs. This low-level API provides much of the required functionality, infrastructure, and runtime environment for content management, but does not dictate the business rules governing that content. These business rules and policies are implemented in a separate business logic layer.

A façade layer then provides a uniform Java interface that encompasses both Oracle CM SDK and application business logic. This layer is the foundation for the Oracle Content Services Web application, protocol servers, and Web services. The façade ensures that all components interfacing with Oracle Content Services do so at an abstraction level that respects the application business logic.

The Oracle Content Services Domain

An Oracle Content Services domain is a logical grouping of Oracle Content Services nodes and an Oracle database instance (called the Collaboration Suite Database) that contains the Oracle Content Services data. The nodes run on Oracle Application Server.

Oracle Content Services Schema

A schema is a group of related objects in a database. The Oracle Content Services schema is created in an Oracle database during the configuration process. The schema owns all database objects, including metadata about Oracle Content Services and configuration information (see Figure 6-2).

Figure 6-2 The Oracle Content Services Domain


Oracle Content Services Nodes

An Oracle Content Services node is the application that comprises the product, along with the underlying Java Virtual Machine (JVM) required to support the application at runtime.

The Oracle Content Services node processes and the database itself can be physically configured on a single host, or across several, separate hosts.

By default, an Oracle Content Services domain includes two nodes:

  • regular node

  • HTTP node

You can configure additional HTTP or regular nodes on the same computer or on additional computers.

The regular node supports protocol servers, such as FTP, as well as agents, such as the Garbage Collection Agent. The HTTP node supports the Oracle Content Services application, portlet, and WebDAV by means of servlets that are configured to work with the Oracle Application Server Containers for J2EE (OC4J).

Services, Servers, and Agents

Each node supports a service with specific configuration parameters, such as language, default character set, connections to the database, and cache sizes.

The service, in turn, supports the servers. Each server is either a protocol server or an agent. The protocol servers listen for requests from clients on a specific Internet Protocol (IP) port and respond to requests according to the rules of the protocol specification.

Agents perform operations periodically (time-based) or in response to events generated by other Oracle Content Services servers or processes (event-based). Although different agents can run in different nodes, each agent must run only on a single node. Typically, all the shipped agents must be run to ensure a stable system.

Oracle Internet Directory

Oracle Content Services uses Oracle Internet Directory for its identity management directory (list of user names and passwords). During configuration of Oracle Collaboration Suite, you select an Oracle Internet Directory server to be used with Oracle Content Services.

The Site Model

In Oracle Content Services, a Site is an organizational entity whose users can collaborate on files and folders. Oracle Content Services sites are based on Identity Management Realms.

See Chapter 1, "Oracle Content Services Administration Concepts", in Oracle Content Services Administrator's Guide for detailed information on Sites.

Planning for Oracle Content Services Deployment

This section provides information for planning an Oracle Content Services deployment.

Choice of Protocols

The most important decision regarding performance and scalability is the choice of which protocols to use to access Oracle Content Services.

See Chapter 4, "Oracle Content Services Protocol Support", in Oracle Content Services Administrator's Guide for detailed information on protocol servers supported by Oracle Content Services, along with the client access paths and software for the supported protocols.

Oracle Content Services Sizing Guidelines

This section describes hardware requirements for a sample deployment of Oracle Content Services and provides formulas that allow you to determine the hardware configuration required to deploy Oracle Content Services in your organization.


Hardware requirements for Oracle Content Services are primarily determined by the factors described in Table 6-1:

Table 6-1 Primary Factors Determining Oracle Content Services Hardware Requirements

Hardware Resource | Applications tier computer requirement variables | Database computer requirement variables
CPU | Peak number of operations performed each second | Peak number of operations performed each second; whether Oracle Text indexing is used
Memory | Peak number of operations performed each second; peak number of concurrent connected users; average number of protocols used by each concurrent connected user; average number of sessions used by each concurrent connected user; number of users accessing Oracle Content Services through FTP; number of files in each folder | Peak number of operations performed each second; number of files
Disk Size | N/A | Number of files; average content size of files, whether they can be indexed or not
Disk Throughput (not discussed in this chapter) | N/A | Peak number of files read and written each second; average content size of files

In order to determine hardware requirements, assumptions must be made about the type of work that users are performing. The following measurements are averages extrapolated from deployment of Oracle Content Services within Oracle Corporation (40,000+ users), and are generally applicable for projecting Oracle Content Services usage.

Table 6-2 User Profiles

User Task | Number of Operations per Connected User per Hour
Folder opens | 8
Files read / written | 10
Queries | 0.1



Note:

These sizing guidelines may be inaccurate if the desired user profile is significantly larger than the average measurements detailed in Table 6-2.

These sizing guidelines are based on benchmarks of 10,000 concurrent connected users on Sun Microsystems hardware. The guidelines have been validated against measurements taken from internal Oracle Corporation production usage of Oracle Content Services by 40,000 Oracle employees, with 20 million files and 6.5TB of content. This system uses Intel Linux hardware for the Applications tier computers, and Sun hardware for the database.

Sizing Formulas for Each Applications Tier Computer

This section provides formulas that you can use to determine specific hardware sizing for each Applications tier computer.

The following table summarizes the sizing formulas:

Table 6-3 General Oracle Content Services Sizing Recommendations for Each Applications Tier Computer

Component | Sizing Recommendations
Number of CPUs | roundup(peak concurrent connected users / 250 + 33% headroom)
Required usable disk space | At least 500MB for Oracle Content Services
Total computer memory (HTTP is the primary protocol) | 480MB + (3.6MB * peak concurrent connected users)
Total computer memory (HTTP is not the primary protocol, or the desired user profile is different from the average measurements described in Table 6-2) | 480MB + (1MB * peak concurrent connected users * average number of sessions in use by each concurrent connected user) + (3KB * number of objects desired in the Java object cache) + (8MB * number of connections to the database)

Number of CPUs

Use the following formula to determine the number of CPUs required:

roundup(peak concurrent connected users / 250 + 33% headroom)

The peak concurrent connected users parameter is the number of users who are signed in to Oracle Content Services and have performed an operation during the peak hour of the day. If you do not know how many users that is likely to be, assume 10% of your entire Oracle Content Services named user population.

The headroom parameter represents the amount of CPU resources that should be left available. In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated.

This formula is based on the following assumptions:

  • The formula assumes Sun SPARC Solaris 400MHz UltraSPARC-II processors with 8MB secondary cache.

  • Other RISC processors should perform roughly proportional to their MHz.

  • Intel Pentium III (or later) processors on Windows and Linux computers should perform roughly proportional to half their MHz. For example, an 800MHz Pentium processor is approximately equivalent to a 400MHz RISC processor.
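As a rough illustration, the CPU formula can be sketched in Python. The sketch assumes that "+ 33% headroom" means multiplying the base CPU count by 1.33 before rounding up, which is one reading of the formula. The function name app_tier_cpus is hypothetical and is not part of Oracle Content Services.

```python
def app_tier_cpus(peak_concurrent_users: int) -> int:
    """Estimate CPUs as roundup(users / 250 + 33% headroom).

    Assumption: the headroom is applied as a 1.33 multiplier.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    if peak_concurrent_users < 0:
        raise ValueError("peak_concurrent_users must be non-negative")
    # users / 250 * 1.33 equals users * 133 / 25000.
    # -(-a // b) is ceiling division for non-negative a and positive b.
    return -(-peak_concurrent_users * 133 // 25000)


if __name__ == "__main__":
    for users in (250, 1000, 10000):
        print(users, "users ->", app_tier_cpus(users), "CPUs")
```

The result is expressed in units of the reference processor named in the assumptions above, so it would still need to be adjusted for the processor type actually deployed.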

Required Usable Disk Space

Allocate at least 500MB for Oracle Content Services.

Total Computer Memory, HTTP as the Primary Protocol

If HTTP is the primary protocol, then use the following formula to determine the total computer memory required:

480MB + (3.6MB * peak concurrent connected users)

The 480MB is for the first Oracle Content Services Applications tier computer. The value of 3.6MB is calculated from the following assumptions:

  • 1.6 sessions for each concurrent connected user: This assumes that the primary interface for Oracle Content Services is through the HTTP node. The additional 0.6 sessions are HTTP sessions that are started whenever a user of the Oracle Content Services Web interface starts another Oracle Content Services Web interface, or when the user accesses Web Folders or Oracle Drive.

  • 0.1 connection pool connections for each concurrent connected user: This assumes the stated user profile.

  • 400 objects in the Java data cache for each concurrent connected user: This assumes 50 files in each folder and 8 folders opened each hour, as in the stated user profile.
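The 3.6MB figure can be reproduced by combining these per-user averages with the per-item costs used in the general memory formula in this chapter (1MB for each session, 8MB for each database connection, 3KB for each cached object). The following Python sketch is illustrative only. It assumes 1MB = 1000KB, which is the convention under which the terms add up to 3.6MB, and all names in it are hypothetical.

```python
# Per-item memory costs from the general memory formula, in MB.
MB_PER_SESSION = 1.0
MB_PER_DB_CONNECTION = 8.0
MB_PER_CACHED_OBJECT = 3 / 1000  # 3KB, assuming 1MB = 1000KB

# Per-user averages from the stated user profile.
SESSIONS_PER_USER = 1.6
DB_CONNECTIONS_PER_USER = 0.1
CACHED_OBJECTS_PER_USER = 8 * 50  # 8 folder opens each hour * 50 files in each folder


def memory_per_user_mb() -> float:
    """Sum the session, connection, and cache memory for one user."""
    return (
        SESSIONS_PER_USER * MB_PER_SESSION
        + DB_CONNECTIONS_PER_USER * MB_PER_DB_CONNECTION
        + CACHED_OBJECTS_PER_USER * MB_PER_CACHED_OBJECT
    )


def http_memory_mb(peak_concurrent_users: int) -> float:
    """480MB + (3.6MB * peak concurrent connected users)."""
    return 480 + 3.6 * peak_concurrent_users


if __name__ == "__main__":
    print(round(memory_per_user_mb(), 2))  # 1.6 + 0.8 + 1.2
    print(round(http_memory_mb(500), 1))
```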

Total Computer Memory, Primary Protocol Other Than HTTP

If HTTP is not the primary protocol, or if the desired user profile is different than the average measurements described in Table 6-2, use the following formula to determine the total computer memory required:

480MB + (1MB * peak concurrent connected users * average number of sessions in use 
by each concurrent connected user) + (3KB * number of objects desired in the Java
object cache) + (8MB * number of connections to the database)
 

The 480MB is for the first Oracle Content Services Applications tier computer. The other values are calculated from the following assumptions:

  • The value of 1MB is high by design. Oracle Content Services has been optimized to reduce database CPU load by using Applications tier memory to cache items. This ensures a more scalable and less expensive system, because the database computer is less of a scalability bottleneck, and because memory on one- or two-processor Applications tier computers is typically less expensive than memory or CPU on high-end database computers (computers with large amounts of attached storage or with many processors).

  • Oracle recommends limiting the number of peak concurrent user sessions through the IFS.SERVICE.MaximumConcurrentSessions parameter in the service configuration. Oracle has tested with Java heaps up to 2GB. This constraint implies up to approximately 700 concurrent connected users per node and a total of approximately 1,960MB in size, if the following are true:

    • Each user uses 1.6 sessions

    • Each session is 1MB (700 * 1.6 * 1MB = 1,120MB)

    • Each user needs 400 Java data cache objects

    • Each object is 3KB in size (700 * 400 * 3KB = 840,000KB, or approximately 840MB)

    For each additional node on the same computer, you must include the node overhead in the sizing. See Table 6-5 for more information.

    The HTTP/WebDAV memory overhead includes memory for 10 simultaneous guest user requests. Because of this, guest users should not be counted as connected users for HTTP/WebDAV access.

  • For the average number of sessions in use by each concurrent connected user, use the value 1.6 for the HTTP node.

  • Calculate the number of objects desired in the Java object cache by using the following formula:

    (number of folder opens in the peak hour) * (number of objects in each folder) * (number of peak concurrent connected users)
    
    

    Use the result to set the value of the IFS.SERVICE.DATACACHE.Size parameter.

  • The number of connections to the database depends on the number of simultaneous read or write operations being performed. Assume 0.1 database connections for each user if using a standard user profile. This is the sum of the parameters IFS.SERVICE.CONNECTIONPOOL.WRITEABLE.MaximumSize and IFS.SERVICE.CONNECTIONPOOL.READONLY.MaximumSize for each service.
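The general formula and the Java object cache calculation can be sketched together as follows. This is a minimal illustration, not an Oracle-supplied tool: the function names are hypothetical, and the conversion 1MB = 1000KB is an assumption made so that the result agrees with the 3.6MB-per-user shortcut under the default user profile.

```python
def java_cache_objects(folder_opens_in_peak_hour: int,
                       objects_per_folder: int,
                       peak_concurrent_users: int) -> int:
    """Objects desired in the Java object cache.

    Per the text above, the result is the value to use for the
    IFS.SERVICE.DATACACHE.Size parameter.
    """
    return folder_opens_in_peak_hour * objects_per_folder * peak_concurrent_users


def app_tier_memory_mb(peak_concurrent_users: int,
                       sessions_per_user: float,
                       cache_objects: int,
                       db_connections: int,
                       base_mb: float = 480.0) -> float:
    """480MB + 1MB per session + 3KB per cached object + 8MB per connection."""
    session_mb = 1.0 * peak_concurrent_users * sessions_per_user
    cache_mb = 3.0 * cache_objects / 1000.0  # assumption: 1MB = 1000KB
    connection_mb = 8.0 * db_connections
    return base_mb + session_mb + cache_mb + connection_mb


if __name__ == "__main__":
    users = 700
    objects = java_cache_objects(8, 50, users)  # default user profile
    connections = 70                            # 0.1 connections for each user
    print(objects)
    print(round(app_tier_memory_mb(users, 1.6, objects, connections)))
```

With the default profile, the result for 700 users matches 480MB + (3.6MB * 700), which is a useful sanity check when substituting your own profile values.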

Sizing Formulas for the Database Computer

This section provides formulas that you can use to determine specific hardware sizing for each database computer to be used for Oracle Content Services users.

The following table summarizes the sizing formulas:

Table 6-4 General Oracle Content Services Sizing Recommendations for the Database Computer

Component | Sizing Recommendations
Number of CPUs | roundup(peak concurrent connected users / 250 + 33% headroom)
Required usable disk space | 4.5GB + total raw file size + (total raw file size * 20%)
Total computer memory | 64MB + 128MB + database buffer cache + (1MB * number of connections to the database) + (500 bytes * number of files) + (100KB * peak concurrent connected users)


Number of CPUs

Use the following formula to determine the number of CPUs required:

roundup(peak concurrent connected users / 250 + 33% headroom)

The peak concurrent connected users parameter is the number of users who are signed in to Oracle Content Services and have performed an operation during the peak hour of the day. If you do not know how many users that is likely to be, assume 10% of your entire Oracle Content Services named user population.

The headroom parameter represents the amount of CPU resources that should be left available. In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated. One additional CPU is used for the background Oracle Text indexing of new file content, if you are using Oracle Text indexing.

This formula is based on the following assumptions:

  • The formula assumes Sun SPARC Solaris 400MHz UltraSPARC-II processors with 8MB secondary cache.

  • Other RISC processors should perform roughly proportional to their MHz.

  • Intel Pentium III (or later) processors on Windows and Linux computers should perform roughly proportional to half their MHz. For example, an 800MHz Pentium processor is approximately equivalent to a 400MHz RISC processor.

Required Usable Disk Space

Use the following formula to determine the usable disk space required:

4.5GB + total raw file size + (total raw file size * 20%)

The 4.5GB represents the space required for Oracle software and the initial database configuration. If you are not using Oracle Text to index the content, multiply the total raw file size by 15% instead of 20%.
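The disk space formula, including the 15% variant for deployments without Oracle Text, can be sketched as follows. The function name is hypothetical and the sketch covers only the base formula, not the additional considerations such as mirroring or redo logs.

```python
def database_disk_gb(total_raw_file_gb: float,
                     oracle_text_indexing: bool = True) -> float:
    """4.5GB + total raw file size + overhead on the raw file size.

    The overhead is 20% with Oracle Text indexing and 15% without it.
    """
    overhead_percent = 20 if oracle_text_indexing else 15
    overhead_gb = total_raw_file_gb * overhead_percent / 100
    return 4.5 + total_raw_file_gb + overhead_gb


if __name__ == "__main__":
    print(database_disk_gb(1000))         # with Oracle Text indexing
    print(database_disk_gb(1000, False))  # without Oracle Text indexing
```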

The following considerations can increase the amount of usable disk space required for the database computer:

  • Mirroring for backup and reliability

  • Redo log size, which should be determined by how many files are inserted and their size

  • Unused portion of the last extent in each database file, which occurs with pre-created database files or which can be large if the next extent setting is large

Total Computer Memory

Use the following formula to determine the total computer memory required:

64MB + 128MB + database buffer cache + (1MB * number of connections
to the database) + (500 bytes * number of files) + (100KB * peak concurrent
connected users)

This formula is based on the following assumptions:

  • 128MB is the minimum amount of memory required to run a small Oracle Server.

  • Number of files: The database buffer cache in the default Oracle database configuration is sufficient for approximately 50,000 files. For deployments with more than 50,000 files, allocate 500 bytes for each file for optimal performance, including wildcard filename searches. Reduce this number if users do not perform wildcard filename searches.

  • 100KB is calculated by assuming that 0.1 database connections are needed for each concurrent connected user, as in the stated user profile. Each database connection takes approximately 1MB of database memory.
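A minimal Python sketch of the database memory formula follows. The function name and the sample inputs are hypothetical, and the unit conversions (1MB = 1000KB = 1,000,000 bytes) are an assumption, since the text does not state which convention it uses.

```python
def database_memory_mb(buffer_cache_mb: float,
                       db_connections: int,
                       number_of_files: int,
                       peak_concurrent_users: int) -> float:
    """64MB + 128MB + buffer cache + 1MB per connection
    + 500 bytes per file + 100KB per peak concurrent connected user."""
    connection_mb = 1.0 * db_connections
    file_mb = 500 * number_of_files / 1_000_000   # assumption: 1MB = 1,000,000 bytes
    user_mb = 100 * peak_concurrent_users / 1000  # assumption: 1MB = 1000KB
    return 64 + 128 + buffer_cache_mb + connection_mb + file_mb + user_mb


if __name__ == "__main__":
    # Hypothetical deployment: 512MB buffer cache, 100 connections,
    # 1 million files, 1,000 peak concurrent connected users.
    print(database_memory_mb(512, 100, 1_000_000, 1000))
```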

Memory Requirements: Sample Deployment

Table 6-5 describes approximate minimum memory overhead on the Applications tier computers for each component.

Table 6-5 Memory Overhead by Component

Approximate minimum memory (MB) for an Applications tier computer, by the nodes that the computer runs:

Description | Regular node and HTTP node | Additional HTTP node | Additional regular node
Memory used by the operating system upon booting the computer. | 60 | 60 | 60
Overhead for first Java Virtual Machine (JVM). | 30 | 30 | 30
Domain controller JVM. Only needs to be run once for a single Oracle Content Services schema, regardless of how many Applications tier computers are running Oracle Content Services protocols. | 20 | 0 | 0
Oracle Enterprise Manager Web site. Must run on every node to allow managing the node through Oracle Enterprise Manager. | 150 | 150 | 150
Regular Oracle Content Services node JVM. By default, runs the FTP server and the Oracle Content Services agents. | 50 | 0 | 50
Oracle Content Services Node guardian JVM, which monitors the Oracle Content Services regular node and recovers from node failures. | 10 | 0 | 10
Oracle HTTP Server, including the default HTTP daemons. Only needs to run where HTTP access is required. | 30 | 30 | 0
Oracle Content Services OC4J process. Only needs to run where Oracle Content Services HTTP/WebDAV/Oracle Drive access is required. Must be paired with Oracle HTTP Server. | 130 | 130 | 0
Total | 480 | 400 | 300


Tablespaces

This section provides information about the Oracle Content Services tablespaces.


Data Types and Storage Requirements

Table 6-6 shows the different types of data stored in Oracle Content Services and describes the purpose of each tablespace. Each of these tablespaces is discussed in further detail in subsequent sections of this chapter.

Table 6-6 Tablespace Definitions

Tablespace Type | Name (in Oracle Files Configuration Assistant) | Tablespace Name | Description
File Storage | Indexed Media | IFS_LOB_I | Stores the Large Object (LOB) data for files that are indexed by Oracle Text, such as text and word processing files.
File Storage | Non-Indexed Media | IFS_LOB_N | Stores the LOB data for files that are not indexed by Oracle Text, such as zip files.
File Storage | interMedia Media | IFS_LOB_M | Stores the LOB data for files that are indexed by Oracle interMedia, such as image, audio, and video files.
Oracle Text | Oracle Text Data | IFS_CTX_I | Stores words (tokens) extracted by Oracle Text from Oracle Content Services files (the Oracle table DR$IFS_TEXT$I).
Oracle Text | Oracle Text Index | IFS_CTX_X | Stores the Oracle B*tree index on the Oracle Text tokens (the Oracle index DR$IFS_TEXT$X).
Oracle Text | Oracle Text Keymap | IFS_CTX_K | Stores miscellaneous Oracle Text tables (the Oracle tables DR$IFS_TEXT$K, DR$IFS_TEXT$N, DR$IFS_TEXT$R).
Metadata | Primary | IFS_MAIN | Stores metadata for files, information about users and groups, and other Oracle Content Services object data.
General Oracle Storage | N/A | Various | SYSTEM, ROLLBACK, TEMP, and other tablespaces that store the Oracle data dictionary, temporary data during transactions, and so on.


Typical tablespace storage space and disk I/O are detailed in Table 6-7:

Table 6-7 Tablespace Storage Requirements and Disk I/O

Tablespace | % of Total I/O Throughput Requirements | % of Disk Space Requirements
IFS_MAIN | 50% | 2%
IFS_CTX_X | 20% | 1%
IFS_CTX_I | 10% | 1%
IFS_LOB_I | 8% | 35%
IFS_LOB_N | 5% | 55%
Various | 5% | 1%
IFS_LOB_M | 1% | 4%
IFS_CTX_K | 1% | 1%
Total | 100% | 100%


Note the following issues regarding the information in Table 6-7:

  • I/O rates are highly dependent on the size of the db_cache_size. These measurements were taken on the Oracle-internal Oracle Content Services implementation, with 8GB db_cache_size, 17 million files, and 40,000 named users.

  • The IFS_MAIN tablespace is the most important tablespace to spread across disks for maximum I/O capacity.

  • Disk I/O for the IFS_CTX_I, IFS_CTX_X and IFS_CTX_K tablespaces is largely generated from Oracle Text batch processes (ctx_ddl.sync_index, and ctx_ddl.optimize_index), which are not critical to end-user performance. Therefore, these tablespaces can be on disks with lower I/O capacity, if necessary.

Storing Files in an Oracle Database

The largest consumption of disk space will occur on the disks that actually contain the files that reside within Oracle Content Services, namely the Indexed Media tablespaces, Non-Indexed Media tablespaces, and interMedia tablespaces. This section explains how the files are stored and how to calculate the amount of space those files will require.

As previously mentioned, files stored in Oracle Content Services are actually stored in database tablespaces. Oracle Content Services makes use of the Large Object (LOB) facility of the Oracle Database. All files are stored as Binary Large Objects (BLOBs), which is one type of LOB provided by the database. LOBs provide for transactional semantics much like the normal data stored in a database. In order to accomplish these semantics, LOBs must be broken down into smaller pieces which are individually modifiable and recoverable. These smaller pieces are referred to as chunks. Chunks are a group of one or more sequential database blocks from a tablespace that contains a LOB column.

Both database blocks and chunk information within those blocks (BlockOverhead) impose some amount of overhead for the stored data. BlockOverhead is presently 60 bytes for each block, which consists of the block header, the LOB header, and the block checksum. Oracle Content Services configures its LOBs to have a 32K chunk size.

As an example, assume that the DB_BLOCK_SIZE parameter of the database is set to 8192 (8K). A chunk would require four contiguous blocks and impose an overhead of 4 * 60 = 240 bytes. The usable space within a chunk would be 32768 - 240 = 32528 bytes.

Each file stored in Oracle Content Services consists of an integral number of chunks. Using the previous example, for instance, a 500K file will actually use 512000/32528=15.74=16 chunks. Sixteen chunks will take up 16*32K = 524288 bytes. The chunking overhead for storing this file would then be 524288-512000=12288 bytes which is 2.4% of the original file's size.

The chunk size used by Oracle Content Services is set to optimize access times for files. Note that small files, files less than one chunk, will incur a greater disk space percentage overhead since they must use at least a single chunk.
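The effect of chunking on files of different sizes can be checked with a short Python sketch. It uses the values from the example above (32K chunks, 8K blocks, 60 bytes of overhead for each block); the function names are hypothetical.

```python
from typing import Tuple

CHUNK_SIZE = 32768    # 32K chunk size configured by Oracle Content Services
BLOCK_SIZE = 8192     # DB_BLOCK_SIZE used in the example
BLOCK_OVERHEAD = 60   # bytes of overhead in each block


def usable_bytes_per_chunk() -> int:
    """Chunk size minus the overhead of every block in the chunk."""
    blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE
    return CHUNK_SIZE - blocks_per_chunk * BLOCK_OVERHEAD


def chunking_overhead(file_size: int) -> Tuple[int, int, float]:
    """Return (chunks used, overhead in bytes, overhead as % of file size)."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    chunks = -(-file_size // usable_bytes_per_chunk())  # ceiling division
    overhead = chunks * CHUNK_SIZE - file_size
    return chunks, overhead, 100 * overhead / file_size


if __name__ == "__main__":
    for size in (1000, 512000, 10_000_000):
        chunks, overhead, percent = chunking_overhead(size)
        print(f"{size} bytes: {chunks} chunks, {overhead} bytes overhead ({percent:.1f}%)")
```

Running the sketch with a file much smaller than one chunk shows how the percentage overhead grows for small files, because even a 1,000-byte file occupies a full 32K chunk.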

Another structure required for transactional semantics on LOBs is the LOB Index. Each LOB index entry can point to 8 chunks of a specific LOB object (NumLobPerIndexEntry = 8). In our continuing example, where a 500K file takes up 16 chunks, two index entries would be required for that object. Each entry takes 46 bytes (LobIndexEntryOverhead) and is then stored in an Oracle B*tree index, which in turn has its own overhead depending upon how fragmented that index becomes.

The last factor affecting LOB space utilization is the PCTVERSION parameter used when creating the LOB column. For information about how PCTVERSION works, please consult the Oracle Database SQL Reference.

Oracle Content Services uses the default PCTVERSION of 10% for the LOB columns it creates. This reduces the possibility of "ORA-22924 snapshot too old" errors occurring in read consistent views. So by default, a minimum of a 10 percent increase in chunking space must be added in to the expected disk usage to allow for persistent PCTVERSION chunks.

For large systems where disk space is an issue, Oracle recommends reducing PCTVERSION to 1, in order to reduce disk storage requirements. This may be done at any time in a running system using the following SQL commands:

alter table odmm_contentstore modify lob (globalindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob_t) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob_t) (pctversion 1);
alter table odmm_nonindexedstore modify lob (nonindexedblob2) (pctversion 1);

The steps for calculating LOB tablespace usage are as follows:

  1. Calculate the available space in each chunk by determining the number of blocks in each chunk, multiplying that number by the BlockOverhead (60 bytes), and subtracting the result from the chunk size.

  2. Divide the file size by the available space in each chunk and round up to get the number of chunks, using the following formula:

    chunks = roundup(FileSize / (ChunkSize - ((ChunkSize / BlockSize) * BlockOverhead)))
    
    

    For example, if FileSize = 100,000, ChunkSize = 32768, Blocksize = 8192, and BlockOverhead = 60, then the number of chunks is as follows:

    roundup(100000 / (32768 - ((32768 / 8192) * 60))) = 4 chunks
    
    
  3. Calculate the amount of disk space for a file by multiplying the number of chunks by the chunk size and by the PCTVERSION factor, and then adding the space for the LOB index entries: one entry of LobIndexEntryOverhead (46 bytes) for every NumLobPerIndexEntry (8) chunks, rounded up.

    FileDiskSpaceInBytes = roundup(chunks * ChunkSize * PCTVERSIONFactor) + (roundup(chunks / NumLobPerIndexEntry) * LobIndexEntryOverhead)
    
    

    Hence, if chunks = 4, ChunkSize = 32768, PCTVERSIONFactor = 1.1, NumLobPerIndexEntry = 8, and LobIndexEntryOverhead = 46:

    roundup(4 * 32768 * 1.1) + (roundup(4 / 8) * 46)= 144226 FileDiskSpaceInBytes
    
    
  4. Calculate the total disk space used for file storage by summing up the application of the preceding formulas for each file to be stored in the LOB, using the following formula:

    TableSpaceUsage = sum(FileDiskSpaceInBytes)
    
    

    This is for all files stored.
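The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than an Oracle-supplied calculator: the function names are hypothetical, the PCTVERSION factor is passed as an integer percentage so that the arithmetic stays exact, and the index space is computed as in the worked example, with the number of index entries rounded up before multiplying by the entry overhead.

```python
from typing import Iterable


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division for non-negative numerators."""
    return -(-numerator // denominator)


def chunks_for_file(file_size: int, chunk_size: int = 32768,
                    block_size: int = 8192, block_overhead: int = 60) -> int:
    """Steps 1 and 2: usable space in each chunk, then chunks needed."""
    usable = chunk_size - (chunk_size // block_size) * block_overhead
    return ceil_div(file_size, usable)


def file_disk_space_bytes(file_size: int, pctversion_percent: int = 10,
                          chunk_size: int = 32768, block_size: int = 8192,
                          block_overhead: int = 60,
                          num_lob_per_index_entry: int = 8,
                          lob_index_entry_overhead: int = 46) -> int:
    """Step 3: chunk space scaled by PCTVERSION plus LOB index entries."""
    chunks = chunks_for_file(file_size, chunk_size, block_size, block_overhead)
    chunk_bytes = ceil_div(chunks * chunk_size * (100 + pctversion_percent), 100)
    index_bytes = ceil_div(chunks, num_lob_per_index_entry) * lob_index_entry_overhead
    return chunk_bytes + index_bytes


def tablespace_usage_bytes(file_sizes: Iterable[int],
                           pctversion_percent: int = 10) -> int:
    """Step 4: sum the per-file disk space over all files in the tablespace."""
    return sum(file_disk_space_bytes(size, pctversion_percent) for size in file_sizes)


if __name__ == "__main__":
    print(chunks_for_file(100000))
    print(file_disk_space_bytes(100000))
    print(tablespace_usage_bytes([100000, 512000]))
```

Passing pctversion_percent=1 shows the saving from reducing PCTVERSION, as described earlier in this section.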

Oracle Content Services creates multiple LOB columns. The space calculation must be made for each tablespace based upon the amount of content that will qualify for storage in that tablespace.

Oracle Content Services Metadata and Infrastructure

The Oracle Content Services server keeps persistent information about the file system and the contents of that file system in database tables. These tables and their associated structures are stored in the Oracle Content Services Primary tablespace. This tablespace contains approximately 300 tables and 500 indexes. These structures are required to support both the file system and the various protocols and user interfaces that make use of that file system.

The administration and planning tasks for this space should be very similar to operations on a normal Oracle database installation. The administrator of the system should plan for approximately 6K of overhead for each file to be used from this tablespace, or about 2% of the overall content. If there is a significant amount of custom metadata, such as categories, this overhead will be larger.

The initial disk space allocated for this tablespace is approximately 50MB for a default install. Of this 50MB, 16MB is actually used at the completion of installation. This includes instantiations for all required tables and indexes and the metadata required for the approximately 700 files that are loaded into Oracle Content Services as part of the install. Different tables and indexes within this tablespace will grow at different rates depending on which features of Oracle Content Services are used in a particular installation.

Oracle Text

When Oracle Content Services works in conjunction with Oracle Text, it enables users to access powerful search capabilities on the files stored within Oracle Content Services. Disk space for these capabilities is divided among three distinct tablespaces for optimal performance.

The Oracle Text Data tablespace contains tables which hold the text tokens (separate words) that exist within the various indexed files. The storage for these text tokens is roughly proportional to the ASCII content of the file.

The ASCII content percentage varies depending on the format of the original file. Text files only have white space as their non-ASCII content and therefore incur a greater per-file percentage overhead. File types such as Microsoft Word or PowerPoint contain large amounts of data required for formatting that does not qualify as text tokens. The per-file percentage on these types of files is therefore lower. On a system with diverse content types, the expected overhead is approximately 8% of the sum of the original sizes of the indexed files.

Table 6-8 offers general guidelines for the amount of ASCII text in a file for several popular formats:

Table 6-8 Average ASCII Content per File Type

Format | Plain ASCII Content as Percentage of File Size | Typical Percentage of All File Content (see Footnote 1)
Microsoft Excel (see Footnote 2) | 250% | 4%
ASCII | 100% | 2%
HTML | 90% | 10%
Rich Text Format | 80% | 2%
Microsoft Word | 70% | 13%
Acrobat PDF | 10% | 18%
Microsoft PowerPoint | 1% | 3%
Images (JPEG, BMP), compressed files (Zip, TAR), binary files, and so on | 0% | 50%
Total | | 100%


Footnote 1 From statistics of Oracle Corporation's internal usage of Oracle Content Services.

Footnote 2 By default, Oracle Text indexes each number in an Excel file as a separate word. Excel stores a number more efficiently than its ASCII equivalent, which is why the ASCII content as a percentage of the file size is greater than 100%.

The Oracle Text Keymap tablespace contains the tables and indexes required to translate from the Oracle Content Services locator of a file (the Oracle Content Services DocID) to the Oracle Text locator of that same file (the Oracle Text DocID). The expected space utilization for this tablespace is approximately 70 bytes for each indexed file.

The Oracle Text Index tablespace contains the B*tree database index that is used against the text token information stored in the Oracle Text Data tablespace. This will grow as a function of the ASCII content just as the Oracle Text Data tablespace does. On a system with diverse content types the expected overhead is approximately 4% of the sum of the ASCII content of the files, or approximately 1% of the sum of the total sizes of the indexed files.
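The rules of thumb in this section can be combined into a rough estimate for the three Oracle Text tablespaces. The Python sketch below is illustrative only: the function name is hypothetical, and the percentages are the approximate figures quoted above for a system with diverse content types, so results for a different content mix may differ considerably.

```python
from typing import Dict


def oracle_text_space_bytes(indexed_file_bytes: int,
                            indexed_file_count: int) -> Dict[str, int]:
    """Approximate Oracle Text tablespace usage, keyed by tablespace name."""
    return {
        # Oracle Text Data: about 8% of the original sizes of the indexed files.
        "IFS_CTX_I": indexed_file_bytes * 8 // 100,
        # Oracle Text Keymap: about 70 bytes for each indexed file.
        "IFS_CTX_K": indexed_file_count * 70,
        # Oracle Text Index: about 1% of the total sizes of the indexed files.
        "IFS_CTX_X": indexed_file_bytes * 1 // 100,
    }


if __name__ == "__main__":
    # Hypothetical deployment: 100GB of indexed content in 1 million files.
    estimate = oracle_text_space_bytes(100_000_000_000, 1_000_000)
    for tablespace, size in estimate.items():
        print(tablespace, size)
```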

Disk Space Requirements: Sample Deployment

This section details various requirements for disk space, and offers guidance as to how necessary disk space will expand with the addition of files to the server.

Based on experience running Oracle Content Services for Oracle Corporation's internal usage, the disk overhead of Oracle Content Services for a large system (hundreds of gigabytes of file content) is approximately as detailed in Table 6-9.

Table 6-9 Disk Space Requirements Summary

Tablespace Overhead Type | Overhead Versus Total Raw File Content (see Footnote 1) | Primarily Determined By
File Storage | 12% | Size of files relative to chunk size (32KB by default)
Oracle Text | 5% | Amount of ASCII content in all files
Metadata | 2% | Number of folders, files, and so on
General Oracle Storage | 1% | Fixed, not configurable, database settings for TEMP, UNDO, and other tablespaces
Total | 20% |



Footnote 1 This does not include: Mirroring for backup and reliability; Redo log size, which should be determined by how many files are inserted and their size; Unused portion of the last extent in each database file (which will occur with pre-created database files or which may be large if the next extent setting is large).

See Oracle Database Concepts for explanations of the terms Large Object, tablespace, chunk size, and extents.


Note:

Given that a large percentage of the overhead is in LOB overhead, the overhead for your Oracle Content Services instance may vary depending on the average and median sizes of files.

Oracle Content Services Backup and Recovery

Planning for failures is one of the most important jobs of any system administrator or database administrator (DBA). Be sure to implement a daily or weekly backup plan that meets the needs of your business and operations environment. Take advantage of the backup capabilities that are built into the Oracle database.

Always back up the system before upgrading, migrating new data, or making other major changes. See Oracle Database Backup and Recovery Basics for additional information.


Note:

In addition to the Oracle Content Services schema, there are three special schemas that ensure secure connectivity to other systems. When you back up your system, make sure to include these schemas.

The special schema names are derived from the Oracle Content Services schema name. For example, if the Oracle Content Services schema name is CONTSRV, the additional schemas are CONTSRV$CM, CONTSRV$DR, and CONTSRV$ID.
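When scripting a backup checklist, the full set of schema names can be derived from the base schema name. The Python sketch below assumes that the $CM, $DR, and $ID suffixes shown in the example apply to any Oracle Content Services schema name; the function name is hypothetical.

```python
from typing import List


def schemas_to_back_up(content_services_schema: str) -> List[str]:
    """Return the base schema plus the three special schemas derived from it.

    Assumption: the suffixes follow the CONTSRV example given above.
    """
    special = [f"{content_services_schema}${suffix}" for suffix in ("CM", "DR", "ID")]
    return [content_services_schema] + special


if __name__ == "__main__":
    print(schemas_to_back_up("CONTSRV"))
```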