Oracle Files Planning Guide Release 2 (9.0.4.1) Part Number B10974-01 |
June 2003
Part No. B10974-01
This document presents planning information designed to help you make important decisions about how to configure and deploy Oracle Files.
The following sections are included in the document:
It is also recommended that you read Chapter 1, Concepts, in the Oracle Files Administrator's Guide for detailed information on Oracle Files architecture and integration with key Oracle technologies.
Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format, and contains markup to facilitate access by the disabled community. Standards will continue to evolve over time, and Oracle Corporation is actively engaged with other market-leading technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For additional information, visit the Oracle Accessibility Program Web site at:
Table 1 and Table 2 requirements are based on using the Oracle Collaboration Suite Middle-Tier Install.
The information in this table assumes that you are installing Oracle Files on its own middle-tier machine, and that Oracle Ultra Search and Oracle9iAS Unified Messaging (Email) will be run on a separate machine if you are also deploying those components.
Table 1 and Table 2 do not include requirements for Oracle Internet Directory. Oracle recommends that you install, configure, and run Oracle Internet Directory on a completely separate machine.
Description | Requirement |
---|---|
Number of machines | 1 |
Oracle Files users supported | 2 concurrent connected users [1] |
Number of CPUs | 1 (add 1 CPU if Oracle Text is being used for indexing) |
Minimum processor type | AIX CPU: All AIX-compatible processors |
RAM | 1 gigabyte |
Hard disk drive space and swap space | 8.5 gigabytes minimum total free hard disk drive space required, which includes 6 gigabytes of space required by the Oracle database and Oracle Collaboration Suite Middle-Tier Install, and 2 gigabytes of swap space |

[1] A concurrent connected user is a user performing operations during a particular hour.
The hardware requirements in Table 1, "Minimum Hardware Requirements for Single-Machine Deployment," can support approximately two Oracle Files concurrent connected users accessing two protocols moderately.
The hardware requirements in Table 2, "Minimum Hardware Requirements for Multiple-Machine Deployment for Production Environments," support a workgroup of about 50 Oracle Files concurrent connected users accessing all protocols moderately.
Hardware requirements for Oracle Files are primarily determined by the factors described in Table 3:
Hardware Resource | Middle-tier machine requirement variables | Database machine requirement variables |
---|---|---|
CPU |
||
Memory |
||
Disk Size |
N/A |
|
(not discussed in this document) |
N/A |
In order to determine hardware requirements, assumptions must be made about the type of work that users are performing. The following measurements are averages extrapolated from deployment of Oracle Files within Oracle Corporation (40,000+ users), and are generally applicable for projecting Oracle Files usage.
User Task | Number of Operations per Connected User per Hour |
---|---|
Folder opens | 8 |
Documents read / written | 10 |
Queries | 0.1 |
Note: These sizing guidelines may be inaccurate if your users perform significantly more operations than the averages detailed in Table 4.
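To make the user profile concrete, the per-user averages in Table 4 can be scaled to a whole population of connected users. The following Python sketch is illustrative only: the function name and dictionary keys are hypothetical, and the rates are simply the averages quoted in Table 4.

```python
# Average operations per connected user per hour, as listed in Table 4.
OPERATIONS_PER_USER_PER_HOUR = {
    "folder_opens": 8,
    "documents_read_or_written": 10,
    "queries": 0.1,
}


def hourly_operations(connected_users):
    """Scale the Table 4 per-user averages to a population of connected users."""
    return {
        task: rate * connected_users
        for task, rate in OPERATIONS_PER_USER_PER_HOUR.items()
    }


if __name__ == "__main__":
    for task, count in hourly_operations(1000).items():
        print(f"{task}: {count:g} per hour")
```

If measured rates for your own users are well above these figures, the sizing formulas in this guide should be scaled up accordingly.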
The most important decision regarding performance and scalability is the choice of which protocols to use to access Oracle Files.
When possible, Oracle recommends using Wide Area Network (WAN) protocols as the primary mechanism for accessing Oracle Files, and using Local Area Network (LAN) protocols only as secondary protocols, or only for those users who are unable to use WAN protocols.
WAN protocols include:
LAN protocols include:
WAN protocols generally are much more efficient in terms of network round trips, and perform fewer server operations to accomplish end user requests. Both of these factors improve performance for the end user.
For example, Oracle recommends using Web Folders with Microsoft Office 2000/XP for viewing and editing documents on Windows machines, rather than using SMB.
The advantages of Web Folders over SMB, AFP, or NFS are as follows (SMB will be used as the example):
Since each user session takes approximately 1MB of server memory, SMB increases the session memory by approximately 10 times (because its concurrency rates are 10 times higher).
The disadvantages of using Web Folders are:
These sizing guidelines are based on benchmarks of 10,000 concurrent connected users on Sun Microsystems hardware. The guidelines have been validated against measurements taken from internal Oracle Corporation production usage of Oracle Files by 40,000 Oracle employees, with 17 million documents and 4TB of content. This system uses Intel Linux hardware for the middle-tier machines, and Sun hardware for the database.
Note: These sizing guidelines may be inaccurate if your users perform significantly more operations than the averages detailed in Table 4.
This section offers rough calculations for determining appropriate hardware sizing.
Component | Each Middle-Tier Machine | Each Database Machine [1] |
---|---|---|
Number of CPUs [2] | roundup(peak concurrent connected users / 250 + 33% [3] headroom) | roundup(peak concurrent connected users / 250) + 33% [4] headroom + 1 [5] |
Needed usable disk space [6] | 500MB for software | 4.5GB [7] + total raw file size + (total raw file size * 20% [8]) |
Total machine memory | If HTTP is the primary protocol: 480MB [9] + (3.6MB [10] * peak concurrent connected users). If HTTP is not the primary protocol: 480MB + (1MB [11] * peak concurrent connected users [12] * average number of sessions in use by each concurrent connected user [13]) + (3KB * number of objects desired in the Java object cache [14]) + (8MB * number of connections to the database [15]) | + 128MB [16] + (1MB * number of connections to the database) + (500 bytes * number of documents [17]) + (100KB [18] * peak concurrent connected users) |

[1] The middle-tier machine and database machine configurations may be combined into a single machine.
[2] For Sun SPARC Solaris 400MHz UltraSPARC-II processors with 8MB secondary cache; other RISC processors should perform roughly in proportion to their MHz. Intel Pentium III or IV processors on Windows machines should perform roughly in proportion to half their MHz (800MHz Pentium ≈ 400MHz RISC).
[3] In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated.
[4] In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated.
[5] One CPU is for the background Oracle Text indexing of new document content, if Oracle Text indexing is desired.
[6] This does not include: mirroring for backup and reliability; redo log size, which should be determined by how many documents are inserted and their size; or the unused portion of the last extent in each database file (which will occur with pre-created database files, or which may be large if the next extent setting is large).
[7] Oracle software and initial database configuration.
[8] Assuming Oracle Text is being used to index the content; if not, subtract 5%.
[9] For the first Oracle Files middle-tier machine. See Table 6 for more information.
[10] The number 3.6MB is calculated assuming: 1) 1.6 sessions per concurrent connected user. This assumes the primary interface for Oracle Files is through the HTTP node. The additional 0.6 sessions are HTTP sessions which will be started whenever a user of the Oracle Files Web UI starts another Oracle Files Web UI, or if the user accesses Web Folders or Oracle File Sync; 2) 0.1 connection pool connections per concurrent connected user, assuming the stated user profile; and 3) 400 objects in the Java data cache per concurrent connected user. This assumes 50 documents per folder and 8 folders opened per hour, assuming the stated user profile.
[11] This number is high by design. Oracle Files has been optimized to reduce database CPU by using middle-tier memory to cache items. This ensures a more scalable and less expensive system because the database machine is less of a scalability bottleneck, and because memory on one- and two-processor middle-tier machines is typically cheaper than memory or CPU on high-end database machines (machines with large amounts of attached storage or with many processors).
[12] Oracle recommends limiting the number of peak concurrent sessions using IFS.SERVICE.MaximumConcurrentSessions in the service configuration. Oracle has only tested with Java heaps up to 2GB. With this constraint, this implies up to about 700 concurrent connected users per node if each user takes 1.6 sessions and each session is 1MB (700 * 1.6 * 1MB = 1,120MB), and each user needs 400 Java data cache objects and each object takes 3KB (700 * 400 * 3KB = approximately 840MB), for a total of approximately 1,960MB. For each additional node on the same machine, the node overhead must be added. See Table 6 for more information. The HTTP/WebDAV memory overhead includes memory for 10 simultaneous guest user requests, so guest users should not be counted as connected users for HTTP/WebDAV access.
[13] For the HTTP node, use 1.6. For SMB, it may be as high as 10, because for each SMB concurrent connected user there may be an additional 9 non-concurrent, but connected, users.
[14] Calculate by multiplying the number of folder opens in the peak hour by the number of objects per folder by the number of peak concurrent connected users. This is the value of IFS.SERVICE.DATACACHE.Size.
[15] The number of database connections depends on how many simultaneous read or write operations are being performed. Assume 0.1 database connections per user if using a standard user profile. This is the sum of IFS.SERVICE.CONNECTIONPOOL.WRITEABLE.MaximumSize and IFS.SERVICE.CONNECTIONPOOL.READONLY.MaximumSize for each service.
[16] A minimum amount of memory to run a very small Oracle Server.
[17] The database buffer cache in the default Oracle database configuration is sufficient for about 50,000 documents. Above 50,000 documents, allocate 500 bytes per document for optimal performance, including wildcard filename searches. If no wildcard filename searches are anticipated, this number may be reduced.
[18] 100KB is calculated by assuming 0.1 database connections are needed per concurrent connected user, as in the stated user profile. Each database connection takes approximately 1MB of database memory.
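The CPU, middle-tier memory, and database disk formulas in Table 5 can be sketched as a small Python calculator. This is a rough illustration rather than an Oracle-supplied tool: the function names are hypothetical, and the reading of "+ 33% headroom" as a multiplication by 1.33 is an assumption (it matches the footnote that no more than 75% of the CPU should be allocated).

```python
import math

USERS_PER_CPU = 250   # Table 5: one CPU per 250 peak concurrent connected users
HEADROOM = 0.33       # Table 5: 33% headroom


def middle_tier_cpus(peak_users):
    # Assumption: "+ 33% headroom" means multiplying the base figure by 1.33.
    return math.ceil(peak_users / USERS_PER_CPU * (1 + HEADROOM))


def database_cpus(peak_users, oracle_text=True):
    # Assumption: headroom is applied to the rounded-up base count and the
    # result is rounded up again; one extra CPU is added for Oracle Text.
    base = math.ceil(peak_users / USERS_PER_CPU)
    return math.ceil(base * (1 + HEADROOM)) + (1 if oracle_text else 0)


def middle_tier_memory_mb(peak_users, http_primary=True, sessions_per_user=1.6,
                          cache_objects=0, db_connections=0):
    if http_primary:
        # 480MB + (3.6MB * peak concurrent connected users)
        return 480 + 3.6 * peak_users
    # 480MB + (1MB * users * sessions per user)
    #       + (3KB * cache objects) + (8MB * database connections)
    return (480
            + 1.0 * peak_users * sessions_per_user
            + 3 / 1024 * cache_objects
            + 8 * db_connections)


def database_disk_gb(raw_content_gb, overhead=0.20):
    # 4.5GB + total raw file size + (total raw file size * 20%)
    return 4.5 + raw_content_gb + raw_content_gb * overhead


if __name__ == "__main__":
    users = 1000
    print("Middle-tier CPUs:", middle_tier_cpus(users))
    print("Database CPUs:", database_cpus(users))
    print("Middle-tier memory (MB):", middle_tier_memory_mb(users))
    print("Database disk (GB):", database_disk_gb(100))
```

When Oracle Text indexing is not used, pass oracle_text=False and, following the disk space footnote, reduce the overhead argument by 5 percentage points.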
The approximate minimum memory overhead on the middle-tier machines for each component is detailed in Table 6:
This section details various requirements for disk space, and offers guidance as to how necessary disk space will expand with the addition of documents to the server.
Based on experience running Oracle Files for Oracle Corporation's internal usage, the disk overhead of Oracle Files for a large system (hundreds of gigabytes of file content) is approximately as detailed in Table 7:
Tablespace Overhead Type | Overhead Versus Total Raw File Content [1] | Primarily Determined By |
---|---|---|
Document Storage | 12% | Size of documents relative to chunk size (32KB by default) |
Oracle Text | 5% | Amount of ASCII content in all documents |
Metadata | 2% | Number of folders, documents, etc. |
General Oracle Storage | 1% | Fixed, not configurable, database settings for |
Total | 20% | |
See the Oracle Concepts Guide for explanation of the terms Large Object (LOB), tablespace, chunk size, and extents.
Given that a large percentage of the overhead is in LOB overhead, note that the overhead for your Oracle Files instance may vary depending on the average and median sizes of documents.
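As a rough illustration of how the Table 7 percentages translate into disk space, the following Python sketch applies each overhead fraction to a given amount of raw file content. The names are hypothetical, and the percentages are the approximate figures from Table 7, so actual overhead will vary with document sizes as noted above.

```python
# Overhead as a fraction of total raw file content, from Table 7.
TABLE_7_OVERHEAD = {
    "Document Storage": 0.12,
    "Oracle Text": 0.05,
    "Metadata": 0.02,
    "General Oracle Storage": 0.01,
}


def overhead_breakdown_gb(raw_content_gb):
    """Estimate the overhead in gigabytes for each Table 7 category."""
    return {
        kind: raw_content_gb * fraction
        for kind, fraction in TABLE_7_OVERHEAD.items()
    }


def total_overhead_gb(raw_content_gb):
    """Estimate the total overhead in gigabytes (about 20% of raw content)."""
    return sum(overhead_breakdown_gb(raw_content_gb).values())


if __name__ == "__main__":
    for kind, size in overhead_breakdown_gb(500).items():
        print(f"{kind}: {size:.1f} GB")
    print(f"Total: {total_overhead_gb(500):.1f} GB")
```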
Table 8 shows the different types of data stored in Oracle Files and describes the purpose of each tablespace. Each of these tablespaces will be discussed in further detail in subsequent sections of this document.
Typical tablespace storage space and disk I/O are detailed in Table 9:
Note the following issues regarding the information in Table 9:
- [...] db_block_cache. These measurements were taken on the Oracle-internal Oracle Files implementation, with 8GB db_block_cache, 17 million documents, and 40,000 named users.
- The IFS_MAIN tablespace is the most important tablespace to spread across disks for maximum I/O capacity.
- Disk I/O on the IFS_CTX_I, IFS_CTX_X, and IFS_CTX_K tablespaces is largely generated from Oracle Text batch processes (ctx_ddl.sync_index and ctx_ddl.optimize_index), which are not critical to end-user performance. Therefore, these tablespaces can be on disks with lower I/O capacity, if necessary.

The largest consumption of disk space will occur on the disks that actually contain the documents that reside within Oracle Files, namely the Indexed Medias tablespaces, Non-Indexed Medias tablespaces, and interMedia tablespaces. This section explains how the documents are stored and how to calculate the amount of space those documents will require.
As previously mentioned, documents stored in Oracle Files are actually stored in database tablespaces. Oracle Files makes use of the Large Object (LOB) facility of the Oracle Database. All documents are stored as Binary Large Objects (BLOBs), which is one type of LOB provided by the database. LOBs provide for transactional semantics much like the normal data stored in a database. In order to accomplish these semantics, LOBs must be broken down into smaller pieces which are individually modifiable and recoverable. These smaller pieces are referred to as chunks. Chunks are a group of one or more sequential database blocks from a tablespace that contains a LOB column.
Both database blocks and chunk information within those blocks (BlockOverhead) impose some amount of overhead for the stored data. BlockOverhead is presently 60 bytes per block, which consists of the block header, the LOB header, and the block checksum. Oracle Files configures its LOBs to have a 32K chunk size.
As an example, assume that the DB_BLOCK_SIZE parameter of the database is set to 8192 (8K). A chunk would require four contiguous blocks and impose an overhead of 4 * 60 = 240 bytes. The usable space within a chunk would be 32768 - 240 = 32528 bytes.
Each document stored in Oracle Files will consist of some integral number of chunks. Using the previous example, a 500K document will actually use 512000 / 32528 = 15.74 chunks, which rounds up to 16 chunks. Sixteen chunks will take up 16 * 32K = 524288 bytes. The chunking overhead for storing this document would then be 524288 - 512000 = 12288 bytes, which is 2.4% of the original document's size.
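The chunk arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical; the constants are the 32K chunk size and 60-byte BlockOverhead stated in this section, and the default block size of 8192 is the value used in the example.

```python
import math

CHUNK_SIZE = 32 * 1024   # Oracle Files LOB chunk size (32K)
BLOCK_OVERHEAD = 60      # bytes of overhead per database block


def usable_bytes_per_chunk(block_size=8192):
    """Bytes of document data that fit in one chunk."""
    blocks_per_chunk = CHUNK_SIZE // block_size
    return CHUNK_SIZE - blocks_per_chunk * BLOCK_OVERHEAD


def chunking_overhead(file_size, block_size=8192):
    """Return (chunks, bytes allocated, overhead bytes) for one document."""
    chunks = math.ceil(file_size / usable_bytes_per_chunk(block_size))
    allocated = chunks * CHUNK_SIZE
    return chunks, allocated, allocated - file_size


if __name__ == "__main__":
    chunks, allocated, overhead = chunking_overhead(512000)
    print(chunks, allocated, overhead)            # 16 524288 12288
    print(f"{overhead / 512000:.1%} overhead")    # 2.4% overhead
```

Running the same function on a very small file, for example chunking_overhead(1000), shows why documents smaller than one chunk carry a much larger percentage overhead: a full 32768-byte chunk is still allocated.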
The chunk size used by Oracle Files is set to optimize access times for documents. Note that small documents, documents less than one chunk, will incur a greater disk space percentage overhead since they must use at least a single chunk.
Another structure required for transactional semantics on LOBs is the LOB Index. Each LOB index entry can point to 8 chunks of a specific LOB object (NumLobPerIndexEntry = 8). In our continuing example, where a 500K document takes up 16 chunks, two index entries would be required for that object. Each entry takes 46 bytes (LobIndexEntryOverhead) and is then stored in an Oracle B*tree index, which in turn has its own overhead depending upon how fragmented that index becomes.
The last factor affecting LOB space utilization is the PCTVERSION parameter used when creating the LOB column. For information about how PCTVERSION works, please consult the Oracle9i SQL Reference.
Oracle Files uses the default PCTVERSION of 10% for the LOB columns it creates. This reduces the possibility of "ORA-22924 snapshot too old" errors occurring in read consistent views. So by default, a minimum of a 10 percent increase in chunking space must be added in to the expected disk usage to allow for persistent PCTVERSION chunks.
For large systems where disk space is an issue, Oracle recommends reducing PCTVERSION to 1, in order to reduce disk storage requirements. This may be done at any time in a running system using the following SQL commands:
alter table odmm_contentstore modify lob (globalindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob_t) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob_t) (pctversion 1);
alter table odmm_nonindexedstore modify lob (nonindexedblob2) (pctversion 1);
The steps for calculating LOB tablespace usage are as follows:
1. Calculate the number of chunks each file requires:
chunks = roundup(FileSize / (ChunkSize - ((ChunkSize / BlockSize) * BlockOverhead)))
For example, if FileSize = 100,000, ChunkSize = 32768, BlockSize = 8192, and BlockOverhead = 60, then:
chunks = roundup(100000 / (32768 - ((32768 / 8192) * 60))) = 4 chunks
2. Calculate the disk space each file requires:
FileDiskSpaceInBytes = roundup(chunks * ChunkSize * PctversionFactor) + (roundup(chunks / NumLobPerIndexEntry) * LobIndexEntryOverhead)
Hence, if chunks = 4, ChunkSize = 32768, PctversionFactor = 1.1, NumLobPerIndexEntry = 8, and LobIndexEntryOverhead = 46:
FileDiskSpaceInBytes = roundup(4 * 32768 * 1.1) + (roundup(4 / 8) * 46) = 144180 + 46 = 144226 bytes
3. Sum the disk space of all files stored in the tablespace:
TableSpaceUsage = sum(FileDiskSpaceInBytes) for all files stored
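The calculation steps above can be combined into a single Python sketch. The function and parameter names are hypothetical; the defaults are the values used in this section (32768-byte chunks, 8192-byte blocks, 60 bytes of BlockOverhead, PCTVERSION of 10, 8 chunks per LOB index entry, and 46 bytes per index entry). The index entry count is rounded up before being multiplied by the entry overhead, which is what the worked example does, and integer arithmetic is used so that rounding up is not affected by floating-point error.

```python
def ceil_div(numerator, denominator):
    """Integer division that rounds up."""
    return -(-numerator // denominator)


def file_disk_space_bytes(file_size, chunk_size=32768, block_size=8192,
                          block_overhead=60, pctversion=10,
                          lobs_per_index_entry=8, index_entry_overhead=46):
    """Estimate the LOB tablespace bytes consumed by one file."""
    usable = chunk_size - (chunk_size // block_size) * block_overhead
    chunks = ceil_div(file_size, usable)
    # PctversionFactor is 1 + pctversion / 100 (1.1 for the default of 10).
    chunk_bytes = ceil_div(chunks * chunk_size * (100 + pctversion), 100)
    index_bytes = ceil_div(chunks, lobs_per_index_entry) * index_entry_overhead
    return chunk_bytes + index_bytes


def tablespace_usage_bytes(file_sizes, **settings):
    """Sum the estimated disk space for all files stored in one tablespace."""
    return sum(file_disk_space_bytes(size, **settings) for size in file_sizes)


if __name__ == "__main__":
    print(file_disk_space_bytes(100000))                 # 144226
    print(file_disk_space_bytes(100000, pctversion=1))   # 132429
```

The second call shows the effect of lowering PCTVERSION to 1 for the same 100,000-byte file. This estimate does not include the B*tree overhead of the LOB index itself, which depends on how fragmented the index becomes.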
Oracle Files creates multiple LOB columns. The space calculation must be made for each tablespace based upon the amount of content that will qualify for storage in that tablespace.
The Oracle Files server keeps persistent information about the file system and the contents of that file system in database tables. These tables and their associated structures are stored in the Oracle Files Primary tablespace. This tablespace contains approximately 300 tables and 500 indexes. These structures are required to support both the file system and the various protocols and user interfaces that make use of that file system.
The administration and planning tasks of this space should be very similar to operations on a normal Oracle database installation. The administrator of the system should plan for approximately 6K of overhead per document to be used from this tablespace, or about 2% of the overall content. If there is a significant amount of custom metadata, such as categories, this overhead will be larger.
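The two rules of thumb above (about 6K per document, or about 2% of overall content) can be compared with a small Python sketch. The names are hypothetical, and treating 6K as 6 * 1024 bytes is an assumption. The second function shows the average document size at which the two rules agree.

```python
METADATA_BYTES_PER_DOCUMENT = 6 * 1024   # "approximately 6K" per document


def metadata_tablespace_bytes(document_count):
    """Estimate Primary tablespace growth from the number of documents."""
    return document_count * METADATA_BYTES_PER_DOCUMENT


def implied_average_document_bytes(metadata_fraction=0.02):
    """Average document size at which 6K per document equals 2% of content."""
    return METADATA_BYTES_PER_DOCUMENT / metadata_fraction


if __name__ == "__main__":
    print(metadata_tablespace_bytes(1_000_000))       # 6144000000
    print(round(implied_average_document_bytes()))    # 307200
```

If your documents are much smaller than roughly 300K on average, the per-document figure is likely the safer basis for planning, since the 2% rule would underestimate the metadata space.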
The initial disk space allocated for this tablespace is approximately 50MB for a default install. Of this 50MB, 16MB is actually used at the completion of installation. This includes instantiations for all required tables and indexes and the metadata required for the approximately 700 files that are loaded into Oracle Files as part of the install. Different tables and indexes within this tablespace will grow at different rates depending on which features of Oracle Files are used in a particular installation.
When Oracle Files works in conjunction with Oracle Text, it allows users some powerful search capabilities on the documents stored within Oracle Files. Disk space for these capabilities is divided among three distinct tablespaces for optimal performance.
The Oracle Text Data tablespace contains tables which hold the text tokens (separate words) that exist within the various indexed documents. The storage for these text tokens is roughly proportional to the ASCII content of the document.
The ASCII content percentage will vary depending on the format of the original document. Text files only have white space as their non-ASCII content and therefore will incur a greater per document percentage overhead. Document types such as Microsoft Word or PowerPoint contain large amounts of data required for formatting that does not qualify as text tokens. The per document percentage on these types of documents will therefore be lower. On a system with diverse content types the expected overhead is approximately 8% of the sum of the original sizes of the indexed documents.
Table 10 offers some general guidelines for the amount of ASCII text in a document for several popular formats:
Format | Plain ASCII Content as % of File Size | Typical Percentage of all Document Content [1] |
---|---|---|
Microsoft Excel [2] | 250% | 4% |
ASCII | 100% | 2% |
HTML | 90% | 10% |
Rich Text Format | 80% | 2% |
Microsoft Word | 70% | 13% |
Acrobat PDF | 10% | 18% |
Microsoft PowerPoint | 1% | 3% |
Images (JPEG, BMP), Compressed files (Zip, TAR), Binary files, etc. | 0% | 50% |
Total | | 100% |
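To estimate how much ASCII text a mixed document collection contains, the two columns of Table 10 can be combined into a weighted average. The Python sketch below is illustrative only: the names are hypothetical, and it assumes the Rich Text Format share of 2 is a percentage like the other rows. As printed, the shares add up to 102%, so the result is approximate.

```python
# Format: (ASCII content as a fraction of file size, share of all content)
TABLE_10 = {
    "Microsoft Excel": (2.50, 0.04),
    "ASCII": (1.00, 0.02),
    "HTML": (0.90, 0.10),
    "Rich Text Format": (0.80, 0.02),   # share assumed to be 2%
    "Microsoft Word": (0.70, 0.13),
    "Acrobat PDF": (0.10, 0.18),
    "Microsoft PowerPoint": (0.01, 0.03),
    "Images, compressed, and binary files": (0.00, 0.50),
}


def weighted_ascii_fraction(formats=TABLE_10):
    """ASCII content as a fraction of total content for a mix of formats."""
    return sum(ascii_fraction * share
               for ascii_fraction, share in formats.values())


if __name__ == "__main__":
    print(f"{weighted_ascii_fraction():.1%} of total content is ASCII text")
```

Replacing the shares with the actual format mix of your own content gives a more useful estimate for sizing the Oracle Text tablespaces.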
The Oracle Text Keymap tablespace contains the tables and indexes required to translate from the Oracle Files locator of a document (the Oracle Files DocID) to the Oracle Text locator of that same document (the Oracle Text DocID). The expected space utilization for this tablespace is approximately 70 bytes per indexed document.
The Oracle Text Index tablespace contains the B*tree database index that is used against the text token information stored in the Oracle Text Data tablespace. This will grow as a function of the ASCII content just as the Oracle Text Data tablespace does. On a system with diverse content types the expected overhead is approximately 4% of the sum of the ASCII content of the documents, or approximately 1% of the sum of the total sizes of the indexed documents.
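The per-tablespace figures quoted in this section (about 8% of indexed document size for Oracle Text Data, about 70 bytes per indexed document for Oracle Text Keymap, and about 1% of indexed document size for Oracle Text Index) can be combined into a rough estimate. The Python sketch below uses hypothetical names and assumes a system with diverse content types, as the text does; it is a planning aid, not a measurement.

```python
def oracle_text_tablespace_bytes(indexed_bytes, indexed_documents):
    """Rough size estimates for the three Oracle Text tablespaces."""
    return {
        # About 8% of the total size of the indexed documents.
        "Oracle Text Data": indexed_bytes * 8 // 100,
        # About 70 bytes per indexed document.
        "Oracle Text Keymap": indexed_documents * 70,
        # About 1% of the total size of the indexed documents.
        "Oracle Text Index": indexed_bytes * 1 // 100,
    }


if __name__ == "__main__":
    estimate = oracle_text_tablespace_bytes(1_000_000_000, 10_000)
    for tablespace, size in estimate.items():
        print(f"{tablespace}: {size} bytes")
```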
For more information about Oracle Files Online hardware configuration, see http://technet.oracle.com/products/ifs/pdf/ofowhitepaper.pdf.
Oracle is a registered trademark of Oracle Corporation. Other names may be trademarks of their respective owners.
Copyright © 2003 Oracle Corporation.
All Rights Reserved.