8 Understanding Data Access

One of the most important functions of a session is to provide access to a data source. This chapter explains session components specific to accessing a data source.

This chapter describes data access concepts unique to EclipseLink, including the following:

About Externally Managed Transactional Data Sources
About Data Source Login Types
About Data Source Platform Types
About Authentication
About Connections
About Connection Pools
About Data Partitioning Policies
About Clustered Databases and Oracle RAC

8.1 About Externally Managed Transactional Data Sources

EclipseLink transactional data sources are externally managed if the connection pool is managed by a transaction service (such as an application server-controlled transaction or a JTA transaction). A JTA-managed data source or connection pool is commonly used in Java EE applications and is normally required in EJB applications. Use an externally managed connection pool as follows:

Configure the session to use an ExternalTransactionController to integrate EclipseLink's unit of work with the external transaction service.
Use the external-transaction-control option to specify the connection's login and inform EclipseLink that the connection is maintained by the external controller.
You may need to configure the EclipseLink read connection pool or sequence connection pool to use a non-JTA connection pool in order to avoid transactional overhead.

8.2 About Data Source Login Types

The login (if any) associated with a session determines how the EclipseLink runtime connects to the project's data source.

A login includes details of data source access, such as authentication, use of connection pools, and use of external transaction controllers. A Login owns a data source platform.

A data source platform includes options specific to a particular data source including binding, use of native SQL, use of batch writing, and sequencing.

For projects that do not persist to a data source, a login is not required. For projects that do persist to a data source, a login is always required.

You can use a login in a variety of roles. A login's role determines where and how you create it. The login role you choose depends on the type of project you are creating and how you intend to use the login.

There is a session login type for each project type that persists to a data source.

Note that there is no XML login. EclipseLink XML projects are used for nonpersistent, in-memory object to XML data transformation and consequently there is no data source to log in to.

If you are creating a project that accesses a relational database, you must configure the project with a DatabaseLogin. Your choice of DatabasePlatform further customizes your project for a particular type of database.

8.3 About Data Source Platform Types

EclipseLink abstracts the details of your underlying data source using data source platform classes. A data source platform is owned by your project's Login.

To configure most platform options, you must use an amendment method or a preLogin event listener.

EclipseLink interacts with databases using structured query language (SQL). Because each database platform uses its own variation on the basic SQL language, EclipseLink must adjust the SQL it uses to communicate with the database to ensure that the application runs smoothly.

The type of database platform you choose determines the specific means by which the EclipseLink runtime accesses the database, including the type of Java Database Connectivity (JDBC) driver to use. JDBC is an application programming interface (API) that gives Java applications access to a database. EclipseLink relational projects rely on JDBC connections to read objects from, and write objects to, the database. EclipseLink applications use either individual JDBC connections or a JDBC connection pool, depending on the application architecture.

EclipseLink provides a variety of database-specific platforms that let you customize your project for your target database. For a list of supported database platforms, see Section A.1, "Database Support."

Specify your database platform at the project level for all sessions, or override this project-level configuration at the session level.

8.4 About Authentication

Authentication is the means by which a data source validates a user's identity and determines whether or not the user has sufficient privileges to perform a given action. Authentication plays a central role in data security and user accountability and auditing.

For two-tier applications, simple JDBC authentication is usually sufficient.

The following sections describe the different authentication strategies:

Simple JDBC Authentication
Oracle Database Proxy Authentication
Auditing

8.4.1 Simple JDBC Authentication

When you configure an EclipseLink database login with a user name and password, EclipseLink provides these credentials to the JDBC driver that you configure your application to use.

By default, EclipseLink writes passwords to and reads them from the sessions.xml file in encrypted form using JCE encryption. Optionally, you can configure a different encryption class.
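As a rough illustration of what a JCE-based encrypt/decrypt pair can look like, the following sketch uses only the standard javax.crypto API. The class name PasswordCipher, the choice of AES in GCM mode, and the key handling are all assumptions made for this example; the sketch does not reproduce EclipseLink's default encryption and is not the interface a custom encryption class must implement.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; AES/GCM is an assumption chosen for this sketch.
class PasswordCipher {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    PasswordCipher(SecretKey key) {
        this.key = key;
    }

    // Generates a fresh 128-bit AES key; a real deployment would load a stored key.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the password and returns Base64 text of the form IV + ciphertext.
    String encrypt(String plainPassword) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plainPassword.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + encrypted.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): splits off the IV, then decrypts and verifies the rest.
    String decrypt(String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because a random IV is generated for every call, encrypting the same password twice yields different text, while decrypt recovers the original value from either result.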

8.4.2 Oracle Database Proxy Authentication

EclipseLink supports proxy authentication with Oracle Database in Java SE applications and Java EE applications with the Oracle JDBC driver and external connection pools only.

Note:

EclipseLink does not support Oracle Database proxy authentication with JTA.

Oracle Database proxy authentication delivers the following security benefits:

A limited trust model, by controlling the users on whose behalf middle tiers can connect, and the roles the middle tiers can assume for the user.
Scalability, by supporting user sessions through Oracle Call Interface (OCI) and thick JDBC, and eliminating the overhead of reauthenticating clients.
Accountability, by preserving the identity of the real user through to the database, and enabling auditing of actions taken on behalf of the real user.
Flexibility, by supporting environments in which users are known to the database, and in which users are merely "application users" of which the database has no awareness.

Note:

Oracle Database supports proxy authentication in three-tier architectures only; it does not support it across multiple middle tiers.

For more information about authentication in Oracle Database, see "Preserving User Identity in Multitiered Environments" in the Oracle Database Security Guide.

Configure your EclipseLink database login to use proxy authentication to do the following:

address the complexities of authentication in a three-tier architecture (such as client-to-middle-tier and middle-tier-to-database authentication, and client reauthentication through the middle tier to the database)
enhance database audit information (even for triggers and stored procedures) by using a specific user for database operations, rather than the generic pool user
simplify VPD/OLS configuration by using a proxy user, rather than setting user information directly in the session context with stored procedures

8.4.3 Auditing

Regardless of what type of authentication you choose, EclipseLink logs the name of the user associated with all database operations. Example 8-1 shows the CONFIG level EclipseLink logs when a ServerSession connects through the main connection for the sample user "scott", and a ClientSession uses a proxy connection for the user "jeff".

Example 8-1 Logs with Oracle Database Proxy Authentication

[TopLink Config]--ServerSession(13)--Connection(14)--Thread(Thread[main,5,main])--connecting(DatabaseLogin( platform=>Oracle9Platform   user name=> "scott" connector=>OracleJDBC10_1_0_2ProxyConnector datasource name=>DS))
[TopLink Config]--ServerSession(13)--Connection(34)--Thread(Thread[main,5,main])--Connected: jdbc:oracle:thin:@localhost:1521:orcl
User: SCOTT
[TopLink Config]--ClientSession(53)--Connection(54)--Thread(Thread[main,5,main])--connecting(DatabaseLogin(platform=>Oracle9Platform user name=> "scott" connector=>OracleJDBC10_1_0_2ProxyConnector datasource name=>DS))
[TopLink Config]--ClientSession(53)--Connection(56)--Thread(Thread[main,5,main])--Connected: jdbc:oracle:thin:@localhost:1521:orcl
User: jeff

Your database server likely provides additional user auditing options. Consult your database server documentation for details.

Alternatively, you may consider using the EclipseLink unit of work in conjunction with your database schema for auditing purposes.

8.5 About Connections

A connection is an object that provides access to a data source by way of the driver you configure your application to use. Relational projects use JDBC to connect to the data source; EIS projects use JCA. EclipseLink uses the interface org.eclipse.persistence.internal.databaseaccess.Accessor to wrap data source connections. This interface is accessible from certain events.

Typically, when using a server session, EclipseLink uses different connections for reading and for writing. This lets you use nontransactional connections for reading and avoid maintaining connections when not required.

By default, an EclipseLink server session acquires write connections lazily: that is, only during the commit operation of a unit of work. Alternatively, you can configure EclipseLink to acquire a write connection at the time you acquire a client session.
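The difference between lazy and eager acquisition can be pictured with a small stand-alone sketch. The class LazyWriteConnection and its method names are hypothetical and use only the Java standard library; they are not EclipseLink API, and the Supplier simply stands in for whatever pool hands out connections.

```java
import java.util.function.Supplier;

// Hypothetical holder that defers taking a connection until commit time.
class LazyWriteConnection<C> {
    private final Supplier<C> pool; // stand-in for a write connection pool
    private C connection;

    LazyWriteConnection(Supplier<C> pool) {
        this.pool = pool;
    }

    // True once a connection has actually been taken from the pool.
    synchronized boolean isAcquired() {
        return connection != null;
    }

    // Called at commit: the first call takes a connection, later calls reuse it.
    synchronized C forCommit() {
        if (connection == null) {
            connection = pool.get();
        }
        return connection;
    }
}
```

In this model, a client that only reads never calls forCommit and therefore never occupies a write connection, which is the benefit of the lazy default. An eager variant would call pool.get() in the constructor instead.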

Connections can be allocated from internal or external connection pools.

8.6 About Connection Pools

A connection pool is a service that creates and maintains a shared collection (pool) of data source connections on behalf of one or more clients. The connection pool provides a connection to a process on request, and returns the connection to the pool when the process is finished using it. When it is returned to the pool, the connection is available for other processes. Because establishing a connection to a data source can be time-consuming, reusing such connections in a connection pool can improve performance.
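The borrow-and-return cycle described above can be sketched in a few lines of plain Java. SimplePool is a hypothetical class written for this illustration, with a generic type standing in for a real data source connection; it is not EclipseLink's pool implementation and omits concerns such as connection validation and growth between a minimum and maximum size.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration of connection pooling; not EclipseLink's own pool class.
class SimplePool<C> {
    private final BlockingQueue<C> idle;

    // Creates all connections up front so later requests can reuse them.
    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Hands out an idle connection, waiting up to timeoutMillis; returns null on timeout.
    C acquire(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    // Returns a connection to the pool, making it available to other callers.
    void release(C connection) {
        idle.add(connection); // throws IllegalStateException if more are released than acquired
    }

    // Number of connections currently idle in the pool.
    int available() {
        return idle.size();
    }
}
```

The expensive factory call runs only while the pool is being built; every later acquire reuses an existing connection, which is where the performance gain comes from.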

EclipseLink uses connection pools to manage and share the connections used by server and client sessions. This feature reduces the number of connections required and allows your application to support many clients.

You can configure your session to use internal connection pools provided by EclipseLink or external connection pools provided by a JDBC driver or Java EE container.

You can use connection pools in your EclipseLink application for a variety of purposes, such as reading, writing, sequencing, and other application-specific functions.

This section describes the following types of connection pools:

Internal Connection Pools
External Connection Pools
Default (Write) and Read Connection Pools
Sequence Connection Pools
Application-Specific Connection Pools

8.6.1 Internal Connection Pools

For non-Java EE applications, you typically use internal connection pools. By default, EclipseLink sessions use internal connection pools.

Using internal connection pools, you can configure the default (write) and read connection pools and you can create additional connection pools for object identity, or any other purpose.

Using internal connection pools, you can optimize the creation of read connections for applications that read data only to display it and only infrequently modify data.

8.6.2 External Connection Pools

For Java EE applications, you typically use external connection pools.

If you are using an external transaction controller (JTA), you must use external connection pools to integrate with the JTA.

Using external connection pools, you can use Java to configure the default (write) and read connection pools and create additional connection pools for object identity, or any other purpose.

8.6.3 Default (Write) and Read Connection Pools

A server session provides a read connection pool and a write connection pool. These could be different pools, or if you use external connection pooling, the same connection pool.

All read queries use connections from the read connection pool and all queries that write changes to the data source use connections from the write connection pool. You can configure attributes of the default (write) and read connection pools.

Whenever a new connection is established, EclipseLink uses the connection configuration you specify in your session's DatasourceLogin. Alternatively, when you use an external transaction controller, you can define a separate connection configuration for a read connection pool to avoid the additional transactional overhead, if appropriate.

8.6.4 Sequence Connection Pools

An essential part of maintaining object identity is sequencing: managing the assignment of unique values to distinguish one instance from another. For more information, see Section 9.2, "About Cache Type and Size".

Sequencing involves reading and writing a special sequence resource maintained by your data source.

By default, EclipseLink includes sequence operations in a separate transaction. This avoids complications during the write transaction, which may lead to deadlocks over the sequence resource. However, when using an external transaction controller (such as a JTA data source or connection pool), EclipseLink cannot use a different transaction for sequencing. Use a sequence connection pool to configure a non-JTA transaction pool for sequencing. This is required only for table sequencing, not for native sequencing.

In each server session, you can create one connection pool, called a sequence connection pool, that EclipseLink uses exclusively for sequencing. With a sequence connection pool, EclipseLink satisfies a request for a new object identifier outside of the transaction from which the request originates. This allows EclipseLink to immediately commit an update to the sequence resource, which avoids deadlocks.

Note:

If you use a sequence connection pool and the original transaction fails, the sequence operation does not roll back.
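The consequence described in this note, identifiers that are consumed but never stored, can be seen in a small in-memory model. SequenceGapDemo is a hypothetical class for illustration only: an AtomicLong stands in for the row in the sequence table, and a boolean flag simulates a write transaction that fails before commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory model; the counter stands in for a row in a sequence table.
class SequenceGapDemo {
    private final AtomicLong sequenceRow = new AtomicLong();
    private final List<Long> committedIds = new ArrayList<>();

    // Models the sequence connection pool: the increment takes effect at once,
    // independently of whichever transaction asked for the identifier.
    long nextId() {
        return sequenceRow.incrementAndGet();
    }

    // Models a write transaction that inserts one object. If it fails, the insert
    // is discarded, but the identifier it consumed is not given back.
    boolean insert(boolean failBeforeCommit) {
        long id = nextId();
        if (failBeforeCommit) {
            return false; // rollback: nothing is stored and the id is lost
        }
        committedIds.add(id);
        return true;
    }

    // Identifiers of the objects that were actually stored.
    List<Long> committedIds() {
        return new ArrayList<>(committedIds);
    }

    // Current value of the simulated sequence row.
    long lastSequenceValue() {
        return sequenceRow.get();
    }
}
```

After one successful insert, one failed insert, and another successful insert, the stored identifiers skip the value that the failed transaction consumed. Applications that use a sequence connection pool should therefore not rely on identifiers being contiguous.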

You should use a sequence connection pool if the following apply:

You use table sequencing (that is, non-native sequencing).
You use an external transaction controller (JTA).

You should not use a sequence connection pool if the following apply:

You do not use sequencing, or use the data source's native sequencing.
You have configured the sequence table to avoid deadlocks.
You use non-JTA data sources.

8.6.5 Application-Specific Connection Pools

When you use internal EclipseLink connection pools in a session, you can create one or more connection pools that you can use for any application purpose. These are called named connection pools, as you can give them any name you want and use them for any purpose.

Typically, use these named connection pools to provide pools of different security levels. For example, the "default" connection pool may only allow access to specific tables but the "admin" connection pool may allow access to all tables.

8.7 About Data Partitioning Policies

Data partitioning allows an application to scale its data across more than a single database machine. EclipseLink supports data partitioning at the Entity level to allow a different set of entity instances for the same class to be stored in a different physical database or different node within a database cluster. Both regular databases and clustered databases are supported. Data can be partitioned both horizontally and vertically.

Partitioning can be enabled on an entity, a relationship, a query, or a persistence unit.

To configure data partitioning, use the @Partitioned annotation and one or more partitioning policy annotations. The annotations for defining the different kinds of policies are:

@HashPartitioning: Partitions access to a database cluster by the hash of a field value from the object, such as the object's ID, location, or tenant. The hash indexes into the list of connection pools/nodes. All write or read requests for objects with that hash value are sent to the same server. If a query does not include the hash field as a parameter, it can be sent to all servers and unioned, or it can be left to the session's default behavior.
@PinnedPartitioning: Pins requests to a single connection pool/node. This allows for vertical partitioning.
@RangePartitioning: Partitions access to a database cluster by a field value from the object, such as the object's ID, location, or tenant. Each server is assigned a range of values. All write or read requests for objects with that value are sent to the same server. If a query does not include the field as a parameter, then it can either be sent to all servers and unioned, or left to the session's default behavior.
@ReplicationPartitioning: Sends requests to a set of connection pools/nodes. This policy is for replicating data across a cluster of database machines. Only modification queries are replicated.
@RoundRobinPartitioning: Sends requests in a round-robin fashion to the set of connection pools/nodes. It is for load balancing read queries across a cluster of database machines. It requires that the full database be replicated on each machine, so it does not support partitioning. The data should either be read-only, or writes should be replicated.
@UnionPartitioning: Sends queries to all connection pools and unions the results. This is for queries or relationships that span partitions when partitioning is used, such as on a ManyToMany cross partition relationship.
@ValuePartitioning: Partitions access to a database cluster by a field value from the object, such as the object's location or tenant. Each value is assigned a specific server. All write or read requests for objects with that value are sent to the same server. If a query does not include the field as a parameter, then it can be sent to all servers and unioned, or it can be left to the session's default behavior.
@Partitioning: Partitions access to a database cluster by a custom partitioning policy. A PartitioningPolicy class must be provided and implemented.
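To make the routing rules behind the hash, range, value, and round-robin policies more concrete, the following sketch picks a connection pool name for a given field value. PartitionRouter and its methods are hypothetical, written with the Java standard library only; they are not the EclipseLink policy classes. The pool names and the fallback to a default pool for unassigned values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router that maps a field value to a connection pool name.
class PartitionRouter {
    private final List<String> pools;
    private final AtomicInteger next = new AtomicInteger();

    PartitionRouter(List<String> pools) {
        if (pools.isEmpty()) {
            throw new IllegalArgumentException("at least one pool is required");
        }
        this.pools = new ArrayList<>(pools);
    }

    // Hash policy: the hash of the field value indexes into the list of pools.
    String byHash(Object fieldValue) {
        return pools.get(Math.floorMod(fieldValue.hashCode(), pools.size()));
    }

    // Round-robin policy: successive requests cycle through the pools in order.
    String roundRobin() {
        return pools.get(Math.floorMod(next.getAndIncrement(), pools.size()));
    }

    // Range policy: each key is the first value of a range owned by the mapped pool.
    static String byRange(NavigableMap<Long, String> rangeStarts, long fieldValue) {
        Map.Entry<Long, String> owner = rangeStarts.floorEntry(fieldValue);
        if (owner == null) {
            throw new IllegalArgumentException("no range covers " + fieldValue);
        }
        return owner.getValue();
    }

    // Value policy: each listed value has its own pool; others use defaultPool (assumed).
    static <K> String byValue(Map<K, String> assignments, K fieldValue, String defaultPool) {
        return assignments.getOrDefault(fieldValue, defaultPool);
    }
}
```

In every case the same field value always resolves to the same pool, which is what lets all reads and writes for a given object reach the same server. Math.floorMod keeps the index non-negative even when a hash code is negative.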

Partitioning policies are globally-named objects in a persistence unit and are reusable across multiple descriptors or queries. This improves the usability of the configuration, specifically with JPA annotations and metadata.

The persistence unit properties support adding named connection pools in addition to the existing configuration for read/write/sequence. A named connection pool must be defined for each node in the database cluster.

If a transaction modifies data from multiple partitions, JTA should be used to ensure 2-phase commit of the data. An exclusive connection can also be configured in the EntityManager to ensure only a single node is used for a single transaction.

8.8 About Clustered Databases and Oracle RAC

Some databases support clustering the database across multiple machines. Oracle RAC allows for a single database to span multiple different server nodes. Oracle RAC also supports table and node partitioning of data. A database cluster allows for any of the data to be accessed from any node in the cluster. However, it is generally more efficient to partition the data access to specific nodes, to reduce cross node communication.

EclipseLink partitioning can be used in conjunction with a clustered database to reduce cross node communication, and improve scalability.

To use partitioning with a database cluster, the following is required:

The partition policy should not enable replication, as the database cluster makes data available to all nodes.
The partition policy should not use unions, as the database cluster returns the complete query result from any node.
A data source and EclipseLink connection pool should be defined for each node in the cluster.
The application's data access and data partitioning should be designed to have each transaction only require access to a single node.
Use of an exclusive connection for an EntityManager is recommended to avoid having multiple nodes in a single transaction and avoid 2-phase commit.