Oracle® Database Concepts
12c Release 1 (12.1)
This chapter provides an overview of the Structured Query Language (SQL) and how Oracle Database processes SQL statements.
SQL (pronounced sequel) is the set-based, high-level declarative computer language with which all programs and users access data in an Oracle database. Although some Oracle tools and applications mask SQL use, all database tasks are performed using SQL. Any other data access method circumvents the security built into Oracle Database and potentially compromises data security and integrity.
SQL provides statements for tasks such as the following:
Creating, replacing, altering, and dropping objects
Inserting, updating, and deleting table rows
Controlling access to the database and its objects
There are two broad families of computer languages: declarative languages that are nonprocedural and describe what should be done, and procedural languages such as C++ and Java that describe how things should be done. SQL is declarative in the sense that users specify the result that they want, not how to derive it. The SQL language compiler performs the work of generating a procedure to navigate the database and perform the desired task.
SQL enables you to work with data at the logical level. You need to be concerned with implementation details only when you want to manipulate the data. For example, the following statement queries records for employees whose last name begins with K:
SELECT last_name, first_name FROM hr.employees WHERE last_name LIKE 'K%' ORDER BY last_name, first_name;
The database retrieves all rows satisfying the WHERE condition, also called the predicate, in a single step. The database can pass these rows as a unit to the user, to another SQL statement, or to an application. The user does not need to process the rows one by one, nor is the user required to know how the rows are physically stored or retrieved.
All SQL statements use the optimizer, a component of the database that determines the most efficient means of accessing the requested data. Oracle Database also supports techniques that you can use to make the optimizer perform its job better.
Oracle strives to follow industry-accepted standards and participates actively in SQL standards committees. Industry-accepted committees are the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). Both ANSI and the ISO/IEC have accepted SQL as the standard language for relational databases.
The latest SQL standard was adopted in July 2003 and is often called SQL:2003. One part of the SQL standard, Part 14, SQL/XML (ISO/IEC 9075-14) was revised in 2006 and is often referred to as SQL/XML:2006.
Oracle SQL includes many extensions to the ANSI/ISO standard SQL language, and Oracle Database tools and applications provide additional statements. The tools SQL*Plus, SQL Developer, and Oracle Enterprise Manager enable you to run any ANSI/ISO standard SQL statement against an Oracle database and any additional statements or functions available for those tools.
All operations performed on the information in an Oracle database are run using SQL statements. A SQL statement is a computer program or instruction that consists of identifiers, parameters, variables, names, data types, and SQL reserved words.
Note: SQL reserved words have special meaning in SQL and should not be used for any other purpose. For example, SELECT and UPDATE are reserved words and should not be used as table names.
A SQL statement must be the equivalent of a complete SQL sentence, such as:
SELECT last_name, department_id FROM employees
Oracle Database runs only complete SQL statements. A fragment, such as a SELECT list with no FROM clause, generates an error indicating that more text is required.
Data definition language (DDL) statements enable you to:
Create, alter, and drop schema objects and other database structures, including the database itself and database users. Most DDL statements start with the keywords CREATE, ALTER, or DROP.
Add a comment to the data dictionary (COMMENT).
DDL enables you to alter attributes of an object without altering the applications that access the object. For example, you can add a column to a table accessed by a human resources application without rewriting the application. You can also use DDL to alter the structure of objects while database users are performing work in the database.
Example 7-1 uses DDL statements to create the plants table and then uses DML to insert two rows in the table. The example then uses DDL to alter the table structure, grant and revoke privileges on this table to a user, and then drop the table.
CREATE TABLE plants
    ( plant_id    NUMBER PRIMARY KEY,
      common_name VARCHAR2(15) );

INSERT INTO plants VALUES (1, 'African Violet'); -- DML statement
INSERT INTO plants VALUES (2, 'Amaryllis');      -- DML statement

ALTER TABLE plants ADD
    ( latin_name VARCHAR2(40) );

GRANT SELECT ON plants TO scott;

REVOKE SELECT ON plants FROM scott;

DROP TABLE plants;
An implicit COMMIT occurs immediately before the database executes a DDL statement, and a COMMIT or ROLLBACK occurs immediately afterward. In Example 7-1, two INSERT statements are followed by an ALTER TABLE statement, so the database commits the two INSERT statements. If the ALTER TABLE statement succeeds, then the database commits this statement; otherwise, the database rolls back this statement. In either case, the two INSERT statements have already been committed.
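The implicit commit can be observed with a short sketch. The table name plants_demo is hypothetical, and the script assumes a test schema in which you are allowed to create and drop tables:

```sql
-- Hypothetical scratch table used only for this illustration
CREATE TABLE plants_demo (plant_id NUMBER PRIMARY KEY);

INSERT INTO plants_demo VALUES (1);   -- DML: starts a transaction

-- DDL: the database commits the pending INSERT before running this statement
ALTER TABLE plants_demo ADD (common_name VARCHAR2(15));

ROLLBACK;                             -- no uncommitted work remains to undo

SELECT COUNT(*) FROM plants_demo;     -- the inserted row is still present

DROP TABLE plants_demo;
```

Because the ALTER TABLE statement committed the INSERT, the later ROLLBACK has no effect on the row.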
Data manipulation language (DML) statements query or manipulate data in existing schema objects. Whereas DDL statements change the structure of the database, DML statements query or change the contents. For example, ALTER TABLE changes the structure of a table, whereas INSERT adds one or more rows to the table.
DML statements are the most frequently used SQL statements and enable you to:
Add new rows of data into a table or view (INSERT) by specifying a list of column values or using a subquery to select and manipulate existing data.
Update or insert rows conditionally into a table or view (MERGE).
The following example uses DML to query the employees table. The example uses DML to insert a row into employees, update this row, and then delete it:
SELECT * FROM employees;

INSERT INTO employees (employee_id, last_name, email, job_id, hire_date, salary)
  VALUES (1234, 'Mascis', 'JMASCIS', 'IT_PROG', '14-FEB-2008', 9000);

UPDATE employees SET salary=9100 WHERE employee_id=1234;

DELETE FROM employees WHERE employee_id=1234;
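A conditional update-or-insert can be sketched with MERGE. The example below is illustrative only: it reuses the sample employee 1234 from the preceding statements, and the inline source row built from DUAL is an assumption made to keep the sketch self-contained:

```sql
MERGE INTO employees e
USING (SELECT 1234 AS employee_id, 9100 AS salary FROM dual) src
ON (e.employee_id = src.employee_id)
WHEN MATCHED THEN
  -- The row exists: update it
  UPDATE SET e.salary = src.salary
WHEN NOT MATCHED THEN
  -- The row does not exist: insert it
  INSERT (employee_id, last_name, email, job_id, hire_date, salary)
  VALUES (src.employee_id, 'Mascis', 'JMASCIS', 'IT_PROG',
          DATE '2008-02-14', src.salary);
```

The ANSI date literal avoids depending on the session date format.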
A collection of DML statements that forms a logical unit of work is called a transaction. For example, a transaction to transfer money could involve three discrete operations: decreasing the savings account balance, increasing the checking account balance, and recording the transfer in an account history table. Unlike DDL statements, DML statements do not implicitly commit the current transaction.
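The money transfer described above might look like the following sketch. The tables savings_accounts, checking_accounts, and account_history, their columns, and the account numbers are all hypothetical:

```sql
-- Hypothetical schema: savings_accounts, checking_accounts, account_history
UPDATE savings_accounts
SET    balance = balance - 500
WHERE  account_id = 3209;

UPDATE checking_accounts
SET    balance = balance + 500
WHERE  account_id = 3208;

INSERT INTO account_history (from_account, to_account, amount, transfer_date)
VALUES (3209, 3208, 500, SYSDATE);

COMMIT;  -- makes all three changes permanent as one unit of work
```

If any statement fails before the COMMIT, a ROLLBACK undoes the statements that have already run, so the accounts are never left half-updated.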
A query is an operation that retrieves data from a table or view. SELECT is the only SQL statement that you can use to query data. The set of data retrieved from execution of a SELECT statement is known as a result set.
Table 7-1 shows two required keywords and two keywords that are commonly found in a SELECT statement. The table also associates capabilities of a SELECT statement with the keywords.
Keyword: SELECT. Required: Yes. Capability: Projection.
Specifies which columns should be shown in the result. Projection produces a subset of the columns in the table. An expression is a combination of one or more values, operators, and SQL functions that resolves to a value. The list of expressions that appears after the SELECT keyword and before the FROM clause is called the select list.
Keyword: FROM. Required: Yes. Capability: Joining.
Specifies the tables or views from which the data should be retrieved.
Keyword: WHERE. Required: No. Capability: Selection.
Specifies a condition to filter rows, producing a subset of the rows in the table. A condition specifies a combination of one or more expressions and logical (Boolean) operators and returns a value of TRUE, FALSE, or UNKNOWN.
Keyword: ORDER BY. Required: No.
Specifies the order in which the rows should be shown.
See Also: Oracle Database SQL Language Reference for SELECT syntax and semantics
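As an illustration of how the four keywords fit together, the following sketch queries the hr.employees sample table. The column alias annual_salary and the filter values are chosen only for the example:

```sql
SELECT   last_name, salary * 12 AS annual_salary  -- select list, including an expression
FROM     hr.employees                             -- source table
WHERE    department_id = 50 AND salary > 3000     -- condition that filters rows
ORDER BY annual_salary DESC;                      -- order of the rows in the result
```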
A join is a query that combines rows from two or more tables, views, or materialized views. Example 7-2 joins the employees and departments tables (FROM clause), selects only rows that meet specified criteria (WHERE clause), and uses projection to retrieve data from two columns (SELECT). Sample output follows the SQL statement.
SELECT email, department_name
FROM   employees
JOIN   departments
ON     employees.department_id = departments.department_id
WHERE  employee_id IN (100,103)
ORDER BY email;

EMAIL                     DEPARTMENT_NAME
------------------------- ------------------------------
AHUNOLD                   IT
SKING                     Executive
Most joins have at least one join condition, either in the FROM clause or in the WHERE clause, that compares two columns, each from a different table. The database combines pairs of rows, each containing one row from each table, for which the join condition evaluates to TRUE. The optimizer determines the order in which the database joins tables based on the join conditions, indexes, and any available statistics for the tables.
Join types include the following:
An inner join is a join of two or more tables that returns only rows that satisfy the join condition. For example, if the join condition is employees.department_id=departments.department_id, then rows that do not satisfy this condition are not returned.
An outer join returns all rows that satisfy the join condition and also returns rows from one table for which no rows from the other table satisfy the condition.
The result of a left outer join for tables A and B always contains all records of the left table A, even if the join condition does not match a record in the right table B. If no matching row from B exists, then B columns contain nulls for rows that have no match in B. For example, if not all employees are in departments, then a left outer join of employees (left table) and departments (right table) retrieves all rows in employees even if no rows in departments satisfy the join condition (employees.department_id is null).
The result of a right outer join for tables A and B contains all records of the right table B, even if the join condition does not match a row in the left table A. If no matching row from A exists, then A columns contain nulls for rows that have no match in A. For example, if not all departments have employees, a right outer join of employees (left table) and departments (right table) retrieves all rows in departments even if no rows in employees satisfy the join condition.
A full outer join is the combination of a left outer join and a right outer join.
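The three outer join types can be sketched against the employees and departments sample tables as follows. The table aliases e and d are illustrative:

```sql
-- Left outer join: every employee, with a null department_name
-- for employees that are not assigned to a department
SELECT e.last_name, d.department_name
FROM   employees e
LEFT OUTER JOIN departments d
ON     e.department_id = d.department_id;

-- Right outer join: every department, with a null last_name
-- for departments that have no employees
SELECT e.last_name, d.department_name
FROM   employees e
RIGHT OUTER JOIN departments d
ON     e.department_id = d.department_id;

-- Full outer join: unmatched rows from both tables are returned
SELECT e.last_name, d.department_name
FROM   employees e
FULL OUTER JOIN departments d
ON     e.department_id = d.department_id;
```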
If two tables in a join query have no join condition, then the database performs a Cartesian join. Each row of one table combines with each row of the other. For example, if employees has 107 rows and departments has 27, then the Cartesian product contains 107*27 = 2,889 rows. A Cartesian product is rarely useful.
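A Cartesian product can be requested explicitly with CROSS JOIN. The sketch below only counts the combined rows; the result depends on the current contents of the two tables:

```sql
-- Every employees row is paired with every departments row
SELECT COUNT(*) AS combined_rows
FROM   employees
CROSS JOIN departments;
```

With the row counts given above, the query would return the product of the two table sizes.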
A subquery is a SELECT statement nested within another SQL statement. Subqueries are useful when you must execute multiple queries to solve a single problem.
Each query portion of a statement is called a query block. In Example 7-3, the subquery in parentheses is the inner query block. The inner SELECT statement retrieves the IDs of departments with location ID 1800. These department IDs are needed by the outer query block, which retrieves names of employees in the departments whose IDs were supplied by the subquery.
SELECT first_name, last_name
FROM   employees
WHERE  department_id
IN     ( SELECT department_id
         FROM   departments
         WHERE  location_id = 1800 );
The structure of the SQL statement does not force the database to execute the inner query first. For example, the database could rewrite the entire query as a join of employees and departments, so that the subquery never executes by itself. As another example, the Virtual Private Database (VPD) feature could restrict the query of employees using a WHERE clause, so that the database decides to query the employees first and then obtain the department IDs. The optimizer determines the best sequence of steps to retrieve the requested rows.
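A join that is logically equivalent to the subquery form might be written as shown below. This is only a sketch of the idea, not the text that the database generates internally, and it assumes that department_id is unique in departments (as it is in the sample schema), so the join cannot return duplicate employee rows:

```sql
SELECT e.first_name, e.last_name
FROM   employees e
JOIN   departments d
ON     e.department_id = d.department_id
WHERE  d.location_id = 1800;
```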
An implicit query is a component of a DML statement that retrieves data without using a subquery. An UPDATE, DELETE, or MERGE statement that does not explicitly include a SELECT statement uses an implicit query to retrieve rows to be modified. For example, the following statement includes an implicit query for the rows whose last name is Baer:
UPDATE employees SET salary = salary*1.1 WHERE last_name = 'Baer';
The only DML statement that does not necessarily include a query component is an INSERT statement with a VALUES clause. For example, an INSERT INTO mytable VALUES (1) statement does not retrieve rows before inserting a row.
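The three cases can be contrasted in a short sketch. The table mytable is assumed to be a hypothetical single-column table, and the statements are intended for a test schema:

```sql
-- Implicit query: the WHERE clause identifies the rows to delete
DELETE FROM employees
WHERE  last_name = 'Baer';

-- Explicit query: a subquery supplies the department IDs
UPDATE employees
SET    salary = salary * 1.1
WHERE  department_id IN (SELECT department_id
                         FROM   departments
                         WHERE  location_id = 1800);

-- No query component: the row comes entirely from the VALUES clause
INSERT INTO mytable VALUES (1);
```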
See Also: "Virtual Private Database (VPD)"
Transaction control statements enable you to:
Undo the changes in a transaction, since the transaction started (ROLLBACK) or since a savepoint (ROLLBACK TO SAVEPOINT). A savepoint is a user-declared intermediate marker within the context of a transaction.
Note: The ROLLBACK statement ends a transaction, but ROLLBACK TO SAVEPOINT does not.
Specify whether a deferrable integrity constraint is checked following each DML statement or when the transaction is committed (SET CONSTRAINT).
The following example starts a transaction named Update salaries. The example creates a savepoint, updates an employee salary, and then rolls back the transaction to the savepoint. The example updates the salary to a different value and commits.
SET TRANSACTION NAME 'Update salaries';

SAVEPOINT before_salary_update;

UPDATE employees SET salary=9100 WHERE employee_id=1234; -- DML

ROLLBACK TO SAVEPOINT before_salary_update;

UPDATE employees SET salary=9200 WHERE employee_id=1234; -- DML

COMMIT COMMENT 'Updated salaries';
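Deferred constraint checking can be sketched in the same way. The table demo_accounts and the constraint demo_balance_ck are hypothetical names created only for this illustration:

```sql
-- Hypothetical table with a deferrable check constraint
CREATE TABLE demo_accounts (
  account_id NUMBER PRIMARY KEY,
  balance    NUMBER,
  CONSTRAINT demo_balance_ck CHECK (balance >= 0)
    DEFERRABLE INITIALLY IMMEDIATE
);

INSERT INTO demo_accounts VALUES (1, 100);
COMMIT;

-- Postpone checking of deferrable constraints until commit
SET CONSTRAINTS ALL DEFERRED;

UPDATE demo_accounts SET balance = balance - 150 WHERE account_id = 1; -- temporarily negative
UPDATE demo_accounts SET balance = balance + 200 WHERE account_id = 1; -- valid again

COMMIT;  -- the constraint is checked here and is satisfied
```

Without the SET CONSTRAINTS statement, the first UPDATE would fail immediately because the constraint is checked after each DML statement.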
Session control statements dynamically manage the properties of a user session. As explained in "Connections and Sessions", a session is a logical entity in the database instance memory that represents the state of a current user login to a database. A session lasts from the time the user is authenticated by the database until the user disconnects or exits the database application.
Session control statements enable you to:
Alter the current session by performing a specialized function, such as setting the default date format (ALTER SESSION).
The following statement dynamically changes the default date format for your session to 'YYYY MM DD HH24:MI:SS':
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY MM DD HH24:MI:SS';
Session control statements do not implicitly commit the current transaction.
See Also: Oracle Database SQL Language Reference for ALTER SESSION syntax and semantics
System control statements change the properties of the database instance. The only system control statement is ALTER SYSTEM. It enables you to change settings such as the minimum number of shared servers, terminate a session, and perform other system-level tasks.
Following are examples of system control statements:
ALTER SYSTEM SWITCH LOGFILE;

ALTER SYSTEM KILL SESSION '39, 23';
The ALTER SYSTEM statement does not implicitly commit the current transaction.
See Also: Oracle Database SQL Language Reference for ALTER SYSTEM syntax and semantics
Embedded SQL statements incorporate DDL, DML, and transaction control statements within a procedural language program. They are used with the Oracle precompilers. Embedded SQL is one approach to incorporating SQL in your procedural language applications. Another approach is to use a procedural API such as Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).
Embedded SQL statements enable you to:
Define, allocate, and release cursors (DECLARE CURSOR, OPEN, CLOSE)
Initialize descriptors (DESCRIBE)
Specify how error and warning conditions are handled (WHENEVER)
To understand how Oracle Database processes SQL statements, it is necessary to understand the part of the database called the optimizer (also known as the query optimizer or cost-based optimizer). All SQL statements use the optimizer to determine the most efficient means of accessing the specified data.
To execute a DML statement, Oracle Database may have to perform many steps. Each step either retrieves rows of data physically from the database or prepares them for the user issuing the statement.
Many different ways of processing a DML statement are often possible. For example, the order in which tables or indexes are accessed can vary. The steps that the database uses to execute a statement greatly affect how quickly the statement runs. The optimizer generates execution plans describing possible methods of execution.
The optimizer determines which execution plan is most efficient by considering several sources of information, including query conditions, available access paths, statistics gathered for the system, and hints. For any SQL statement processed by Oracle, the optimizer performs the following operations:
Evaluation of expressions and conditions
Inspection of integrity constraints to learn more about the data and optimize based on this metadata
Choice of optimizer goals
Choice of join orders
The optimizer generates most of the possible ways of processing a query and assigns a cost to each step in the generated execution plan. The plan with the lowest cost is chosen as the query plan to be executed.
Note: You can obtain an execution plan for a SQL statement without executing the plan. Only an execution plan that the database actually uses to execute a query is correctly termed a query plan.
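One way to obtain an execution plan without running the statement is the EXPLAIN PLAN statement together with the DBMS_XPLAN package. The query being explained is only an example, and the sketch assumes that a plan table is available to the session:

```sql
-- Generate a plan for the statement without executing it
EXPLAIN PLAN FOR
  SELECT last_name
  FROM   hr.employees
  WHERE  department_id = 50;

-- Display the plan that was just generated
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```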
You can influence optimizer choices by setting the optimizer goal and by gathering representative statistics for the optimizer. For example, you may set the optimizer goal to either of the following:
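Initial response time
The FIRST_ROWS hint instructs the optimizer to get the first row to the client as fast as possible.
Total throughput
The ALL_ROWS hint instructs the optimizer to get the last row to the client as fast as possible.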
A typical end-user, interactive application would benefit from initial response time optimization, whereas a batch-mode, non-interactive application would benefit from total throughput optimization.
See Also:
Oracle Database PL/SQL Packages and Types Reference for information about gathering optimizer statistics
Oracle Database SQL Tuning Guide for more information about the optimizer and using hints
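The optimizer goal can be set for a whole session or for a single statement. The following sketch shows both approaches; the queries and the row count of 10 are illustrative choices:

```sql
-- Session level: favor fast return of the first 10 rows
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_10;

-- Session level: favor total throughput
ALTER SESSION SET OPTIMIZER_MODE = ALL_ROWS;

-- Statement level: a hint sets the goal for this statement only
SELECT /*+ ALL_ROWS */ employee_id, last_name
FROM   hr.employees
WHERE  department_id > 50;
```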
The optimizer contains three main components, which are shown in Figure 7-2.
The input to the optimizer is a parsed query (see "SQL Parsing"). The optimizer performs the following operations:
The optimizer receives the parsed query and generates a set of potential plans for the SQL statement based on available access paths and hints.
The optimizer estimates the cost of each plan based on statistics in the data dictionary. The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan.
The optimizer compares the costs of plans and chooses the lowest-cost plan, known as the query plan, to pass to the row source generator (see "SQL Row Source Generation").
The query transformer determines whether it is helpful to change the form of the query so that the optimizer can generate a better execution plan. The input to the query transformer is a parsed query, which the optimizer represents as a set of query blocks.
See Also: "Query Rewrite"
The estimator determines the overall cost of a given execution plan, using measures that include the following:
Selectivity: This measure represents a fraction of rows from a row set. The selectivity is tied to a query predicate, such as last_name='Smith', or a combination of predicates.
Cost: This measure represents units of work or resource used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work.
If statistics are available, then the estimator uses them to compute the measures. The statistics improve the degree of accuracy of the measures.
The plan generator tries out different plans for a submitted query, and then picks the plan with the lowest cost. The optimizer generates subplans for each of the nested subqueries and unmerged views. The optimizer represents each subplan as a separate query block. The plan generator explores various plans for a query block by trying out different access paths, join methods, and join orders.
The adaptive query optimization capability changes plans based on statistics collected during statement execution. All adaptive mechanisms can execute a final plan for a statement that differs from the default plan. Adaptive optimization uses either dynamic plans, which choose among subplans during statement execution, or reoptimization, which changes a plan on executions after the current execution.
An access path is the technique that a query uses to retrieve rows. For example, a query that uses an index has a different access path from a query that does not. In general, index access paths are best for statements that retrieve a small subset of table rows. Full scans are more efficient for accessing a large portion of a table.
The database can use several different access paths to retrieve data from a table. The following is a representative list:
Full table scans: This type of scan reads all rows from a table and filters out those that do not meet the selection criteria. The database sequentially scans all data blocks in the segment, including those under the high water mark (HWM) that separates used from unused space (see "Segment Space and the High Water Mark").
Rowid scans: The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. The database first obtains the rowids of the selected rows, either from the statement WHERE clause or through an index scan, and then locates each selected row based on its rowid.
Index scans: This scan searches an index for the indexed column values accessed by the SQL statement (see "Index Scans"). If the statement accesses only columns of the index, then Oracle Database reads the indexed column values directly from the index.
Cluster scans: A cluster scan retrieves data from a table stored in an indexed table cluster, where all rows with the same cluster key value are stored in the same data block (see "Overview of Indexed Clusters"). The database first obtains the rowid of a selected row by scanning the cluster index. Oracle Database locates the rows based on this rowid.
Hash scans: A hash scan locates rows in a hash cluster, where all rows with the same hash value are stored in the same data block (see "Overview of Hash Clusters"). The database first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle Database then scans the data blocks containing rows with this hash value.
The optimizer chooses an access path based on the available access paths for the statement and the estimated cost of using each access path or combination of paths.
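To compare access paths for the same query, you can request a particular path with an optimizer hint and then inspect the resulting plan. In the sketch below, the alias e is illustrative, and the index name EMP_DEPARTMENT_IX is the one in the hr sample schema; the optimizer may disregard a hint that it cannot honor:

```sql
-- Request a full table scan of employees
SELECT /*+ FULL(e) */ employee_id, last_name
FROM   hr.employees e
WHERE  department_id = 50;

-- Request an index scan that uses EMP_DEPARTMENT_IX
SELECT /*+ INDEX(e emp_department_ix) */ employee_id, last_name
FROM   hr.employees e
WHERE  department_id = 50;
```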
The optimizer statistics are a collection of data that describe details about the database and the objects in the database. The statistics provide a statistically correct picture of data storage and distribution usable by the optimizer when evaluating access paths.
Optimizer statistics include the following:
Table statistics: These include the number of rows, number of blocks, and average row length.
Column statistics: These include the number of distinct values and nulls in a column and the distribution of data.
Index statistics: These include the number of leaf blocks and index levels.
System statistics: These include CPU and I/O performance and utilization.
Oracle Database gathers optimizer statistics on all database objects automatically and maintains these statistics as an automated maintenance task. You can also gather statistics manually using the DBMS_STATS package. This PL/SQL package can modify, view, export, import, and delete statistics.
Optimizer statistics are created for the purposes of query optimization and are stored in the data dictionary. These statistics should not be confused with performance statistics visible through dynamic performance views.
See Also: Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS package
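Because optimizer statistics are stored in the data dictionary, you can inspect them with ordinary queries. The following sketch reads table, column, and index statistics for the hr.employees sample table. It assumes that statistics have already been gathered and that your user can see the table through the ALL_ views:

```sql
-- Table statistics: row count, block count, average row length
SELECT num_rows, blocks, avg_row_len
FROM   all_tables
WHERE  owner = 'HR' AND table_name = 'EMPLOYEES';

-- Column statistics: distinct values, nulls, and histogram type
SELECT column_name, num_distinct, num_nulls, histogram
FROM   all_tab_col_statistics
WHERE  owner = 'HR' AND table_name = 'EMPLOYEES';

-- Index statistics: index levels and leaf blocks
SELECT index_name, blevel, leaf_blocks
FROM   all_indexes
WHERE  table_owner = 'HR' AND table_name = 'EMPLOYEES';
```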
A hint is a comment in a SQL statement that acts as an instruction to the optimizer. Sometimes the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to run a SQL statement. The application designer can use hints in SQL statements to specify how the statement should be run.
For example, suppose that your interactive application runs a query that returns 50 rows. This application initially fetches only the first 25 rows of the query to present to the end user. You want the optimizer to generate a plan that gets the first 25 records as quickly as possible so that the user is not forced to wait. You can use a hint to pass this instruction to the optimizer as shown in the SELECT statement and AUTOTRACE output in Example 7-4.
SELECT /*+ FIRST_ROWS(25) */ employee_id, department_id
FROM   hr.employees
WHERE  department_id > 50;

------------------------------------------------------------------------
| Id | Operation                   | Name              | Rows | Bytes
------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                   |   26 |   182
|  1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES         |   26 |   182
|* 2 |   INDEX RANGE SCAN          | EMP_DEPARTMENT_IX |      |
------------------------------------------------------------------------
The execution plan in Example 7-4 shows that the optimizer chooses an index on the employees.department_id column to find the first 25 rows of employees whose department ID is over 50. The optimizer uses the rowid retrieved from the index to retrieve the record from the employees table and return it to the client. Retrieval of the first record is typically almost instantaneous.
Example 7-5 shows the same statement, but without the optimizer hint.
SELECT employee_id, department_id
FROM   hr.employees
WHERE  department_id > 50;

------------------------------------------------------------------------
| Id | Operation              | Name              | Rows | Bytes | Cos
------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |                   |   50 |   350 |
|* 1 |  VIEW                  | index$_join$_001  |   50 |   350 |
|* 2 |   HASH JOIN            |                   |      |       |
|* 3 |    INDEX RANGE SCAN    | EMP_DEPARTMENT_IX |   50 |   350 |
|  4 |    INDEX FAST FULL SCAN| EMP_EMP_ID_PK     |   50 |   350 |
The execution plan in Example 7-5 joins two indexes to return the requested records as fast as possible. Rather than repeatedly going from index to table as in Example 7-4, the optimizer chooses a range scan of EMP_DEPARTMENT_IX to find all rows where the department ID is over 50 and place these rows in a hash table. The optimizer then chooses to read the EMP_EMP_ID_PK index. For each row in this index, it probes the hash table to find the department ID.
In this case, the database cannot return the first row to the client until the index range scan of EMP_DEPARTMENT_IX completes. Thus, this generated plan would take longer to return the first record. Unlike the plan in Example 7-4, which accesses the table by index rowid, the plan in Example 7-5 uses multiblock I/O, resulting in large reads. The reads enable the last row of the entire result set to be returned more rapidly.
See Also: Oracle Database SQL Tuning Guide to learn how to use optimizer hints
This section explains how Oracle Database processes SQL statements. Specifically, the section explains the way in which the database processes DDL statements to create objects, DML to modify data, and queries to retrieve data.
Figure 7-3 depicts the general stages of SQL processing: parsing, optimization, row source generation, and execution. Depending on the statement, the database may omit some of these steps.
When an application issues a SQL statement, the application makes a parse call to the database to prepare the statement for execution. The parse call opens or creates a cursor, which is a handle for the session-specific private SQL area that holds a parsed SQL statement and other processing information. The cursor and private SQL area are in the PGA.
During the parse call, the database performs the following checks:
Syntax check
Semantic check
Shared pool check
The preceding checks identify the errors that can be found before statement execution. Some errors cannot be caught by parsing. For example, the database can encounter deadlocks or errors in data conversion only during statement execution (see "Locks and Deadlocks").
As explained in "Overview of the Optimizer", query optimization is the process of choosing the most efficient means of executing a SQL statement. The database optimizes queries based on statistics collected about the actual data being accessed. The optimizer uses the number of rows, the size of the data set, and other factors to generate possible execution plans, assigning a numeric cost to each plan. The database uses the plan with the lowest cost.
The database must perform a hard parse at least once for every unique DML statement and performs optimization during this parse. DDL is never optimized unless it includes a DML component such as a subquery that requires optimization.
See Also: Oracle Database SQL Tuning Guide for detailed information about the query optimizer
The query plan takes the form of a combination of steps. Each step returns a row set. The rows in this set are either used by the next step or, in the last step, are returned to the application issuing the SQL statement.
A row source is a row set returned by a step in the execution plan along with a control structure that can iteratively process the rows. The row source can be a table, view, or result of a join or grouping operation.
During execution, if the data is not in memory, then the database reads the data from disk into memory. The database also takes out any locks and latches necessary to ensure data integrity and logs any changes made during the SQL execution. The final stage of processing a SQL statement is closing the cursor.
See Also: Oracle Database SQL Tuning Guide for detailed information about execution plans
Most DML statements have a query component. In a query, execution of a cursor places the rows generated by the query into the result set.
The database can fetch result set rows either one row at a time or in groups. In the fetch, the database selects rows and, if requested by the query, sorts the rows. Each successive fetch retrieves another row of the result until the last row has been fetched.
Oracle Database processes DDL differently from DML. For example, when you create a table, the database does not optimize the CREATE TABLE statement. Instead, Oracle Database parses the DDL statement and carries out the command.
See Also: Oracle Database Development Guide to learn about processing DDL, transaction control, and other types of statements