5.2 Important Terms and Concepts

Introduction to edge nodes, edge database, cell nodes, and Hadoop cluster integration.

These terms are key to understanding Query Server.

About Edge Nodes

An edge node in a Hadoop cluster is the interface between the Hadoop cluster and the outside network. Typically, edge nodes are used to run client applications and Hadoop cluster administration tools such as Cloudera Manager and Apache Ambari. Edge nodes can also act as a data gateway by providing HDFS access through NFS or HttpFS, or by running REST servers.
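To make the data gateway role concrete, the following minimal sketch builds the kind of REST URL a client outside the cluster might send to an HttpFS service on an edge node. HttpFS exposes the WebHDFS REST API under the /webhdfs/v1 prefix. The host name, user name, and HDFS path are hypothetical, and port 14000 is assumed to be the HttpFS port, which may differ in a given deployment.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, hdfs_path, op="LISTSTATUS", user="hdfs", port=14000):
    """Build a WebHDFS-style REST URL for an HttpFS gateway on an edge node.

    host, user, and port are illustrative assumptions, not values taken
    from any particular installation.
    """
    # The operation and the requesting user are passed as query parameters.
    query = urlencode({"op": op, "user.name": user})
    # quote() leaves "/" unescaped, so the HDFS path keeps its structure.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical edge node and directory; this only builds the URL, it does not send it.
url = httpfs_url("edge01.example.com", "/user/demo/sales")
print(url)
```

The client talks only to the edge node. It never needs direct network access to the NameNode or DataNodes behind it.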

About Cell Nodes

The Big Data SQL (BDS) cells run on the DataNodes and allow parts of query processing to be pushed down to the Hadoop cluster DataNodes where the data resides. This distributes the load and reduces the volume of data that must be sent to the database for processing, which can result in significant performance improvements on Big Data workloads.
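The effect of pushdown on data volume can be illustrated with a toy model. This is not how Big Data SQL cells are implemented. It is only a sketch that treats each list as the rows held by one DataNode and compares how many rows cross the network when a filter runs at the database versus at the data. All names and data are hypothetical.

```python
# Toy model: each inner list is the data held by one "DataNode".
# Rows are (region, amount) tuples; the data is made up for illustration.
partitions = [
    [("EMEA", 120), ("APAC", 75), ("EMEA", 30)],
    [("AMER", 200), ("EMEA", 55)],
    [("APAC", 10), ("AMER", 95), ("AMER", 5)],
]


def scan_without_pushdown(parts, region):
    """Ship every row to the database, then filter there."""
    shipped = [row for part in parts for row in part]
    result = [row for row in shipped if row[0] == region]
    return result, len(shipped)


def scan_with_pushdown(parts, region):
    """Filter on each node first, so only matching rows are shipped."""
    shipped = [row for part in parts for row in part if row[0] == region]
    return shipped, len(shipped)


plain_result, plain_rows = scan_without_pushdown(partitions, "EMEA")
pushed_result, pushed_rows = scan_with_pushdown(partitions, "EMEA")
print(f"rows shipped without pushdown: {plain_rows}")
print(f"rows shipped with pushdown:    {pushed_rows}")
```

Both scans return the same answer. Only the number of rows sent to the database changes, and the filtering work is spread across the nodes instead of being done in one place.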

Hadoop Cluster Integration

Oracle Big Data SQL includes the following three service roles, which you can manage in either Cloudera Manager or Apache Ambari:

  • Big Data SQL Query Server: Enables you to run SQL queries against the Hadoop cluster. Applications connect to this server using JDBC or SQL*Net.
  • Big Data SQL Agent: Manages the Big Data SQL installation and is also used by the Copy to Hadoop feature.
  • Big Data SQL Server: Also known as Big Data SQL Cells. Allows parts of query processing to be pushed down to the Hadoop cluster DataNodes where the data resides.
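Because applications reach the Query Server through JDBC or SQL*Net, a client typically needs a connect string for one of the two. The sketch below formats the general Oracle Easy Connect and JDBC thin URL shapes. The host, port, and service name are placeholders, not Query Server defaults, and must be replaced with the values for an actual installation.

```python
def query_server_targets(host, port, service_name):
    """Return (easy_connect, jdbc_url) strings for an Oracle listener.

    All three arguments are assumptions supplied by the caller; nothing
    here is specific to a particular Query Server installation.
    """
    # Easy Connect form, usable by SQL*Net clients such as SQL*Plus.
    easy_connect = f"{host}:{port}/{service_name}"
    # JDBC thin driver URL using the same host, port, and service name.
    jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service_name}"
    return easy_connect, jdbc_url


# Placeholder values for illustration only.
easy_connect, jdbc_url = query_server_targets(
    "edge01.example.com", 1521, "example_service"
)
print(easy_connect)
print(jdbc_url)
```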