Concepts Guide

Designing Data Services

This section contains general guidelines and patterns for creating a data services layer. It covers layered data integration and transformation, data service reuse, and relationship modeling.

Using a Layered Data Integration and Transformation Approach

When planning a data service deployment, it is helpful to think of the data service layer in terms of an assembly line. In an assembly line, a product is built incrementally as it passes through a series of machines or assemblers that specialize in one aspect of the fabrication of the product.

Similarly, a well-designed data services layer transforms input (source data) into output (structured information) incrementally, through a series of small transformations. Such a design eases development and maintenance of the data services and increases the opportunity for reuse.

Note: Keep in mind that the multi-level data service implementation model described here is flattened when the data services are compiled for deployment. That is, adding conceptual layers does not add overhead to the data integration work performed by the Liquid Data deployment, and therefore does not affect performance.

By this design, distinct subsets of data services comprise sublayers in the overall transformation layer. As data passes from layer to layer—each of which specializes in one aspect of the transformation—the data is transformed from a more generalized state to a more application-specific state.

To further illustrate this design, consider a deployment with the following sublayers:

The first sublayer (that is, the first one to touch the raw data) is the physical data services layer. This layer exists in any Liquid Data deployment, whether or not data is further transformed. The data services in this layer are generated for you when you import the metadata for a data source and normally should not be modified.
The second sublayer of data services should normalize the data while retaining the data shape as imported. For example, it can change element names (that is, tag names) to make them consistent with other sources, and it can make minor modifications to data values, such as concatenating names or adjusting time values for a time zone.
Data services in the next sublayer can then use the normalized data to represent integrated business entities in the data domain, such as a unified view of a customer. For example, these data services can unify data sources or change the shape of the data in any way desired.

This sublayer does most of what might be called the integration work of the overall data services layer; it is where the integration logic and predicates and primary relationships are specified. (In small projects, this layer may be combined with the second sublayer. That is, it would contain data services that both normalize the data and define data shapes for the integration layer.)

A final sublayer can specialize information for particular applications. This layer, which can be thought of as the extended services layer, tailors information in a way that makes sense to particular applications or types of applications, such as executive dashboards, sales portals, or HR applications. For example, it might specify nesting in its data shape in a way that is useful for particular applications, such as having order items as a child of a customer item or, on the other hand, customers as a child of orders (as shown in Figure 5-1).
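The sublayers described above can be sketched in plain XQuery. This is a minimal illustration, not the form Liquid Data generates: the `local:` functions, the element names (`CUST_ID`, `FIRST_NAME`, `LAST_NAME`, `customerSummary`), and the inline rows are all hypothetical stand-ins, and an integration sublayer is omitted for brevity.

```xquery
xquery version "1.0";

(: Stand-in for a generated physical data service function.
   The rows are made-up sample data. :)
declare function local:physicalCustomer() as element(CUSTOMER)*
{
    (
        <CUSTOMER><CUST_ID>1</CUST_ID><FIRST_NAME>Ann</FIRST_NAME><LAST_NAME>Lee</LAST_NAME></CUSTOMER>,
        <CUSTOMER><CUST_ID>2</CUST_ID><FIRST_NAME>Raj</FIRST_NAME><LAST_NAME>Patel</LAST_NAME></CUSTOMER>
    )
};

(: Normalization sublayer: same flat shape, consistent element names,
   first and last name concatenated into one value. :)
declare function local:normalizedCustomer() as element(CUSTOMER)*
{
    for $c in local:physicalCustomer()
    return
        <CUSTOMER>
            <CUSTOMERID>{fn:data($c/CUST_ID)}</CUSTOMERID>
            <CUSTOMERNAME>{fn:concat($c/FIRST_NAME, " ", $c/LAST_NAME)}</CUSTOMERNAME>
        </CUSTOMER>
};

(: Application-specific sublayer: reshapes the normalized entity for one
   hypothetical consumer without repeating the normalization logic. :)
declare function local:dashboardCustomer() as element(customerSummary)*
{
    for $c in local:normalizedCustomer()
    return
        <customerSummary id="{$c/CUSTOMERID}">{fn:data($c/CUSTOMERNAME)}</customerSummary>
};

local:dashboardCustomer()
```

Each function touches only the layer directly beneath it, so a change to the name-concatenation rule would be made once, in the normalization function, and picked up by every application-specific function built on it.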

For very large database sources, instead of creating a single master data service, it is best to decide what a client application needs and build corresponding, minimal data services. The concept is to build client-specific data services from a manageable number of views that query a reasonable number of data sources, providing an abstraction from the lowest level and most common relationships while keeping the overall view reasonably simple. Liquid Data also provides a metadata API that allows client applications to discover relationships between data services at run-time, allowing applications to navigate the data services without the need for a master data service.

Figure 5-1 Layered Data Services Design Strategy


The most significant benefit of this approach is that it increases the opportunity for reuse within the overall data services layer. As shown in Figure 5-1, once you have defined a single form of a business entity (such as a customer) in a data service dedicated to the task, you can have multiple application-specific data services use the information without having to repeat data normalization and integration tasks. An additional benefit is that it aids maintenance because there is a clear separation of concerns between the data service layers.

Taking Advantage of Data Service Reusability

A typical design pattern within a data service is to have a single read function that defines the data shape without filtering conditions. The function may be declared private so that it can only be called by other functions within the same data service. It would also be the only function that contains integration logic. Additional functions, either in the same data service or (when the function is not private) in other data services, can call it and add the filtering criteria that users need. Figure 5-2 shows the design view of a data service exhibiting this pattern.

Figure 5-2 Customer Data Service functions


The following XQuery sample demonstrates the mechanics behind data service reuse. This function, getCustomerByName(), filters instances based on the customer name:

    declare function l1:getCustomerByName($c_name as xs:string) 
    as element(t1:CUSTOMER)* 
    {
        for $c in l1:getAllCustomers() 
        where $c/CUSTOMERNAME eq $c_name
        return $c
    };

The getAllCustomers() function, in turn, would assemble the data shape for the returned data, providing join logic and transformation, as shown in its return clause:

    ...
    return
        <t1:CUSTOMER>
            <CUSTOMERID>{fn:data($c/CUSTOMERID)}</CUSTOMERID>
            <CUSTOMERNAME>{fn:data($c/CUSTOMERNAME)}</CUSTOMERNAME>
            {
                for $a in f2:ADDRESS()
                where $c/CUSTOMERID eq $a/CUSTOMERID
                return
                    <ADDRESS>
                        <STREET>{fn:data($a/RTL_STREET)}</STREET>
                        <CITY>{fn:data($a/RTL_CITY)}</CITY>
                        <STATE>{fn:data($a/RTL_STATE)}</STATE>
                    </ADDRESS>
            }
        </t1:CUSTOMER>

Keep in mind that client applications themselves can specify filtering conditions on a data service function call. The data service designer can therefore choose between defining data access functions broadly (that is, without filter conditions) and letting the client apply filtering as desired, or defining them narrowly by specifying the filter criteria in the API.
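The two options can be contrasted in a short, self-contained sketch. Here `local:getAllCustomers()` is a hypothetical stand-in for a broadly defined read function with made-up inline data, `local:getCustomersByState()` is an assumed example of a narrowly defined function, and the ad hoc query inside `<broad>` plays the role of a client that supplies its own filter.

```xquery
xquery version "1.0";

(: Hypothetical broad read function: returns every customer, unfiltered.
   The inline elements are made-up sample data. :)
declare function local:getAllCustomers() as element(CUSTOMER)*
{
    (
        <CUSTOMER><CUSTOMERNAME>Acme</CUSTOMERNAME><STATE>CA</STATE></CUSTOMER>,
        <CUSTOMER><CUSTOMERNAME>Globex</CUSTOMERNAME><STATE>NY</STATE></CUSTOMER>
    )
};

(: Narrow option: the filter criterion is fixed in the function signature. :)
declare function local:getCustomersByState($state as xs:string)
as element(CUSTOMER)*
{
    for $c in local:getAllCustomers()
    where $c/STATE eq $state
    return $c
};

(: Broad option: the caller applies its own filter to the unfiltered function. :)
<result>
    <narrow>{local:getCustomersByState("CA")}</narrow>
    <broad>{
        for $c in local:getAllCustomers()
        where fn:starts-with($c/CUSTOMERNAME, "G")
        return $c
    }</broad>
</result>
```

The narrow function documents a supported access path, while the broad function leaves the criteria open; a data service can reasonably expose both.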

Note: All functions whose bodies are FLWOR (For-Let-Where-Order-Return) statements should be declared to return a plural rather than a singular result; for example, element(purchase_order)* rather than element(purchase_order).
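One way to see the reasoning behind this note, in terms of standard XQuery typing: a FLWOR expression can yield zero, one, or many items depending on the data, and only the plural type accepts all three cases. The sketch below uses hypothetical function names and made-up sample orders.

```xquery
xquery version "1.0";

(: Made-up sample data standing in for a purchase order source. :)
declare function local:orders() as element(purchase_order)*
{
    (
        <purchase_order><id>1</id><status>OPEN</status></purchase_order>,
        <purchase_order><id>2</id><status>OPEN</status></purchase_order>,
        <purchase_order><id>3</id><status>CLOSED</status></purchase_order>
    )
};

(: Plural return type: valid whether the FLWOR yields zero, one, or many items.
   Declared as element(purchase_order) instead, the same body would raise a
   type error in standard XQuery whenever the filter matched anything other
   than exactly one order. :)
declare function local:getOrdersByStatus($status as xs:string)
as element(purchase_order)*
{
    for $o in local:orders()
    where $o/status eq $status
    return $o
};

(: Two matches for "OPEN", none for "SHIPPED". :)
fn:count(local:getOrdersByStatus("OPEN")),
fn:count(local:getOrdersByStatus("SHIPPED"))
```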

Modeling Relationships

There are several ways to implement a logical relationship between distinct units of information with data services:

Data shape containment
Navigation functions

When containment is implemented in the data shape, it means that the XML data type of the data service is nested; that is, one element is the parent of another element. For example, in the following sample a customer element contains orders:

<customer>
   <customerId>...</customerId>
   <customerName>...</customerName>
   <orders>
      <order>
         <orderId>...</orderId>
      </order>
   </orders>
</customer>

Diagram: a customer element that contains customerId, customerName, and a nested orders element.

In this type of containment, the parent-child hierarchy between the customer and order is locked into the data shape. This nesting might make sense for most applications, particularly those oriented by customer. However, other applications may benefit from an orders-oriented view of the data. For example, an inventory application may prefer to work with the data in an orders-first fashion, with the customer as a child element of each order.
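The difference between the two orientations can be sketched by building both shapes from the same underlying data. The `local:` functions and sample rows below are hypothetical; the element names follow the customer and order sample above.

```xquery
xquery version "1.0";

(: Made-up sample data standing in for customer and order sources. :)
declare function local:customers() as element(customer)*
{
    <customer><customerId>C1</customerId><customerName>Acme</customerName></customer>
};

declare function local:orders() as element(order)*
{
    (
        <order><orderId>O1</orderId><customerId>C1</customerId></order>,
        <order><orderId>O2</orderId><customerId>C1</customerId></order>
    )
};

(: Customer-first shape: orders nested under each customer. :)
declare function local:customerView() as element(customer)*
{
    for $c in local:customers()
    return
        <customer>
            {$c/customerId, $c/customerName}
            <orders>{
                for $o in local:orders()
                where $o/customerId eq $c/customerId
                return <order>{$o/orderId}</order>
            }</orders>
        </customer>
};

(: Orders-first shape: the matching customer nested under each order. :)
declare function local:orderView() as element(order)*
{
    for $o in local:orders()
    return
        <order>
            {$o/orderId}
            {
                for $c in local:customers()
                where $c/customerId eq $o/customerId
                return <customer>{$c/customerId, $c/customerName}</customer>
            }
        </order>
};

<views>{local:customerView(), local:orderView()}</views>
```

Once either hierarchy is fixed in a data service's return type, callers that want the other orientation have to rebuild it themselves, which is the rigidity the paragraph above describes.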


Conceptually, in this case it could also be said that an Order is not existence-dependent on a Customer. If a Customer is deleted, it does not necessarily follow that the customer's orders should be deleted as well.

Alternatively, other relationships do not require this type of hierarchical flexibility. In most cases, this also implies that the business entity's existence does depend on the existence of the parent. For example, consider an order that contains items.


In most logical data models, it would not make sense to have an item outside of the context of the order that contains it. When an order is deleted, it is safe to say that the order items that compose it would need to be deleted as well.

These considerations inform the choice between modeling containment through data shape nesting and modeling it through relationships. It is recommended that:

Existence-dependent entities are modeled as nested elements.
Existence-independent entities are modeled as relationships.

By modeling independent entities with bi-directional relationships, data service users and designers can easily specialize the logical hierarchy between business entities as best suited for their applications.
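A minimal sketch of this recommendation in plain XQuery follows. It is an illustration of the idea rather than the form Liquid Data uses for navigation functions: `local:getOrders()` and `local:getCustomer()` are hypothetical names, and the sample data is made up. Order items are nested because they depend on their order, while customers and orders are linked through one function per direction.

```xquery
xquery version "1.0";

(: Made-up sample data standing in for two independent data services. :)
declare function local:customers() as element(customer)*
{
    <customer><customerId>C1</customerId><customerName>Acme</customerName></customer>
};

(: Items are existence-dependent on their order, so they are nested in it. :)
declare function local:orders() as element(order)*
{
    <order>
        <orderId>O1</orderId>
        <customerId>C1</customerId>
        <items>
            <item><sku>A-100</sku></item>
            <item><sku>B-200</sku></item>
        </items>
    </order>
};

(: Customers and orders are existence-independent, so they are related
   through navigation functions. This one goes from a customer to its orders. :)
declare function local:getOrders($c as element(customer)) as element(order)*
{
    for $o in local:orders()
    where $o/customerId eq $c/customerId
    return $o
};

(: The other direction: from an order to its customer. :)
declare function local:getCustomer($o as element(order)) as element(customer)*
{
    for $c in local:customers()
    where $c/customerId eq $o/customerId
    return $c
};

(: An orders-first caller navigates to the customer only when it needs it. :)
for $o in local:orders()
return
    <orderSummary orderId="{$o/orderId}"
                  customerName="{local:getCustomer($o)/customerName}"
                  itemCount="{fn:count($o/items/item)}"/>
```

Because neither entity is nested inside the other, a customer-first caller can use `local:getOrders()` just as easily, and each application chooses its own hierarchy.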