Implement Custom Node Rules

Node rules are standard audit rules that you write to respond to the parsing of application files, including HTML, JSON, CSS, and JavaScript. Oracle JAF parses each file into data nodes that form an Abstract Syntax Tree (AST). The Oracle JAF audit engine walks this tree and exposes the nodes to you through node event listeners that you register in your custom node rule.

About AST Rule Nodes in CSS Auditing

Rules that audit CSS files or the <style> section of HTML files are implemented as JavaScript/TypeScript files that are loaded at runtime as Node.js modules. As Oracle JAF analyzes the abstract syntax tree (AST) of the audited content, it invokes the node type listeners that you have registered with your audit rules and passes each listener a context.

Overview of Rule Nodes in CSS

Consider the following CSS rule:

body,html {
  margin:0; padding:0;
}

The CSS rule is represented in the AST as a Rule node. Here is a skeleton view of the Rule node:

{
   "type": "Rule",
    . . .
   "prelude" : {},             // see below
   "block" : {                 // contains the property/value pairs
      "children" : [
              {
                "type" : "Declaration",
                "property" : "margin",
                "value" : {
                     "children" : [
                            {
                              "type" : "number",
                              "value" : "0"
                            }
                      ]
                }
             },
             {
               "type" : "Declaration",
               "property" : "padding",
               "value" : {
                    "children" : [
                            {
                              "type" : "number",
                              "value" : "0"
                            }
                    ]
               }
            }
      ]
   }
}

From this sample it would be a simple task to extract the property/value pairs from this Rule node.
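As a minimal sketch of that extraction, the following function collects each Declaration's property name together with its value. The helper name getDeclarations and the hand-built ruleNode are illustrative assumptions; the node contains only the fields shown in the skeleton above.

```javascript
// Hypothetical helper: collect the property/value pairs from a Rule node.
// Only the fields shown in the skeleton are assumed to exist.
function getDeclarations(rule) {
  const pairs = {};
  const decls = (rule.block && rule.block.children) || [];
  for (const decl of decls) {
    if (decl.type !== "Declaration") continue;
    const parts = (decl.value && decl.value.children) || [];
    // Join the value of each child node into a single string
    pairs[decl.property] = parts.map((part) => part.value).join(" ");
  }
  return pairs;
}

// Hand-built node mirroring the skeleton for: body,html { margin:0; padding:0; }
const ruleNode = {
  type: "Rule",
  prelude: {},
  block: {
    children: [
      { type: "Declaration", property: "margin",
        value: { children: [{ type: "number", value: "0" }] } },
      { type: "Declaration", property: "padding",
        value: { children: [{ type: "number", value: "0" }] } }
    ]
  }
};

console.log(getDeclarations(ruleNode)); // { margin: '0', padding: '0' }
```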

For clarity, some content above has been omitted. For example, throughout the Rule node there are loc sub-properties which contain positional information:

"loc": {
    "source": " ",
       "start": {
           "offset": 18,
           "line": 3,
           "column": 5
       },
       "end": {
           "offset": 26,
           "line": 3,
           "column": 13
       }
}

Note:

The loc position information is relative to the start of the CSS text. Since CSS may also be embedded in an HTML <style>, the rule context provides the offset property which provides the actual origin of the text, and that can be used to adjust the position information when reporting an issue. See the offset property and helper utility method CssUtils.getPosition() description in Context Object Properties Available to CSS Rule Listeners.

The Rule node skeleton above shows the prelude property without its contents. The prelude gives a higher-level view of the structure of the rule, and introduces the node types SelectorList and Selector. Here is a skeleton example.

"prelude": {
    "type": "SelectorList",
    "children": [
        {
          "type": "Selector",
          "children": [
               {
                 "type": "TypeSelector",
                 "name": "body"
               }
            ]
         },
         {
            "type": "Selector",
            "children": [
               {
                  "type": "TypeSelector",
                  "name": "html"
               }
            ]
         }
    ]
}

In the sample above, the type property has the value TypeSelector because the selectors refer to the elements <body> and <html>. Other selector types are represented by ClassSelector, IdSelector, and PseudoClassSelector nodes. Note that SelectorList contains two Selector nodes; this is because body and html were grouped in the CSS using a comma. A more detailed discussion of the SelectorList node can be found below.
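To illustrate how a grouped prelude might be read, here is a small sketch that returns the element names found in a SelectorList. The helper name getTypeSelectorNames is hypothetical, and the prelude object is hand-built from the skeleton above.

```javascript
// Hypothetical helper: return the element names targeted by a grouped rule.
function getTypeSelectorNames(prelude) {
  if (!prelude || prelude.type !== "SelectorList") return [];
  const names = [];
  for (const selector of prelude.children) {
    if (selector.type !== "Selector") continue;
    for (const part of selector.children) {
      if (part.type === "TypeSelector") names.push(part.name);
    }
  }
  return names;
}

// Hand-built prelude for: body,html { ... }
const groupedPrelude = {
  type: "SelectorList",
  children: [
    { type: "Selector", children: [{ type: "TypeSelector", name: "body" }] },
    { type: "Selector", children: [{ type: "TypeSelector", name: "html" }] }
  ]
};

console.log(getTypeSelectorNames(groupedPrelude)); // [ 'body', 'html' ]
```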

Overview of the SelectorList Node

In the sample above, the SelectorList node in the prelude property was introduced for a simple case using grouping. In that example, the SelectorList contains two Selector nodes, each with a single TypeSelector child. Two Selector nodes were generated because of the grouping comma. This section goes into greater depth on how combinators and pseudo selectors are represented. When selectors are combined, only one compound Selector is generated, and it contains multiple child nodes.

Combinator Examples

Consider the following:

.foo.bar  { ... }

This will produce a skeleton SelectorList and Selector node as follows. Note that the Selector node contains two child ClassSelector nodes:

"prelude": {
  "type": "SelectorList",
  "children": [
     {
        "type": "Selector",
        "children": [
           {
              "type": "ClassSelector",
              "name": "foo"
           },
           {
              "type": "ClassSelector",
              "name": "bar"
           }
        ]
     }
  ]
}

Consider the following:

.foo .bar {...}

This will produce a skeleton SelectorList and Selector node as follows:

"prelude" : {
  "type": "SelectorList",
  "children": [
     {
        "type": "Selector",
        "children": [
           {
             "type": "ClassSelector",
             "name": "foo"
           },
           {
             "type": "WhiteSpace",
             "value": " "
           },
           {
             "type": "ClassSelector",
             "name": "bar"
           }
        ]
     }
  ]
}

Consider the following:

div > p  { ... }

This generates the following SelectorList node:

"prelude" : {
  "type": "SelectorList",
  "children": [
     {
        "type": "Selector",
        "children": [
           {
             "type": "TypeSelector",
             "name": "div"
           },
           {
             "type": "Combinator",
             "name": ">"
           },
           {
             "type": "TypeSelector",
             "name": "p"
           }
        ]
     }
  ]
}

Note that a Combinator node appears between the two type selectors, per the CSS.
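One way to check your understanding of these node sequences is to turn a Selector node back into selector text. The sketch below does this for the node types covered so far. The helper name selectorToText is hypothetical, it assumes that an IdSelector node carries a name property like the other selector nodes, and it ignores any node type it does not recognize.

```javascript
// Hypothetical helper: rebuild selector text from a Selector node's children.
function selectorToText(selector) {
  return selector.children.map((part) => {
    switch (part.type) {
      case "TypeSelector":  return part.name;
      case "ClassSelector": return "." + part.name;
      case "IdSelector":    return "#" + part.name;        // assumed shape
      case "WhiteSpace":    return " ";                    // descendant combinator
      case "Combinator":    return " " + part.name + " ";  // e.g. ">"
      default:              return "";                     // not covered by this sketch
    }
  }).join("");
}

// Hand-built Selector nodes for the three examples above
const compound = { type: "Selector", children: [
  { type: "ClassSelector", name: "foo" },
  { type: "ClassSelector", name: "bar" }
] };
const descendant = { type: "Selector", children: [
  { type: "ClassSelector", name: "foo" },
  { type: "WhiteSpace", value: " " },
  { type: "ClassSelector", name: "bar" }
] };
const child = { type: "Selector", children: [
  { type: "TypeSelector", name: "div" },
  { type: "Combinator", name: ">" },
  { type: "TypeSelector", name: "p" }
] };

console.log(selectorToText(compound));   // .foo.bar
console.log(selectorToText(descendant)); // .foo .bar
console.log(selectorToText(child));      // div > p
```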

Consider this slightly more complex example using an attribute selector:

a[href^="https"]  { ... }

This generates the following SelectorList node:

"prelude": {
   "type": "SelectorList",
   "children": [
      {
        "type": "Selector",
        "children": [
           {
             "type": "TypeSelector",
             "name": "a"
           },
           {
             "type": "AttributeSelector",
             "name": {
                "type": "Identifier",
                "name": "href"
             },
             "matcher": "^=",
             "value": {
                "type": "String",
                "value": "\"https\""
             }
           }
        ]
      }
   ]
}

In the sample above, an AttributeSelector node has been generated with a matcher property.
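The following sketch shows one way to read the attribute test out of such a node. Note that the String value in the skeleton keeps its surrounding quotes, so the sketch strips them. The helper name getAttributeTests is hypothetical, and the handling of a missing value is an assumption for attribute selectors that have no matcher.

```javascript
// Hypothetical helper: describe the attribute tests found in a Selector node.
function getAttributeTests(selector) {
  return selector.children
    .filter((part) => part.type === "AttributeSelector")
    .map((part) => ({
      attribute: part.name.name,
      matcher: part.matcher,
      // String values keep their surrounding quotes in the AST, so strip them
      value: part.value ? part.value.value.replace(/^["']|["']$/g, "") : null
    }));
}

// Hand-built Selector node for: a[href^="https"] { ... }
const linkSelector = {
  type: "Selector",
  children: [
    { type: "TypeSelector", name: "a" },
    { type: "AttributeSelector",
      name: { type: "Identifier", name: "href" },
      matcher: "^=",
      value: { type: "String", value: "\"https\"" } }
  ]
};

console.log(getAttributeTests(linkSelector));
```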

Pseudo Class Selector Examples

Consider the following:

.foo:focus  { . . . }

This will produce a skeleton SelectorList and Selector node as follows:

"prelude": {
  "type": "SelectorList",
  "children": [
     {
        "type": "Selector",
        "children": [
           {
             "type": "ClassSelector",
             "name": "foo"
           },
           {
             "type": "PseudoClassSelector",
             "name": "focus"
           }
        ]
     }
  ]
}

Note that the Selector node reflects the class selector followed by the pseudo class selector.

Consider the following:

p:nth-last-child(2) {}

This generates a more complex Selector node as follows:

"prelude": {
   "type": "SelectorList",
   "children": [
      {
        "type": "Selector",
        "children": [
           {
             "type": "TypeSelector",
             "name": "p"
           },
           {
             "type": "PseudoClassSelector",
             "name": "nth-last-child",
             "children": [
                {
                   "type": "Nth",
                   "nth": {
                      "type": "AnPlusB",
                      "a": null,
                      "b": "2"
                   }
                }
             ]
          }
        ]
      }
   ]
}

Note that the PseudoClassSelector now has an expanded children node.
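As a sketch of how a rule might inspect these nodes, the function below lists the pseudo classes in a Selector node and, where an Nth child is present, the b term of its AnPlusB node. The helper name getPseudoClasses and the shape of the returned objects are illustrative assumptions.

```javascript
// Hypothetical helper: list the pseudo classes used in a Selector node.
function getPseudoClasses(selector) {
  return selector.children
    .filter((part) => part.type === "PseudoClassSelector")
    .map((part) => {
      // Simple pseudo classes such as :focus have no children
      const nthNode = (part.children || []).find((c) => c.type === "Nth");
      return { name: part.name, b: nthNode ? nthNode.nth.b : null };
    });
}

// Hand-built Selector nodes for .foo:focus and p:nth-last-child(2)
const focusSelector = { type: "Selector", children: [
  { type: "ClassSelector", name: "foo" },
  { type: "PseudoClassSelector", name: "focus" }
] };
const nthSelector = { type: "Selector", children: [
  { type: "TypeSelector", name: "p" },
  { type: "PseudoClassSelector", name: "nth-last-child", children: [
    { type: "Nth", nth: { type: "AnPlusB", a: null, b: "2" } }
  ] }
] };

console.log(getPseudoClasses(focusSelector)); // [ { name: 'focus', b: null } ]
console.log(getPseudoClasses(nthSelector));   // [ { name: 'nth-last-child', b: '2' } ]
```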

Walkthrough of Sample HTML and JSON Audit Rules

Rules that audit HTML or JSON files are passed a context from Oracle JAF as it analyzes the abstract syntax tree (AST) of the audited file and invokes the node type listeners that you have registered with your HTML/JSON audit rules.

In this walkthrough, the first audit rule shows how easy it is to get started writing a rule that audits HTML. Subsequent rule samples illustrate greater complexity and the power of Oracle JAF for writing custom rules. Overall, Oracle JAF gives you the ability to look forwards or backwards within a file from the current position, and the various JAF utility functions that are available simplify the task of writing a rule.

Note:

For clarity, the samples in this section omit getName(), getDescription(), and getShortDescription() methods. To understand the basics of node rule implementation, see Understand the Structure of Custom Audit Rules.

Version 1 - Validating id attributes

In this simple introductory rule, the requirement is to inspect all element id attributes to ensure that they begin with a common prefix (acv-) for the project.

... // for clarity, the getName(), getDescription(), and getShortDescription() methods have been omitted
 
function register(regContext)
{
  return { attr : _fnAttrs };
};
 
function _fnAttrs(ruleContext, attrName, attrValue)
{
  let issue;
 
  if ((attrName === "id") && (! attrValue.startsWith("acv-")))
  {
    issue = new ruleContext.Issue(`'id' attribute ('${attrValue}') is not prefixed with project prefix \"acv\"`);
    ruleContext.reporter.addIssue(issue, ruleContext);
  }
};
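When developing a rule like this, it can be convenient to exercise the listener outside of JAF. The sketch below repeats the listener and calls it with a stand-in context. The makeMockContext function is purely hypothetical: it imitates only the Issue constructor and reporter.addIssue() call used above, and is not part of Oracle JAF.

```javascript
// The listener from Version 1, repeated so that the sketch is self-contained
function _fnAttrs(ruleContext, attrName, attrValue) {
  if ((attrName === "id") && (!attrValue.startsWith("acv-"))) {
    const issue = new ruleContext.Issue(
      `'id' attribute ('${attrValue}') is not prefixed with project prefix "acv"`);
    ruleContext.reporter.addIssue(issue, ruleContext);
  }
}

// Hypothetical stand-in for the context that JAF supplies at runtime
function makeMockContext() {
  const messages = [];
  return {
    messages,
    Issue: class { constructor(msg) { this.msg = msg; } },
    reporter: { addIssue: (issue) => messages.push(issue.msg) }
  };
}

const mockCtx = makeMockContext();
_fnAttrs(mockCtx, "id", "acv-header");   // conforms, nothing reported
_fnAttrs(mockCtx, "id", "footer");       // reported
_fnAttrs(mockCtx, "class", "footer");    // not an id attribute, ignored
console.log(mockCtx.messages.length);    // 1
```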

Version 2 - Validating id attributes

In general, you can look into the context for additional information, so let's assume that for this rule, you only want to look at particular project files in the file set that begin with ACV. The ruleContext object has the member filepath that you can use. Note that filepath always uses forward slashes, regardless of the platform, so the test for /ACV will succeed on all platforms.

function _fnAttrs(ruleContext, attrName, attrValue)
{
   let issue;
 
   if (ruleContext.filepath.includes("/ACV") && (attrName === "id") && (! attrValue.startsWith("acv-")))
   {
     issue = new ruleContext.Issue(`'id' attribute ('${attrValue}') is not prefixed with project prefix \"acv\"`);
     ruleContext.reporter.addIssue(issue, ruleContext);
   }
};

Version 3 - Validating id attributes

How can the rule be improved? Although JAF is very efficient at file processing, you could seek to improve performance if very large numbers of files are involved. To do that, let's use the context object's node property and the attribs property of the node. The node property is the current node in the file, so you can navigate forwards or backwards from it. From the performance aspect, you can reduce the number of invocations of the rule by listening only for HTML elements instead of attributes. If, on average, the DOM elements have 5 attributes, you would reduce the number of rule invocations by 80%. In this version of the rule, the attributes of each element are examined directly.

function register(regContext)
{
  // Listen for DOM elements instead of attributes
  return { tag : _fnTags };
};
 
function _fnTags(ruleContext, tagName)
{
  // Look at the element's attributes
  let attribs, attrValue, issue;
 
  // 'attribs' is an object of attribute name/value properties for the tag
  attribs   = ruleContext.node.attribs;
  // Get the 'id' value if it exists
  attrValue = attribs.id;              
 
  if (attrValue && (! attrValue.startsWith("acv-")))
  {
    issue = new ruleContext.Issue(`'id' attribute ('${attrValue}') is not prefixed with project prefix \"acv\"`);
    ruleContext.reporter.addIssue(issue, ruleContext);
  }
};
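The 80% figure follows from simple counting: an attr listener runs once per attribute, while a tag listener runs once per element. The sketch below works through that arithmetic; the function name and the element count are illustrative assumptions.

```javascript
// Hypothetical helper: percentage of listener invocations saved by
// listening for elements instead of attributes.
function invocationReduction(elementCount, avgAttrsPerElement) {
  const attrInvocations = elementCount * avgAttrsPerElement; // one call per attribute
  const tagInvocations = elementCount;                       // one call per element
  return (100 * (attrInvocations - tagInvocations)) / attrInvocations;
}

console.log(invocationReduction(1000, 5)); // 80
```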

Version 4 - Validating id attributes

At this point, it is worth noting that the ruleContext object provides access to DomUtils, a collection of useful DOM utility functions. For example, the function _fnTags() in the above example could be rewritten as follows.

function _fnTags(ruleContext, tagName)
{
  let attrValue, issue;
 
  // Returns the 'id' attribute's value if found
  attrValue = ruleContext.utils.DomUtils.getAttribValue(ruleContext.node, "id");
  if (attrValue && (! attrValue.startsWith("acv-")))
  {
    issue = new ruleContext.Issue(`'id' attribute ('${attrValue}') is not prefixed with project prefix \"acv\"`);
    ruleContext.reporter.addIssue(issue, ruleContext);
  }
};

Version 5 - Validating id attributes

While JAF is efficient, audit rules can always be improved upon. To listen for a file invocation, the rule must register a listener for the file type.

Note:

It is necessary to understand that there is a tradeoff between performance and rule complexity/maintainability. For example, it is possible to reduce this rule's invocation count to once per file by converting the rule to a hook type rule, as described by Implement Custom Hook Rules. Essentially, hook rules make it possible to request that a rule be invoked (once only) when the file is first read, and prior to any other rules. This means that the rule must then examine the parsed nodes looking for elements and their attributes.

function register(regContext)
{
   // Listen for files instead of elements or attributes
   return { file : _fnFiles };
};
 
function _fnFiles(ruleContext)
{
  let tagNodes, node, attrValue, issue, i;
  const DomUtils = ruleContext.utils.DomUtils;
 
  // Get elem nodes only (ignore text, comments, directives, etc)
  tagNodes = DomUtils.getElems() ;
  for (i = 0; i < tagNodes.length; i++)
  {
    node = tagNodes[i] ;
    // Get the "id" attribute value
    attrValue = DomUtils.getAttribValue(node, "id");    
    if (attrValue && (! attrValue.startsWith("acv-")))
    {
       issue = new ruleContext.Issue(`'id' attribute ('${attrValue}') is not prefixed with project prefix \"acv\"`);
       ruleContext.reporter.addIssue(issue, ruleContext);
    }
  }
};
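To make the idea of examining the parsed nodes more concrete, the sketch below walks a small hand-built node tree and collects the non-conforming id values. The node shape used here (type, name, attribs, and children properties, with elements marked by type "tag") is an assumption for illustration only, as are the helper names; in a real rule, the DomUtils functions shown above do this work for you.

```javascript
// Hypothetical sketch: collect element nodes from a parsed tree.
// Assumed node shape: { type, name, attribs, children }.
function collectElems(nodes, found = []) {
  for (const node of nodes) {
    if (node.type === "tag") found.push(node);          // ignore text, comments, etc.
    if (node.children) collectElems(node.children, found);
  }
  return found;
}

// Return the id values that do not start with the given prefix
function findBadIds(nodes, prefix) {
  return collectElems(nodes)
    .map((node) => node.attribs && node.attribs.id)
    .filter((id) => id && !id.startsWith(prefix));
}

const sampleTree = [
  { type: "tag", name: "div", attribs: { id: "acv-main" }, children: [
    { type: "text", data: "hello" },
    { type: "tag", name: "span", attribs: { id: "title" }, children: [] },
    { type: "tag", name: "p", attribs: {}, children: [] }
  ] }
];

console.log(findBadIds(sampleTree, "acv-")); // [ 'title' ]
```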

Walkthrough of a Sample CSS Audit Rule

Rules that audit CSS files or the <style> section of HTML files are passed a context from Oracle JAF as it analyzes the abstract syntax tree (AST) of the audited file and invokes the node type listeners that you have registered with your CSS audit rules.

The CSS in this CSS rule walkthrough is as follows.

p ...  {
          color : #112233;
          ...
        }

Note that p can be decorated with additional CSS syntax, and the audit rule must ignore any such decoration.

The audit rule starts by listening for CSS rules and then looks for a p type selector. For more information about determining a type selector, see About AST Rule Nodes in CSS Auditing.

Here is the basic framework for the audit rule for CSS:

var  CssUtils ;
 
function  register(regCtx)
{
   // Save CssUtils for use with setPosition() below
   CssUtils = regCtx.utils.CssUtils;
   return { "css-rule" : _onRule };
};
 
 
function  _onRule(ruleCtx, rule)
{ 
   // If the rule has a p type selector
   if (_hasParaTypeSelector(rule))
   {
      // and the rule sets the 'color' property
      let loc = _getColorPropPos(rule);
      if (loc)
      {
         // report the issue.
         _emitIssue(ruleCtx, loc); 
      }
   }
};
 
 
function  _emitIssue(ruleCtx, loc)
{
   var issue = new ruleCtx.Issue("p type selector must not override the 'color' property");
 
   issue.setPosition(CssUtils.getPosition(ruleCtx, loc));
   ruleCtx.reporter.addIssue(issue, ruleCtx);
};

The next step is to analyze the rule node to find the p type selectors:

function  _hasParaTypeSelector(rule)
{
  var sels, sel, a, ch, i, j;
 
  if (rule.prelude.type === "SelectorList")
  {
    a = rule.prelude.children;
 
    for (i = 0; i < a.length; i++)
    {
       sels = a[i];
       if (sels.type === "Selector")
       {
         ch = sels.children;
         for (j = 0; j < ch.length; j++)
         {
           sel = ch[j];
           if (sel.type === "TypeSelector" && sel.name === "p")
           {
             return true;
           }
         }
       }
    }
  }
  return false;
};

Finally, we need to search the rule to see if it specifies the color property:

function  _getColorPropPos(rule)
{
   var block, decl, i;
 
   // Process the rule's block of property/value pairs
   block = rule.block.children;
   for (i = 0; i < block.length; i++)
   {
      decl = block[i];
      if (decl.type === "Declaration" && decl.property === "color")
      {
         // Return the 'color' property position for the Issue
         return decl.loc;
      }
   }
};
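As a usage sketch, the two helper functions can be tried against hand-built Rule nodes before running the rule under JAF. The block below uses compact equivalents of the helpers so that it is self-contained. The sample nodes and their loc values are invented for illustration and contain only the fields that the helpers read.

```javascript
// Compact equivalents of the two helpers above
function hasParaTypeSelector(rule) {
  const prelude = rule.prelude;
  if (!prelude || prelude.type !== "SelectorList") return false;
  return prelude.children.some((sel) =>
    sel.type === "Selector" &&
    sel.children.some((part) => part.type === "TypeSelector" && part.name === "p"));
}

function getColorPropPos(rule) {
  const decl = rule.block.children.find(
    (d) => d.type === "Declaration" && d.property === "color");
  return decl ? decl.loc : undefined;
}

// Hand-built node for a decorated p selector that sets 'color'
const pRule = {
  type: "Rule",
  prelude: { type: "SelectorList", children: [
    { type: "Selector", children: [
      { type: "TypeSelector", name: "p" },
      { type: "PseudoClassSelector", name: "focus" }
    ] }
  ] },
  block: { children: [
    { type: "Declaration", property: "color",
      loc: { start: { offset: 12, line: 2, column: 3 },
             end:   { offset: 26, line: 2, column: 17 } } }
  ] }
};

// Hand-built node for a div selector that sets only 'margin'
const divRule = {
  type: "Rule",
  prelude: { type: "SelectorList", children: [
    { type: "Selector", children: [{ type: "TypeSelector", name: "div" }] }
  ] },
  block: { children: [{ type: "Declaration", property: "margin" }] }
};

console.log(hasParaTypeSelector(pRule));      // true
console.log(getColorPropPos(pRule).start);    // { offset: 12, line: 2, column: 3 }
console.log(hasParaTypeSelector(divRule));    // false
```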

Walkthrough of a Sample Markdown Audit Rule

JAF parses a Markdown file into an abstract syntax tree (AST). This AST is subsequently analyzed, and the summarized data objects are presented to rules via their registered rule listeners.

For Markdown processing, a rule can listen for file events (i.e., when a .md file is first read) or for specific Markdown events (when a particular type of markup is found).

For a list of Markdown rule listeners and a description of their arguments, see Listener Types for Markdown Rules.

In either case, when a rule listener is invoked, it is passed a context object. In addition to the many properties available on the context object for all rule types (see Context Object Properties Available to Registered Listeners), context contains the supplementary data property suppData, which is of particular interest when auditing a Markdown file. The property provides easy access to summarized data (links, images, paragraphs, headings, code blocks, etc.) through methods available on its utils object. For more information, see Context Object Properties Available to Markdown Rule Listeners.

Note:

The most straightforward way to access summarized data is to use the file event together with the utility methods available via suppData.utils, rather than walking the AST. All of the summarized data is available at that point, so the AST does not need to be inspected.

Listen for Markdown Events

The Markdown events are of the form md-xxxxx, where xxxxx represents the type of Markdown data required (e.g., md-link for link events, used in the following rule class).

register(regCtx)
{
   return {
             "md-link" : this._onLink,    // Want markup URL references
             . . .
          }
}
 
_onLink(ruleCtx, link)
{
   // Process the link object passed as the second argument
   . . .
 
   // Access to all the parsed data is also available via the supplementary data object
   var utils  = ruleCtx.suppData.utils ;
   var images = utils.getImages() ;         // Returns an array of image objects
 
   // Process the images array
   . . .
}

Note how the supplementary data property suppData is used to get the Markdown utils object. This is then used to acquire the image data from the markup text.

A file hook can also be used, and this permits a rule to access all of the Markdown data at the same time.

register(regCtx)
{
   return {
             file : this._onFile,      // Listen for files being read
             . . .
          }
}
 
_onFile(ruleCtx)
{
   var utils, links, images, paras ;
 
   utils  = ruleCtx.suppData.utils ;
   links  = utils.getLinks() ;         // Array of link objects
   images = utils.getImages() ;        // Array of image objects
   paras  = utils.getParas() ;         // Array of paragraph objects 
 
   // Process the links, images, and paragraphs
   . . .
}

Example Rules

The following rule checks that the first paragraph of Markdown contains a copyright.

class  Rule
{
  // For clarity, getName(), getDescription(), and getShortDescription() have been omitted.
 
  register(regCtx)
  {
    return { file : this._onFile };
  }
 
  _onFile(ruleCtx)
  {
    var  utils  = ruleCtx.suppData.utils ;
    var  paras  = utils.getParas() ;
 
    var para = paras[0] ;                   // Get the first paragraph.
 
    if (! /[Oo]racle/.test(para.text))
    {
      let issue = new ruleCtx.Issue("Copyright must be declared in first paragraph") ;
 
      // Supply start and end indices so that the paragraph can be highlighted.
      issue.setPosition(null, null, para.pos.start, para.pos.end);     // JAF will compute the line/col from the start index
      ruleCtx.reporter.addIssue(issue, ruleCtx) ;
    }
  }
 
}
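One detail worth handling in a rule like this is a Markdown file that contains no paragraphs at all, in which case paras[0] is undefined. The sketch below isolates the check with that guard added. The makeMockUtils function is a hypothetical stand-in for ruleCtx.suppData.utils that imitates only getParas(), and the paragraph objects carry only the text property used above.

```javascript
// Hypothetical stand-in for ruleCtx.suppData.utils
function makeMockUtils(paras) {
  return { getParas: () => paras };
}

// Returns true if the first paragraph matches the given pattern
function firstParaMentions(utils, pattern) {
  const paras = utils.getParas();
  // Guard against a file that has no paragraphs at all
  return paras.length > 0 && pattern.test(paras[0].text);
}

const withNotice = makeMockUtils([
  { text: "Copyright (c) Oracle and/or its affiliates." },
  { text: "Introduction" }
]);
const withoutNotice = makeMockUtils([{ text: "Introduction" }]);
const emptyFile = makeMockUtils([]);

console.log(firstParaMentions(withNotice, /[Oo]racle/));    // true
console.log(firstParaMentions(withoutNotice, /[Oo]racle/)); // false
console.log(firstParaMentions(emptyFile, /[Oo]racle/));     // false
```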

The following rule finds all references to URLs containing "Oracle".

class Rule
{
  // For clarity, getName(), getDescription(), and getShortDescription() have been omitted.

  register(regCtx)
  {
    // Bind the listener so that 'this' refers to the rule when it is invoked
    return { file : this._onFile.bind(this) };
  }

  _onFile(ruleCtx)
  {
    var  utils    = ruleCtx.suppData.utils;
    var  links    = utils.getLinks();
    var  refLinks = utils.getRefLinks();
    var  images   = utils.getImages();

    // URLs are found in inline links, inline images, and reference links

    // Inspect inline links
    links.forEach((link) => {
      if (this._checkUrl(link))
      {
        // found
      }
    });

    // Inspect reference links
    for (const ref in refLinks)
    {
      if (this._checkUrl(refLinks[ref], true))
      {
        // found
      }
    }

    // Inspect inline-image links
    images.forEach((link) => {
      if (this._checkUrl(link))
      {
        // found
      }
    });
  }

  _checkUrl(link, refLink)
  {
    // Inline links contain a URL (i.e., ignore references to links)
    if (link.inline || refLink)
    {
      return link.link.includes("Oracle");
    }
    return false;
  }
};

Walkthrough of a Sample JavaScript/TypeScript Audit Rule

Rules that audit JavaScript/TypeScript files are passed a context from Oracle JAF as it analyzes the abstract syntax tree (AST) of the audited file and invokes the node type listeners that you have registered with your JavaScript/TypeScript audit rules.

A JavaScript/TypeScript file is parsed by JAF into an Abstract Syntax Tree (AST), and as the tree is subsequently walked, any node with a type registered by a rule is passed to the rule in context.node. The node.type property (string) specifies what the node represents. For example, a node.type of AssignmentExpression indicates a typical statement form such as myVariable = 42. As an example, consider the following JavaScript statement:

myVariable = 42;

The above statement parses into the following node, where some additional properties have been removed for clarity:

{
   "type":  "ExpressionStatement",
   "expression":  {
                     "type":  "AssignmentExpression",
                     "operator": "=",
                     "left":   {
                                 "type": "Identifier",
                                 "name":  "myVariable",
                               },
                     "right":  {
                                  "type": "Literal",
                                  "value": 42,
                                  "rawValue": 42,
                              },
                  },
}

Thus, a simple rule to flag assignments of numbers greater than 42 to variables could be:

function register(regContext)
{
   return {
             AssignmentExpression : _fnAssign
          }; 
};
  
function _fnAssign(ruleCtx, node)
{
  if (node.left && (node.left.type === "Identifier"))
  {
    if (node.right && (node.right.type === "Literal") && (typeof node.right.value === "number") && (node.right.value > 42))
    {
       let issue = new ruleCtx.Issue(`${node.left.name} assignment is greater than 42`);
       ruleCtx.reporter.addIssue(issue, ruleCtx);
    }
  }
};
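A listener like this can be tried against hand-built AssignmentExpression nodes before it is run under JAF. In the sketch below, the listener is repeated in a form that compares the literal's numeric value directly. The makeMockCtx and assign functions are hypothetical helpers that exist only for this example; the mock imitates only the Issue constructor and reporter.addIssue() call used above.

```javascript
// The listener, repeated so that the sketch is self-contained
function _fnAssign(ruleCtx, node) {
  if (node.left && (node.left.type === "Identifier")) {
    if (node.right && (node.right.type === "Literal") &&
        (typeof node.right.value === "number") && (node.right.value > 42)) {
      const issue = new ruleCtx.Issue(`${node.left.name} assignment is greater than 42`);
      ruleCtx.reporter.addIssue(issue, ruleCtx);
    }
  }
}

// Hypothetical stand-in for the context that JAF supplies at runtime
function makeMockCtx() {
  const messages = [];
  return {
    messages,
    Issue: class { constructor(msg) { this.msg = msg; } },
    reporter: { addIssue: (issue) => messages.push(issue.msg) }
  };
}

// Hypothetical helper: build the node for `<name> = <value>`
function assign(name, value) {
  return {
    type: "AssignmentExpression",
    operator: "=",
    left:  { type: "Identifier", name: name },
    right: { type: "Literal", value: value }
  };
}

const assignCtx = makeMockCtx();
_fnAssign(assignCtx, assign("a", 100));    // flagged
_fnAssign(assignCtx, assign("b", 42));     // not flagged, not greater than 42
_fnAssign(assignCtx, assign("c", 42.5));   // flagged
_fnAssign(assignCtx, assign("d", "100"));  // not flagged, string literal
console.log(assignCtx.messages);
```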

Tip:

When writing a JavaScript rule, it is helpful to be able to look at the syntax tree for the particular case being audited. The AST Explorer tool can be very helpful by allowing you to generate syntax trees for arbitrary pieces of JavaScript.