Perform Advanced Analytics with Link

Understand how the Link feature can be applied to perform advanced analytics by working through the use cases discussed in this topic.

For the steps to use the Link feature to analyze your log records, see Link Visualization.

Analyze the Access Logs of Oracle WebLogic Server

Consider the example of a data set consisting of Oracle WebLogic Server access logs from the log source FMW WLS Server Access Logs. The log records capture user access to Oracle WebLogic Server over a specific period of time. Analyzing these individual log records yields meaningful insight into usage statistics, the popularity of URLs, the most active users, and other such data. From the logs, learn to obtain the following results by selecting specific fields for each result:

  1. Display the top URLs by Number of Hits

  2. Display the anomalies by Number of Hits

  3. Display the anomalies by Access Duration

  4. Identify the URLs by Upload Size

  5. Identify the URLs by Download Size

  6. Analyze the correlation between Number of Hits and Download Size

  7. Determine the Most Visited Pages

  8. Identify the Top Users

  9. Identify the Top Users and their Favorite Pages

  10. Identify the entry page that drives maximum visits

  11. Identify the Entry and Exit path for most users

Note:

  • Use the rename command to change the name of the field to one that’s more relevant for the use-case.

  • The classify command lets you analyze the groups and displays the result in the form of a bubble chart. To view the result of a query in tabular format instead, remove the classify command from the query and re-run it.

  • Click the anomalous bubble in the chart to view the details of the anomalous groups. To return to the original result after investigating the bubble, click the Undo icon.

  • When you run the link command, the group duration is shown in a readable format in the bubble chart, for example, in minutes or seconds. However, if you want to run a where command after the link command to look for transactions that took more than the specified number of seconds (say, 200 seconds), then the unit that you must use is milliseconds.
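
    For example, to keep only the URL groups that were accessed for longer than 200 seconds, the threshold must be written in milliseconds. A minimal sketch, assuming the default Group Duration field name:

    'Log Source' = 'FMW WLS Server Access Logs' | link URI | where 'Group Duration' > 200000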

To retrieve the data set, select a suitable date range, specify the log source, and run the query:

'Log Source' = 'FMW WLS Server Access Logs'

Select Link from the Visualize panel. This displays the 'FMW WLS Server Access Logs' groups table and the bubble chart.

  1. To display the top URLs by Number of Hits, group the log records by the value of the URL in the log record, obtain the total count for the URL in each group, rename the default fields in the groups table to suitable values, and display the result in the tabular format. With this analysis, you can determine the URLs that are most used.

    1. Drag and drop the field URI to Link By, remove the field Log Source from Link By, and click the check mark to submit the query.

    2. After the query is executed, in the command-line, change the names of the fields Count to Number of Hits, Start Time to First Access, End Time to Last Access, and Group Duration to Access Duration.

    3. Remove the classify command from the command-line, and submit the query.

      The query will be as follows:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration'

    On running the query, you can determine the top URLs by number of hits in the table. The columns are renamed as specified in the rename command.

  2. To display the anomalies by Number of Hits, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the URL’s number of hits. With this analysis, you can isolate unusual patterns in accessing the URLs.

    Click Analyze, select Number of Hits, and click OK.

    The query must change to the following:

    'Log Source' = 'FMW WLS Server Access Logs' | link URI | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Number of Hits'

    This query triggers analysis of the 'Number of Hits' column and creates bubbles representing the commonly seen ranges. The majority of the values are treated as the baseline. For example, a large bubble can become the baseline, or a large number of smaller bubbles clustered together can form the baseline. Bubbles that are farthest from the baseline are marked as anomalies.

    As a result, the anomalous URLs are grouped into separate bubbles in the bubble chart. To view the percentage of URLs in each range of number of hits, hover the cursor over the bubbles.

  3. To display the anomalies by Access Duration, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the access duration of the URL. With this analysis, you can isolate unusual patterns in the time spent accessing the URLs. Continuing from step 2:

    Click Analyze, select Access Duration, and click OK.
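
    The query should change to something like the following sketch:

    'Log Source' = 'FMW WLS Server Access Logs' | link URI | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Access Duration'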

    Access Duration is an indication of the duration for which each URL was accessed. This is computed as the difference between the last timestamp and the first timestamp in the log file for each URL.

  4. To identify the URLs by Upload Size, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the size of the data uploaded. With this analysis, you can identify the URLs with an unusual amount of uploaded data. Continuing from step 3:

    1. Drag and drop the field Content Size In to Display Fields.

    2. Rename the field Content Size In to Bytes Uploaded by altering the query on the command-line, and run the query.

    3. Click Analyze, select Bytes Uploaded, and click OK.

      The query will be as follows:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | stats avg('Content Size In') as 'Bytes Uploaded' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Bytes Uploaded'

      The Analyze chart displays the groups of URLs by the bytes uploaded.

    4. To correlate the Bytes Uploaded data across the time range, you can selectively hide or show charts in the Histogram Chart Options. Explore the other visualization options besides the bar chart.

  5. To identify the URLs by Download Size, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the size of the data downloaded. With this analysis, you can identify the URLs with an unusual amount of downloaded data. Continuing from step 4:

    1. Drag and drop the field Content Size Out to Display Fields and remove Content Size In from Display Fields.

    2. Rename the field Content Size Out to Download Size by altering the query on the command-line, and run the query.

    3. Click Analyze, select Download Size, and click OK.

      The query will be as follows:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | stats avg('Content Size Out') as 'Download Size' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Download Size'

      The Analyze chart displays the groups of URLs by the download size.

    4. To correlate the Download Size data across the time range, you can selectively hide or show charts in the Histogram Chart Options. Explore the other visualization options besides the bar chart.

  6. To analyze the correlation between Number of Hits and Download Size, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the size of the data downloaded and the number of hits. With this analysis, you can identify the URLs that show unusual patterns of download size and number of hits. Continuing from step 5:

    1. Click Analyze, select the fields Number of Hits and Download Size, and click OK.

    2. Remove topcount=300 from the query to see all the bubbles, and run the query.

      The query will be as follows:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | stats avg('Content Size Out') as 'Download Size' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify 'Download Size', 'Number of Hits'

    In the bubble chart, the field Number of Hits is plotted along the x-axis and Download Size along the y-axis.



    The bubbles can be interpreted as follows:

    • 73.8% of the URLs were accessed one to seven times.

    • The average download size for these 73.8% of URLs is between 32,345 and 34,000 bytes. This tight range implies that a large number of URLs have very uniform behavior with respect to the download size.

    • Since 73.8% is the large majority, the rest of the points are marked as anomalies.

    • With real data, it is common for the system to group .css, .js and image files separately from other URLs because they tend to have different download behaviors.

  7. To determine the Most Visited Pages, group the log records by the value of the URL in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the number of unique visitors. With this analysis, you can identify the URLs that are most visited by unique users. Continuing from step 6:

    1. Drag and drop the field User Name to Display Fields.

    2. Click the down arrow next to the field name and change the function from Unique to Distinct Count. Note the other functions that you can select for a numeric field.

    3. Rename the field User Name to Number of Unique Users, remove the classify command by altering the query on the command-line, and run the query. The query will be as follows:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | stats avg('Content Size In') as 'Bytes Uploaded', avg('Content Size Out') as 'Download Size', distinctcount('User Name') as 'Number of Unique Users' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' 
    4. Click Analyze, select the field Number of Unique Users, and click OK.
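
      The query should now resemble the following sketch:

      'Log Source' = 'FMW WLS Server Access Logs' | link URI | stats avg('Content Size In') as 'Bytes Uploaded', avg('Content Size Out') as 'Download Size', distinctcount('User Name') as 'Number of Unique Users' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Number of Unique Users'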

    The table lists the URLs and the corresponding number of unique users, helping you identify the URLs that were most visited by unique users. From the table, you can also determine the number of hits that each URL has.

    The analysis shows that more than 99% of the URLs have 0 or 1 unique users. This would be the case for URLs that don't need a login, or are seldom accessed. Drilling down to any of the smaller bubbles will point to the specific pages, how many hits they typically have, and how many unique visitors.

  8. To identify the Top Users, group the log records by the value of the user name in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the number of hits. With this analysis, you can identify the most active users.

    1. Edit the command-line to remove all the filters: 'Log Source' = 'FMW WLS Server Access Logs' | link URI

    2. Drag and drop the field User Name to Link By, remove URI, and run the query.

    3. Remove the classify command, rename the default fields in the command-line, and run the following query:

      'Log Source' = 'FMW WLS Server Access Logs' | link 'User Name' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration'

      The table is sorted by the number of hits by the user.

    4. To view the user behavior by access, click Analyze, select the field Number of Hits, and click OK.
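
      The query should then look roughly like this sketch:

      'Log Source' = 'FMW WLS Server Access Logs' | link 'User Name' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Number of Hits'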

    5. Click the anomalies to identify the users who have recorded a higher or lower number of hits compared to the other users.

  9. To identify the Top Users and their Favorite Pages, group the log records by the value of the user name in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the number of unique pages. With this analysis, you can identify the least and most active users, and their favorite pages. Continuing from step 8:

    1. Drag and drop the field URI to Display Fields. Change the function from Unique to Distinct Count.

    2. Rename the field URI to Number of Unique Pages by altering the query in the command-line, and run the query.

    3. Click Analyze, select the field Number of Unique Pages, and click OK.
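
      The resulting query should resemble the following sketch, using distinctcount as in the earlier steps:

      'Log Source' = 'FMW WLS Server Access Logs' | link 'User Name' | stats distinctcount(URI) as 'Number of Unique Pages' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 300 'Number of Unique Pages'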

  10. To identify the entry page that drives maximum visits, group the log records by the value of the user name in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the values of the entry URLs and the number of hits to the URLs. With this analysis, you can identify the pages that the users hit first. Continuing from step 9:

    1. To get the entry URLs, change the function of the field URI from Distinct Count to Earliest.

    2. Rename the field URI to Entry URL by altering the query in the command-line, and run the query.

    3. Click Analyze, select the fields Number of Hits and Entry URL, select the topcount as 20, and click OK.

      'Log Source' = 'FMW WLS Server Access Logs' | link 'User Name' | stats earliest(URI) as 'Entry URL' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 20 'Number of Hits', 'Entry URL'


    This displays the first URL used by the users in relation to the number of hits. For example, /login is the first URL that the majority of users access.

  11. To identify the Entry and Exit path for most users, group the log records by the value of the user name in the log record, rename the default fields in the groups table to suitable values, and analyze the groups for the values of the entry URLs and exit URLs. With this analysis, you can identify:
    • The most common paths taken by the users to transit through the website

    • The most popular product pages from where the users are exiting the website

    • The most common exit URLs, like the product checkout pages or the payment gateway

    • The unusual exit URLs, so that you can determine the root cause of the unexpected exits

    Continuing from step 10:
    1. Drag and drop the field URI to Display Fields.

    2. To get the exit page, change the function of the field URI from Unique to Latest.

    3. Edit the command-line and rename the field latest(URI) to Exit URL and submit the query.

    4. Click Analyze, select the fields Entry URL and Exit URL, select the topcount as 20, and click OK.

      'Log Source' = 'FMW WLS Server Access Logs' | link 'User Name' | stats earliest(URI) as 'Entry URL', latest(URI) as 'Exit URL' | rename Count as 'Number of Hits', 'Start Time' as 'First Access', 'End Time' as 'Last Access', 'Group Duration' as 'Access Duration' | classify topcount = 20 'Entry URL', 'Exit URL'
    5. Increase the size of the chart by using the Analyze Chart Options.



    This tree map shows the relationship between the entry and exit URLs in a site. This is particularly useful for retail sites, where service providers want to identify the entry URLs that lead customers to the checkout pages, and the product URLs from which users leave without proceeding to checkout.

Generate Charts with Virtual Fields

To create a new virtual field, you can use the eval command in the link feature. The eval query on the command-line will generate a line chart for the virtual field and enable tracking it over time.

Examples:

  • Consider the scenario where the log records from the log source SAR CPU Logs are grouped by the host name and the CPU. To determine the load experienced by the CPU of the server over time, the eval command creates a virtual field Load % and generates the line chart.

    'Log Source' = 'SAR CPU Logs' | rename Instance as CPU | link 'Host Name (Server)', CPU | stats avg('CPU Idle Time (%)') as 'CPU Idle Time (%)' | eval 'Load %' = 100 - 'CPU Idle Time (%)'

    To view the line chart:

    1. Click the Histogram tab.

    2. Click the down arrow next to the Chart Options icon. Click Hide / Show Charts. Select Load %.

    3. Click the down arrow next to the Chart Options icon. Click Chart Options. From the Chart Type list, select Line Without Marker. Click Close.



  • Consider the scenario where the log records from the log source OMC WLS Server Access Logs are grouped by the URI. To determine the size of the data accessed over time in megabytes, the eval command creates a virtual field Content Size (MB), calculates the content size in megabytes based on the value of the field Content Size, and generates the line chart.

    'Log Source' = 'WLS Server Access Logs' | link URI | stats avg('Content Size') as 'Content Size Bytes' | eval 'Content Size (MB)' = 'Content Size Bytes' / 1024 / 1024

    To view the line chart:

    1. Click the Histogram tab.

    2. Click the down arrow next to the Chart Options icon. Click Hide / Show Charts. Select Content Size (MB) and Access Log Records.

    3. Click the down arrow next to the Chart Options icon. Click Chart Options. From the Chart Type list, select Line Without Marker. Click Close.



Link by Using SQL Statement as the Field of Analysis

Link supports SQL Statement as a field for analysis. SQL Statement contains the SQL that’s executed, and is captured by log sources like Database Audit XML Logs and Oracle Unified DB Audit Log Source Stored in Database 12.1.

You can use link 'SQL Statement' to group SQL statements, analyze their behavior, and identify anomalies.

Example:

Consider the following query that links the log records based on the field SQL Statement:

'Log Source' in ('Database Audit Logs', 'Database Audit XML Logs') 
	| rename 'Host Name (Server)' as 'DB Server', 'User Name (Originating)' as 'OS User', 'User Name' as 'DB User' 
	| link 'SQL Statement' 
	| rename Count as 'Number of Runs', 'Start Time' as 'First Run', 'End Time' as 'Last Run', 'Group Duration' as Age 
	| addfields [ Object = dual | stats count as 'dual Table Access' ], 
		[ Object like 'all_%' | stats count as 'ALL_ Table Access' ], 
		[ Object like 'dba_%' | stats count as 'DBA_ Table Access' ], 
		[ Object like 'user_%' | stats count as 'USER_ Table Access' ], 
		[ Object like 'v$%' | stats count as 'VDollar Table Access' ], 
		[ Object = null | stats count as 'No Table Access' ], 
		[ Action = '2' | stats count as 'Insert Count' ], 
		[ Action = '3' | stats count as 'Select Count' ], 
		[ Action = '6' | stats count as 'Update Count' ], 
		[ Action = '7' | stats count as 'Delete Count' ], 
		[ Type = '8' | stats count as 'Connect Count' ], 
		[ 'Status Code' = 1 | stats count as Failures ] 
	| eval 'Object Type' = if('dual Table Access' > 0, Dual, 
		'ALL_ Table Access' > 0, System, 
		'DBA_ Table Access' > 0, System, 
		'USER_ Table Access' > 0, System, 
		'VDollar Table Access' > 0, System, 
		'No Table Access' > 0, 'No Table', Other) 
	| eval 'SQL Type' = if('Insert Count' > 0, Insert, 
		'Select Count' > 0, Select, 
		'Update Count' > 0, Update, 
		'Delete Count' > 0, Delete, 
		'Connect Count' > 0, Connect, Other) 
	| stats distinctcount(Object) as Objects, distinctcount('Database ID') as 'Number of DBs', 
		distinctcount(Session) as 'Number of Sessions' 
	| fields -'dual Table Access', -'No Table Access', -'ALL_ Table Access', 
		-'USER_ Table Access', -'DBA_ Table Access', -'VDollar Table Access', -'Insert Count', 
		-'Select Count', -'Update Count', -'Delete Count', -'Connect Count', -'SQL Type', -'Object Type' 
	| classify Age 
	| classify 'Number of Sessions' 
	| classify 'Number of DBs' 
	| classify 'Number of Runs', 'Object Type' 
	| classify 'Object Type', 'SQL Type'

Note:

addfields is a function available with the link visualization to add virtual fields to the query. It takes a sub-query (a filter condition piped to a stats command), and the resulting virtual field is available in the table as well as in the time series chart.
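
For instance, a minimal sketch that adds one virtual field to each URI group (the Status Code field and the value 404 are illustrative assumptions, not part of the example above):

'Log Source' = 'FMW WLS Server Access Logs' | link URI | addfields [ 'Status Code' = 404 | stats count as '404 Count' ]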

By executing the above query, the following results can be observed:

  • Based on the classify commands, bubble charts are generated for Age, Number of Sessions, Number of DBs, the combination of Number of Runs and Object Type, and the combination of Object Type and SQL Type.





    In the bubble charts, the log records are grouped based on the number of SQLs that fall under each set of parameters. The Object Type and SQL Type parameters are determined using the eval command in the query.

  • The Line with Area histogram charts illustrate the occurrence of fields like dual Table Access, No Table Access, ALL_ Table Access, USER_ Table Access, DBA_ Table Access, VDollar Table Access, Insert Count, Select Count, Update Count, Delete Count, Connect Count, and Log Records plotted against time.

    1. In the histogram chart tab, click the down arrow next to the Chart Options icon.

    2. Select to show the charts of all the fields.

    3. Under Chart Type, select Line With Area.

    4. Adjust the width to display two charts per line.



  • The Groups Table lists the groups identified by link based on the SQL Statement field. You can observe that for each SQL, the table lists the number of times that the SQL was run, the start time, the end time, and the group duration. Click each group to view its log records for more details. You can also view the groups in the cluster visualization for further analysis.



Analyze the Time Taken Between Steps in a Transaction

The link feature gives you the ability to analyze user sessions, extract the various time parameters by grouping, and derive data about the transaction times to help you gain business insights.

Consider this unordered data set taken from an Access Log file. The following fields indicate the information about a user session and the actions performed by the user:

Time | Session ID | Action
 T2  | 1          | Login
 T1  | 5          | Login
 T6  | 1          | addtocart
 T3  | 1          | productlisting
 T4  | 1          | purchase
 T9  | 1          | purchase
 T7  | 5          | addtocart
 T5  | 1          | addtocart
 T8  | 5          | purchase 

The actions like Login, addtocart, productlisting, and purchase are recorded in a random order T1 through T9, and have occurred in two sessions with session ID 1 and 5.

To perform a similar time analysis of your Access Logs, extract the Session ID from the logs into a field. Extract the intermediate steps of the session from the Access Logs by applying a regular expression to obtain the URL from the logs.
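
For illustration only, assuming the access log lines carry a session cookie such as JSESSIONID=abc123 and a request line such as "GET /addtocart HTTP/1.1", regular expressions along the following lines could capture the two fields; adapt them to your actual log format:

  Session ID:  JSESSIONID=([A-Za-z0-9]+)
  URL/Action:  "[A-Z]+ (\S+) HTTP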

In a generic context, the sessions in this example represent any user transactions, and the actions represent the intermediate steps performed by the user to complete a transaction.

To analyze this unordered data and to extract the required information, the following example query can be run:

'Upload Name' = logadmin 
| link 'Session ID'
| rename 'Group Duration' as 'Session Duration' 
| addfields 
  [ Action = addtocart | stats earliest(Time) as 'First Add To Cart Time' ], 
  [ Action = purchase | stats latest(Time) as 'Last Purchase Time' ] 
| eval 'Time Taken for Purchase (Secs)' = ('Last Purchase Time' - 'First Add To Cart Time') / 1000 
| fields -'First Add To Cart Time', 
         -'Last Purchase Time' 
| classify 'Time Taken for Purchase (Secs)'
  • link 'Session ID' groups the Access Log records by Session ID, creating two groups:

    Time | Session ID | Action
     T2  | 1          | Login
     T6  | 1          | addtocart
     T3  | 1          | productlisting
     T4  | 1          | purchase
     T5  | 1          | addtocart
     T9  | 1          | purchase
    
     T1  | 5          | Login
     T7  | 5          | addtocart
     T8  | 5          | purchase
  • addfields is run against each of these groups. The first addfields picks up the records where Action = addtocart. The result of this query is as below for both the groups:

    Time | Session ID | Action
     T6  | 1          | addtocart
     T5  | 1          | addtocart
    
     T7  | 5          | addtocart
  • stats earliest(Time) sorts the above result by time, for each group:

    Time | Session ID | Action
     T5  | 1          | addtocart
     T6  | 1          | addtocart
     
     T7  | 5          | addtocart
  • Then the specified field, which is Time, is picked up from the first record:

    'First Add To Cart Time' = T5 for Group = 1
    'First Add To Cart Time' = T7 for Group = 5
  • The second addfields runs on Action = purchase, extracting the following records:

    Time | Session ID | Action
     T4  | 1          | purchase
     T9  | 1          | purchase
    
     T8  | 5          | purchase
  • latest(Time) also sorts the above records by Time:

    Time | Session ID | Action
     T4  | 1          | purchase
     T9  | 1          | purchase
    
     T8  | 5          | purchase
  • latest(Time) picks up the last record and extracts the specified field, which is Time:

    'Last Purchase Time' = T9 for Group = 1
    'Last Purchase Time' = T8 for Group = 5
  • At this point, both the groups have the values for First Add To Cart Time and Last Purchase Time set. Because these are timestamps, eval subtracts one from the other to get the elapsed time.

  • In effect, you get the time taken from the Add To Cart step to the Purchase step for each session. This value can now be used in classify to analyze the variance of this elapsed time across sessions.