Calculate the Sum of a Column or the Count of Rows While Processing Large Files

You can process large comma-separated value (CSV) files (up to 1 GB in size) using the Read File In Segments operation of a stage file action. You may also have a requirement to calculate the sum of a column or the count of rows.

Consider the following payload. Assume you want to calculate the sum of all the values in the column Amount1. You may typically declare a variable upstream of the stage file action and keep updating this variable with the computation done in each chunk of the stage file action.

The images shows a table with columns for Sno, Amount1, Amount2, Amount3, and Amount4. Each column is of type Integer and is identified as Mandatory. Four rows of each column are displayed. The Amount1 column mentioned above has values of 100, 200, 300, and 400.

However, updating the upstream variables inside the stage file action Read File in Segments operation impacts performance and does not allow stage file action processing in parallel. You observe the following warning message in the integration canvas:
Stage File Read File in Segments includes action that will result in segments being processed sequentially

As a solution for this use case, perform the following steps to use the aggregate functions sum and count while processing larger files:

  1. Add a stage file action in the integration canvas and select the Read File in Segments operation to read the large file.
  2. Inside the stage file action that uses the Read File In Segments operation, create a second stage file action that uses the Write File operation and select the Append to Existing File check box.
  3. Get the count and sum of each segment and write the results into the second stage file action.
    • count ($StageSegments/.../Record)
    • sum ($StageSegments/..Record/ns22:Amount1)
  4. After completion of the stage file action using the Read File in Segments operation, create a stage file action that uses the List File operation to list the file that was created in Step 2. The response of the stage file action List File operation contains the file size. The file size determines the next action to perform.
  5. If the file size is less than 10 MB, read the file with a stage file action that uses a Read Entire File operation and create an assign action with variables to store the grand totals.
  6. Otherwise, repeat Step 1 through Step 4 with the new file.