DataWeave - Partition List

Learn how to partition a list in the spirit of Ruby's and Scala's partition function

In this post I'm going to go over a function that I've found in other languages (Ruby, Scala) and thought would be particularly useful in DW: partition. partition takes an array and a predicate (a lambda that returns true or false). It iterates through the array and passes each value to the predicate. If the predicate returns true, the value is added to one list; if it returns false, the value is added to a separate list. Both lists are returned in an array, with the passing values first. If you've ever had the thought "I want to filter this, but keep the negative results elsewhere so I can do something with them," then partition should be able to help you out.

Here's a simple example, assuming scripts/utils.wev is somewhere on the classpath and defines partition:

%dw 1.0
%output application/json

%var utils = readUrl("classpath://scripts/utils.wev")

%var input = [1,2,3,4,5,6]

%function odd(n)
  (n mod 2) != 0
---
utils.partition(input, odd)

// Output: [[1,3,5],[2,4,6]]

You could also pass in the predicate as a lambda, instead of defining the function beforehand:

utils.partition(input, ((n) -> (n mod 2) != 0))
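
For comparison, here is a rough sketch of getting the same two lists with plain filter, which is the "filter, but keep the negatives" idea written out by hand. The variable name nums is just an illustrative stand-in for the input array, and the script assumes nothing beyond the built-in filter and not operators. Note that it walks the array twice and runs the predicate twice per element, which is the duplication partition avoids:

```dataweave
%dw 1.0
%output application/json

// Illustrative input; "nums" is a made-up name for this sketch
%var nums = [1,2,3,4,5,6]

%function odd(n)
  (n mod 2) != 0
---
[
  // First pass: keep the values that satisfy the predicate
  nums filter odd($),
  // Second pass: keep the values that do not
  nums filter (not odd($))
]
```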

So hopefully that gives you a general idea of how it works. Let's check out the code real quick:

%function partition(arr, predicate)
  arr reduce ((v, acc=[[],[]]) ->
    [ acc[0] + v, acc[1] ] 
      when predicate(v) 
      otherwise [ acc[0], acc[1] + v ] 
  )

We already know that reduce iterates over an array and accumulates some kind of output along the way. In this case, our accumulator is an array of two arrays. The first array will contain values that pass the predicate, and the second array will contain values that don't. Once you understand that, the rest is relatively simple:

Here, we're pushing to the first array because the predicate test passed:

[ acc[0] + v, acc[1] ]

And here, we're pushing to the second array because the predicate test failed:

[ acc[0], acc[1] + v ]

and that's decided by the conditional:

... when predicate(v) otherwise ...
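
To make the accumulation concrete, here is a small sketch that traces the accumulator step by step for a three-element array. The partition definition is the same one shown above, copied inline so the script stands on its own; the input [1,2,3] is only an illustrative choice, and the trace in the comments is worked out by hand from that definition:

```dataweave
%dw 1.0
%output application/json

// Same definition as above, inlined so this script is self-contained
%function partition(arr, predicate)
  arr reduce ((v, acc=[[],[]]) ->
    [ acc[0] + v, acc[1] ]
      when predicate(v)
      otherwise [ acc[0], acc[1] + v ]
  )
---
// Hand trace for [1,2,3] with an "is odd" predicate:
//   start:  acc = [[], []]
//   v = 1:  predicate passes -> acc = [[1], []]
//   v = 2:  predicate fails  -> acc = [[1], [2]]
//   v = 3:  predicate passes -> acc = [[1,3], [2]]
partition([1,2,3], ((n) -> (n mod 2) != 0))
```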

So where might you use something like this? I mentioned earlier that it's great for situations where you might use filter, but also want the negative results for further processing. I reach for partition whenever I need to validate a list of data and separate the values that pass from the ones that do not (maybe so I can continue processing the successful records, and pass the failed ones to a dead-letter queue). Whether this works depends on your validation logic: if each record can be validated in isolation (i.e., the validation of one record is not dependent on the data or processing of another record in the array), this will work just fine. Here's how I'd approach it:

%dw 1.0
%output application/json

%var utils = readUrl(...)

%function validation1(...) ...
%function validation2(...) ...
%function validation3(...) ...
...
%function validationN(...) ...

%function validator(record)
  // Assuming all validations MUST pass
  validation1(record)
    and validation2(record)
    and validation3(record)
    ...
    and validationN(record)
---
utils.partition(payload, validator)
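
Since the result is just an array of two arrays, it can help to give each half a name before passing it along. Below is a minimal, self-contained sketch of that idea. The sample records, the hasEmail and positiveQty checks, and the valid/invalid keys are all hypothetical and exist only for illustration; partition is inlined rather than loaded from utils.wev so the script doesn't depend on anything else:

```dataweave
%dw 1.0
%output application/json

// Same partition as earlier in the post, inlined for a self-contained script
%function partition(arr, predicate)
  arr reduce ((v, acc=[[],[]]) ->
    [ acc[0] + v, acc[1] ]
      when predicate(v)
      otherwise [ acc[0], acc[1] + v ]
  )

// Hypothetical sample data; in a real flow this would be the payload
%var records = [
  { id: 1, email: "a@example.com", qty: 2 },
  { id: 2, email: "", qty: 5 },
  { id: 3, email: "c@example.com", qty: 0 }
]

// Hypothetical per-record validations; each looks at one record only
%function hasEmail(r)
  (r.email != "")

%function positiveQty(r)
  (r.qty > 0)

// All validations must pass for a record to count as valid
%function validator(r)
  hasEmail(r) and positiveQty(r)
---
using (parts = partition(records, validator))
{
  // parts[0] holds records that passed, parts[1] holds the rest
  valid: parts[0],
  invalid: parts[1]
}
```

With this sample data, only the first record should land under valid; the other two each fail one check and should land under invalid, ready to be routed somewhere else.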

I hope you'll find some good use for the partition function. Feel free to send me a message on LinkedIn if you have any questions. Stay tuned for a future post where I'll discuss how to partition objects, and refactor partition to work with both objects and arrays.