DataWeave 2.2 - Additions to the Arrays Module, Part 1

Lunch break? Take a few minutes to get familiar with some of the new functions added to the DataWeave Arrays module in Mule Runtime 4.2.0.


Introduction

MuleSoft recently released runtime version 4.2.0, and along with it, DataWeave 2.2! DW 2.2 has a ton of new features that I won't be able to cover in a single blog post, so I'll be covering them over the course of a few blog posts. In this particular post, I'll be going over some of the additions to the Arrays module (dw::core::Arrays). I'll discuss the new functions, how they relate back to concepts that you likely already know, and give a few examples of how to use them and when to use them. Here's a list of functions included in this post:

  • drop
  • take
  • slice
  • dropWhile
  • takeWhile
  • indexOf
  • indexWhere
  • splitAt
  • splitWhere
  • partition

I've organized the presentation of these functions in terms of what they accomplish:

  1. Extracting Array subsets (aka slicing an Array)
  2. Locating specific items in an Array
  3. Splitting an Array into two Arrays

I'm going to leave out the DW header in the code examples for the sake of brevity. If you want to try these examples yourself, you'll need to import the Arrays module:

import * from dw::core::Arrays
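For reference, here is a minimal sketch of a complete script with the header included. The JSON output format and the sample Array are assumptions for illustration; any output format will do.

```dataweave
%dw 2.0
output application/json
// Bring every Arrays module function (drop, take, slice, ...) into scope
import * from dw::core::Arrays
var arr = [0,1,2,3,4,5]
---
drop(arr, 3)
// Returns: [3,4,5]
```

If you'd rather not import everything, you can list only the functions you need, e.g. `import drop, take from dw::core::Arrays`.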

For reference, you can find the DataWeave 2.2 release notes here, and the docs for DataWeave 2.2 (Mule Runtime version 4.2.0) here.

In the past, I've used the term element to describe an arbitrary member of an Array. I've recently noticed that the official documentation prefers the term item, so I will use that from now on.

Extracting Array Subsets with drop, take, and slice

The drop, take, and slice functions are useful for extracting a subset of an existing Array. We can relate all of these functions back to another strategy for getting subsets of an Array: arr[n to m]. From this point forward, I'll refer to variations of arr[n to m] as slice notation (not to be confused with the slice function).

The main difference between these functions and using slice notation is how out-of-bounds input is handled. If you try to access indexes that are out-of-bounds using slice notation, it will return null. As long as you provide the correct type of input to drop, take, and slice (i.e., Arrays and Numbers) they will always return an Array.

drop

The drop function removes n items from the beginning of an Array. This is akin to arr[n to -1] in slice notation. Here's an example of drop:

var arr = [0,1,2,3,4,5]
---
drop(arr, 3) // Drop the first 3 items
// Returns: [3,4,5]

It's important to keep in mind that the second parameter is not an index, it's the number of items from the beginning of the Array that you want to drop.

And here's an example of similar functionality using slice notation:

arr[3 to -1]

You may be wondering what happens if you pass in a number that's less than 1 or if you pass in a number that's greater than the number of items you have in the Array. In the event that you pass a number less than 1, DataWeave will return the same Array. In the event that you pass a number greater than the number of items you have in the Array, drop returns an empty Array:

var arr = [0,1,2,3,4,5]
---
drop(arr, -5)
// Returns: [0,1,2,3,4,5]

var arr = [0,1,2,3,4,5]
---
drop(arr, 20)
// Returns: []

While this behavior protects you from out-of-bounds errors and null results, it also means you will need to explicitly handle the scenarios in which you would have expected them to occur.
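One way to handle that scenario explicitly is to check the size of the Array before calling drop. This is a minimal sketch; the variable n and the message string are hypothetical placeholders for whatever your flow actually needs.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [0,1,2,3,4,5]
var n = 20 // hypothetical number of items to drop
---
if (n > sizeOf(arr))
  // drop would quietly return [] here, so flag the case yourself
  "Asked to drop more items than the Array contains"
else
  drop(arr, n)
// Returns: "Asked to drop more items than the Array contains"
```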

drop will be a welcome replacement in my code for the previous method of getting every item in the Array except the first one (a very popular use case in recursive functions). So instead of this:

var arr = [0,1,2,3,4,5]
---
arr[1 to -1]

I'll use this instead:

var arr = [0,1,2,3,4,5]
---
drop(arr, 1)
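Here's a sketch of that pattern in a small recursive function. sumAll is a hypothetical helper written for illustration, not part of the Arrays module; for real code, the built-in sum function from dw::Core is simpler, and deep recursion is best avoided on large Arrays.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays

// Hypothetical helper: adds up the items by taking the first one
// and recursing on the rest, which drop(nums, 1) provides
fun sumAll(nums) =
  if (isEmpty(nums)) 0
  else nums[0] + sumAll(drop(nums, 1))
---
sumAll([0,1,2,3,4,5])
// Returns: 15
```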

take

While you use drop to remove items from the beginning of an Array, you use take to get the first n items from the beginning of an Array. This is akin to arr[0 to n - 1] in slice notation. Here's an example:

var arr = [0,1,2,3,4,5]
---
take(arr, 1)
// Returns: [0]

And here's an example of similar functionality using slice notation:

arr[0 to 0]

Like drop, it's important to know that the number passed to take is not an index but instead the number of items you wish to get from the beginning of the Array.

The rules for what happens when you pass an out-of-range number to take mirror those of drop. In the event that you pass a number less than 1, take returns an empty Array. In the event that you pass a number greater than the number of items in the Array, you'll get back the whole Array.
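Both edge cases side by side, using the same sample Array as the drop examples:

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [0,1,2,3,4,5]
---
{
  takeZero: take(arr, 0),     // less than 1: empty Array
  takeTooMany: take(arr, 20)  // more than sizeOf(arr): the whole Array
}
// Returns: {"takeZero": [], "takeTooMany": [0,1,2,3,4,5]}
```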

slice

While drop can get you items from some point in the middle of an Array to the end, and take can get you items from the beginning of an Array to some point in the middle, slice covers both of those use cases, plus extracting items from one point in the middle of an Array to any other point. slice(arr, n, m) is akin to arr[n to m - 1] in slice notation. Example:

var arr = [1,2,3,4,5]
---
slice(arr, 1, 3)
// Returns: [2,3]

An example of the same functionality using slice notation:

arr[1 to 2]

Note that with slice the first index gets included in the output Array, whereas the second index gets excluded. In other words, we get the items from and including index 1, up to and excluding index 3. If you're familiar with String#substring() in Java, this follows the same inclusive/exclusive convention.
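To make the inclusive/exclusive rule concrete, here are three ways to get the same subset for in-range indexes: slice, slice notation (where both ends are inclusive), and a drop followed by a take.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [1,2,3,4,5]
---
{
  viaSlice: slice(arr, 1, 3),            // from index 1 up to, but excluding, index 3
  viaNotation: arr[1 to 2],              // slice notation: both ends inclusive
  viaDropTake: take(drop(arr, 1), 3 - 1) // drop 1 item, then take (until - from) items
}
// Returns: {"viaSlice": [2,3], "viaNotation": [2,3], "viaDropTake": [2,3]}
```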

What are the rules if you specify indexes that are out-of-bounds? What happens if we take the arr defined above, but try to slice up to index 20?

slice(arr, 1, 20)
// Returns: [2,3,4,5]

So slice doesn't choke on that. Instead, it exhibits the same behavior as take.

What happens if we take the arr defined above and try to begin the slice with a negative index?

slice(arr, -5, 3)
// Returns: [1,2,3]

So slice won't choke on that, either. It exhibits the same behavior as drop.

Finally, what happens if we completely mix up the inputs so that the first index we provide is greater than the second?

slice(arr, 10, 1)
// Returns: []

In this case we get an empty Array.

One more subtle but important observation: unlike slice notation, slice does not let us use negative indexes to count from the back of the Array. In other words:

slice(arr, 1, -1)

and

arr[1 to -1]

are not equivalent. The slice call returns an empty Array, while the slice notation returns [2,3,4,5].
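If you want slice to stop a certain distance from the end of the Array, one option is to compute the end index yourself with sizeOf:

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [1,2,3,4,5]
---
{
  toEnd: slice(arr, 1, sizeOf(arr)),         // same items as arr[1 to -1]
  allButLast: slice(arr, 1, sizeOf(arr) - 1) // same items as arr[1 to -2]
}
// Returns: {"toEnd": [2,3,4,5], "allButLast": [2,3,4]}
```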

Extracting Array Subsets with dropWhile and takeWhile

dropWhile and takeWhile are the higher-order function counterparts of drop and take, respectively. These functions give the caller much more flexibility in deciding when to stop dropping and taking items from the input Array. If you find yourself wanting to use drop or take, but need a criterion for when to stop "dropping" or "taking" that cannot be expressed as a simple count, use dropWhile and takeWhile.

dropWhile

dropWhile is a more general form of drop. Both functions remove items from the beginning of an Array. However, drop takes a number representing how many items to drop, whereas dropWhile takes a function and keeps dropping items for as long as that function returns true, stopping at the first item for which it returns false. The function you pass to dropWhile should take in a single parameter and return a Boolean. Example:

fun nameIsNotJosh(name) = name != "Josh"
var arr = ["James", "John", "Josh", "Jerry!!!"]
---
dropWhile(arr, nameIsNotJosh)
// Returns: [“Josh”, "Jerry!!!"]

takeWhile

takeWhile is a more general-purpose version of take. Instead of specifying how many items you want from the beginning of the Array, you use a function to specify when takeWhile should stop taking items from the beginning of the Array. Example:

var arr = [1,2,3,4,5]
---
takeWhile(arr, isOdd)
// Returns: [1]
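Given the same function, takeWhile and dropWhile are complements of each other: takeWhile returns the items that dropWhile removes, so concatenating the two results with ++ gives back the original Array. The predicate n < 3 below is just an example.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [1,2,3,4,5]
---
{
  taken: takeWhile(arr, (n) -> n < 3),  // items up to the first failure
  dropped: dropWhile(arr, (n) -> n < 3) // the first failure and everything after
}
// Returns: {"taken": [1,2], "dropped": [3,4,5]}
```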

Locating Array Items with indexOf and indexWhere

You can use the indexOf and indexWhere functions to find the index at which an item occurs in an Array. indexOf takes a hard value that defines what to search for, while indexWhere, a higher-order function, takes a function that defines a match.

indexOf

indexOf should look pretty familiar if you've been programming for a while (it probably does exactly what you think!). You pass indexOf two parameters: an Array, and a value for which you'd like to search, e.g. a String or Number. indexOf returns the index of the first occurrence of a match:

var arr = ["Hello", "World"]
---
indexOf(arr, "World")
// Returns: 1

Note that indexOf returns the index of the first match only; it stops searching once it finds one.

In the event that indexOf does not find a match, it will return -1:

indexOf(arr, "SPACE!")
// Returns: -1
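Because -1 is a valid result rather than an error, it's worth checking for it before using the index. This sketch also shows that only the first of several matches is reported; the word list and messages are sample data.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var words = ["to", "be", "or", "not", "to", "be"]
var idx = indexOf(words, "be") // "be" appears twice; only the first index is returned
---
if (idx == -1) "not found"
else "first found at index " ++ (idx as String)
// Returns: "first found at index 1"
```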

indexWhere

indexWhere is a more general form of indexOf. The relationship between indexOf and indexWhere is the same as the relationship between drop and dropWhile: indexWhere accepts a function instead of a hard value. The function you pass to indexWhere takes in a single parameter and needs to return a Boolean. This is a great function to use if you need to dive into a data structure to determine whether an item in the Array is a match. Example:

var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}]
---
indexWhere(people, (p) -> p.job == "guitarist")
// Returns: 1

Note that indexWhere also returns only the index of the first match; it stops searching once it finds one.

Like indexOf, indexWhere will return -1 if it does not find a match:

indexWhere(people, (p) -> p.job == "ice cream tester")
// Returns: -1
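A common next step is to use the index to look up the matching item. Guarding against -1 matters here, because DataWeave treats people[-1] as "the last item" rather than as an error, so an unchecked "not found" result would silently return the wrong record.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}]
var i = indexWhere(people, (p) -> p.job == "guitarist")
---
// Check for the -1 "not found" result before indexing into the Array
if (i == -1) null else people[i].name
// Returns: "Marty"
```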

Splitting Arrays up with splitAt, splitWhere, and partition

You can use the splitAt, splitWhere, and partition functions if you'd like to split an Array into two separate Arrays based on some criteria.

As a brief overview: you'll use splitAt when you want to split up an Array based on an index that you know ahead of time. You'll use splitWhere when you need to define a criterion other than the index for where you should split an Array. Finally, you'll use partition when you want to sort every item in the Array into one of two groups based on whether or not it passes a test.

splitAt

splitAt splits an Array into two Arrays at a specified index. It returns an Object containing two keys, referred to as a Pair. The two keys in the Pair are "l" and "r", where "l" holds the items before the specified index and "r" holds the rest. Example:

var arr = [1,2,3,4]
---
splitAt(arr, 2)
// Returns: 
// {
//   "l": [1,2],
//   "r": [3,4]
// }

Notice that the item that occurs at the specified index gets added to the "r" Array in the Pair.
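For an in-range index, you can think of splitAt as take and drop combined: "l" is what take would return and "r" is what drop would return. This sketch only illustrates that relationship; it isn't a claim about how the module implements splitAt.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var arr = [1,2,3,4]
---
{
  viaSplitAt: splitAt(arr, 2),
  viaTakeDrop: { l: take(arr, 2), r: drop(arr, 2) }
}
// Returns:
// {
//   "viaSplitAt": {"l": [1,2], "r": [3,4]},
//   "viaTakeDrop": {"l": [1,2], "r": [3,4]}
// }
```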

If you want two Arrays contained in an Array (instead of the Pair) you can transform the data after the split:

var arr = [1,2,3,4]
var split = splitAt(arr, 2)
---
[split.l, split.r]
// Returns: [[1,2], [3,4]]

pluck also works for this, and it spares us from referencing the keys by name:

splitAt(arr, 2) pluck $
// Returns: [[1,2], [3,4]]

splitWhere

If you've read this far, you probably already know what this function does. It's the same as splitAt, but far more general, because it takes in a function that defines where to split the input Array: the split happens at the first item for which the function returns true. The function passed to splitWhere should take a single parameter and return a Boolean. Again, splitWhere is a great function to use if you want to split an Array but need to define the point of the split based on nested data (or anything besides an index). Here's an example:

var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
---
splitWhere(people, (p) -> p.name startsWith("M"))
// Returns: 
// {
//   "l": [{name: "Josh", job: "programmer"}],
//   "r": [{name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
// }

Again, you can use pluck to transform the output into an Array of Arrays if you'd prefer that over an Object:

splitWhere(people, (p) -> p.name startsWith("M")) pluck $
// Returns:
// [ 
//   [{name: "Josh", job: "programmer"}],
//   [{name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
// ]
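Because the split happens at the first item that passes the test, splitWhere lines up with takeWhile and dropWhile given the opposite test. This sketch only illustrates that relationship; startsWithM is a hypothetical helper introduced for the example.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
// Hypothetical helper holding the split criterion
fun startsWithM(p) = p.name startsWith "M"
---
{
  viaSplitWhere: splitWhere(people, startsWithM),
  viaWhile: {
    l: takeWhile(people, (p) -> not startsWithM(p)), // items before the first match
    r: dropWhile(people, (p) -> not startsWithM(p))  // the first match and everything after
  }
}
// Both entries hold the same Pair: "l" has Josh's record, "r" has Marty's and Dave's
```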

partition

partition is useful for splitting an Array based on whether each item in the Array passes or fails a specified criterion. You pass partition an Array and the criterion, defined as a function. The function should take in a single parameter and return a Boolean. partition will return an Object with two keys, "success" and "failure". Example:

var arr = [1,2,3,4,5,6]
---
arr partition isOdd($)
// Returns:
// {
//   "success": [1,3,5],
//   "failure": [2,4,6]
// }
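To see how partition differs from splitWhere, here is the same "name starts with M" test applied to the people from the splitWhere example. splitWhere cuts the Array once at the first match, while partition checks every item and keeps each one's original order within its group.

```dataweave
%dw 2.0
output application/json
import * from dw::core::Arrays
var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
---
partition(people, (p) -> p.name startsWith "M")
// "success" holds Marty's record; "failure" holds Josh's and Dave's records
```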

Again, if you'd prefer an Array as output instead of an Object, you'll use pluck to transform the output:

arr
  partition isOdd($)
  pluck $
// Returns: [[1,3,5],[2,4,6]]

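If you're already familiar with groupBy, you can get something very close to partition with groupBy. Keep in mind that groupBy only creates a key for a group that contains at least one item, so this version can be missing the "success" or "failure" key entirely: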

fun myPartition(arr, fnCriteria) =
  arr groupBy (e) -> if (fnCriteria(e)) "success" else "failure"

If you don't like the "success" and "failure" keys, you can define your own partition using groupBy to essentially "override" them in a single step (as opposed to calling partition and then using mapObject to rename the keys).
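Here is a minimal sketch of that idea. partitionAs is a hypothetical helper, not a DataWeave function, and it inherits the groupBy caveat above: a key only appears if at least one item lands in that group.

```dataweave
%dw 2.0
output application/json

// Hypothetical helper: like partition, but the caller picks the key names
fun partitionAs(arr, fnCriteria, passKey, failKey) =
  arr groupBy (e) -> if (fnCriteria(e)) passKey else failKey
---
partitionAs([1,2,3,4,5,6], isOdd, "odds", "evens")
// Returns: {"odds": [1,3,5], "evens": [2,4,6]}
```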

Conclusion

This concludes Part 1 of the new functionality added to the DataWeave Arrays module in version 2.2. We covered how to use drop, take, and slice to extract subsets of an Array, and the subtle differences between using these functions vs. the more familiar slice notation, arr[n to m]. We also covered the more generic forms of Array slicing, dropWhile and takeWhile, and when you might prefer them over drop and take. We discussed how to find items in an Array using indexOf and indexWhere. Finally, we closed by investigating how we can split Arrays using splitAt, splitWhere, and partition. Stay tuned for Part 2, in which I cover the SQL-esque functions that have joined the team: join, leftJoin, and outerJoin. Get it? Joined the team?


If you never want to read my blog again, I'll understand! If not, see you next time :)

EDIT: Head to https://www.jerney.io/dw-2-2-arrays-pt-2/ to see Part 2!
EDIT: The slice notation example for take was incorrect. Thanks to Tanner Sherman for pointing this out.