DataWeave 2.2 - Additions to the Arrays Module, Part 1
Lunch break? Take a few minutes to get familiar with some of the new functions added to the DataWeave Arrays module in Mule Runtime 4.2.0
Introduction
MuleSoft recently released runtime version 4.2.0, and along with it, DataWeave 2.2! DW 2.2 has a ton of new features that I won't be able to cover in a single blog post, so I'll be covering them over the course of a few blog posts. In this particular post, I'll be going over some of the additions to the Arrays module (dw::core::Arrays). I'll discuss the new functions, how they relate back to concepts that you likely already know, and give a few examples of how to use them and when to use them. Here's a list of functions included in this post:
drop, take, slice, dropWhile, takeWhile, indexOf, indexWhere, splitAt, splitWhere, partition
I've organized the presentation of these functions in terms of what they accomplish:
- Extracting Array subsets (aka slicing an Array)
- Locating specific items in an Array
- Splitting an Array into two Arrays
I'm going to leave out the DW header in the code examples for the sake of brevity. So know that if you want to try this yourself, you'll need to import the Arrays module:
import * from dw::core::Arrays
For reference, you can find the DataWeave 2.2 release notes here, and the docs for DataWeave 2.2 (Mule Runtime version 4.2.0) here.
In the past, I've used the term element to describe an arbitrary member of an Array. I've recently noticed that the official documentation prefers the term item, so I will use that from now on.
Extracting Array Subsets with drop, take, and slice
The drop, take, and slice functions are useful for extracting an Array subset of an existing Array. We can relate all of these functions back to another strategy for getting subsets of an Array: arr[n to m]. I'll refer to variations of arr[n to m] as slice notation from this point forward (not to be confused with the slice function, which I will always highlight like in this sentence).
The main difference between these functions and using slice notation is how out-of-bounds input is handled. If you try to access indexes that are out-of-bounds using slice notation, it will return null. As long as you provide the correct type of input to drop, take, and slice (i.e., Arrays and Numbers) they will always return an Array.
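Here's a minimal sketch of that difference. The variable name and output keys are just for illustration, and the null result for slice notation relies on the behavior described above:

```dataweave
%dw 2.0
import * from dw::core::Arrays
output application/json

// Illustrative input; any Array shorter than the requested range will do
var arr = [0, 1, 2]
---
{
  // Slice notation with out-of-bounds indexes yields null
  notation: arr[5 to 10],
  // One way to recover an Array is the default operator
  notationWithDefault: arr[5 to 10] default [],
  // drop hands back an empty Array with no extra handling
  dropped: drop(arr, 5)
}
```

The practical upshot is that anything downstream of drop, take, or slice can assume it is working with an Array, while slice notation needs a null check or a default.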
drop
The drop function can be used to effectively remove the first n items from the beginning of an Array. This is akin to arr[n to -1] in slice notation. Here's an example of drop:
var arr = [0,1,2,3,4,5]
---
drop(arr, 3) // Drop the first 3 items
// Returns: [3,4,5]
It's important to keep in mind that the second parameter is not an index, it's the number of items from the beginning of the Array that you want to drop.
And here's an example of similar functionality using slice notation:
arr[3 to -1]
You may be wondering what happens if you pass in a number that's less than 1 or if you pass in a number that's greater than the number of items you have in the Array. In the event that you pass a number less than 1, drop will return the same Array. In the event that you pass a number greater than the number of items you have in the Array, drop returns an empty Array:
var arr = [0,1,2,3,4,5]
---
drop(arr, -5)
// Returns: [0,1,2,3,4,5]
var arr = [0,1,2,3,4,5]
---
drop(arr, 20)
// Returns: []
While this feature will prevent you from getting an IndexOutOfBoundsException, it also means you will need to handle the scenarios in which you would have anticipated this exception to occur.
drop will be a welcome replacement in my code for the previous method of getting every item in the Array besides the first item (a very popular use case for recursive functions). So instead of this:
var arr = [0,1,2,3,4,5]
---
arr[1 to -1]
I'll use this instead:
var arr = [0,1,2,3,4,5]
---
drop(arr, 1)
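To show why this matters for recursion, here's a small sketch. sumAll is a hypothetical helper I made up for illustration, not something from the Arrays module:

```dataweave
%dw 2.0
import drop from dw::core::Arrays
output application/json

// Hypothetical helper: adds up the items of an Array recursively.
// drop(arr, 1) returns [] once the last item is consumed, so the
// isEmpty check is the only base case we need.
fun sumAll(arr: Array<Number>): Number =
  if (isEmpty(arr)) 0
  else arr[0] + sumAll(drop(arr, 1))
---
sumAll([1, 2, 3, 4])
// Returns: 10
```

Because drop always returns an Array, the recursive call never has to deal with the null that slice notation can produce when it runs past the end.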
take
While you use drop to remove items from the beginning of an Array, you use take to get the first n items from the beginning of an Array. This is akin to arr[0 to n - 1] in slice notation. Here's an example:
var arr = [0,1,2,3,4,5]
---
take(arr, 1)
// Returns: [0]
And here's an example of similar functionality using slice notation:
arr[0 to 0]
Like drop, it's important to know that the number passed to take is not an index but instead the number of items you wish to get from the beginning of the Array.
The rules for what happens when you pass an out-of-bounds number to take are the mirror image of the rules for drop. In the event that you pass a number less than 1, take will return an empty Array. In the event that you pass a number greater than the number of items you have in the Array, you'll get back the same Array.
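To make the edge cases concrete, here's a minimal sketch of take with out-of-bounds numbers. The values in the comments assume the behavior given in the official docs for take (an empty Array for a number less than 1, the whole Array for a number larger than its size), so run it yourself if you want to confirm:

```dataweave
%dw 2.0
import take from dw::core::Arrays
output application/json

var arr = [0, 1, 2, 3, 4, 5]
---
{
  // Asking for fewer than 1 item: expected []
  negative: take(arr, -5),
  // Asking for more items than exist: expected [0, 1, 2, 3, 4, 5]
  tooLarge: take(arr, 20)
}
```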
slice
While drop can get you items from some point in the middle of an Array to the end and take can get you items from the beginning of an Array to some point in the middle, slice covers both of those use cases plus the ability to extract items from one point in the middle of an Array to any other point in the middle of an Array. slice is akin to arr[n to m - 1] in slice notation. Example:
var arr = [1,2,3,4,5]
---
slice(arr, 1, 3)
// Returns: [2,3]
An example of the same functionality using slice notation (the range in slice notation includes both ends, so it stops at index 2):
arr[1 to 2]
Note that with slice the 1st index gets included in the output Array whereas the 2nd index gets excluded. In other words, we get the Array from and including index 1, up to and excluding index 3. If you're familiar with String#substring() in Java, this follows the same rules.
What are the rules if you specify indexes that are out-of-bounds? What happens if we take the arr defined above, but try to slice up to index 20?
slice(arr, 1, 20)
// Returns: [2,3,4,5]
So slice doesn't choke on that. Instead, it exhibits the same behavior as take.
What happens if we take the arr defined above and try to begin the slice with a negative index?
slice(arr, -5, 3)
// Returns: [1,2,3]
So slice won't choke on that, either. It exhibits the same behavior as drop.
Finally, what happens if we completely mix up the inputs so that the first index we provide is greater than the second?
slice(arr, 10, 1)
// Returns: []
In this case we get an empty Array.
Lastly, a subtle but important observation is that we cannot use negative indexes with slice to count from the back of the Array like we can with slice notation. In other words:
slice(arr, 1, -1)
and
arr[1 to -1]
are not equivalent. The slice example will return an empty Array.
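If you want the "everything up to the end" behavior with slice, one option is to pass the size of the Array as the second index. This is just a sketch of that idea, and the output keys are made up for illustration:

```dataweave
%dw 2.0
import slice from dw::core::Arrays
output application/json

var arr = [1, 2, 3, 4, 5]
---
{
  // Slice notation counting from the back of the Array
  notation: arr[1 to -1],
  // slice excludes the second index, so sizeOf(arr) reaches the last item
  sliced: slice(arr, 1, sizeOf(arr))
}
// Both keys hold [2, 3, 4, 5]
```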
Extracting Array Subsets with dropWhile and takeWhile
dropWhile and takeWhile are the higher-order function counterparts of drop and take, respectively. These functions give the client much more flexibility in deciding when to stop dropping and taking items from the input Array. If you find yourself wanting to use drop or take but need a criterion for when to stop "dropping" or "taking" that cannot be defined by a count alone, use dropWhile and takeWhile.
dropWhile
dropWhile is a more general form of drop. These functions are both used to remove items from the beginning of an Array. However, dropWhile asks for a function to define when it should stop dropping items from the Array, whereas drop takes a Number representing how many items it should drop. dropWhile keeps dropping items for as long as the function returns true. The function you pass to dropWhile should take in a single parameter and return a Boolean. Example:
fun nameIsNotJosh(name) = name != "Josh"
var arr = ["James", "John", "Josh", "Jerry!!!"]
---
dropWhile(arr, nameIsNotJosh)
// Returns: ["Josh", "Jerry!!!"]
takeWhile
takeWhile is a more general-purpose version of take. Instead of specifying how many items you want from the beginning of the Array, you use a function to specify when takeWhile should stop taking items from the beginning of the Array. Example:
var arr = [1,2,3,4,5]
---
takeWhile(arr, isOdd)
// Returns: [1]
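It's easy to confuse takeWhile with filter, so here's a quick side-by-side sketch. The input Array and output keys are just for illustration:

```dataweave
%dw 2.0
import takeWhile from dw::core::Arrays
output application/json

var arr = [1, 3, 4, 5, 7]
---
{
  // Stops at the first item that fails the test (4): [1, 3]
  taken: takeWhile(arr, isOdd),
  // Tests every item in the Array: [1, 3, 5, 7]
  filtered: arr filter isOdd($)
}
```

takeWhile only ever looks at a prefix of the Array, which is what you want when the input is ordered and you care about where the run ends.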
Locating Array Items with indexOf and indexWhere
You can use the indexOf and indexWhere functions to find the index where an item occurs in an Array. indexOf will take a hard value that defines what to search for, while indexWhere, a higher-order function, will take a function that defines a match.
indexOf
indexOf should look pretty familiar if you've been programming for a while (it probably does exactly what you think!). You pass indexOf two parameters: an Array, and a value for which you'd like to search, e.g. a String or Number. indexOf returns the index of the first occurrence of a match:
var arr = ["Hello", "World"]
---
indexOf(arr, "World")
// Returns: 1
Note: indexOf will return the index of the first match and it will not continue after finding a match.
In the event that indexOf does not find a match, it will return -1:
indexOf(arr, "SPACE!")
// Returns: -1
indexWhere
indexWhere is a more general form of indexOf. The relationship between indexOf and indexWhere is the same as the relationship between drop and dropWhile: indexWhere accepts a function instead of a hard value. The function you pass to indexWhere takes in a single parameter and needs to return a Boolean. This is a great function to use if you need to dive into a data structure to determine if that item in the Array is a match. Example:
var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}]
---
indexWhere(people, (p) -> p.job == "guitarist")
// Returns: 1
Note: indexWhere will return the index of the first match and it will not continue after finding a match.
Like indexOf, indexWhere will return -1 if it does not find a match:
indexWhere(people, (p) -> p.job == "ice cream tester")
// Returns: -1
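One thing to watch for: a negative index selects from the end of an Array in DataWeave, so feeding a -1 result straight back into people[...] would quietly give you the last item instead of nothing. Here's a sketch of one way to guard against that. firstWithJob is a hypothetical helper I made up for this example:

```dataweave
%dw 2.0
import indexWhere from dw::core::Arrays
output application/json

var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}]

// Hypothetical helper: returns the first person with the wanted job, or null
fun firstWithJob(wanted: String) = do {
  var idx = indexWhere(people, (p) -> p.job == wanted)
  ---
  // Check for -1 before using the result as an index
  if (idx >= 0) people[idx] else null
}
---
{
  found: firstWithJob("guitarist"),
  missing: firstWithJob("ice cream tester")
}
// found is Marty's Object, missing is null
```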
Splitting Arrays up with splitAt, splitWhere, and partition
You can use the splitAt, splitWhere, and partition functions if you'd like to split an Array into two separate Arrays based on some criteria.
As a brief overview: You'll use splitAt when you want to split up an Array based on an index that you know ahead of time. You'll use splitWhere when you need to define a criterion other than the index for where you should split an Array. Finally, you'll use partition when you want to split up individual items in the Array based on whether or not they pass a test.
splitAt
splitAt splits an Array into two Arrays at a specified index. It returns an Object containing two keys, referred to as a Pair. The two keys in the Pair are "l" and "r", where "l"'s values are the items before the specified index and "r"'s values are the items at and after it. Example:
var arr = [1,2,3,4]
---
splitAt(arr, 2)
// Returns:
// {
// "l": [1,2],
// "r": [3,4]
// }
Notice that the item that occurs at the specified index gets added to the "r" Array in the Pair.
If you want two Arrays contained in an Array (instead of the Pair) you can transform the data after the split:
var split = splitAt(arr, 2)
---
[split.l, split.r]
// Returns [[1,2], [3,4]]
pluck will also work for this while getting us away from specifying the exact keys:
splitAt(arr, 2) pluck $
splitWhere
If you've read this far you probably already know what this function does. It's the same as splitAt but far more general because it takes in a function that defines where to split the input Array. The function passed to splitWhere should take a single parameter and return a Boolean. Again, splitWhere is a great function to use if you want to split an Array but you need to define the point of the split based on nested data (or anything besides an index). Here's an example:
var people = [{name: "Josh", job: "programmer"}, {name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
---
splitWhere(people, (p) -> p.name startsWith("M"))
// Returns:
// {
// "l": [{name: "Josh", job: "programmer"}],
// "r": [{name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
// }
Again, you can use pluck to transform the output into an Array of Arrays if you'd prefer that over an Object:
splitWhere(people, (p) -> p.name startsWith("M")) pluck $
// Returns:
// [
// [{name: "Josh", job: "programmer"}],
// [{name: "Marty", job: "guitarist"}, {name: "Dave", job: "vocalist"}]
// ]
partition
partition is useful for splitting an Array based on whether the items in the Array pass or fail a specified criterion. You pass partition an Array and the criterion defined as a function. The function should take in a single parameter and return a Boolean. partition will return an Object with two keys, "success" and "failure". Example:
var arr = [1,2,3,4,5,6]
---
arr partition isOdd($)
// Returns:
// {
// "success": [1,3,5],
// "failure": [2,4,6]
// }
Again, if you'd prefer an Array as output instead of an Object you'll use pluck to transform the output:
arr
  partition isOdd($)
  pluck $
// Returns: [[1,3,5],[2,4,6]]
If you're already familiar with groupBy, partition can be defined in terms of groupBy:
fun myPartition(arr, fnCriteria) =
  arr groupBy (e) -> if (fnCriteria(e)) "success" else "failure"
If you don't like the "success" and "failure" keys, you can define your own partition using groupBy to essentially "override" that behavior in a performant way (as opposed to using mapObject after partition to change the keys).
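As a rough sketch of that idea, here's a version that lets the caller pick the keys. partitionAs and its parameter names are hypothetical, purely for illustration:

```dataweave
%dw 2.0
output application/json

// Hypothetical helper: behaves like partition, but with caller-chosen keys
fun partitionAs(arr, fnCriteria, passKey: String, failKey: String) =
  arr groupBy (e) -> if (fnCriteria(e)) passKey else failKey
---
partitionAs([1, 2, 3, 4, 5, 6], isOdd, "odd", "even")
// Returns:
// {
//   "odd": [1, 3, 5],
//   "even": [2, 4, 6]
// }
```

One caveat with the groupBy approach: a key only shows up in the output if at least one item lands in that group, so if every item passes the test there will be no "even" key at all. Keep that in mind if downstream code expects both keys to exist.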
Conclusion
This concludes Part 1 of the new functionality added to the DataWeave Arrays module in version 2.2. We covered how to use drop, take, and slice to extract subsets of an Array, and the subtle differences between using these functions vs. the more familiar slice notation, arr[n to m]. We also covered the more generic forms of Array slicing, dropWhile and takeWhile, and when you might prefer them over drop and take. We discussed how to find items in an Array using indexOf and indexWhere. Finally, we closed by investigating how we can split Arrays using splitAt, splitWhere, and partition. Stay tuned for Part 2 in which I cover the SQL-esque functions that have joined the team: join, leftJoin, and outerJoin. Get it? Joined the team?
If you never want to read my blog again, I'll understand! If not, see you next time :)
EDIT: See https://www.jerney.io/dw-2-2-arrays-pt-2/ for Part 2!
EDIT: The slice notation example for take was incorrect. Thanks to Tanner Sherman for pointing this out.