DataWeave - The Map Function

Learn the ins and outs of DataWeave's map function!

Introduction

This post examines the map function in the DataWeave (DW) language. I'll explain when you'd want to use map, how map works, what map expects as arguments, and how those arguments need to be constructed. I won't be glossing over a lot of details like I do in some of my posts on more advanced concepts, so I'll take the time to explain virtually everything you need to know about this function. By the end of this post, you'll be ready to apply what you've learned to your integrations.

What is map?

A good way to think of map for the first time is as the function you use when you want to modify every element in an array in the same way. In integrations, map is typically used to transform a payload from the source system's format into the format expected by the target system. It can also be used to transform a payload into a more convenient format for later processing. Let's look at a concrete example. Assume we have an array of objects, and each object represents an employee:

[
  {
    "name": "Joshua Erney",
    "job": "Programmer",
    "age": 27
  },
  {
    "name": "Mary Smith",
    "job": "Data Analyst",
    "age": 32
  }
]

You could use map to implement any of the following transformations to the above array:

  1. Add a field, "employed" and set it to true for each employee
  2. Add 1 to every employee's age
  3. Remove the age field for each employee
  4. Get a list of names of all the employees

A couple of important things to note: we're not changing the number of elements in the array, and whatever we're trying to implement applies to every element in the array, not just certain elements. If those two criteria hold true, map is a great tool for the job.

Let's take a look at what map needs to accomplish this for us. The map function takes two arguments: on its left side, it takes an array, and on its right side, it takes a function referred to as a callback. Given these arguments, map iterates over every element in the array and applies the callback function to each element. The callback needs to be constructed in a very specific way: it can declare at most two parameters, and in certain situations it can declare none. Typically, you will see it used with only one parameter. The first parameter represents the current value in the array that is being iterated over. For example, if you have an array of objects, the first parameter will always represent an object. The second parameter represents the current index of the iteration, so it will always be an integer. The body of the callback function describes how each element of the array will be transformed in terms of those input parameters.
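To make the two callback parameters concrete, here's a small sketch that uses both of them. The input array and the parameter names value and index are my own picks for illustration; any names would work.

```dataweave
%dw 2.0
output application/json
---
// value is the current element, index is its zero-based position
["a", "b", "c"] map (value, index) -> {
  position: index,
  letter:   value
}
```

This should produce three objects, pairing position 0 with "a", 1 with "b", and 2 with "c".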

If you're not familiar with what a callback function is, don't worry. You can think of it this way: when map is iterating over the array passed to it, it will "call back" to the function passed as its right-hand argument at every step of the iteration, passing in the values that function needs. In other words, the callback function for map defines the rules for how each element in the input array should be transformed.

If you're from a Java-esque background, passing functions as arguments might be a foreign concept. We can do this in DW because DW treats functions differently than Java does. With DW, functions are given the same privileges as classes, objects, and primitive data types in Java. We can create functions on the fly (like anonymous classes), store them in variables, pass them to other functions, and return them from functions. If you've ever used map or filter before, you're likely already familiar with creating functions on the fly and passing them to other functions. More details on this later.
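Here's a minimal sketch of those privileges in one script. All of the names (addOne, adder, applyTwice) are made up for illustration and aren't part of DW itself. The applyTwice function receives a function as an argument in much the same way map receives its callback.

```dataweave
%dw 2.0
output application/json

// A function created on the fly and stored in a variable
var addOne = (n) -> n + 1

// A function that returns another function
fun adder(amount) = (n) -> n + amount

// A function that takes another function as an argument
fun applyTwice(f, x) = f(f(x))
---
{
  stored:   addOne(1),
  passed:   applyTwice(addOne, 1),
  returned: applyTwice(adder(10), 1)
}
```

With these definitions, stored should come out as 2, passed as 3, and returned as 21.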

map in Action

Time to check out how map works. Let's implement the 1st transformation that we listed in the previous section, "Add a field 'employed' and set it to true for each employee." We can do this one of two ways. Let's do it the most verbose way first:

%dw 2.0
output application/java
---
payload map (employee) -> {
  name:     employee.name,
  job:      employee.job,
  age:      employee.age,
  employed: true
}

In the above example, DW is calling the map function with our payload, which is an array of objects. The output of the callback function is also an object.

Notice that our object contains all the same keys as our original object as well as all the same values, except that in this case we've added a single field, "employed", and set it to true.

Note that the output of our callback function is an object only because the requirement dictates it. Even if your input array is a bunch of objects, that does not mean your callback needs to return objects. map is very flexible: the callback could return a scalar like a string or a number, or another collection like an array.
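As a quick sketch of that flexibility, the script below runs two callbacks over the same employees: one returns a number and the other returns an array. I've put the employees in a variable so the script stands on its own, and the output keys nameLengths and pairs are just labels I chose.

```dataweave
%dw 2.0
output application/json

var employees = [
  {name: "Joshua Erney", job: "Programmer",   age: 27},
  {name: "Mary Smith",   job: "Data Analyst", age: 32}
]
---
{
  // The callback returns a number for each employee
  nameLengths: employees map ((employee) -> sizeOf(employee.name)),
  // The callback returns an array for each employee
  pairs: employees map ((employee) -> [employee.job, employee.age])
}
```

Both results still have exactly two elements, one per employee; only the type of each element has changed.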

The output of this DW script (shown here in JSON form) would be the following:

[
  {
    "name": "Joshua Erney",
    "job": "Programmer",
    "age": 27,
    "employed": true
  },
  {
    "name": "Mary Smith",
    "job": "Data Analyst",
    "age": 32,
    "employed": true
  }
]

You might be wondering if we actually needed to be so explicit about mapping the first three fields, considering that they were left completely unchanged. As it turns out, we don't; we could have implemented the same requirement this way instead:

%dw 2.0
output application/java
---
payload map (employee) ->
  employee ++ {employed: true}

You could describe this version as "map the payload by adding the key:value pair of {employed: true} to each object in the array," as opposed to "map the payload by keeping the name, job, and age key:value pairs the same, and adding an additional key:value pair of {employed: true}."

In the code above, we're using {} in a completely different way. I'm going to point this out because it was a point of confusion for me when I started learning the language. It's very important to realize that the expression after -> is the body of the function, and that curly brackets, {}, do NOT represent a block of code like they do in Java. {} will always create a new instance of an object (i.e., {} is an object constructor). So in the code above, we're actually creating a new object and using ++ to concatenate it onto employee. You'll also notice that functions in DW don't use a return keyword like you see in other languages. A DW function always returns the result of the expression that makes up its body, which in the above case is a new object.
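If it helps, here's a small sketch that puts both ideas side by side. The function names twice and wrap are hypothetical, chosen only for this example.

```dataweave
%dw 2.0
output application/json

// The body is a single expression: no braces and no return keyword
fun twice(n) = n * 2

// Here the braces build an object; they are not a block of code
fun wrap(n) = {value: n}
---
{
  doubled: twice(21),
  wrapped: wrap(21),
  // ++ joins two objects into a new, third object
  merged:  {name: "Mary Smith"} ++ {employed: true}
}
```

Here doubled should be 42, wrapped should be an object with a single value key, and merged should be one object holding both name and employed.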

Now that you've seen map in action, let's return to what I was saying earlier:

With DW, functions are given the same privileges as classes, objects, and scalar data types are in Java. We can create them on the fly (like anonymous classes), store them as variables, and pass them to other functions.

If you're not familiar with DW yet, you might be wondering what (employee) -> ... is. You should already know it's a function, specifically a callback, as I stated previously. However, it isn't named like most functions you'd see in Java or Python. That's because this is an anonymous function, also called a lambda. When you construct a lambda in DW you specify a parameters list, for example (employee, index), or () if there are no parameters. Then use an arrow, ->, and everything after that arrow is the body of the function. If you're uncomfortable with this syntax, it's entirely possible to name your functions instead and use those just the same. Let's check out the following example using a named function instead of a lambda:

%dw 2.0
output application/java

fun addEmployedField(employee) =
  employee ++ {employed: true}
---
payload map addEmployedField($)

Sometimes structuring the solution this way can be easier to understand because it's pretty readable, i.e., "Map the payload by adding the employed field." Most code I've seen, however, does not use this convention and instead opts to use a lambda to accomplish this task. The rest of the post will follow this convention so you can become more familiar with it.
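The $ in that last script deserves a quick note. As far as I know, when you hand map an expression instead of an explicit lambda, DW lets you refer to the current element as $ and the current index as $$. Under that assumption, the following sketch should behave the same as the lambda version from earlier:

```dataweave
%dw 2.0
output application/java
---
// $ stands in for the current element (here, an employee object)
payload map ($ ++ {employed: true})
```

This is compact, but the explicit (employee) -> ... form tells the reader what each element actually is, which is one reason to prefer it in longer scripts.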

Now would be a good time to try some of those array transformations I was describing earlier. Give 2, 3, and 4 a shot before you move on. Feel free to drop me a question in the comments section below if you get stuck!

The Details

Now that you have an understanding of how to use map, let's dig into how it works. Learning how map works should prove very beneficial, because if you understand its mechanics you'll also understand the mechanics of filter, reduce, groupBy, pluck, and any other function that takes in a collection and a callback used to process that collection.

I should tell you now that I've glossed over an important detail (just like I said I wouldn't in the intro). I previously stated that you'll use map when you want to modify every element of an array in the same way, but that's a lie. The truth is map doesn't modify anything. In fact, it can't. DW doesn't allow it, because nothing in DW is mutable! This is such an important concept to internalize that I'm going to repeat it: nothing in DW is mutable! Once an element has been created, it will never change again.
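You can see this for yourself with a small sketch like the one below. The variable names ages and mappedAges are mine, and the numbers are throwaway sample data.

```dataweave
%dw 2.0
output application/json

var ages = [27, 32]

// map hands back a result; it does not touch ages
var mappedAges = ages map ((age) -> age * 2)
---
{
  original: ages,
  mapped:   mappedAges
}
```

The original key should still show [27, 32] even though map has already run over ages, while mapped holds [54, 64].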

So if map isn't modifying the array, what is it doing? Instead of modifying elements in-place, it's creating an entirely new array. I use the following image to help me visualize what's going on:

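[Image: an input array of five elements (e1, e2, ...) shown as a blue box on the left, with each element passed through the callback function into a new array]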

Let's dissect this a bit. First, we have our input array, represented by the blue box on the left. Within our input array we have 5 elements called e1, e2, etc. These elements could represent anything: they could be numbers, objects, or even other arrays. When map is called, it iterates through each element of the array, passing it to the callback function. The callback function processes the element and returns an entirely new element, and map adds that new element to a new array. map continues this process until it has iterated through all the elements in the input array.
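To make that process less abstract, here's a rough sketch of a map look-alike built on top of reduce. To be clear, this is not how DW actually implements map. myMap is a hypothetical function I wrote for illustration, and it ignores the index parameter entirely.

```dataweave
%dw 2.0
output application/json

// myMap is a made-up stand-in for map.
// It starts with an empty array, then for each input element it
// calls the callback and appends the result to a new array.
fun myMap(items, callback) =
  items reduce ((item, newArray = []) -> newArray + callback(item))
---
myMap([1, 2, 3], (n) -> n * 10)
```

This should return [10, 20, 30]. Notice that the input array is only ever read from; every result lands in the new array that reduce is building up.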

Conclusion

map is probably the most used function in DataWeave. Its functionality covers a ton of use cases within the integration niche, so it's important to know not only how to use this function, but also how it works. In this article, we went over the use cases for which map is useful, as well as the use cases where it isn't. We talked about what map expects as arguments and how those arguments need to be constructed. We checked out how map works in practice, and finally, we took a deep dive into map to fully understand how it works. I hope you enjoyed this article! Feel free to leave any questions below and be sure to follow @mulemadeeasy on Twitter for more educational content related to MuleSoft products!