DataWeave - Practice Exercises

A set of exercises for developing your DataWeave skills

Introduction

This post contains DataWeave practice exercises. For the most part, these exercises are derived from my experience as a practitioner, and because of this, they focus on the language features / functions I use the most: map, filter, mapObject, pluck, groupBy, reduce, and recursion. The post is primarily organized into sections by language feature / function. Each section starts with a short description of the language feature / function, followed by the exercises themselves. The exercises start out easy and increase in difficulty throughout the section, sometimes building upon skills learned from earlier exercises.

This is a blog post that I will update indefinitely as I think of more exercises.

If you have any questions, feel free to leave a comment below, or shoot me a message on LinkedIn.

If you are new to DataWeave, I'd first recommend you get a better understanding of how the language works before taking on these exercises. A good place to start would be this post, and the MuleSoft documentation. The documentation for 1.0 is here, and the documentation for 2.0 is here.

These questions all deal with JSON, with the occasional CSV. You won't find any questions dealing with XML. Frankly, JSON is really easy for me to read and write, and XML is very verbose by comparison. Also, DW's syntax more closely mimics JSON. If there's large demand for XML, I'll include it, but until then, JSON will do for helping you master how these functions work.

If you solve all the exercises, shoot me a message and let's compare code!


Updates:

  • 08/18/18 - Added groupBy section. Added bonus to exercise 8 of the reduce section. Modified the introduction. Added note to exercise 3 of the recursion section. Created intro to "Additional Exercises" section. Other minor edits.

map

map is a function that takes in an array, and returns an array. In addition to the input array, you pass map a lambda that instructs it on how to transform each value in the input array. The lambda has access to both the value of the current iteration, and the index of that value. The exercises below will check that you know how to use both.
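
As a quick illustration of the description above, here is a minimal DataWeave 1.0 sketch (the input array and the parameter names value and index are made up for this example). It multiplies each value by its index, so the lambda uses both arguments:

```dataweave
%dw 1.0
%output application/json
---
// value is the current element, index is its position (starting at 0).
// The shorthand form of the same lambda would be: [10, 20, 30] map ($ * $$)
[10, 20, 30] map ((value, index) -> value * index)
```

Under those assumptions the result should be [0, 20, 60].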

Exercises

  1. Add 1 to each value in the array [1,2,3,4,5]
  2. Get a list of ids from:
[
  { "id": 1, "name": "Archer" },
  { "id": 2, "name": "Cyril"  },
  { "id": 3, "name": "Pam"    }
]
  3. Take the following:
[
  { "name": "Archer" },
  { "name": "Cyril"  },
  { "name": "Pam"    }
]

And generate the input for question #2 (i.e. add an incrementing id field).

  4. Given what you've learned from question #3, take the following:
[
  { 
    "name": "Archer",
    "jobs": [
      { "type": "developer" },
      { "type": "investor"  },
      { "type": "educator"  } 
    ] 
  },
  {
    "name": "Cyril",
    "jobs": [
      { "type": "developer"    },
      { "type": "entrepreneur" },
      { "type": "lion tamer"   }
    ]
  } 
]

Create an incrementing num field on the root object, and an incrementing num field for each object in the jobs array. Here's the desired output (order of the fields does not matter):

[
  {
    "num":  1,
    "name": "Archer",
    "jobs": [
      { "num": 1, "type": "developer" },
      { "num": 2, "type": "investor"  },
      { "num": 3, "type": "educator"  } 
    ] 
  },
  {
    "num":  2,
    "name": "Cyril",
    "jobs": [
      { "num": 1, "type": "developer"    },
      { "num": 2, "type": "entrepreneur" },
      { "num": 3, "type": "lion tamer"   }
    ]
  } 
]

hint: you will need to use map twice.

filter

filter is a function that takes in an array and returns a subset of that array. In addition to the input array, filter is passed a lambda that is used to determine if a value in the original array should be contained in the output array. The lambda has access to both the value of the current iteration, and the index of that value. The practice problems below will check that you know how to use both.
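
A small sketch of how that looks in DataWeave 1.0, assuming (as described above) that the lambda receives the value first and the index second. The input array and parameter names are invented for illustration:

```dataweave
%dw 1.0
%output application/json
---
// Keep an element only when the lambda returns true for it.
// Here: drop the value "b", and drop anything at index 3 or later.
["a", "b", "c", "d"] filter ((value, index) -> (value != "b") and (index < 3))
```

With this input the result should be ["a", "c"]: "b" fails the value check and "d" fails the index check.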

Exercises

  1. Remove odd values from [1,2,3,4,5]
  2. Remove even indexes from [1,2,3,4,5]
  3. Remove objects in the input array where the status field of the object is "processed":
[
  {
    "id": 1,
    "status": "waiting"
  },
  {
    "id": 2,
    "status": "processed"
  },
  {
    "id": 3,
    "status": "waiting"
  }
]
  4. Remove values in the input array that are contained in this array: ["deleted", "processed"].
[
  "starting", 
  "waiting", 
  "deleted", 
  "processing", 
  "processed"
]

mapObject

mapObject is a function that takes in an object, and returns an object. In addition to the input object, mapObject is passed a lambda that describes how to create the output object from the input object. The lambda has access to both the key and the value of the current iteration. The practice problems below will ensure that you know how to use both.
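
To make the shape of the lambda concrete, here is a minimal DataWeave 1.0 sketch with a made-up input object. Note that the value comes first and the key second, and that the lambda returns an object for each key:value pair:

```dataweave
%dw 1.0
%output application/json
---
// (key) in parentheses makes the output key dynamic, i.e. taken from the variable.
// Shorthand equivalent: mapObject {($$): $ * 10}
{"width": 2, "height": 3} mapObject ((value, key) -> {(key): value * 10})
```

This should produce {"width": 20, "height": 30}.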

Exercises

  1. Take the following object and transform all the values to uppercase:
{
  "one":   "two",
  "three": "four",
  "five":  "six"
}
  2. Take the object from exercise #1 and transform all the keys to uppercase.
  3. Remove all of the key:value pairs from the following object where the value is null (do not use the skipOnNull directive):
{
  "one":   "two",
  "three": null,
  "five":  null
}

pluck

pluck is a function that takes in an object, and returns an array. In addition to the input object, pluck is passed a lambda that describes how to transform each key:value pair in the object to a value in an array. The lambda has access to both the key and the value of the current iteration. The practice problems below will ensure you know how to use both.
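
Here is a minimal DataWeave 1.0 sketch of that idea. The input object and the output field names "field" and "amount" are invented for this example:

```dataweave
%dw 1.0
%output application/json
---
// Each key:value pair becomes one element of the output array.
// As with mapObject, the value is the first argument and the key is the second.
{"width": 2, "height": 3} pluck ((value, key) -> {"field": key, "amount": value})
```

The result should be an array with one object per pair: [{"field": "width", "amount": 2}, {"field": "height", "amount": 3}].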

Exercises

  1. Get a list of the values from the following:
{
  "one":   "two",
  "three": "four",
  "five":  "six"
}
  2. Get a list of the keys from the object in question #1
  3. Using the object in question #1, create this:
[
  {"one":   "two" },
  {"three": "four"},
  {"five":  "six" }
]
  4. Using the object in question #1, create this:
[
  ["one",   "two" ],
  ["three", "four"],
  ["five",  "six" ]
]

groupBy

groupBy is a function that takes in an array and returns an object, where each value of the object is an array. In addition to the input array, groupBy is passed a lambda that describes the group to which the current value of the iteration belongs. The keys of the output object are the values the lambda returns during the iteration. I think this is easier to see with an example:

[1,2,3,4,5,6,7,8] groupBy ("even" when ($ mod 2 == 0) otherwise "odd")

will output:

{
  "even": [2,4,6,8],
  "odd" : [1,3,5,7]
}

Then, if you need all the even numbers, you can just do payload["even"].
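
The same idea works when the array holds objects and the group comes from one of their fields. A minimal DataWeave 1.0 sketch, where the orders data and the byRegion variable are made up for illustration:

```dataweave
%dw 1.0
%output application/json
%var orders = [
  { "id": 1, "region": "east" },
  { "id": 2, "region": "west" },
  { "id": 3, "region": "east" }
]
%var byRegion = orders groupBy $.region
---
// byRegion has one key per distinct region; select a single group by its key.
byRegion.east
```

This should return the two "east" orders (ids 1 and 3), in their original order.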

Over time I've found 3 uses for groupBy that have popped up repeatedly: Merging two data structures that share common IDs (exercise 1), creating nested data structures from flat structures that mimic a nested structure (exercise 2), and separating good records from bad records (exercise 3). Let's make sure you understand how to do all three!

Exercises

  1. Take the following two data structures (let's assume these came from the invoice and allocation tables of a database):
%var invoices = [
  {
    "invoiceId": 1,
    "amount":    100 },
  {
    "invoiceId": 2,
    "amount":    200 },
  {
    "invoiceId": 3,
    "amount":    300 }]

%var allocations = [
  {
    "allocationId":     1,
    "invoiceId":        1,
    "allocationAmount": 50 },
  {
    "allocationId":     2,
    "invoiceId":        1,
    "allocationAmount": 50 },
  {
    "allocationId":     3,
    "invoiceId":        2,
    "allocationAmount": 100 },
  {
    "allocationId":     4,
    "invoiceId":        2,
    "allocationAmount": 100 },
  {
    "allocationId":     5,
    "invoiceId":        3,
    "allocationAmount": 150 },
  {
    "allocationId":     6,
    "invoiceId":        3,
    "allocationAmount": 150 }]

And merge them to create the following:

[
  {
    "invoiceId":  1,
    "amount":     100,
    "allocations": [
      {
        "allocationId":     1,
        "invoiceId":        1,
        "allocationAmount": 50 },
      {
        "allocationId":     2,
        "invoiceId":        1,
        "allocationAmount": 50 }]},
  {
    "invoiceId":  2,
    "amount":     200,
    "allocations": [
      {
        "allocationId":     3,
        "invoiceId":        2,
        "allocationAmount": 100 },
      {
        "allocationId":     4,
        "invoiceId":        2,
        "allocationAmount": 100 }]},
  {
    "invoiceId":  3,
    "amount":     300,
    "allocations": [
      {
        "allocationId":     5,
        "invoiceId":        3,
        "allocationAmount": 150 },
      {
        "allocationId":     6,
        "invoiceId":        3,
        "allocationAmount": 150 }]}]

The important thing to notice here is that all the allocations contained in the allocations array share the same invoiceId as their parent. You will need to use map for this. Remember, you can use %var to store intermediate calculations. Bonus: Remove the duplicate invoiceId in the allocation objects.

  2. Take the following CSV file:
invoiceId,vendorName,total,lineItem,lineItemAmount
1,Amazon,100,Sneakers,75
1,Amazon,100,Shirt,25
2,Walmart,38,Paper,10
2,Walmart,38,Towel,28

And transform it to the following JSON:

[
  {
    "invoiceId":  1,
    "vendorName": "Amazon",
    "total":      100,
    "lineItems": [
      {
        "item":   "Sneakers",
        "amount": 75
      },
      {
        "item":   "Shirt",
        "amount": 25
      }
    ]
  },
  {
    "invoiceId":  2,
    "vendorName": "Walmart",
    "total":      38,
    "lineItems": [
      {
        "item":   "Paper",
        "amount": 10
      },
      {
        "item":   "Towel",
        "amount": 28
      }
    ]
  }
]
  3. Take the following input and group it by whether or not "merchantName" is under 10 characters (let's assume your database's "merchantName" field is VARCHAR(10)):
[
  { "merchantName": "HelloFresh"    },
  { "merchantName": "Amazon"        },
  { "merchantName": "Walmart"       },
  { "merchantName": "Guitar Center" }
]

You should get the following output:

{
  "true": [
    { "merchantName": "Amazon"  },
    { "merchantName": "Walmart" }
  ],
  "false": [
    { "merchantName": "HelloFresh"    },
    { "merchantName": "Guitar Center" }
  ]
}

reduce

reduce is a function that takes in an array and returns anything: an array, object, string, number, null, etc. In addition to the input array, reduce is passed a lambda that describes how to build up the element that is ultimately returned. The lambda has access to both the current value of the iteration, and the current value of the element that is being built up to be returned (the accumulator). reduce is the least intuitive of the big three transformation functions (the others being map and filter), so if you don't have a great understanding of it yet, check out this introduction.
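
As a small warm-up, here is a DataWeave 1.0 sketch that sums the squares of an array. The input and the parameter names n and total are made up, and the total = 0 part assumes the lambda's second parameter can be given a default that serves as the starting accumulator:

```dataweave
%dw 1.0
%output application/json
---
// n is the current value, total is what has been built up so far.
// total starts at 0, then becomes 1, then 5, then 14.
[1, 2, 3] reduce ((n, total = 0) -> total + (n * n))
```

The result should be 14. Note the argument order: the current value comes first, and the accumulator second.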

Exercises

  1. Implement a function myMap that is an implementation of map using reduce
  2. Implement a function myFilter that is an implementation of filter using reduce
  3. Implement a function mySizeOf that is an implementation of sizeOf using reduce
  4. Implement a function myJoinBy that is an implementation of joinBy using reduce
  5. Implement a function maxBy. It takes in two parameters: an array, and a lambda that describes how to compare values. Here are some examples:
maxBy([1,2,3,4,5], ((maximum, n) -> n when (n > maximum) otherwise maximum)) => 5
maxBy(["hello", "hi", "hey"], ((largest, str) -> str when ((sizeOf str) > (sizeOf largest)) otherwise largest)) => "hello"
  6. Take this list of objects:
[
  {"dev":  1},
  {"qa":   2},
  {"prod": 3}
]

and transform it into this:

{
  "dev":  1,
  "qa":   2,
  "prod": 3
}
  7. Take the following DW script and optimize it to use reduce instead of map/filter.
%dw 1.0
%output application/java
---
[1,2,3,4,5] filter (($ mod 2) == 0) map ($ * 100)
  8. Write a function partition that takes an array and a predicate function, and returns an array of arrays, arr, where arr[0] is an array containing all values where the predicate function returned true, and arr[1] is an array containing all the values where the predicate function returned false. It should pass these tests:
partition([1,2,3,4], ((n) -> (n mod 2) == 0)) => [[2,4],[1,3]]
partition(["BIG","BIG","little","little"], ((str) -> ((upper str) == str))) => [["BIG","BIG"],["little","little"]]

Bonus: Write the same function using groupBy.

Recursion

Recursion is when a function calls itself. You can use recursion as a general-purpose looping tool, if needed. You shouldn't need to use recursion often in DataWeave, but it's a good tool to know. I typically use the following pattern when creating recursive functions in DataWeave:

%function fName(<param1>, <param2>,... <paramN>, result=[])
  result
    when <conditional to determine if the final result is ready>
    otherwise <call fName with new arguments to build up result>

Below, when I say something like "this function should take in two parameters" I mean two parameters not including the final optional parameter that stores the result.
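
To show the pattern filled in, here is a minimal DataWeave 1.0 sketch. The function countdown is hypothetical and exists only for this illustration. It takes one parameter (plus the optional result) and builds an array counting down from n to 1:

```dataweave
%dw 1.0
%output application/json
%function countdown(n, result=[])
  result
    when (n <= 0)
    otherwise countdown(n - 1, result + n)
---
// Each call appends n to result, then recurses with n - 1.
// Once n reaches 0, the accumulated result is returned.
countdown(3)
```

Tracing the calls: countdown(3, []) calls countdown(2, [3]), then countdown(1, [3, 2]), then countdown(0, [3, 2, 1]), which should return [3, 2, 1].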

Exercises

  1. Write a function strToCharArr, that takes in a string and returns an array containing each character in the string. It should pass these tests:
strToCharArr("Hello") => ["H", "e", "l", "l", "o"]
strToCharArr("")      => []
strToCharArr(null)    => []
  2. Write a function chunkStr, that takes in a string and a number, and returns an array containing the result of splitting the string at each increment of the number. It should pass these tests:
chunkStr("yoyoyo", 2) => ["yo", "yo", "yo"]
chunkStr("yoyoyo", 3) => ["yoy", "oyo"]
chunkStr("yoyoyo", 4) => ["yoyo", "yo"]
chunkStr("", 1)       => []
chunkStr(null, 1)     => []
chunkStr("hi", 0)     => []
  3. Write a function range that takes in two required parameters and one optional one. The first parameter, start, represents the start of the range (inclusive), and the second parameter, end, represents the end of the range (not inclusive). The third, optional parameter, step, represents the step. range should generate an array from the start number to the end number, counting by the step parameter. It should pass these tests:
range(1, 6)    => [1,2,3,4,5]
range(1, 6, 2) => [1,3,5]

Please note that this function is for practice purposes only. Unless you need the final step parameter, please use the syntax 1 to 6 if you just need to generate a range of numbers.

Additional Exercises

These are more advanced exercises that require a greater mastery of the language. Proceed at your own risk! :)

  1. Implement strToCharArr using chunkStr (see questions #1 and #2 in the above section).
  2. Implement a function called repeat that takes in an element e, and a number n, and returns an array containing e repeated n times. It should pass these tests:
repeat(1, 5)              => [1,1,1,1,1]
repeat({"one": "two"}, 2) => [{"one": "two"}, {"one": "two"}]
repeat(1, 0)              => []
repeat(1, null)           => []
repeat(null, null)        => []
  3. Implement intersection, a function that takes in two arrays and returns a set of the values that are common to both arrays. It should pass these tests:
intersection([1,2,3],[2,3,4])    => [2,3]
intersection([1,2,2,3],[2,2,3,4]) => [2,3]
intersection([1,2,3],[4,5,6])     => []
  4. Implement group, a function that takes in an array and a number that represents the group size. It should pass these tests:
group([1,2,3,4], 2) => [[1,2],[3,4]]
group([1,2,3,4], 3) => [[1,2,3], [4]]
group([1], 2)       => [[1]]
group([], 2)        => []
group(null, 2)      => []
  5. Implement find, which takes in an array and a lambda that defines what you're looking for. The find function should return the first value that matches. If there are no matches, it returns null. It should pass these tests:
find([1,2,3,4,5], ((n) -> (n mod 5) == 0))        => 5
find([1,2,3,4,5], ((n) -> n == 6))                => null
find(["hi","hello","hi"], ((str) -> str == "hi")) => "hi"
  6. Modify find to take an additional parameter that dictates which index to start the search on. The parameter should be optional, and default to 0. It should work exactly like above, except this should pass:
find([1,2,3,4,5], ((n) -> n == 1), 1)  => null
find([1,2,3,4,5], ((n) -> n == 1), 10) => null
  7. Use the function repeat from question #2 above to create the functions padLeft and padRight. They should pass these tests:
padLeft("123", "0", 6)  => "000123"
padLeft("123", "0", 2)  => "123"
padRight("123", "0", 6) => "123000"
padRight("123", "0", 2) => "123"