
Mule Programming Style Guide: DataWeave Code

How to structure and format your DataWeave code to optimize readability.

Joshua A Erney

14 Sep 2018 • 7 min read

In my previous post about my Mule programming style, I discussed a couple of things: first, why the readability of your code is so important, and second, how a single flow that describes the overall intent of the code through descriptive doc:name attributes can really improve that readability. In this post, I will discuss how I format my DataWeave code to achieve the same effect.

My Code

Again, I'd like to start with an example of what my DataWeave code typically looks like. Here's a sample. Don't worry about understanding what it does.

%dw 2.0
output application/json

import modules::Utils as utils

var invoices    = payload
var allocations = vars.allocations

fun formatDate(s: String) =
  s as Date   {format: "MM/dd/yyyy"}
    as String {format: "yyyyMMdd"}

fun replaceNullEverywhere(e, replacement="") =
  utils::applyToValues(e, ((v) -> if (v == null) replacement else v))
---
replaceNullEverywhere(
  invoices map ((invoice) -> {
    invoiceId:     invoice.invoice_id,
    invoiceNumber: invoice.invoice_number,
    amount:        invoice.total as Number,
    invoiceDate:   formatDate(invoice.date),
    allocations:   allocations[invoice.invoice_id] map ((allocation) -> {
      percentage: allocation.percentage,
      amount:     allocation.amount,
      datePaid:   formatDate(allocation.paid_date)
    })
  })
)

Now let's talk about what I'm doing and why.

Reassign Payload and FlowVars When Appropriate

When you're looking for the first time at a script that someone else wrote, payload doesn't carry much meaning beyond being the payload of the message. Nothing cues the reader to what the payload represents unless metadata has been set up. Reassigning payload to something more meaningful can go a long way toward helping someone understand the intention of your transformation. In the example above, for instance, the name invoices makes it obvious we're mapping invoice data. Perhaps that's obvious from the fields as well, but that won't always be the case.

Prefer Naming Your Lambda/Callback Arguments Instead of Using the Default Values

You won't find $, $$, or $$$ too often in my DataWeave code anymore. In my opinion, the saved keystrokes are just not worth the sacrifice in readability. If I do something like:

invoices map {
  id:        $$,
  invoiceId: $.invoice_id,
  ...
}

I will surely know at the time of writing the code what $ and $$ represent. But will I remember in 6 months when I come back to this code? Or will I need to look it up in the documentation again? What about the junior dev who's going to maintain this? What if I did this instead:

invoices map ((invoice, index) -> {
  id:        index,
  invoiceId: invoice.invoice_id,
  ...
})

Which one is easier to understand?
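
The same trade-off shows up outside of map. As a rough sketch (the prices and amounts variables are made up for illustration and are not part of the invoice example), here are the anonymous parameters next to named equivalents for mapObject and reduce. In mapObject, $ is the value and $$ is the key; in reduce, $ is the current item and $$ is the accumulator, which is exactly the kind of detail that is easy to forget:

```dataweave
%dw 2.0
output application/json

// Illustrative data only; not from the invoice example
var prices  = {apple: 1.5, pear: 2}
var amounts = [10, 20, 30]
---
{
  doubledAnonymous: prices mapObject {($$): $ * 2},
  doubledNamed:     prices mapObject ((price, fruit) -> {(fruit): price * 2}),
  totalAnonymous:   amounts reduce ($$ + $),
  totalNamed:       amounts reduce ((amount, total) -> total + amount)
}
```

Each anonymous/named pair should produce the same result; only the named versions tell the reader which argument is which.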

Separate Functionality from Mapping

I keep all of my custom functionality in the header of the script (everything above ---). Functions can get pretty complex, and having them in the body can obscure how the fields map to each other. The only things I keep in the body are function calls and very simple transformations (like casting). This lets me separate how I format the mapping from how I format my functions, and it helps keep both simple.
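
Here is a minimal sketch of that separation. The customer fields and the fullName helper are hypothetical, chosen only to show the layout: the string handling sits in the header, and the body contains nothing but a cast and a function call.

```dataweave
%dw 2.0
output application/json

var customers = payload

// Formatting logic lives in the header, away from the field mapping.
// Assumes each customer has first_name and last_name fields (hypothetical).
fun fullName(customer) =
  trim((customer.first_name default "") ++ " " ++ (customer.last_name default ""))
---
customers map ((customer) -> {
  id:   customer.id as Number,
  name: fullName(customer)
})
```

If the null handling in fullName grows more involved later, the mapping in the body does not have to change.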

Your Mapping Code Should Match the Shape of the Output, When Applicable

If you look at my example code, you'll see that I indent once for the first map and do an additional indent for the second map. I do this because it mirrors how the output of the script will look:

[
  {
    "invoiceId": 1,
    "invoiceNumber": "J503KL",
    "amount": 3.5,
    "invoiceDate": "20181212",
    "allocations": [
      {
        "percentage": 100,
        "amount": 100,
        "datePaid": "20181222"
      }
    ]
  }
]

Justify Your Code

I'm assuming this will be the most polarizing style preference because:

1. I've never seen anyone voluntarily do this in ANY code (except my college professor; shout out to Dr. Morgan Benton), and
2. People assume it's going to take a long time to format code this way.

It's going to take a bigger example to justify why I take the time to do this. Would you rather read this:

invoices map ((invoice, index) -> {
  "insertionCriteria": createInsertionCriteria(invoice),
  "Invoice Line Number": index + 1,
  "Business Unit": buName,
  "Source": upper source,
  "Invoice Number": invoice.INVOICE_NUMBER,
  "Invoice Amount": invoice.INVOICE_AMOUNT,
  "Invoice Date": formatDate(invoice.INVOICE_DATE),
  "Supplier Name": configs.SUPPLIER_NAME,
  "Supplier Number": configs.SUPPLIER_NUMBER,
  "Supplier Site": configs.SUPPLIER_SITE_CODE,
  "Invoice Description": trim invoice.INV_DESCRIPTION,
  "Invoice Type": invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms": configs.PAYMENT_TERM,
  "Line Type": configs.LINE_TYPE,
  "Amount": invoice.LINE_AMOUNT,
  "Line Description": invoice.LINE_DESC,
  "Distribution Combination": buildDistributionCombination(invoice),
  "Terms Date": null,
  "Goods Received Date": null,
  "Invoice Received Date": null,
  "Accounting Date": null,
  "Payment Method": null,
  "Pay Group": null,
  "Pay Alone": null,
  "Discountable Amount": null,
  "Prepayment Number": null,
  "Prepayment Line Number": null,
  "Prepayment Application Amount": null,
  "Prepayment Accounting Date": null,
  "Invoice Includes Prepayment": null,
  "Conversion Rate Type": null,
  "Conversion Date": null,
  "Conversion Rate": null,
  "Liability Combination": null,
  "Document Category Code": null
})

Or this:

invoices map ((invoice, index) -> {
  "insertionCriteria":             createInsertionCriteria(invoice),
  "Invoice Line Number":           index + 1,
  "Business Unit":                 buName,
  "Source":                        upper source,
  "Invoice Number":                invoice.INVOICE_NUMBER,
  "Invoice Amount":                invoice.INVOICE_AMOUNT,
  "Invoice Date":                  formatDate(invoice.INVOICE_DATE),
  "Supplier Name":                 configs.SUPPLIER_NAME,
  "Supplier Number":               configs.SUPPLIER_NUMBER,
  "Supplier Site":                 configs.SUPPLIER_SITE_CODE,
  "Invoice Description":           trim invoice.INV_DESCRIPTION,
  "Invoice Type":                  invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms":                 configs.PAYMENT_TERM,
  "Line Type":                     configs.LINE_TYPE,
  "Amount":                        invoice.LINE_AMOUNT,
  "Line Description":              invoice.LINE_DESC,
  "Distribution Combination":      buildDistributionCombination(invoice),
  "Terms Date":                    null,
  "Goods Received Date":           null,
  "Invoice Received Date":         null,
  "Accounting Date":               null,
  "Payment Method":                null,
  "Pay Group":                     null,
  "Pay Alone":                     null,
  "Discountable Amount":           null,
  "Prepayment Number":             null,
  "Prepayment Line Number":        null,
  "Prepayment Application Amount": null,
  "Prepayment Accounting Date":    null,
  "Invoice Includes Prepayment":   null,
  "Conversion Rate Type":          null,
  "Conversion Date":               null,
  "Conversion Rate":               null,
  "Liability Combination":         null,
  "Document Category Code":        null
})

This is a no-brainer for me. One looks like a giant blob of unstructured code; the other looks like a clear set of key-value mappings. You may be thinking, "Wow, who has time to structure their code like this?" Frankly, I did this by hand for years before taking an afternoon to write a script that does it for me: that's how important I think it is, and how dumb I was for not writing the script earlier. It's not perfect, but it's good enough that it saves me a ton of time. You can find it on my GitHub here.

Wrap Generic Functions in a Simple Interface, Even if You Don't Have To

This one is a bit more complicated. You may have noticed the following in the header of my first example:

import modules::Utils as utils

...

fun replaceNullEverywhere(e, replacement="") =
  utils::applyToValues(e, ((v) -> if (v == null) replacement else v))

I could've done this instead:

---
utils::applyToValues(
  invoices map ((invoice) -> {
    ...
  }),
  ((v) -> if (v == null) "" else v)
)

But I didn't. Why?

applyToValues is a very generic function. Its first parameter takes a single value, which could be a string, an object, an array, or any combination of those and more. Its second parameter is a function with a single argument. This allows the user to pass in any kind of functionality they want to apply to the first argument. So we now know two things about applyToValues: its first parameter can be just about any value, and its second parameter can be just about any single-argument function.

Wow. That's a lot of flexibility. The downside to all this flexibility is that the reader needs a lot of context to understand how the function is being used. They need to know what the value passed as the first argument is, and they need to understand how the function passed as the second argument is going to interact with that value to change it. That's asking a lot of someone reading your code for the first time!

Is there an easy way we can alleviate that mental burden? Yes, and it takes just a few seconds: give it a more context-specific interface (i.e., wrap it with a function that's easier to use). Once applyToValues has a wrapper called replaceNullEverywhere, it's immediately obvious to the reader how applyToValues is being used: to replace all null values with another value. We also eliminate the need to expose the lambda to the caller, which removes all of that flexibility. At the expense of flexibility, we gain a ton of readability.

In software development, there's typically a tradeoff between readability and flexibility. The more flexible a tool is, the more time you need to spend learning how to use it properly. This is true in a lot of other areas of life, including photography. Think about how easy it is to use your phone camera versus a DSLR. The interface of a camera phone is simple. A DSLR can take fantastic photos in a much wider variety of conditions, but the interface is much more complicated. But you can't just flip a switch on a DSLR and make it easier to use. However, it is that easy with programming, so take advantage and do a favor for those who will need to read your code in the future!

Above was a short overview of applyToValues. If you're interested in a deep dive into how this code works and other ways it can be used in your transformations, check out my post about it here.
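
To make the wrapper idea concrete without the imported module, here is a self-contained sketch. The applyToValues below is a hypothetical stand-in written for this example and may well differ from the real implementation in modules::Utils; it assumes the function recurses through arrays and objects and applies fn to every other value it finds. The sample input at the bottom is made up as well.

```dataweave
%dw 2.0
output application/json

// Hypothetical stand-in for utils::applyToValues: walk arrays and objects,
// and apply fn to every other value (including null).
fun applyToValues(e, fn) =
  e match {
    case is Array  -> e map ((item) -> applyToValues(item, fn))
    case is Object -> e mapObject ((value, key) -> {(key): applyToValues(value, fn)})
    else           -> fn(e)
  }

// Context-specific wrapper: the caller never sees the lambda
fun replaceNullEverywhere(e, replacement="") =
  applyToValues(e, ((v) -> if (v == null) replacement else v))
---
replaceNullEverywhere({
  invoiceId:   1,
  datePaid:    null,
  allocations: [{amount: null}]
})
```

The body reads as a single statement of intent, while everything a reader would otherwise have to decode (the recursion and the lambda) stays in the header.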

Conclusion

That's how I handle formatting and writing my DataWeave code, in a nutshell. Little steps like assigning your payload to a descriptive variable name, naming your callback arguments, separating your functions from your mapping, justifying your code, and giving generic functions a more specific interface can go a long way in helping others understand your code. Don't feel like you need to apply all these rules at once, or even apply all of them. Obviously these are my opinions, but if you can walk away with something that will help your clients and other developers, that's great! Please let me know what you all think in the comments section below.