wissel.net

Usability - Productivity - Business - The web - Singapore & Twins

Cleaning up an APEX codebase with PMD


You inherit a large code base, it is perfect, a work of beauty. Then you wake up to the ping of your PMD run completion and find multiple thousands of rule violation (and you haven't even started to assert test assertions). Here is how to fix that.

Divide and conquer

Boiling the ocean and fix all issues in one go is hardly an option. So you need to strategize. PMD gives you a hint: every rule has a priority property, where you can specify how important this rule is. The higher the number, the less important the rule. You can run PMD using the min -[somenumber] parameter which will ignore rules with a higher value.

Next step is to decide what rules constitute your priority 1 and 2 buckets. I strongly recommend to make them mandatory fixes before the next deployment, so pick the rules carefully. The candidates I suggest are around performance and security:

Priority 1

  • ApexXSSFromURLParam
  • ApexSOQLInjection
  • ApexOpenRedirect
  • ApexInsecureEndpoint
  • AvoidDmlStatementsInLoops
  • AvoidSoqlInLoops

Priority 2

  • ApexCRUDViolation
  • ApexSharingViolations

Thereafter you decide on priority 3 onwards. Strong candidates for level 3 are all the rules that ensure code is maintainable and avoid errors, like deep nesting or cyclomatic complexity (a fancy word for "messy code")

Putting it into practice

Designing the rules and enforcing them are two distinct steps. Your best course of action is to add PMD to your build pipeline. Of course you don't want your developers to rely on the pipeline to inform them, so you make sure they have the PMD Plugin installed in their VSCode environment.

The build pipeline runs PMD and will fail if violations have been found. So you would run you ruleset with e.g. -min 2 to only fail on high priority rule violations. You still can extract information on the suppressed rules using -showsuppressed. Another option is to run PMD twice. First run uses -min 2 and fails, second run (using HTML or XSLT output) uses all rules and -failOnViolation false to get the full report. The pipeline needs to send the report somewhere thereafter to retain it.

Getting tough on new files

Pick this working strategy to improve your code base: "Whenever you have to alter a piece of code for a business requirement, you fix it to be compliant to all code rules". Keep in mind, a rule is only as good as its enforcement, so your pipeline needs to accommodate that. To implement that a little shell magic is needed. The following snippet presumes your CI has checked out the branch, the ruleset is in the repository (I would pull it from a central location, but we keep it simple here) and starts in your working directory:

#!/bin/bash
# Extract list of changed classes from GIT
git log -1 --name-only | grep classes > changedClasses.txt
# Run PMD on changed class files from the last commit
pmd -R StrictRuleset.xml -filelist changedClasses.txt -failOnViolation true 

This makes sure new and updated code is up to your standards.

As usual: YMMV


Posted by on 23 November 2018 | Comments (1) | categories: Apex PMD Salesforce

Comments

  1. posted by Kim Galant on Tuesday 27 November 2018 AD:

    When working with rules, bear in mind that we for sure are not all Google-level coders, so in many cases the skill level of your developers would dictate how far you go. E.g. on my first ever play with static analysis, I insisted on Cyclomatic Complexity getting fixed (as you say - messy code, and we don't like messy). Except that more or less nobody had any clue as to how to fix it properly. So PMD can't stand alone - you also have to give the devs the training and the clue level to actually understand why stuff is bad and how to turn bad into not-bad. Otherwise you end up with people chopping functions up arbitrarily just to please the scanner and get an even bigger mess than the one you started with. Also, fixing it down to every last rule exception probably wouldn't mean a lot in many cases. I.e. it'd make the code a lot simpler for a Google-level dev to read - but would it make a regular dev's life much easier? And if not, was it really worth the investment?