I started using semgrep on a work project. I first learned about the tool through HN. It looked great, so I filed it under “tools to use when the time comes”. Well, the time came. Here are my initial thoughts in preparation for writing a guide.
My use case was detecting jest expect calls that would be replaced
by a custom matcher. Here is the backstory. The codebase makes calls
to the Slack BlockKit API. This results in large, complex objects in
HTTP requests. Different blocks have their own validations too, such as
“this field only accepts these values” or “this field must be less
than X characters”. The test suite did not capture 100% of these
requirements. The result was that tests would pass, but the calls would
then fail in production due to some unknown API constraint or other
contract error. This had happened before, so I updated the relevant test.
The problem with that approach is that it’s a local fix to that specific
block in that specific test. The same issue may be present in other
API calls, but there’s no way to know without adding more expect
calls. I decided to write a custom jest matcher that encapsulates the
checks for “should match contract” and supersedes the other calls.
semgrep would work well enough for this. I could write rules to match
the lines that I should change, run semgrep to find tests to refactor,
then repeat until semgrep passed. That worked.
Here’s an example. The tests have a lot of code like this:
expect(message.blocks).toHaveProperty("type", "section");
// check required text.type
expect(message.blocks).toHaveProperty("text.type", "mrkdwn");
// check required text.text present
expect(message.blocks).toHaveProperty("text.text");
// continue to make specific expectations on property values
These three lines are coarse contract testing. They are repeated a lot in the codebase. I’d guess there are hundreds of occurrences. I want all of those replaced with something like:
// One custom matcher that checks the block is compliant with the
// "section" API spec
expect(message.blocks).toBeSection();
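A matcher along these lines might look like the following minimal sketch. It assumes jest's `expect.extend` API and mirrors only the three checks from the test above. The helper name `checkSection` and the failure messages are hypothetical, not the actual matcher from the codebase.

```javascript
// Hypothetical helper: collects contract violations for a "section" block.
// Only the three checks from the test above are covered.
function checkSection(block) {
  if (block === null || typeof block !== "object") {
    return ["block must be an object"];
  }
  const problems = [];
  if (block.type !== "section") {
    problems.push('type must be "section"');
  }
  const text = block.text;
  if (text === null || typeof text !== "object") {
    problems.push("text must be an object");
    return problems;
  }
  if (text.type !== "mrkdwn") {
    problems.push('text.type must be "mrkdwn"');
  }
  if (!("text" in text)) {
    problems.push("text.text must be present");
  }
  return problems;
}

// Jest custom matchers return { pass, message }, where message is a function.
function toBeSection(received) {
  const problems = checkSection(received);
  const pass = problems.length === 0;
  return {
    pass,
    message: () =>
      pass
        ? "expected block not to be a valid section block"
        : `expected a valid section block, but: ${problems.join("; ")}`,
  };
}

// Register the matcher only when jest's global expect is available,
// for example when this file is loaded as a jest setup file.
if (typeof expect !== "undefined" && typeof expect.extend === "function") {
  expect.extend({ toBeSection });
}
```

Once registered, `expect(message.blocks).toBeSection()` reports every violated check in a single failure message instead of stopping at the first failed `toHaveProperty` call.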
Now the test suite can add more checks to toBeSection() and there is
no more repeated code. All that’s left is to identify all the call
sites. semgrep is great for this.
I initially wrote a rule to find all
toHaveProperty("type", "section") calls. Then I learned I could generalize it to find common
sequences of code. Check this out.
I’m not going to go deep into the pattern rules right now but I think
it’s clear enough what’s going on.
$EXPECT is a metavariable: a placeholder for any expression that has
the .toHaveProperty method called on it. "..." means any string. This
allowed me to generalize the pattern to match a common object “shape”.
This is just the beginning of my semgrep practice. This small
experiment taught me that it is a useful tool that definitely deserves
a spot in my workflow. I can’t wait to add this to my precommit.
The second learning is that Semgrep has an --autofix option.
Consider this feature alpha. In some cases it rewrote code
completely unexpectedly, and in the worst case it produced syntax
errors. It is not reliable enough yet to use on an entire codebase
or alongside mature tools such as eslint. Try it on single files
before trying it across the entire codebase. All that being said, I
did add fix-regex to all my rules just to test it out. Hopefully the
--autofix option matures over time.
Lastly, it could be faster. It may be too slow to run on an entire codebase during precommit. Luckily, it can be limited to specific files on the CLI. Hopefully the team makes performance improvements in the future.
Regardless of all this, I’m stoked to keep practicing with semgrep to see how and where it can improve my workflow. You should try it.