Original Post: Experimental feature: generic pattern matching
The post describes a new experimental feature in Semgrep called "generic pattern matching." This feature allows Semgrep to match patterns in languages it doesn’t natively support, in configuration files, and in structured data such as HTML or XML. An example rule shows how to detect a certain pattern in Terraform files.
Key properties of generic pattern matching include:
- Documents are interpreted as sequences of ASCII words, punctuation, and bytes.
- The ellipsis (...) skips non-matching elements, up to 10 lines.
- Metavariables ($X) match any word.
- Indentation and common ASCII braces determine document structure.
- Shorter matches are preferred, and a leading ellipsis (...) matches the beginning of a block.
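To make these properties concrete, here is a toy matcher in Python. It is not Semgrep's implementation, only a minimal sketch under stated assumptions: the names find_first, match_at, and tokenize are hypothetical, the 10-line limit is read as "the next matched token lies at most 10 lines below the previous one", a repeated metavariable is assumed to bind the same word each time, and indentation and brace nesting are not modeled at all.

```python
import re

# Document tokens: ASCII words, or single non-space punctuation/other characters.
DOC_TOKEN = re.compile(r"\w+|[^\w\s]", re.ASCII)
# Pattern tokens additionally recognize the ellipsis and metavariables like $X.
PATTERN_TOKEN = re.compile(r"\.\.\.|\$[A-Z][A-Z0-9_]*|\w+|[^\w\s]", re.ASCII)
WORD = re.compile(r"\w+", re.ASCII)
MAX_SKIP_LINES = 10  # the limit quoted in the list above


def tokenize(text):
    """Split a document into (token, line_number) pairs."""
    tokens = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in DOC_TOKEN.finditer(line):
            tokens.append((m.group(), lineno))
    return tokens


def match_at(pattern, doc, pi, di, last_line, env):
    """Match pattern[pi:] against doc[di:]; return metavariable bindings or None."""
    if pi == len(pattern):
        return env
    ptok = pattern[pi]
    if ptok == "...":
        # Try the shortest skip first, so shorter matches win.
        for skip_to in range(di, len(doc) + 1):
            too_far = (
                skip_to < len(doc)
                and last_line is not None
                and doc[skip_to][1] - last_line > MAX_SKIP_LINES
            )
            if too_far:
                break
            result = match_at(pattern, doc, pi + 1, skip_to, last_line, env)
            if result is not None:
                return result
        return None
    if di == len(doc):
        return None
    tok, line = doc[di]
    if ptok.startswith("$") and len(ptok) > 1:
        # A metavariable matches any word (not punctuation).
        if not WORD.fullmatch(tok):
            return None
        # Assumption: a repeated metavariable must bind the same word.
        if env.get(ptok, tok) != tok:
            return None
        return match_at(pattern, doc, pi + 1, di + 1, line, {**env, ptok: tok})
    if ptok != tok:
        return None
    return match_at(pattern, doc, pi + 1, di + 1, line, env)


def find_first(pattern_text, doc_text):
    """Return the bindings of the first match in the document, or None."""
    pattern = [m.group() for m in PATTERN_TOKEN.finditer(pattern_text)]
    doc = tokenize(doc_text)
    for start in range(len(doc)):
        env = match_at(pattern, doc, 0, start, None, {})
        if env is not None:
            return env
    return None


# Hypothetical nginx snippet, used only to exercise the toy matcher.
NGINX = """\
server {
    listen 80;
    location /files {
        autoindex on;
    }
}
"""

print(find_first("location /$PATH { ... autoindex on;", NGINX))
# {'$PATH': 'files'}
```

Because the matcher works on words and punctuation rather than a syntax tree, the same function runs unchanged on Terraform, nginx, or any other text. That is the appeal of the generic approach, and also why it cannot reason about what the matched text means.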
Example rules for nginx configurations illustrate the feature’s application. However, the feature has limitations, including limited metavariable support and an inability to detect obfuscated malicious code. Users can try it in Semgrep’s live editor.