Original Post: Semgrep: a static analysis journey
Summary:
Introduction:
Semgrep is a fast, lightweight static analysis tool for finding bugs and enforcing code standards. It supports more than 17 languages and traces its origins to Spatch, the C matching and transformation tool from the Coccinelle project. Spatch was first used mainly by academics, and its ideas were later adapted at Facebook for PHP as Sgrep. Semgrep is now maintained by r2c and is favored by modern security teams.
The academic years: Spatch:
Spatch began in 2006 to cope with API changes in the Linux kernel. It automated difficult "collateral evolutions" (the follow-on edits that client code needs when an API it calls changes) using a familiar patch syntax extended with features like metavariables and ellipses. The published work and open-source release were well received by the Linux community, which showed the practical value of automating code transformations.
The Facebook years: Pfff and Sgrep:
At Facebook, Semgrep's precursors evolved to address the lack of static analysis for PHP. Pfff was developed first, which led to Sgrep, a tool for enforcing new API rules. Sgrep was fast, easy to use, and tightly integrated with Facebook's CI. It saw widespread internal adoption, with over 200 rules, and influenced later tools such as Infer and Zoncolan.
The r2c years: Semgrep:
In 2019, at r2c, Semgrep was expanded to support multiple languages and complex patterns, evolving through community feedback and internal advancements. Semgrep now offers features like taint analysis and a web playground, aiming to make security effortless and accessible for developers.
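To make the pattern features more concrete, here is a toy sketch of how a metavariable and an ellipsis can be matched against code. It uses Semgrep's `$X` and `...` notation, but everything else is an assumption made for illustration: the `tokenize` and `match` helpers are hypothetical, they work on flat tokens rather than syntax trees, and they are not how Semgrep, Sgrep, or Spatch are implemented.

```python
import re


def tokenize(text):
    # Hypothetical helper: split into metavariables ($X), ellipses (...),
    # words/numbers, and single punctuation characters.
    return re.findall(r"\$[A-Z_][A-Z_0-9]*|\.\.\.|\w+|[^\s\w]", text)


def match(pattern, code, bindings=None):
    """Return metavariable bindings if the pattern tokens match all code tokens, else None."""
    bindings = dict(bindings or {})  # copy so failed attempts do not leak bindings
    if not pattern:
        return bindings if not code else None
    head, rest = pattern[0], pattern[1:]
    if head == "...":
        # Ellipsis: try skipping zero or more code tokens.
        for skip in range(len(code) + 1):
            result = match(rest, code[skip:], bindings)
            if result is not None:
                return result
        return None
    if not code:
        return None
    if head.startswith("$"):
        # Metavariable: bind on first use, require the same token on later uses.
        if bindings.setdefault(head, code[0]) != code[0]:
            return None
        return match(rest, code[1:], bindings)
    if head != code[0]:
        return None
    return match(rest, code[1:], bindings)


if __name__ == "__main__":
    print(match(tokenize("$X == $X"), tokenize("a == a")))
    print(match(tokenize("$X == $X"), tokenize("a == b")))
    print(match(tokenize("$F(...)"), tokenize("exec(cmd, shell)")))
```

In this sketch a metavariable binds exactly one token and must bind the same token each time it appears, which is why `$X == $X` matches `a == a` but not `a == b`. A real tool matches on parsed syntax, so a metavariable can stand for a whole expression.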
Conclusion:
Semgrep’s design has always addressed immediate needs, guided by practical application and user feedback. Its evolution reflects a consistent goal of simplifying security and enhancing developer usability. With ongoing developments, Semgrep continues to adapt to broader audiences while maintaining its core mission of making security easy.
For a detailed exploration, watch a presentation by Jean Yang and Hongyi Hu on Twitch.