
jscpd v5 (Rust Engine)

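The Rust engine is a ground-up rewrite of jscpd. It is designed as a drop-in replacement for the Node.js CLI. It uses the same algorithm, the same reporters and the same .jscpd.json config, and runs 24-37x faster in the benchmarks below. The few gaps are listed under "Differences from jscpd v4 (Node.js)".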

The Rust engine is distributed as two npm packages:

| Package | Installs commands | Notes |
|---|---|---|
| jscpd@5 | jscpd and cpd | Same command name as v4, plus the cpd alias |
| cpd | cpd | Lighter package; installs only the shorter command |

Both packages install the identical Rust binary and accept the same CLI options.

Performance

Benchmarks were run on macOS (Apple Silicon) with 10 runs per target (3 for CopilotKit). v4 was run with --no-gitignore -i "node_modules" so that both versions scan comparable file sets. See performance-comparison.md for the full methodology.

| Codebase | Files | Size | jscpd v4 (Node.js) | cpd/jscpd v5 (Rust) | Speedup |
|---|---:|---:|---:|---:|---:|
| Multi-format fixtures | 548 | 1.5 MB | 1.03 s | 0.03 s | 34.3x |
| Svelte source | 9K | 38 MB | 15.80 s | 0.43 s | 36.9x |
| CopilotKit | 17K | 159 MB | 82.89 s | 3.44 s | 24.1x |

Installation

# npm: installs both the jscpd and cpd commands (jscpd keeps the v4 command name)
npm install -g jscpd@5
jscpd /path/to/code
cpd /path/to/code      # the cpd alias is also available

# npm: installs only the cpd command (lighter)
npm install -g cpd
cpd /path/to/code

# crates.io: Rust-native install (exposes both jscpd and cpd commands)
cargo install jscpd
jscpd /path/to/code
cpd /path/to/code

# Nix: run without installing
nix run github:kucherenko/jscpd -- /path/to/code

# Nix: install permanently
nix profile install github:kucherenko/jscpd

# Homebrew (macOS/Linux)
brew install jscpd

The npm packages ship prebuilt binaries for 6 platforms: macOS arm64/x64, Linux arm64/x64 (glibc/musl), Windows x64. No Node.js runtime is required — the binary is self-contained.

CLI Usage

Both jscpd and cpd commands are available after installing jscpd@5. They accept the same options and are identical:

# Both commands work the same way
jscpd [OPTIONS] [PATH]...
cpd [OPTIONS] [PATH]...

Options

| Option | Short | Description | Default |
|---|---|---|---|
| --min-tokens | -k | Minimum tokens in a clone | 50 |
| --min-lines | -l | Minimum lines in a clone | 5 |
| --max-lines | -x | Maximum lines in a source file | |
| --max-size | -z | Skip files larger than SIZE (e.g. 1kb, 1mb, 100kb) | no limit |
| --mode | -m | Detection mode: mild, weak, strict | mild |
| --workers | | Number of worker threads for parallel tokenization/detection | auto (all CPU cores) |
| --no-colors | | Disable ANSI color output | off |
| --absolute | -a | Use absolute paths in reports | off |
| --ignore-case | | Ignore case of symbols in code (experimental) | off |
| --formats-exts | | Custom format-to-extension mapping (e.g. javascript:es,es6;dart:dt) | |
| --formats-names | | Custom format-to-filename mapping | |
| --list | | List all supported formats and exit | |
| --skip-local | | Skip clones where both fragments are in the same directory | off |
| --min-duplicated-lines | | Minimum percentage of duplication to report (0-100) | 0 |
| --silent | -s | Suppress console output | off |
| --no-tips | | Suppress tips and promotional messages | off |
| --version | -V | Print version | |
| --help | -h | Print help | |

The table above is not exhaustive. Some flags are used elsewhere in this README but are not listed here: --reporters (-r), --output (-o), --threshold, --blame and --no-gitignore. These are also accepted. Run cpd --help for the complete list.
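The jscpd v4 documentation describes the three --mode values roughly as follows. strict counts every token. mild skips blank space and line breaks. weak also skips comments. The sketch below illustrates that filtering step, on the assumption that v5 follows the same rules. The Kind and Mode types and the keep and filter_tokens functions are hypothetical and do not come from cpd's crates.

```rust
/// Simplified token categories (hypothetical; a real tokenizer distinguishes many more).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Code,
    Whitespace,
    NewLine,
    Comment,
}

/// The three detection modes accepted by --mode.
#[derive(Clone, Copy, Debug)]
enum Mode {
    Strict,
    Mild,
    Weak,
}

/// Whether a token of this kind takes part in clone detection under `mode`
/// (assumed rules, modeled on the jscpd v4 mode descriptions).
fn keep(kind: Kind, mode: Mode) -> bool {
    match mode {
        // strict: every token counts.
        Mode::Strict => true,
        // mild: ignore blank space and line breaks.
        Mode::Mild => !matches!(kind, Kind::Whitespace | Kind::NewLine),
        // weak: additionally ignore comments.
        Mode::Weak => !matches!(kind, Kind::Whitespace | Kind::NewLine | Kind::Comment),
    }
}

/// Return the text of the tokens that survive filtering, in order.
fn filter_tokens<'a>(tokens: &[(Kind, &'a str)], mode: Mode) -> Vec<&'a str> {
    tokens
        .iter()
        .filter(|(kind, _)| keep(*kind, mode))
        .map(|&(_, text)| text)
        .collect()
}

fn main() {
    // Hand-written tokens for the line `x = 1; // note`.
    let tokens = [
        (Kind::Code, "x"), (Kind::Whitespace, " "), (Kind::Code, "="), (Kind::Whitespace, " "),
        (Kind::Code, "1"), (Kind::Code, ";"), (Kind::Whitespace, " "),
        (Kind::Comment, "// note"), (Kind::NewLine, "\n"),
    ];
    for mode in [Mode::Strict, Mode::Mild, Mode::Weak] {
        println!("{:?}: {:?}", mode, filter_tokens(&tokens, mode));
    }
}
```

Under weak mode, two fragments that differ only in their comments produce the same token stream. They can then match as a clone. Under strict mode they would not.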

Reporters

13 built-in reporters:

| Reporter | Output |
|---|---|
| console | Clone list + statistics table (default) |
| console-full | Clone list with source snippets; with --blame shows side-by-side author comparison |
| json | report/jscpd-report.json |
| xml | report/jscpd-report.xml |
| csv | report/jscpd-report.csv |
| html | report/jscpd-report.html |
| markdown | report/jscpd-report.md |
| badge | report/jscpd-badge.svg + report/jscpd-lines-badge.svg |
| sarif | report/jscpd-report.sarif (GitHub Code Scanning) |
| ai | Token-efficient output for LLM pipelines |
| xcode | Xcode-compatible warnings |
| threshold | Exit 1 if duplication percentage exceeds --threshold |
| silent | No console output |
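The threshold reporter's decision can be sketched in a few lines. This sketch assumes that the duplication percentage is duplicated lines divided by total scanned lines, as in jscpd v4's statistics. It also assumes the reporter fails only when that value is strictly greater than --threshold. Both function names are hypothetical.

```rust
/// Duplication percentage: share of scanned lines that belong to a clone (assumed definition).
fn duplication_percentage(duplicated_lines: usize, total_lines: usize) -> f64 {
    if total_lines == 0 {
        return 0.0;
    }
    duplicated_lines as f64 * 100.0 / total_lines as f64
}

/// Exit code a threshold check would produce: 1 only when a threshold is set and exceeded.
fn threshold_exit_code(percentage: f64, threshold: Option<f64>) -> i32 {
    match threshold {
        Some(limit) if percentage > limit => 1,
        _ => 0,
    }
}

fn main() {
    let pct = duplication_percentage(120, 1000);
    let code = threshold_exit_code(pct, Some(10.0));
    // prints "duplication 12.00% -> exit code 1"
    println!("duplication {pct:.2}% -> exit code {code}");
    // A real CLI would end with std::process::exit(code) so CI fails on a non-zero code.
}
```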

Output file names differ from v4. v5 writes every report as a single file with the jscpd-report.* prefix (e.g. jscpd-report.html, jscpd-report.sarif). v4's HTML reporter instead writes an html/ directory.

Blame Output

With --blame --reporters console-full, clones are displayed with a side-by-side author comparison:

176 │ Andrii Kucherenko │ <= │ 196 │ Josh Soref │ ## TODO
177 │ Andrii Kucherenko │ <= │ 197 │ Josh Soref │
180 │ Andrii Kucherenko │ == │ 200 │ Andrii Kucherenko │ ## License

== means both lines were written by the same author; <= means different authors (potential copy).
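v5 obtains authorship by running git blame --porcelain once per file (see the differences table below). The sketch below shows one way to turn that porcelain output into per-line authors and then choose the == or <= marker. It is not cpd-finder's code. The function names are hypothetical, and comparing authors by exact name (rather than email) is an assumption.

```rust
use std::collections::HashMap;

/// Parse `git blame --porcelain` output into (final line number, author name) pairs.
/// In porcelain mode, author headers appear only the first time a commit is seen,
/// so names are remembered per commit hash.
fn parse_porcelain(output: &str) -> Vec<(usize, String)> {
    let mut authors: HashMap<String, String> = HashMap::new();
    let mut current_sha = String::new();
    let mut current_line = 0usize;
    let mut result = Vec::new();

    for line in output.lines() {
        if line.starts_with('\t') {
            // A tab-prefixed line holds the source text and ends one line entry.
            let author = authors.get(&current_sha).cloned().unwrap_or_default();
            result.push((current_line, author));
        } else if let Some(name) = line.strip_prefix("author ") {
            authors.insert(current_sha.clone(), name.to_string());
        } else {
            let mut parts = line.split_whitespace();
            if let Some(first) = parts.next() {
                // Entry header: <40-hex sha> <orig line> <final line> [<group size>]
                if first.len() == 40 && first.chars().all(|c| c.is_ascii_hexdigit()) {
                    current_sha = first.to_string();
                    current_line = parts.nth(1).and_then(|n| n.parse().ok()).unwrap_or(0);
                }
            }
        }
    }
    result
}

/// "==" when both lines have the same author, "<=" otherwise (potential copy).
fn blame_marker(left: &str, right: &str) -> &'static str {
    if left == right { "==" } else { "<=" }
}

fn main() {
    // One blamed line per fragment, with dummy 40-character commit hashes.
    let left = format!("{} 176 176 1\nauthor Andrii Kucherenko\nfilename README.md\n\t## TODO\n", "a".repeat(40));
    let right = format!("{} 196 196 1\nauthor Josh Soref\nfilename README.md\n\t## TODO\n", "b".repeat(40));
    let (l, r) = (parse_porcelain(&left), parse_porcelain(&right));
    for ((line_l, author_l), (line_r, author_r)) in l.iter().zip(r.iter()) {
        println!("{line_l} │ {author_l} │ {} │ {line_r} │ {author_r}", blame_marker(author_l, author_r));
    }
}
```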

Examples

# Drop-in replacement for jscpd v4
jscpd /path/to/source
# or
cpd /path/to/source

# Same flags as v4
cpd /path/to/source --min-tokens 30 --min-lines 3 --reporters console,json,html

# Git blame with side-by-side author comparison
cpd /path/to/source --blame --reporters console-full

# List supported formats
cpd --list

# Use multiple reporters and write reports to a custom directory
cpd ./src -r console,json,sarif -o ./reports

# Skip clones within the same directory
cpd --skip-local /path/to/source
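The --skip-local filter can be pictured as a simple post-detection check. This sketch reads "same directory" as the same immediate parent directory. The real rule may be broader: jscpd v4, for example, compares against the input paths passed on the command line. The Duplicate type and both functions are hypothetical.

```rust
use std::path::Path;

/// A detected clone, reduced to the two files its fragments come from (hypothetical type).
struct Duplicate<'a> {
    first: &'a Path,
    second: &'a Path,
}

/// True when both fragments share a parent directory, i.e. the clone --skip-local would drop.
fn is_local(d: &Duplicate<'_>) -> bool {
    d.first.parent() == d.second.parent()
}

/// Keep only clones whose fragments live in different directories.
fn skip_local(clones: Vec<Duplicate<'_>>) -> Vec<Duplicate<'_>> {
    clones.into_iter().filter(|d| !is_local(d)).collect()
}

fn main() {
    let clones = vec![
        Duplicate { first: Path::new("src/a/x.ts"), second: Path::new("src/a/y.ts") },
        Duplicate { first: Path::new("src/a/x.ts"), second: Path::new("src/b/z.ts") },
    ];
    // prints "src/a/x.ts <-> src/b/z.ts"
    for d in skip_local(clones) {
        println!("{} <-> {}", d.first.display(), d.second.display());
    }
}
```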

Config File

v5 reads the same .jscpd.json config file format as v4:

{
  "path": ["./src"],
  "reporters": ["console", "json"],
  "minLines": 5,
  "minTokens": 50,
  "threshold": 0,
  "format": ["javascript", "typescript"],
  "ignore": ["**/node_modules/**"],
  "gitignore": true,
  "mode": "mild"
}
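The config keys are camelCase forms of the CLI flag names: --min-lines becomes minLines and --min-tokens becomes minTokens. The helper below is hypothetical and not part of cpd. It shows that mapping for plain flags. It does not handle negated flags: --no-gitignore corresponds to "gitignore": false rather than to a noGitignore key.

```rust
/// Convert a CLI flag such as `--min-tokens` into the camelCase key used in .jscpd.json.
fn flag_to_config_key(flag: &str) -> String {
    let name = flag.trim_start_matches('-');
    let mut key = String::with_capacity(name.len());
    let mut upper_next = false;
    for c in name.chars() {
        if c == '-' {
            // Drop the hyphen and capitalize the following character.
            upper_next = true;
        } else if upper_next {
            key.extend(c.to_uppercase());
            upper_next = false;
        } else {
            key.push(c);
        }
    }
    key
}

fn main() {
    for flag in ["--min-tokens", "--min-lines", "--mode"] {
        println!("{flag} -> {}", flag_to_config_key(flag));
    }
}
```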

Format Support

v5 supports 223 formats (verified via --list). Use cpd --list to see the full list.

Cross-Format Detection

Vue SFC (.vue), Svelte (.svelte), Astro (.astro), and Markdown (.md) files are tokenized per-block/per-section, enabling duplicate detection across file types — same as v4.
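Per-block tokenization means each block inside a component is tokenized under its own format. A script block can therefore match code in a plain .ts file. The sketch below splits a Vue single-file component into blocks. The Block type, the split_sfc function, and the "markup" format label are hypothetical. Nested tags of the same name (such as an inner template v-if) are deliberately not handled.

```rust
/// One block of a single-file component and the format it would be tokenized as.
#[derive(Debug, PartialEq)]
struct Block<'a> {
    format: &'static str,
    tag_line: usize, // 1-based line of the opening tag
    body: &'a str,
}

/// Split a Vue-style SFC into its <template>, <script> and <style> blocks.
fn split_sfc(src: &str) -> Vec<Block<'_>> {
    let tags = [("template", "markup"), ("script", "javascript"), ("style", "css")];
    let mut blocks = Vec::new();
    for (tag, default_format) in tags {
        let open = format!("<{tag}");
        let close = format!("</{tag}>");
        let mut pos = 0;
        while let Some(rel) = src[pos..].find(open.as_str()) {
            let tag_start = pos + rel;
            let Some(gt) = src[tag_start..].find('>') else { break };
            let opening = &src[tag_start..tag_start + gt];
            // `<script lang="ts">` switches the block to TypeScript.
            let format = if tag == "script" && opening.contains("lang=\"ts\"") {
                "typescript"
            } else {
                default_format
            };
            let body_start = tag_start + gt + 1;
            let Some(end_rel) = src[body_start..].find(close.as_str()) else { break };
            let body_end = body_start + end_rel;
            let tag_line = src[..tag_start].matches('\n').count() + 1;
            blocks.push(Block { format, tag_line, body: &src[body_start..body_end] });
            pos = body_end + close.len();
        }
    }
    blocks.sort_by_key(|b| b.tag_line);
    blocks
}

fn main() {
    let sfc = "<template>\n  <p>{{ msg }}</p>\n</template>\n<script setup lang=\"ts\">\nconst msg = 'hi'\n</script>\n<style>\np { color: red; }\n</style>\n";
    for b in split_sfc(sfc) {
        println!("line {} [{}]: {:?}", b.tag_line, b.format, b.body.trim());
    }
}
```

Keeping each block's starting line lets reported clones point back to the correct location in the original .vue file.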

Differences from jscpd v4 (Node.js)

| Feature | jscpd v4 (Node.js) | cpd v5 (Rust) |
|---|---|---|
| --blame | Calls the git CLI for each file | Same output (==/<= markers); calls git blame --porcelain per file |
| --store (LevelDB/Redis) | Persistent store for large repos | Not supported; use jscpd v4.x for external stores |
| --formats-exts | Custom format-to-extension mapping | Same flag name, same behavior |
| --formats-names | Custom format-to-filename mapping | Same flag name, same behavior |
| Programming API | jscpd() Promise API, detectClones() | Rust API via the cpd-finder crate; no Node.js API |
| Config file | .jscpd.json with camelCase keys | Same: .jscpd.json with camelCase keys |
| Cross-format detection | Vue SFC, Svelte, Astro, Markdown | Same: per-block tokenization |
| Token counts | Vary by tokenizer | May differ by 1-2% due to the Rust tokenizer; detected clones match |
| --reporters | All v4 reporters | All v4 reporters except full (use console-full) |
| --no-gitignore | .gitignore is respected by default | Same behavior, same flag name |
| --workers | Not available | Available: controls parallelism for file tokenization/detection |
| Output filenames | jscpd-report.json, html/ directory | jscpd-report.json, jscpd-report.html, jscpd-report.sarif, jscpd-report.csv, jscpd-report.md, jscpd-badge.svg, jscpd-lines-badge.svg |

Rust API

For integration in Rust applications:

use cpd_finder::orchestrate::{run, RunConfig};

fn main() {
    // Scan ./src for clones of at least 50 tokens; every other setting keeps its default.
    let config = RunConfig {
        paths: vec!["./src".into()],
        min_tokens: 50,
        ..Default::default()
    };

    let result = run(&config).expect("clone detection failed");
    println!("Found {} clones", result.clones.len());
    println!("Analyzed {} files", result.statistics.total.sources);
}

Architecture

cpd (CLI binary)
 ├── cpd-core      — Detection algorithm (Rabin-Karp rolling hash)
 ├── cpd-tokenizer — Language tokenization (223 formats)
 ├── cpd-finder    — File walking, orchestration, git blame
 └── cpd-reporter  — Output formatting (13 reporters)
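cpd-core is described as using a Rabin-Karp rolling hash. The core idea is this: hash every window of --min-tokens consecutive tokens, update that hash in O(1) as the window slides, and treat equal hashes as clone candidates. The sketch below illustrates the technique and is not cpd-core's implementation. The integer token ids, the BASE and MODULUS constants, and duplicate_windows are all assumptions. A real detector would also merge overlapping matching windows into one longer clone.

```rust
use std::collections::HashMap;

const BASE: u64 = 257;
const MODULUS: u64 = 1_000_000_007;

/// Find pairs of start offsets whose `k`-token windows are identical, using a
/// Rabin-Karp rolling hash so each window's hash is updated in O(1).
fn duplicate_windows(tokens: &[u32], k: usize) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    if k == 0 || tokens.len() < k {
        return pairs;
    }
    // BASE^(k-1) mod MODULUS: the weight of the token that leaves the window.
    let mut high = 1u64;
    for _ in 1..k {
        high = high * BASE % MODULUS;
    }
    // Hash of the first window.
    let mut hash = 0u64;
    for &t in &tokens[..k] {
        hash = (hash * BASE + t as u64) % MODULUS;
    }
    let mut seen: HashMap<u64, Vec<usize>> = HashMap::new();
    for start in 0..=tokens.len() - k {
        if start > 0 {
            // Roll: remove tokens[start - 1], then append tokens[start + k - 1].
            let outgoing = tokens[start - 1] as u64 * high % MODULUS;
            hash = (hash + MODULUS - outgoing) % MODULUS;
            hash = (hash * BASE + tokens[start + k - 1] as u64) % MODULUS;
        }
        let window = &tokens[start..start + k];
        let bucket = seen.entry(hash).or_default();
        // Equal hashes are only candidates; compare tokens to rule out collisions.
        for &earlier in bucket.iter() {
            if &tokens[earlier..earlier + k] == window {
                pairs.push((earlier, start));
            }
        }
        bucket.push(start);
    }
    pairs
}

fn main() {
    // Token ids for two identical 4-token runs separated by a different token.
    let tokens = [1, 2, 3, 4, 9, 1, 2, 3, 4];
    // prints "[(0, 5)]"
    println!("{:?}", duplicate_windows(&tokens, 4));
}
```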