
Dataset Corruptor

Damage CSV, TSV, JSON, or plain-text datasets on purpose so you can test importers, validators, ETL jobs, and cleanup workflows against messy data.

Runs Locally

This tool runs in your browser. Files never leave your device.

Drop a JSON, CSV, TSV, or TXT file anywhere inside the input card, or paste data below.

Corrupted output

Detected CSV · 5 source lines · 242 B output

id,na,email,plan,active,score,created_at
1002,Jane Doe,jane@example.com,Free,,74,2026-06-03
1003,Sam Smith,sam@example.com,Team,true,88,2026-06-04
,Avery Stone,avery@example.com,Pro,true,91,
1001�,false,laramie@example.com,,true,98,2026-06-01

Corruption report

Rows dropped: 0
Rows duplicated: 0
Rows shuffled: 4
Values blanked: 4
Values mutated: 1
Types changed: 1
Fields removed: 0
Fields renamed: 1
Junk inserted: 1
Structure damage: 0

What this breaks

Imports: missing rows, duplicated rows, changed delimiters, blanks, and broken structure.
Schemas: removed fields, renamed keys, changed value types, null-ish values, and malformed output.
Cleaning: junk characters, weird casing, truncated strings, fake nulls, and inconsistent values.
Testing: seeded output lets you reproduce the same messy dataset during debugging.
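
As a rough illustration of the kind of check this output is meant to exercise, here is a minimal validator sketch in Python. It is not part of the tool. The `find_problems` helper and the `EXPECTED_FIELDS` schema are hypothetical, and the schema assumes the renamed `na` column was originally `name`. The sample is the corrupted output shown earlier on this page, with the junk character written as the escape `\ufffd`.

```python
import csv
import io

# Assumed schema: the sample header with the renamed "na" column restored to "name".
EXPECTED_FIELDS = ["id", "name", "email", "plan", "active", "score", "created_at"]

# Corrupted output copied from this page; \ufffd is the inserted junk character.
SAMPLE = """id,na,email,plan,active,score,created_at
1002,Jane Doe,jane@example.com,Free,,74,2026-06-03
1003,Sam Smith,sam@example.com,Team,true,88,2026-06-04
,Avery Stone,avery@example.com,Pro,true,91,
1001\ufffd,false,laramie@example.com,,true,98,2026-06-01
"""


def find_problems(text, expected_fields):
    """Return a list of (line_number, message) tuples for a CSV string."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Header checks: removed or renamed fields show up here.
    for field in expected_fields:
        if field not in header:
            problems.append((1, f"missing field: {field}"))
    for field in header:
        if field not in expected_fields:
            problems.append((1, f"unexpected field: {field}"))

    # Row checks: blanks, short or long rows, and replacement characters.
    for row in reader:
        line_number = reader.line_num
        for field, value in row.items():
            if field is None:
                problems.append((line_number, "extra columns"))
            elif value is None:
                problems.append((line_number, f"{field}: column missing from row"))
            elif value.strip() == "":
                problems.append((line_number, f"{field}: blank"))
            elif "\ufffd" in value:
                problems.append((line_number, f"{field}: replacement character"))
    return problems


if __name__ == "__main__":
    for line_number, message in find_problems(SAMPLE, EXPECTED_FIELDS):
        print(f"line {line_number}: {message}")
```

On the sample, this sketch flags the renamed header, the four blanked values, and the junk character in the `id` column. It does not catch the type change (`false` in the name column), which would need per-field type rules.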

What this tool does

Dataset Corruptor introduces controlled errors into datasets. Generate missing values, malformed records, noisy entries, and invalid data for testing machine learning pipelines, validation systems, parsers, and import workflows. The tool helps developers evaluate how systems behave when input data is incomplete, malformed, or inconsistent.

How it works

Upload or paste dataset content, choose corruption settings, and generate modified output with controlled levels of noise and errors.
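
The general idea of seeded, rate-controlled corruption can be sketched as follows. This is an assumption about how such a tool might work, not the tool's actual code. The `corrupt_csv` function, its `blank_rate` and `shuffle_rows` parameters, and the report keys are all hypothetical names chosen for illustration.

```python
import csv
import io
import random


def corrupt_csv(text, seed, blank_rate=0.1, shuffle_rows=True):
    """Blank a fraction of cell values and optionally shuffle data rows.

    Hypothetical helper for illustration only.
    Returns (corrupted_text, report_dict).
    """
    # A private RNG keeps the result reproducible for a given seed.
    rng = random.Random(seed)
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text, {"values_blanked": 0, "rows_shuffled": 0}

    header, body = rows[0], rows[1:]
    blanked = 0
    for row in body:
        for i, value in enumerate(row):
            # Only count cells that actually held a value.
            if value != "" and rng.random() < blank_rate:
                row[i] = ""
                blanked += 1

    if shuffle_rows:
        rng.shuffle(body)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    report = {
        "values_blanked": blanked,
        "rows_shuffled": len(body) if shuffle_rows else 0,
    }
    return out.getvalue(), report


if __name__ == "__main__":
    clean = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
    corrupted, report = corrupt_csv(clean, seed=42, blank_rate=0.5)
    print(corrupted)
    print(report)
```

Calling the function twice with the same seed and settings returns the same text and report, which is what makes a failing test case repeatable while debugging.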

When to use Dataset Corruptor

Machine learning testing
AI training workflows
Data validation testing
Import pipeline evaluation
Synthetic dataset generation
Error-handling development
Quality assurance testing