Handy data cleaning tool – CSV fingerprints

Recently I stumbled upon a handy little tool that may be interesting for everyone working with tabular data. Cleaning a dataset before you can actually start running statistical analyses is an important but often tedious task. During this cleaning or mastering process you may find artifacts like the following:

  • Entries with unexpected data types: for example, test takers were expected to describe something in prose, but a few entered a number instead.
  • Empty cells where no missing values are allowed: maybe a mistake made when entering paper-and-pencil data manually.
  • A sudden shift of cell values to the right, causing a lot of values to fall into the wrong column: this happens when the separator character also appears inside the data itself.

If you’ve ever worked with larger sets of data, you surely know these or similar problems, and you know how hard they can be to spot.
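To make the third artifact concrete, here is a minimal Python sketch that flags rows whose number of fields differs from the header row. The function name `find_shifted_rows` and the sample data are made up for illustration, and the check assumes the first row is a correct header.

```python
import csv
import io


def find_shifted_rows(text, delimiter=","):
    """Return (row_number, field_count) for rows whose width differs from the header's.

    Row numbers are 1-based and count the header as row 1.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])  # assumption: the header row has the right width
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]


# Hypothetical sample: the comment in the third row contains an unquoted comma.
sample = "id,comment,score\n1,fine,4\n2,good, but slow,3\n3,ok,5\n"
print(find_shifted_rows(sample))  # [(3, 4)]
```

A check like this only tells you which rows have the wrong width. Deciding where the extra separator belongs still takes a look at the row itself.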

CSV Fingerprints gives you a very quick first visual overview of your data and can therefore save you a lot of time. Victor Powell, the author of this handy tool, explains CSV Fingerprints in more detail on his blog. A full-screen version of the tool is also available.
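If you want a similar first impression without leaving a script, a rough text-only analogue is easy to sketch in Python. This is not how CSV Fingerprints itself is implemented. The one-character codes and the names `classify` and `fingerprint` are assumptions chosen for this example.

```python
import csv
import io


def classify(cell):
    """Map a cell to a one-character type code."""
    value = cell.strip()
    if value == "":
        return "."  # empty cell
    try:
        float(value)  # anything float() accepts counts as a number here
        return "#"
    except ValueError:
        return "a"  # everything else is treated as text


def fingerprint(text, delimiter=","):
    """Return one string of type codes per CSV row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return ["".join(classify(cell) for cell in row) for row in reader]


# Hypothetical sample: Bob's age is missing and his essay is a number.
sample = "name,age,essay\nAnn,31,I like tea\nBob,,42\nCy,27,Short walk\n"
for line in fingerprint(sample):
    print(line)
# aaa
# a#a
# a.#
# a#a
```

In the output, the row for Bob stands out from its neighbours at a glance, which is the same idea the visual tool applies with colours on a much larger scale.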

Tip: Don’t try to copy and paste data directly from Excel; always copy the CSV from a text editor.


