GuideMay 27, 202610 min read

Common CSV Import Errors and How to Fix Them

The 11 CSV import errors that break pipelines most often, why each one happens, and the exact fix. Encoding, delimiters, quoting, dates, and locale issues explained with practical solutions (2026).

Igor Nikolic
Igor Nikolic

Co-founder, FileFeed

Common CSV Import Errors and How to Fix Them

A CSV looks like the simplest file format in the world. It is just rows of values separated by commas. Open it in a text editor and you can read every byte. And yet CSV imports fail constantly. A file that loaded cleanly last week throws a parsing error this week. A column of numbers imports as text. Names with accents turn into a soup of question marks and boxes. A single stray comma shifts every value one column to the right for the rest of the file.

The reason is that CSV is not actually a standard. There is an RFC (4180) that describes how CSV should work, but almost nobody follows it strictly. Every system that exports a CSV file makes its own choices about delimiters, quoting, encoding, line endings, and how to represent empty values. When you import a file from a client, a partner, or a legacy system, you are importing all of those choices along with the data. The errors below are what happens when their choices do not match your parser's assumptions.

This guide walks through the eleven CSV import errors that break pipelines most often. For each one, you will see why it happens and exactly how to fix it. They fall into two groups: structural and parsing errors that stop the file from being read at all, and data and formatting errors that let the file load but corrupt the values inside it. The second group is more dangerous, because the import looks like it succeeded.

Key Takeaways

  • Most CSV errors are mismatched assumptions, not corrupt files. The exporter and the importer disagree about encoding, delimiter, quoting, or date format.
  • The silent errors are the expensive ones. Encoding, date ambiguity, and number formatting let the file load but quietly corrupt the data.
  • Excel is a frequent culprit. Opening and re-saving a CSV in Excel strips leading zeros, reformats dates, and converts long numbers to scientific notation.
  • Validation before load is the only durable fix. Catching these errors at import time is far cheaper than tracing corrupted records in production later.
88%
of spreadsheets contain at least one error (Panko research)
$3.1T
estimated annual cost of poor data quality in the US (IBM)
60%
of data quality issues originate at the point of ingestion
10x
cost to fix bad data in production vs catching it at import

What causes CSV import errors

Almost every CSV import error comes from one of three sources. The first is a mismatch between how the file was written and how your parser expects to read it: a different delimiter, a different character encoding, a different way of escaping special characters. The second is human editing, usually in Excel or Google Sheets, which silently transforms values when a file is opened and saved. The third is structural inconsistency within the file itself, where some rows have more or fewer columns than the header promises.

None of these are exotic. They show up in ordinary files from ordinary systems. The difference between a pipeline that handles them and one that breaks is whether you anticipate them and validate for them before the data lands in your database. With that framing, here are the specific errors.

Structural and parsing errors

1. Character encoding and garbled text

This is the error behind names like "José" instead of "José" and currency symbols that turn into random characters. It happens when a file is saved in one character encoding (often Windows-1252 or ISO-8859-1) but read as another (usually UTF-8), or the reverse. The bytes are intact, but the parser interprets them with the wrong character map, producing the garbled text known as mojibake.

The fix is to detect or declare the encoding explicitly rather than guessing. Standardize on UTF-8 for everything you accept, and convert files that arrive in other encodings before parsing. Watch for a byte order mark (BOM) at the start of UTF-8 files written by Excel: those three invisible bytes can corrupt the first column header if your parser does not strip them. A robust importer detects the encoding from the file content and normalizes everything to UTF-8 as the first step.

2. Wrong or ambiguous delimiter

CSV stands for comma-separated values, but plenty of CSV files are not separated by commas. European locales frequently use semicolons because the comma is the decimal separator. Other exports use tabs or pipes. If your parser assumes a comma and the file uses semicolons, the entire row is read as a single column and nothing maps correctly.

The fix is delimiter detection. Sniff the first few lines and count candidate delimiters (comma, semicolon, tab, pipe) to determine which one produces a consistent column count. Better still, make the expected delimiter part of your import configuration per source, so a client who always sends semicolon-delimited files is handled correctly every time without re-detection.

3. Unescaped commas and broken quoting

A field that contains the delimiter must be wrapped in quotes, so that "Smith, John" is read as one value rather than two. When an exporter fails to quote such fields, or quotes them inconsistently, the parser splits a single value into multiple columns and every field after it shifts. The same problem happens with quotes inside quoted fields when they are not properly doubled or escaped.

The fix is a parser that follows RFC 4180 quoting rules strictly, and validation that checks every row has the expected number of columns. When you see a row with too many columns, an unescaped delimiter inside a field is almost always the cause. Reject or flag those rows rather than letting the misaligned data load, because a shifted row is worse than a missing one. It looks valid but every value is in the wrong field.

4. Inconsistent column counts (ragged rows)

The header row promises ten columns, but row 4,512 has nine and row 8,003 has eleven. Ragged rows come from manual edits, broken exports, or unescaped delimiters (see the previous error). Some parsers silently pad or truncate, which means data ends up in the wrong fields without any error being raised.

The fix is to validate column count on every row against the header and treat any mismatch as an error to be flagged, not silently absorbed. A good validation pipeline reports the exact row numbers so the file can be corrected at the source rather than guessed at downstream.

5. Header problems: missing, renamed, or extra rows

Many files do not start with clean headers on row one. There may be a title, a few metadata lines, or blank rows above the real header. Or the client renamed a column from "Employee ID" to "Emp_ID" and your fixed mapping no longer matches. Or there is no header at all. Each of these breaks an importer that assumes the first row is the header and that header names are stable.

The fix has two parts. First, detect and skip leading metadata rows so the real header is found. Second, map columns by meaning rather than by exact string match, so that "Employee ID", "Emp_ID", and "employee_id" all resolve to the same target field. Flexible field mapping is what keeps a header rename from breaking the pipeline.

6. Line break and newline issues

Files created on Windows use carriage return plus line feed (CRLF) to end lines, while Unix systems use a single line feed (LF). Mixed line endings, or a parser that only expects one style, can merge rows or split them incorrectly. The harder version of this error is a legitimate line break inside a quoted field, such as a multi-line address, which a naive line-by-line reader will mistake for the end of a record.

The fix is to use a CSV parser that understands quoted fields can contain newlines, rather than splitting the file on raw line breaks. Normalize line endings on ingestion, and never parse CSV by simply splitting on the newline character. For a deeper look at parser behavior, see our roundup of JavaScript CSV parsers.

The problem

The five errors above stop a file from parsing, so you usually find out fast. The errors below are more dangerous: the file loads successfully and the import reports success, but the values inside are quietly wrong. These are the ones that surface weeks later in a report nobody can reconcile.

Data and formatting errors

7. Date format ambiguity

The value 01/02/2026 means January 2nd in the United States and February 1st almost everywhere else. When a file crosses locales, dates are silently misinterpreted: a parser reads US-style dates as European or the reverse, and January 2nd becomes February 1st without any error. The corruption is invisible until someone notices the monthly numbers are off.

The fix is to enforce an explicit, unambiguous date format in your schema and reject values that do not match it. Require ISO 8601 (YYYY-MM-DD) where you can. When you must accept locale-specific formats, configure the expected format per source rather than letting the parser guess. Ambiguous dates are the single most common cause of silent data corruption in file imports, so treat date parsing as a validation rule, not an afterthought.

8. Number and decimal formatting

Numbers carry locale baggage too. In the United States, one thousand and a half is written 1,000.50. In much of Europe it is 1.000,50, with the meanings of the comma and period reversed. Add currency symbols, percent signs, thousands separators, and parentheses for negatives, and a column of perfectly valid numbers becomes unparseable or, worse, parses to the wrong value. The string "1.500" might be one and a half, or fifteen hundred, depending on locale.

The fix is to define the expected numeric format per source and normalize on import: strip currency symbols and thousands separators, standardize the decimal separator, and validate that the result is a number within an expected range. Range checks catch the locale errors that slip through, because a value that is off by three orders of magnitude usually fails a sanity bound.

9. Excel mangling: leading zeros and scientific notation

Excel is the most common tool people use to open and edit CSV files, and it transforms data without asking. Zip codes like 02134 lose their leading zero and become 2134. Long identifiers and phone numbers get converted to scientific notation, so 1234567890123 becomes 1.23457E+12 and the original digits are lost permanently once the file is saved. Product codes that look like dates, such as "3-1", get converted to actual dates.

The fix on the producing side is to avoid round-tripping CSVs through Excel, or to import them into Excel as text rather than opening them directly. On the receiving side, validate that identifier fields keep their expected length and character pattern, and flag values that show signs of Excel conversion (scientific notation in an ID column, a date where a code should be). When you control the upload experience, an embeddable importer can catch these at the moment of upload.

10. Type coercion failures

CSV is a typeless format. Every value is a string until something interprets it. That means a quantity field can contain the word "TBD", a boolean field can contain "yes" or "maybe" instead of true or false, and a price field can contain "pending". The file loads, but the moment a downstream process tries to do arithmetic or a strict insert into a typed database column, it fails, often far from the original import.

The fix is explicit type checking at import time. Define the expected type for every field and validate that each value can be parsed as that type before accepting the row. Catching "TBD" in a numeric column at import is trivial; tracing a crash three systems downstream to that same value is not. This is a core part of cleaning CSV data properly.

11. Inconsistent empty and null values

Different systems represent "no value" differently. One export uses an empty cell, another writes the literal text "NULL", a third uses "N/A", "-", "none", or a single space. If your import treats only empty cells as missing, the rows that say "N/A" sail through as if they contained real data, and a required-field check that should have caught them does not.

The fix is to define which tokens count as null for each source and normalize them to a single empty representation before validation runs. Then apply required-field checks against that normalized value. Trailing whitespace deserves the same treatment: trim values before checking them, so a field containing only spaces is correctly recognized as empty.

How to prevent CSV import errors at scale

Fixing these errors one file at a time is manageable when you have a handful of sources. It stops being manageable when you have dozens of clients, each sending files in their own dialect of CSV, each capable of changing their export format without warning. At that point, ad hoc fixes scattered through your import code become their own source of bugs.

The durable solution is to move from reactive fixes to a structured import pipeline that handles these errors by design. A pipeline like this normalizes encoding, detects the delimiter, parses with strict quoting rules, validates column counts and types on every row, normalizes dates and numbers to a defined format, and produces an error report with exact row numbers before any data is accepted. The same checks run on every file, so a new client or a changed format is caught immediately rather than discovered in production.

  1. Normalize on ingestion. Detect and convert encoding to UTF-8, strip BOMs, normalize line endings, and detect the delimiter before parsing begins.
  2. Parse strictly. Use an RFC 4180 compliant parser that handles quoting and embedded newlines correctly, never a naive split on commas or newlines.
  3. Validate every row. Check column count, types, required fields, date and number formats, and ranges. Reject or flag rows that fail, with row-level error reporting.
  4. Map by meaning. Resolve column headers to target fields flexibly so a rename does not break the pipeline.
  5. Report, then load. Surface all errors in a single structured report before accepting the file, so problems are fixed at the source rather than traced downstream.

Building this pipeline in-house is a real engineering investment, and keeping it correct as edge cases pile up is ongoing work. For teams that would rather not own that maintenance, FileFeed's Automated File Feeds handle encoding normalization, delimiter detection, strict parsing, schema validation, and field mapping out of the box, with errors surfaced before data reaches your system. If you are building or rethinking a file ingestion workflow, designing for these eleven errors from the start is what separates a pipeline that scales from one that breaks every time a client changes their export.

Key insight

Treat CSV import as a contract, not a guess. Define the encoding, delimiter, date format, and schema you expect from each source, validate every file against that contract, and reject what does not match. Most CSV import errors disappear the moment the import stops assuming and starts checking.

Frequently asked questions

What is the most common CSV import error?

The two most common are character encoding errors and delimiter mismatches. Encoding errors produce garbled text (mojibake) when a file is saved in one encoding and read as another, most often a non-UTF-8 file read as UTF-8. Delimiter mismatches happen when a parser assumes commas but the file uses semicolons or tabs, which causes every row to be read as a single column. Both are prevented by detecting encoding and delimiter from the file rather than assuming them.

How do I fix encoding errors in a CSV file?

Detect the file's actual encoding and convert it to UTF-8 before parsing. If you control the export, save the file as UTF-8 explicitly. If you receive files in other encodings such as Windows-1252 or ISO-8859-1, run an encoding conversion step on ingestion. Also strip the byte order mark (BOM) that Excel adds to UTF-8 files, since those invisible bytes can corrupt the first column header. Standardizing every incoming file to UTF-8 as the first pipeline step eliminates most encoding errors.

Why does Excel change my CSV data?

Excel applies automatic formatting when it opens a CSV. It strips leading zeros from numbers (so 02134 becomes 2134), converts long numbers and identifiers to scientific notation, and reinterprets values that look like dates. These changes are saved back into the file, so the original data is lost. To avoid it, do not open CSVs directly in Excel for round-tripping. Instead import them as text, or keep identifier and code fields validated for their expected length and pattern so any Excel conversion is caught.

How do I handle commas inside CSV fields?

Fields that contain the delimiter must be wrapped in double quotes, so "Smith, John" is read as one value. Use a parser that follows RFC 4180 quoting rules, which also handles quotes inside quoted fields by doubling them. Then validate that every row has the expected number of columns. A row with too many columns almost always indicates an unescaped delimiter inside a field, and those rows should be flagged rather than loaded, because the misaligned values look valid but sit in the wrong fields.

Can I prevent CSV import errors automatically?

Yes. A structured import pipeline prevents the majority of CSV errors by normalizing encoding, detecting the delimiter, parsing with strict quoting, and validating types, formats, and column counts on every row before any data is accepted. You can build this in-house or use a platform that provides it. The key is that the same validation runs on every file, so a malformed file or a changed export format is caught at import time rather than discovered after it has corrupted production data. See our guide on automating CSV imports for how to set this up.

Skip the manual work

Let FileFeed handle file processing so your team doesn’t have to

Start free, configure your first pipeline, and see how FileFeed handles the file processing layer so your team doesn't have to.