When data is processed through a schema transformer, validation errors are collected in the output’s errors property, which describes each data quality issue or schema violation in detail.

Error Structure

Validation errors are organized as a tree in which each path corresponds to a field or nested object in your data. Each leaf node holds an array of ValidationErrorDetail objects:

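interface ValidationErrorDetail {
  error_type: string;          // One of the error types listed below
  statistics: ErrorStats;      // Occurrence counts for this error
  schema_path: string;         // Path of the failing field, e.g. "user.email"
  error_sample: SampleItem[];  // Sample of offending records
  check: any;                  // The constraint that failed, e.g. { "type": "number" }
}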
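
Because the errors form a tree, a report usually has to be flattened before it can be sorted or counted. The following is a minimal sketch of such a walk, not part of the transformer's API. The names ErrorTree and flattenErrors are hypothetical, and the sketch assumes that every inner node is a plain object and every leaf is an array, as described above.

```typescript
// Types repeated from this page so the sketch compiles on its own.
interface ErrorStats {
  error_count: number;
  null_count: number;
  total_count: number;
  missing_count: number;
}
interface SampleItem {
  index: number;
  value: any;
}
interface ValidationErrorDetail {
  error_type: string;
  statistics: ErrorStats;
  schema_path: string;
  error_sample: SampleItem[];
  check: any;
}

// Hypothetical name for the tree shape: inner nodes are objects, leaves are arrays.
type ErrorTree = { [key: string]: ErrorTree | ValidationErrorDetail[] };

// Depth-first walk that collects every leaf entry into one flat list.
function flattenErrors(tree: ErrorTree): ValidationErrorDetail[] {
  const out: ValidationErrorDetail[] = [];
  for (const node of Object.values(tree)) {
    if (Array.isArray(node)) {
      out.push(...node); // leaf: an array of error details
    } else {
      out.push(...flattenErrors(node)); // inner node: recurse into children
    }
  }
  return out;
}
```

Each flattened entry still carries its own schema_path, so the position in the tree is not lost.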

Error Types

The system recognizes several categories of validation errors:

  • Basic Validation

    • pattern: Value doesn’t match the required pattern
    • required: Required field is missing
    • type: Value doesn’t match the expected type
    • enum: Value isn’t one of the allowed options
    • array: Array validation failures
    • duplicate: Duplicate value where uniqueness is required
  • Custom Validation

    • custom: Custom validation rule failures
    • unknown: Unrecognized validation issues
  • Unsupported Cases

    • unsupported_*: Types prefixed with unsupported_, which flag cases that need special handling
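
The grouping above can be sketched as a small lookup on error_type. This is an illustration only: the function name categorize and the ErrorCategory type are hypothetical, and sending any type not listed on this page to the custom group is an assumption made here for simplicity.

```typescript
// Hypothetical category labels matching the three groups listed above.
type ErrorCategory = "basic" | "custom" | "unsupported";

// The basic validation types named on this page.
const BASIC_TYPES = new Set([
  "pattern",
  "required",
  "type",
  "enum",
  "array",
  "duplicate",
]);

// Maps an error_type string to its group.
function categorize(errorType: string): ErrorCategory {
  if (BASIC_TYPES.has(errorType)) return "basic";
  if (errorType.startsWith("unsupported_")) return "unsupported";
  // "custom", "unknown", and any type not listed here end up in this group
  // (an assumption of this sketch, not documented behavior).
  return "custom";
}
```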

Error Statistics

Each error includes detailed statistics about its occurrence:

interface ErrorStats {
  error_count: number;   // Number of times this error occurred
  null_count: number;    // Number of null values
  total_count: number;   // Total number of records processed
  missing_count: number; // Number of missing values
}
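
These counts can be turned into a failure rate for reporting. The sketch below assumes that error_count and total_count are counted in the same unit (at most one occurrence per record), so that their ratio is meaningful; the helper names errorRate and formatStats are hypothetical.

```typescript
// Repeated from above so the sketch compiles on its own.
interface ErrorStats {
  error_count: number;
  null_count: number;
  total_count: number;
  missing_count: number;
}

// Fraction of processed records that hit this error; 0 when nothing was processed.
function errorRate(stats: ErrorStats): number {
  return stats.total_count === 0 ? 0 : stats.error_count / stats.total_count;
}

// One-line, human-readable summary of a statistics object.
function formatStats(stats: ErrorStats): string {
  const pct = (errorRate(stats) * 100).toFixed(1);
  return (
    `${stats.error_count}/${stats.total_count} records failed (${pct}%), ` +
    `${stats.null_count} null, ${stats.missing_count} missing`
  );
}
```

A rate like this makes it easier to rank fields by severity than raw counts do when fields have different numbers of processed records.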

Error Samples

Errors include sample data to help diagnose issues:

interface SampleItem {
  index: number;  // Index of the record in the source data
  value: any;     // The problematic value
}
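
Since each sample carries the index of its source record, the full record can be looked up for debugging. This is a rough sketch that assumes the source data is available as an in-memory array and that index is a zero-based position in it; neither is confirmed by this page, and the name lookupSamples is hypothetical.

```typescript
// Repeated from above so the sketch compiles on its own.
interface SampleItem {
  index: number;
  value: any;
}

// Pairs each sample with the full source record at its index.
// record is undefined when the index falls outside the array.
function lookupSamples<T>(
  records: T[],
  samples: SampleItem[],
): Array<{ sample: SampleItem; record: T | undefined }> {
  return samples.map((sample) => ({
    sample,
    record: records[sample.index], // assumes zero-based indexing
  }));
}
```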

Example

Here’s an example of how validation errors might appear in the output:

{
  "errors": {
    "user": {
      "email": [{
        "error_type": "pattern",
        "statistics": {
          "error_count": 3,
          "null_count": 0,
          "total_count": 100,
          "missing_count": 0
        },
        "schema_path": "user.email",
        "error_sample": [
          { "index": 5, "value": "invalid-email" },
          { "index": 12, "value": "also-invalid" }
        ],
        "check": {
          "pattern": "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
        }
      }],
      "age": [{
        "error_type": "type",
        "statistics": {
          "error_count": 1,
          "null_count": 0,
          "total_count": 100,
          "missing_count": 0
        },
        "schema_path": "user.age",
        "error_sample": [
          { "index": 23, "value": "thirty" }
        ],
        "check": {
          "type": "number"
        }
      }]
    }
  }
}

In this example, 3 of 100 records fail the pattern check on user.email and 1 of 100 has a non-numeric user.age. Each entry reports how often the error occurred, a sample of the offending values, and the check that failed.