The quickest way to get started with flat, tabular data is to upload a sample CSV file in your desired output format. Lume will automatically generate a target schema for you! For nested or complex data structures, we recommend building your schema manually.

This guide walks you through building a schema by hand or customizing an existing one.

Schema Basics

Target schemas in Lume use the JSON Schema format to define your desired output structure. Each field requires:

  1. A field name
  2. One or more data types
  3. A clear description of the field’s business meaning and context

Write clear, specific descriptions that explain your business’s unique requirements and context. For example, specify if “revenue” means monthly recurring revenue, annual revenue, or revenue before returns. Learn more about writing effective descriptions in our Creating Field Descriptions guide.
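As a minimal sketch, a single field definition combining all three parts might look like the following. The field name monthly_recurring_revenue and its description are hypothetical, chosen only to show how a description can pin down which kind of "revenue" is meant:

```json
{
  "monthly_recurring_revenue": {
    "type": ["number", "null"],
    "description": "Monthly recurring revenue in USD from active subscriptions, excluding one-time fees and refunds. Null when the customer has no active subscription."
  }
}
```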

Field Types

Common JSON Schema types include:

  • string: Text data
  • number: Numeric values, including decimals
  • integer: Whole numbers
  • boolean: True/false values
  • array: Lists of values
  • object: Nested structures
  • null: Explicitly empty or missing values
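Because type is written as a list throughout this guide, a field can accept more than one type, for example a string that may also be null. A short sketch, with hypothetical field names:

```json
{
  "middle_name": {
    "type": ["string", "null"],
    "description": "Customer's middle name; null when none is on record"
  },
  "seat_count": {
    "type": ["integer"],
    "description": "Number of licensed seats on the account"
  }
}
```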

Data Classification with Enums

Use enums to classify data into specific categories:

{
  "subscription_tier": {
    "type": ["string"],
    "description": "The customer's subscription level",
    "enum": ["free", "basic", "premium", "enterprise"]
  }
}

Validation Rules

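JSON Schema provides several validation keywords. The examples in this guide use pattern, format, minimum, minLength, and maxLength, alongside enum.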
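The sketch below applies a few standard JSON Schema validation keywords to hypothetical fields. The field names are illustrative only, and maximum is a standard keyword that does not appear elsewhere in this guide, so treat its support in Lume as an assumption:

```json
{
  "order_id": {
    "type": ["string"],
    "description": "Order identifier: the prefix ORD followed by eight digits",
    "pattern": "^ORD\\d{8}$"
  },
  "country_code": {
    "type": ["string"],
    "description": "Two-letter country code of the shipping destination",
    "minLength": 2,
    "maxLength": 2
  },
  "discount_percent": {
    "type": ["number"],
    "description": "Discount applied to the order, as a percentage",
    "minimum": 0,
    "maximum": 100
  },
  "contact_email": {
    "type": ["string", "null"],
    "description": "Email address for order notifications",
    "format": "email"
  }
}
```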

Complete Example

Here’s a complete target schema example:

{
  "type": "object",
  "properties": {
    "customer_id": {
      "type": ["string"],
      "description": "Unique identifier for the customer",
      "pattern": "^CUST\\d{6}$"
    },
    "full_name": {
      "type": ["string"],
      "description": "Customer's full legal name"
    },
    "email": {
      "type": ["string", "null"],
      "description": "Primary contact email address",
      "format": "email"
    },
    "account_type": {
      "type": ["string"],
      "description": "Type of account held by the customer",
      "enum": ["personal", "business", "enterprise"]
    },
    "monthly_spend": {
      "type": ["number"],
      "description": "Average monthly spend in USD",
      "minimum": 0
    },
    "is_active": {
      "type": ["boolean"],
      "description": "Whether the customer account is currently active"
    }
  },
  "required": ["customer_id", "full_name", "account_type"]
}

Best Practices

  1. Clear Descriptions: Write clear, specific descriptions that explain the business meaning of each field
  2. Appropriate Types: Use the most specific type(s) possible for each field
  3. Validation Rules: Add validation rules where appropriate to ensure data quality
  4. Required Fields: Mark essential fields as required in the schema
  5. Consistent Naming: Use consistent field naming conventions throughout your schema

Remember: Focus on describing what each field means, not how to transform it. Lume handles the transformation logic automatically!

Advanced Schema Structures

Nested Objects

Your schema can include nested objects to represent complex data structures:

{
  "billing_address": {
    "type": ["object"],
    "description": "Customer's billing address details",
    "properties": {
      "street": {
        "type": ["string"],
        "description": "Street address including unit number"
      },
      "city": {
        "type": ["string"],
        "description": "City name"
      },
      "state": {
        "type": ["string"],
        "description": "State or province code",
        "minLength": 2,
        "maxLength": 2
      },
      "postal_code": {
        "type": ["string"],
        "description": "Postal or ZIP code"
      }
    }
  }
}

Arrays

Use arrays to represent lists of values or objects. The items keyword describes the schema that each element of the list follows.
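As a sketch, the first field below is a list of plain strings and the second is a list of objects. Both field names and their contents are hypothetical:

```json
{
  "tags": {
    "type": ["array"],
    "description": "Free-form labels applied to the customer record",
    "items": {
      "type": ["string"]
    }
  },
  "phone_numbers": {
    "type": ["array"],
    "description": "Phone numbers on file for the customer",
    "items": {
      "type": ["object"],
      "properties": {
        "label": {
          "type": ["string"],
          "description": "Kind of phone number",
          "enum": ["mobile", "home", "work"]
        },
        "number": {
          "type": ["string"],
          "description": "Phone number as entered by the customer"
        }
      }
    }
  }
}
```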

Database-Based Schemas

Your schema can mirror database tables and relationships:

{
  "user": {
    "type": ["object"],
    "description": "User record from the database",
    "properties": {
      "id": {
        "type": ["integer"],
        "description": "Primary key from users table"
      },
      "departments": {
        "type": ["array"],
        "description": "Departments this user belongs to",
        "items": {
          "type": ["object"],
          "properties": {
            "dept_id": {
              "type": ["integer"],
              "description": "Foreign key to departments table"
            },
            "role": {
              "type": ["string"],
              "description": "User's role in this department",
              "enum": ["member", "lead", "manager"]
            }
          }
        }
      }
    }
  }
}

Field names cannot contain periods (.) as this is a protected character in Lume. Use underscores or camelCase instead:

  • Invalid: user.first.name
  • Valid: user_first_name
  • Valid: userFirstName
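If a dotted name was meant to express hierarchy, one option is to model that hierarchy as a nested object, as in the Nested Objects section, rather than flattening it into a single name. A sketch with hypothetical field names:

```json
{
  "user": {
    "type": ["object"],
    "description": "Details of the user",
    "properties": {
      "first_name": {
        "type": ["string"],
        "description": "User's given name"
      }
    }
  }
}
```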