If you are building your schema or need to customize an existing one, this guide will walk you through the process.

Schema Basics

Target schemas in Lume use YAML to define your desired output structure. A target schema requires a models section, and each model entry requires a name field and a columns section. Each column entry contains the following:
  1. A field name
  2. A clear description of the field’s business meaning and context
  3. A set of tests to validate your transformed data
models:
  - name: customers
    columns:
      - name: customer_name
        description: "The full legal name of the customer as it appears on official documents"
Write clear, specific descriptions that explain your business’s unique requirements and context. For example, specify if “revenue” means monthly recurring revenue, annual revenue, or revenue before returns. Learn more about writing effective descriptions in our Creating Field Descriptions guide.
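To illustrate the difference, the sketch below contrasts a vague description with a specific one for a revenue field. The model name orders_summary and both column names are hypothetical placeholders chosen for this example.

```yaml
models:
  - name: orders_summary            # hypothetical model name
    columns:
      # Vague: does not say which kind of revenue is meant
      - name: revenue
        description: "Revenue"

      # Specific: states the revenue definition and how returns are treated
      - name: monthly_recurring_revenue
        description: "Monthly recurring revenue (MRR) from active subscriptions, measured before returns and refunds"
```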

Defining Enums for Classifications

Within your target schema, you can define an enum set (a fixed list of allowed values) with an accepted_values test, which triggers Lume's classification module. The classification module assigns the transformed source data for that field to one of your listed options when the data fits one of them. Here is an example that defines an enum set of Apparel, Electronics, and Perishable for the category field:
models:
  - name: products
    columns:
      - name: category
        description: Category of the product
        tests:
          - accepted_values:
              values:
                - Apparel
                - Electronics
                - Perishable
Classifications for SQL projects coming soon!

Defining Code Generation Language Preference

You can specify, per model, the language in which Lume's AI engine should generate code. Here is a quick example:
models:
  - name: orders
    language: python
    columns:
      - name: order_id
Lume currently supports code generation in both SQL and Python.
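Because the setting is per model, one schema can mix languages. The sketch below assumes that language: sql is the value for SQL generation, mirroring language: python above; this page only shows the Python value, so confirm the exact spelling before relying on it. The customers model here is illustrative.

```yaml
models:
  - name: orders
    language: python   # generate Python code for this model
    columns:
      - name: order_id

  - name: customers
    language: sql      # assumption: "sql" is the accepted value for SQL generation
    columns:
      - name: customer_id
```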

Types of Default Tests

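The YAML schema provides four built-in tests: unique, not_null, accepted_values, and relationships. The following examples apply each test to a column of an orders model.

unique: every value in the column must be distinct.
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique

not_null: the column must not contain null values.
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null

accepted_values: every value must be one of the listed values.
models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']

relationships: every value must exist in the referenced field of another model (here, the id field of customers).
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id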
Lume also provides built-in support for dbt-utils tests.
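As a hedged sketch, the example below applies dbt_utils.accepted_range, a generic test from the dbt-utils package, to a hypothetical amount column on a payments model so that negative values fail. This page does not list which dbt-utils tests Lume supports, so treat the choice of this particular test as an assumption.

```yaml
models:
  - name: payments
    columns:
      - name: amount                  # hypothetical column
        description: "Payment amount in the order's currency"
        tests:
          - not_null
          # dbt-utils generic test: values must be >= min_value
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: true         # a value of exactly 0 passes
```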

Complete Example

Here’s a complete target schema example:
models:
  - name: customers
    description: "Customer records and metadata"
    columns:
      - name: customer_id
        description: "Unique identifier for each customer"
        tests:
          - not_null
          - unique

      - name: customer_name
        description: "Full legal name of the customer"
        tests:
          - not_null

      - name: customer_type
        description: "Type of customer (e.g., individual, business)"
        tests:
          - accepted_values:
              values: ["individual", "business"]

  - name: orders
    description: "All customer orders"
    columns:
      - name: order_id
        description: "Primary key for the order"
        tests:
          - not_null
          - unique

      - name: customer_id
        description: "Foreign key to customers"
        tests:
          - not_null
          - relationships:
              to: ref('customers')
              field: customer_id

      - name: status
        description: "Current status of the order"
        tests:
          - accepted_values:
              values: ["pending", "shipped", "delivered", "cancelled"]

  - name: payments
    description: "Payments made toward orders"
    columns:
      - name: payment_id
        description: "Unique ID for the payment record"
        tests:
          - not_null
          - unique

      - name: order_id
        description: "Associated order ID"
        tests:
          - relationships:
              to: ref('orders')
              field: order_id

      - name: payment_method
        description: "Method of payment (e.g., credit card, PayPal)"
        tests:
          - accepted_values:
              values: ["credit_card", "paypal", "bank_transfer"]
When working with ecommerce data, product catalogs often require specific schema structures to handle product attributes, variants, and categorization. Here’s an example schema that demonstrates common ecommerce patterns:
models:
  - name: products
    description: "Core product information and metadata"
    columns:
      - name: product_id
        description: "Unique identifier for each product (SKU)"
        tests:
          - not_null
          - unique

      - name: product_name
        description: "Display name of the product"
        tests:
          - not_null

      - name: product_type
        description: "Main product category (e.g., physical, digital, subscription)"
        tests:
          - accepted_values:
              values: ["physical", "digital", "subscription", "service"]

      - name: brand
        description: "Manufacturer or brand name"
        tests:
          - not_null

      - name: status
        description: "Current product status in the catalog"
        tests:
          - accepted_values:
              values: ["active", "draft", "archived", "discontinued"]

      - name: category
        description: "Primary product category"
        tests:
          - accepted_values:
              values: ["clothing", "electronics", "home", "beauty", "sports"]

  - name: product_variants
    description: "Product variations (size, color, etc.)"
    columns:
      - name: variant_id
        description: "Unique identifier for the variant"
        tests:
          - not_null
          - unique

      - name: product_id
        description: "Reference to parent product"
        tests:
          - not_null
          - relationships:
              to: ref('products')
              field: product_id

      - name: color
        description: "Product color variant"
        tests:
          - accepted_values:
              values: ["red", "blue", "green", "black", "white", "yellow"]

      - name: size
        description: "Product size variant"
        tests:
          - accepted_values:
              values: ["XS", "S", "M", "L", "XL", "XXL"]

      - name: material
        description: "Product material variant"
        tests:
          - accepted_values:
              values: ["cotton", "polyester", "wool", "silk", "leather"]
Lume currently supports only single-level category hierarchies in the schema definition. If your product catalog requires multiple category levels (e.g., Clothing > Men > Shirts > T-Shirts), please contact Lume support for assistance with implementing a custom solution.
For ecommerce catalogs, pay special attention to:
  • Product variants and their relationships
  • Category enumerations
  • Product status workflows
  • Brand and manufacturer relationships

Best Practices

  1. Clear Descriptions: Write clear, specific descriptions that explain the business meaning of each field
  2. Test Rules: Add test rules where appropriate to ensure data quality
  3. Consistent Naming: Use consistent field naming conventions throughout your schema
Remember: Focus on describing what each field means, not how to transform it. Lume handles the transformation logic automatically!
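For instance, the description of a hypothetical full_name column should say what the value represents, not how to compute it. The commented-out line shows the transformation-style wording to avoid.

```yaml
models:
  - name: customers
    columns:
      - name: full_name
        # Avoid (describes transformation steps, not meaning):
        #   description: "Concatenate first_name and last_name with a space"
        # Prefer (describes business meaning; Lume derives the logic):
        description: "Customer's full name as it should appear on invoices and shipping labels"
```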
Property names in Lume’s API cannot contain periods (.).
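Assuming this rule applies to column names in a target schema, a dotted name such as customer.email would need to be flattened. The sketch below uses an underscore instead, matching the snake_case names used elsewhere on this page; the specific names are illustrative.

```yaml
models:
  - name: customers
    columns:
      # Not allowed: "customer.email" contains a period
      # - name: customer.email
      # Allowed: replace the period with an underscore
      - name: customer_email
        description: "Primary email address used to contact the customer"
```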
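Lume currently does not support custom tests or macros. For example, in the following list of tests, not_null and unique are supported built-in tests, but our_custom_macros_test, a custom test, is not:
  • our_custom_macros_test
  • not_null
  • unique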