Before diving into Lume, it’s important to familiarize yourself with some foundational concepts that will be referenced throughout this documentation.

Source to schema

At its core, Lume leverages AI to generate deterministic logic that maps source data to a target schema. This source-to-schema paradigm is the basis for the inputs that the Lume system requires.

Source Data

Source data is any user-provided data that you want to interpret or transform. Lume generally supports any structured or semi-structured data format, including JSON, CSV, and Excel.

If you need support for a data format not included above, please send a support request to the Lume team!

Lume requires only a single record of source data to generate mapping logic. That said, providing a larger batch of data is recommended, because Lume uses data sampling to improve mapping efficacy. A common pattern is to provide a starting batch of data for the initial generation, and then test and validate the generated logic against the larger dataset.
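
For example, a single source record arriving as JSON might look like the following. The field names here are purely illustrative:

```json
{
  "cust_name": "Acme Corp",
  "cust_email": "ops@acme.example",
  "signup_dt": "03/15/2024",
  "plan": "ENTERPRISE"
}
```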

Target Schema

A target schema defines the desired post-mapping format for the source data. Lume’s output is ultimately a batch of mapped data in this format, with records corresponding one-to-one to the input source data. Target schemas must follow the JSON Schema format.

Fields that must be populated with data should be included in the required property.

The period character (.) is reserved in property names in Lume’s API. Only property names that do not contain a period will be accepted.
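
For example, a minimal target schema might look like the following (property names are illustrative). Note that the fields that must be populated appear under required, and none of the property names contain a period:

```json
{
  "type": "object",
  "properties": {
    "customer_name": { "type": "string" },
    "email": { "type": "string" },
    "signup_date": { "type": "string", "description": "Signup date in ISO 8601 format" }
  },
  "required": ["customer_name", "email"]
}
```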

To learn how to edit target schemas in existing pipelines, see Editing Target Schema.

Generating mapping logic

Pipeline

A pipeline handles the mapping of incoming source data to a target schema. To create a pipeline, a target schema must be provided. The pipeline can then be run with batches of source data. Each pipeline run is represented as a job (see below).

The first time a pipeline is run, Lume uses AI to generate a mapper. The mapper defines deterministic, executable logic that performs the source-to-schema mapping. Subsequent runs of the same pipeline will leverage the existing mapper to map new source data.
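
The sketch below illustrates this flow at a high level. It is only a sketch: the endpoint paths, payload fields, and authentication header are placeholders rather than the actual Lume API, so consult the API reference for the real calls.

```python
# Illustrative sketch only: endpoint paths, payload fields, and the auth
# header are placeholders, not the actual Lume API.
import requests

BASE_URL = "https://api.lume.ai"            # placeholder base URL
HEADERS = {"lume-api-key": "YOUR_API_KEY"}  # placeholder auth header

# A target schema in JSON Schema format (see Target Schema above).
target_schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["customer_name", "email"],
}

# 1. Create a pipeline by providing a target schema (placeholder endpoint).
pipeline = requests.post(
    f"{BASE_URL}/pipelines",
    headers=HEADERS,
    json={"name": "customer-onboarding", "target_schema": target_schema},
).json()

# 2. Run the pipeline with a batch of source records (placeholder endpoint).
#    The first run triggers mapper generation; later runs reuse the mapper.
job = requests.post(
    f"{BASE_URL}/pipelines/{pipeline['id']}/run",
    headers=HEADERS,
    json={"source_data": [{"cust_name": "Acme Corp", "cust_email": "ops@acme.example"}]},
).json()
```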

Job

A job stores information about a particular pipeline run. Initially, this is just the source data provided for the run. While the run is in progress, the job reports the real-time status of mapper generation and execution. Once the run is complete, the job contains the final output mapped data.

A job’s mapped data always consists of a list of records that conform to the target schema and correspond one-to-one with the records in the job’s source data; in other words, the mapped data is the final output of mapping source data with a pipeline. Mapped data comes annotated with per-record validation errors as well as a manifest, which describes the source-to-schema field relationships at a high level.
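
Conceptually, a completed job’s output might look like the following. The exact response shape is defined by the API; the field names below are illustrative only:

```json
{
  "mapped_data": [
    {
      "customer_name": "Acme Corp",
      "email": "ops@acme.example",
      "validation_errors": []
    }
  ],
  "manifest": [
    { "target_field": "customer_name", "source_field": "cust_name" },
    { "target_field": "email", "source_field": "cust_email" }
  ]
}
```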

Review and editing

Review

Mapping logic and mapped data are viewable for each job in a pipeline, since the executed logic can differ from job to job. The Get Result Manifest endpoint returns high-level information about a job’s mappings.

Workshop

All editing takes place in a session called a workshop. Workshops are specific to a pipeline and allow editing of that pipeline’s target schema and mapper logic. Mapper logic can be edited directly with manual changes or indirectly with natural language prompting. See Editing Mappers and Building Mapper Logic to learn more.

Workshop edits can (and should!) be tested with source data to validate that the edits work as desired. Test source data can be provided directly or selected from a previously run job. Once you are satisfied with the test results, you can deploy the workshop to apply its edits to the pipeline. This updates the pipeline’s logic for all future job runs.

Workshop deployment will not apply new logic to preexisting jobs. Only source data provided after workshop deployment will be mapped with the new logic.
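
As with the pipeline sketch above, the following outlines the workshop flow using placeholder endpoints and payloads; the actual calls are documented in the API reference.

```python
# Illustrative workshop flow; endpoint paths and payload fields are
# placeholders, not the actual Lume API.
import requests

BASE_URL = "https://api.lume.ai"            # placeholder base URL
HEADERS = {"lume-api-key": "YOUR_API_KEY"}  # placeholder auth header

# 1. Open a workshop for an existing pipeline (placeholder endpoint).
workshop = requests.post(
    f"{BASE_URL}/pipelines/customer-onboarding/workshops", headers=HEADERS
).json()

# 2. Edit the mapper with a natural language prompt and test the edit
#    against sample source data (placeholder payload fields).
test_result = requests.post(
    f"{BASE_URL}/workshops/{workshop['id']}/prompt",
    headers=HEADERS,
    json={
        "prompt": "Format signup_date as an ISO 8601 date.",
        "sample_data": [{"cust_name": "Acme Corp", "signup_dt": "03/15/2024"}],
    },
).json()

# 3. Once the test results look right, deploy the workshop so that all
#    future job runs use the updated logic (placeholder endpoint).
requests.post(f"{BASE_URL}/workshops/{workshop['id']}/deploy", headers=HEADERS)
```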