Core Concepts
Understand the key concepts and architecture
⚠️ Private SDK Documentation - This documentation is for customers with private SDK access. Some features and capabilities may vary based on your agreement.
Understanding the fundamental concepts behind the Lume Python SDK will help you build more effective data transformation workflows.
Architecture Overview
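The Lume SDK follows a simple pattern built around three objects: a Flow defines the mapping, a Version freezes that Flow at a point in time, and a Run executes a Version against your input data.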
Key Objects
Flow
A Flow is a logical mapping template that defines:
- Target Schema: The structure of your output data
- Transformation Rules: How to map input data to a destination data model
- Validation Rules: Quality checks and business logic
- Error Handling: How to handle malformed or missing data
- Seed Data Integration: How to use reference data during transformation
Flows are created and managed in the Lume UI, not via the API.
Version
A Version is an immutable snapshot of a Flow at a specific point in time. Think of it like a Git commit - once created, it never changes.
Why Versions?
- Reproducibility: The same input always produces the same output
- Safety: Changes to flows don’t affect running jobs
- Rollback: Easy to revert to previous versions
- Testing: Test new versions before promoting to production
- Compliance: Maintain audit trails
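The Git-commit analogy can be illustrated with a small, self-contained sketch. The `FlowVersion` class and `publish_next` helper below are hypothetical names invented for this illustration; they are not part of the Lume SDK, and the sketch only shows the general idea of an immutable snapshot.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class FlowVersion:
    """Hypothetical stand-in for an immutable Flow Version (not an SDK class)."""
    flow_name: str
    version: int
    target_fields: Tuple[str, ...]  # a tuple, so the field list cannot be mutated in place


def publish_next(current: FlowVersion, target_fields: Tuple[str, ...]) -> FlowVersion:
    """Editing a flow yields a new version; the earlier snapshot is left untouched."""
    return FlowVersion(current.flow_name, current.version + 1, target_fields)


v1 = FlowVersion("invoices", 1, ("id", "amount"))
v2 = publish_next(v1, ("id", "amount", "currency"))

# Attempting to modify an existing version raises FrozenInstanceError.
try:
    v1.version = 99  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because `v1` never changes, a job pinned to it keeps producing the same output even after `v2` exists, which is the property the list above describes.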
Run
A Run is a single execution of a Flow Version against one or more CSV input files and optional seed files.
Run Lifecycle
Every run goes through these states:
Status Meanings
| Status | Description | Action Required |
|---|---|---|
| CREATED | Run created, data uploaded, waiting to be triggered | Trigger the run |
| QUEUED | Waiting for resources | None - will start automatically |
| RUNNING | Currently processing | None - monitor progress |
| SUCCEEDED | All data processed successfully | Download results |
| PARTIAL_FAILED | Some data processed, some failed | Check rejects, download results |
| FAILED | All data failed to process | Investigate errors |
| CRASHED | System error occurred | Contact support |
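When polling a run from client code, it can help to encode the status table as data. The following is a minimal sketch: the `RunStatus` enum and the helper functions are hypothetical and not taken from the SDK, and treating the last four statuses as terminal is an assumption that the table implies but does not state.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Status names copied from the table above (the enum itself is hypothetical)."""
    CREATED = "CREATED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    PARTIAL_FAILED = "PARTIAL_FAILED"
    FAILED = "FAILED"
    CRASHED = "CRASHED"


# Assumption: a run in one of these statuses will not change state again.
TERMINAL_STATUSES = {
    RunStatus.SUCCEEDED,
    RunStatus.PARTIAL_FAILED,
    RunStatus.FAILED,
    RunStatus.CRASHED,
}

# "Action Required" column from the table, keyed by status.
NEXT_ACTION = {
    RunStatus.CREATED: "Trigger the run",
    RunStatus.QUEUED: "None - will start automatically",
    RunStatus.RUNNING: "None - monitor progress",
    RunStatus.SUCCEEDED: "Download results",
    RunStatus.PARTIAL_FAILED: "Check rejects, download results",
    RunStatus.FAILED: "Investigate errors",
    RunStatus.CRASHED: "Contact support",
}


def is_terminal(status: str) -> bool:
    """Return True if polling can stop; raises ValueError for an unknown status."""
    return RunStatus(status) in TERMINAL_STATUSES


def next_action(status: str) -> str:
    """Look up the action the table recommends for a status string."""
    return NEXT_ACTION[RunStatus(status)]
```

A polling loop would typically sleep and re-check until `is_terminal` returns `True`, then branch on `next_action`.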
Output Structure
Every run produces a consistent output structure:
Metrics Overview
The metrics.json file contains:
File Formats
Input Formats
- CSV: Comma-separated values (the only supported input format)
Output Formats
- CSV: Comma-separated values (default)
- JSON: JSON Lines format (one JSON object per line)
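To make the difference between the two output formats concrete, the sketch below converts a small CSV string into JSON Lines using only the Python standard library. It does not call the Lume SDK; the `csv_to_json_lines` helper and the sample data are made up for illustration.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row into JSON Lines (one object per line)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object; all values stay strings, as in the CSV.
    return "\n".join(json.dumps(row) for row in reader)


sample_csv = "id,name\n1,Ada\n2,Grace\n"
jsonl = csv_to_json_lines(sample_csv)
print(jsonl)
```

Each line of a JSON Lines file is a complete JSON document, so the file can be read and parsed one record at a time.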
Seed Files
- CSV: Reference data files used during transformation
- Examples: lookup tables, configuration data, master data
Supported Storage
The SDK can read from and write to:
- Amazon S3: s3://bucket/path/to/file
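If you need to validate or split an S3 URI of this form before handing it to the SDK, the standard library is enough. This is a small sketch; `split_s3_uri` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/file' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    # The path keeps its leading slash; an S3 object key does not have one.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://bucket/path/to/file")
```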
Seed Files
Seed files provide reference data that can be used during the transformation process. Common use cases include:
- Lookup Tables
- Configuration Data
- Master Data
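As an illustration of the lookup-table use case, the sketch below enriches input rows with values from a seed CSV. How Lume applies seed files inside a Flow is not described here, so this is only a plain-Python analogy; the function names, field names, and sample data are all hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_lookup(seed_csv: str, key_field: str, value_field: str) -> Dict[str, str]:
    """Build a key -> value dictionary from seed CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(seed_csv))
    return {row[key_field]: row[value_field] for row in reader}


def enrich(rows: List[Dict[str, str]], lookup: Dict[str, str],
           source_field: str, target_field: str, default: str = "") -> List[Dict[str, str]]:
    """Return copies of the rows with target_field filled in from the lookup."""
    return [
        {**row, target_field: lookup.get(row[source_field], default)}
        for row in rows
    ]


seed = "code,country\nUS,United States\nDE,Germany\n"
rows = [
    {"id": "1", "country_code": "US"},
    {"id": "2", "country_code": "FR"},  # not present in the seed data
]

lookup = load_lookup(seed, "code", "country")
enriched = enrich(rows, lookup, "country_code", "country_name", default="UNKNOWN")
```

Rows whose key is missing from the seed data receive the default value instead of failing, which is one possible policy; rejecting such rows is another.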
Output Format Selection
You can choose your output format when downloading results:
Security and Compliance
See our Security page.