Understanding the core concepts behind the Lume Python SDK
Understanding the fundamental concepts behind Lume will help you build more effective and secure data transformation pipelines. Lume’s architecture is designed to provide a secure, reliable, and scalable transformation engine while minimizing direct access to your production systems.
The Lume Python SDK operates on a “Sync-Transform-Sync” model, which is designed for maximum security and operational simplicity. When you trigger a run, you are not executing the transformation logic in your own environment. Instead, you are orchestrating a pipeline on the Lume platform.
This architecture means that transformation logic never executes in your own environment: your code orchestrates the pipeline, while ingestion, transformation, and delivery all happen on the Lume platform.

*Diagram: the flow of data during a `lume.run()` execution.*
A Connector is a pre-configured, authenticated link to one of your external data systems, such as an object store or a relational database. Connectors are created and managed securely within the Lume UI.
A Flow Version must be associated with at least one source and one target connector.
Lume provides connectors for a variety of systems:

- **Object Storage**: the primary option for ad-hoc documents such as CSV and JSON files. Supported object stores include Amazon S3, used in the examples below.
- **Relational Databases (Recommended)**: the optimal option for large datasets and structured data. Lume supports a range of common relational databases.
A Flow is a logical mapping template that defines the blueprint for a transformation.
Flows are created and managed in the Lume UI, not via the API.
A Version is an immutable snapshot of a Flow at a specific point in time. Think of it like a Git commit: once created, it never changes.

Why Versions? Because a Version never changes, every run is reproducible and auditable: the exact logic that processed a given batch can always be identified and re-executed.
A Run is a single execution of a Flow Version against a specific batch of data. You create a run by calling `lume.run()`.
Each run is defined by two key parameters:

- `flow_version`: The immutable logic to execute.
- `source_path`: A string that tells Lume what specific data to process. See Understanding `source_path` below for details.

This one function call orchestrates the entire Sync-Transform-Sync pipeline, as the sketch below shows.
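A minimal sketch of that call, assuming the `lume` package exposes `run()` and a blocking `wait()` as described on this page; the `flow_version` value and the `status` attribute are illustrative assumptions, not documented API:

```python
import lume

# Sketch only: parameter names follow this page, but the flow_version
# value and the `status` attribute are assumptions about the SDK surface.
run = lume.run(
    flow_version="invoice_mapper:v4",                     # hypothetical Version ID
    source_path="s3://my-customer-data/new_records.csv",  # file to process
)

run.wait()         # block until the run reaches a terminal state
print(run.status)  # e.g. "SUCCEEDED" (see the run states table below)
```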
Pro Tip: Use Webhooks for Production

While `run.wait()` is great for simple scripts and getting started, we strongly recommend using Webhooks for production applications. They are more scalable and efficient than continuous polling.
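To illustrate the pattern only (the endpoint path and payload field names here are assumptions, not the documented Lume webhook schema), a minimal receiver might look like:

```python
# Hedged sketch of a webhook receiver. The payload fields ("status",
# "run_id") and the endpoint path are assumptions, not the Lume API.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/lume/webhook")
async def lume_webhook(request: Request):
    event = await request.json()
    if event.get("status") == "SUCCEEDED":  # status names match the table below
        ...  # trigger downstream processing for event.get("run_id")
    return {"ok": True}
```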
Understanding `source_path`

The `source_path` parameter is a string that uniquely identifies the data your pipeline will process. Its meaning depends on the type of Source Connector used by your Flow Version.
When your source is an object store, `source_path` is the full URI to a specific file, for example `s3://my-customer-data/new_records.csv`. Lume will fetch this specific file for processing.
When your source is a database, `source_path` is not a direct path but a logical identifier for a batch of data. It is a string you provide (e.g., a batch ID or a date range) that your pre-configured query in the Lume UI uses to select the correct rows, for example `"batch_202407291430"`.

The Connector configuration in Lume contains the actual SQL query, which must reference the `source_path` to filter the data. For example, your query might look like `SELECT * FROM invoices WHERE batch_id = :source_path;`.

This design prevents SQL injection and separates orchestration logic (the `source_path` your code provides) from data access logic (the query managed in Lume), as the sketch below shows.
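Putting the two halves together, a hedged sketch: the parameterized query lives in the Connector configuration, while your code supplies only the batch identifier (the `flow_version` value is hypothetical).

```python
import lume

# Sketch only: with a database source, source_path is a logical batch ID,
# matched server-side by the Connector's parameterized query
# (e.g. SELECT * FROM invoices WHERE batch_id = :source_path).
run = lume.run(
    flow_version="invoice_mapper:v4",  # hypothetical Version ID
    source_path="batch_202407291430",  # batch ID, not a file path
)
```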
For a complete, step-by-step guide to running your first pipeline, see the Quickstart Guide.
Every run goes through an expanded set of states reflecting the Sync-Transform-Sync process. A run can also terminate in `FAILED`, `PARTIAL_FAILED`, or `CRASHED`.
| Status | Description |
|---|---|
| `CREATED` | Run has been accepted and is waiting to be scheduled. |
| `SYNCING_SOURCE` | Lume is actively ingesting data from your source system into its secure staging area. |
| `TRANSFORMING` | The data transformation logic is being executed on the staged data. |
| `SYNCING_TARGET` | Lume is writing the transformed data and metadata to your target system. |
| `SUCCEEDED` | The entire pipeline, including both sync steps and the transformation, completed successfully. |
| `PARTIAL_FAILED` | The pipeline completed. Some rows were transformed successfully, while others were rejected due to validation or mapping errors. Both mapped and rejected data are written to the target system. See Handling Partial Failures for details. |
| `FAILED` | A non-recoverable error occurred during one of the steps. Check metadata for details. |
| `CRASHED` | A fatal system error occurred. Contact support. |
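A hedged sketch of acting on these terminal states after a blocking wait; the `status` attribute name is an assumption about the SDK surface, and the status strings are taken from the table above:

```python
import lume

run = lume.run(
    flow_version="invoice_mapper:v4",  # hypothetical Version ID
    source_path="batch_202407291430",
)
run.wait()  # polls until a terminal state; prefer webhooks in production

if run.status == "SUCCEEDED":
    pass  # every row mapped; continue downstream
elif run.status == "PARTIAL_FAILED":
    pass  # mapped and rejected rows were both written to the target
elif run.status in ("FAILED", "CRASHED"):
    raise RuntimeError(f"Lume run ended in {run.status}")
```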