Before diving into Lume, it’s important to familiarize yourself with some foundational concepts that will be referenced throughout this documentation.

Source to schema

At its core, Lume uses AI to generate deterministic logic that maps source data to a target schema. This source-to-schema paradigm defines the two inputs the Lume system requires: source data and a target schema.

Source Data

Source data is any user-provided data that you want to interpret or transform. Lume supports most structured and semi-structured data formats, including JSON, CSV, and Excel.

If you need support for a data format not included above, please send a support request to the Lume team!

Lume only requires a single record of source data to generate mapping logic. That said, providing larger batches of data is recommended, because Lume uses data sampling to improve mapping efficacy. A common pattern is to provide a starting batch of data for the initial generation, and then test and validate the generated logic against the larger dataset.
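
For illustration, a small batch of source records might look like the following; the field names and values are hypothetical, shown here as Python dictionaries as they might appear after parsing JSON or CSV input.

    # Hypothetical batch of semi-structured source records (illustrative only).
    source_records = [
        {"cust_name": "Acme Corp", "order_dt": "03/05/2024", "amt": "1,200.50"},
        {"cust_name": "Globex", "order_dt": "03/06/2024", "amt": "87.00"},
    ]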

Target Schema

A target schema defines the desired post-mapping format for the source data. Lume’s output will ultimately be a batch of mapped data in this format that corresponds one-to-one with the input source data. Target schemas must follow the JSON Schema format.

Fields that must be populated with data should be listed in the schema's required property.

The period (.) is a reserved character in property names in Lume’s API, so only property names that do not contain the . character will be accepted.
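
As a sketch, a minimal target schema for the hypothetical records above could look like this; the property names are illustrative, none of them contain a period, and mandatory fields are listed under required.

    # Illustrative JSON Schema for a target format, written as a Python dict.
    # Property names avoid the reserved "." character; mandatory fields go in "required".
    target_schema = {
        "type": "object",
        "properties": {
            "customer_name": {"type": "string"},
            "order_date": {"type": "string", "description": "ISO 8601 date (YYYY-MM-DD)"},
            "amount": {"type": "number"},
        },
        "required": ["customer_name", "order_date"],
    }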

To learn how to edit target schemas in existing pipelines, see Editing Target Schema.

Generating mapping logic

Mapper

A mapper is a set of deterministic, executable logic that maps incoming source data to a target schema. To create a mapper, a target schema must be provided along with sample source data. With these inputs, Lume uses AI to generate the source-to-schema transformation logic.

Currently, Lume generates mapper logic in Python. Support for additional languages and frameworks is on our roadmap, including:

  • SQL
  • dbt
  • DSL (JSONata)

The sample data is processed in an initial run of the mapper. Subsequent runs of the same mapper reuse the existing logic to map new source data.
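
As a rough sketch of what deterministic mapping logic can look like, the example below maps the hypothetical source records from earlier to the hypothetical target schema. It is illustrative only, not the code Lume actually generates.

    # Illustrative Python mapper logic (a sketch, not Lume's actual generated output).
    from datetime import datetime

    def map_record(source: dict) -> dict:
        """Map one hypothetical source record to the hypothetical target schema."""
        return {
            "customer_name": source["cust_name"].strip(),
            "order_date": datetime.strptime(source["order_dt"], "%m/%d/%Y").date().isoformat(),
            "amount": float(source["amt"].replace(",", "")),
        }

    mapped_records = [map_record(record) for record in source_records]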

Run

A run represents data moving through a flow. Each run stores:

  • The source data provided as input
  • The mapped data produced as output
  • The version of the mapper used for transformation
  • The realtime status of the data mapping during execution

A run’s mapped data is always a list of records that conform to the target schema and correspond one-to-one with the records in the run’s source data; it is the final output of mapper logic execution. Mapped data is also annotated with per-record validation errors for review.
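
Conceptually, a run’s contents can be pictured as follows; the shape shown is a simplified assumption for illustration, not the exact structure returned by Lume’s API.

    # Simplified, assumed shape of a run's contents (illustrative only).
    run = {
        "status": "completed",   # real-time status during execution
        "mapper_version": 3,     # version of the mapper used for the transformation
        "source_data": [...],    # records provided as input
        "mapped_data": [
            {"record": {...}, "validation_errors": []},
            {"record": {...}, "validation_errors": ["order_date: value does not match the schema"]},
        ],
    }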

Flow

A flow represents a sequence of data mapping steps that processes source data into final mapped output. Each flow can:

  • Accept one or multiple source data inputs
  • Include intermediate steps for data joining or schema transformation
  • Generate final mapped data output

Running the flow will process incoming source data through all defined steps using the deployed mappers. The mapper logic can be edited at any time to improve or modify the transformation process.

In general, a flow should organize related data transformation steps that work together to achieve a specific mapping outcome. Fundamentally different data transformation needs are best represented as separate flows.
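
Conceptually, a flow chains these steps together. The sketch below illustrates the idea in plain Python, reusing map_record from the mapper sketch above; the step functions are placeholders, not Lume’s SDK.

    # Conceptual illustration of a flow's steps (not Lume's SDK).
    def join_on_key(left: list[dict], right: list[dict], key: str) -> list[dict]:
        """Hypothetical intermediate step: join two source inputs on a shared key."""
        lookup = {record[key]: record for record in right}
        return [{**record, **lookup.get(record[key], {})} for record in left]

    def run_flow(orders: list[dict], customers: list[dict]) -> list[dict]:
        """Hypothetical flow: one join step followed by a schema transform step."""
        joined = join_on_key(orders, customers, key="customer_id")
        return [map_record(record) for record in joined]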

Review and editing

Review

The Lume app provides comprehensive review capabilities for your mapping operations:

  • View mapped data results
  • Analyze flagged fields that may need attention
  • Examine macro statistics about the mapping
  • Inspect the actual generated code
  • Make necessary edits directly in the interface

Workshopping

A flow’s mapper can be workshopped in the Lume app by opening the Schema Transform node within the flow, where you can edit both the target schema and the transformation logic directly through the interface. See Editing Mappers and Building Mapper Logic to learn more.

Mapper edits can (and should!) be tested with source data to validate that the changes work as desired. Once you are satisfied with the test results, you can deploy the updated mapper to the flow. Future flow runs will use the new mapper logic.