Core Concepts
Before diving into Lume, it’s important to familiarize yourself with some foundational concepts that will be referenced throughout this documentation.
Source to schema
At its core, Lume leverages AI to generate deterministic logic that maps source data to a target schema. This source-to-schema paradigm is the basis for the inputs that the Lume system requires.
Source Data
Source data is any user-provided data that you want to interpret or transform. Lume generally supports any structured or semi-structured data format, including JSON, CSV, and Excel.
If you need support for a data format not included above, please send a support request to the Lume team!
Lume only requires a single record of source data to generate mapping logic. That said, providing larger batches of data is recommended, because Lume uses data sampling to improve mapping efficacy. A common pattern is to provide a starting batch of data for the initial generation, and then test and validate the generated logic against the larger dataset.
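For instance, a small batch of JSON records might look like the following. The field names are illustrative only, not a required layout:

```python
# A small illustrative batch of source records (field names are made up).
# A single record is enough to generate mapping logic, but a batch gives
# Lume's data sampling more to work with.
source_data = [
    {"first": "Ada", "last": "Lovelace", "dob": "12/10/1815"},
    {"first": "Alan", "last": "Turing", "dob": "06/23/1912"},
    {"first": "Grace", "last": "Hopper", "dob": "12/09/1906"},
]
```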
Target Schema
A target schema defines the desired post-mapping format for the source data. Lume’s output will ultimately be a batch of mapped data in this format that corresponds one-to-one with the input source data. Target schemas must follow the JSON Schema format.
Fields that must be populated with data should be included in the required property.
The period (.) is a reserved character for property names in Lume’s API. Thus, only property names that do not contain the . character will be accepted.
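For example, a target schema for the records above could be written in standard JSON Schema (shown here as a Python dict; the property names are illustrative). Note that fields that must be populated appear under required, and no property name contains a period:

```python
# Illustrative target schema in JSON Schema form.
# Fields that must be populated are listed in "required", and no
# property name contains the reserved "." character.
target_schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "date_of_birth": {"type": "string", "description": "ISO 8601 date"},
        "email": {"type": ["string", "null"]},
    },
    "required": ["full_name", "date_of_birth"],
}
```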
To learn how to edit target schemas in existing pipelines, see Editing Target Schema.
Generating mapping logic
Mapper
A mapper is a set of deterministic, executable logic that maps incoming source data to a target schema. To create a mapper, a target schema must be provided along with sample source data. With these inputs, Lume uses AI to generate the source-to-schema transformation logic.
The sample data will be executed as the mapper’s initial run. Subsequent runs of the same mapper will leverage the existing logic to map new source data.
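To make “deterministic, executable logic” concrete, the function below is a hand-written stand-in for the kind of transformation a mapper encodes for the example schema above. Lume generates and executes this logic for you; the snippet only illustrates the concept and is not Lume’s actual output format:

```python
from datetime import datetime

# A hand-written stand-in for the kind of deterministic logic a mapper
# encodes. Lume generates and runs this for you; this function only
# illustrates the concept, using source_data from the earlier example.
def map_record(record: dict) -> dict:
    return {
        "full_name": f"{record['first']} {record['last']}",
        "date_of_birth": datetime.strptime(record["dob"], "%m/%d/%Y").date().isoformat(),
        "email": record.get("email"),
    }

mapped_data = [map_record(r) for r in source_data]
```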
Run
A run stores information about a particular mapper execution. At first, this will simply include the source data provided for the run. During execution, the run will indicate the real-time status of the data mapping. Once execution is complete, the run will contain the final output mapped data.
A run’s mapped data will always consist of a list of records that conform to the target schema and correspond one-to-one with the records in the run’s source data. In other words, they represent the final output of mapper logic execution. Mapped data is also annotated with per-record validation errors for review.
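Conceptually, a completed run ties these pieces together. The dict below sketches that shape; the field names are invented for illustration and do not reflect the actual API response:

```python
# Conceptual sketch of a run. Field names are invented for illustration
# and do not match the actual API response; source_data and mapped_data
# come from the earlier examples.
run = {
    "status": "completed",        # reflects real-time progress during execution
    "source_data": source_data,   # the records provided for this run
    "mapped_data": mapped_data,   # one target-schema record per source record
    "validation_errors": [[] for _ in mapped_data],  # per-record issues for review
}
```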
Pipeline
A pipeline represents an active deployment of a mapper and supports versioning via iterative edits to the mapper. Creating a pipeline also creates its initial mapper version, which is active by default. Running the pipeline maps incoming source data with its active mapper. The initial mapper can be edited incrementally to produce new versions, which can be deployed to the pipeline (only one version can be active at a time).
Generally, a pipeline should be used to organize mapper versions that, at a high level, operate on the same source-target schema pair. Fundamentally different incoming data sources or outgoing target schemas are best represented as new pipelines.
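As a hedged sketch of this lifecycle (the base URL, paths, and payload fields below are placeholders, not Lume’s documented API), creating a pipeline also creates its initial mapper version, and each run maps a new batch with whichever version is active:

```python
import requests

API = "https://api.example.com"                # placeholder base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}

# Hypothetical: create a pipeline (this also creates its initial, active
# mapper version) from a target schema and sample source data, using the
# target_schema and source_data from the earlier examples.
pipeline = requests.post(
    f"{API}/pipelines",
    headers=HEADERS,
    json={"name": "people", "target_schema": target_schema, "source_data": source_data},
).json()

# Hypothetical: each run maps a new batch with the pipeline's active mapper.
new_batch = [{"first": "Katherine", "last": "Johnson", "dob": "08/26/1918"}]
run = requests.post(
    f"{API}/pipelines/{pipeline['id']}/run",   # 'id' is an assumed field name
    headers=HEADERS,
    json={"source_data": new_batch},
).json()
```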
Review and editing
Review
Mapping logic and mapped data are viewable for each run in a pipeline, since the executed logic can differ depending on the mapper version. The Get Mapper endpoint returns high-level information about the transformation logic in the manifest field.
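For example, assuming a REST-style path (a placeholder, not the documented URL), fetching a mapper and reading its manifest might look like:

```python
import requests

# Placeholder URL and id: see the Get Mapper endpoint in the API
# reference for the actual path and response schema.
mapper_id = "mpr_123"  # hypothetical identifier
mapper = requests.get(
    f"https://api.example.com/mappers/{mapper_id}",
    headers={"Authorization": "Bearer <API_KEY>"},
).json()

print(mapper["manifest"])  # high-level summary of the transformation logic
```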
Workshopping
A pipeline can be workshopped to create and deploy new mapper versions. Mapper edits are inherently iterative (they are always based on a previously generated mapper version) and can modify the target schema and transformation logic. Transformation logic can be edited directly with manual changes or indirectly with natural language prompting. See Editing Mappers and Building Mapper Logic to learn more.
Mapper edits can (and should!) be tested with source data to validate that the edits work as desired. As with the initial mapper, creating a new mapper version requires sample source data, which will be executed as an initial run. Once satisfied with the test results, you can deploy the mapper version to the pipeline to make it active. Future pipeline runs will leverage the new mapper logic by default.
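Putting the workshop flow together as another hedged sketch (the endpoints, ids, and field names are placeholders, not Lume’s documented API): propose an edit with a natural-language prompt plus test source data, review the resulting test run, then deploy the new version.

```python
import requests

API = "https://api.example.com"                # placeholder base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}
PIPELINE_ID = "pln_123"                        # hypothetical identifier

# Hypothetical: request an edit based on the active mapper version, using a
# natural-language prompt and sample data for the test run.
edit = requests.post(
    f"{API}/pipelines/{PIPELINE_ID}/workshop",
    headers=HEADERS,
    json={
        "prompt": "Output date_of_birth in ISO 8601 format",
        "source_data": [{"first": "Ada", "last": "Lovelace", "dob": "12/10/1815"}],
    },
).json()

# After reviewing the test run's mapped data, deploy the new version so
# future pipeline runs use it by default.
requests.post(
    f"{API}/pipelines/{PIPELINE_ID}/deploy",
    headers=HEADERS,
    json={"mapper_version": edit["version"]},  # assumed field name
)
```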