Before diving into Lume, it’s important to familiarize yourself with some foundational concepts that will be referenced throughout this documentation.

Source to schema

At its core, Lume leverages AI to generate deterministic logic that maps source data to a target schema. This source-to-schema paradigm is the basis for the inputs that the Lume system requires.

Source Data

Source data is any user-provided data that you want to interpret or transform. Lume generally supports any structured or semi-structured data format, including JSON, CSV, and Excel.

If you need support for a data format not included above, please send a support request to the Lume team!

Lume requires only a single record of source data to generate mapping logic. That said, providing a larger batch of data is recommended, because Lume uses data sampling to improve mapping efficacy. A common pattern is to provide a starting batch of data for the initial generation, and then test and validate the generated logic against the larger dataset.
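
For example, a single source record arriving as JSON might look like the following. The field names here are purely illustrative:

```json
{
  "cust_name": "Acme Corp",
  "cust_email": "ops@acme.example",
  "signup_dt": "03/15/2024",
  "plan": "ENTERPRISE"
}
```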

Target Schema

A target schema defines the desired post-mapping format for the source data. Lume’s output is ultimately a batch of mapped data in this format, with records corresponding one-to-one to the input source data. Target schemas must follow the JSON Schema format.

Fields that must be populated with data should be included in the required property.

The period character (.) is reserved in property names in Lume’s API. Only property names that do not contain a period will be accepted.
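
For example, a minimal target schema might look like the following (property names are illustrative). Note that the fields that must be populated appear under required, and none of the property names contain a period:

```json
{
  "type": "object",
  "properties": {
    "customer_name": { "type": "string" },
    "email": { "type": "string" },
    "signup_date": { "type": "string", "description": "Signup date in ISO 8601 format" }
  },
  "required": ["customer_name", "email"]
}
```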

To learn how to edit target schemas in existing pipelines, see Editing Target Schema.

Generating mapping logic

Pipeline

A pipeline handles the mapping of incoming source data to a target schema. To create a pipeline, a target schema must be provided. The pipeline can then be run with batches of source data. Each pipeline run is represented as a job (see below).

The first time a pipeline is run, Lume uses AI to generate a mapper. The mapper defines deterministic, executable logic that performs the source-to-schema mapping. Subsequent runs of the same pipeline will leverage the existing mapper to map new source data.
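
The sketch below illustrates this flow at a high level. It is only a sketch: the endpoint paths, payload fields, and authentication header are placeholders rather than the actual Lume API, so consult the API reference for the real calls.

```python
# Illustrative sketch only: endpoint paths, payload fields, and the auth
# header are placeholders, not the actual Lume API.
import requests

BASE_URL = "https://api.lume.ai"            # placeholder base URL
HEADERS = {"lume-api-key": "YOUR_API_KEY"}  # placeholder auth header

# A target schema in JSON Schema format (see Target Schema above).
target_schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["customer_name", "email"],
}

# 1. Create a pipeline by providing a target schema (placeholder endpoint).
pipeline = requests.post(
    f"{BASE_URL}/pipelines",
    headers=HEADERS,
    json={"name": "customer-onboarding", "target_schema": target_schema},
).json()

# 2. Run the pipeline with a batch of source records (placeholder endpoint).
#    The first run triggers mapper generation; later runs reuse the mapper.
job = requests.post(
    f"{BASE_URL}/pipelines/{pipeline['id']}/run",
    headers=HEADERS,
    json={"source_data": [{"cust_name": "Acme Corp", "cust_email": "ops@acme.example"}]},
).json()
```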

Job

A job stores information about a particular pipeline run. Initially, this is just the source data provided for the run. While the run is in progress, the job reports the real-time status of mapper generation and execution. Once the run is complete, the job contains the final output mapped data.

A job’s mapped data always consists of a list of records that conform to the target schema and correspond one-to-one with the records in the job’s source data; in other words, the mapped data is the final output of mapping source data with a pipeline. Mapped data comes annotated with per-record validation errors as well as a manifest, which describes the source-to-schema field relationships at a high level.
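
Conceptually, a completed job’s output might look like the following. The exact response shape is defined by the API; the field names below are illustrative only:

```json
{
  "mapped_data": [
    {
      "customer_name": "Acme Corp",
      "email": "ops@acme.example",
      "validation_errors": []
    }
  ],
  "manifest": [
    { "target_field": "customer_name", "source_field": "cust_name" },
    { "target_field": "email", "source_field": "cust_email" }
  ]
}
```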

Review and editing

Review

Mapping logic and mapped data are viewable for each job in a pipeline, since the executed logic can differ from job to job. The Get Result Manifest endpoint returns high-level information about a job’s mappings.

Workshop

All editing takes place in a session called a workshop. Workshops are specific to a pipeline and allow editing of that pipeline’s target schema and mapper logic. Mapper logic can be edited directly with manual changes or indirectly with natural language prompting. See Editing Mappers and Building Mapper Logic to learn more.

Workshop edits can (and should!) be tested with source data to validate that the edits work as desired. Test source data can be provided directly or selected from a previously run job. Once you are satisfied with the test results, you can deploy the workshop to apply its edits to the pipeline. This updates the pipeline’s logic for all future job runs.

Workshop deployment will not apply new logic to preexisting jobs. Only source data provided after workshop deployment will be mapped with the new logic.
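
As with the pipeline sketch above, the following outlines the workshop flow using placeholder endpoints and payloads; the actual calls are documented in the API reference.

```python
# Illustrative workshop flow; endpoint paths and payload fields are
# placeholders, not the actual Lume API.
import requests

BASE_URL = "https://api.lume.ai"            # placeholder base URL
HEADERS = {"lume-api-key": "YOUR_API_KEY"}  # placeholder auth header

# 1. Open a workshop for an existing pipeline (placeholder endpoint).
workshop = requests.post(
    f"{BASE_URL}/pipelines/customer-onboarding/workshops", headers=HEADERS
).json()

# 2. Edit the mapper with a natural language prompt and test the edit
#    against sample source data (placeholder payload fields).
test_result = requests.post(
    f"{BASE_URL}/workshops/{workshop['id']}/prompt",
    headers=HEADERS,
    json={
        "prompt": "Format signup_date as an ISO 8601 date.",
        "sample_data": [{"cust_name": "Acme Corp", "signup_dt": "03/15/2024"}],
    },
).json()

# 3. Once the test results look right, deploy the workshop so that all
#    future job runs use the updated logic (placeholder endpoint).
requests.post(f"{BASE_URL}/workshops/{workshop['id']}/deploy", headers=HEADERS)
```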