Taxonomy mapping allows you to standardize categorical values into a predefined set of categories, regardless of how they appear in the source data. This is particularly useful when dealing with variations in how categories are named across different data sources or spreadsheets.

For example, imagine you have product data from multiple retailers, each using their own category names. You might want to standardize these categories like this:

Source Data (What You Start With)

Original Category Name
Ladies Blouses
Mens Athletic Wear
Boys Sports Clothing
Infant Apparel
Baby Clothes 0-24mo
Home & Kitchen

Mapped Data (What You Get)

Original Category Name | Standardized Category
Ladies Blouses | WOMENS_TOPS
Mens Athletic Wear | MENS_ACTIVEWEAR
Boys Sports Clothing | BOYS_ACTIVEWEAR
Infant Apparel | BABY_CLOTHING
Baby Clothes 0-24mo | BABY_CLOTHING
Home & Kitchen | HOME_AND_KITCHEN

How to Set Up Taxonomy Mapping

To standardize categories in Lume, you’ll need to define your taxonomy: a master list of standard categories that all incoming data will map to. You define this in your schema using an enum field, similar to how you might create a dropdown list in Excel.

Here’s an example schema that defines standard clothing categories:

{
  "category": {
    "type": ["string"],
    "description": "The standardized product category",
    "enum": [
      "WOMENS_TOPS",
      "MENS_TOPS",
      "WOMENS_BOTTOMS",
      "MENS_BOTTOMS"
    ]
  }
}

Once you define your taxonomy, Lume uses AI to automatically match each incoming category name to the closest standard category in your enum list. The matching is semantic: it compares what category names mean rather than relying on exact text matches, which is why both "Infant Apparel" and "Baby Clothes 0-24mo" map to BABY_CLOTHING in the example above.
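As a minimal sketch of how the enum constrains results, the Python below loads the example schema and flags any mapped value that is not in the taxonomy. The row shape (`source`, `category`) and the helper `find_invalid` are hypothetical, chosen for illustration; they are not part of Lume's API or export format.

```python
import json

# The example schema from above, embedded as a string.
SCHEMA_JSON = """
{
  "category": {
    "type": ["string"],
    "description": "The standardized product category",
    "enum": ["WOMENS_TOPS", "MENS_TOPS", "WOMENS_BOTTOMS", "MENS_BOTTOMS"]
  }
}
"""


def find_invalid(rows, schema, field="category"):
    """Return the rows whose value for `field` is not in the schema's enum."""
    allowed = set(schema[field]["enum"])
    return [row for row in rows if row.get(field) not in allowed]


schema = json.loads(SCHEMA_JSON)

# Hypothetical mapped rows; the real export format may differ.
rows = [
    {"source": "Ladies Blouses", "category": "WOMENS_TOPS"},
    {"source": "Home & Kitchen", "category": "HOME_AND_KITCHEN"},
]

invalid = find_invalid(rows, schema)
print(invalid)  # the second row: HOME_AND_KITCHEN is not in this enum
```

A check like this can reveal when the taxonomy is missing a category that the source data needs, as with HOME_AND_KITCHEN here.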

Processing Limits

Taxonomy mapping requires intensive AI processing and is subject to rate limits. Keep the total number of classifications (number of rows × number of category columns) below 500 per job. For example, mapping 100 product rows with 2 category columns would use 200 classifications.
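The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the code assumes "below 500" means strictly fewer than 500 classifications per job.

```python
# Limit stated above; assumed to be a strict upper bound (total < 500).
MAX_CLASSIFICATIONS_PER_JOB = 500


def classifications_needed(num_rows, num_category_columns):
    """Total classifications = rows x category columns."""
    return num_rows * num_category_columns


def fits_in_one_job(num_rows, num_category_columns):
    """True when the job stays below the per-job limit."""
    return classifications_needed(num_rows, num_category_columns) < MAX_CLASSIFICATIONS_PER_JOB


def max_rows_per_job(num_category_columns):
    """Largest row count that keeps the total strictly below the limit."""
    return (MAX_CLASSIFICATIONS_PER_JOB - 1) // num_category_columns


print(classifications_needed(100, 2))  # 200
print(fits_in_one_job(100, 2))         # True
print(fits_in_one_job(250, 2))         # False: 500 is not below 500
print(max_rows_per_job(2))             # 249
```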

Training the Model

Just like you might maintain a lookup table in Excel, you can review and correct mappings directly in the Lume app. When you make corrections, Lume learns from these edits and applies them to future classifications, making your taxonomy mapping more accurate over time.

Confidence Scoring

Confidence scoring is in Alpha and available for taxonomies with fewer than 1,000 categories. Contact us with any feedback about confidence scoring.

For each classification, Lume provides a confidence score to help you review results:

  • Confident - Highest confidence match
  • Very High - Strong semantic match
  • High - Good semantic match
  • Medium - Moderate semantic match
  • Low - Weak semantic match
  • Very Low - Poor semantic match
  • Incorrect - Known incorrect match
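
One way to use these levels when reviewing results outside the app is to rank them and flag anything below a chosen threshold. The sketch below is illustrative only: the numeric ranks, the placement of "Incorrect" below "Very Low", the `needs_review` helper, and the sample rows are all assumptions, not part of Lume.

```python
# Levels from the list above, ordered lowest to highest.
# Assumption: "Incorrect" ranks below "Very Low"; the ranks are illustrative.
LEVELS = ["Incorrect", "Very Low", "Low", "Medium", "High", "Very High", "Confident"]
RANK = {label: position for position, label in enumerate(LEVELS)}


def needs_review(label, threshold="High"):
    """Return True when a level ranks below the chosen threshold."""
    return RANK[label] < RANK[threshold]


# Made-up (source name, mapped category, confidence level) rows.
results = [
    ("Ladies Blouses", "WOMENS_TOPS", "Very High"),
    ("Home & Kitchen", "MENS_TOPS", "Low"),
]

to_review = [row for row in results if needs_review(row[2])]
print(to_review)  # only the "Low" row is flagged
```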

Viewing Confidence Scores

You can view confidence scores in the Editor tab of a schema transform node, similar to how you might highlight cells in Excel based on certain conditions.

For larger taxonomies (up to 15,000 categories), consider using Filtering mode. This uses text similarity matching instead of semantic classification, similar to fuzzy matching in Excel but more powerful.
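
To illustrate how text similarity differs from semantic classification, the sketch below uses `difflib.get_close_matches` from the Python standard library. It is a stand-in for the general idea, not Lume's Filtering mode; the `normalize` helper and the 0.6 cutoff (the `difflib` default) are assumptions.

```python
from difflib import get_close_matches

TAXONOMY = ["WOMENS_TOPS", "MENS_TOPS", "WOMENS_BOTTOMS", "MENS_BOTTOMS"]


def normalize(name):
    """Hypothetical normalization: upper-case and use underscores like the enum values."""
    return name.upper().replace("&", "AND").replace(" ", "_")


def text_candidates(source_name, taxonomy, n=3, cutoff=0.6):
    """Return up to n taxonomy entries whose spelling resembles the source name."""
    return get_close_matches(normalize(source_name), taxonomy, n=n, cutoff=cutoff)


# A near-identical spelling matches easily; the best candidate comes first.
print(text_candidates("Mens Tops", TAXONOMY)[0])  # MENS_TOPS

# A synonym with different spelling finds nothing by text alone.
print(text_candidates("Ladies Blouses", TAXONOMY))  # []
```

Text similarity handles spelling variations well, but names like "Ladies Blouses" share too few characters with WOMENS_TOPS to match; that gap is what semantic classification is meant to cover.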