Taxonomy Mapping
How to standardize categories using Lume
Taxonomy mapping allows you to standardize categorical values into a predefined set of categories, regardless of how they appear in the source data. This is particularly useful when dealing with variations in how categories are named across different data sources or spreadsheets.
For example, imagine you have product data from multiple retailers, each using their own category names. You might want to standardize these categories like this:
Source Data (What You Start With)
Original Category Name |
---|
Ladies Blouses |
Mens Athletic Wear |
Boys Sports Clothing |
Infant Apparel |
Baby Clothes 0-24mo |
Home & Kitchen |
Mapped Data (What You Get)
Original Category Name | Standardized Category |
---|---|
Ladies Blouses | WOMENS_TOPS |
Mens Athletic Wear | MENS_ACTIVEWEAR |
Boys Sports Clothing | BOYS_ACTIVEWEAR |
Infant Apparel | BABY_CLOTHING |
Baby Clothes 0-24mo | BABY_CLOTHING |
Home & Kitchen | HOME_AND_KITCHEN |
How to Set Up Taxonomy Mapping
To standardize categories in Lume, you’ll need to define your taxonomy: a master list of standard categories that all incoming data will map to. You define this in your schema using an enum field, similar to how you might create a dropdown list in Excel.
Here’s an example schema that defines standard clothing categories:
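The sketch below is one way such a schema could look, assuming a JSON Schema-style definition. The field name `standardized_category` and the surrounding structure are illustrative assumptions, not Lume's confirmed format. The enum values are the standardized categories from the mapped table above.

```python
import json

# Standard categories, taken from the "Mapped Data" table above.
STANDARD_CATEGORIES = [
    "WOMENS_TOPS",
    "MENS_ACTIVEWEAR",
    "BOYS_ACTIVEWEAR",
    "BABY_CLOTHING",
    "HOME_AND_KITCHEN",
]

# Assumption: a JSON Schema-style object with one enum-constrained string field.
# "standardized_category" is a hypothetical field name chosen for illustration.
schema = {
    "type": "object",
    "properties": {
        "standardized_category": {
            "type": "string",
            "description": "Standardized product category",
            "enum": STANDARD_CATEGORIES,
        }
    },
    "required": ["standardized_category"],
}

# Serialize the schema so it can be pasted wherever a JSON schema is expected.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Keeping the category list in one constant makes it easy to add or rename categories without touching the rest of the schema.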
Once you define your taxonomy, Lume uses AI to automatically match any incoming category names to the closest standard category in your enum list. The matching is semantic: it compares what the category names mean rather than relying on exact text matches.
Processing Limits
Taxonomy mapping requires intensive AI processing and is subject to rate limits. Keep the total number of classifications (number of rows × number of category columns) below 500 per job. For example, mapping 100 product rows with 2 category columns would use 200 classifications.
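The arithmetic above can be checked before a job is submitted. This is a minimal sketch in plain Python. The helper names are hypothetical and not part of any Lume SDK, and splitting rows into batches is just one possible way to stay under the limit.

```python
MAX_CLASSIFICATIONS_PER_JOB = 500  # the per-job limit described above


def classification_count(num_rows, num_category_columns):
    """Total classifications = number of rows x number of category columns."""
    return num_rows * num_category_columns


def max_rows_per_job(num_category_columns, limit=MAX_CLASSIFICATIONS_PER_JOB):
    """Largest row count that keeps the total strictly below the limit."""
    return (limit - 1) // num_category_columns


def split_into_jobs(rows, num_category_columns):
    """Split rows into batches that each stay below the per-job limit."""
    batch_size = max_rows_per_job(num_category_columns)
    if batch_size < 1:
        raise ValueError("Too many category columns to fit a single row in one job")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# The example from the text: 100 rows with 2 category columns.
print(classification_count(100, 2))  # 200

# 600 rows with 2 category columns need more than one job.
jobs = split_into_jobs(list(range(600)), 2)
print([len(job) for job in jobs])
```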
Training the Model
Just like you might maintain a lookup table in Excel, you can review and correct mappings directly in the Lume app. When you make corrections, Lume learns from these edits and applies them to future classifications, making your taxonomy mapping more accurate over time.
Confidence Scoring
Confidence scoring is in Alpha and available for taxonomies with fewer than 1,000 categories. Contact us with any feedback about confidence scoring.
For each classification, Lume provides a confidence score to help you review results:
- Confident: Highest confidence match
- Very High: Strong semantic match
- High: Good semantic match
- Medium: Moderate semantic match
- Low: Weak semantic match
- Very Low: Poor semantic match
- Incorrect: Known incorrect match
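If you work with results outside the app, the levels can be treated as an ordered scale to decide which mappings to review first. The sketch below assumes the levels rank from Confident (highest) down to Very Low, with Incorrect at the bottom. The tuple layout and the confidence labels on the sample rows are made up for illustration and do not reflect a real Lume export.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Confidence levels ordered from lowest to highest (assumed ordering)."""
    INCORRECT = 0
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5
    CONFIDENT = 6


# Map the display labels used in the list above to the ordered enum.
LABELS = {
    "Confident": Confidence.CONFIDENT,
    "Very High": Confidence.VERY_HIGH,
    "High": Confidence.HIGH,
    "Medium": Confidence.MEDIUM,
    "Low": Confidence.LOW,
    "Very Low": Confidence.VERY_LOW,
    "Incorrect": Confidence.INCORRECT,
}


def needs_review(label, threshold=Confidence.HIGH):
    """Return True when a mapping's confidence falls below the threshold."""
    return LABELS[label] < threshold


# Hypothetical rows: (source value, mapped category, confidence label).
results = [
    ("Ladies Blouses", "WOMENS_TOPS", "Confident"),
    ("Home & Kitchen", "HOME_AND_KITCHEN", "Medium"),
    ("Infant Apparel", "BABY_CLOTHING", "Low"),
]

to_review = [row for row in results if needs_review(row[2])]
for source, mapped, label in to_review:
    print(f"Review: {source} -> {mapped} ({label})")
```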
Viewing Confidence Scores
You can view confidence scores in the Editor tab of a schema transform node, similar to how you might highlight cells in Excel based on certain conditions.
For larger taxonomies (up to 15,000 categories), consider using Filtering mode. This uses text similarity matching instead of semantic classification, similar to fuzzy matching in Excel but more powerful.
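To give a feel for what text similarity matching means in general, here is a small sketch using Python's standard `difflib` module. It is a generic illustration, not Lume's Filtering mode implementation. The `shortlist` helper, the normalization step, and the cutoff value are all assumptions.

```python
import difflib

CATEGORIES = [
    "WOMENS_TOPS",
    "MENS_ACTIVEWEAR",
    "BOYS_ACTIVEWEAR",
    "BABY_CLOTHING",
    "HOME_AND_KITCHEN",
]


def shortlist(source_value, categories, n=3, cutoff=0.3):
    """Return up to n categories whose text is most similar to source_value."""
    # Normalize enum-style names ("HOME_AND_KITCHEN") to plain text ("home and kitchen").
    normalized = {c.replace("_", " ").lower(): c for c in categories}
    matches = difflib.get_close_matches(
        source_value.lower(), list(normalized), n=n, cutoff=cutoff
    )
    # Map the matched text back to the original category names, best match first.
    return [normalized[m] for m in matches]


print(shortlist("Home & Kitchen", CATEGORIES))
```

Character-level similarity of this kind works when source and target names share wording, as with "Home & Kitchen". It does not capture meaning, so "Infant Apparel" and BABY_CLOTHING would score poorly even though they describe the same thing.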