Metadata tagging and user bucketing
Intelligent Recommendations can be used to improve relevant personalization for end users, even when they're anonymous. Customers can integrate a personalized metadata tagging experience for their storefront. This experience is achieved by using the ability to identify metadata tags for content (like written articles, podcasts, videos, retail products, etc.) and recommend similar tags or content based on the taste/preference of that user. User metadata can be powerful for recommending relevant content to all users, including:
- New or infrequent customers (also known as “cold users”).
- Connecting users to other users based on unique metadata tagging.
- Connecting users to both relevant and short-lead-time content.
When Metadata tagging is enabled, users can create new recommendations scenarios such as:
- Metadata Categories we picked for you
- Other People also look at these categories
- Recent Events based on your recent activity
- Similar products/content based on their attributed metadata tags
- Picks for you based on user behavior buckets
What is a tag?
Tags are descriptors for something of interest within the items or content that users gravitate to, and they must be specific to the activity of the end user. For example, in the world of movies, genre, cast members, and mood may all be considered tags for a movie, as well as anything else that end users have a special fondness or dislike for. Tags can also include celebrity players/users, article titles, genres, product categories, events, and other content terminology. The goal is to ensure that end users are recommended relevant content that fits their interests, taste, or preference based on available metadata.
Architecture overview
To configure metadata tagging as shown in the architecture diagram, the prerequisites are as follows:
- Authoritative storage for content with rich metadata tags – Catalog.
- User interaction behavior (clicks on content/Usage). End-user profile information may also be available to be used.
- A separate Intelligent Recommendations Account and modeling instance for understanding user interests presented as tags.
- A component to rank content based on personalized tags with a real-time API query.
When enabled, the service produces a model of personalized “tags” for users, based on:
- Historical User interactions
- Metadata-rich content with “tags”
- The assumption here is that the tags are cleaned: there are no spelling errors, and the tags are a predetermined, rationalized set from experts and not randomly created or attached.
Data Contract Configuration
To configure a Data contract to support metadata tagging, do as follows:
Take note of the changes between the ItemId, TagId, and InteractionGroupingId.
In the Applications section, you see examples of how the introduction of a TagId or BucketId changes the configuration of the Data Contract. We suggest having a separate Intelligent Recommendations account and modeling instance when testing metadata tagging.
IR Capability Name | CATALOG Data Entity | CATALOG Data Entity Fields | INTERACTIONS Data Entity | INTERACTIONS Data Entity Fields |
---|---|---|---|---|
(Required for all responses) | Reco_ItemsAndVariants | ItemId as the TagId, Title | | |
Filtering ability (Applies to all lists) | Reco_ItemCategories | ItemCategories: ItemId (or TagId), Category | | |
People also view | | | Reco_Interactions | InteractionGroupingId as the UserId, ItemId as the TagId, UserId, InteractionType: Purchase == viewed, Timestamp |
Picks for you | | | Reco_Interactions | (Same as previous) |
Applications and examples
The following sections walk through two common use cases that benefit from metadata tagging and provide some examples with demo data for each.
- To get "most popular items for you" for cold users. To see an example, see the section titled "Get Most Popular Items for you for Cold Users".
- To create a machine learned map of Users' metadata-values. To see an example, see the section titled "Create an ML map of users' metadata values".
Application 1: Get “most popular items for you” for cold users
A common problem in the world of AI-ML is how to provide relevant recommendations to users who are new or infrequent customers (also known as “Cold users”). As mentioned previously, the aim here is to create distinct buckets based on meaningful categories and available demographic information (for example, Age and Gender). Then use all Interactions to connect all users to their corresponding demographic buckets, which in turn enables the buckets to be connected to items during the model training phase. During the serving phase, a Cold User’s demographic bucket can be assigned and then used to recommend items, for example “most popular items by user bucket”.
The steps are as follows:
- Prepare a bucketing of Users with their metadata information.
- Create the connections for the model in the “Reco_Interactions.csv” data storage file.
- Query the model to get “most popular items by user bucket” API.
Step 1: Prepare a bucketing of the Users with their metadata information
A few best practices for creating your buckets are as follows:
- User Metadata can be represented as ranged buckets. Consider using the metadata that makes sense for your business domain and use case. For example, if you wanted to create a bucket for age data, then you could use these values: Age5To11, Age30To40, etc.
- Some User metadata can even be combined in buckets together. Consider using the metadata and combinations that make sense for your business domain and use case. For example, you could combine both Age and Gender data to create buckets like this: Age20To30Male, Age20To30Female, Age30To40Male, Age30To40Female, etc.
- Once buckets are created, you need to assign each bucket a unique BucketId.
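The bucketing practices above can be sketched in a few lines of Python. This is only an illustration: the age ranges, the half-open boundary convention, and the `bucket_id` helper are assumptions made for this example and are not part of the Intelligent Recommendations service.

```python
from typing import Optional

# Hypothetical age ranges: (inclusive lower bound, exclusive upper bound).
# Pick ranges that make sense for your own business domain.
AGE_RANGES = [(20, 30), (30, 40), (40, 50)]


def bucket_id(age: int, gender: Optional[str] = None) -> Optional[str]:
    """Return a BucketId such as 'Age30To40' or 'Age30To40Female'.

    Returns None when the age falls outside every configured range.
    """
    for low, high in AGE_RANGES:
        if low <= age < high:
            label = f"Age{low}To{high}"
            # Optionally combine age with gender into a single bucket.
            return label + gender if gender else label
    return None


print(bucket_id(35))            # Age30To40
print(bucket_id(35, "Female"))  # Age30To40Female
```

Because each range includes its lower bound and excludes its upper bound, a user aged exactly 30 lands in one bucket only, which keeps every BucketId assignment unambiguous.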
Step 2: Create the connections for the model in the “Reco_Interactions.csv” data storage file
How the data is configured in the Data Contract depends on whether you have fewer or more than 1000 buckets.
If there are LESS than 1000 buckets
For each Interaction row, set the ChannelId to the BucketId that corresponds to (or best fits) the user. The Interaction CSV row becomes: InteractionGroupingId, ItemId, UserId, and BucketId as the ChannelId. An example of the Interactions CSV follows:
Sample CSV for LESS than 1000 buckets
Interactions CSV Headers appear for convenience only and shouldn't be part of the actual data.
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
InteractionGroupingID | ItemId | | UserId | | | | | BucketId | | | |
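As a rough sketch, a row in this layout could be produced as follows. The column order is taken from the header row above; the comma delimiter, the `FutureAttribute1`/`FutureAttribute2` labels, and the helper names are assumptions for illustration, and the sample values are made up.

```python
import csv
import io

# Column order taken from the header row shown above (12 columns).
# The two "Future Attribute" names are placeholder labels.
COLUMNS = [
    "InteractionGroupingId", "ItemId", "ItemVariantId", "UserId",
    "InteractionType", "Timestamp", "FutureAttribute1", "FutureAttribute2",
    "Channel", "Catalog", "Strength", "IsPositive",
]


def channel_row(grouping_id: str, item_id: str, user_id: str, bucket_id: str) -> list:
    """Build one interaction row with the BucketId in the Channel column."""
    values = {
        "InteractionGroupingId": grouping_id,
        "ItemId": item_id,
        "UserId": user_id,
        "Channel": bucket_id,
    }
    # Columns that are not supplied stay empty.
    return [values.get(name, "") for name in COLUMNS]


def to_csv(rows) -> str:
    """Serialize rows without a header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


row = channel_row("1", "98005", "100", "Age30To40")
print(to_csv([row]))  # 1,98005,,100,,,,,Age30To40,,,
```

Building the row from a name-to-value mapping keeps the BucketId in the ninth (Channel) position even when the optional columns in between are left empty.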
If there are MORE than 1000 buckets
If there are more than 1000 buckets of data, then you create more interaction rows using the BucketId.
Turn each original interaction row between a User and an Item into two distinct rows that share an InteractionGroupingId used only by these two rows. The example shows:
- The original interaction row using UserId, ItemId, and the InteractionGroupingId as UNIQUE_ID.
- The additional interaction row with the BucketId as the ItemId.
Sample CSV for MORE than 1000 buckets:
Interactions CSV Headers appear for convenience only and shouldn't be part of the actual data.
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
UNIQUE_ID | ItemId | | UserId | | | | | | | | |
UNIQUE_ID | BucketId | | UserId | | | | | | | | |
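The row-doubling step can be sketched as below. The `bucket_rows` helper is hypothetical, and using a random UUID for the shared InteractionGroupingId is an assumption; any value that is used by these two rows only should serve the same purpose.

```python
import uuid


def bucket_rows(item_id: str, user_id: str, bucket_id: str) -> list:
    """Expand one user-item interaction into two rows that share an
    InteractionGroupingId used by no other rows."""
    grouping_id = uuid.uuid4().hex  # assumption: any unique string is acceptable
    # Only the first four columns are filled; the remaining eight stay empty.
    padding = [""] * 8
    original = [grouping_id, item_id, "", user_id] + padding
    bucket = [grouping_id, bucket_id, "", user_id] + padding
    return [original, bucket]


for new_row in bucket_rows("98005", "100", "Age30To40Female"):
    print(",".join(new_row))
```

Each call generates a fresh grouping ID, so rows created for different original interactions never end up in the same group.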
Step 3: Query the model to get “most popular items by user bucket” API
Take into consideration the model construction outlined previously. After a Cold User and their demographic bucket have been determined, query the Serving Endpoint using the “Next Best Action” (formerly CART) list type with the demographic-focused BucketId to recommend the most popular Items for that bucket.
When there are LESS than 1000 buckets
A sample API Query link, where the parameter for ChannelId is replaced with the BucketId value, would look like this:
<serving-endpoint>/Reco/V1.0/Popular?channelID=<BucketId>
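A minimal sketch of assembling this query string in Python follows. It only builds the URL and does not send a request or handle authentication. The `popular_url` helper, the placeholder host, and the exact casing of the path and of the `ChannelId` parameter are assumptions based on the sample links in this article.

```python
from urllib.parse import urlencode


def popular_url(serving_endpoint: str, bucket_id: str) -> str:
    """Build the Popular query with the BucketId passed as the channel."""
    # urlencode() escapes any characters that are unsafe in a query string.
    query = urlencode({"ChannelId": bucket_id})
    return f"{serving_endpoint.rstrip('/')}/reco/v1.0/Popular?{query}"


# "https://example.invalid" is a placeholder, not a real serving endpoint.
print(popular_url("https://example.invalid", "Age30To40"))
# https://example.invalid/reco/v1.0/Popular?ChannelId=Age30To40
```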
Example 1: Less than 1000 buckets
Assume a User with UserId=100, with a custom assigned BucketId=Age30To40, who recently purchased an item with ItemId=98005.
This example creates a row in the “Reco_Interactions.csv” file, which uses the BucketId (in the ChannelId field of the IR schema) that best matches the User (represented by UserId in the IR schema):
- Original Interaction info is: InteractionGroupingId=1, UserId=100, ItemId=98005
- Notice in the CSV example that the relevant ChannelId, which best matches the UserId, is appended. In the example, the UserId was matched to the BucketId=Age30To40, so the modified Interaction row is:
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 98005 | | 100 | | | | | Age30To40 | | | |
- The API Query and Response return a list of ItemIds, including ItemId=43218 in the third position, which is a popular item for users of this category.
API Query
GET <serving-endpoint>/reco/v1.0/Popular?ChannelId=Age30To40
Response
{
"id": "Lists",
"name": "Lists",
"version": "v1.0",
"interactionsVersion": "20220104115104",
"items": [
{
"id": "65106",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "62604",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "43218",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "63503",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "62452",
"trackingId": "00000000-0000-0000-0000-000000000003"
}
],
"title": "Popular",
"longTitle": "Popular",
"titleId": 5,
"pagingInfo": {
"totalItems": 200
},
"status": "Success"
}
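A response of this shape can be reduced to a ranked list of IDs with a few lines of Python. This is a sketch only: the `item_ids` helper is hypothetical, and treating any status other than "Success" as an error is an assumption rather than documented service behavior. The sample body is a shortened version of the response above.

```python
import json


def item_ids(response_body: str) -> list:
    """Return the recommended ids in ranked order, or raise on a non-success status."""
    payload = json.loads(response_body)
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')}")
    return [item["id"] for item in payload.get("items", [])]


sample = '{"items": [{"id": "65106"}, {"id": "62604"}, {"id": "43218"}], "status": "Success"}'
ids = item_ids(sample)
print(ids.index("43218") + 1)  # 3
```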
When there are MORE than 1000 buckets
A sample API Query link, where the ItemId is replaced with the BucketId for a cold user, would look like this:
<serving-endpoint>/Reco/V1.0/Cart/<BucketId>?
Example 2: More than 1000 buckets
Assume a User with UserId=100, with a custom assigned BucketId=Age30To40Female, who recently purchased an item with ItemId=98005.
Now you can use the original interaction data and construct rows in the “Reco_Interactions.csv” file:
- Original Interaction info is: InteractionGroupingId=NEW_UNIQUE_ID, UserId=100, ItemId=98005
- The two rows of constructed Interaction info that should be in the “Reco_Interactions.csv” file that is read by the Intelligent Recommendations service:
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
UNIQUE_ID | 98005 | | 100 | | | | | | | | |
UNIQUE_ID | Age30To40Female | | 100 | | | | | | | | |
- The API Query and Response return a list of ItemIds, including ItemId=43218 in the third position, which is a popular product for users in this category.
API Query
GET <serving-endpoint>/reco/v1.0/Cart/Age30To40Female?
Response
{
"id": "Lists",
"name": "Lists",
"version": "v1.0",
"interactionsVersion": "20220104115104",
"items": [
{
"id": "65106",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "62604",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "43218",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "63503",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "62452",
"trackingId": "00000000-0000-0000-0000-000000000003"
}
],
"title": "Cart",
"longTitle": "FrequentlyBoughtTogether",
"titleId": 5,
"pagingInfo": {
"totalItems": 200
},
"status": "Success"
}
Application 2: Create an ML Map of users' metadata-values
Modeling user metadata “Tags” in place of direct user interactions can be a powerful modification when the goal is to show how connected users are with those tags, and which tags are truly similar by behavior. Assign each meaningful and available tag (for example, demographics like Age and Gender, or other metadata) a unique identifier, which the service refers to as the TagId. During the model training phase, all interactions data is used to build a connection between UserIds and TagIds.
During the serving phase, the system can provide a personalized list of Tags by calling “Picks for you” with UserId, and similar tags by calling “People also like” with TagId.
How to use TagIds for recommendations:
- Prepare a list of user metadata values (tags) and assign each of them a unique TagId.
- Create the connections for the model in the Reco_Interactions.CSV data storage file.
- Query the model to get “personalized tags by user” or “similar tags” API.
Step 1: Prepare a list of user metadata values (tags) and assign each of them a unique TagId
When constructing values for Age data, bucketing is still a good approach: Age5To11, Age12To18, etc.
For other metadata values, create a separate TagId for each. For example, a Family Status category could use the tags Single, Couple, CoupleWithKids, etc.
Step 2: Create the connections for the model in the Reco_Interactions.CSV data storage file
Use each original Interaction between a User and an Item to construct a row of Interaction data with the TagId.
[!NOTE]
Some important reminders with this approach:
- Only the newly constructed data is used in the Interactions data entity for the model.
- The creation of an Interaction row that connects Users to TagIds doesn't necessarily need to be based on an interaction. This is an example to illustrate how one can create an Interaction to connect Users to Tags in the model.
- For the InteractionGroupingId, it might make sense to reuse the value from the original Interaction, if available. Otherwise, try grouping by UserId. During the model training phase, all interactions data is used to build a connection between the different TagIds, and between UserIds and TagIds. Because scenarios and usage patterns differ, we suggest trying different ways to group and seeing which yields the most relevant results.
- Original Interaction row: with UserId, ItemId, InteractionGroupingId. Unlike the example above with BucketId, DO NOT INCLUDE this row in the input dataset.
- NEW Interaction row: with UserId, TagId as the ItemId, UserId as the InteractionGroupingId.
An example Data Contract would look like this:
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
UserId | TagId | | UserId | | | | | | | | |
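A small sketch of generating rows in this layout, one per tag, is shown below. The `tag_rows` helper and the sample tag values are illustrative assumptions. Grouping by UserId follows the option described in the note above and is not the only possible choice.

```python
def tag_rows(user_id: str, tag_ids: list) -> list:
    """Build one interaction row per tag, grouped by UserId."""
    rows = []
    for tag_id in tag_ids:
        # InteractionGroupingId = UserId, ItemId = TagId, UserId = UserId;
        # ItemVariantId and the eight trailing columns stay empty.
        rows.append([user_id, tag_id, "", user_id] + [""] * 8)
    return rows


rows = tag_rows("100", ["123", "Age30To40Female", "FamilyWithKids"])
for tag_row in rows:
    print(",".join(tag_row))
```

Only these constructed rows go into the input dataset; the original user-item interaction row is not written out.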
Step 3: Query the model to get personalized tags by user or similar tags
With careful model construction, querying the Serving Endpoint using the “Picks for you” and “People also like” list types yields the desired outcomes.
A "Picks for you" API Query, which returns the recommended TagIds for a given UserId, would look like this:
<serving-endpoint>/Reco/v1.0/picks?userId=<UserId>
A "People also like" API Query, where the seed-item parameter is replaced by the corresponding TagId, would look like this:
<serving-endpoint>/Reco/V1.0/Similar/<TagID-value>?
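The two query shapes differ in where the identifier goes: the UserId travels as a query parameter, while the TagId takes the place of the seed item in the path. The sketch below builds both URLs without sending a request. The helper names and the placeholder host are assumptions, the path casing is copied from the sample links above, and the trailing `?` from the Similar sample is omitted here.

```python
from urllib.parse import quote, urlencode


def picks_url(serving_endpoint: str, user_id: str) -> str:
    """'Picks for you': the UserId is sent as a query parameter."""
    query = urlencode({"userId": user_id})
    return f"{serving_endpoint.rstrip('/')}/Reco/v1.0/picks?{query}"


def similar_url(serving_endpoint: str, tag_id: str) -> str:
    """'People also like': the TagId takes the seed-item path segment."""
    # quote() percent-encodes characters that are unsafe in a path segment.
    segment = quote(tag_id, safe="")
    return f"{serving_endpoint.rstrip('/')}/Reco/V1.0/Similar/{segment}"


# "https://example.invalid" is a placeholder, not a real serving endpoint.
print(picks_url("https://example.invalid", "100"))
print(similar_url("https://example.invalid", "FamilyWithKids"))
```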
Sample response output
{
"id": "Picks",
"name": "Picks",
"version": "v1.0",
"items": [
{
"id": "68100",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "62500",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "61504",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "65103",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "61401",
"trackingId": "00000000-0000-0000-0000-000000000003"
}
],
"title": "Picks for you",
"longTitle": "Picks for you",
"titleId": 6,
"personalizationConfidence": 1.0,
"pagingInfo": {
"totalItems": 139
},
"status": "Success"
}
Example 3: Query for TagIds with demo data
Assume a User with UserId=100 has indicated that they're aligned with the following tags: 123 (which represents “Soccer fan”), Age30To40Female, and FamilyWithKids.
You can use the original interaction row to construct the following rows in the “Reco_Interactions.csv” file:
- Three new rows of Interaction info, one for each Tag for that User, that should be in the “Reco_Interactions.csv” file that is read by the Intelligent Recommendations service:
[!NOTE]
In this example, we’ve chosen to group by UserId, and have set the InteractionGroupingId equal to the UserId. Also note that the ItemId represents the TagId.
InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
---|---|---|---|---|---|---|---|---|---|---|---|
100 | 123 | | 100 | | | | | | | | |
100 | Age30To40Female | | 100 | | | | | | | | |
100 | FamilyWithKids | | 100 | | | | | | | | |
Query and responses for picks
Here's what the constructed "Picks for you" request looks like:
GET <serving-endpoint>/reco/v1.0/picks?UserId=100
The Picks response returns a list of 200 ItemIds (for tags), including TagId=FamilyWithKids in first position.
{
"id": "Picks",
"name": "Picks",
"version": "v1.0",
"items": [
{
"id": "FamilyWithKids",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "625",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "Sports",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "651",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "611",
"trackingId": "00000000-0000-0000-0000-000000000003"
}
],
"title": "Picks for you",
"longTitle": "Picks for you",
"titleId": 6,
"personalizationConfidence": 1.0,
"pagingInfo": {
"totalItems": 139
},
"status": "Success"
}
Query and response for similar
Here's what the constructed "People also like" request using the Similar API looks like:
GET <serving-endpoint>/Reco/V1.0/Similar/FamilyWithKids?
The "People also like" response returns a list of 200 ItemIds (for tags), including Age30To40Female in first position and FamilyWithKids in second position.
{
"id": "Similar",
"name": "Similar",
"version": "v1.0",
"items": [
{
"id": "Age30To40Female",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "FamilyWithKids",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "SportsParent",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "651",
"trackingId": "00000000-0000-0000-0000-000000000003"
},
{
"id": "123",
"trackingId": "00000000-0000-0000-0000-000000000003"
}
],
"title": "People also like",
"longTitle": "People also like",
"titleId": 6,
"pagingInfo": {
"totalItems": 200
},
"status": "Success"
}
To learn more about our service and the models we support, check out our Modeling Guide.
See Also
Quickstart Guide: Create an IR Account
Modeling Q&A
Data Contract Guide
Sample API Requests