Model Performance Signal (Preview) | Model Monitor - Data Joiner | error: "join column 'correlationid is not present in left_input data."

BOON HAWAREE 40 Reputation points
2025-01-11T22:51:20.1466667+00:00

Greetings,

I am working on my MLOps project and have been trying to use the Azure ML "model performance" monitoring function to detect the model's degradation.
[Image: failed job at data joiner]

The left_input_data is the deployed model's inference output, which was collected as a data asset in the AML blob storage by enabling the data collector, as shown below.
[Image: left_input_data]

The right_input_data is the ground truth data uploaded from my local machine. The format can be seen below.
[Image: right_input_data]

The image below is the configuration of the monitoring signal.
[Image: signal configuration]

After the error ("join column 'correlationid is not present in left_input data.") occurred, I tried to troubleshoot it and found that the outputs of the preprocessing job for the left and right sides were different: the correlationid column was somehow missing on the left side, as shown in the image below (column name 0).
[Image: missing correlation id on the left side]

How can I solve this issue, given that the left_input_data that currently has the problem was collected and recorded automatically by the data collector? I generated it by testing the model's endpoint.

Also, if the error were caused by a wrong format or a wrong signal configuration, wouldn't it have occurred already in the preprocessing job?

PS. The correlationid values in the left and right inputs match, since I made up the right input myself.

Thank you for your time and every effort in helping me,
Boon Hawaree

Azure Machine Learning

Accepted answer
  1. Pinaki Ghatak 5,575 Reputation points Microsoft Employee
    2025-01-13T09:57:26.65+00:00

    Hello @BOON HAWAREE,

    Based on the error message you received, it seems that the join column correlationid is not present in the left_input_data. This could be because the column is missing from the deployed model's inference output data. Since the left_input_data was collected and recorded automatically by the data collector, it might be difficult to modify the data.
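One way to test this hypothesis before changing anything is to count how many collected records actually carry the key. The following is only a minimal sketch: it assumes the collected inference data is stored as JSON Lines with `correlationid` as a top-level field (an assumption, not something confirmed in this thread), and `count_missing_key` is a hypothetical helper, not an Azure ML API.

```python
import json


def count_missing_key(jsonl_text, key="correlationid"):
    """Return (total_records, records_without_key) for JSON Lines text."""
    total = 0
    missing = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        # Treat an absent, null, or empty value as "missing".
        if key not in record or record[key] in (None, ""):
            missing += 1
    return total, missing


# Made-up sample records for illustration, not real data collector output.
sample = "\n".join([
    json.dumps({"correlationid": "abc-1", "data": [{"prediction": 1}]}),
    json.dumps({"data": [{"prediction": 0}]}),
])
print(count_missing_key(sample))
```

If every record has the key, the column is more likely being lost or renamed later in the pipeline than at collection time.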

    However, you can try to add the missing column to the left_input_data by modifying the preprocessing job.

    Regarding your question about whether a wrong format or a wrong signal configuration caused the error: the problem may have originated earlier in the process without being detected until the Data Joiner step.

    It is important to ensure that the data format and signal configuration are correct to avoid errors in the monitoring process.

    I hope this helps you to solve the issue.

    1 person found this answer helpful.

2 additional answers

  1. santoshkc 11,790 Reputation points Microsoft Vendor
    2025-01-13T11:59:44.0233333+00:00

    Hi @BOON HAWAREE,

    Thank you for reaching out to the Microsoft Q&A forum!

    The issue arises because the correlationid column is missing in the preprocessed left input data, while it is present in the right input data. Since the Data Joiner component relies on this column to align and merge the two datasets, its absence causes the failure.

    To resolve this, first, verify the configuration of the data collector used to capture the inference data and ensure it includes the correlationid column. Next, inspect the preprocessing logic applied to the left input data, as there might be a step that inadvertently removes or renames this column. Additionally, review the signal configuration to confirm that it correctly specifies correlationid as the join key. If the preprocessing pipeline or signal configuration is dropping the column, updating these settings should resolve the issue.
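The checks above can be sketched offline on small samples of both inputs. This is an illustrative sketch only: it assumes each side has been loaded into a list of dicts (one dict per row), and `check_join_key` is a hypothetical helper rather than part of the Data Joiner component.

```python
def check_join_key(left_rows, right_rows, key="correlationid"):
    """Report whether `key` exists on each side and how many ids match."""
    report = {}
    for side, rows in (("left", left_rows), ("right", right_rows)):
        # Collect every column name seen on this side.
        columns = set().union(*(row.keys() for row in rows))
        # Flag names that differ from the key only by case or padding.
        near = sorted(
            c for c in columns
            if c != key and c.strip().lower() == key.lower()
        )
        report[side] = {"has_key": key in columns, "near_matches": near}

    if report["left"]["has_key"] and report["right"]["has_key"]:
        left_ids = {row[key] for row in left_rows if key in row}
        right_ids = {row[key] for row in right_rows if key in row}
        report["matching_ids"] = len(left_ids & right_ids)
    else:
        report["matching_ids"] = None  # a join on `key` is not possible
    return report


# Made-up rows: the left side has lost its key column, as in the question.
left = [{"0": "abc-1", "prediction": 1}]
right = [{"correlationid": "abc-1", "ground_truth": 1}]
print(check_join_key(left, right))
```

A `near_matches` entry points to a renamed or mistyped column, while `has_key` being false with no near matches suggests the column was dropped entirely.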

    I hope this helps! Thank you.

    1 person found this answer helpful.

  2. BOON HAWAREE 40 Reputation points
    2025-01-14T10:10:42.68+00:00

    Hi @Pinaki Ghatak , @santoshkc ,

    Thank you for your time and effort in helping me.
    I checked the inference output again and found that every predicted output already includes its correlationid. I then created a new deployment and monitor, and modified the format of the ground truth data so that correlationid is the first column, followed by the ground truth values.
    This time it somehow worked, even though, on the second attempt, I unintentionally used the old format from the question.
    Therefore, I still cannot conclude what exactly the main cause was, but I believe it could have been a typo in the file, since I created the ground truth data by hand.
    I truly appreciate your help.

    Best Regards,
    Boon Hawaree
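If a hand-typed ground truth file is the suspect, a quick header and id check along these lines might catch the typo. This is only a sketch under stated assumptions: it assumes the ground truth is a CSV file with a header row (the thread does not state the file format), and `inspect_ground_truth` is a hypothetical helper, not part of Azure ML.

```python
import csv
import io


def inspect_ground_truth(csv_text, key="correlationid"):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    if key not in header:
        # Look for names that differ only by case, padding, or a leading BOM.
        lookalikes = [
            h for h in header
            if h.lstrip("\ufeff").strip().lower() == key.lower()
        ]
        if lookalikes:
            problems.append(
                f"header {lookalikes[0]!r} is close to {key!r} but not identical"
            )
        else:
            problems.append(f"no {key!r} column in header {header}")
        return problems

    seen = set()
    for row_no, row in enumerate(reader, start=1):  # data rows, header excluded
        value = row[key] or ""
        if not value or value != value.strip():
            problems.append(f"row {row_no}: blank or padded id {value!r}")
        elif value in seen:
            problems.append(f"row {row_no}: duplicate id {value!r}")
        seen.add(value)
    return problems


# Made-up example: the header has a capitalised name and a trailing space.
print(inspect_ground_truth("CorrelationId ,ground_truth\nabc-1,1\n"))
```

An empty list means none of these particular problems were found; it does not prove the file matches what the monitor expects.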

