@장경원
Thank you for your question and for using the Microsoft Q&A platform.
According to the Microsoft documentation, eDiscovery uses a combination of the following three email properties to determine whether a message is a duplicate:
- ConversationTopic
- BodyTagInfo
- InternetMessageId
ConversationTopic identifies the topic of the conversation thread that a message belongs to. A conversation consists of an initial message and all replies to it, and every message in the same conversation has the same ConversationTopic value. That value is typically the subject line of the initial message that started the conversation.
BodyTagInfo is an internal Exchange store property. Its value is calculated by checking various attributes of the message body, and it is used to detect differences between message bodies. However, Microsoft does not publicly document exactly which attributes are used to calculate it.
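To illustrate why replies end up under the same ConversationTopic, here is a rough Python sketch that approximates the topic by stripping leading reply/forward prefixes such as "RE:" or "FW:" from a subject line. The prefix rule and the function name `approximate_conversation_topic` are assumptions for illustration only; Exchange computes the real value itself, and this is not its algorithm.

```python
import re

# Hypothetical approximation: treat any leading run of 1-3 letters followed by
# a colon (e.g. "RE:", "FW:", "Fwd:") as a reply/forward prefix and remove it.
# This is NOT Exchange's actual rule for deriving ConversationTopic.
_PREFIX = re.compile(r"^\s*(?:[A-Za-z]{1,3}:\s*)+")


def approximate_conversation_topic(subject: str) -> str:
    """Return the subject with any leading reply/forward prefixes removed."""
    return _PREFIX.sub("", subject).strip()
```

For example, "RE: Q3 budget" and "FW: RE: Q3 budget" both reduce to "Q3 budget". A genuine subject that starts with a short word and a colon (such as "Ask: ...") would also be trimmed, which is one reason this is only an approximation.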
As for your second question: for non-email content, such as Office documents created or uploaded in the cloud, duplicates are identified by comparing the content of the documents themselves rather than any specific metadata.
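A minimal sketch of content-based grouping, assuming the text has already been extracted from each document: identical text produces the same SHA-256 digest, so documents that share a digest form a duplicate group. The hashing choice and the `group_by_content` helper are hypothetical and are not necessarily how the service compares documents.

```python
import hashlib
from collections import defaultdict


def group_by_content(docs: dict[str, str]) -> list[list[str]]:
    """Group document names whose extracted text is identical.

    Assumption: `docs` maps a document name to text already extracted from it,
    so embedded file metadata (author, timestamps) does not affect the result.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in docs.items():
        # Identical text always yields the same digest.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one member contain duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

For example, given `{"a.docx": "Quarterly report", "b.docx": "Quarterly report", "c.docx": "Draft"}`, the function returns `[["a.docx", "b.docx"]]`.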
Finally, if your search includes multiple custodians and locations, the results are deduplicated across all of them using the combination of the three email properties above. If two messages have the same ConversationTopic, BodyTagInfo, and InternetMessageId, they are treated as duplicates regardless of which custodian or location they came from.
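The cross-custodian behavior can be sketched as follows. Each item is keyed on the three properties only, so the custodian never influences whether two copies are considered duplicates. The `EmailItem` class, its `custodian` field, and keeping the first copy seen are assumptions for illustration; BodyTagInfo is treated as an opaque string because its real format is undocumented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailItem:
    custodian: str             # hypothetical: mailbox/custodian this copy came from
    conversation_topic: str
    body_tag_info: str         # opaque value; the real BodyTagInfo format is undocumented
    internet_message_id: str


def deduplicate(items: list[EmailItem]) -> list[EmailItem]:
    """Keep the first item seen for each (ConversationTopic, BodyTagInfo, InternetMessageId) key.

    The custodian is deliberately excluded from the key, so identical copies
    found in different mailboxes collapse into one result. Keeping the first
    copy is an assumption of this sketch.
    """
    seen: set[tuple[str, str, str]] = set()
    unique: list[EmailItem] = []
    for item in items:
        key = (item.conversation_topic, item.body_tag_info, item.internet_message_id)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

For example, the same message found in both Alice's and Bob's mailboxes yields a single result, while a different message in Bob's mailbox is kept separately.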
I hope this helps! Let us know if you have any further questions.