Form Recognizer mistranslation when processing multiple documents

Srikant Naidu
2025-01-30T04:00:32.3966667+00:00

I am testing a simple logic app that translates multiple files into English. When I run one file through it, the translation quality is very good, but when I run multiple files, the translation is absolutely trash. The test file I am using is a UN paper, so I can compare the quality of the translations: original, good, and trash. This is my logic app code:

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "For_each": {
                "type": "Foreach",
                "foreach": "@body('Lists_blobs_(V2)')?['value']",
                "actions": {
                    "Check_if_PDF": {
                        "type": "If",
                        "expression": {
                            "and": [
                                {
                                    "equals": [
                                        "@items('For_each')?['MediaType']",
                                        "application/pdf"
                                    ]
                                }
                            ]
                        },
                        "actions": {
                            "Start_document_translation": {
                                "type": "ApiConnection",
                                "inputs": {
                                    "host": {
                                        "connection": {
                                            "referenceName": "microsofttranslatorv"
                                        }
                                    },
                                    "method": "post",
                                    "body": {
                                        "storageType": "Folder",
                                        "sourceURL": "https://equablobmain.blob.core.windows.net/test-translate-arabic?si=test-translate&spr=https&sv=2022-11-02&sr=c&sig=%2BrZRDITWB%2BxjCbIp2XezHIR7xWpYajxg7IEWbpjDECU%3D",
                                        "targetContainerURL": "https://equablobmain.blob.core.windows.net/test-translate-en?si=test-translate&spr=https&sv=2022-11-02&sr=c&sig=L9IZV7HhJ4CB27bZWbff0Vk1%2FUc5ZAqVgR38OYoIs2A%3D",
                                        "targetLanguage": "en",
                                        "sourceLanguage": "ar"
                                    },
                                    "headers": {
                                        "Content-Type": "application/json"
                                    },
                                    "path": "/translator/text/batch/v1.1/batches/"
                                },
                                "runAfter": {
                                    "Analyze_Layout": [
                                        "SUCCEEDED"
                                    ]
                                }
                            },
                            "Analyze_Layout": {
                                "type": "ApiConnection",
                                "inputs": {
                                    "host": {
                                        "connection": {
                                            "referenceName": "formrecognizer"
                                        }
                                    },
                                    "method": "post",
                                    "headers": {
                                        "inputFileUrl": "@concat('https://equablobmain.blob.core.windows.net/test-translate-arabic/', items('For_each')?['Name'], '?sp=rl&st=2025-01-25T18:53:32Z&se=2025-01-30T02:53:32Z&spr=https&sv=2022-11-02&sr=c&sig=Xnh5XK1WWmvahq%2BsyrLKFl8BBLLTjbAIhP1v6QIg%2FWE%3D')",
                                        "Content-Type": "application/json"
                                    },
                                    "path": "/v2.1/layout/analyze"
                                }
                            }
                        },
                        "else": {
                            "actions": {
                                "Start_document_translation_1": {
                                    "type": "ApiConnection",
                                    "inputs": {
                                        "host": {
                                            "connection": {
                                                "referenceName": "microsofttranslatorv"
                                            }
                                        },
                                        "method": "post",
                                        "body": {
                                            "storageType": "Folder",
                                            "sourceURL": "https://equablobmain.blob.core.windows.net/test-translate-arabic?si=test-translate&spr=https&sv=2022-11-02&sr=c&sig=%2BrZRDITWB%2BxjCbIp2XezHIR7xWpYajxg7IEWbpjDECU%3D",
                                            "targetContainerURL": "https://equablobmain.blob.core.windows.net/test-translate-en?si=test-translate&spr=https&sv=2022-11-02&sr=c&sig=L9IZV7HhJ4CB27bZWbff0Vk1%2FUc5ZAqVgR38OYoIs2A%3D",
                                            "targetLanguage": "en",
                                            "sourceLanguage": "ar"
                                        },
                                        "headers": {
                                            "Content-Type": "application/json"
                                        },
                                        "path": "/translator/text/batch/v1.1/batches/"
                                    }
                                }
                            }
                        }
                    }
                },
                "runAfter": {
                    "Lists_blobs_(V2)": [
                        "Succeeded"
                    ]
                }
            },
            "Lists_blobs_(V2)": {
                "type": "ApiConnection",
                "inputs": {
                    "host": {
                        "connection": {
                            "referenceName": "azureblob"
                        }
                    },
                    "method": "get",
                    "path": "/v2/datasets/@{encodeURIComponent(encodeURIComponent('AccountNameFromSettings'))}/foldersV2/@{encodeURIComponent(encodeURIComponent('test-translate-arabic'))}"
                },
                "runAfter": {},
                "metadata": {
                    "JTJmdGVzdC10cmFuc2xhdGUtYXJhYmlj": "/test-translate-arabic"
                }
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "triggers": {
            "When_a_blob_is_added_or_updated": {
                "type": "ServiceProvider",
                "inputs": {
                    "parameters": {
                        "path": "https://equablobmain.blob.core.windows.net/test-translate-arabic"
                    },
                    "serviceProviderConfiguration": {
                        "connectionName": "AzureBlob-1",
                        "operationId": "whenABlobIsAddedOrModified",
                        "serviceProviderId": "/serviceProviders/AzureBlob"
                    }
                }
            }
        }
    },
    "kind": "Stateful"
}

Azure Logic Apps
An Azure service that automates the access and use of data across clouds without writing code.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

  Shireesha Eeraboina, Microsoft Vendor
    2025-02-03T02:39:37.2366667+00:00

    Hi @Srikant Naidu ,

    I apologize for any inconvenience caused.

    Based on the information you provided, the translation quality is significantly worse when you process multiple files than when you process a single file. You also mentioned that adding a time delay to the workflow did not resolve the problem.

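    Regarding your question: in this workflow, the Start document translation action translates the original PDF files in the source container, not the output of the Form Recognizer (Analyze Layout) action. Its request body never references Analyze_Layout; the runAfter setting only makes it wait for that action to finish. This is true whether one document or many are processed.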
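    Since the translation works directly from the source container, one way to make sure each PDF is translated exactly once is to submit a single Folder job after the loop instead of one per iteration. The fragment below is a sketch, not a tested workflow: it reuses the microsofttranslatorv connection and the connector body fields from the question, and `<source-SAS>` / `<target-SAS>` are placeholders for the container SAS query strings already in use. It would replace both Start_document_translation actions inside For_each and sit next to For_each in the top-level "actions" object, running once after the loop succeeds.

```json
{
    "Start_document_translation": {
        "type": "ApiConnection",
        "inputs": {
            "host": {
                "connection": {
                    "referenceName": "microsofttranslatorv"
                }
            },
            "method": "post",
            "body": {
                "storageType": "Folder",
                "sourceURL": "https://equablobmain.blob.core.windows.net/test-translate-arabic?<source-SAS>",
                "targetContainerURL": "https://equablobmain.blob.core.windows.net/test-translate-en?<target-SAS>",
                "targetLanguage": "en",
                "sourceLanguage": "ar"
            },
            "headers": {
                "Content-Type": "application/json"
            },
            "path": "/translator/text/batch/v1.1/batches/"
        },
        "runAfter": {
            "For_each": [
                "Succeeded"
            ]
        }
    }
}
```

    Note that the blob trigger still fires once per added blob, so uploading several files at once can still start several workflow runs, each submitting its own Folder job.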

    • Because the translation step never reads the Analyze Layout result, waiting for Form Recognizer to finish analyzing each document does not change what gets translated. That is consistent with the issue persisting after you added an Until loop that waits for the Form Recognizer step to reach status = 'Succeeded'.
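    • To improve translation consistency when processing multiple files, consider restructuring how jobs are submitted so that each document is translated exactly once. As written, Start_document_translation runs inside For_each with storageType set to Folder and a container-level sourceURL, so every iteration submits a job that translates all files in test-translate-arabic into the same test-translate-en container. With N PDFs in the container, N overlapping jobs process the same files, and because the blob trigger fires for each added blob, uploading several files can start several such runs. These overlapping jobs are a plausible contributor to the degraded output; submitting either a single Folder job outside the loop or one job per file avoids them.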
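    • Additionally, review the configuration of your Azure AI Document Intelligence, Azure AI Translator, and Logic Apps resources for your use case. For example, For_each iterations run in parallel by default, so several translation jobs can be submitted at the same time unless you limit the loop's concurrency.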
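    To process files separately instead, each loop iteration can submit a job for just its own blob. The sketch below calls the Document Translation REST API (v1.1 batches endpoint) directly through a built-in HTTP action rather than the managed Translator connector, with storageType set to File so the job covers only one document; for a single-file job, the target URL names the output blob. Everything in angle brackets is a placeholder: `<your-translator-resource>` is your Translator custom-domain endpoint, `<translator-key>` its key, and `<source-SAS>` / `<target-SAS>` are SAS query strings granting read access on the source and write access on the target container. The action name Translate_current_file is hypothetical.

```json
{
    "Translate_current_file": {
        "type": "Http",
        "inputs": {
            "method": "POST",
            "uri": "https://<your-translator-resource>.cognitiveservices.azure.com/translator/text/batch/v1.1/batches",
            "headers": {
                "Ocp-Apim-Subscription-Key": "<translator-key>",
                "Content-Type": "application/json"
            },
            "body": {
                "inputs": [
                    {
                        "storageType": "File",
                        "source": {
                            "sourceUrl": "https://equablobmain.blob.core.windows.net/test-translate-arabic/@{items('For_each')?['Name']}?<source-SAS>",
                            "language": "ar"
                        },
                        "targets": [
                            {
                                "targetUrl": "https://equablobmain.blob.core.windows.net/test-translate-en/@{items('For_each')?['Name']}?<target-SAS>",
                                "language": "en"
                            }
                        ]
                    }
                ]
            }
        },
        "runAfter": {}
    }
}
```

    This action would take the place of Start_document_translation and Start_document_translation_1 inside For_each. The service accepts the job asynchronously (202 Accepted with an Operation-Location header that can be polled for status), and the translated file is written to test-translate-en under the same name.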

    For more information, please refer to the following documentation:

    I hope this information helps! Let me know if you have any further questions.

