How to use the latest model version for Vision OCR on Python?

Danny Zhang 20

When I use the ImageAnalysisClient in my code to analyze an image and return text, the API only seems to use the 2023-10-01 model version. How can I use a 2024 or later version of the Vision OCR?

Here is my current code:

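from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Hypothetical wrapper name; the original snippet omitted the enclosing function.
def analyze_image(img_path, endpoint, key):
    client = ImageAnalysisClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key)
    )
    # Read the image bytes; catch only the missing-file case the message describes.
    try:
        with open(img_path, "rb") as f:
            image_data = f.read()
    except FileNotFoundError:
        print(f"The file '{img_path}' does not exist.")
        return None

    print(f"Analyzing '{img_path}'")
    # Request OCR (READ) and ask for the latest model version.
    result = client.analyze(
        image_data=image_data,
        visual_features=[VisualFeatures.READ],
        model_version='latest'
    )
    return result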

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-02T03:47:37.0166667+00:00

@Danny Zhang Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

By default, the service uses the latest generally available (GA) model to extract text. To explicitly specify the latest GA model, edit the read statement as shown. Skipping the parameter or using "latest" automatically uses the most recent GA model.
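One way to confirm which model the service actually used is to read it back from the result. The sketch below assumes the result object returned by `client.analyze` exposes `model_version` and `read.blocks[].lines[].text`. The helper name `summarize_read_result` is hypothetical, and a stand-in object with made-up values replaces a real service call so the snippet runs offline.

```python
from types import SimpleNamespace


def summarize_read_result(result):
    """Return (model_version, [line texts]) from an analyze() result.

    Hypothetical helper: it only assumes the attributes `model_version`
    and `read.blocks[].lines[].text` on the result object.
    """
    lines = []
    read = getattr(result, "read", None)
    if read is not None:
        for block in read.blocks:
            for line in block.lines:
                lines.append(line.text)
    return result.model_version, lines


# Stand-in for a real result so the sketch runs offline (values are made up).
fake_result = SimpleNamespace(
    model_version="2023-10-01",
    read=SimpleNamespace(
        blocks=[
            SimpleNamespace(
                lines=[SimpleNamespace(text="Hello"), SimpleNamespace(text="world")]
            )
        ]
    ),
)

version, text_lines = summarize_read_result(fake_result)
print(f"Model version: {version}")
print("\n".join(text_lines))
```

With a real client, the same helper would be called on the value returned by `client.analyze(...)`.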

More info here.

Python Sample code for OCR is available here:

https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/quickstarts-sdk/client-library?tabs=windows%2Cvisual-studio&pivots=programming-language-python

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-04T03:49:33.5366667+00:00

@Danny Zhang Just following up to check if my suggestion helped. Please let me know if you have any further queries. I would be happy to help.

navba-MSFT 24,910 Microsoft Employee

@Danny Zhang Thanks for getting back and clarifying your ask. The api-version 2024-02-01 is supported. However, the SDK has not yet been updated to use this new version and still uses the older version, 2023-10-01.

So, if you want to use the 2024-02-01 api-version, try invoking it directly from Python via the REST API, as shown below:

import requests

# Define the endpoint and parameters
url = "https://XXXX.cognitiveservices.azure.com/computervision/imageanalysis:analyze"
params = {
    'features': 'caption,read',
    'api-version': '2024-02-01',
    'gender-neutral-caption': 'true'
}

# Define the headers
headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '337e4XXXXXXX4633cd7f'
}

# Define the request body
data = {
    'url': 'https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png'
}

# Make the POST request
response = requests.post(url, headers=headers, params=params, json=data)

# Print the response
print(response.status_code)
print(response.json())
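The snippet above sends an image URL, whereas the code in the question reads a local file. A minimal sketch of the same call with raw image bytes follows. It assumes the endpoint accepts a binary body with `Content-Type: application/octet-stream`, uses only the standard library, and the helper names `build_analyze_request` and `analyze_local_image` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_bytes, api_version="2024-02-01"):
    """Build (but do not send) a POST request carrying raw image bytes."""
    query = urllib.parse.urlencode({"features": "read", "api-version": api_version})
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    headers = {
        # Assumption: the service accepts a binary body with this content type.
        "Content-Type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def analyze_local_image(endpoint, key, img_path):
    """Read a local image, send it to the service, and return the parsed JSON."""
    with open(img_path, "rb") as f:
        request = build_analyze_request(endpoint, key, f.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `analyze_local_image("https://XXXX.cognitiveservices.azure.com", key, img_path)` with a real endpoint and key would perform the request; `build_analyze_request` is kept separate so the URL and headers can be inspected without a network call.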

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-06T05:20:58.03+00:00

@Danny Zhang Just following up to check whether the answer below helped. If it answers your query, please click "Accept the answer", which might be beneficial to other community members reading this thread. If you have any further queries, do let me know. I would be happy to help.
Danny Zhang 20 Reputation points

2024-09-06T12:17:57.6966667+00:00

Apologies for the late follow-up. I tested the code you provided, and while it does use the 2024-02-01 API version, the model version in the response is still 2023-10-01.

On this page: https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/quickstarts-sdk/image-analysis-client-library-40?tabs=visual-studio%2Clinux&pivots=programming-language-rest-api

The return from the JSON response shows the model version: 2024-02-01
navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-09T04:12:02.7666667+00:00

@Danny Zhang I am checking this further internally. I will get back once I have an update on this.
navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-10T04:07:25.7766667+00:00

@Danny Zhang I reached out to the product owners regarding this and got an update from them.

The API version and model version are different concepts. A new API version indicates changes to the API since the previous version. This could involve the addition of new APIs or updates to specific features, such as model improvements (model version changes). However, it doesn't imply that all models have been updated. In this instance, we introduced new vectorization APIs with the latest API version.

Conclusion: This is expected behavior. So, you can go ahead with using the api-version: 2024-02-01 and you will get all the new APIs and updates to this feature.
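To make the distinction concrete, the sketch below compares the api-version sent in the request with the `modelVersion` field returned in the JSON response. The helper name `compare_versions` is hypothetical, and the sample payload is a made-up, trimmed stand-in for a real response.

```python
def compare_versions(requested_api_version, response_json):
    """Report the API version that was requested and the model version returned.

    Hypothetical helper: it only assumes the response JSON carries a
    top-level "modelVersion" field.
    """
    model_version = response_json.get("modelVersion")
    return {
        "api_version": requested_api_version,
        "model_version": model_version,
        "same": requested_api_version == model_version,
    }


# Made-up, trimmed response used only to illustrate the two fields.
sample_response = {
    "modelVersion": "2023-10-01",
    "metadata": {"width": 1038, "height": 692},
}

info = compare_versions("2024-02-01", sample_response)
print(f"api-version requested: {info['api_version']}")
print(f"model version returned: {info['model_version']}")
print(f"identical: {info['same']}")
```

A mismatch between the two values is not an error in itself; it only reflects that the API surface and the underlying model are versioned separately.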

Hope this answers.
Danny Zhang 20 Reputation points

2024-09-10T11:07:46.75+00:00

Thank you for the update; that makes sense. If I am already using the latest api-version and the latest model-version, why do I get different results from using the API and the Vision Studio OCR:
https://portal.vision.cognitive.azure.com/demo/extract-text-from-images

It seems as if the Vision Studio OCR has better performance, and I assumed it was on a newer version.

Accepted answer

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-11T04:50:59.64+00:00

@Danny Zhang I had a discussion with the Product Owners internally. The below is a documentation bug and they will fix it.

The latest model version is 2023-10-01.

Also note that the api-version and model-version used by Vision Studio are both 2023-10-01.

If you want to use api-version 2024-02-01, you can go ahead and do so. But be aware that the model-version will still be shown as 2023-10-01. This is by design and expected.

Hope this answers.

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-09-12T04:12:32.9533333+00:00

@Danny Zhang A quick follow-up to check if you had a chance to look at my above comment.

Danny Zhang 20 Reputation points

2024-09-12T16:58:08.58+00:00

Thank you for all the help!