Azure OpenAI Assistant Rate Limit

Question

Hello,

I'm using the Azure OpenAI Assistant in a Python Flask app. I have implemented the basics and everything runs. However, I keep hitting the rate limit and can't hold a proper conversation. This mostly happens when function calling is triggered and one or two functions are executed.

I created my gpt-4o deployment with the Global Standard deployment type and 30K TPM (tokens per minute), so I don't understand why I'm hitting the rate limit.
I'm using API version 2024-05-01-preview.
My resource is in Sweden Central.
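To put a 30K TPM quota in perspective, here is a rough back-of-the-envelope sketch. It assumes the ratio of 6 RPM (requests per minute) per 1,000 TPM described in Azure's quota documentation, and it assumes that limits are enforced over short windows (10 seconds is used here as an illustrative assumption) rather than over a full minute. Both figures should be checked against the current Azure documentation.

```python
# Rough quota arithmetic for a 30K TPM deployment.
# ASSUMPTIONS: 6 RPM per 1,000 TPM, and enforcement over 10-second windows.
TPM = 30_000
RPM_PER_1000_TPM = 6  # assumed ratio from Azure's quota documentation
WINDOW_SECONDS = 10   # assumed enforcement window, for illustration only

rpm = TPM // 1000 * RPM_PER_1000_TPM                 # requests per minute
requests_per_window = rpm * WINDOW_SECONDS // 60     # requests per window
tokens_per_window = TPM * WINDOW_SECONDS // 60       # tokens per window

print(rpm, requests_per_window, tokens_per_window)   # 180 30 5000
```

Under these assumptions, a burst of more than about 30 requests or 5,000 tokens within 10 seconds could already be rejected, even though the per-minute totals look comfortable.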

I'm actually surprised, because in my view the number of tokens used isn't particularly high either:

usage=Usage(completion_tokens=40, prompt_tokens=1267, total_tokens=1307)
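
The usage figure above describes a single model call. With function calling, a run typically involves more than one model call (one that decides to call a tool, and another after the tool outputs are submitted), and each call sends the thread history as prompt tokens again. The function below is a hypothetical estimate under that assumption; `estimate_run_tokens`, the number of rounds and the per-round growth of 200 tokens are illustrative values, not measurements.

```python
def estimate_run_tokens(prompt_tokens: int, completion_tokens: int, rounds: int,
                        growth_per_round: int = 0) -> int:
    """Hypothetical estimate of the total tokens consumed by one run.

    Assumes every round re-sends the prompt, which grows by
    `growth_per_round` tokens per round (e.g. tool outputs), and
    produces `completion_tokens` tokens of output.
    """
    total = 0
    for i in range(rounds):
        total += prompt_tokens + i * growth_per_round + completion_tokens
    return total


# Illustrative values: the usage shown above, three model calls per run.
per_run = estimate_run_tokens(1267, 40, rounds=3, growth_per_round=200)
print(per_run)            # 4521
print(30_000 // per_run)  # 6 runs per minute fit into 30K TPM
```

Under these assumptions, a single user turn with tool calls costs several times the 1,307 tokens of one call.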

Does anyone have an idea what could be causing this?

Thanks a lot for your help!

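            # Polling parameters: poll once per second, at most 20 times.
            # Assumes module-level `import json`, `import time` and a configured `logger`.
            wait_time = 1
            max_retries = 20
            retries = 0

            while retries < max_retries:
                # Fetch the current state of the run (each call is a separate API request)
                run = self.openai_client.beta.threads.runs.retrieve(
                    thread_id=session.thread_id, run_id=run.id
                )
                retries += 1

                if run.status == "requires_action":
                    # The assistant wants one or more functions to be executed
                    tool_outputs = []
                    if (
                        run.required_action.type == "submit_tool_outputs"
                        and run.required_action.submit_tool_outputs.tool_calls
                        is not None
                    ):
                        tool_calls = run.required_action.submit_tool_outputs.tool_calls
                        for tool_call in tool_calls:
                            function_output = self.execute_function(
                                function_name=tool_call.function.name,
                                arguments=json.loads(tool_call.function.arguments),
                                session=session,
                            )
                            # The API expects a string; serialize other results as JSON
                            if not isinstance(function_output, str):
                                function_output = json.dumps(function_output)
                            tool_outputs.append(
                                {
                                    "tool_call_id": tool_call.id,
                                    "output": function_output,
                                }
                            )

                    # Send the results back; the run then continues with another model call
                    run = self.openai_client.beta.threads.runs.submit_tool_outputs(
                        thread_id=session.thread_id,
                        run_id=run.id,
                        tool_outputs=tool_outputs,
                    )

                if run.status == "completed":
                    # Messages are listed newest first, so the first assistant
                    # text block is the latest answer
                    messages = self.openai_client.beta.threads.messages.list(
                        thread_id=session.thread_id
                    )
                    for msg in list(messages):
                        if msg.role == "assistant":
                            for content_block in msg.content:
                                if content_block.type == "text":
                                    return content_block.text.value
                    return None

                if run.status in ("failed", "cancelled", "expired"):
                    # Terminal state: stop polling instead of sending further requests.
                    # last_error holds the reason (e.g. a rate-limit error code).
                    logger.error(
                        f"Run {run.id} ended with status {run.status}: {run.last_error}"
                    )
                    return None

                time.sleep(wait_time)

            logger.error(f"Run {run.id} did not finish after {max_retries} polls")
            return None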
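
One way to reduce the request rate while waiting for a run is to poll with exponential backoff instead of a fixed one-second interval, and to stop as soon as the run reaches a terminal state. The sketch below is a generic helper, not part of the OpenAI SDK: `poll_with_backoff` and `fetch_status` are hypothetical names (`fetch_status` could, for example, wrap `runs.retrieve` and return `run.status`), and the delays are illustrative assumptions.

```python
import time
from typing import Callable, Optional

# Run statuses after which polling can stop.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def poll_with_backoff(
    fetch_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 16.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a terminal status or "requires_action" is seen.

    The wait doubles after every poll, capped at `max_delay`.
    Returns the last status seen, or None if all attempts are used up.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        status = fetch_status()
        # "requires_action" is not terminal, but the caller must submit
        # tool outputs before polling again.
        if status in TERMINAL_STATUSES or status == "requires_action":
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return None


# Usage with a fake status sequence; waits are recorded instead of slept.
statuses = iter(["queued", "in_progress", "in_progress", "completed"])
waits = []
result = poll_with_backoff(lambda: next(statuses), sleep=waits.append)
print(result, waits)  # completed [1.0, 2.0, 4.0]
```

Injecting `sleep` keeps the helper testable without real delays; in the Flask app, the default `time.sleep` would be used.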
