Deployed gpt-5.2 to Foundry: running into rate limit issues or very slow responses

Siry, Gaetan
2025-12-12T14:54:06.61+00:00

Hello,

I deployed gpt-5.2 to Azure.

My rate limit is set fairly high, and I am the only user testing this deployment right now, but I am getting the following error:

{"type":"response.failed","sequence_number":3,"response":{"id":"resp_077206c4b441379001693c2b62d0308196afcbb0353a0c03ce","object":"response","created_at":1765550946,"status":"failed","background":false,"content_filters":null,"error":{"code":"rate_limit_exceeded","message":" | ==================== d001-20251211012732-api-default-78bd44c5dc-9w645 ====================\n | Traceback (most recent call last):\n | \n | File "/usr/local/lib/python3.12/site-packages/inference_server/routes.py", line 726, in streaming_completion\n | await response.write_to(reactor)\n | \n | oai_grpc.errors.ServerError: | no_kv_space\n | "},

The same deployment was working last night. Today, responses are either very slow or fail with rate_limit_exceeded, even on a simple prompt like "Hello".

Is it too early to use gpt-5.2 in Microsoft Foundry?
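While the cause is being investigated, a client can at least detect this failure and retry instead of failing outright. The sketch below is a minimal, hypothetical helper, not part of any Azure or OpenAI SDK: `is_rate_limited` parses a streamed event shaped like the `response.failed` event above and checks whether `response.error.code` is `rate_limit_exceeded`, and `call_with_backoff` retries a caller-supplied request function with capped exponential backoff and full jitter. The names `RateLimited`, `is_rate_limited`, `call_with_backoff`, and the default retry parameters are assumptions chosen for illustration.

```python
import json
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception the caller raises when a rate-limit failure is seen."""


def is_rate_limited(event_json: str) -> bool:
    """Return True if a streamed event is a failed response with code rate_limit_exceeded."""
    event = json.loads(event_json)
    if event.get("type") != "response.failed":
        return False
    # "response" or "error" may be missing or null, so fall back to empty dicts.
    error = (event.get("response") or {}).get("error") or {}
    return error.get("code") == "rate_limit_exceeded"


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimited with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Usage with a stand-in request that fails twice, then succeeds.
failed_event = '{"type":"response.failed","response":{"error":{"code":"rate_limit_exceeded"}}}'
attempts = {"n": 0}


def fake_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3 and is_rate_limited(failed_event):
        raise RateLimited()
    return "ok"


print(call_with_backoff(fake_request, sleep=lambda s: None))  # prints "ok"
```

The jitter spreads retries out so that several clients hitting the same limit do not all retry at the same moment. If the failures persist even with almost no traffic of your own, as described above, retries only soften the symptom.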

Azure AI services

1 answer

harmeet singh
2025-12-12T19:21:25.08+00:00

Same issue here.

