Semantic Kernel plugin that returns 1,000 entities results in very high token usage
Hello,
I am playing with Semantic Kernel to connect my API data source to the gpt-4o model. The service I inject has the following code:
```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class MyPlugin
{
    // Cache the records for the lifetime of the plugin instance,
    // since the data doesn't change often.
    private ICollection<MyApiRecord>? records;
    private readonly WebApiClientFactory apiClientFactory = new();

    [KernelFunction("get_records")]
    [Description("Lists all records and their metadata")]
    [return: Description("An array of the Api Records")]
    public async Task<IEnumerable<MyApiRecord>> GetRecordsAsync()
    {
        try
        {
            if (records == null)
            {
                var apiClient = new ApiClient(apiClientFactory.GetHttpClient());
                records = await apiClient.GetRecordsAsync();
            }
            return records;
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
            // Fall back to whatever is cached; never return null.
            return records ?? Enumerable.Empty<MyApiRecord>();
        }
    }

    [KernelFunction("get_record")]
    [Description("Get record details")]
    [return: Description("The details of the record")]
    public async Task<MyApiRecord?> GetRecordAsync(
        [Description("The id of the record")] int id)
    {
        var records = await GetRecordsAsync();
        // MyApiRecord.Id is a string, so compare against the stringified id.
        return records.FirstOrDefault(c => c.Id == id.ToString());
    }
}
```
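For context, the kernel wiring looks roughly like this (a minimal sketch: the deployment name, endpoint, and key are placeholders, and I'm assuming a recent Semantic Kernel version where `FunctionChoiceBehavior.Auto()` is available):

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var builder = Kernel.CreateBuilder();

// Placeholder deployment/endpoint/key values.
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: "https://my-resource.openai.azure.com/",
    apiKey: "<api-key>");

// Register the plugin so the model can call get_records / get_record.
builder.Plugins.AddFromType<MyPlugin>("MyPlugin");

var kernel = builder.Build();

// Let the model invoke kernel functions automatically; every function
// result is appended to the chat history and resent on the next call.
var settings = new OpenAIPromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

var result = await kernel.InvokePromptAsync(
    "Please give me a description of record 'And Justice for All'",
    new KernelArguments(settings));
Console.WriteLine(result);
```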
But when I ask something like "Please give me a description of record 'And Justice for All'", I get the following 429 rate-limit response:
```
Error: HTTP 429 (429)
Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-10-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 58 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
```
So I get the impression that far too many tokens are used with every request: the full list of records (1,000 records with 10 fields each) appears to be serialized into the chat history and sent back to the model on every call.
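As a rough back-of-envelope estimate (assuming something like 5 tokens per field value plus JSON overhead): 1,000 records × 10 fields × ~5 tokens is on the order of 50,000 tokens per tool result, which would exhaust a typical S0 per-minute token quota in a single request.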
What would be a better way to work around this, given that the data doesn't change very often?
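In case it helps frame answers: the direction I've been considering is to stop exposing the full `GetRecordsAsync` as a kernel function and instead expose a slim listing, keeping the full details behind `get_record`. A rough sketch of a method inside `MyPlugin`, where `RecordSummary` is a hypothetical DTO and I'm assuming `MyApiRecord` has a `Name` field:

```csharp
// Hypothetical slim DTO: just enough for the model to pick a record.
public record RecordSummary(string Id, string Name);

[KernelFunction("get_records")]
[Description("Lists record ids and names only")]
[return: Description("An array of record summaries (id and name)")]
public async Task<IEnumerable<RecordSummary>> GetRecordSummariesAsync()
{
    var all = await GetRecordsAsync();
    // Project 10 fields down to 2 so far fewer tokens land in the chat
    // history; the model can follow up with get_record for full details.
    return all.Select(r => new RecordSummary(r.Id, r.Name));
}
```

But I'm not sure whether this is the idiomatic approach, hence the question.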