EF Core - High Memory Usage For Query that Runs for 20 seconds

muttBunch 100 Reputation points
2024-06-29T05:22:38.7933333+00:00

I'm wondering if there is a better way to optimize the below query. Granted, it's pulling in 80,000 - 160,000 records, but the RAM usage on my web server, shoots up to ~3gig while it's running for a total of 20 seconds, when testing it in my API, but, this will eventually be running as a service and the 3gig has me puzzled.

The LogStore table, has 1.7 million records in it and the LogStoreLogons table is where it will store the last 24 hours worth of logs from LogStore.

private async Task PullEventsLogonsTest(CancellationToken token)
{
   
    var fromLogStore = _applicationDbContext.LogStore.Where(x => x.TimeStamp < DateTime.Now && x.TimeStamp >
        DateTime.Now.AddDays(-1) && x.EventId == 4624).AsEnumerable();

    foreach (var logs in fromLogStore)
    {
        var logStore = new LogStoreLogons
        {
            DomainId = logs.DomainId,
            Domain = logs.Domain,
            Keywords = logs.Keywords,
            EventId = logs.EventId,
            TimeStamp = logs.TimeStamp,
            Description = logs.Description,
            TaskCategory = logs.TaskCategory,
            DomainControllerHostname = logs.DomainControllerHostname
        };
        await _applicationDbContext.AddAsync(logStore, token);
    }
   
    await _applicationDbContext.SaveChangesAsync(token);
    GC.Collect();
}

Thanks

Entity Framework Core
Entity Framework Core
A lightweight, extensible, open-source, and cross-platform version of the Entity Framework data access technology.
751 questions
C#
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
11,012 questions
0 comments No comments
{count} votes

3 answers

Sort by: Most helpful
  1. AgaveJoe 28,536 Reputation points
    2024-06-29T11:32:30.5433333+00:00

    You are using the wrong tooling for the job. The design is copying a lot of data from the database to wherever the C# code is running then copying the same data back to the same database.

    A better approach is to write a SELECT INTO T-SQL script and submit the script to SQL Server. That way the data stays in the database and does not travel over the wire twice. I would wrap the SELECT INTO inside a parameter stored procedure and call the stored procedure from C#.

    If this is a schedule task then don't use C# at all. Schedule a job in SQL Server's agent to call the stored procedure from the job.

    Lastly, if the end result is two copies of the same data in the same database then consider changing the design to simply reading data from the LogStore when the data is needed.

    0 comments No comments

  2. Hongrui Yu-MSFT 2,465 Reputation points Microsoft Vendor
    2024-07-01T06:51:08.61+00:00

    Hi,@muttBunch. Welcome to Microsoft Q&A. 

    Reason:

    The execution process of .AsEnumerable() is: all the data in the database is transferred to your RAM through network communication(Long transmission time), and then filtered in your RAM(Large memory usage), and the results are returned to fromLogStore after the filtering is completed.

     

    Temporary solution: Use .AsQueryable() instead of .AsEnumerable()

    .AsQueryable() first filters in the database, then passes it to RAM through network communication, and finally returns it to fromLogStore

     

    Optimal solution: As described by AgaveJoe, create a stored procedure in the database and then call it through C# or the database agent.

    Benefits: Greatly reduces network overhead and server memory usage

     

    Attach: TimeStamp is often used, so you can consider adding an index to improve query efficiency.


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  3. Bruce (SqlWork.com) 66,706 Reputation points
    2024-07-01T16:52:22.3433333+00:00

    you want to use stream input, and stream output. currently you are buffering the input, and all the output. try:

    private async Task PullEventsLogonsTest(CancellationToken token)
    {
        const int batchsize = 10000;
        var count = 0;
        var fromLogStore = _applicationDbContext.LogStore
            .Where(x => x.TimeStamp < DateTime.Now 
                && x.TimeStamp > DateTime.Now.AddDays(-1) && x.EventId == 4624)
            .AsNoTracking();
    
        foreach (var logs in fromLogStore)
        {
            var logStore = new LogStoreLogons
            {
                DomainId = logs.DomainId,
                Domain = logs.Domain,
                Keywords = logs.Keywords,
                EventId = logs.EventId,
                TimeStamp = logs.TimeStamp,
                Description = logs.Description,
                TaskCategory = logs.TaskCategory,
                DomainControllerHostname = logs.DomainControllerHostname
            };
            await _applicationDbContext.AddAsync(logStore, token);
            if (++count % batchsize == 0) 
            {
                await _applicationDbContext.SaveChangesAsync(token);
                _applicationDbContext.ChangeTracker.Clear();
            }
        }
       
        await _applicationDbContext.SaveChangesAsync(token);
    }
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.