Debugging a performance issue in production

One of the projects I’m working on has a component that has a very simple task: reads a record from a database table and based on it, send a message to Microsoft Windows Service Bus. Then the next record is read, and so on, until no more rows are found in the table.

Somebody noticed in the log files, in production, that the application runs very slowly – one message is being send in 5 or 6 seconds. Even if Service Bus is not the brightest piece of software, it has no reason to be so slow.
Time to check what’s going on.

Since the the app is in production, we are not allowed to just attach a debugger to it. Also when running it locally for just a few minutes, the issue does not reproduce. What can be done?

[ to protect the innocent, all examples below are ‘anonymized’ ]

I asked the support team to take a memory dump from the running application, using the Task Manager:
Task Manager Memory Dump
– obviously, my process was not Chrome 🙂

I tried to analyze the .DMP file with Visual Studio 2013 Ultimate – no luck – I got a ‘memory analysis could not be completed due to insufficient memory‘ because the DMP file has over 780 MB, which seem too much for my 11 GB of free memory. 🙂

Let’s try the big guns – WinDbg (https://msdn.microsoft.com/library/windows/hardware/ff551063(v=vs.85).aspx – now can be downloaded as part of Windows SDK ).

After opening the 32-bit version of WinDbg (the application was compiled with AnyCPU and ‘Prefer 32 bit’) we have to set-it-up for .NET (CLR) applications:
– File / Symbol File Path: SRV*;c:\symbols*;http://msdl.microsoft.com/download/symbols
– File / Open Crash Dump
.loadby sos clr in the WinDbg command prompt in order to load the SOS debugging extension (http://blogs.msdn.com/b/jasonz/archive/2003/10/21/53581.aspx) ; the ‘clr’ parameter is there because we are on .NET >= 4.0

Next, let’s analyze a bit the heap:


0:000:x86> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
73309888 1 12 System.ServiceModel.Diagnostics.Utility
69725164 1 12 System.Collections.Generic.ObjectEqualityComparer`1[[System.Linq.Expressions.LabelTarget, System.Core]]
. . .
. . .
02cd0690   421577     16863080 System.Data.Entity.Core.EntityKey
05373330   421575     18549300 System.Data.Entity.Core.Objects.Internal.EntityWrapperWithoutRelationships`1[[SampleDomain.NotificationChange, Sample.Domain]]
0537e2ac   421575     23608200 System.Data.Entity.Core.Objects.EntityEntry
667ac484  2105915     25270980 System.Int32
667a8e58  1257983     30191592 System.Guid
667a6f1c  2544457     30533484 System.Boolean
667aacc4   440021     32087956 System.String
009fe608   421575     67452000 Sample.Domain.NotificationChange
6675ab9c    26544     90426208 System.Object[]
05553478   421581    166945692 System.Data.Entity.Core.Objects.StateManagerValue[]
Total 12892240 objects

after a huge list of entries, we finally see something interesting – over 400 thousand instances of our class, NotificationChange.
This roughly matches the number of rows from the table processed so far.

This might hint where is the problem, but to be sure, we have to dig deeper:


!dumpheap -mt 009fe608
. . .
. . .
24254ef0 009fe608      160     
24257608 009fe608      160     
2425a95c 009fe608      160     
2425e64c 009fe608      160     
2426187c 009fe608      160     
242643ec 009fe608      160     
24268054 009fe608      160     

Statistics:
      MT    Count    TotalSize Class Name
009fe608   421575     67452000 Sample.Domain.NotificationChange
Total 421575 objects

yes, it will list the addresses of all 421.575 objects, but I found no way to get the address for just one of them. 🙂

Now we have to find out why the GC has not released yet these objects:


!gcroot 242643ec
HandleTable:
    00000000001611b4 (strong handle)
    -> 0000000000d0378c System.Object[]
    -> 0000000000ca3524 Sample.EF.MyStorage
    -> 0000000000cde438 System.Data.Entity.Internal.LazyInternalContext
    -> 0000000000ce3c08 System.Collections.Generic.Dictionary`2[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]]
    -> 0000000000d01314 System.Collections.Generic.Dictionary`2+Entry[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]][]
    -> 0000000000d00bc4 System.Data.Entity.DbSet`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 0000000000d00ba0 System.Data.Entity.Internal.Linq.InternalSet`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 000000000123497c System.Data.Entity.Core.Objects.ObjectQuery`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 00000000012349b8 System.Data.Entity.Core.Objects.EntitySqlQueryState
    -> 000000000121613c System.Data.Entity.Core.Objects.ObjectContext
    -> 0000000001328b08 System.Data.Entity.Core.Objects.ObjectStateManager
    -> 0000000001530a78 System.Collections.Generic.Dictionary`2[[System.Data.Entity.Core.EntityKey, EntityFramework],[System.Data.Entity.Core.Objects.EntityEntry, EntityFramework]]
    -> 0000000025221000 System.Collections.Generic.Dictionary`2+Entry[[System.Data.Entity.Core.EntityKey, EntityFramework],[System.Data.Entity.Core.Objects.EntityEntry, EntityFramework]][]
    -> 0000000024264578 System.Data.Entity.Core.Objects.EntityEntry
    -> 00000000242644fc System.Data.Entity.Core.Objects.Internal.EntityWrapperWithoutRelationships`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 00000000242643ec Sample.Domain.NotificationChange

Found 1 unique roots (run '!GCRoot -all' to see all roots).

This points to the culprit – the Entity Framework DbContext which indirectly holds a reference to each object loaded so far from the database. 🙂

Looking closer at the source code, it’s doing something like this (in pseudo code):

  1. start application
  2. create DbContext
  3. while (there are rows in the table)
    1. Using the above DbContext: Read next row from database and load it in a NotificationChange object
    2. Send a message to Service Bus
    3. Mark the row in the database as processed
    4. Repeat..

What could go wrong?

Well, nothing for a few rows, except that Entity Framework can’t read my thoughts and won’t guess that, after loading and updating one row, I won’t need it anymore.
It will keep it in the first-level cache (identity map) and will loop through all 400.000 objects each time a new row is loaded from database (maybe it was already loaded 🙂 ).
More on this: https://weblog.west-wind.com/posts/2014/Dec/21/Gotcha-Entity-Framework-gets-slow-in-long-Iteration-Loops

The fix was simple – re-create the DbContext inside the loop – in our case there is no reason for the unit-of-work (DbContext) to span more than one row (https://lostechies.com/jimmybogard/2013/12/20/proper-sessiondbcontext-lifecycle-management/ ).

After doing that – miracle – the average processing time per messages decreased from 5-6 seconds to 0,02 sec.

Advertisements
This entry was posted in .NET, Entity Framework and tagged , , , , . Bookmark the permalink.

4 Responses to Debugging a performance issue in production

  1. One more reason to avoid ORMs… Some of my acquaintances and friends know how much I love ORMs regardless of this.

    • Tudor says:

      This is not a drawback of ORMs, but of the developer not taking time to learn the basics on how to use an ORM properly.

      • I’m fully aware of this, I’ve worked with several ORMs for many years. The problem is that they can easily lead you into bad practices, such as the one in the described case.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s