When the length of a database field can be misleading

The length of a database field, at least on MS SQL Server, is not what many people think it is.

Let’s say that I have a 10-character field, nvarchar(10):
[screenshot: table structure]

Surely we can insert a row with 5 chars in this field:
[screenshot: insert 5 chars]
and the result is the expected one:
[screenshot: successful run]

Let’s try to insert a 6th character, it should work fine, right?
[screenshot: adding 6 chars]

Well, not really :)

[screenshot: run failure]

What happened?

By default, SQL Server is using UCS-2 encoding for nvarchar columns.
UCS-2 represents each character on 16 bits (2 bytes) – 65,536 chars should be enough for everybody, right? :)
Well, not exactly :) Since 2001, many more characters have been added to the Unicode standard, reaching a total of 120,737 chars today (2015, Unicode 8.0). These clearly can’t all be represented on only 2 bytes; in UTF-16, the characters that don’t fit take 4 bytes (a pair of 2-byte code units, called a surrogate pair).

In our case, A, B, C, D… are not the letters from the Latin alphabet, but… ‘MATHEMATICAL BOLD CAPITAL A, B, C…’: http://unicode.org/cldr/utility/character.jsp?a=1D400

In UTF-16, this is represented on 4 bytes as: 0xD835 0xDC00 (hex).
MS SQL Server will happily accept it, but by default will count it as 2 chars. The same happens in the .NET Framework, which returns a length of 12 for the above string:
[screenshot: get the string length]

[screenshot: length in .NET]
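
For anyone who wants to reproduce the .NET side without the screenshots, here is a minimal sketch. The class and member names are mine, chosen for illustration; string.Length and StringInfo.LengthInTextElements are the standard framework members. It builds the six bold letters from their code points and compares the number of UTF-16 code units with the number of text elements.

```csharp
using System;
using System.Globalization;

public static class TextLengthDemo
{
    // Six MATHEMATICAL BOLD CAPITAL letters, A to F (U+1D400 to U+1D405).
    public const string Bold =
        "\U0001D400\U0001D401\U0001D402\U0001D403\U0001D404\U0001D405";

    // Number of UTF-16 code units, which is what string.Length reports.
    public static int CodeUnits(string s) => s.Length;

    // Number of text elements (what a reader would call characters).
    public static int TextElements(string s) => new StringInfo(s).LengthInTextElements;

    public static void Main()
    {
        Console.WriteLine(CodeUnits(Bold));    // 12
        Console.WriteLine(TextElements(Bold)); // 6
    }
}
```

The 12 is the number that matters for the nvarchar(10) column above: the limit is counted in the same 2-byte units, so only five of these letters fit.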

Posted in .NET, SQL Server

Debugging a performance issue in production

One of the projects I’m working on has a component with a very simple task: it reads a record from a database table and, based on it, sends a message to Microsoft Windows Service Bus. Then the next record is read, and so on, until no more rows are found in the table.

Somebody noticed in the production log files that the application runs very slowly – one message is sent every 5 or 6 seconds. Even if Service Bus is not the brightest piece of software, it has no reason to be so slow.
Time to check what’s going on.

Since the app is in production, we are not allowed to just attach a debugger to it. Also, when running it locally for just a few minutes, the issue does not reproduce. What can be done?

[ to protect the innocent, all examples below are ‘anonymized’ ]

I asked the support team to take a memory dump from the running application, using the Task Manager:
[screenshot: Task Manager memory dump]
– obviously, my process was not Chrome :)

I tried to analyze the .DMP file with Visual Studio 2013 Ultimate – no luck – I got a ‘memory analysis could not be completed due to insufficient memory‘ because the DMP file is over 780 MB, which seems to be too much for my 11 GB of free memory. :)

Let’s try the big guns – WinDbg (https://msdn.microsoft.com/library/windows/hardware/ff551063(v=vs.85).aspx – it can now be downloaded as part of the Windows SDK).

After opening the 32-bit version of WinDbg (the application was compiled with AnyCPU and ‘Prefer 32 bit’) we have to set it up for .NET (CLR) applications:
– File / Symbol File Path: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
– File / Open Crash Dump
– run .loadby sos clr in the WinDbg command prompt to load the SOS debugging extension (http://blogs.msdn.com/b/jasonz/archive/2003/10/21/53581.aspx); the ‘clr’ parameter is there because we are on .NET >= 4.0

Next, let’s analyze the heap a bit:


0:000:x86> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
73309888 1 12 System.ServiceModel.Diagnostics.Utility
69725164 1 12 System.Collections.Generic.ObjectEqualityComparer`1[[System.Linq.Expressions.LabelTarget, System.Core]]
. . .
. . .
02cd0690   421577     16863080 System.Data.Entity.Core.EntityKey
05373330   421575     18549300 System.Data.Entity.Core.Objects.Internal.EntityWrapperWithoutRelationships`1[[SampleDomain.NotificationChange, Sample.Domain]]
0537e2ac   421575     23608200 System.Data.Entity.Core.Objects.EntityEntry
667ac484  2105915     25270980 System.Int32
667a8e58  1257983     30191592 System.Guid
667a6f1c  2544457     30533484 System.Boolean
667aacc4   440021     32087956 System.String
009fe608   421575     67452000 Sample.Domain.NotificationChange
6675ab9c    26544     90426208 System.Object[]
05553478   421581    166945692 System.Data.Entity.Core.Objects.StateManagerValue[]
Total 12892240 objects

After a huge list of entries, we finally see something interesting – over 400 thousand instances of our class, NotificationChange.
This roughly matches the number of rows from the table processed so far.

This hints at where the problem is, but to be sure, we have to dig deeper:


!dumpheap -mt 009fe608
. . .
. . .
24254ef0 009fe608      160     
24257608 009fe608      160     
2425a95c 009fe608      160     
2425e64c 009fe608      160     
2426187c 009fe608      160     
242643ec 009fe608      160     
24268054 009fe608      160     

Statistics:
      MT    Count    TotalSize Class Name
009fe608   421575     67452000 Sample.Domain.NotificationChange
Total 421575 objects

Yes, it will list the addresses of all 421,575 objects; I found no way to get the address of just one of them. :)

Now we have to find out why the GC has not yet released these objects:


!gcroot 242643ec
HandleTable:
    00000000001611b4 (strong handle)
    -> 0000000000d0378c System.Object[]
    -> 0000000000ca3524 Sample.EF.MyStorage
    -> 0000000000cde438 System.Data.Entity.Internal.LazyInternalContext
    -> 0000000000ce3c08 System.Collections.Generic.Dictionary`2[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]]
    -> 0000000000d01314 System.Collections.Generic.Dictionary`2+Entry[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]][]
    -> 0000000000d00bc4 System.Data.Entity.DbSet`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 0000000000d00ba0 System.Data.Entity.Internal.Linq.InternalSet`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 000000000123497c System.Data.Entity.Core.Objects.ObjectQuery`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 00000000012349b8 System.Data.Entity.Core.Objects.EntitySqlQueryState
    -> 000000000121613c System.Data.Entity.Core.Objects.ObjectContext
    -> 0000000001328b08 System.Data.Entity.Core.Objects.ObjectStateManager
    -> 0000000001530a78 System.Collections.Generic.Dictionary`2[[System.Data.Entity.Core.EntityKey, EntityFramework],[System.Data.Entity.Core.Objects.EntityEntry, EntityFramework]]
    -> 0000000025221000 System.Collections.Generic.Dictionary`2+Entry[[System.Data.Entity.Core.EntityKey, EntityFramework],[System.Data.Entity.Core.Objects.EntityEntry, EntityFramework]][]
    -> 0000000024264578 System.Data.Entity.Core.Objects.EntityEntry
    -> 00000000242644fc System.Data.Entity.Core.Objects.Internal.EntityWrapperWithoutRelationships`1[[Sample.Domain.NotificationChange, Sample.Domain]]
    -> 00000000242643ec Sample.Domain.NotificationChange

Found 1 unique roots (run '!GCRoot -all' to see all roots).

This points to the culprit – the Entity Framework DbContext which indirectly holds a reference to each object loaded so far from the database. :)

Looking closer at the source code, it’s doing something like this (in pseudo code):

  1. start application
  2. create DbContext
  3. while (there are rows in the table)
    1. Using the above DbContext: Read next row from database and load it in a NotificationChange object
    2. Send a message to Service Bus
    3. Mark the row in the database as processed
    4. Repeat..

What could go wrong?

Well, nothing for a few rows, except that Entity Framework can’t read my thoughts and won’t guess that, after loading and updating one row, I won’t need it anymore.
It will keep it in the first-level cache (identity map) and will loop through all 400,000 objects each time a new row is loaded from the database (in case it was already loaded :) ).
More on this: https://weblog.west-wind.com/posts/2014/Dec/21/Gotcha-Entity-Framework-gets-slow-in-long-Iteration-Loops

The fix was simple – re-create the DbContext inside the loop – in our case there is no reason for the unit-of-work (DbContext) to span more than one row (https://lostechies.com/jimmybogard/2013/12/20/proper-sessiondbcontext-lifecycle-management/ ).

After doing that – miracle – the average processing time per message decreased from 5-6 seconds to 0.02 seconds.
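
The difference between the two loop shapes can be sketched without Entity Framework at all. In the sketch below, TrackingContext is a hypothetical stand-in I made up for illustration (it is not an EF type); the only thing it shares with a real DbContext is that it keeps a reference to every entity it has loaded until it is disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext: like an identity map, it holds on
// to every entity it has loaded until it is disposed.
public sealed class TrackingContext : IDisposable
{
    private readonly Dictionary<int, object> _tracked = new Dictionary<int, object>();

    public int TrackedCount => _tracked.Count;

    // Pretends to read the row with the given id and starts tracking it.
    public object Load(int id)
    {
        var entity = new object();
        _tracked[id] = entity;
        return entity;
    }

    public void Dispose() => _tracked.Clear();
}

public static class ContextLifetimeDemo
{
    // Original shape: one context for the whole run. Returns how many
    // entities it still tracks after all rows have been processed.
    public static int SharedContext(int rows)
    {
        using (var context = new TrackingContext())
        {
            for (int id = 0; id < rows; id++)
            {
                context.Load(id); // then: send message, mark row as processed
            }
            return context.TrackedCount;
        }
    }

    // Fixed shape: one context per row. Returns the largest number of
    // entities any single context ever tracked.
    public static int ContextPerRow(int rows)
    {
        int maxTracked = 0;
        for (int id = 0; id < rows; id++)
        {
            using (var context = new TrackingContext())
            {
                context.Load(id);
                maxTracked = Math.Max(maxTracked, context.TrackedCount);
            }
        }
        return maxTracked;
    }

    public static void Main()
    {
        Console.WriteLine(SharedContext(1000)); // 1000
        Console.WriteLine(ContextPerRow(1000)); // 1
    }
}
```

With the shared context the tracked set grows with every row, which is exactly what the heap dump showed; with one context per row it never holds more than the row being processed.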

Posted in .NET, Entity Framework

NameOf and Obfuscators

I was wondering some time ago how the new ‘nameof‘ operator from C# 6.0 works when using .. obfuscators.

Let’s write some code to verify this. I included a few other methods to get the member name (VS2015 RC was used):

using System;
using System.Runtime.CompilerServices;

namespace TestNameOf
{
    class Program
    {
        static void Main(string[] args)
        {
            var o = new Foo();
            o.Bar();

            Console.ReadKey();
        }
    }

    internal class Foo
    {
        public void Bar()
        {
            Console.WriteLine("nameof(Bar): " + nameof(Bar));
            ShowCallerName();
            Console.WriteLine("Action name: " + GetName(this.Bar));
        }

        private void ShowCallerName(
            [CallerMemberName] string callerName = null)
        {
            Console.WriteLine("CallerMemberName attribute: " + callerName);
        }

        public static string GetName(Action action)
        {
            return action.Method.Name;            
        }
    }
}

The result when the code is not obfuscated is the expected one:

[screenshot: nameof – not obfuscated]

When the code is obfuscated (using Eazfuscator.Net) the result is:

[screenshot: nameof – obfuscated]

Unsurprisingly, it works as expected: the name from the original source code is preserved, even if the code is obfuscated. That’s because nameof is applied at compile time, and most (maybe all) obfuscators are applied immediately after the compile step.
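
A quick way to convince yourself that nameof is resolved by the compiler is to use it where only compile-time constants are allowed. This is a small sketch; the class and member names are mine and carry no special meaning.

```csharp
using System;

public static class NameofDemo
{
    // Legal only because nameof(...) is a compile-time constant.
    public const string MethodName = nameof(Describe);

    public static string Describe(string member)
    {
        switch (member)
        {
            case nameof(Describe):   // case labels must be constants too
                return "method";
            case nameof(MethodName):
                return "constant";
            default:
                return "unknown";
        }
    }

    public static void Main()
    {
        Console.WriteLine(MethodName);           // Describe
        Console.WriteLine(Describe(MethodName)); // method
    }
}
```

By the time an obfuscator sees the assembly, these are ordinary string literals, which is why renaming the members afterwards does not change them.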

Are there cases when this might not be the desired behavior? Maybe, but only if we try really hard, like when we combine nameof with reflection:

var m = typeof(Foo).GetMember(nameof(Bar))[0];

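In the obfuscated build we will get an exception: the method has been renamed, so GetMember finds nothing under the original name and the [0] index fails:

[screenshot: nameof and reflection]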

The decision to return the source code information instead of metadata info was taken only in the late phases of the C# design: https://roslyn.codeplex.com/discussions/570551

And, let’s not forget that in general, typeof(Class).Name != nameof(Class):

[screenshot: typeof vs nameof]
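
One case where the two differ is a generic type. The sketch below uses List<int> purely as an example of my own choosing: typeof(...).Name returns the metadata name, including the arity suffix, while nameof returns the identifier as written in the source.

```csharp
using System;
using System.Collections.Generic;

public static class TypeNameDemo
{
    // Metadata name, as stored in the assembly.
    public static string RuntimeName => typeof(List<int>).Name;

    // Source identifier, resolved by the compiler.
    public static string SourceName => nameof(List<int>);

    public static void Main()
    {
        Console.WriteLine(RuntimeName); // List`1
        Console.WriteLine(SourceName);  // List
    }
}
```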

Posted in .NET, C#

Patterns and frameworks

Many people, when they first start to study design patterns (usually in university), dive into the ‘Gang-of-four’ reference book and if they have the energy to read it all, in the end they think something like: ‘well, very cool and interesting, I understood some of them, and maybe if I am lucky I will encounter projects interesting enough to actually use some of them’.. :)
It’s a normal reaction: unless you have a lot of experience in many real-world projects, you might never have deliberately used, or realized that you used, many of those patterns.

And here is a point that many people miss: design patterns, when they are really understood, help you not only to improve your own code, but also to understand how and why the code in many frameworks and libraries is designed the way it is.
We don’t have to look any further than what we use every day – the .NET Framework. Here are some examples, in no particular order; I won’t explain each pattern, nor how it’s used in each case:

Decorator: I/O streams: Stream (the common ‘interface’), FileStream (concrete/component class), BufferedStream, GZipStream, CryptoStream (decorators); StreamReader follows the same wrapping idea, although it is a TextReader rather than a Stream
and of course the Decorator class from WPF
Iterator: IEnumerator (generic iterator interface), IEnumerable (the aggregate in the GoF book), List (or any other collection), yield keyword
Observer: EventHandler delegate (abstract observer), any class exposing an event handler, like Button (concrete subject)
or IObserver/IObservable use in Reactive Extensions.
Abstract factory and bridge patterns: ASP.NET WebForms or ADO.NET providers (DbProviderFactory) – introduced in .NET 2.0
Factory: WebRequest.Create() method
Template method: many places, like ASP.NET WebForms Control class protected methods: OnLoad, OnInit, OnDataBinding etc..
Command: in WPF: ICommand, ICommandSource, RoutedCommand, or Action class in Java Swing or Delphi VCL
Facade: ApplicationUserManager from ASP.NET Identity framework
Flyweight: string interning, WPF dependency properties
Adapter: each time we use COM components from .NET or DataAdapter used in ADO.NET/DataSet world
Strategy: IComparer interface used in many sorting and searching methods in the framework
Composite: CompositeControl or Component base class and all its derived classes used in WinForms, ADO.NET etc..
Proxy: obviously, the proxy classes used in WCF or .NET Remoting clients
Interpreter: System.Linq.Expressions.Expression and its derived classes (also an example of composite pattern)
Memento: .NET serializable classes
Visitor: System.Linq.Expressions.ExpressionVisitor
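
To make at least the first entry of the list concrete, here is a small sketch of the stream decorators at work. The class and method names are mine; the stream types are the standard ones from System.IO and System.IO.Compression. A GZipStream wraps a MemoryStream and adds compression without the inner stream knowing about it, and a StreamWriter/StreamReader pair sits on top to handle text.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StreamDecoratorDemo
{
    // Writes text through a stack of wrappers:
    // StreamWriter -> GZipStream (decorator) -> MemoryStream (component).
    public static byte[] Compress(string text)
    {
        using (var target = new MemoryStream())
        {
            using (var gzip = new GZipStream(target, CompressionMode.Compress))
            using (var writer = new StreamWriter(gzip))
            {
                writer.Write(text);
            }
            // ToArray still works after the MemoryStream has been closed.
            return target.ToArray();
        }
    }

    // Reads the text back through the mirrored stack.
    public static string Decompress(byte[] data)
    {
        using (var source = new MemoryStream(data))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] packed = Compress("hello, decorator");
        Console.WriteLine(Decompress(packed)); // hello, decorator
    }
}
```

Swapping the MemoryStream for a FileStream, or adding a BufferedStream in between, requires no change to the rest of the code, which is the whole point of the pattern.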

These are just some random examples and maybe there are many more.
What’s the point in knowing this: when learning a new framework, if you identify a pattern, it’s easy to answer the question: ‘why the heck did they do it like this?’ :)

Many more patterns can be found in Fowler’s book (‘Patterns of Enterprise Application Architecture’), but maybe I’ll talk about those in a future episode..

Posted in .NET

On closures and captured variables

A few days ago, on the project I’m working on, I’ve stumbled on an interesting bug – an example of why it pays off to learn the ‘deeper’ areas of C# language (or any other language).
Image copyright: Pavel Shlykov (Shutterstock)


Greatly simplified (and with the class names changed to protect the innocent :) ), we had:
– a structure of orders and order lines/items, something pretty straightforward:

// ...
    public class Order
    {
        public Order()
        {
            Items = new List<OrderItem>();
        }

        public string Number { get; set; }
        // ... other fields

        public IList<OrderItem> Items { get; set; }
    }
// ...
    public class OrderItem
    {
        public int ItemId { get; set; }
        public string ProductName { get; set; }
        public decimal Price { get; set; }
        // ... other fields
    }

An order contains several lines.

For one reason or another, let’s say that we want to deep copy this structure to another class, OrderLineDto, that flattens the structure:

    public class OrderLineDto
    {
        public string OrderNumber { get; set; } // the parent order number
        // OrderItem attributes:
        public int ItemId { get; set; }
        public string ProductName { get; set; }
        public decimal Price { get; set; }
        // ... other fields
    }

Because OrderItem has several hundred properties (don’t ask me why :) ), I’m using AutoMapper to simplify the mapping job.
We added a helper class that is supposed to keep the code nice and tidy:

public class OrderMapper
{
    private readonly Order _order;

    public OrderMapper(Order order)
    {
        _order = order;

        AutoMapper.Mapper.CreateMap<OrderItem, OrderLineDto>()
          .ForMember(orderLineDto => orderLineDto.OrderNumber, 
             config => config.MapFrom(sourceOrderItem => _order.Number));
    }

    public OrderLineDto GetLineDto(OrderItem orderItem)
    {
        var dtoLine
          = AutoMapper.Mapper.Map<OrderItem, OrderLineDto>(orderItem);
        return dtoLine;
    }
}

The constructor gets the current Order instance and defines the mapping, and the GetLineDto method does the actual mapping from an OrderItem to a new OrderLineDto. Pretty simple..
Only for OrderLineDto.OrderNumber, we have to tell AutoMapper to take the value from the ‘parent’ _order.Number.

Let’s test it in a console application:

        static void Main()
        {
            var order1 = new Order {Number = "O1"};
            var orderItem1 = new OrderItem
            {
                ItemId = 1,
                ProductName = "Book 1",
                Price = 100
            };
            order1.Items.Add(orderItem1);

            var order2 = new Order {Number = "O2"};
            var orderItem2 = new OrderItem
            {                
                ItemId = 2,
                ProductName = "Book 2",
                Price = 200
            };
            order2.Items.Add(orderItem2);

            /////////////
            var orderMapper1 = new OrderMapper(order1);
            OrderLineDto dto1 = orderMapper1.GetLineDto(orderItem1);

            Console.WriteLine("\nItem 1 order number: {0}  == DTO 1 order number: {1}", 
                                order1.Number, dto1.OrderNumber);

            Console.WriteLine("Item 1 prod. name: {0}  == DTO 1 prod name: {1}", 
                                orderItem1.ProductName, dto1.ProductName);

            //////////////
            var orderMapper2 = new OrderMapper(order2);
            OrderLineDto dto2 = orderMapper2.GetLineDto(orderItem2);

            Console.WriteLine("\nItem 2 order number: {0}  == DTO 2 order number: {1}", 
                                order2.Number, dto2.OrderNumber);

            Console.WriteLine("Item 2 prod. name: {0}  == DTO 2 prod name: {1}", 
                                orderItem2.ProductName, dto2.ProductName);

            Console.ReadLine();
        }

I create 2 Order objects, each with one OrderItem, and for each OrderItem, I map it to an OrderLineDto object.
Finally, I compare the original and DTO properties to make sure they were copied properly.

However, the result is not the expected one:

Item 1 order number: O1  == DTO 1 order number: O1
Item 1 prod. name: Book 1  == DTO 1 prod name: Book 1

Item 2 order number: O2  == DTO 2 order number: O1
Item 2 prod. name: Book 2  == DTO 2 prod name: Book 2

Obviously, the second DTO object does not have the right order number (‘O2’), but the first one’s, ‘O1’.
Is AutoMapper broken? :)

No – the culprit is the way I’m defining the custom mapping for OrderNumber:

public class OrderMapper
{
    private readonly Order _order;

    public OrderMapper(Order order)
    {
        _order = order;

        AutoMapper.Mapper.CreateMap<OrderItem, OrderLineDto>()
          .ForMember(orderLineDto => orderLineDto.OrderNumber, 
             config => config.MapFrom(sourceOrderItem 
                                   => _order.Number));
    }
    // ...
}

sourceOrderItem => _order.Number
is a lambda expression, but because the _order field is referenced, a closure is created.
Since _order is an instance field, what the closure actually captures is the current OrderMapper instance (this); its _order is then read each time the lambda expression is evaluated.

Nothing unexpected so far – the question is: which instance of _order?
The one from the moment OrderMapper constructor is called and the lambda expression is instantiated, right?
:)
That was the intention, at least.
However, AutoMapper has the good habit of caching the mappings in a static field, for performance reasons.
So even if we try to redefine the mapping for a certain type, the first mapping is used.
In our case, the mapping will always use the lambda expression created during the first call to Mapper.CreateMap, when the first OrderMapper is instantiated, so the first _order instance is captured by the closure and always used when OrderNumber is mapped.
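
The same effect can be reproduced without AutoMapper. In the sketch below, MapRegistry is a hypothetical stand-in that I wrote for illustration (it is not AutoMapper’s API); it only imitates the behavior described above, keeping the first configuration registered for a key in a static field and ignoring later ones.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a mapping library with a static cache:
// the first resolver registered for a key is kept, later ones are ignored.
public static class MapRegistry
{
    private static readonly Dictionary<string, Func<string>> Cache =
        new Dictionary<string, Func<string>>();

    public static void Register(string key, Func<string> resolver)
    {
        if (!Cache.ContainsKey(key))
        {
            Cache[key] = resolver; // first registration wins
        }
    }

    public static string Resolve(string key) => Cache[key]();
}

public class NumberMapper
{
    private readonly string _number;

    public NumberMapper(string number)
    {
        _number = number;
        // The lambda reads an instance field, so it captures 'this'.
        MapRegistry.Register("OrderNumber", () => _number);
    }

    public string Map() => MapRegistry.Resolve("OrderNumber");
}

public static class ClosureDemo
{
    public static void Main()
    {
        var first = new NumberMapper("O1");
        var second = new NumberMapper("O2");

        Console.WriteLine(first.Map());  // O1
        Console.WriteLine(second.Map()); // O1: the cached lambda still reads 'first'
    }
}
```

The second mapper registers its own lambda, but the registry never stores it, so every call keeps going through the closure that belongs to the first instance.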

How to fix this? Quite easy: copy the OrderNumber directly in code and don’t use AutoMapper for such a simple task:

    public class OrderMapper
    {
        private readonly Order _order;

        public OrderMapper(Order order)
        {
            _order = order;

            AutoMapper.Mapper.CreateMap<OrderItem, OrderLineDto>();
            
            //.ForMember(orderLineDto => orderLineDto.OrderNumber, 
            //        config => config.MapFrom(sourceOrderItem => _order.Number));
        }

        public OrderLineDto GetLineDto(OrderItem orderItem)
        {
            var dtoLine = AutoMapper.Mapper.Map<OrderItem, OrderLineDto>(orderItem);
            dtoLine.OrderNumber = _order.Number;
            return dtoLine;
        }
    }

To make sure that such a ‘bug’ is not introduced again by mistake, we can move the call to AutoMapper.Mapper.CreateMap into a static constructor, which won’t be able to access instance fields.

More on closures and a comparison with Java: http://csharpindepth.com/articles/chapter5/closures.aspx
or http://martinfowler.com/bliki/Lambda.html
or in JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures

Posted in .NET, C#

On assumptions and formats

In .NET (and any other framework for that matter), it’s better to never assume anything, but to check twice.
Let’s take an example – what do you think, will the following unit test always pass?

[TestMethod]
public void ShortDateLength()
{
    DateTime d = new DateTime(2015, 01, 18);
    string dateString = d.ToString("yyyyMMdd");

    Assert.AreEqual(8, dateString.Length);
}

.
.
.
.
.
.
.
.
.
.
.
.
.
.
… well, usually, it will, but once in a blue moon, it will fail :) .
All it takes is a user changing the regional settings of the computer, or adding the following 3 lines of code:

[TestMethod]
public void ShortDateLength()
{
    CultureInfo c = new CultureInfo("he-IL", false);
    c.DateTimeFormat.Calendar = new HebrewCalendar();
    Thread.CurrentThread.CurrentCulture = c;

    DateTime d = new DateTime(2015, 01, 18);
    string dateString = d.ToString("yyyyMMdd");

    Assert.AreEqual(8, dateString.Length);
}

Yes, in some cultures and calendars, dates are not represented with Arabic numerals, and years might not fit in 4 chars.
In the above case, dateString will have the following ‘unexpected’ value:
תשע”הד’כ”ז

– a solid 10 chars in length.
When does this matter? When dateString is going to be displayed on the UI, probably not – in such cases I want it to be formatted in the format chosen by the end-user.
However, if the DateTime value is going to be serialized to a text file or sent to a web service, I want to make sure that I will be able to decode it later.
In such cases it’s better to replace the dateString assignment above with:
string dateString = d.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
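
A minimal sketch of the full round trip, with both directions pinned to the invariant culture so that the thread’s current culture no longer plays any role. The class and method names are mine; ToString, ParseExact and CultureInfo.InvariantCulture are the standard framework members.

```csharp
using System;
using System.Globalization;

public static class InvariantDateDemo
{
    private const string Format = "yyyyMMdd";

    // Always produces 8 ASCII digits, whatever the current culture is.
    public static string Serialize(DateTime value) =>
        value.ToString(Format, CultureInfo.InvariantCulture);

    // Reads back exactly what Serialize produced.
    public static DateTime Deserialize(string text) =>
        DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);

    public static void Main()
    {
        string text = Serialize(new DateTime(2015, 1, 18));
        Console.WriteLine(text);                  // 20150118
        Console.WriteLine(Deserialize(text).Day); // 18
    }
}
```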

Somewhat related – what do you think, which of the following tests will pass?

const string digits1 = "5678";
Assert.IsTrue(Regex.IsMatch(digits1, @"^\d+$"));

const string digits2 = "୮౪୩";
Assert.IsTrue(Regex.IsMatch(digits2, @"^\d+$"));

Unexpectedly for some, both will pass :)
\d in .NET matches any digit, and is Unicode-aware (https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#DigitCharacter).
୮౪୩ are.. digits in some cultures (http://www.fileformat.info/info/unicode/category/Nd/list.htm).

Again, why is this relevant? Because many developers use \d as a quick way to validate input that is supposed to contain only 0,1,2,…,9 – well, ‘digit’ might mean more than that.
Some will say that in their application there is no risk that an end-user might enter something like ୮౪୩ – true, unless the input comes from a misbehaving client application that calls a web service, and it just happens to send by accident the following sequence of bytes (hex values):
EF BB BF E0 AD AE
or
E0 B1 AA
or
E0 AD A9
– in UTF-8 these are.. digits.
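
If the intent really is ‘only 0 to 9’, there are two simple ways to say so, sketched below: spell out the character class, or use RegexOptions.ECMAScript, under which \d is limited to [0-9]. The class and method names are mine; the test string is the same three digits as above, written as escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitValidationDemo
{
    // Default options: \d matches any Unicode decimal digit.
    public static bool IsUnicodeDigits(string s) => Regex.IsMatch(s, @"^\d+$");

    // Explicit character class: only 0 to 9.
    public static bool IsAsciiDigits(string s) => Regex.IsMatch(s, @"^[0-9]+$");

    // ECMAScript mode: \d is restricted to [0-9].
    public static bool IsEcmaDigits(string s) =>
        Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);

    public static void Main()
    {
        const string exotic = "\u0B6E\u0C6A\u0B69"; // the three digits from the test above

        Console.WriteLine(IsUnicodeDigits(exotic)); // True
        Console.WriteLine(IsAsciiDigits(exotic));   // False
        Console.WriteLine(IsEcmaDigits(exotic));    // False
        Console.WriteLine(IsAsciiDigits("5678"));   // True
    }
}
```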

Posted in .NET, C#

How developers start to ignore code smells

Many people wonder how some developers blissfully ignore some best practices when writing code, or aren’t too bothered when they see a code smell in their project.
There are many explanations, but an old one is the code they see when working with Microsoft framework and samples (and not only Microsoft).

Even if Microsoft has made great improvements in this direction in recent years (clean code, best practices etc.), when some developers see code like the one below, in one of the most recent Microsoft frameworks, what conclusion will they draw? It’s from MS, so it must be right, no? :)

IdentityConfig.cs – part of the latest ASP.NET Identity 2.0 project template – 6 classes in one file:
[screenshot: IdentityConfig.cs]

UserManager class – part of ASP.NET itself, new class added by Microsoft last year – the screenshot is truncated, it could be twice this size – I’m too tired to count how many public members are in there:
[screenshot: UserManager class]

Posted in .NET, IT