Posts

  • Concurrent/Async programming on .NET

    May 12, 2013

    By popular request, here is the material for a course I have held at various companies, introducing developers to concurrent and asynchronous programming on the .NET platform. It covers basic CPU operation principles, OS threading basics, optimization techniques for parallel processing and, of course, asynchronous patterns for improving the scalability or responsiveness of applications. We start with how these topics were handled in .NET 1.0 and follow how they were refined in subsequent versions, up to the async/await keywords in .NET 4.5. Some optional libraries (Reactive Extensions, TPL Dataflow) are also discussed.

    In addition to the presentation itself, about 80 code samples are included to illustrate various usage patterns.

    Download the presentation material here: Presentation & Code Samples

    Note: While immutable collections are discussed, the immutable collections library recently published by the BCL team is not mentioned, since the presentation is over a year old. Interested parties should look at the BCL team blog for information about it: BCL Team Immutable collections library

  • Async LINQ part 2

    Aug 23, 2012

    As I noted in the update to my last post, I have decided to implement Async LINQ as an open source project, and today I reached the “feature complete” milestone.

    All LINQ operators are supported except “ToLookup”, which I have never used myself in traditional LINQ and consider a low priority. All operators are implemented asynchronously and, as with traditional LINQ, are evaluated lazily whenever possible.
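
    To illustrate what "asynchronous and lazily evaluated" can mean for a single operator, here is a minimal sketch of a Select-style projection with an async selector. It is not the library's actual implementation: the simplified IAsyncEnumerator<T> interface and the names ArrayAsyncEnumerator and SelectEnumerator are assumptions made for this example. The selector only runs when the consumer asks for the next element.

```csharp
using System;
using System.Threading.Tasks;

// Simplified stand-in for an async cursor interface (an assumption for this sketch).
public interface IAsyncEnumerator<out T>
{
    T Current { get; }
    Task<bool> MoveNext();
}

// Hypothetical in-memory source, only here so the sketch can be run.
public sealed class ArrayAsyncEnumerator<T> : IAsyncEnumerator<T>
{
    private readonly T[] _items;
    private int _index = -1;

    public ArrayAsyncEnumerator(params T[] items) { _items = items; }

    public T Current { get { return _items[_index]; } }

    public Task<bool> MoveNext()
    {
        if (_index + 1 >= _items.Length)
            return Task.FromResult(false);
        _index++;
        return Task.FromResult(true);
    }
}

// Hypothetical Select operator: no work is done until MoveNext is called.
public sealed class SelectEnumerator<TSource, TResult> : IAsyncEnumerator<TResult>
{
    private readonly IAsyncEnumerator<TSource> _source;
    private readonly Func<TSource, Task<TResult>> _selector;

    public SelectEnumerator(IAsyncEnumerator<TSource> source, Func<TSource, Task<TResult>> selector)
    {
        _source = source;
        _selector = selector;
    }

    public TResult Current { get; private set; }

    public async Task<bool> MoveNext()
    {
        // Advance the underlying cursor first; stop when it is exhausted.
        if (!await _source.MoveNext().ConfigureAwait(false))
            return false;

        // Run the async selector for exactly one element.
        Current = await _selector(_source.Current).ConfigureAwait(false);
        return true;
    }
}
```

    Because the projection happens inside MoveNext, constructing a SelectEnumerator costs nothing by itself, which mirrors the deferred execution of traditional LINQ.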

    The next steps for the library are to increase test coverage and to implement a number of performance optimizations. Performance is currently quite bad, because I did not want to spend time on optimization while I was still working on the implementation.

    So let’s get back to my first post and continue with how to create async enumerables. I have actually implemented three ways to go about this, each suited to particular use cases.

    If you think about it, a classic IEnumerable<Task<T>>, that is, a sequence whose elements are tasks, already represents an async sequence, except that the wait has to be performed on the retrieved current value instead of on the move operation itself.

    So the first way to create an async enumerable is to implement or return such an IEnumerable<Task<T>>, which the library infrastructure then converts into an IAsyncEnumerable<T>. This makes your code quite easy to write, as the example below shows:

    public static async Task<int> GetDataItem(int index)
    {
        // .... returns an int from somewhere asynchronously
    }

    // A plain iterator: each element is a task that the consumer awaits.
    public static IEnumerable<Task<int>> EnumerateData()
    {
        yield return GetDataItem(1);
        yield return GetDataItem(2);
    }

    public static void Usage()
    {
        // ToAsync() is the library's conversion into an async sequence.
        IAsyncEnumerable<int> enumerable = EnumerateData().ToAsync();
    }
    

    A few points to note: in the enumeration method you don’t await an async method; instead you yield the task it returns (in fact the compiler disallows await in yield-based enumerator methods). The behaviour is still what one would expect: the first yield returns a task, which is awaited outside, and only after that is the method re-entered to continue up to the next yield statement (in case there is more code in between). Sequencing therefore stays the same as with a regular async/await-based method implementation.
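
    A rough idea of how such a conversion could work is sketched here. This is not the library's code: the simplified IAsyncEnumerator<T> interface and the class name TaskSequenceEnumerator are assumptions for illustration. The adapter pulls one task at a time out of the iterator and awaits it before the iterator is resumed, which is what keeps the sequencing intact.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-in for an async cursor interface (an assumption for this sketch).
public interface IAsyncEnumerator<out T>
{
    T Current { get; }
    Task<bool> MoveNext();
}

// Hypothetical adapter from IEnumerable<Task<T>> to an async cursor.
public sealed class TaskSequenceEnumerator<T> : IAsyncEnumerator<T>, IDisposable
{
    private readonly IEnumerator<Task<T>> _inner;

    public TaskSequenceEnumerator(IEnumerable<Task<T>> source)
    {
        _inner = source.GetEnumerator();
    }

    public T Current { get; private set; }

    public async Task<bool> MoveNext()
    {
        // Re-enters the iterator method, which runs up to its next yield.
        if (!_inner.MoveNext())
            return false;

        // The wait happens on the yielded task, not on the move itself.
        Current = await _inner.Current;
        return true;
    }

    public void Dispose()
    {
        _inner.Dispose();
    }
}
```

    Since the iterator is only advanced after the previous task has completed, code placed between two yield statements never runs before the earlier element is available.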

    However, there are some limitations to this approach. In particular, it is quite tricky to make async calls in such an enumerator implementation that you just want to await without putting their return value into the sequence you’re defining.

    I will address these limitations with a different concept of defining an Async Sequence in my next post.

  • Async LINQ part 1

    Aug 18, 2012

    Update 20th August: Due to interest, I decided to actually implement the functionality described in this post series as an open source library. The repository is at: GitHub repository and the corresponding NuGet package is at: NuGet package

    Now that .NET 4.5 has hit RTM and is available to MSDN subscribers, I want to take a look at one of the functionality gaps present in this release. The major new feature in .NET 4.5 is the async/await syntax, which allows asynchronous code to be written much more concisely than was possible before.

    However, as some of you may already have noticed, while there is PLINQ for concurrent LINQ to Objects query processing, “ALINQ”, as I would call it, for asynchronous queries is sadly absent from this release.

    So I decided to implement this missing functionality myself, and I want to walk through its design in this series of blog posts.

    First let’s define what ALINQ should provide for us:

    • The ability to define asynchronously enumerable sequences
    • Async iteration over those sequences
    • Common query operators well known from LINQ that also allow us to specify asynchronous selector/predicate functions

    In the end we want to be able to execute something like this:

    AsyncEnumerable.Range(0,100)
        .Select( async i => await DownloadSomething(i) )
        .ForEach( async r => await Output.WriteAsync(r) );
    

    Let’s start with the first building block needed to achieve something like this. Sequences in .NET are modeled with the IEnumerable<T> interface, which always behaves strictly synchronously. We need an equivalent that works asynchronously. (One could argue that IObservable<T> models this concept, but that is not actually the case: IObservable<T> follows push semantics, while we still want pull semantics, just working asynchronously.)

    IEnumerable<T> is defined as follows:

    public interface IEnumerable<out T> : IEnumerable
    {
         IEnumerator<T> GetEnumerator();
    }
    

    Basically, .NET defines a stateful cursor concept for sequences, encapsulated in the IEnumerator<T> interface that GetEnumerator returns. This abstraction allows multiple enumerations of a sequence to be processed in parallel, since the cursor is separate from the actual underlying sequence.

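    IEnumerator<T> is defined as follows (simplified: MoveNext and Reset are actually inherited from the non-generic IEnumerator):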

    public interface IEnumerator<T> : IEnumerator
    {
       T Current { get; }
       bool MoveNext();
       void Reset();
    }
    

    Quite simple, actually: MoveNext moves to the next position in the sequence and returns whether the move was successful and data is available at that position, Current returns the current item in the sequence, and Reset should reposition the enumerator at the beginning of the sequence again.

    There is one caveat to note: Reset is basically deprecated since hardly anybody made use of it, and in fact yield-based enumerators throw a NotSupportedException when it is called.
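
    Both points, the independence of cursors and the behaviour of Reset on yield-based enumerators, can be tried out with a small sketch like the following. The class and method names (CursorDemo, Numbers, AdvanceIndependently, TryReset) are made up for this example. TryReset simply reports the name of whatever exception type the call raises.

```csharp
using System;
using System.Collections.Generic;

public static class CursorDemo
{
    // A yield-based sequence; the compiler generates the enumerator for it.
    private static IEnumerable<int> Numbers()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // Two cursors over the same sequence advance independently of each other.
    public static int[] AdvanceIndependently()
    {
        IEnumerable<int> sequence = Numbers();

        using (IEnumerator<int> first = sequence.GetEnumerator())
        using (IEnumerator<int> second = sequence.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();   // first cursor now points at the second element
            second.MoveNext();  // second cursor points at the first element
            return new[] { first.Current, second.Current };
        }
    }

    // Calls Reset on a yield-based enumerator and reports the outcome.
    public static string TryReset()
    {
        using (IEnumerator<int> cursor = Numbers().GetEnumerator())
        {
            try
            {
                cursor.Reset();
                return "Reset succeeded";
            }
            catch (Exception ex)
            {
                return ex.GetType().Name;
            }
        }
    }
}
```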

    How would we now model this to support async enumeration?

    public interface IAsyncEnumerable<out T> : IAsyncEnumerable
    {
        // Named GetAsyncEnumerator, matching the full listing at the end of this post.
        IAsyncEnumerator<T> GetAsyncEnumerator();
    }
    

    This is basically the same as the standard IEnumerable<T>; we just return a different kind of enumerator. It gets more interesting with the definition of IAsyncEnumerator<T>:

    public interface IAsyncEnumerator<T> : IAsyncEnumerator
    {
         T Current { get; }
         Task<bool> MoveNext();
    }
    

    As we can see, we made the MoveNext method asynchronous and removed the Reset method for the reasons stated before. Current works the same as in the synchronous IEnumerator<T> interface. We now have a sequence from which we can get a cursor whose move to the next position can be performed asynchronously.
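
    To show what consuming such a cursor could look like, here is a minimal sketch of an async iteration loop. The interface is a simplified copy of the one above, and AsyncRangeEnumerator and AsyncIteration.ToListAsync are hypothetical names invented for this example; Task.Delay merely stands in for real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified version of the async cursor interface discussed above.
public interface IAsyncEnumerator<out T>
{
    T Current { get; }
    Task<bool> MoveNext();
}

// Hypothetical sample source: counts upwards, with every move being asynchronous.
public sealed class AsyncRangeEnumerator : IAsyncEnumerator<int>
{
    private readonly int _end;
    private int _next;

    public AsyncRangeEnumerator(int start, int count)
    {
        _next = start;
        _end = start + count;
    }

    public int Current { get; private set; }

    public async Task<bool> MoveNext()
    {
        // Stands in for real asynchronous work such as I/O.
        await Task.Delay(1).ConfigureAwait(false);

        if (_next >= _end)
            return false;

        Current = _next++;
        return true;
    }
}

public static class AsyncIteration
{
    // The async counterpart of a foreach loop: await the move, then read Current.
    public static async Task<List<int>> ToListAsync(IAsyncEnumerator<int> cursor)
    {
        var items = new List<int>();
        while (await cursor.MoveNext().ConfigureAwait(false))
        {
            items.Add(cursor.Current);
        }
        return items;
    }
}
```

    The loop has the same shape as the code the compiler generates for foreach, except that the thread is released while each move is pending.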

    I left out the explanation of the non-generic interfaces on purpose, since they don’t help to explain the concept and just follow the same patterns anyway. Here is the code so far in full:

    public interface IAsyncEnumerator
    {
        object      Current { get; }
        Task<bool>  MoveNext();
    }
    
    public interface IAsyncEnumerator<out T> : IAsyncEnumerator
    {
        new T       Current { get; }
    }
    
    public interface IAsyncEnumerable
    {
        IAsyncEnumerator GetAsyncEnumerator();
    }
    
    public interface IAsyncEnumerable<out T> : IAsyncEnumerable
    {
        new IAsyncEnumerator<T> GetAsyncEnumerator();
    }
    

    In the next post we will look at how to produce such sequences/cursors in a convenient way.

  • Why a 'foreach' extension method for LINQ is a bad idea

    Jun 2, 2011

    I often wondered why LINQ to Objects does not contain a “ForEach” extension method like the generic List class does. After being hit twice in the last week by stupid bugs of my own, I think I have the answer to why L2O was designed that way and why List contains one.

    So consider this: what is the fundamental difference between all other L2O operators and ForEach? All normal LINQ operators model a computation of a sequence, either standalone or as a combination of multiple computations. This computation is stateless; the sequence is only materialized when we actually evaluate the tree of computations we have combined so far. This means that if the LINQ query creates objects along the way, we will get different ones for each evaluation of the query.

    So what is the difference with ForEach then? Simply put, ForEach implies evaluation of the computation! This falls outside the normal LINQ model, because now we are no longer dealing with computations but with the results of such a computation. We effectively have a mixture of stateless and stateful operations (stateful because ForEach allows us to change the individual data contained in the sequence we’re computing), and it is much too easy to forget that in our ForEach operator we have to deal with the implications of that: two different ForEach calls can receive different data, and we must not impose side effects on that data, because in the next evaluation those side effects will be gone again.

    That’s why it’s not OK to have a ForEach LINQ operator, but it’s perfectly all right for List: a List is not a stateless sequence in the LINQ sense but an already evaluated one that will not return different data between iterations! So delete those nasty ForEach extension methods again, because believe me, tracing this kind of bug over a few hundred methods is not a fun thing to do…
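
    To make the pitfall concrete, here is a minimal sketch of the kind of bug described. The Item class and the ForEachPitfall helper are invented for this example. The same ForEach call is applied once to a lazy query and once to a materialized list, and afterwards the number of items carrying the side effect is counted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Item
{
    public int Id { get; set; }
    public bool Processed { get; set; }
}

public static class ForEachPitfall
{
    // The kind of extension method the post argues against.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var element in source)
            action(element);
    }

    public static int CountProcessed(bool materialize)
    {
        // Lazy query: every evaluation creates brand-new Item objects.
        IEnumerable<Item> query = Enumerable.Range(1, 3).Select(i => new Item { Id = i });

        if (materialize)
            query = query.ToList(); // evaluate once and keep the results

        // Side effect applied to whatever objects this evaluation produced.
        query.ForEach(item => item.Processed = true);

        // Second evaluation: sees the mutated objects only if they were materialized.
        return query.Count(item => item.Processed);
    }
}
```

    CountProcessed(false) returns 0 because the second evaluation creates fresh objects, while CountProcessed(true) returns 3 because the list holds on to the mutated ones.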

  • Code snippet of the day

    Jan 18, 2011

    Today I wrote some code for a declarative system that maps members to certain data sources. Because I wanted to keep things DRY, I needed both read and write access to a property that was given to me only as a simple selector-style C# lambda expression. Of course one could generate the required counterpart method with Reflection.Emit (IL generation) or in some other way (CodeDOM etc.), but with .NET expression trees to the rescue my job was much easier than I thought. Here is my implementation (which could use better error checking, I know).

    using System;
    using System.Diagnostics.Contracts;
    using System.Linq.Expressions;
    using System.Reflection;

    public static Action<TObject, TProperty> GetterToSetter<TObject, TProperty>(Expression<Func<TObject, TProperty>> getter)
    {
        Contract.Requires<ArgumentNullException>(getter != null, "getter");

        // Only a direct member access such as "s => s.Id" is supported.
        var memberAccess = getter.Body as MemberExpression;

        if (memberAccess != null)
        {
            var memberName = memberAccess.Member.Name;
            var property = typeof (TObject).GetProperty(memberName);

            if (property != null && !property.CanWrite)
            {
                throw new ArgumentException("Property is not writable");
            }

            // Fall back to a field of the same name if no property was found.
            var member = property ?? (MemberInfo) typeof (TObject).GetField(memberName);

            if (member != null)
            {
                var parameter1        = Expression.Parameter(typeof (TObject), "obj");
                var parameter2        = Expression.Parameter(typeof (TProperty), "value");
                var memberAccessClone = Expression.MakeMemberAccess(parameter1, member);

                // Build "(obj, value) => obj.Member = value" and compile it to a delegate.
                var body    = Expression.Assign(memberAccessClone, parameter2);
                var setter  = Expression.Lambda<Action<TObject, TProperty>>(body, parameter1, parameter2);

                return setter.Compile();
            }
        }

        throw new ArgumentException("Invalid getter expression, only simple property/field accesses are supported");
    }
    

    Given an expression that selects a property, like “s => s.Id”, this will give you an action that accepts both the owning object and a new value for the property and assigns it. Also, since expression trees use DynamicMethods for code generation, the new function will be defined in the owning object’s type scope, so even if the setter of the property is not publicly reachable, the returned function will circumvent that member’s visibility and still work. The end result is a pair of functions that allow read/write access to a property or field of an object instance. Here is a usage sample:

    public class Simple
    {
        public int Id { get; set; }
    }
    
    public static void Sample()
    {
        var setter = GetterToSetter<Simple, int>( s => s.Id );
        setter( new Simple(), 10 );
    }
    

    My implementation is not very useful on its own, but it can be helpful in code that already has the owning object’s type as a generic parameter.
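
    As an illustration of the "pair of functions" idea, the following sketch bundles the compiled getter with a generated setter. It is a variation and not the code from this post: the Accessor types are hypothetical, and instead of looking the member up by name it reuses the member access node and the parameter of the getter expression directly.

```csharp
using System;
using System.Linq.Expressions;

public class Simple
{
    public int Id { get; set; }
}

// Hypothetical container for the getter/setter pair; not part of the original code.
public sealed class Accessor<TObject, TProperty>
{
    public Accessor(Func<TObject, TProperty> get, Action<TObject, TProperty> set)
    {
        Get = get;
        Set = set;
    }

    public Func<TObject, TProperty> Get { get; private set; }
    public Action<TObject, TProperty> Set { get; private set; }
}

public static class Accessor
{
    public static Accessor<TObject, TProperty> Create<TObject, TProperty>(
        Expression<Func<TObject, TProperty>> getter)
    {
        if (getter == null) throw new ArgumentNullException("getter");

        // Accept only "x => x.Member", i.e. a member accessed directly on the parameter.
        var member = getter.Body as MemberExpression;
        if (member == null || member.Expression != getter.Parameters[0])
            throw new ArgumentException("Only simple property/field accesses are supported", "getter");

        // Build "(x, value) => x.Member = value", reusing the getter's parameter.
        var value = Expression.Parameter(typeof(TProperty), "value");
        var assign = Expression.Assign(member, value);
        var setter = Expression.Lambda<Action<TObject, TProperty>>(assign, getter.Parameters[0], value);

        return new Accessor<TObject, TProperty>(getter.Compile(), setter.Compile());
    }
}
```

    A mapping layer could then keep one such pair per mapped member, for example via Accessor.Create<Simple, int>(s => s.Id), and use Get and Set without any further reflection.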
