C# 7.2 safe & efficient enhancement of value types

Value type enhancements in C# 7.2

This is mostly a bunch of code demos, not so much into the theories.

This is part of series where we discuss C# language enhancements in version C# 6 and above. In this article we’ll discuss a group of features introduced in C# 7.2, for safe and efficient coding with value types. If you are not familiar with the previous versions yet, go and read the C# 6, C# 7 and C# 7.1 - 7.3 articles first.

C# 7.2 brings in a set of new features that makes working with value types (struct in C#) more efficient and semantically better. Most of them make efficient use of memory for better performance, also provides immutability to avoid unintentional changes in data wherever possible. This features make writing high performance code easier & cleaner and safer to write low level code with better control on memory allocation.

C# 7.2 is another point release that adds a number of useful features. One theme for this release is working more efficiently with value types by avoiding unnecessary copies or allocations. - docs

Struct in .NET

What is a struct in .NET?

Similar to class, a struct is a compound data type that can pack together multiple data pieces (properties, fields) and behaviours (methods, events, indexers). A struct is a value type, means it is passed around as a complete value and assigning it to a new variable makes a complete new copy of the data. On the other hand, classes are reference type, and are passed around as references. Copying a class instance just creates a new reference to the same memory address.

Generally struct objects are stored as a single chunk of memory in the stack (not always though, e.g. when they are used as property of a class object).

Class & struct can both have methods, properties, indexers & events. They both can implement interface (though casting a struct to interface type boxes it to a reference). But, there are some differences

  1. Classes can inherit from another class, struct cannot inherit from anything
  2. Struct cannot have explicit (defined in code) parameter-less constructor
  3. Struct cannot have a destructor
  4. Struct has default implementation of Equals() that matches by value equality of data/properties
public interface IAnimal
{
    string MakeNoise();
}
public class DogFamily
{
    public virtual string MakeNoise() => "Aaooo";
}

//class can implement interface & inherit class but NOT struct
public class Husky : DogFamily, IAnimal //Polygon
{
    public Husky() { } //constructor
    ~Husky() { } //destructor   
}

public struct Polygon
{
    //struct CANNOT have explicit parameterless constructor
    public Polygon(int sides) //a simple constructor
    {
        Sides = sides;
    }
    //can have properties, methods, events & indexers
    public int Sides { get; set; }
    //ONLY classes can have destructors, NOT structs
    //~Polygon() { }
}

//struct can only implement interface, CANNOT inherit class or struct
public struct Rectangle : IAnimal //Polygon
{
    public int Height { get; set; }
    public int Width { get; set; }
    //this is bad example, do not do in real code (impractical & breaks LSP)
    public string MakeNoise() => throw new System.NotImplementedException();
    //Struct DOES supports Deconstructor like class (C# 7.0)
    //needs System.ValueTuple NuGet for < .NET 4.6.2, .NET Core 1.x, .NET Standard 1.x
    public void Deconstruct(out int height, out int width) 
        => (height, width) = (Height, Width);
}

So, when should you use struct?

In .NET, generally classes are preferred. If you are not sure, go with a class.

But struct can have some benefits (e.g. performance & memory overhead) in some scenarios. Allocating and deallocating small piece of memory (allocating memory on heap, generating reference on stack etc.) and the process of garbage collection may be heavier for classes than using structs, which just keeps the whole thing on stack. Another scenario, e.g. creating a 10 item array with reference type creates 11 references, for the 10 items and one for the array. When the same is created for struct, only a single reference is created for the array which holds the whole chunk of data with 10 structs. But having heavy structs can choke the stack memory, and passing them around as parameters can be expensive as that copies the whole data every time.

Structs are specifically useful when there are large number of short lived, small piece of data and data that follows value type semantics, i.e. data that semantically represents a single value (e.g. a decimal number or a coordinate point), rather than a complex object (e.g. a person with height, weight, age, name, gender, job, address etc.). Another important point is, value type should equate when their values are same (default behaviour of Equals() in structs) and need not have same reference. (The string is an exception where it is a reference type but equality matches by value!)

An array of struct takes up one contiguous memory block with a single reference, making it more performant (not while passing around though). Also, a large size struct array will generally grow on LOH memory and bypass the GC compaction, thus further improving the performance. So, this also makes a good case for structs. There are some standard rules for when to consider struct over class:

  1. It logically represents a single value, similar to primitive types (e.g. a decimal number or a coordinate point) - i.e. it follows value semantics. This is probably the single most important factor deciding when to use struct
  2. It has an instance size smaller than 16 bytes (small, packing a few value type data)
  3. It is immutable (well, always depends on actual implementation)
  4. It will not have to be boxed frequently (not casted frequently, e.g. to an interface)

Note: Value types can be passed by-value or by-reference to a method. [1] When passing by-value, a copy of data is made and passed to the parameter. [2] When passed by reference, no copy is made and a reference to the original data is passed, so any change inside the called method reflects in the original data.

So we have 2 options to pass a value type, let’s look at the drawbacks of both

  1. By-value: The actual data is copied, so it can be expensive for large data
  2. By-reference: Not copied, but now the data is open for modification even if unintentional (remember, struct should be immutable)

The following bunch of new features released with C# 7.2 (and a bit with C# 7.3), has the same common theme - increasing efficiency of value types, by enabling them to be passed like reference, while keeping them immutable whenever possible. With these new capabilities of C#, we can have significant performance improvement in some scenarios specially with large value type data. Also they improve the readability of the language by making programming intent clearer. But, please note that all these should be used only in high performance scenarios. Using them for a standard applications that already uses lot of memory and resources, might be just an unnecessary micro-optimization.

In modifier on parameter

We already have ref and out modifiers on parameters, C# 7.2 introduces in.

It means that a parameter is passed by reference but cannot be modified by the calling method. Internally it works similar to ref parameters, but compiler does additional checks to see it was not modified in the method (beyond compiler, out and in are both ref with some additional restrictions). This is more useful for readability of code specifying the intent is to use as input parameter only, not to be modified. It also improves performance by the compiler not having to copy the value. Note that the variable must be initialized before calling the method, with that variable as parameter. Semantically, it, kind of, works as opposite of out.

//Standard use of ref and out parameters
//out parameter need not be initialized
public void Half(ref double value, out double result)
{
    var input = value; //read from ref parameter
    value = input / 2; //setting value to ref, not super intuitive but fine
    result = value; //setting value of out parameter, pretty clean
}

//Using in modifier to better represent intent
public double Square(in double value)
{
    //following line does not compile, as it tries to set value of in-parameter
    //value = value * value; //>>>> error: in double variable is readonly <<<<
    return value * value; //reading an in parameter is fine
}

Notes: [1]An argument that is passed to a ref or in parameter must be initialized before it is passed. This differs from out parameters, whose arguments do not have to be explicitly initialized before they are passed. They can also be defined inline. [2] Members of a class can’t have signatures that differ only by ref, in, or out. BUT, methods can be overloaded by using ref, in, or out when the other one has a simple value type parameter. (Remember: to the compiler, they all are ref) [3] async methods can NOT have ref, in, or out parameters. Same with iterator methods with yield. [4] When reference types are passed by-reference as parameter, the called method can completely replace the object by pointing the reference to a new location (assigning a new object). So now, the original variable in calling method will have a new object. This is not possible when reference type is passed by-value, only individual items can be updated in-place. If a new object is assigned to pass-by-value reference parameter inside called method, further changes to that object does not reflect in original object passed!

class ExampleClass
{
    public void M(int x) { }
    //This works (any ONE of following three is allowed)
    public void M(ref int x) { }
    //But the following 2 does NOT compile, if above is present
    //public void M(in int x) { }
    //public void M(out int x) => x = 10;

    //following methods do not work if parameter has ref or other modifiers
    public IEnumerable<int> GetNumbers(int x)
    {
        for (int i = 1; i <= x; i++)
            yield return i;
    }
    public async Task<int> GetDoubleAsync(int x)
    {
        await Task.Delay(100);
        return x * 2;
    }
}

Note: When calling a method with in parameter, the in keyword is optional. But, if 2 methods differ by only a in modifier in a parameter, then the in keyword must be used to call the method that has in parameter. (This was fixed in C# 7.3).

public void M1(in int x) { }
public void M2(int x) { }
public void M2(in int x) { } //in overload

public void TestIn()
{
    M1(10); //works fine
    //following is NOT allowed - only pass variable with `in`
    //M1(in 30);
    var n = 30;
    M1(in n); //this works

    M2(n); //calls first M2
    M2(in n); //calls second M2
}

//in parameters CAN have default values
public void M3(in int x = 100) { } //valid

For value types, readonly struct (see below) and in parameters play very well together. In fact, passing a non-readonly struct as in parameter almost always degrades the performance.

readonly struct

When a struct is defined with readonly modifier, the structure becomes immutable. Simply put, all it’s member fields & properties must be readonly. The this fields can be set only within a constructor. So, once initialized, the values cannot be updated (well, not without tricks like reflection). This was kind of a long due feature, developers wanted a clear way to create immutable value types, which was not there. This is useful for it’s clear readability of intent, but also very useful from the performance stand point as it saves the compiler from making a defensive copy of the data when passing around. Read this for better understanding.

Readonly structs should always be passed as in parameters to have better performance (by avoiding a defensive copy by the compiler).

public readonly struct Velocity
{
    //All fields & properties MUST BE READONLY
    private readonly string _direction;
    public double Speed { get; }
    public string Unit => "m/s";
    //readonly fileds/properties can only be set in constructor 
    public Velocity(double speed, string direction)
    {
        Speed = speed;
        _direction = direction;
    }
    //this is allowed too, only inside constructor 
    public Velocity(Velocity v)
    {
        this = v;
    }
}

//ideal usage with in parameters
public void TestReadonlyStruct(in Velocity velocity)
{
    var speed = velocity.Speed;
}

Note that there is generally no point creating an instance of readonly struct by calling the default constructor, as that will give you an instance with all properties/fields set to default values and you cannot change it!

ref struct

Adding the ref modifier to a struct declaration defines that instances of that type must be stack allocated. In other words, instances of these types can never be created on the heap as a member of another class, or they cannot be boxed, they cannot be assigned to variables of type object, dynamic or to any interface type. Obviously, they cannot implement an interface. There are bunch of rules that the compiler enforces on a ref struct.

//a ref struct looks like just a struct
public ref struct Temperature
{
    public int Value { get; set; }
    public int Unit { get; set; }
}

//usage
public void TestRefStruct()
{
    //velocity is a normal (readonly) struct defined above
    var readonlyStruct = new Velocity(10, "NW");
    object boxed = readonlyStruct; //struct can be boxed
    var refStruct = new Temperature(); //a ref struct <<<<
    //ref struct CANNOT be boxed
    //boxed = refStruct; //NOT ALLOWED
    //this also NOT ALLOWED, see below
    //var refStructString = refStruct.ToString();
}

class Class1 //a normal class
{
    public Velocity Velocity { get; set; } //struct property
    //NOT ALLOWED - ref struct CANNOT be property of a class
    //public Temperature Temperature { get; set; }
}

ref struct Weather //another ref struct
{
    //works: ref struct CAN be used as property
    //of another ref struct ONLY
    public Temperature Temperature { get; set; }
}

Note: A ref struct is also called an embedded reference. Because of the object casting not being allowed, you cannot call methods like ToSTring(), Equals(), GetHashCode() on an instance of a ref struct. The only use of ref struct variables are in method parameters or local variables. You can also have readonly ref struct which follows the rules from both the types.

ref readonly modifier on parameter

Can be beneficial when returning large structures, as it saves copying the value while maintaining immutability of returned data. So, when you do a ref readonly on a method, it returns a reference to the returned (ref return) item, but the calling method is not allowed to make any changes to it. It can only read the data from the returned reference.

The ref readonly kind of puts the same restrictions as in but on the returning side.

public struct Coordinate
{
    public Coordinate(int x, int y) //simple constructor
    {
        X = x;
        Y = y;
        _array = null;
    }
    private Coordinate[] _array; //fileds & properties
    public int X { get; set; }
    public int Y { get; set; }
    //just a simple method to set internal array
    public void SetCoordinates(Coordinate[] coordinates) => _array = coordinates;

    // >>>> REF READONLY <<<<
    public ref readonly Coordinate GetCoordinate(int x)
    {
        if (_array == null || x < 0 || x >= _array.Length)
            throw new ArgumentException();
        return ref _array[x]; //notice signature & return statement
    }

    public string GetItemsAsString() //just a utility to print the array
    {
        return _array == null ? string.Empty
            : string.Join(" - ", _array.Select(a => $"{a.X},{a.Y}"));
    }
}

public void TestRefReadonly()
{
    var structArr = new Coordinate[] { new Coordinate(1, 2), new Coordinate(2, 4), new Coordinate(3, 6) };
    var coordinate = new Coordinate();
    coordinate.SetCoordinates(structArr);

    //without `ref` it just passes by value (a copy)
    var coByVal = coordinate.GetCoordinate(1);
    coByVal.Y = 100; //coByVal = 2,100
    var originalData = coordinate.GetItemsAsString(); //1,2 - 2,4 - 3,6

    // >>>> REF READONLY actual usage SYNTAX <<<<
    //Actual use of ref readonly for readonly (immutable) reference
    ref readonly var coByRef = ref coordinate.GetCoordinate(1);
    //coByRef.Y = 100; >>>> this is NOT allowed <<<<
    var y = coByRef.Y; //coByRef = 2,4 - gets the value fine, as ref
}

Span<T> and Memory<T>

Though not part of C# 7.2 or any other language specification, these 2 types were added as core framework enhancement with new constructs like ref struct being available. The Span<T> is a ref struct that given a high performance, universal access to contiguous memory space across stack, managed heap or unmanaged heap. The Memory<T> is a wrapper over Span<T> that enables this to be used in iterator and async methods.

If you want immutable versions of these, there are System.ReadOnlySpan<T> and System.ReadOnlyMemory<T>. See Span<T> Struct and Memory<T> Struct documentation for details.

References

comments powered by Disqus