Saturday, January 7, 2012

- Garbage Collection

The CLR manages allocated objects via garbage collection. There is no delete keyword in C#, and programmers never deallocate a managed object from memory directly. .NET allocates all managed objects in a region of memory called the managed heap, where they are automatically destroyed by the garbage collector. The C# new keyword returns a reference to the object on the heap. If you declare the reference variable as a local variable in a method scope, it is stored on the stack for further use in your application. Consider the example below:

Shape referenceToShape = new Shape();
Reference to object on the managed heap
Keep in mind that heap allocation only occurs when you create instances of reference types such as classes. The garbage collector removes an object from the heap when it is unreachable by any part of your code.
When the C# compiler encounters the new keyword, it emits a CIL newobj instruction into the method implementation. Consider the screenshot below.
CIL of main method
The newobj instruction causes the CLR to perform the following operations:
  • Calculate the total amount of memory required for the object to be allocated.
  • Examine the managed heap to make sure there is enough room to host the object to be allocated.
  • Advance the next object pointer to point to the next available slot on the managed heap.
The .NET garbage collector compacts the empty blocks of memory left behind by a collection for optimization purposes. The managed heap maintains a pointer (the next object pointer or new object pointer) that identifies exactly where the next object will be located. A garbage collection occurs if the managed heap does not have sufficient memory to allocate a requested object.
Keep in mind that assigning a reference to null does not force the garbage collector to fire up at that exact moment and remove the object from the heap.
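The allocation steps and the next object pointer described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: ToyHeap, Allocate and NextObjectPointer are hypothetical names invented for illustration, not CLR APIs, and a real runtime would trigger a collection instead of returning -1.

```csharp
using System;

// Toy model of a managed heap with a "next object pointer".
// ToyHeap and its members are hypothetical, not part of the CLR.
public class ToyHeap
{
    private readonly int capacity;

    // Offset at which the next object would be placed.
    public int NextObjectPointer { get; private set; }

    public ToyHeap(int capacity)
    {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 when there is not
    // enough room (the point at which a real runtime would collect).
    public int Allocate(int sizeInBytes)
    {
        // Step 1: the caller has already calculated the required size.
        if (sizeInBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));

        // Step 2: make sure the heap has room for the object.
        if (sizeInBytes > capacity - NextObjectPointer)
            return -1;

        // Step 3: hand out the current slot and advance the pointer.
        int address = NextObjectPointer;
        NextObjectPointer += sizeInBytes;
        return address;
    }
}

public static class ToyHeapDemo
{
    public static void Main()
    {
        ToyHeap heap = new ToyHeap(64);
        Console.WriteLine(heap.Allocate(24)); // 0
        Console.WriteLine(heap.Allocate(24)); // 24
        Console.WriteLine(heap.Allocate(24)); // -1: only 16 bytes left
    }
}
```

Because allocation is just a bounds check and a pointer bump, it is cheap; the cost is paid later, when the collector has to compact the heap to keep the free space contiguous.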

Application roots:
A root is a storage location containing a reference to an object on the managed heap, which can fall into any of the following categories:
  • References to global objects (not allowed in C#)
  • References to any static objects / static fields
  • References to local objects within an application's code base
  • References to object parameters passed into a method
  • References to objects waiting to be finalized
  • Any CPU register that references an object
To check whether an object is still reachable by the application, the CLR builds an object graph, which represents each reachable object on the heap. Object graphs are used to document all reachable objects. The garbage collector never graphs the same object twice, so circular references between objects do not cause problems.
Good to know: the garbage collector makes use of two distinct heaps, one of which is specifically used to store very large objects. This heap is consulted less frequently during the collection cycle, given the possible performance penalties involved with relocating large objects.
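The idea of graphing each reachable object exactly once can be sketched as a small traversal. This is an illustration only, not how the CLR is implemented: Node and Reachability.Mark are hypothetical names, and a visited set stands in for whatever bookkeeping the runtime actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object that references other objects.
public class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();

    public Node(string name)
    {
        Name = name;
    }
}

public static class Reachability
{
    // Walks the graph from the roots. The visited set guarantees that
    // each object is graphed only once, even with circular references.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var reachable = new HashSet<Node>();
        var pending = new Stack<Node>(roots);

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!reachable.Add(current))
                continue; // already graphed

            foreach (Node child in current.References)
                pending.Push(child);
        }
        return reachable;
    }
}

public static class ReachabilityDemo
{
    public static void Main()
    {
        var a = new Node("A");
        var b = new Node("B");
        var c = new Node("C");
        var orphan = new Node("Orphan");

        a.References.Add(b);
        b.References.Add(c);
        c.References.Add(a);      // cycle back to A
        orphan.References.Add(a); // nothing references Orphan

        HashSet<Node> live = Reachability.Mark(new[] { a });
        Console.WriteLine(live.Count);            // 3
        Console.WriteLine(live.Contains(orphan)); // False
    }
}
```

Anything that does not end up in the reachable set (Orphan in this sketch) is what a collector would treat as garbage.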

Object Generations:
Each object on the heap is assigned to a specific generation. The idea behind generations is that the longer an object has existed on the heap, the more likely it is to stay there. An object on the heap can belong to one of the following generations:
  • Generation 0: A newly allocated object that has never been marked for collection.
  • Generation 1: An object that survived a garbage collection (it was marked for collection but was not removed because sufficient heap space was acquired).
  • Generation 2: An object that survived more than one garbage collection.
Generations 0 and 1 are termed ephemeral generations. The garbage collector investigates all generation 0 objects first. If marking and removing these objects frees the required amount of memory, any surviving objects are promoted to generation 1. However, if additional memory is still required, generation 1 objects are then investigated for reachability and collected accordingly. Surviving generation 1 objects are then promoted to generation 2. If additional memory is still required, generation 2 objects are evaluated.
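Promotion between generations can be observed with GC.GetGeneration(). The sketch below is only an experiment, and GenerationDemo and TrackGenerations are hypothetical names. An object that stays reachable is typically reported in a higher generation after each forced collection until it reaches GC.MaxGeneration, but the exact sequence depends on the runtime and its configuration, so no output is promised here.

```csharp
using System;
using System.Collections.Generic;

public static class GenerationDemo
{
    // Records the generation of a live object before and after
    // each of several forced collections.
    public static List<int> TrackGenerations(object survivor)
    {
        var seen = new List<int> { GC.GetGeneration(survivor) };

        for (int i = 0; i <= GC.MaxGeneration; i++)
        {
            GC.Collect();
            seen.Add(GC.GetGeneration(survivor));
        }
        return seen;
    }

    public static void Main()
    {
        object survivor = new object();
        List<int> generations = TrackGenerations(survivor);
        Console.WriteLine(string.Join(" -> ", generations));

        // Keep the object reachable until this point.
        GC.KeepAlive(survivor);
    }
}
```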

Prior to .NET 4.0, garbage collection used a technique termed concurrent garbage collection. Under this model, when a collection takes place for any ephemeral generation (0 or 1), the garbage collector temporarily suspends all active threads within the current process to ensure that the application does not access the managed heap during the collection process. On the other hand, this model allowed objects belonging to generation 2 to be cleaned up on a dedicated thread, which reduced the suspension of active threads and allowed the program to continue allocating objects on the heap during the collection of non-ephemeral generations.

.NET 4.0 uses a new technique termed background garbage collection. In this model, if a background garbage collection is taking place for objects living in generation 2, the .NET runtime is now able to collect objects in the ephemeral generations (0 or 1) using a dedicated background thread, which reduces the amount of time a given thread involved with garbage collection details must be suspended.

System.GC:
This class allows you to interact with the garbage collector programmatically using a set of static members, which is especially useful when your types make internal use of unmanaged resources. Some of the members are listed below:
  • AddMemoryPressure() and RemoveMemoryPressure(): Specify a numerical value (the number of bytes of unmanaged memory involved) that represents the calling object's "urgency level" regarding the garbage collection process. These methods should alter pressure in tandem, and thus never remove more pressure than the total amount you have added.
  • Collect(): Forces the GC to perform a garbage collection. Overloads of this method let you specify a generation to collect, as well as the mode of collection (via the GCCollectionMode enum).
  • CollectionCount(): Returns the number of times a given generation has been swept.
  • GetGeneration(): Returns the generation to which an object currently belongs.
  • GetTotalMemory(): Returns the estimated amount of memory (in bytes) currently allocated on the managed heap. A Boolean parameter specifies whether the call should wait for a garbage collection to occur before returning.
  • MaxGeneration: Returns the highest generation number supported on the target system. Generations are numbered from zero, so the number of generations is MaxGeneration + 1.
  • SuppressFinalize(): Sets a flag indicating that the specified object should not have its Finalize() method called.
  • WaitForPendingFinalizers(): Suspends the current thread until all finalizable objects have been finalized. This method is typically called directly after invoking GC.Collect(). 
Let's go through an example:

Shape referenceToShape = new Shape();
Console.WriteLine("Number of bytes on heap: {0}", GC.GetTotalMemory(false));
Console.WriteLine("Max Number of generations : {0}", GC.MaxGeneration + 1);
Console.WriteLine("referenceToShape is in {0} generation",  GC.GetGeneration(referenceToShape));
Garbage Collection example output
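The AddMemoryPressure() and RemoveMemoryPressure() members from the list above are meant to be used as a pair. The following is a minimal sketch of that pairing, assuming a hypothetical NativeBuffer wrapper that owns a block of unmanaged memory obtained with Marshal.AllocHGlobal(). A finalizer is left out for brevity, so this version relies on the caller invoking Dispose().

```csharp
using System;
using System.Runtime.InteropServices;

// NativeBuffer is a hypothetical wrapper around unmanaged memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;
    private readonly long size;

    public NativeBuffer(long sizeInBytes)
    {
        size = sizeInBytes;
        handle = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));

        // Tell the GC that this small managed object keeps a larger
        // amount of unmanaged memory alive.
        GC.AddMemoryPressure(size);
    }

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public void Dispose()
    {
        if (handle == IntPtr.Zero)
            return; // already released

        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;

        // Remove exactly the amount of pressure that was added.
        GC.RemoveMemoryPressure(size);
    }
}

public static class MemoryPressureDemo
{
    public static void Main()
    {
        using (NativeBuffer buffer = new NativeBuffer(1024 * 1024))
        {
            Console.WriteLine("Buffer disposed: {0}", buffer.IsDisposed);
        }
    }
}
```

Storing the size in a field keeps the two calls symmetric, which is the simplest way to make sure the type never removes more pressure than it added.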
    Garbage Collection Notifications:
    • RegisterForFullGCNotification(): Registers for a notification to be raised when the runtime senses that a full collection is approaching.
    • WaitForFullGCApproach(): Determines whether a notification of an approaching full collection has been raised.
    • WaitForFullGCComplete(): Determines whether a notification of a completed full collection has been raised.
    For more information and further examples about "Garbage Collection Notification", you may refer to the following links: Garbage Collection Notification, Garbage Collection and Performance, Garbage Collection Notification in .NET 4.0.
    Garbage collection is usually forced using GC.Collect() in the cases below:
    • You are about to enter a block of code that you do not want interrupted by a garbage collection.
    • You have just finished allocating a large number of objects and wish to free some memory as soon as possible.
      //force GC
      GC.Collect();
      //wait for each object to be finalized
      GC.WaitForPendingFinalizers();

      Collect() can receive two parameters:
      • generation (int): specifies the oldest generation to collect.
      • mode (GCCollectionMode): specifies the mode of the garbage collection, using one of the following values of the GCCollectionMode enumeration:
        • Default: Currently equivalent to Forced.
        • Forced: Collects immediately.
        • Optimized: Allows the runtime to determine whether the current time is optimal to reclaim objects.
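As a small illustration of the two parameters, the sketch below forces a generation 0 collection and uses CollectionCount() to see how many generation 0 sweeps happened during the call. CollectDemo and CollectGenerationZero are hypothetical names. The count is expected to grow by at least one after a forced collection, but other collections may run in the meantime, so no exact figure is shown.

```csharp
using System;

public static class CollectDemo
{
    // Forces a generation 0 collection and returns how many
    // generation 0 sweeps were recorded while doing so.
    public static int CollectGenerationZero()
    {
        int before = GC.CollectionCount(0);

        GC.Collect(0, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Generation 0 sweeps: {0}", CollectGenerationZero());

        // With Optimized, the runtime decides whether collecting
        // right now is worthwhile, so this call may do nothing.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Optimized);
    }
}
```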
      Finalize Method:
      As you may already know, Finalize() is a virtual method of the System.Object base class. In your custom class you can override this method to provide a place for any cleanup logic your type needs. The Finalize() method is protected, and the GC calls it before removing the object from memory. In addition, when an AppDomain is unloaded from memory, the CLR automatically invokes the finalizer of every finalizable object created during its lifetime.
      Because structures are value types, which are not garbage-collected objects in their own right, it is illegal to override Finalize() on structure types.
      Keep in mind that you need to override the Finalize() method when you are using unmanaged resources (e.g. raw OS file handles, raw unmanaged database connections, chunks of unmanaged memory, ...).
      Unmanaged resources can be obtained by calling directly into the operating system API through PInvoke (Platform Invocation Services) or through complex COM interoperability tasks.
      Overriding Finalize() is not like overriding other virtual methods with the override keyword. Instead, you write a member that looks like a constructor with a '~' prefix, similar to a C++ destructor. A finalizer does not take an access modifier (it is implicitly protected) and never takes parameters or returns a value. Example:

      class Shape
      {
          ~Shape()
          {
              //free unmanaged resources
          }
          //...
      }

      Finalization takes time because of the following process:
      If an object has a Finalize() method, it is marked as finalizable and a reference to it is stored in the finalization queue, which is a table maintained by the GC. When the GC is about to free an object from memory, it examines each entry in the finalization queue and copies the object off the heap to yet another managed structure termed the finalization reachable table (aka F-Reachable). A separate thread is spawned to invoke the Finalize() method of each object in the F-Reachable table at the next GC. (It therefore takes at least two garbage collections to truly finalize an object.)

      IDisposable:
      As an alternative to the Finalize() method, you may implement the IDisposable interface in order to release unmanaged resources as soon as possible instead of relying on a garbage collection to occur. However, the object user needs to call the Dispose() method manually. (Both structures and class types can implement the IDisposable interface.) Example:

      //Implementing IDisposable.
      class Shape: IDisposable
      {
          //the object user should call this method when finished with the object
          public void Dispose()
          {
              //clean up unmanaged resources...
              //Also dispose other contained disposable objects...
          }
          //...
      }

      To make sure the Dispose() method is called in all cases, including when an exception is thrown, it is better to place the Dispose() call in a finally block, or alternatively to use the C# 'using' keyword, which calls Dispose() automatically when execution leaves the using scope (both are equivalent; 'using' is shorthand for try/finally). If the caller forgets to call Dispose(), the unmanaged resources may be held in memory indefinitely. Example:

      //Option 1: call Dispose() explicitly in a finally block
      Shape my_shape = new Shape();
      try
      {
          //...
      }
      finally
      {
          my_shape.Dispose();
      }

      //OR

      //Option 2: Dispose() is called automatically
      //after finishing the using scope
      using (Shape my_shape = new Shape())
      {
          //...
      }
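To see that a using scope really behaves like try/finally, the sketch below throws an exception inside the scope and then checks whether Dispose() ran. TrackedResource and UsingDemo are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical disposable type that remembers whether it was disposed.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public static class UsingDemo
{
    // Throws inside a using scope and reports whether Dispose()
    // was still called on the way out.
    public static bool DisposedAfterException()
    {
        TrackedResource resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("Something went wrong");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose; only the disposal is of interest.
        }
        return resource.Disposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedAfterException()); // True
    }
}
```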

      If your class does not implement IDisposable, you will encounter the compile error below:
      ...type used in a using statement must be implicitly convertible to 'System.IDisposable'
      Note that the 'using' keyword is also used for importing namespaces.
      Below you can see the syntax for declaring multiple objects of the same type within a using scope:

      //syntax to declare multiple objects of the same type
      using (Shape my_shape = new Shape(),
                   my_shape2 = new Shape())
      {
          //...   
      }

      To make sure unmanaged resources are cleaned up, it is good practice to both override Finalize() and implement Dispose() in a single class, because if the object user forgets to call Dispose(), the GC will eventually take care of the unmanaged resources through the finalizer. In addition, you should call GC.SuppressFinalize() at the end of the Dispose() method so that the GC bypasses the finalization process and the cleanup does not run twice.

      GC.SuppressFinalize(this);

      Formalized Disposal Pattern:
      To avoid duplicate code, it is better to define a helper function that is called by both Finalize() and Dispose(). In addition, managed objects should be disposed of by the Dispose() method, not by Finalize(). Above all, we need to make sure the user can safely call Dispose() multiple times without error. Here is an example of the formalized disposal pattern:

      class Shape: IDisposable
      {
          //to check whether Dispose has already been called.
          private bool disposed = false;

          public void Dispose()
          {
              //true shows object user triggered the cleanup.
              CleanUp(true);
              GC.SuppressFinalize(this);
          }

          private void CleanUp(bool disposing)
          {
              if (!this.disposed)
              {
                  if(disposing)
                  {
                      //Dispose managed resources.
                  }
                  //clean up unmanaged resources.
              }
              disposed = true;
          }

          ~Shape()
          {
              //false shows GC triggered the cleanup
              CleanUp(false);
          }
      }
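The requirement that Dispose() can be called several times without error can be checked with a small variation of the pattern. TrackedShape and its CleanUpRuns counter are hypothetical additions, present only so that the effect is observable; otherwise the structure follows the pattern above.

```csharp
using System;

// Variation of the disposal pattern with a counter (hypothetical)
// that records how many times the cleanup logic actually ran.
public class TrackedShape : IDisposable
{
    private bool disposed = false;

    public int CleanUpRuns { get; private set; }

    public void Dispose()
    {
        CleanUp(true);
        GC.SuppressFinalize(this);
    }

    private void CleanUp(bool disposing)
    {
        if (!disposed)
        {
            if (disposing)
            {
                // Dispose managed resources here.
            }
            // Clean up unmanaged resources here.
            CleanUpRuns++;
            disposed = true;
        }
    }

    ~TrackedShape()
    {
        CleanUp(false);
    }
}

public static class DisposePatternDemo
{
    public static void Main()
    {
        TrackedShape shape = new TrackedShape();
        shape.Dispose();
        shape.Dispose(); // second call is a harmless no-op
        Console.WriteLine(shape.CleanUpRuns); // 1
    }
}
```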

      Lazy Object Instantiation:
      Sometimes a class has a member variable that is never needed by the object user (the object user never calls a method or property that makes use of the variable). This is troublesome if the unused object requires a large amount of memory (e.g. an array with 90000 elements), which may put a good deal of stress on the GC. One way to deal with the problem is to use a factory method that creates the variable only when it is needed. Alternatively, .NET 4.0 provides the generic Lazy<> class, which lets you define data that will not be created unless your code base actually makes use of it. Consider the following example:

      class Color
      {
          public int R { set; get; }
          public int G { set; get; }
          public int B { set; get; }
      }

      class CustomColors
      {

          public CustomColors()
          {
              //fill up the allColors array
              Console.WriteLine("Fill up allColors array of CustomColors");
          }
          private Color[] allColors = new Color[90000];
      }

      class Shape
      {
          private Lazy<CustomColors> custom_color = new Lazy<CustomColors>();

          public CustomColors GetCustomColors()
          {
              return custom_color.Value;
          }
      }

      static void Main(string[] args)
      {
          Shape shape_1 = new Shape();
          //CustomColors will be created after calling GetCustomColors();
          //if the object user never calls GetCustomColors() on the Shape instance,
          //the CustomColors object will not be created
          shape_1.GetCustomColors();
      }

      When you declare a Lazy<> variable this way, the wrapped type is created using its default constructor. In other words, the default constructor is called the first time the value of the Lazy<> variable is used.
      The Lazy<> constructor also takes an optional parameter that lets you specify a generic delegate pointing to the method to call when the wrapped type is created. The delegate type is System.Func<>. For Lazy<> this is a parameterless Func<> that returns the same data type being wrapped by the Lazy<> variable (the Func<> family in general can take up to 16 arguments). Example:

      private Lazy<CustomColors> custom_color = new Lazy<CustomColors>(() =>
          {
              Console.WriteLine("In lambda expression section");
              //you may call the custom constructor here as well
              return new CustomColors();
          });
      Lazy<> example output
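The deferred creation can also be verified in code through the IsValueCreated property of Lazy<>. The sketch below uses a hypothetical ExpensiveData type with a static counter, purely so that the number of constructor calls is visible.

```csharp
using System;

// Hypothetical type that counts how many instances were constructed.
public class ExpensiveData
{
    public static int InstancesCreated;

    public ExpensiveData()
    {
        InstancesCreated++;
    }
}

public static class LazyDemo
{
    public static void Main()
    {
        var lazy = new Lazy<ExpensiveData>(() => new ExpensiveData());

        // Nothing has been constructed yet.
        Console.WriteLine(lazy.IsValueCreated); // False

        // The first access runs the factory; later accesses reuse the result.
        ExpensiveData first = lazy.Value;
        ExpensiveData second = lazy.Value;

        Console.WriteLine(lazy.IsValueCreated);                   // True
        Console.WriteLine(object.ReferenceEquals(first, second)); // True
        Console.WriteLine(ExpensiveData.InstancesCreated);        // 1
    }
}
```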

      Reference: Pro C# 2010 and the .NET 4 Platform by Andrew Troelsen.
