Navigation

Tuesday, September 28, 2010

LINQ – Language Integrated Query 101

This is a collection of information related to LINQ.  The primary purpose is to provide a reference for getting started with LINQ.

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.

Language-Integrated Query (LINQ) is an innovation introduced in Visual Studio 2008 and .NET Framework version 3.5 that bridges the gap between the world of objects and the world of data.

Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on. LINQ makes a query a first-class language construct in C# and Visual Basic. You write queries against strongly typed collections of objects by using language keywords and familiar operators. The following illustration shows a partially-completed LINQ query against a SQL Server database in C# with full type checking and IntelliSense support.

LINQ query with Intellisense

In Visual Studio you can write LINQ queries in Visual Basic or C# with SQL Server databases, XML documents, ADO.NET Datasets, and any collection of objects that supports IEnumerable or the generic IEnumerable(Of T) interface. LINQ support for the ADO.NET Entity Framework is also planned, and LINQ providers are being written by third parties for many Web services and other database implementations.

You can use LINQ queries in new projects, or alongside non-LINQ queries in existing projects. The only requirement is that the project target .NET Framework 3.5 or later.

A query is an expression that retrieves data from a data source. Queries are expressed in a dedicated query language. Over time, different languages have been developed for different types of data sources, for example, SQL for relational databases and XQuery for XML. This makes it necessary for the application developer to learn a new query language for each type of data source or data format that is supported.

Language-Integrated Query (LINQ) simplifies the situation by offering a consistent model for working with data across various kinds of data sources and formats. In a LINQ query, you are always working with objects. You use the same basic coding patterns to query and transform data in XML documents, SQL databases, ADO.NET datasets and entities, .NET Framework collections, and any other source or format for which a LINQ provider is available. This document describes the three phases of the creation and use of basic LINQ queries.

Three Stages of a Query Operation


LINQ query operations consist of three actions:

  1. Obtain the data source or sources.

  2. Create the query.

  3. Execute the query.

In LINQ, the execution of a query is distinct from the creation of the query. You do not retrieve any data just by creating a query. This point is discussed in more detail later in this topic.

The following example illustrates the three parts of a query operation. The example uses an array of integers as a convenient data source for demonstration purposes. However, the same concepts also apply to other data sources.

' Data source.
Dim numbers() As Integer = {0, 1, 2, 3, 4, 5, 6}

' Query creation.
Dim evensQuery = From num In numbers
                 Where num Mod 2 = 0
                 Select num

' Query execution.
For Each number In evensQuery
    Console.Write(number & " ")
Next

Output:

0 2 4 6

The Data Source


Because the data source in the previous example is an array, it implicitly supports the generic IEnumerable(Of T) interface. It is this fact that enables you to use an array as a data source for a LINQ query. Types that support IEnumerable(Of T), or a derived interface such as the generic IQueryable(Of T), are called queryable types.

As an implicitly queryable type, the array requires no modification or special treatment to serve as a LINQ data source. The same is true for any collection type that supports IEnumerable(Of T), including the generic List(Of T), Dictionary(Of TKey, TValue), and other classes in the .NET Framework class library.

If the source data does not already implement IEnumerable(Of T), a LINQ provider is needed to implement the functionality of the standard query operators for that data source. For example, LINQ to XML handles the work of loading an XML document into a queryable XElement type, as shown in the following example. For more information about standard query operators, see Standard Query Operators Overview.

' Create a data source from an XML document.
Dim contacts = XElement.Load("c:\myContactList.xml")

With LINQ to SQL, you first create an object-relational mapping at design time, either manually or by using the Object Relational Designer (O/R Designer). You write your queries against the objects, and at run-time LINQ to SQL handles the communication with the database. In the following example, customers represents a specific table in the database, and Table(Of TEntity) supports generic IQueryable(Of T).

' Create a data source from a SQL table.
Dim db As New DataContext("C:\Northwind\Northwnd.mdf")
Dim customers As Table(Of Customer) = db.GetTable(Of Customer)

For more information about how to create specific types of data sources, see the documentation for the various LINQ providers. (For a list of these providers, see Language-Integrated Query (LINQ).) The basic rule is simple: a LINQ data source is any object that supports the generic IEnumerable(Of T) interface, or an interface that inherits from it.

NoteNote

Types such as ArrayList that support the non-generic IEnumerable interface can also be used as LINQ data sources. For an example that uses an ArrayList, see How to: Query an ArrayList with LINQ.

The Query


In the query, you specify what information you want to retrieve from the data source or sources. You also have the option of specifying how that information should be sorted, grouped, or structured before it is returned. To enable query creation, Visual Basic has incorporated new query syntax into the language.

When it is executed, the query in the following example returns all the even numbers from an integer array, numbers.

' Data source.
Dim numbers() As Integer = {0, 1, 2, 3, 4, 5, 6}

' Query creation.
Dim evensQuery = From num In numbers
                 Where num Mod 2 = 0
                 Select num

' Query execution.
For Each number In evensQuery
    Console.Write(number & " ")
Next

The query expression contains three clauses: From, Where, and Select. The specific function and purpose of each query expression clause is discussed in Basic Query Operations (Visual Basic). For more information, see Queries (Visual Basic). Note that in LINQ, a query definition often is stored in a variable and executed later. The query variable, such as evensQuery in the previous example, must be a queryable type. The type of evensQuery is IEnumerable(Of Integer), assigned by the compiler using local type inference.

It is important to remember that the query variable itself takes no action and returns no data. It only stores the query definition. In the previous example, it is the For Each loop that executes the query.

Query Execution


Query execution is separate from query creation. Query creation defines the query, but execution is triggered by a different mechanism. A query can be executed as soon as it is defined (immediate execution), or the definition can be stored and the query can be executed later (deferred execution).

Deferred Execution

A typical LINQ query resembles the one in the previous example, in which evensQuery is defined. It creates the query but does not execute it immediately. Instead, the query definition is stored in the query variable evensQuery. You execute the query later, typically by using a For Each loop, which returns a sequence of values, or by applying a standard query operator, such as Count or Max. This process is referred to as deferred execution.

' Query execution that results in a sequence of values.
For Each number In evensQuery
    Console.Write(number & " ")
Next

' Query execution that results in a single value.
Dim evens = evensQuery.Count()

For a sequence of values, you access the retrieved data by using the iteration variable in the For Each loop (number in the previous example). Because the query variable, evensQuery, holds the query definition rather than the query results, you can execute a query as often as you want by using the query variable more than one time. For example, you might have a database in your application that is being updated continually by a separate application. After you have created a query that retrieves data from that database, you can use a For Each loop to execute the query repeatedly, retrieving the most recent data every time.

The following example demonstrates how deferred execution works. After evensQuery2 is defined and executed with a For Each loop, as in the previous examples, some elements in the data source numbers are changed. Then a second For Each loop runs evensQuery2 again. The results are different the second time, because the For Each loop executes the query again, using the new values in numbers.

Dim numberArray() = {0, 1, 2, 3, 4, 5, 6}

Dim evensQuery2 = From num In numberArray
                  Where num Mod 2 = 0
                  Select num

Console.WriteLine("Evens in original array:")
For Each number In evensQuery2
    Console.Write("  " & number)
Next
Console.WriteLine()

' Change a few array elements.
numberArray(1) = 10
numberArray(4) = 22
numberArray(6) = 8

' Run the same query again.
Console.WriteLine(vbCrLf & "Evens in changed array:")
For Each number In evensQuery2
    Console.Write("  " & number)
Next
Console.WriteLine()

Output:

Evens in original array:

0 2 4 6

Evens in changed array:

0 10 2 22 8

Immediate Execution

In deferred execution of queries, the query definition is stored in a query variable for later execution. In immediate execution, the query is executed at the time of its definition. Execution is triggered when you apply a method that requires access to individual elements of the query result. Immediate execution often is forced by using one of the standard query operators that return single values. Examples are Count, Max, Average, and First. These standard query operators execute the query as soon as they are applied in order to calculate and return a singleton result. For more information about standard query operators that return single values, see Aggregation Operations, Element Operations, and Quantifier Operations.

The following query returns a count of the even numbers in an array of integers. The query definition is not saved, and numEvens is a simple Integer.

Dim numEvens = (From num In numbers
                Where num Mod 2 = 0
                Select num).Count()

You can achieve the same result by using the Aggregate method.

Dim numEvensAgg = Aggregate num In numbers
                  Where num Mod 2 = 0
                  Select num
                  Into Count()

You can also force execution of a query by calling the ToList or ToArray method on a query (immediate) or query variable (deferred), as shown in the following code.

' Immediate execution.
Dim evensList = (From num In numbers
                 Where num Mod 2 = 0
                 Select num).ToList()

' Deferred execution.
Dim evensQuery3 = From num In numbers
                  Where num Mod 2 = 0
                  Select num
' . . .
Dim evensArray = evensQuery.ToArray()

In the previous examples, evensQuery3 is a query variable, but evensList is a list and evensArray is an array.

Using ToList or ToArray to force immediate execution is especially useful in scenarios in which you want to execute the query immediately and cache the results in a single collection object. For more information about these methods, see Converting Data Types.

You can also cause a query to be executed by using an IEnumerable method such as the IEnumerable.GetEnumerator method.

The standard query operators are the methods that form the Language-Integrated Query (LINQ) pattern. Most of these methods operate on sequences, where a sequence is an object whose type implements the IEnumerable(Of T) interface or the IQueryable(Of T) interface. The standard query operators provide query capabilities including filtering, projection, aggregation, sorting and more.

There are two sets of LINQ standard query operators, one that operates on objects of type IEnumerable(Of T) and the other that operates on objects of type IQueryable(Of T). The methods that make up each set are static members of the Enumerable and Queryable classes, respectively. They are defined as extension methods of the type that they operate on. This means that they can be called by using either static method syntax or instance method syntax.

In addition, several standard query operator methods operate on types other than those based on IEnumerable(Of T) or IQueryable(Of T). The Enumerable type defines two such methods that both operate on objects of type IEnumerable. These methods, Cast(Of TResult)(IEnumerable) and OfType(Of TResult)(IEnumerable), let you enable a non-parameterized, or non-generic, collection to be queried in the LINQ pattern. They do this by creating a strongly-typed collection of objects. The Queryable class defines two similar methods, Cast(Of TResult)(IQueryable) and OfType(Of TResult)(IQueryable), that operate on objects of type Queryable.

The standard query operators differ in the timing of their execution, depending on whether they return a singleton value or a sequence of values. Those methods that return a singleton value (for example, Average and Sum) execute immediately. Methods that return a sequence defer the query execution and return an enumerable object.

In the case of the methods that operate on in-memory collections, that is, those methods that extend IEnumerable(Of T), the returned enumerable object captures the arguments that were passed to the method. When that object is enumerated, the logic of the query operator is employed and the query results are returned.

In contrast, methods that extend IQueryable(Of T) do not implement any querying behavior, but build an expression tree that represents the query to be performed. The query processing is handled by the source IQueryable(Of T) object.

Calls to query methods can be chained together in one query, which enables queries to become arbitrarily complex.

The following code example demonstrates how the standard query operators can be used to obtain information about a sequence.

Dim sentence = "the quick brown fox jumps over the lazy dog"
' Split the string into individual words to create a collection.
Dim words = sentence.Split(" "c)

Dim query = From word In words
            Group word.ToUpper() By word.Length Into gr = Group
            Order By Length _
            Select Length, GroupedWords = gr

Dim output As New System.Text.StringBuilder
For Each obj In query
    output.AppendLine(String.Format("Words of length {0}:", obj.Length))
    For Each word As String In obj.GroupedWords
        output.AppendLine(word)
    Next
Next

'Display the output
MsgBox(output.ToString())

' This code example produces the following output:
'
' Words of length 3:
' THE
' FOX
' THE
' DOG
' Words of length 4:
' OVER
' LAZY
' Words of length 5:
' QUICK
' BROWN
' JUMPS

Query Expression Syntax


Some of the more frequently used standard query operators have dedicated C# and Visual Basic language keyword syntax that enables them to be called as part of a query expression. For more information about standard query operators that have dedicated keywords and their corresponding syntaxes, see Query Expression Syntax for Standard Query Operators.

Extending the Standard Query Operators


You can augment the set of standard query operators by creating domain-specific methods that are appropriate for your target domain or technology. You can also replace the standard query operators with your own implementations that provide additional services such as remote evaluation, query translation, and optimization. See AsEnumerable(Of TSource) for an example.

References;

Video on getting started with LINQ, ttp://msdn.microsoft.com/en-us/vbasic/bb735955.aspx

Video on writing LINQ, http://msdn.microsoft.com/en-us/library/bb820884(VS.90).aspx

Guy Burstein’s Blog on Like Operators, http://blogs.microsoft.co.il/blogs/bursteg/archive/2007/10/16/linq-to-sql-like-operator.aspx

Guy Burstein’s LINQ blog posts, http://blogs.microsoft.co.il/blogs/bursteg/archive/tags/LINQ/default.aspx

Microsoft LINQ Developer Center, http://msdn.microsoft.com/en-us/netframework/aa904594.aspx

Microsoft LINQ sorting, http://msdn.microsoft.com/en-us/library/bb546145.aspx

Microsoft LINQ filtering, http://msdn.microsoft.com/en-us/library/bb546161.aspx

Microsoft LINQ set operations, http://msdn.microsoft.com/en-us/library/bb546153.aspx

Microsoft LINQ quantifier operations, http://msdn.microsoft.com/en-us/library/bb546128.aspx

No comments: