Using LINQ to Hive in a .NET client

patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

Language-Integrated Query (LINQ) provides a consistent syntax for querying data sources in a .NET application. Many .NET developers use LINQ to write object-oriented code that retrieves and manipulates data from a variety of sources, taking advantage of type checking and IntelliSense as they do so.

LINQ to Hive is a component of the .NET SDK for HDInsight that lets developers write LINQ queries against Hive tables. This means they can consume data from Hive in the same consistent way as they consume data from other sources.

The following code example shows how you can use LINQ to Hive to retrieve data from a Hive table in a C# application. The example is deliberately kept simple by including the credentials in the code so that you can copy and paste it while you are experimenting with HDInsight. In a production system you must protect credentials, as described in “Securing credentials in scripts and applications” in the Security section of this guide.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Hadoop.Hive;

namespace LinqToHiveClient
{
  class Program
  {
    static void Main(string[] args)
    {
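      // Connect to the cluster through its WebHCat endpoint, and supply the
      // name and key of the Azure storage account linked to the cluster.
      var db = new HiveDatabase(
          webHCatUri: new Uri("https://mycluster.azurehdinsight.net"),
          username: "user-name", password: "password",
          azureStorageAccount: "storage-account-name.blob.core.windows.net",
          azureStorageKey: "storage-account-key");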

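      // Select the date and temperature fields, group the rows by date,
      // and compute the average temperature for each date.
      var q = from x in
                (from o in db.Weather
                 select new { o.obs_date, temp = o.temperature })
              group x by x.obs_date into g
              select new { obs_date = g.Key, temp = g.Average(t => t.temp) };

      // Submit the query to the cluster and wait for it to complete.
      q.ExecuteQuery().Wait();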

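      // Read the results and display the average temperature for each date.
      var results = q.ToList();
      foreach (var r in results)
      {
        Console.WriteLine(r.obs_date.ToShortDateString() + ": "
                          + r.temp.ToString("#00.00"));
      }
      Console.WriteLine("---------------------------------");
      Console.WriteLine("Press a key to end");
      // ReadKey returns after a single key press; Console.Read would wait for Enter.
      Console.ReadKey();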
    }
  }

  // Abstraction for the Hive data source; each property exposes a queryable table.
  public class HiveDatabase : HiveConnection
  {
    public HiveDatabase(Uri webHCatUri, string username, string password,  
                        string azureStorageAccount, string azureStorageKey)
        : base(webHCatUri, username, password,
               azureStorageAccount, azureStorageKey) { }

    public HiveTable<WeatherRow> Weather
    {
      get
      {
        return this.GetTable<WeatherRow>("Weather");
      }
    }
  }

  // Represents one row of the Weather table; each property holds one field.
  public class WeatherRow : HiveRow
  {
    public DateTime obs_date { get; set; }
    public string obs_time { get; set; }
    public string day { get; set; }
    public float wind_speed { get; set; }
    public float temperature { get; set; }
  }
}

Notice that the code includes a class named HiveDatabase that inherits from HiveConnection, which provides an abstraction for the Hive data source. This class exposes the tables that can be queried as properties (in this case, a single table named Weather). Each table is a collection of objects that represent its rows, and each row is implemented as a class that inherits from HiveRow (here, WeatherRow). In this case, each row from the Weather table contains the following fields:

  • obs_date
  • obs_time
  • day
  • wind_speed
  • temperature

The query in this example first selects the obs_date and temperature fields. It then groups the rows by obs_date and returns the average temperature for each date. The output from this example code is shown in Figure 1.

Figure 1 - Output retrieved using LINQ to Hive

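You can try out the grouping and averaging that the query performs locally with LINQ to Objects, without connecting to a cluster. The following is a minimal sketch, not part of the HDInsight SDK. The WeatherObservation record, the WeatherQueries class, and the sample rows are hypothetical stand-ins for the Weather table. The program runs in memory rather than on the cluster and needs C# 9 or later for the record type. It expresses the same query in method syntax and adds an OrderBy, so that the output order does not depend on the order of the input rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row of the Hive Weather table.
// Only the two fields used by the query are modelled here.
public record WeatherObservation(DateTime obs_date, float temperature);

public static class WeatherQueries
{
    // Method-syntax form of the LINQ to Hive query: select the date and
    // temperature, group by obs_date, and average the temperature per date.
    // The OrderBy call is an addition that sorts the results by date.
    public static List<(DateTime obs_date, float temp)> AverageByDate(
        IEnumerable<WeatherObservation> rows)
    {
        return rows
            .Select(o => new { o.obs_date, temp = o.temperature })
            .GroupBy(x => x.obs_date)
            .Select(g => (obs_date: g.Key, temp: g.Average(t => t.temp)))
            .OrderBy(r => r.obs_date)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data: two readings on one date, one on the next.
        var rows = new List<WeatherObservation>
        {
            new WeatherObservation(new DateTime(2013, 5, 1), 10.0f),
            new WeatherObservation(new DateTime(2013, 5, 1), 12.0f),
            new WeatherObservation(new DateTime(2013, 5, 2), 15.5f),
        };

        foreach (var r in WeatherQueries.AverageByDate(rows))
        {
            // Same output format as the LINQ to Hive example.
            Console.WriteLine(r.obs_date.ToShortDateString() + ": "
                              + r.temp.ToString("#00.00"));
        }
    }
}
```

With this sample data the program prints one line per date, with averages of 11 for the first date and 15.5 for the second. The date format and decimal separator depend on the current culture.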
