Compartir vía


Using the Microsoft Hive ODBC Driver in a .NET client

patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

One of the easiest ways to consume data from HDInsight in a custom .NET application is to use the System.Data.Odbc.OdbcConnection class with the Hive ODBC driver to query Hive tables in the HDInsight cluster. This approach enables programmers to use the same data access classes and techniques that are commonly used to retrieve data from relational database sources such as SQL Server.

Due to the typical latency when using ODBC to connect to HDInsight, and the time taken to execute the Hive query, you should use asynchronous programming techniques when opening connections and executing commands. To make this easier, the classes in the System.Data and System.Data.Odbc namespaces provide asynchronous versions of the most common methods.

The following code example shows a simple Microsoft C# console application that uses an ODBC data source name (DSN) to connect to HDInsight, execute a Hive query, and display the results. The example is deliberately kept simple by including the connection string in the code so that you can copy and paste it while you are experimenting with HDInsight. In a production system you must protect connection strings, as described in “Securing credentials in scripts and applications” in the Security section of this guide.

using System;
using System.Threading.Tasks;
using System.Data;
using System.Data.Odbc;
using System.Data.Common;

namespace HiveClient
{
  class Program
  {
    static void Main(string[] args)
    {
      GetData();
      Console.WriteLine("----------------------------------------------");
      Console.WriteLine("Press a key to end");
      Console.Read();
    }

    static async void GetData()
    {
      using (OdbcConnection conn = 
             new OdbcConnection("DSN=Hive;UID=user-name;PWD=password"))
      {
        conn.OpenAsync().Wait();
        OdbcCommand cmd = conn.CreateCommand();
        cmd.CommandText =
            "SELECT obs_date, avg(temp) FROM weather GROUP BY obs_date;";
        DbDataReader dr = await cmd.ExecuteReaderAsync();
        while (dr.Read())
        {
          Console.WriteLine(dr.GetDateTime(0).ToShortDateString()
                  + ": " + dr.GetDecimal(1).ToString("#00.00"));
        }
      }
    }
  }
}

The output from this example code is shown in Figure 1.

Figure 1 - Output retrieved from Hive using the OdbcConnection class

Figure 1 - Output retrieved from Hive using the OdbcConnection class

Next Topic | Previous Topic | Home | Community