Use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and Azure SQL Database

Learn how to use Apache Sqoop to import and export between an Apache Hadoop cluster in Azure HDInsight and Azure SQL Database or Microsoft SQL Server. The steps in this document use the sqoop command directly from the headnode of the Hadoop cluster. You use SSH to connect to the head node and run the commands in this document. This article is a continuation of Use Apache Sqoop with Hadoop in HDInsight.

Prerequisites

Set up

  1. Use ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
    
  2. For ease of use, set variables. Replace PASSWORD, MYSQLSERVER, and MYDATABASE with the relevant values, and then enter the commands below:

    export PASSWORD='PASSWORD'
    export SQL_SERVER="MYSQLSERVER"
    export DATABASE="MYDATABASE"
    
    
    export SERVER_CONNECT="jdbc:sqlserver://$SQL_SERVER.database.windows.net:1433;user=sqluser;password=$PASSWORD"
    export SERVER_DB_CONNECT="jdbc:sqlserver://$SQL_SERVER.database.windows.net:1433;user=sqluser;password=$PASSWORD;database=$DATABASE"
    

Sqoop export

From Hive to SQL.

  1. To verify that Sqoop can see your database, enter the command below in your open SSH connection. This command returns a list of databases.

    sqoop list-databases --connect $SERVER_CONNECT
    
  2. Enter the following command to see a list of tables for the specified database:

    sqoop list-tables --connect $SERVER_DB_CONNECT
    
  3. To export data from the Hive hivesampletable table to the mobiledata table in your database, enter the command below in your open SSH connection:

    sqoop export --connect $SERVER_DB_CONNECT \
    -table mobiledata \
    --hcatalog-table hivesampletable
    
  4. To verify that data was exported, use the following queries from your SSH connection to view the exported data:

    sqoop eval --connect $SERVER_DB_CONNECT \
    --query "SELECT COUNT(*) from dbo.mobiledata WITH (NOLOCK)"
    
    
    sqoop eval --connect $SERVER_DB_CONNECT \
    --query "SELECT TOP(10) * from dbo.mobiledata WITH (NOLOCK)"
    

Sqoop import

From SQL to Azure storage.

  1. Enter the command below in your open SSH connection to import data from the mobiledata table in SQL, to the wasbs:///tutorials/usesqoop/importeddata directory on HDInsight. The fields in the data are separated by a tab character, and the lines are terminated by a new-line character.

    sqoop import --connect $SERVER_DB_CONNECT \
    --table mobiledata \
    --target-dir 'wasb:///tutorials/usesqoop/importeddata' \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n' -m 1
    
  2. Alternatively, you can also specify a Hive table:

    sqoop import --connect $SERVER_DB_CONNECT \
    --table mobiledata \
    --target-dir 'wasb:///tutorials/usesqoop/importeddata2' \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n' \
    --create-hive-table \
    --hive-table mobiledata_imported2 \
    --hive-import -m 1
    
  3. Once the import has completed, enter the following command in your open SSH connection to list out the data in the new directory:

    hadoop fs -tail /tutorials/usesqoop/importeddata/part-m-00000
    
  4. Use beeline to verify that the table has been created in Hive.

    1. Connect

      beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
      
    2. Execute each query below one at a time and review the output:

      show tables;
      describe mobiledata_imported2;
      SELECT COUNT(*) FROM mobiledata_imported2;
      SELECT * FROM mobiledata_imported2 LIMIT 10;
      
    3. Exit beeline with !exit.

Limitations

  • Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to SQL doesn't support bulk inserts.

  • Batching - With Linux-based HDInsight, When using the -batch switch when performing inserts, Sqoop makes multiple inserts instead of batching the insert operations.

Important considerations

Next steps

Now you've learned how to use Sqoop. To learn more, see: