What is Azure HDInsight?
Azure HDInsight is a fully managed, open source big data analytics service that runs popular open source frameworks such as Kafka, Storm, R, Spark, Hive, HBase, Phoenix, LLAP, Sqoop, Oozie and Hadoop.
It is built on 100% Apache open source components, with no lock-in. Because Microsoft does not use any proprietary code in HDInsight, customers can move freely between on-premises environments, Azure and other clouds.
Manageability & Operations:
• Fully managed service with a 99.9% availability SLA [industry’s best SLA]
• Highly optimized for performance with the default configuration [no fine-tuning required for great performance]
• Customers have full control over the cluster, and a wide range of customizations is supported
• Cluster scaling via Scale API
• Microsoft monitors the underlying cluster infrastructure and the open source services running on the cluster, and automatically fixes issues through its healing infrastructure
• Integrated with Azure Log Analytics for log management and integrated dashboards
• High availability configurations such as multiple head nodes [if a master node goes down, your jobs keep running]
• Proven performance at large scale: https://azure.microsoft.com/en-us/blog/hdinsight-interactive-query-performance-benchmarks-and-integration-with-power-bi-direct-query/
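As an illustration of the "Cluster scaling via Scale API" point above, here is a minimal sketch that only builds (and does not send) a resize request for the worker nodes of a cluster. It assumes the Azure Resource Manager resize endpoint for HDInsight clusters; the API version string, the function name build_resize_request and all identifiers passed to it are assumptions for illustration, so check them against the current REST reference before use. Sending the request would also require an Azure AD bearer token, which is left out here.

```python
import json

# Assumed API version; verify against the current HDInsight REST reference.
API_VERSION = "2021-06-01"


def build_resize_request(subscription_id, resource_group, cluster_name,
                         target_workers, role="workernode"):
    """Return (method, url, body) for a cluster resize call.

    Hypothetical helper: it only assembles the request and never contacts Azure.
    """
    if target_workers < 1:
        raise ValueError("target_workers must be at least 1")
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.HDInsight/clusters/{cluster_name}"
        f"/roles/{role}/resize?api-version={API_VERSION}"
    )
    # The request body carries the desired number of nodes for the role.
    body = json.dumps({"targetInstanceCount": target_workers})
    return "POST", url, body


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    method, url, body = build_resize_request("sub-id", "my-rg", "my-cluster", 8)
    print(method, url)
    print(body)
```

Keeping request construction separate from sending makes it easy to review the target node count before anything changes on a live cluster.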
Enterprise Security:
• VNET and Network Access Control support for perimeter security
• Option for customers to bring their own firewalls
• Active Directory support for multi user configuration
• Role-based isolation and access control for table-, column- and row-level data via Apache Ranger
• Auditing of all access attempts
• Ranger support for multiple open source frameworks such as Hive, Spark and LLAP
• Encryption at rest and in transit
• Comprehensive set of compliance certifications
• Available in the Government Cloud
Data Storage:
• Supports Azure Data Lake Store, Azure Hot Storage and Azure Cool Storage for data storage
• Shared managed services such as Hive Metastore, Ranger Database, Oozie Database & Sqoop Database
Tooling:
• 100% open source data science tools such as Zeppelin, Jupyter and RStudio
• Best-in-class Spark debugging support in IntelliJ
• First-class support for Eclipse, Visual Studio, Visual Studio Code, Power BI and DBeaver
• Native HBase and Phoenix REST SDKs
• Tez View, Grafana and Hive View for monitoring and debugging Hive queries
• Cluster orchestration with PowerShell, Azure SDK, ARM templates or Azure Data Factory
• Rich curated marketplace with one-click deployment of the most popular big data applications
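To give a feel for the ARM template option mentioned in the orchestration bullet above, the sketch below assembles an abridged cluster resource as a Python dictionary and prints it as JSON. It is not a deployable template: gateway credentials, the storage profile and the OS profile are deliberately omitted, and the API version, cluster version, VM size and the helper name hdinsight_cluster_resource are assumptions chosen for illustration.

```python
import json


def hdinsight_cluster_resource(name, location, kind, worker_count, vm_size):
    """Build an abridged Microsoft.HDInsight/clusters resource (hypothetical helper).

    Omits credentials, storage and OS profiles, so it cannot be deployed as-is.
    """
    def role(role_name, count):
        return {
            "name": role_name,
            "targetInstanceCount": count,
            "hardwareProfile": {"vmSize": vm_size},
        }

    return {
        "type": "Microsoft.HDInsight/clusters",
        "apiVersion": "2021-06-01",   # assumed; check the current template reference
        "name": name,
        "location": location,
        "properties": {
            "clusterVersion": "4.0",  # assumed version string
            "osType": "Linux",
            "clusterDefinition": {"kind": kind},
            "computeProfile": {
                "roles": [
                    role("headnode", 2),            # two head nodes for high availability
                    role("workernode", worker_count),
                ]
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values, not a recommendation.
    resource = hdinsight_cluster_resource(
        "my-cluster", "eastus", "spark", worker_count=4, vm_size="Standard_D12_v2"
    )
    print(json.dumps(resource, indent=2))
```

Generating the resource from a small function like this is one way to keep node counts and sizes as parameters while the rest of the template stays fixed.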
Cost:
• HDInsight is the most cost-effective solution in its category
• Per-minute billing, and no additional support services are required for open source components
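The effect of per-minute billing can be shown with a little arithmetic. The sketch below compares a charge prorated by the minute with a hypothetical model that rounds usage up to whole hours. The hourly rate is a made-up placeholder, not an HDInsight price, and both function names are illustrative.

```python
import math


def per_minute_cost(nodes, hourly_rate_per_node, minutes):
    """Cost when usage is prorated by the minute."""
    return nodes * hourly_rate_per_node * minutes / 60


def whole_hour_cost(nodes, hourly_rate_per_node, minutes):
    """Cost under a hypothetical model that rounds usage up to full hours."""
    return nodes * hourly_rate_per_node * math.ceil(minutes / 60)


if __name__ == "__main__":
    # Placeholder figures: 10 nodes at 0.50 per node-hour, running for 95 minutes.
    nodes, rate, minutes = 10, 0.50, 95
    print(f"per-minute billing: {per_minute_cost(nodes, rate, minutes):.2f}")
    print(f"whole-hour billing: {whole_hour_cost(nodes, rate, minutes):.2f}")
```

With these placeholder figures the per-minute charge is about 7.92, against 10.00 when the same 95 minutes are rounded up to two hours. The gap matters most for short-lived clusters that are created for a job and deleted afterwards.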
Use cases:
• ETL/Batch [MR, Pig, Hive, Spark]
• Interactive Exploration [Hive, LLAP, Spark SQL]
• Data Science & Machine Learning [R, Spark ML]
• Streaming [Kafka --> Storm/Spark Streaming --> HBase]
• Lift & Shift [HDP & Cloudera migrations to Azure]