Hadoop on Windows
Welcome to today's Article Spotlight!
Check out the full version of the article here:
Hadoop-based Services For Windows
This blog post is a preview of the content in that article (you'll find 3-5 times more information in the TNWiki article). The article (and many others about Hadoop) is written by Wesley McSwain, SQL Server technical writer.
Hadoop Overview
Apache Hadoop is an open source software framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It consists of two primary components: Hadoop Distributed File System (HDFS), a reliable, distributed data store, and MapReduce, a parallel, distributed processing system. A Hadoop cluster can consist of a single node or thousands of nodes.
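To make the MapReduce programming model a bit more concrete, here is a minimal in-memory sketch of the classic word count. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop's actual API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example, and everything runs in a single Python process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical helper, single process)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run on many nodes at once and the framework would handle the grouping step; the sketch only shows how the three steps fit together.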
HDFS is the primary distributed storage used by Hadoop applications. As you load data into a Hadoop cluster, HDFS splits the data into blocks, creates multiple replicas of each block, and distributes them across the nodes of the cluster. Replication keeps the data available when a node fails, and spreading the blocks across nodes lets computations run in parallel, close to the data.
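The block splitting and replication described above can be sketched roughly as follows. This is a toy model under stated assumptions: the block size is a few bytes so the output is readable (real blocks are many megabytes), the node names are invented, and the round-robin placement is a simplification that stands in for HDFS's own placement policy. The functions `split_into_blocks` and `place_replicas` are hypothetical helpers, not part of any Hadoop library.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using simple round-robin.

    Assumption: round-robin is used here only for illustration and is not
    how a real cluster chooses where replicas go.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
    for block_id, holders in layout.items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on several nodes, losing any single node in this sketch still leaves at least two copies of each block.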
Getting Started with Hadoop-based Services for Windows
The links in this section provide information on deploying Apache Hadoop to Microsoft Windows Platforms. All these articles are on TechNet Wiki:
Link | Description |
---|---|
Getting Started with Hadoop-based Services for Windows | An overview of the Getting Started guides currently available. |
Getting Started a Hadoop cluster on the Elastic Map Reduce Portal. | A walkthrough for provisioning and using a temporary Hadoop cluster on the Elastic Map Reduce (EMR) Portal. |
Using Hadoop with other BI Technologies
This section contains information on using Hadoop with other BI technologies. All these articles are on TechNet Wiki:
Link | Description |
---|---|
How to Connect Excel to Hadoop on Azure via HiveODBC | Explains how to use Excel 2010 to access data in the Hive data warehouse running on Windows Azure by using the Hive ODBC Driver. |
How to Connect Excel PowerPivot to Hive on Azure via HiveODBC | Explains how to use PowerPivot to access data in the Hive data warehouse running on Windows Azure by using the Hive ODBC Driver. |
How To
This section contains a list of Hadoop-related how-to articles. All these articles are on TechNet Wiki:
Link | Description |
---|---|
Hadoop-based Services on Windows Azure How-Tos and FAQs | A collection of common How To topics along with FAQs. |
How to Run a Job on a Provisioned Hadoop on Windows Azure Cluster | Information about creating MapReduce jobs on a cluster that has been provisioned on the Elastic Map Reduce (EMR) portal. |
How To FTP Data to Hadoop on Windows Azure | A walkthrough for using FTPS to send file data to the cluster. |
How to create a mapper and reducer in C# (Hadoop Streaming) | A walkthrough for creating a mapper and reducer in C# using Hadoop Streaming. |
Use SQL Azure database as a Hive metastore | Information about using a SQL Azure database as a Hive metastore. |
Check out the article and add to it here (it's a lot bigger than the sections I featured in this blog post):
Hadoop-based Services For Windows
Jump on in. The Wiki is warm!
- User Ed
Comments
Anonymous
January 15, 2013
Hi Ed, I'm getting lots of "Resource Not Found" errors on the How To pages. Sandrino http://fabriccontroller.net
Anonymous
January 16, 2013
Sandrino, it was the Apache links that weren't working. I updated those. Thanks!