Hadoop for .NET Developers
Well, it’s summer again and time for some new blog entries. This summer, I’ve had some time to dig into Hadoop and want to share some of the basics of storage and job processing from a .NET developer’s perspective.
Hadoop is an open-source platform written in Java. However, thanks to work by Hortonworks and Microsoft, those familiar with C#, VB.NET, or any other .NET language can now leverage the platform.
With this series of posts, my objective is to help you get started with .NET development on Hadoop. The explanations of Hadoop, as well as the samples and demonstrations, are purposefully simple but will provide you with an accessible starting point for your own exploration of the platform.
Please note, I will focus on storage and MapReduce, the core components of the Hadoop platform. In a later series, I will explore Hive and Pig for higher-level interaction with the data sets employed here.
NOTE: Anoop Madhusudanan has written quite a bit about the use of .NET with Hadoop. Check out his blog at https://www.amazedsaint.com/search/label/Hadoop for some very informative content.
- Understanding Hadoop
- Basic Architecture
- Setting Up a Desktop Development Environment
- Setting Up an Azure Cluster
- Obtaining the Sample Data Sets
- Understanding HDFS
- Manually Loading Data to Hadoop
- Programmatically Loading Data to HDFS
- Understanding Azure Vault Storage
- Programmatically Loading Data to AVS
- Understanding MapReduce
- Implementing a Simple MapReduce Job
- Understanding Hadoop Streaming
- Implementing a (Slightly) More Complex MapReduce Job
- Unit-Testing with the .NET SDK
- Troubleshooting with the MapReduce Job Logs
Comments
Anonymous
August 29, 2013
Really good piece of knowledge, I had come back to understand regarding your website from my friend Sumit, Hyderabad And it is very useful for who is looking for biginfosys.com/...
Anonymous
November 17, 2013
Thanks for the tutorial. These blogposts should be included in the Microsoft Jump Start program.