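Is there a .NET equivalent to Apache Hadoop?

So, I've been following Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.

My only minor problem is that I'm a C# developer and it's written in Java.

It's not so much that I don't understand Java as that I'm looking for a Hadoop.net or NHadoop, or some .NET project that embraces the Google MapReduce approach. Does anyone know of one?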

Have you looked at using Hadoop's streaming?

I use it in Python all the time :-).
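To give a rough idea of what a streaming job looks like, here is a minimal word-count sketch. It assumes only the basic streaming contract: the mapper and reducer read lines from stdin and write tab-separated key/value lines to stdout, and the reducer receives its input sorted by key. The function names and the `map`/`reduce` command-line argument are my own illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Relies on the input being sorted by key, which the streaming
    framework does between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: invoke as `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The same script would then be handed to the streaming jar as both the `-mapper` and the `-reducer` command. Since the contract is just stdin and stdout, a C# console application could play the same role.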

I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.

If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's just best to use an app written in another language and build the glue in the language of your preference.

There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/

Recently, MySpace released their .NET MapReduce framework, Qizmt, as Open Source, so this is also a potential contender in this space.

I would say that DryadLINQ is the closest thing that we .NET folk have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for the optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to manually build the partitions and distribute each partition.

That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.

The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T> you have to execute it on PartitionedTable<T> (the self-built distributed data structure).

What has really been cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations and DryadLINQ will take care of the whole distributed execution part. It's the most natural analog I've come across that makes writing code for distributed processing just like writing code for single-process processing.
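The idea of writing the query once and running it either locally or over hand-built partitions can be sketched roughly as follows. This is Python rather than C#, and it is not the DryadLINQ API: `run_partitioned` is a hypothetical stand-in for the distributed runtime, using a local thread pool where a real cluster would use separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(records):
    """The 'query': written once against any iterable of text lines."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def run_partitioned(query, partitions):
    """Hypothetical stand-in for a distributed runtime: run the query on
    each partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(query, partitions))
    return sum(partials, Counter())


lines = ["to be or not", "to be", "or not to be"]
partitions = [lines[:2], lines[2:]]  # partitions built by hand

local_result = count_words(lines)
distributed_result = run_partitioned(count_words, partitions)
```

Both calls produce the same counts. The merge step is trivial here only because counting is associative; a query that is not would need a more careful merge.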

It may be better to use Apache Hadoop with streaming, because Apache Hadoop is actively developed and maintained by major industry players such as Yahoo and Facebook, so it can be expected to do what you need.

If you need a solution in .NET, take a look at MySpace's implementation: MySpace Qizmt, MySpace's open source MapReduce framework.

Dryad/LINQ is being productized and will be released soon: http://blogs.technet.com/b/windowshpc/archive/2011/07/07/announcing-linq-to-hpc-beta-2.aspx. Use it in conjunction with Microsoft HPC for a powerful, cluster-based solution for querying unstructured data.

I answered your question in my question here

To repeat it here:

Microsoft dropped its alternative (Dryad) in favor of Hadoop. Next year they will release MS SQL Server 2012 with Hadoop integration. Azure and Windows Server support is being developed even as we speak.

It will be available in the first half of 2012.

Hadoop is the #1 big data platform and is going to be supported by both open source and proprietary software (Java, .NET, Python, ...); even Oracle is adopting it.

If you are planning to develop something on the .NET platform, you should wait.

More information about what is possible will be available here

Microsoft Research has a project called Daytona: http://research.microsoft.com/en-us/projects/daytona/

You can download it. There's a WordCount sample in C#.

You can look into something like RavenDB. It provides very decent MapReduce support for fairly large amounts of data, and because it is built in .NET, a proper LINQ client API is available.

http://ravendb.net/

To get you started, you can read my blog entry.

You can now use Hadoop directly from .NET. Microsoft has released an SDK to do so.

https://hadoopsdk.codeplex.com/

Of course, this means using the Java-based Hadoop network. But does it matter if the server is running in Java? I am sure someone may attempt to port it, but I don't think it would be a good idea: corporations are already backing the Java version, and I don't think a .NET port would get the same attention.

Microsoft is in the process of rolling out HDInsight, which is billed as their "100% Apache compatible Hadoop distribution."

It is available both on Windows Server and as a Windows Azure service.

Have a look at:

http://www.windowsazure.com/en-us/services/hdinsight/

It is an implementation of Hadoop for Azure and you can use .NET for accessing it.

As others have mentioned, DryadLINQ is a programming framework that allows developers to write LINQ queries and execute them on a cluster, in a similar manner to MapReduce. The DryadLINQ project has recently been released under the Apache license on GitHub, and the release includes support for running on YARN clusters (including Azure HDInsight clusters).

Internally, Microsoft have been using Cosmos. This has been made available outside Microsoft through Azure, under the names Azure Data Lake Analytics and Azure Data Lake Store. Azure Data Lake Analytics is a kind of YARN as a service, and Azure Data Lake Store is WebHDFS as a service. The first version of Azure Data Lake Analytics only hosts U-SQL, a language based on Transact-SQL and C#.
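For anyone unfamiliar with WebHDFS: it exposes file system operations as plain HTTP requests, which is why it can be called from any language, .NET included. Here is a rough sketch of how such a request URL is put together, following the general WebHDFS pattern of `/webhdfs/v1/<path>?op=<OPERATION>`. The host, path, and default port below are assumptions for illustration only; the actual endpoint of a given cluster or Azure service will differ.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, scheme="http", **params):
    """Build a WebHDFS-style REST URL:
    <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<params>

    The default port is an assumption; check your cluster's configuration.
    """
    query = urlencode({"op": op, **params})
    return f"{scheme}://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical host and path, for illustration only.
url = webhdfs_url("namenode.example.com", "/data/logs", "LISTSTATUS")
```

An HTTP GET on a URL like this one asks the service to list a directory; other operations follow the same pattern with a different `op` value.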