By Tony Baer | February 27, 2012 02:35 AM EST
To date, Big Storage has been locked out of Big Data. It’s been all about direct attached storage for several reasons. First, Advanced SQL players have typically optimized their architectures through data structure (columnar storage), unique compression algorithms, and liberal use of caching to juice response times over hundreds of terabytes. On the NoSQL side, it’s been about cheap, cheap, cheap along the Internet data center model: have lots of commodity stuff and scale it out. Hadoop was engineered exactly for such an architecture; rather than speed, it was optimized for sheer linear scale.
Over the past year, most of the major platform players have planted their table stakes with Hadoop. Not surprisingly, IT household names are seeking to somehow tame Hadoop and make it safe for the enterprise.
Up 'til now, anybody with armies of the best software engineers that Internet firms could buy could brute force their way to scaling out humungous clusters and, if necessary, invent their own technology, then share and harvest from the open source community at will. That is hardly a suitable scenario for the enterprise mainstream. So the common thread behind the diverse strategies of IBM, EMC, Microsoft, and Oracle toward Hadoop has been to make Hadoop more approachable.
What’s been conspicuously absent so far is a play from Big Optimized Storage. The conventional wisdom is that SAN and NAS are premium, architected systems whose costs might be prohibitive when you talk petabytes of data.
Similarly, a different operating philosophy has so far guided the first-generation implementations from the NoSQL world: they assumed that parts would fail and that five-nines service levels were overkill. And anyway, the design of Hadoop brute forced the solution: because hardware is cheap, replicate the data so that three copies are distributed around the cluster.
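To make the cost of that brute-force approach concrete, here is a rough Python sketch of the raw disk capacity that replication implies. The replication factor of three comes from the paragraph above; the function name and the 1,000 TB dataset are illustrative assumptions, and the arithmetic ignores compression, temporary space, and file system overhead.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk (in TB) needed to hold `usable_tb` of data when every
    block is stored `replication_factor` times."""
    if usable_tb < 0 or replication_factor < 1:
        raise ValueError("usable_tb must be >= 0 and replication_factor >= 1")
    return usable_tb * replication_factor


if __name__ == "__main__":
    # Hypothetical dataset size, chosen only for illustration.
    dataset_tb = 1000
    needed = raw_capacity_tb(dataset_tb)
    print(f"{dataset_tb} TB of data needs about {needed:.0f} TB of raw disk")
```

The multiplier matters little when disks are commodity parts, which is the assumption Hadoop was built on. It becomes a much bigger line item once the storage underneath is a premium system.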
As Big Data gains traction in the enterprise, some of it will certainly fit this pattern of something being better than nothing, as the result is unique insights that would not otherwise be possible. For instance, if your running analysis of Facebook or Twitter goes down, it probably won’t take the business with it. But as enterprises adopt Hadoop – and as pioneers stretch Hadoop to new operational use cases such as what Facebook is doing with its messaging system – those concepts of mission-criticality are being revisited.
And so, ever since EMC announced last spring that its Greenplum unit would start supporting and bundling different versions of Hadoop, we’ve been waiting for the other shoe to drop: When would EMC infuse its Big Data play with its core DNA, storage?
Today, EMC announced that its Isilon networked storage system was adding native support for Apache Hadoop’s HDFS file system. There were some interesting nuances to the rollout.
Big vendors feeling their way
It’s interesting to see how IT household names are cautiously navigating their way into unfamiliar territory. EMC becomes the latest, after Oracle and Microsoft, to calibrate its Hadoop strategy in public.
Oracle announced its Big Data appliance last fall before it lined up its Hadoop distribution. Microsoft ditched its Dryad project built around its HPC Server. Now EMC has recalibrated its Hadoop strategy; when it first unveiled that strategy last spring, the spotlight was on MapR’s proprietary alternative to the HDFS file system of Apache Hadoop. It’s interesting that vendors’ initial announcements have either been vague or have been tweaked as they’ve waded into the market. More on EMC’s shift below.
For EMC, HDFS is the mainstream
MapR’s strategy (and IBM’s along with it, regarding GPFS) has prompted debate and concern in the Hadoop community about commercial vendors forking the technology. As we’ve ranted previously, Hadoop’s growth will be tied not only to the megaplatform vendors that support it, but also to the third-party tools and solutions ecosystem that grows around it.
For such a thing to happen, ISVs and consulting firms need to have a common target to write against, and having forked versions of Hadoop won’t exactly grow large partner communities.
Regarding EMC, the original strategy was two Greenplum Hadoop editions: a Community Edition with a free Apache distro and an Enterprise Edition that bundled MapR, both under the Greenplum HD branding umbrella. At first blush, it looked like EMC was going to earn the bulk of its money from the proprietary side of the Hadoop business.
What’s significant is that the new announcement of Isilon support pertains only to the open source HDFS side. More to the point, EMC is rebranding and subtly repositioning its Greenplum Hadoop offerings: Greenplum HD is the Apache HDFS edition with the optional Isilon support, and Greenplum MR is the MapR version, a niche offering targeted at advanced Hadoop use cases that demand higher performance.
Coming atop recent announcements from Oracle and Microsoft, which have come out clearly on the side of OEM’ing Apache rather than anything limited or proprietary, this amounts to an unqualified endorsement of Apache Hadoop/HDFS as not only the formal, but also the de facto standard.
This reflects emerging conventional wisdom that the enterprise mainstream is leery about lock-in to anything that smells proprietary for technology where they are still on the learning curve. Other forks may emerge, but they will not be at the base file system layer. This leaves IBM and MapR pigeonholed – admittedly, there will be API compatibility, but clearly both are swimming upstream.
Central Storage is newest battleground
As noted earlier, Hadoop’s heritage has been the classic Internet data center scale-out model. The advantage is that, leveraging Hadoop’s highly linear scalability, organizations could expand their clusters quite easily by adding more commodity servers and disks. Pioneers or purists would scoff at the notion of an appliance approach because it was always simply scaling out inexpensive, commodity hardware, rather than paying premiums for big vendor boxes.
In blunt terms, the choice is whether you pay now or pay later. As mentioned before, do-it-yourself compute clusters require sweat equity – you need engineers who know how to design, deploy, and operate them. The flipside is that many, arguably most, corporate IT organizations lack either the skills or the capital. There are various solutions to what might otherwise appear a Hobson’s Choice:
- Go to a cloud service provider that has already created the infrastructure, such as what Microsoft is offering with its Hadoop-on-Azure services;
- Look for a happy, simpler medium such as Amazon’s Elastic MapReduce on its DynamoDB service;
- Subscribe to SaaS providers that offer Hadoop applications (e.g., social network analysis, smart grid) as a service;
- Get a platform and have a systems integrator put it together for you (key to IBM’s BigInsights offering, and applicable to any SI that has a Hadoop practice)
- Go to an appliance or engineered systems approach that puts Hadoop and/or its subsystems in a box, such as with Oracle Big Data Appliance or EMC’s Greenplum DCA. The systems engineering is mostly done for you, but the increments for growing the system can be much larger than simply adding a few x86 servers here or there (Greenplum HD DCA can scale in groups of 4 server modules). Entry or expansion costs are not necessarily cheap, but then again, you have to balance capital cost against labor.
- Surround Hadoop infrastructure with solutions. This is not a mutually exclusive strategy; unless you’re Cloudera or Hortonworks, which make their business bundling and supporting the core Apache Hadoop platform, most of the household names will bundle frameworks, algorithms, and eventually solutions that in effect place Hadoop under the hood. For EMC, the strategy is their recent announcement of a Unified Analytics Platform (UAP) that provides collaborative development capabilities for Big Data applications. EMC is (or will be) hardly alone here.
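The point in the appliance option above about larger growth increments can be illustrated with a little arithmetic. The Python sketch below rounds a capacity requirement up to the next whole expansion step. The group size of four server modules is the figure cited for Greenplum HD DCA; the function name and the example requirement of five servers are hypothetical.

```python
import math


def servers_to_buy(servers_needed: int, increment: int = 4) -> int:
    """Round a requirement up to the next whole expansion increment.

    increment=4 mirrors the "groups of 4 server modules" figure in the
    list above; increment=1 models adding commodity servers one at a time.
    """
    if servers_needed < 0 or increment < 1:
        raise ValueError("servers_needed must be >= 0 and increment >= 1")
    return math.ceil(servers_needed / increment) * increment


if __name__ == "__main__":
    needed = 5  # hypothetical requirement, for illustration only
    print("appliance:", servers_to_buy(needed), "servers")
    print("commodity:", servers_to_buy(needed, increment=1), "servers")
```

Needing one server more than a full group means buying a whole additional group, which is the capital side of the capital-versus-labor tradeoff described above.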
With EMC’s new offering, the scale-up option tackles the next variable: storage. This is the natural progression of a market that will address many constituencies, and where there will be no single silver bullet that applies to all.
This guest post comes courtesy of Tony Baer’s OnStrategies blog. Tony is a senior analyst at Ovum.
Copyright © 2012 Ulitzer, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Tony Baer, principal of OnStrategies, is a well-published IT analyst with over 15 years background in enterprise systems and manufacturing. A frequent speaker at IT conferences, Baer focuses on strategic technology utilization for the enterprise. He studies implementation issues in distributed data management, application development, data warehousing, and leading enterprise application areas including ERP, supply chain planning, and customer relationship management. As co-author of several books covering J2EE and .NET technologies, Baer is an authority on emerging platforms. Previously chief analyst for Computerwire’s Computer Finance, he is a leading authority on IT economics and cost-of-ownership issues.
Ulitzer content is offered under Creative Commons "Attribution Non-Commercial No Derivatives" License.
For any reuse or distribution, you must make clear to others the license terms of this work.
The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get written permission from Ulitzer, Inc., the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.