Friday, July 23, 2010

big data innovators – betting against the open source community

As you will realize (preconditioned on becoming regular followers of this post) I like BOLD…to be bold automatically sets expectations for fiery exchanges, passionate interactions and cocksure personalities…perfect recipe for a fantastic afternoon

I met one such BOLD company yesterday who proclaimed that the open source community will not invest in improving Hadoop fundamentals with added features critical for large scale enterprise deployment…as it is not of interest to the community

I have to give them credit that they made this BOLD statement based on some assumptions that seem to be ‘well researched’:

1. Hadoop MR/HDFS today has some gaps (around fault tolerance, resiliency, HW optimization, other…for details please contact MapR)

2. Based on what was said at the yahoo hadoop summit last month, yahoo (still by far the biggest committer to the open source Hadoop project) is not interested in plugging those gaps (very smart guys at yahoo doing this, but they could not care less if Hadoop did not go much beyond their offices…can’t accuse them of not being focused)

3. The Community will only work on things that excite them, not problems that need fixing for Hadoop to build a Linux like cult with enterprises

4. Large companies will experience challenges in implementing Hadoop unless features that are ‘must haves’ for any enterprise deployment are built

and as you may have guessed by now, MapR is building those very features.

having tried to build from scratch one of the largest Hadoop clusters in the world, this pitch surely caught my attention. not to mention the company’s CEO and CTO have a history of building successful startups..and have hired some smart guys (tomar is a star)…

the company is still very new (1 year anniv this month) and plans to have a beta out end of the year..with a GA in Q1…and claim to have a bevy of ‘customers’ (i don’t get it…in my book a customer is one who pays and uses your product…everyone else is a prospect)

so while I am not quite ready to drink their koolaid…I am not willing to bet against them either…

as a wise man once told me…the line between BOLD and stupid is a thin one!

ps – shout-out to my friend arif j at lightspeed for introducing me to these guys

just what SFO needed…a hip indian restaurant

forced to profile a company not remotely linked to big data… c’mon, I have other interests too!
it is refreshing to see a startup (a consumer market startup) being interested in both – (a) having paying customers as well as (b) catering to the booming market of adventure gamers in Scandinavia and Germany (BIG markets that I thought had death rates higher than birth rates ...but, I was told, love adventure gaming…and yes have credit cards…and long winters…I guess it does make sense)

the fact that one of the founders is an ex-consultant who understands ‘game changing’, ‘paradigm shifting’ and other similar sounding BS is a big reason for that focus.
i like people who tell it the way it is as they are rare…and so I wish him and tell tale games the best…

given the sheer entertainment value he moves around with, he should be on every SFO must do list to invite for any potentially boring dinner conversation (consultants take note)… couched in that humor is the truth about consultants every consultant (current or ex) would like to say… but is forced not to as it would be heresy in their hickey freeman suits and hermes ties…

thank you for that refreshing dose of reality andre…now go fucking put some games on the iPhone!

as for the title of the post…you see this unfolded over a 3 hour dinner at DOSA…a restaurant that is the way restaurants serving Indian food should be

Shout-outs:

praveen, rohit, andre – envy your zip codes…back in 4 weeks…let’s do it all over again

raj, vas – charlotte will forever be short an effing crazy French guy

Wednesday, July 21, 2010

big data evangelists - jeff hammerbacher

jeff hammerbacher...or hammer as he is called, was one of the leaders who drove the design, build and implementation of facebook's data management platform off hadoop, and has since become one of the co-founders of the hottest hadoop startup on the planet - Cloudera.

jeff is not just a smart thinker on all things BIG DATA, but a visionary who wants to continue to push the platforms forward...and a key player in building up the Hadoop community.

not to mention as a speaker one of my favorites...given his tongue in cheek humor and casual elegance that belies the focus and effort he has put into this space.

big data innovators - tell me a story

Any company that can claim to have been started by an Oscar winner (correction…two time Oscar winner) is bound to have a good ‘story’.

Pat Hanrahan and Tableau qualify for that elite list.

Let me start by saying this, THERE IS NO OTHER BIG DATA VISUALIZATION TOOL IN THE MARKET THAT CAN DO WHAT TABLEAU DOES. NONE.

PS – if you know of one, let me know, I offer to test it for you.

The conditions –
A. you will have 5 minutes to install,
B. 15 minutes to recognize the data,
C. 2 hours to play with it with my team and
D. Then have to come back and let my team present live the results…on your tool.

Yes Tableau did that!

If you do not believe me, try it for yourself. Nothing else comes close.

And if you are on anything else, my best wishes!

big data innovators - flower power

Mayank Bawa is one of the smartest people in the valley. And THE smartest Big Data DB guy period. What he is building at Aster Data is not only the most ambitious blueprint I have seen presented but also game changing for the DB industry in its approach.

Mayank himself is an unassuming intellectual, who hides his keen business acumen behind a sharp technology instinct. And it is this combination - the ability to understand business problems and solve them with his proprietary technology - that is a USP for Aster.

Not to mention he has assembled around him a talented group of individuals and a professional sales team that is driven hard by his investors.

To be honest, as I started looking at this space, Aster was not an obvious pick as their go-to-market messaging was not very clear to me. Having spent time with Mayank, understood his vision, learnt from the problems they have solved and got the best answer to date to a 'big data' question I ask everybody in this space, they very quickly rose to the top of the list.

Their combination of at scale In-DB analytics with a unique SQL-MR query language is very compelling. What I would like to see more of is a tighter story on how it works with Hadoop as well as compatibility with BIG DATA visualization tools like Tableau.

Look beyond the marketing, Aster needs to be on the wish-list of every company that is looking at building out their Big Data capabilities.

big data innovators - new series

Starting a series to share my thoughts on companies I have spent time researching, working with or meeting in and around the space I call BIG DATA.

I want to HIGHLIGHT that what I will not do in this space is disclose any CONFIDENTIAL or PROPRIETARY information, given most of my discussions are with private companies and startups, where that is all that they have to go on.

big data evangelists - roger magoulas

O’Reilly has a unique place in the valley …and Roger occupies that same spot for Big Data…’nuff said
Not to mention (as already stated in one of my posts)…the ONE who coined the term BIG DATA…that I so liberally license as my own!


big data evangelists - new series

thought leaders and vagabonds blazing a trail...we need to make sure we follow the crumbs!!

armageddon and cherries

WHY THE WORLD MELTED DOWN IN 2008?

here's my view....

I don’t trust people who claim to have all the answers. It is a lesson I have learnt the hard way. And realized there is actually no other way to learn it!

Anyone who claims they can predict the future is, as we all know, a liar. Yet we built a monumental economic pile of crap based on those very ‘predictions’. Quants who claimed they could first package consumer debt obligations and then value them based on predicted cash flows, economists who predicted long-term growth based on myopic short term stimuli (alan the magician anyone), doctors who claim to have found miracle cures to diseases only to find out years later the true consequences of medicating them.

And let me be clear, I am neither a pessimist by nature nor defeatist in attitude. I do believe we can solve problems, real problems, and if and when we are able to do it, it will come with massive positive fore-bearings. I just think society at large has not rewarded individuals more keen to solve problems than to profit from them.

Banking in general and stock & debt markets in particular did start with a higher social obligation at their core –facilitate the exchange of goods & services and fuel growth with limited resources at a risk adjusted cost. Albeit simplified, but true (remember what I said about myself).

Yet somewhere down the line, that social purpose was efficiently hidden by the best and brightest minds (the brightness can now be questioned) that manufactured profit at the cost of social functions they were really meant to play. Not to say it was necessarily evil by itself. Michael Lewis does an excellent job unraveling it all in 2 tomes which if you have not read you HAVE to read…or you will miss the gravity of what has been built on wall street (LIARS POKER & THE BIG SHORT).

At the heart of it, I argue, was a gaping hole. It was the lack of interest in data that financial institutions have forever generated but never really valued (and should all be asked to compulsorily spend time understanding how GOOGLE singlehandedly developed it into a core and unmatched asset). Or better said, valued to the extent it suited their purpose. This was shocking given the fact that the data did exist. No one in the industry, then or now, had the ability to collect, store, match and produce it all quickly.

This drove the creation of an industry built on the premise that complicated models can and do mask the relative sparseness of data accessible to them (the limits a result of lack of interest in managing BIG data or investing in it). And in a 15 year boom cycle tracing its roots to the early 90’s (with relatively minor corrections along the way), such assumptions never got tested. They were in fact rewarded on the street, in a mad rush to create the most complex models that could be built to create synthetic products that neither fulfilled any social needs nor passed common sense filters.

It can be argued that financial innovation outpaced technological developments, but in my mind that raises more questions than it answers. For known as they are to be the pioneers of new technology, why did Financial Institutions not drive the development of data management platforms akin to what Google built (and changed the world forever in its wake) given that the efficient management and distribution of data (information) is their very foundation?

It was not to be. What happened is now the subject of many books, documentaries, white papers and popular literature. And none of it needs to be retold.

What I want to share is that this realization was a career-changing one for me. One that I am committed to trying to solve.

boys and their elephants

That a 3 year old will christen the biggest thing to have come out of the valley in the last 5 years is a fantastic story (that I will not do justice to…and hence will not repeat…but if you get to meet Doug Cutting…ask him…he is proud of it and you can see why).

The fact that it will be the biggest thing in the next 50, makes the genesis of it even more remarkable.

Hadoop has FOREVER wrested the award for the most creative name for any Apache project (or non-apache project). The fact that this data crusher is open source, used by the biggest web properties not called Google, likes running on cheap commodity hardware and actually delivers the promise of being able to crank a $100TB for commercial use makes it a force of nature to be dismissed at your own peril.

There has been so much written about all things HADOOP, all I will do here is point you to the right resources to go figure it out yourself - click here. I will say that having deployed a life-sized Hadoop cluster, everything you have heard about it doing, it can do.
Boys will always be boys, and for now the special bond they had with the cuddly elephants (as a kid…hopefully) has been immortalized…and if you missed having that plush toy…never too late to get one….they are cute!!

how google changed the world in the winters of 2003/04

There’s much written about the most storied company to have been created in the last decade. The only thing I will add here are these links – OPEN SOURCE GFS PAPER and MR paper…that changed the world of BIG DATA forever…

PS – for everyone who decided to spend time to read my posts, you will know exactly why….for those who accidentally came across this…if you really buy into my stuff…you will find out

RIP

too much data


why BIG DATA is BIG?

The last few months have been an interesting expedition for me. Having been in the midst of a lot of intelligence is humbling…at the same time very enlightening. The fact that the company has been more about building ideas (west coast) than profiting from them (east coast) has been a calming balm for jaded nerves. The fact that I am part of that very ‘profiting’ juggernaut is a perspective that has proven to be invaluable.

There is a palpable excitement in the valley that has been hard not to spot on each of the 5 trips that have taken me there this year. And it all centers around this monster theme of ‘data’…or as it is unpretentiously referred to – BIG DATA (a term I think I can credit to my friend Roger Magoulas at O’Reilly…and Roger should claim it as well).

Every BIG idea, BIG business model, BIG pitch, BIG money, BIG solution is driven by BIG DATA.

The fact that I believe that the DATA REVOLUTION is THE biggest economic engine since the INDUSTRIAL REVOLUTION might explain my not-so-visible enthusiasm for it.

And what exactly do I see in it:

1. Data in the world is exploding…has been exponentially increasing the last few years and is now mutating at an uncontrollable pace (see graphic - too much data)

2. It will continue to gather pace. As sensors that feel, monitor, track and record data continue to explode (GPS enabled cellphone, computer, email device, health monitors, energy meters, credit/debit cards, browser, social networks, security cameras, satellites, tracking chips in dogs…)…so will the data they generate.

3. Almost EVERY single socio-economic entity (companies, governments, individuals) has the same exploding phenomenon and doesn’t really know (yet) what to do with it all.

4. Technology is finally democratizing the access to and processing of this massive amount of data. Or as I like to call it, it is finally time to put the Information back in Information Technology. Simple, scalable and cheap hardware software platforms are being rapidly created, deployed and improved upon.

5. And it is debunking the myth that not all data is useful (data trash is history…finally). Never again should an IT manager be allowed to ask the question “can I delete historical data”, much less “archive this data to tape”.

6. A new breed of talent is being bred into the workforce with an understanding of what can be done with this explosive combo of data and data technology, creating the sexiest new jobs on the planet. I refer to them as data quants (others call them data scientists) and they are destined to become the vikings of the data kingdom.

7. And finally all this is bringing together the potent mix of problems needing answers (and we have no dearth of problems), data that can lock onto the solutions, technologies that hold the keys to that data and talent that can bring it all to life. QED

And (I hope) you can see why I am jumping in my plane seat as I write this…(thank you gogo wireless…although you could have picked better than a pornstar alias as the brand-name for the in-air wireless service)