Cloud Computing: What should I do?


“I know Cloud Computing is the ‘in-thing’, so what should I be doing in Cloud Computing?” This is a question that people often ask me when I tell them that I started a company aimed at Cloud education. The answer to this question is again a question, “Who are you?”  Lest you think I am being rude there, let me say that what I want to know from the person is the role that he / she is currently playing. The answer I give will depend on it.

Let me try and answer this for various roles in an Enterprise, starting from the very top. This is of course a 10,000 ft view and there will be a lot of finer detail that I may not have covered. Yet I think this will give you a good idea of the direction you must take in case you want to work in the Cloud.

CIO: It should be quite obvious that a lot of CIOs are under pressure to either move to the Cloud or at least explore the possibility of moving to the Cloud. The CIO of course has to look at the long term plan of the company and work out the economics. So if you are a CIO you should know about the various deployment models of the Cloud (Public, Private, Hybrid) and the various service models (IaaS, PaaS and SaaS). You should understand the economics thoroughly (calculating the spend on the Cloud is not as easy as you think). The SLAs offered by each provider, their reputation for high uptime and the security of your data also need to be taken into account. The call which CIOs will generally need to make is whether they want a Cloud model or, if they don’t want to manage infrastructure, whether they should go for Managed Services. To make this decision, they need to understand the difference between a Private Cloud and a Managed Data Center. From what I hear and read, both have their own advantages and disadvantages. A lot will depend on the applications that are being used by the Enterprise. Needless to say, a long term vision for the Infrastructure and getting the best value for money would be the CIO’s aim.

Architect: ‘Cloud Architect’ has different connotations depending on where you are working or would like to work. If you want to be a ‘Cloud Architect’ with a Cloud Provider, you need to understand the nuts and bolts of how the Cloud is formed. A Cloud Architect in the Cloud Provider space sees less of the Cloud and more of the infrastructure. So while you need to have a clear idea of what the Cloud means to the consumer, you must be well versed in Data Center technologies. You should be an expert in Server Virtualization, Storage or Networking, and should understand how Software-Defined-Anything (Storage, Server, Networking) works. Additionally, understand newer technologies like Containers (Docker) which are being used in the Cloud context. Your job will be to architect the infrastructure to ensure optimal and efficient use. So try and be a domain expert in one of the areas I listed above: Server, Storage or Networking. Depending on the Cloud Provider, you may want to become an expert in OpenStack in the case of IaaS providers and Cloud Foundry in the case of PaaS providers.

If you are a ‘Cloud Architect’ in a company which wants to consume the Cloud, the expectations from you are different. You need to understand the infrastructure services provided by the Cloud Providers and plan for migrating your applications to the Cloud. You will need a good understanding of the services provided to perform an efficient migration. You can also plan on developing some of your applications on the Cloud itself, and for this too you need to have a clear idea of the services offered by the provider. For example, if you want to migrate your applications to Amazon AWS you should probably think of getting yourself Amazon certified, which will force you to read and understand all the offerings of Amazon.

Similarly, if you are more interested in PaaS for your development team, you must understand the offerings from the various PaaS vendors, whether it is Google App Engine, Microsoft Azure, Amazon Elastic Beanstalk, IBM Bluemix or anyone else. You need to understand the IDEs they provide, the languages they support and how easy it is to deploy your application.

Project Managers / Technical Leads: Understanding the deployment scenarios and service offerings of the Cloud vendor is key. Understanding the economics and keeping a good grip on the money spent will be a major task. As with virtualization, here too we can have sprawl, given how easy it is to provision a VM and consume any service. So understanding how each service of the provider will drain your exchequer is important, for the final control over the developers lies with the Project Managers and Technical Leads. Understanding the infrastructure and your own application well will let you realistically estimate how easy or difficult it is to migrate to the Cloud.

Developers: Even before you understand the Cloud, ensure you can code well in one of the languages used for web services: Ruby, PHP, Python or Java. (You have Node.js and Go as well, but one of the four should suffice for now.) Once you have mastered the language it will become easy for you to use any API and interact with the Cloud. You will be seeing all the services offered by the Cloud Provider from a programming perspective, and once you understand the APIs you can develop a lot of programs based on the Cloud. An overall understanding of Cloud Computing is required, along with an inquisitive mind and a good grip on a programming language. Given that providers like Amazon AWS give you a free tier for a year and the APIs are readily available, an enterprising programmer can develop her own application on the web in a very short time. If you are going to develop Enterprise class applications you need to understand the three-tier application model and some of the frameworks for the language you chose.
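To make the point about languages and APIs concrete, here is a minimal Python sketch of a pattern you meet in almost every Cloud SDK: paginated listings (boto3’s list_objects_v2, for instance, returns results a page at a time with a continuation token). The fetch_page backend below is a fake, offline stand-in so the example runs without any cloud account.

```python
# Generic pagination loop: the same shape works against any cloud API
# that returns a page of items plus an optional continuation token.

def list_all(fetch_page):
    """Collect every item from a paginated API into one list."""
    items, token = [], None
    while True:
        page = fetch_page(token)
        items.extend(page["items"])
        token = page.get("next_token")
        if token is None:          # no more pages
            return items

# Fake backend standing in for the cloud provider: three pages of "objects".
PAGES = {None: {"items": ["a", "b"], "next_token": 1},
         1:    {"items": ["c"],      "next_token": 2},
         2:    {"items": ["d", "e"]}}

print(list_all(PAGES.get))   # ['a', 'b', 'c', 'd', 'e']
```

Once this idiom is second nature, swapping the fake backend for a real SDK call is mostly a matter of reading the provider’s API reference.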

Administrators: Amongst the different categories, Administrators have the most to learn. Again we have two types of Administrators here: one at the Cloud Provider’s premises and one at the Consumer’s premises. If you are the Consumer, then you will need to understand how to use the Management Console or CLI of the Cloud Provider. For example, if your apps are hosted on Amazon you should know how to use the Amazon Console and Amazon CLI to manage your apps in AWS. If it is a different provider then you must understand their UI and CLI. The System Administrator certification of Amazon AWS and the CloudStack certification may be helpful.

If you are an Administrator at a Cloud Provider, then as with the Architect, you need to be a specialist in Storage Administration, Network Administration or Server Administration. You must thoroughly understand the concept of Virtualization and how it is applied to Storage, Server and Networking.

If you are a Server Admin, read about the various server virtualization technologies like VMware, Xen, KVM, OVM and VirtualBox. Also understand provisioning tools like Chef, Puppet and Ansible, container technology like Docker, and tools like Vagrant which allow you to launch VMs.

If you are a Storage Administrator, understand how storage virtualization helps, and understand replication and backup technologies. Storage is a very important part of the Cloud, and maintaining the SLAs with respect to Storage is a major challenge. Understand what Scale Out Filesystems are and why they are useful in the Cloud Data Center.

My knowledge of networking is not great, so I will refrain from giving a lot of advice, but do try and understand the concept of Software Defined Networking.

Hope this gives you an idea of what you must be concentrating on. Hopefully your journey to the Cloud will be smooth.



Amazon AWS: Seductiveness of ease of use

One of the important factors which affect people’s use of new technology is ease of use. Think iPhone, think Google. Think Amazon AWS.

I started using Amazon AWS again recently and I am amazed at how easy it is to use. It is almost as if I had never stopped using it. The way you start an EC2 instance, the way you store objects in S3, the way you can host your static website on S3: everything is fairly easy to use. If there are any issues, the documentation ensures you get your doubts cleared soon. Of course, you can understand all this easily if you have an idea of Amazon’s infrastructure and you are conversant with the difference between Block Storage and Object Storage.
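As an illustration of that ease of use, here is a short boto3 sketch of the static-website-on-S3 setup mentioned above. The bucket name is a made-up example, and the code assumes boto3 is installed and AWS credentials are configured; the helper functions only build the request structures, so nothing touches the network until deploy() is called.

```python
import json

def website_config(index="index.html", error="error.html"):
    """Build the WebsiteConfiguration structure S3 expects."""
    return {"IndexDocument": {"Suffix": index},
            "ErrorDocument": {"Key": error}}

def public_read_policy(bucket):
    """Bucket policy allowing anonymous GETs on every object."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Principal": "*",
                       "Action": "s3:GetObject",
                       "Resource": f"arn:aws:s3:::{bucket}/*"}]})

def deploy(bucket):
    """Turn an existing bucket into a public static website (needs AWS credentials)."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=website_config())
    s3.put_bucket_policy(Bucket=bucket, Policy=public_read_policy(bucket))
    s3.upload_file("index.html", bucket, "index.html")

# deploy("my-example-site")   # hypothetical bucket name; uncomment to run
```

Note that on newer AWS accounts you may also need to relax the bucket’s public access block settings before the policy call succeeds.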

I had used EC2 and S3 earlier, but this was the first time I was trying Elastic Beanstalk, and it too was easy to use. In Elastic Beanstalk, Amazon deploys a large infrastructure for your application. Your application can run in a load balanced way, with Amazon taking care of the load balancing part. It is supposed to scale the infrastructure whenever your application needs scaling, and this is done automatically. Additionally, your application’s health is monitored constantly. It supports Node.js, PHP, Python, Ruby, Java and .NET applications.

I chose PHP for my application and started Elastic Beanstalk. The setting up of the infrastructure takes some time, a few minutes. Initially I let Elastic Beanstalk deploy a sample PHP application. The application was started on the highly available infrastructure, and I could see it run by pasting the link provided by Amazon into a browser. Once I checked this out, I wrote my own simple PHP application and asked Elastic Beanstalk to deploy it in place of the sample application. It took a few minutes, and then the new application was deployed and I could see it running in my browser. The whole experience was very smooth.

Ease of use leads to more usage, which in turn leads to familiarity, which in turn leads us to explore more features of a system, which in turn makes us at ease with the product. Which means we are locked in. Consider this: when I started CloudSiksha, I wanted to check if I could use open source Office products. I did give them a try for a month or more, but the features of MS Office and my familiarity with it were such that I finally had no choice but to buy a one year license of Office 365. I am not regretting it. I understand and appreciate that not every product can be easy to use, but having that as a design criterion would definitely help in the long run. It may sound as if I am stating a self evident truth, but when you use some software (which I shall not name), you wonder how the designers missed this simple self evident truth.

Other than the low cost, this ease of use is probably what makes people go to Amazon. In the coming weeks I will be doing more with Amazon and I will let you know how things go.



Ready to take off

Last week has been a busy one. First, we got the website up. My friend Kavirajan designed the website and last Friday we went live. Do check out the CloudSiksha website. (You can also ‘Like’ us on Facebook and LinkedIn. The social media links are given on the website.)

We also announced our first course, ‘Storage for Cloud’. This will be held in Bangalore on the 20th and 21st of Dec. If you are interested in attending the course, do drop a mail to

What I have observed is that there are few programs dedicated to senior engineers who want to grow into Architects. In many companies engineers learn by trial and error. There is no structured teaching available which enables engineers to think about the big picture. You can become a good Storage Architect only if you understand what problems Storage needs to solve in the Enterprise. Storage faces stiffer challenges in the Cloud. I hope to address both the Enterprise Data Center challenge and the Cloud challenge in this program.

We sent out this flyer yesterday

Storage for Cloud Flyer

The countdown has begun. Wish me luck as I embark on this journey.

In case you are looking for a web designer, you can always contact my friend Kavirajan. His mail id is

Starting from square one again

After a brief stint with Oracle as a Cloud Architect, I have decided to start on my own again. This will be the third start for me. I initially started Yagnavalky Center of Competency, which catered to corporate competency development requirements in the areas of Storage and Linux. Later, with my friend and colleague Sarath Kodali, I founded Avanysis Data Storage Solutions. Our aim was to develop a Data Storage product which would be cost effective for the SMB market but would have the features of an Enterprise storage product. We were able to develop a prototype but had to give up, since we were not able to obtain the funding that we needed for such a product.

I am now starting a new company called CloudSiksha. The company will provide competency development services in the areas of Cloud Computing, Big Data and Data Storage. Programming languages like Python and Core Java, which are used extensively in the Cloud and Big Data areas, will also be taught. We will be staffed with industry veterans who have extensive hands-on experience with these technologies.

Work on the website is in progress. You can visit regularly to check for updates. I hope to work hard to ensure CloudSiksha succeeds, and will need all your support and best wishes for that to happen.

You can reach me at:


Flash Point

All Flash Arrays (AFA) have been the flavor of the month for some time now with the Storage bloggers, especially after EMC announced the GA of its XtremIO based All Flash Array.

The blogging activity started even before the announcement, with Robin Harris blogging about it. Before EMC made its announcement, Harris had this interestingly titled blog post: “XtremeLY late XtremIO launch next week”. It is an interesting post, with Harris discussing in detail the challenges EMC faces in this area and the delay in EMC getting the product to the market.

EMC’s response came in the form of a long and informative post by Chad Sakac, ‘Virtual Geek’. In this detailed post, “XtremIO: Taking the time to do it right”, Chad explains some of the details of XtremIO and why it took EMC time to release the product.

From the end user side, the well respected Martin Glassborow, ‘Storagebod’, seemed underwhelmed and said that he would ‘Xpect More..’. The post asks some very pertinent questions. Given that it comes from an end user, I am sure all the vendors are listening keenly.

With All Flash Arrays coming in, the question that everyone now asks is, “What type of workloads require such performance?” The FUD against AFAs from those who don’t have one is based on this question. The question is a genuine and pertinent one, but it can always be twisted around to say that an AFA is not needed in any case. Robin Harris takes on this question in his “Ideal workload for enterprise arrays?” post. It had a good discussion in the comments section, with Chad Sakac of EMC and NetApp employees weighing in. This led Robin to do a follow-up post, “Best workload for enterprise arrays”, wherein he gave his response to the comments received on the earlier post.

Is an AFA only about performance, or should we also look at the storage efficiency side of things? Vaughn Stewart, who had moved from NetApp to Pure Storage earlier, had a chart which covered both the performance and the storage efficiency of AFAs. He compared products from Pure Storage, Violin, EMC and IBM. Here is the chart.

Chris Evans felt that while Vaughn’s sheet was a good starting point, it did not compare all the vendors of Flash Arrays. So he set out to expand the list of vendors as well as the metrics being used for comparison. Here is the Expanded Comparison Chart.

Now that EMC has come out with its XtremIO array, is it the logical choice for the customer, given EMC’s background and size? No, says Robin Harris, and he gives his take on what he calls the “Top 5 alternatives to XtremIO”.

Vaughn Stewart feels that the adoption of Flash has been exceeding everyone’s expectations and that EMC’s entry would accelerate the adoption further. Here is his take on “All Flash Array: Market Clarity”

It must be said that whenever EMC enters the market with a new product there is no dearth of debate, and it is the same this time around. Will this be the flash point which accelerates market adoption of flash, or is this a temporary flare-up, with the market slowly settling down between flash and spinning rust? Only time will tell. I will probably bet on the latter.

Been a while

Yes, I have been off the blog scene for quite some time now.

In the meanwhile, along with my friend and former colleague Sarath, I started Avanysis, aimed at developing a storage product. We got it to a decent prototype stage but were unable to proceed further, mainly due to financial considerations.

I have also been part of a video transcoder company, where I was responsible for designing multiple things, including the background daemon, the monitoring daemon and a restart daemon, and I wrote the SNMP Agent for the appliance. I also designed the Management GUI for the appliance and wrote the CLI part of the management application.

A lot of work accomplished. Time to move forward. I will provide you with more updates soon.

Storage Array Vendors: Consolidation Time?

I could have titled this article “All Quiet on the Storage Vendor Front”. It has indeed been very quiet the past few months. The main reason, according to me, is that a lot of consolidation is happening on the products front. The battle lines have been clearly drawn. Each of the major vendors is preparing for the battle ahead, sharpening their weapons and adding more potent weapons to their armory. This metaphor doesn’t hold in the strict sense of the word, because in the market the battle never stops. So I should actually say that while the foot soldiers are fighting it out in the field, the headquarters back home is developing the bazookas which will blow out the opposition and break down customer resistance.

What are the companies working on? The trends of a year or two back are now necessities of life. Snapshots, Thin Provisioning and Deduplication are taken for granted. I don’t think there is any secondary storage device which does not offer compression, and no major array vendor is without Thin Provisioning in his array. Storage efficiency in the form of Dedupe / Compression appears in primary storage as well. The usage of SSDs has percolated, and all arrays have started providing an SSD option, either as a top tier of storage or as a high performing cache.
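Since dedupe gets mentioned as a taken-for-granted feature, here is a toy Python illustration of the idea behind block-level deduplication: identical blocks are detected by hashing and stored only once, with each file keeping just a recipe of block hashes. Real arrays do this at much larger block sizes, inline or post-process, and with far more engineering around hash collisions and metadata.

```python
# Toy block-level deduplication: unique blocks are stored once, keyed
# by their SHA-256 hash; a "file" is reduced to a list of hashes.
import hashlib

BLOCK_SIZE = 4  # absurdly small, just to make the example visible

def dedupe_store(data, store):
    """Store `data` into `store`; return the recipe (list of block hashes)."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)      # duplicate blocks are stored only once
        recipe.append(h)
    return recipe

def restore(recipe, store):
    """Reassemble the original data from its recipe."""
    return b"".join(store[h] for h in recipe)

store = {}
recipe = dedupe_store(b"abcdabcdabcdxyz!", store)
assert restore(recipe, store) == b"abcdabcdabcdxyz!"
print(f"{len(recipe)} blocks written, {len(store)} unique stored")
```

Three of the four blocks are identical, so only two blocks hit the store; that ratio is the whole sales pitch of dedupe.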

The preparation for the future, according to me, is in areas like Scale Out NAS, integration with VMware and the Cloud play. This is what most companies are doing. Given that the cloud will need large amounts of storage and virtualization, it is easy to see why better storage performance with respect to VMware is needed. As the cloud grows, the storage has to scale, and scaling horizontally through scale out solutions is preferred to vertical scaling. All major storage vendors have a scale out solution in place. The recent news was Hitachi acquiring BlueArc, a company specializing in Scale Out NAS; Hitachi and BlueArc used to work together earlier. EMC has Isilon, NetApp has its own scale out solution, HP has IBRIX, IBM has SONAS and now Hitachi has BlueArc. (The news today was that Red Hat has bought Gluster for $136 million. As more news seeps in, we will know what Red Hat is planning to do with Gluster.)

Trying to join this group of senior storage vendors is Dell. The acquisition of EqualLogic has given them leadership in the iSCSI space. They have Exanet, which is scalable NAS. They also bought Compellent (a storage array) and Ocarina (data deduplication). Everyone is watching with interest as Dell tries to make inroads into the Enterprise. In short, the big players now have their NAS, SAN, Unified Storage and Scale Out solutions in place.

Integration with VMware is another area on which every vendor is concentrating. Storage performance is a major issue in server virtualization. The CPUs do a good job of running VMs, but when all these VMs access the same array, performance gets impacted. This is because the hypervisor does a lot of storage-related activities. The hypervisor doing storage work is not optimal, since many arrays have the intelligence to perform these activities themselves, like, say, zeroing out free blocks. VMware came up with a set of APIs (VAAI: vStorage APIs for Array Integration) which allow some of the storage activities to be offloaded to the array. From what I understand, this is achieved by the arrays supporting a set of SCSI-3 commands like block copy. While many arrays claim integration with VMware, you need to check if they support these APIs, because VMware integration is claimed even if the array supports only vMotion. Here is an article which tries to cut through the FUD with respect to VMware integration. Read the Dot Hill article.

As server virtualization makes inroads into the Enterprise, the performance of the storage array vis-a-vis VMware will become very important. (I keep mentioning VMware here because they are the dominant vendor in this space; this applies to other hypervisors like Hyper-V, Xen etc. as well.) Similarly, the performance of the array in a virtualized server environment and the ability of arrays to scale out will be important considerations for the cloud. That is why you see a lot of effort from array vendors and server virtualization vendors to ensure that storage arrays are closely integrated with server virtualization.

As they enter an era of server virtualization and Cloud, all the major players have the products they need to build good solutions for Enterprises. One thing I notice is that almost all vendors have a lot of different products in their portfolios, and there is an ongoing effort to consolidate. It will be interesting to observe how the vendors use their products to build the best solution for the customer.

On a different note: if you are a Bangalore based Storage and/or Linux kernel expert / developer, I have some exciting startup opportunities for you. If interested, contact me at yagnavalky at gmail dot com.

Dealing with enormous data

I wasn’t aware of the company called ‘Greenplum’ until EMC bought it! I became interested when analysts mentioned that ‘Netezza’ would be bought by IBM to counter this move. I was interested because I had a friend who worked at ‘Netezza’. So I wanted to find out what this whole thing was about. I checked with a friend who knows this area, and this is what he replied: “The key thing is that Netezza, Teradata, Greenplum and Vertica are all designed from the ground up for data warehousing kinds of workloads. Oracle and DB2 started as OLTP (Online Transaction Processing) systems and then tried to do data warehousing using the same server code. That does not work. Data warehousing has a very different kind of characteristic. Loads are bulk loads. Inserts / Updates / Deletes are few and it is very Select heavy. All you do is analytics. The selects usually involve very complex queries, often running into GBs in size, generated automatically by front end analytics tools. They touch massive amounts of data, in the range of terabytes to petabytes. OLTP, on the other hand, has all of Select / Insert / Update / Delete. A typical example is airline reservation. The volume of data is not that big at all.” That made sense. Later IBM bought Netezza and HP bought Vertica, another similar company.

So the whole thing was about how you search for patterns and such in massive amounts of data. Unlike OLTP data, where some data is current and important, in the analytics scenario all data is important. There is no irrelevant data, as Jim McDonald says in his very nice blog post at XIOTech, which gives a good perspective on the challenges faced when you have to access huge amounts of data. He talks about Big Data. I am not sure if there is common agreement on what ‘Big Data’ means, but this Wikibon article can be your starting point in understanding what Enterprise Big Data is all about.

As data grows at amazing speed, neither processor nor disk technology can keep up with the pace. So scaling up a product to meet the needs of data growth can only go so far. It is inevitable that data access happens in parallel if you want to deal with larger and larger data sets. The current product trends as well as acquisition trends show that all companies understand this problem and are responding to it. NetApp has come up with its clustered NAS in Data ONTAP 8.0. This allows for the aggregation of multiple nodes and uses a global namespace. (It looks like there is some confusion regarding the term global namespace, since Isilon and SONAS have interpretations that differ from NetApp’s. You may want to read Martin Glassborow’s (Storagebod) post which talks about this.) The data sheet for Cluster-Mode Data ONTAP is available here. (pdf file)

While NetApp must have developed its clustered-mode Scale Out NAS based on its Spinnaker acquisition (thanks to Dustin for pointing out my error), EMC went and bought Isilon, which again was a company dealing with Scale Out NAS. In fact, EMC paid $2.25b to get this company, so you can understand what EMC feels about the potential of Scale Out NAS. HP in 2009 had acquired IBRIX, another company dealing with Scale Out NAS. IBM has its own Scale Out NAS, which is appropriately labeled SONAS!

All of these use a global namespace. What exactly is a global namespace and more importantly, what exactly is Scale Out NAS and how does it work? According to the SONAS datasheet:

- Access your data in a single global namespace allowing all users a single, logical view of files through a single drive letter such as a Z drive.
- Offers internal (SAS, Nearline SAS) and external (Tape) storage pools, with automated file placement and file migration based on policies. It can store and retrieve any file data in or out of any pool transparently and quickly, without any administrator involvement.

Scale Out NAS technical details require an extensive writeup, which I will do in a future post. What is important is that all the main storage vendors have a Scale Out NAS solution in their portfolio.

An acquisition that was unexpected for many was that of LSI’s Engenio by NetApp. The reason for the surprise was that NetApp’s message all along had been Unified Storage, and everyone thought that NetApp would always go the Unified Storage way. (In fact there have been blogs critical of NetApp, calling it a one-product company. Now everyone was surprised and started asking, “Why are you getting more products? Your messaging will be lost.”) LSI’s Engenio is a pure block play, and people were interested in knowing why NetApp acquired Engenio and how it would affect their message. Dave Hitz, in his characteristically clear style, replied to these concerns / accusations in his blog post, where he says, “The observation is that, while many customers and workloads do require advanced data management, some need “big bandwidth” without the fancy features. For them, the best solution is a very fast RAID array with great price/performance. Perfect for Engenio! Two immediate opportunities are Full Motion Video (FMV) and Digital Video Surveillance (DVS), and over time we believe there will be more.” Here we see NetApp targeting a different type of workload and understanding that no fancy features like Snapshots are required here; all that is required is bandwidth. In other words, all companies are now trying to get solutions which deal with different types of workloads. Hence you see pure block plays, data warehousing solutions and Scale Out NAS.

So what is the moral of all this rambling? Well, the moral is clear. You had better start understanding how big data is being dealt with. That is the future if you are in Storage Infrastructure. Your concepts of RAID will not suffice, as data will not be distributed across disks in one single array but may be striped across multiple arrays. Clustered storage solutions may become the de facto way of installing storage, and it may happen faster than you think. So go read up more about these technologies. It will help you in the long run.

Talk to me intelligently

When you teach, you clarify the doubts of the participants. While you are clarifying their doubts, you start having your own doubts, which make you go deeper into the subject. And the best way to clear deep doubts is to read a good book on the subject. The Internet is good for some quick and dirty research, but if you want to do some serious reading you had better get hold of a good book.

My aim was to know more about the interaction between the code you write and the processor. Essentially, I needed a book which explained to a software engineering student some of the electronics and computer organization material. With this in mind I was browsing the bookshelves in Landmark, Chennai, when I chanced upon a book titled “Write Great Code: Vol. 1 – Understanding the Machine” by Randall Hyde. I quickly flipped through it, and since it had most of what I was looking for, I bought it.

Generally you can divide technical books into two categories. One is what is written for students of the subject. This could be a textbook or something close to one; here it is taken for granted that the reader has an idea of what he / she is getting into. The other category is technical books written for people who are not students of the subject but would like to know more about it. It is with the second category that I have problems. When a technical subject is being explained to a person who is not involved in that subject, the author assumes that the reader is absolutely dumb!! It is almost as if saying that if you are not a student of this subject, you must be dumb!! I am OK with books like the “.. for Dummies” series; at least they categorically state who their audience is. Many of the other books don’t state this assumption and can get on your nerves when you read them, for they start by explaining to you that 2 + 2 equals 4. Well, not exactly that, but you get the drift, right?

So it was a pleasure to discover this book by Randall Hyde. As I said, this book is focused on the software engineer who wants to write high performance code. According to my friends, this is a breed which is slowly dying. One, because of project pressures, people end up coding the fastest possible way and not the most efficient way. Two, with more and more languages giving you objects and highly abstracted entities, your efficiency lies only in selecting the right templates / objects / whatever. The book focuses on explaining to the reader the underlying architecture of the machine and how your code can take advantage of it and become highly performant.

Starting from Binary Numbers, through Bit Operations, Character Representation, How Memory is Organized, CPU Architecture, Instruction Set Architecture and Input / Output, Randall gives you a very nice view of the internals of a computer. A few things about this book impressed me. First, it talks to you intelligently: it assumes that you are a fairly intelligent person and not someone with the IQ of a caterpillar. Second, the writing style is very fluid. Third is the economy of words. It is possible to pack so much into a 400 page book because Randall doesn’t waste words. It reminds me of an Inorganic Chemistry text we had, authored by J. D. Lee, which had a similar economy of words.

If you read the full book, you will end up understanding quite a bit of the jargon which you have heard and probably used as well. Take, say, Pipelining: you probably have a vague idea of what it means, but this book makes it very clear. Similarly, you will get a good idea of how memory is accessed, what the instruction sets are, which registers do what, and so on. It also has detailed chapters on I/O, Filesystems and Device Drivers, and it tells you how compilers work. You can always say that these details are available in various textbooks, and you would be right, but you would need to read a lot of textbooks to get all this knowledge. This is not a book which replaces the textbooks, but rather a high level electronics view keeping the software programmer in mind. At the end of each chapter, references to the relevant standard texts are given.
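As a taste of the bit-level material the book covers, here is a small Python example of packing fields into a word with shifts and masks, using a made-up 16-bit ‘instruction’ layout of my own (not taken from the book).

```python
# Pack three fields into one 16-bit word and pull them back out.
# Layout (hypothetical): opcode in bits 12-15, reg in bits 8-11, imm in bits 0-7.

def pack(opcode, reg, imm):
    """Combine the fields with left shifts and bitwise OR."""
    assert 0 <= opcode < 16 and 0 <= reg < 16 and 0 <= imm < 256
    return (opcode << 12) | (reg << 8) | imm

def unpack(word):
    """Recover the fields with right shifts and masks."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF

word = pack(opcode=3, reg=5, imm=200)
assert unpack(word) == (3, 5, 200)
print(hex(word))   # 0x35c8
```

This shift-and-mask idiom is exactly the kind of thing the book grounds in the machine: it is how real instruction encodings, device registers and network headers are laid out.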

If you are someone interested in knowing the internals of a computer system to the extent of using that knowledge to your advantage while coding, this is the book for you. I would definitely recommend it to all computer science students. It is a very comprehensive book that talks to you intelligently, and you will definitely benefit from it.

Recent interesting acquisitions in Storage Space

When there is great growth in an industry, you would expect the demand to spur competition, giving the customer more choices and more vendors to procure from. I guess this works only up to a certain scale; beyond that, the opposite happens: consolidation of vendors. That is what I see happening in the Storage industry now. The demand for Storage is on the rise, every company is showing wonderful results, and demand for newer technologies is also rising. In such a scenario, we are seeing a lot of consolidation. So market growth leads to a shrinking vendor base? I am sure there is some management theory explaining this phenomenon and when consolidation happens in an industry.

These thoughts came to me when I looked at the recent happenings in the Storage space. We saw Data Domain being bought by EMC last year. This year there were two very major acquisitions. One was HP fighting off Dell in order to acquire 3Par. HP wanted an array like 3Par's in its portfolio and went after it aggressively against Dell, in an acquisition worth more than $2 billion. 3Par had some nice technology and was quite well known for techniques like Thin Provisioning, Micro RAID and Wide Striping. It was getting noticed in the market and had a decent customer base. Everyone feels that this acquisition will help HP immensely in the Storage market.
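To give a feel for the thin-provisioning idea mentioned above, here is a toy sketch at the filesystem level (the function name and sizes are mine, purely for illustration; 3Par does this inside the array, not with sparse files). A sparse file advertises a large virtual size while the filesystem allocates physical blocks only where data is actually written:

```python
import os

def create_thin_volume(path, virtual_size):
    """Create a sparse file that claims `virtual_size` bytes up front."""
    with open(path, "wb") as f:
        f.seek(virtual_size - 1)  # jump past the "hole"...
        f.write(b"\0")            # ...and write a single real byte

create_thin_volume("volume.img", 1024 * 1024 * 1024)  # 1 GiB virtual size

st = os.stat("volume.img")
print("virtual size :", st.st_size)          # 1073741824 bytes
print("blocks used  :", st.st_blocks * 512)  # typically a few KB at most
os.remove("volume.img")
```

The capacity "promised" to the consumer far exceeds what is physically allocated, which is exactly the bet a thin-provisioned array makes across many volumes.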

The second acquisition that has a lot of people talking is EMC's plan to acquire the scale-out NAS vendor Isilon, also a deal worth more than $2 billion. From the comments I see, like HP with 3Par, this is also a buy to fill a gap in EMC's portfolio. The general opinion is that EMC's current NAS product, Celerra, doesn't scale well, hence the need to buy a scale-out NAS product. EMC was lacking a scale-out NAS while the competition had their products: HP has both PolyServe and IBRIX (PolyServe, by the way, had a lot of people from the erstwhile Sequent Computers, some of whom I know, and is based in Beaverton, Oregon), IBM has its Scale Out NAS (SONAS), and NetApp has its own scale-out product. This acquisition ensures EMC is also playing in this space.

The other interesting acquisition was IBM acquiring Storwize, a company involved in primary data compression. Storwize had a compression appliance for NAS that would compress data before it was stored on disk. After acquiring Storwize, IBM released a product called the IBM Storwize V7000 Storage Array. The funny part is that this array has no Storwize technology in it! It seems IBM wants to brand its arrays as Storwize arrays, and so only the name was used.
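The idea behind such an inline compression appliance can be sketched in a few lines (the helper names and the 4-byte length header are my own simplification, not Storwize's actual on-disk format): each block is compressed before it reaches the disk and decompressed transparently on read.

```python
import io
import zlib

def write_compressed(dev, block):
    """Compress `block` inline and store it with a length header."""
    payload = zlib.compress(block, level=6)
    dev.write(len(payload).to_bytes(4, "big"))  # so we know how much to read back
    dev.write(payload)

def read_compressed(dev):
    """Read one compressed block back and return the original data."""
    length = int.from_bytes(dev.read(4), "big")
    return zlib.decompress(dev.read(length))

disk = io.BytesIO()                        # stand-in for the backing disk
block = b"highly redundant file data " * 100
write_compressed(disk, block)

disk.seek(0)
restored = read_compressed(disk)           # reads are transparent to the client
print(len(block), "raw bytes ->", disk.getbuffer().nbytes, "bytes on 'disk'")
```

The client sees the same data it wrote, while the "disk" holds far fewer bytes; that space saving on primary (not backup) data was Storwize's pitch.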

Other interesting acquisitions happened in the database area. EMC acquired Greenplum, which builds a "massively parallel processing database platform", and IBM acquired the database company Netezza. Both companies were involved in building databases for high-performance business analytics.

Most of these acquisitions happened with the cloud in mind. Also at the back of the mind of every traditional Storage company is Oracle. Oracle now has Sun, Sun StorageTek, Virtual Iron and Exadata, and of course its database. They pose a serious threat in the Storage space. There was, for a brief while, talk about whether they would acquire someone like NetApp to grow in the Storage space. You never know what will happen!

As I said in the beginning, while the Storage market is expanding, the vendor base is getting consolidated. Innovative startups and small companies with good track records are being gobbled up by the big players, so eventually you will end up with only the big guys in the fray.