Amazon AWS: Seductiveness of ease of use

One of the important factors which affect people’s use of new technology is ease of use. Think iPhone, think Google. Think Amazon AWS.

I started using Amazon AWS again recently and I am amazed at how easy it is to use. It is almost as if I had never stopped using it. The way you start an EC2 instance, the way you store objects in S3, the way you can host your static website on S3: everything is fairly easy to use. If there are any issues, the documentation ensures you get your doubts cleared soon. Of course, you can understand all this easily only if you have an idea of Amazon’s infrastructure and you are conversant with the difference between Block Storage and Object Storage.

I had used EC2 and S3 earlier but this was the first time I was trying Elastic Beanstalk, and this too was easy to use. In Elastic Beanstalk, Amazon deploys a large infrastructure for your application. Your application can run in a load balanced way, with Amazon taking care of the load balancing part. It is supposed to scale the infrastructure automatically whenever your application needs scaling. Additionally, your application’s health is monitored constantly. It supports Node.js, PHP, Python, Ruby, Java and .NET applications.

I chose PHP for my application and started Elastic Beanstalk. Setting up the infrastructure takes a few minutes. Initially I let Elastic Beanstalk deploy a sample PHP application. The application was started on the highly available infrastructure and I could see it running by pasting the link provided by Amazon into a browser. Once I had checked this out, I wrote my own simple PHP application and asked Elastic Beanstalk to deploy it in place of the sample application. It took a few minutes, after which the new application was deployed and I could see it running in my browser. The whole experience was very smooth.

Ease of use leads to more usage, which leads to familiarity, which leads us to explore more features of a system, which in turn makes us more at ease with the product. Which means we are locked in. Consider this: when I started CloudSiksha, I wanted to check if I could use open source office products. I did give them a try for a month or more, but the features of MS Office and my familiarity with it were such that I finally had no choice but to buy a one-year license of Office 365. I do not regret it. I understand and appreciate that not every product can be easy to use, but having that as a design criterion would definitely help in the long run. It may sound as if I am stating a self-evident truth, but when you use some software (which I shall not name), you wonder how the designers missed this simple self-evident truth.

Other than the low cost, this ease of use is probably what makes people go to Amazon, I guess. In the coming weeks I will be doing more with Amazon and I will let you know how things go.

 

 

Ready to take off

Last week has been a busy one. First we got the website up. My friend Kavirajan designed the website and last Friday we went live. Do check out the CloudSiksha website. (You can also ‘Like’ us on Facebook and LinkedIn. The social media links are given on the website.)

We also announced our first course, ‘Storage for Cloud’. This will be held in Bangalore on the 20th and 21st of Dec. If you are interested in attending the course, do drop a mail to enquiry@cloudsiksha.com

What I have observed is that there are few programs dedicated to senior engineers who want to grow into Architects. In many companies engineers learn by trial and error. There is no structured teaching available which enables engineers to think about the big picture. You can become a good Storage Architect only if you understand what problems Storage needs to solve in the Enterprise. Storage faces stiffer challenges in the Cloud. I hope to address both the Enterprise Data Center challenge and the Cloud challenge in this program.

We sent out this flyer yesterday

Storage for Cloud Flyer

The countdown has begun. Wish me luck as I embark on this journey.

In case you are looking for a web designer, you can always contact my friend Kavirajan. His mail id is kavirajanr@gmail.com

Starting from square one again

After a brief stint with Oracle as a Cloud Architect, I have decided to start on my own again. This will be the third start for me. I initially started Yagnavalky Center of Competency, which catered to corporate competency development requirements in the areas of Storage and Linux. Later, with my friend and colleague Sarath Kodali, I founded Avanysis Data Storage Solutions. Our aim was to develop a Data Storage product which would be cost effective for the SMB market but would have the features of an Enterprise storage product. We were able to develop a prototype but had to give up since we were not able to obtain the funding that we needed for such a product.

I am now starting a new company called CloudSiksha. The company will provide competency development services in the areas of Cloud Computing, Big Data and Data Storage. Programming languages like Python and Core Java, which are used extensively in the Cloud and Big Data areas, will also be taught. We will be staffed with industry veterans who have extensive hands-on experience with these technologies.

Work on the website is in progress. You can visit http://www.cloudsiksha.com regularly to check for any updates. Hoping to work hard to ensure CloudSiksha succeeds. Will need all your support and best wishes for that to happen.

You can reach me at : suresh@cloudsiksha.com

 

Flash Point

All Flash Arrays (AFA) have been the flavor of the month for some time now with the Storage bloggers, especially after EMC announced the GA of its XtremIO based All Flash Array.

The blogging activity on this started even before the announcement, with Robin Harris blogging about it. Before EMC made its announcement, Harris had this interestingly titled blog post: “XtremeLY late XtremIO launch next week”. It is an interesting post in which Harris discusses in detail the challenges EMC faces in this area and the delay in EMC getting the product to market.

EMC’s response came in the form of a long and informative post by Chad Sakac, ‘Virtual Geek’. In this detailed post “XtremeIO; Taking the time to do it right”, Chad explains some of the details of the XtremIO and why it took time for EMC to release the product.

From the end user side, the well respected Martin Glassborow, ‘Storagebod’ seemed underwhelmed and said that he ‘Xpect More..’ The post asks some very pertinent questions. Given that it comes from an end user, I am sure all the vendors are keenly listening.

With All Flash Arrays coming in, the question everyone now asks is “What type of workloads require such performance?” The FUD against AFA by those who don’t have one is based on this question. The question is a genuine and pertinent one, but it can always be twisted around to say that AFA is not needed in any case. Robin Harris takes on this question in his “Ideal workload for enterprise arrays?” post. It had a good discussion in the comments section, with Chad Sakac of EMC and NetApp employees weighing in. This led Robin to do a follow-up post, “Best workload for enterprise arrays”, wherein he responded to the comments on the earlier post.

Is AFA only about performance, or should we also look at the storage efficiency side of things? Vaughn Stewart, who had moved from NetApp to Pure Storage earlier, had a chart which spoke about both the performance and the storage efficiency of AFAs. He compared products from Pure Storage, Violin, EMC and IBM. Here is the chart.

Chris Evans felt that while Vaughn’s sheet was a good starting point, it did not compare all the vendors of Flash Arrays. So he set out to expand the list of vendors as well as the metrics being used for comparison. Here is the Expanded Comparison Chart.

Now that EMC has come out with its XtremIO array, is it the logical choice for the customer to buy, given EMC’s background and size? No, says Robin Harris, and gives his take on what he calls the “Top 5 alternatives to XtremIO”.

Vaughn Stewart feels that the adoption of Flash has been exceeding everyone’s expectations and that EMC’s entry would accelerate the adoption further. Here is his take on “All Flash Array: Market Clarity”

It must be said that whenever EMC enters the market with a new product there is no dearth of debate. It is the same this time around. Whether this will be the flash point which accelerates market adoption of flash, or a temporary flare-up with the market slowly settling down between flash and spinning rust, only time will tell. I will probably bet on the latter.

Been a while

Yes, I have been off the blog scene for quite some time now.

In the meanwhile I, along with my friend and former colleague Sarath, started Avanisys, aimed at developing a storage product. We got it to a decent prototype stage but were unable to proceed further, mainly due to financial considerations.

In the meanwhile I have also been part of a video transcoder company, where I was responsible for designing multiple things including the background daemon, the monitoring daemon and a restart daemon. I also wrote the SNMP Agent for the appliance, designed its Management GUI and wrote the CLI part of the management application.

Lot of work accomplished. Time to move forward. Will provide you with more updates soon.

Storage Array Vendors: Consolidation Time?

I could have titled the article “All Quiet on the Storage Vendor Front.” It has indeed been very quiet the past few months. The main reason, according to me, is that a lot of consolidation is happening on the products front. The battle lines have been clearly drawn. Each of the major vendors is preparing for the battle ahead, sharpening its weapons and adding more potent weapons to its armory. This metaphor doesn’t hold in the strict sense because in the market the battle never stops. So I should actually say that while the foot soldiers are fighting it out in the field, the headquarters back home is developing the bazookas which will blow away the opposition and break down customer resistance.

What are the companies working on? The trends of a year or two back are now necessities of life. Snapshots, Thin Provisioning and Deduplication are taken for granted. I don’t think there is any secondary storage device which does not offer compression. And no major array vendor is without Thin Provisioning in its array. Storage efficiency in the form of Dedupe / Compression appears in primary storage as well. Usage of SSDs has percolated, and all arrays have started providing an SSD option either as top tier storage or as a high performing cache.

The preparation for the future, according to me, is in sectors like Scale Out NAS, integration with VMware and Cloud play. This is what most companies are doing. Given that the cloud will need large amounts of storage and virtualization, it is easy to see why better storage performance with respect to VMware is needed. As the cloud grows, the storage has to scale. Scaling horizontally through scale out solutions is preferred to vertical scaling. All major storage vendors have a scale out solution in place. The recent news was Hitachi acquiring BlueArc, a company specializing in Scale Out NAS. Hitachi and BlueArc used to work together earlier. EMC has Isilon, NetApp has its own scale out solution, HP has IBRIX, IBM has SONAS and now Hitachi has BlueArc. (The news today was that Red Hat has bought Gluster for $136 million. As more news seeps in, we will know what Red Hat is planning to do with Gluster.)

Trying to join this group of senior storage vendors is Dell. The acquisition of EqualLogic has given them leadership in the iSCSI space. They have Exanet, which is Scalable NAS. They also bought Compellent (storage array) and Ocarina for data deduplication. Everyone is watching Dell’s strategy with interest as they try to make inroads into the Enterprise. In short, the big players now have their NAS, SAN, Unified storage and Scale Out solutions in place.

Integration with VMware is another area on which every vendor is concentrating. Storage performance is a major issue in server virtualization. The CPUs do a good job of running VMs, but when all these VMs access the same array, performance suffers. This is because the hypervisor performs a lot of storage-related activities. Having the hypervisor do storage work is not optimal since many arrays have the intelligence to perform these activities themselves, for example zeroing out free blocks. VMware came up with a set of APIs (VAAI: vStorage APIs for Array Integration) which allow some of the storage activities to be offloaded to the array. From what I understand, this is achieved by the arrays supporting a set of SCSI-3 commands like block copy etc. While many arrays claim integration with VMware, you need to check whether they support these APIs. This is because VMware integration is claimed even if the array supports only vMotion. Here is an article which tries to cut through the FUD with respect to VMware integration. Read the Dot Hill article.

As Server virtualization makes inroads into the Enterprise, the performance of the storage array vis-a-vis VMware will become very important. (I keep mentioning VMware here because they are the dominant vendor in this space. This will apply to other hypervisors like Hyper-V, Xen etc. as well.) Similarly, performance of the array in a virtualized server environment and the ability of the arrays to scale out will be important considerations for the cloud. That’s why you see a lot of effort from array vendors and server virtualization vendors going into ensuring that storage arrays are closely integrated with server virtualization.

As they enter an era of Server virtualization and Cloud, all the major players have the products they need to build good solutions for Enterprises. One thing I notice is that almost all vendors have a lot of different products in their portfolio. There is an ongoing effort to consolidate these portfolios. It will be interesting to observe how the vendors use their products to build the best solution for the customer.

On a different note: If you are a Bangalore-based Storage and/or Linux kernel expert / developer, I have some exciting startup opportunities for you. If interested, contact me at yagnavalky at gmail dot com.

Dealing with enormous data

I wasn’t aware of the company called ‘Greenplum’ until EMC bought it!! I became interested in it when analysts mentioned that ‘Netezza’ would be bought by IBM to counter this move. I was interested because I had a friend who worked at ‘Netezza’. So I wanted to find out what this whole thing was about. I checked with a friend who knows this area, and this is what he replied: “The key thing is Netezza, Teradata, Greenplum, Vertica are all designed from the ground up for data warehousing kind of workloads. Oracle and DB2 started as OLTP (Online Transaction Processing) systems and then they tried to do data warehousing also using the same server code. That does not work. Data warehousing has a very different kind of characteristic. Loads are bulk loads. Insert / Update / Deletes are few and it is very Select heavy. All you do is analytics. The selects usually involve very complex queries often running into GBs in size, generated automatically by front end analytics tools. It touches massive amounts of data in the range of terabytes to petabytes. OLTP on the other hand has all of Select / Insert / Update / Delete. Typical example is airline reservation. The volume of data is not that big at all.” That made sense. Later IBM bought Netezza and HP bought Vertica, another similar company.
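To make the contrast in my friend's reply concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The `bookings` table, its columns and the route names are all made up for illustration, and SQLite is of course neither an OLTP server nor a warehouse engine. The only point is the difference in access pattern: the OLTP-style statements touch one row by key, while the analytics-style query scans and aggregates the whole table.

```python
import sqlite3

# Hypothetical airline-reservation table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, route TEXT, fare REAL)"
)

# Bulk load, the way a warehouse typically ingests data.
rows = [
    (i, "BLR-MAA" if i % 2 == 0 else "BLR-DEL", 100.0 + (i % 5))
    for i in range(1, 1001)
]
conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", rows)

# OLTP-style access: update and read a single row, located by its key.
conn.execute("UPDATE bookings SET fare = 250.0 WHERE id = 42")
oltp_row = conn.execute(
    "SELECT route, fare FROM bookings WHERE id = 42"
).fetchone()

# Analytics-style access: scan every row and aggregate.
analytics_rows = conn.execute(
    "SELECT route, COUNT(*), AVG(fare) FROM bookings "
    "GROUP BY route ORDER BY route"
).fetchall()

print(oltp_row)
print(analytics_rows)
```

A system tuned for the first pattern optimizes for many small, concurrent row operations. A system tuned for the second optimizes for reading huge volumes in bulk, which is why the purpose-built products took a different design route.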

So the whole thing was about how you search for patterns and such in massive amounts of data. Unlike OLTP data, where only some of the data is current and important, in the analytics scenario all data is important. There is no irrelevant data, as Jim McDonald says in his very nice blog post at XIOTech. The post gives a good perspective on the challenges faced when you have to access huge amounts of data. He talks about Big Data. I am not sure if there is common agreement on what ‘Big Data’ means, but this Wikibon article can be your starting point in understanding what Enterprise Big Data is all about.

As data grows at amazing speed, neither processor nor disk technology can keep up with that pace. So scaling up a product to meet the needs of data growth can only go so far. It is inevitable that data access happens in parallel if you want to deal with larger and larger data sets. The current product trends as well as acquisition trends show that all companies understand this problem and are responding to it. NetApp have come up with their clustered NAS in Data ONTAP 8.0. This allows for aggregation of multiple nodes and uses a global namespace. (Looks like there is some confusion regarding the term global namespace, since Isilon and SONAS have interpretations that are different from NetApp’s. You may want to read Martin Glassborow’s (Storagebod) post which talks about this.) The data sheet for Clustered Mode Data ONTAP is available here. (pdf file)

While NetApp must have developed their own clustered mode Scale Out NAS based on their earlier Spinnaker acquisition (thanks to Dustin for pointing out my error), EMC went and bought Isilon, which again was a company dealing with Scale Out NAS. In fact EMC paid $2.25b to get this company. So you can understand what EMC feels about the potential of Scale Out NAS. HP in 2009 had acquired IBRIX, another company dealing with Scale Out NAS. IBM has its own Scale Out NAS, which is appropriately labeled SONAS!!

All of these use a global namespace. What exactly is a global namespace and more importantly, what exactly is Scale Out NAS and how does it work? According to the SONAS datasheet:

- Access your data in a single global namespace allowing all users a single, logical view of files through a single drive letter such as a Z drive.
- Offers internal (SAS, Nearline SAS) and external (Tape) storage pools. Automated file placement and file migration based on policies. It can store and retrieve any file data in/out of any pool transparently and quickly without any administrator involvement.
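To give a feel for the first point, here is a toy sketch in Python of a global namespace: users work with one logical path tree while the files actually live on several nodes. This is not how SONAS or any other product implements it. The `GlobalNamespace` class, the node names and the hash-based placement are all assumptions made purely for illustration.

```python
import hashlib


class GlobalNamespace:
    """Toy model: one logical file tree spread over several storage nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Each node keeps its own private mapping of path -> contents.
        self.stores = {node: {} for node in self.nodes}

    def _node_for(self, path):
        # Stable hash, so the same path always maps to the same node.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def write(self, path, data):
        self.stores[self._node_for(path)][path] = data

    def read(self, path):
        # The caller never needs to know which node holds the file.
        return self.stores[self._node_for(path)][path]

    def listdir(self, prefix):
        # Single logical view: merge the entries held by every node.
        return sorted(
            path
            for store in self.stores.values()
            for path in store
            if path.startswith(prefix)
        )


ns = GlobalNamespace(["node-a", "node-b", "node-c"])
ns.write("/z/reports/q1.txt", b"first quarter")
ns.write("/z/reports/q2.txt", b"second quarter")
print(ns.listdir("/z/reports/"))
```

Adding capacity in such a design means adding nodes rather than buying a bigger box. A real product also has to rebalance data and keep metadata consistent when nodes come and go, which this sketch ignores completely.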

Scale Out NAS technical details require an extensive writeup, which I will do in a future post. What is important is that all the main storage vendors have a Scale Out NAS solution in their portfolio.

An acquisition that many did not expect was that of LSI’s Engenio by NetApp. The reason for the surprise was that NetApp’s message all along has been that of Unified Storage, and everyone thought that NetApp would always go the Unified Storage way. (In fact there have been blogs critical of NetApp, calling it a one-product company. Now everyone was surprised and started asking, “Why are you getting more products? Your messaging will be lost.”) LSI’s Engenio is a pure block play and people were interested in knowing why NetApp acquired Engenio and how it would affect their message. Dave Hitz, in his characteristic clear style, replied to these concerns / accusations in his blog. In his blog post he says, “The observation is that, while many customers and workloads do require advanced data management, some need “big bandwidth” without the fancy features. For them, the best solution is a very fast RAID array with great price/performance. Perfect for Engenio! Two immediate opportunities are Full Motion Video (FMV) and Digital Video Surveillance (DVS), and over time we believe there will be more.” Here we see NetApp targeting a different type of workload and understanding that no fancy features like Snapshot etc. are required here. All that is required is bandwidth. In other words, all companies are now trying to get solutions which deal with different types of workloads. Hence you see pure block play, data warehousing solutions and Scale Out NAS.

So what is the moral of all this rambling? Well, the moral is clear. You had better start understanding how big data is being dealt with. That is the future if you are into Storage Infrastructure. Your concepts of RAID will not suffice, as data will not be distributed across disks in one single array but may be striped across multiple arrays. Clustered storage solutions may become the de facto way of installing storage. And it may happen faster than you think. So go read up more about these technologies. It will help you in the long run.
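As a small illustration of what striping across multiple arrays could mean, here is a sketch in Python of the address arithmetic. It assumes a simple round-robin layout with a fixed stripe size. The function name `locate` and its parameters are hypothetical, and real clustered products use far more elaborate placement schemes.

```python
def locate(logical_block, stripe_blocks, num_arrays):
    """Map a logical block number to (array index, block within that array).

    Assumes plain round-robin striping: consecutive stripes of
    `stripe_blocks` blocks go to consecutive arrays.
    """
    stripe_index = logical_block // stripe_blocks
    offset_in_stripe = logical_block % stripe_blocks
    array_index = stripe_index % num_arrays
    # How many full stripes this array already holds before this one.
    stripe_on_array = stripe_index // num_arrays
    return array_index, stripe_on_array * stripe_blocks + offset_in_stripe


# With 4-block stripes over 3 arrays, blocks 0-3 land on array 0,
# blocks 4-7 on array 1, blocks 8-11 on array 2, then back to array 0.
for block in (0, 4, 11, 13):
    print(block, locate(block, stripe_blocks=4, num_arrays=3))
```

Because neighbouring stripes sit on different arrays, a large sequential read can be served by all the arrays in parallel, which is the whole attraction of the approach.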

Talk to me intelligently

When you teach, you clarify doubts of the participants. While you are clarifying the doubts, you start having your own doubts, which make you go deeper into the subject. And the best way to clear deep doubts is to read a good book on the related subject. Internet is good for some quick and dirty research but if you want to do some serious reading you better get hold of a good book.

My aim was to know more about the interaction between the code you write and the processor. Essentially I needed a book which explained some of the electronics and computer organization stuff to a software engineering student. With this in mind I was browsing the bookshelves in Landmark, Chennai when I chanced upon a book titled “Write Great Code: Vol. 1 – Understanding the Machine”, written by Randall Hyde. I quickly flipped through it and since it had most of what I was looking for, I bought it.

Generally you can divide technical books into two categories. One is written for the student of that subject. This could be a text book or something close to a text. Here it is taken for granted that the person reading has an idea of what he / she is getting into. The other category is technical books written for people who are not students of that subject, but would like to know more about it. It is with the second category that I have problems. Basically, when a technical subject is being explained to a person who is not involved in that subject, the author assumes that the person reading is absolutely dumb!! It is almost as if saying that if you are not a student of this subject, you ought to be dumb!! I am OK with books like the “.. for Dummies” series. At least they categorically state who their audience is. Many of the other books don’t state this assumption and can get on your nerves when you read them, for they start by explaining to you that 2+2 is equal to 4. Well, not exactly that, but you get the drift, right?

So it was a pleasure to discover this book by Randall Hyde. As I said, this book is focused on a software engineer who wants to write high performance code. According to my friends, this is a breed which is slowly dying. One, because of project pressures people end up coding the fastest possible way and not the most efficient way. Two, with more and more languages giving you objects and high abstracted entities, your efficiency lies only in selecting the right templates / objects / whatever.  The book is focused on explaining to the reader about the underlying architecture of the machine and how your code can take advantage and become highly performant.

Starting from Binary Numbers, through Bit Operations, Character Representation, How Memory is Organized, the CPU Architecture, Instruction Set Architecture and Input / Output, Randall gives you a very nice view of the internals of a computer. A few things about this book impressed me. First, it talks to you intelligently. It assumes that you are a fairly intelligent person and not someone having an IQ of a caterpillar. Second, the writing style is very fluid. Third is the economy of words. It is possible to pack so much into a 400 page book because Randall doesn’t waste words. Reminds me of an Inorganic Chemistry text we had, authored by J.D.Lee, which had similar economy of words.

If you read the full book, you will end up understanding quite a bit of jargon which you have heard and probably used as well. Stuff like, say, pipelining. You probably have a vague idea of what it means but this book makes it very clear. Similarly you will get a good idea of how memory is accessed, what the instruction sets are, which registers do what etc. It also has detailed chapters on I/O, Filesystems and Device Drivers. It also tells you how compilers work. You can always say that these details are available in various text books and you would be right. But you will need to read a lot of textbooks to get all this knowledge. This is not a book which replaces the text book but rather a high level electronics view keeping the software programmer in mind. At the end of each chapter, references to the relevant standard texts are given.

If you are someone interested in knowing the internals of a computer system to the extent of using that knowledge to your advantage while coding, this is the book for you. I would definitely recommend it to all computer science students. This is a very comprehensive book which talks to you intelligently and you will definitely benefit from it.

Recent interesting acquisitions in Storage Space

When there is great growth in an industry, you would expect the demand to spur competition, and the customer to have more choices and more vendors to procure from. I guess this works only up to a certain scale, beyond which the opposite, consolidation of vendors, happens. That is what I see happening in the Storage industry now. The demand for Storage is on the rise. Every company is showing wonderful results. Demand for newer technologies is also on the rise. In such a scenario, we are seeing a lot of consolidation happening. So market growth leads to a shrinking vendor base? I am sure there is some management theory explaining this phenomenon, and when consolidation happens in an industry.

These thoughts came to me when I looked at the recent happenings in the Storage space. We saw Data Domain being bought by EMC last year. This year there were two very major acquisitions. One was HP fighting off Dell in order to acquire 3Par Technologies. HP wanted an array like that of 3Par in their portfolio and went for it aggressively against Dell. It was a $2b+ acquisition. 3Par has some nice technology and was quite well known for techniques like Thin Provisioning, Micro RAID, Wide Striping etc. They were getting noticed in the market and had a decent customer base. Everyone feels that this acquisition will help HP immensely in the Storage market.

The second acquisition which has a lot of people talking is EMC’s planned acquisition of the Scale Out NAS vendor Isilon. This will also be a $2b+ deal. From the comments I see, like HP with 3Par, this is also a buy to fill a gap in EMC’s portfolio. The general opinion is that EMC’s current NAS product, Celerra, doesn’t scale up well, hence the need to buy a scale out NAS product. EMC was lacking a scale out NAS while the competition had their products. HP has both PolyServe and IBRIX. (PolyServe, btw, had a lot of people from the erstwhile Sequent Computers, some of whom I know, and is based at Beaverton, Portland, Oregon.) IBM has its Scale Out NAS (SONAS) and NetApp has its own scale out product. So this acquisition ensures EMC is also playing in this space.

The other interesting acquisition was IBM acquiring Storwize, a company involved in Primary Data Compression. Storwize had a compression appliance for NAS. This appliance would compress data before it was stored on disk. IBM after acquiring Storwize released a product called IBM Storwize v7000 Storage Array. The funny part was that this array had no Storwize technology in it!! It seems that IBM wants to brand its arrays as Storwize arrays and so only the name was used.
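The idea of compressing data before it is stored on disk can be sketched in a few lines of Python. This only illustrates the general technique using the standard zlib module. It says nothing about how the Storwize appliance actually worked, and the `CompressingStore` class is a made-up name for this example.

```python
import zlib


class CompressingStore:
    """Toy store that compresses data on write and expands it on read."""

    def __init__(self):
        self._blocks = {}  # name -> compressed bytes, standing in for the disk

    def write(self, name, data):
        # Compression happens in the write path, before data reaches "disk".
        self._blocks[name] = zlib.compress(data)

    def read(self, name):
        # The caller always gets back the original, uncompressed bytes.
        return zlib.decompress(self._blocks[name])

    def stored_bytes(self):
        return sum(len(block) for block in self._blocks.values())


store = CompressingStore()
payload = b"storage " * 1000  # highly repetitive, so it compresses well
store.write("demo", payload)
print(len(payload), store.stored_bytes())
```

The saving depends entirely on the data. Repetitive data like the payload above shrinks a lot, while data that is already compressed (video, images) will hardly shrink at all.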

Other interesting acquisitions happened in the database area. EMC acquired the company Greenplum, which offers a “massively parallel processing database platform”, and IBM acquired the database company Netezza. Both these companies were involved in building databases for high performance business analytics.

Most of these acquisitions happened keeping the cloud in mind. Also at the back of the mind of all traditional Storage companies is Oracle. Oracle now has Sun, Sun StorageTek, Virtual Iron and Exadata. And of course, they have their database. They do pose a serious threat in the Storage space. There was, for a brief while, talk of whether they would acquire someone like NetApp to grow in the Storage space. You never know what will happen!!!

As I said in the beginning, while the Storage market is expanding, the vendor base is getting consolidated. Innovative startups and small companies with good track record are being gobbled up by the big players. So you eventually will end up with only the big guys in the fray.

Storage Books: An Indian Perspective

One question I regularly get asked by my students after my Storage 101 sessions is about the books they should read to get more details about Storage Technologies. I thought it would be a good idea to write about the Storage books that are currently available in India and my impressions of them. This will help in two ways. People get a detailed list of the books available, and they also get to know how each of these books could be helpful to them.

I would divide the people who attend my sessions in these categories: a) Engineers who will be involved in developmental or maintenance activities b) Engineers who will be involved in testing activities c) Storage / Systems / Network Administrators d) Those involved in system integration. Since each book caters to all these categories in some way or the other, I will try and make clear, which book is better suited for whom. The listing of the books is in no particular order.

Let’s start with the central premise on which learning is based. You can learn only if you know that you do not know!!! Hence the first book we will take up is:

1. “Storage Area Network for Dummies” by Christopher Poelker and Alex Nikitin. Publisher: Wiley India.

As the name indicates, this book is about Storage Area Networks (SAN). I would recommend this book to all those who are just out of college and want to know what SAN is all about. The book has a lot of implementation details, which are very useful for Storage Administrators and System Integrators. A lot of hardware stuff, including types of Fibre cables, FC Switches, Arrays, HBAs etc., is covered. There are nice chapters on how to set up a SAN, including concepts like arbitrated loop, zoning, LUN masking etc. You can clearly see that the authors are people who have actually implemented SANs, and they give tips on troubleshooting and managing a SAN. Concepts like Dedupe and Replication are also explained. The language is simple and there are a lot of diagrams to explain the concepts. In short, they know you are a dummy and model the teaching accordingly!!
Availability: You should be able to get this book in almost any technical book shop
Cost: Rs. 399/-

2. Information Storage and Management. Edited by: G.Somasundaram and Alok Srivastava, EMC Education Services. Published by Wiley India

The scope of the book is vast. It covers various aspects of Storage Technologies. As you can expect, this being a book by EMC Education Services, the examples given to highlight any technology are based on EMC products. This is a good thing since it gives people a glimpse of how a particular technology has been implemented and has been productized. The book starts with the very basic unit of disk drives, proceeds to RAID, then to Arrays and on to DAS, NAS, SAN and CAS. It then introduces the concepts of Storage Virtualization, Data Protection, Disaster Recovery, Security etc. Most of the chapters are divided into two parts. In the first part, the technology is introduced and various components of the technology are discussed. The second part gives an idea of EMC product(s) that use the technology being discussed. Given the scope, I think the concepts are covered to a decent depth. This is a book for everyone in terms of understanding the complete Storage landscape. I would probably say this book is slightly tilted towards the development and testing engineers.

Availability: Available in most of the technical book stores

Cost: Rs. 599/-

(I heard that this book is used as the text book in colleges which have a tie-up with EMC. It also seems to help students pass one of the basic EMC certifications.)

3. Storage Area Network Essentials by Richard Barker and Paul Massiglia. Publisher: Originally Veritas, Wiley India in India

If the first book I mentioned was written with a ‘Nuts and Bolts’ approach, showing how to set up a SAN, this book is aimed at Development, Maintenance and Testing engineers. Both the authors are from Veritas (now Symantec) but there is no Veritas-specific stuff in the book. It discusses a lot of material that any development engineer, designer or architect needs. Things like Lock Managers, Fault tolerance, IO Balancing and Performance are dealt with in detail. The book is divided into three parts: Understanding Storage Networking, What’s in a SAN and SAN Implementation Strategies. If you are an engineer getting into Storage development or testing, grab this book. It will also help those who are already working on storage, since the authors bring a lot of experience to the table.

Availability: Available in all technical book shops

Cost: Rs. 449/-

4. Storage Networks Explained by Ulf Troppens, Rainer Erkens and Wolfgang Muller (all from IBM Germany), Published by Wiley India

This is a translation from German. It is definitely written keeping engineers in mind. Everything is explained at the conceptual level. If you are looking for ‘Nuts and Bolts’ descriptions, you will not find them here. The good thing about this book is that it explains the various protocols involved in Storage networks. The one limitation is that this book was published in 2004. Given that 5 years is like a lifetime in the Storage world, some of the concepts it covers have not caught on as expected. (Example: InfiniBand and VIA.) Having said that, if you are an engineer who wants to get a good grip on the various protocols and also understand some internal details, this book will be very helpful. There are also chapters dealing with the SNIA Shared Storage Model and SMI-S. It will be great if an updated version of this book is published soon.

Availability: Available in most technical book shops

Cost: Rs. 299/-

5. Backup & Recovery by W. Curtis Preston. Publisher: O’Reilly

You can’t get a better person writing about Backup & Recovery than Curtis. Known in the industry as ‘Mr. Backup’, Curtis brings his considerable experience to the book. As the title indicates, this book is about Backup and Recovery, and if you are involved in this area, buy this book. In the book, Curtis first talks about the Open Source Backup Utilities that are available and explains how backups and restores are done using these utilities. In fact, that seems to be the main aim of the book, as its subtitle is “Inexpensive Backup Solutions for Open Systems”. In the next segment he talks about the features we expect and require vis-a-vis the commercial backup utilities that are available. He discusses features like Snapshots, Dedupe, CDP etc. in this section. Backup hardware is discussed in another chapter. The next section of the book is devoted to Bare Metal Recovery, which covers Solaris, Linux, Windows, AIX and MacOS. Backing up Databases forms the next section, and the final section is called Potpourri, which, as the name suggests, covers miscellaneous topics. Curtis Preston has a web site: www.backupcentral.com. You can go to his blog from this site. Curtis is someone who doesn’t mince words and speaks his mind clearly. You will find his blogs interesting, even if you are not into backup. Given the challenges and new techniques that are cropping up for VMware backup, I am hoping that a revised edition of this book will appear in the future covering this topic in detail as well.

Availability: You need to look out for this in the technical book stores. Sometimes they push this book into the Database shelf.

Cost: Rs. 600/-

6. Storage Networks by Robert Spalding. Publisher: Tata McGraw Hill

Earlier this used to be the only book on Storage Networks that was available. Nowadays I don’t see it on the book shelves as much as I see the ‘Dummies’ book and Paul Massiglia’s book. This is a 2003 edition and hence a lot of newer developments (Deduplication, for example) are not present in the book. It does give a good internal view of many components (HBAs etc.), which will be useful for engineers who want to understand the basic building blocks involved in creating a SAN. It has nice diagrams and the explanations are good. So if you are an engineer, you can definitely check out this book. It will be quite useful for you.

Availability: Used to be widely available. Still available in many technical book shops

Cost: Rs. 500/-

7. Storage Networking Protocol Fundamentals by James Long. Publisher: Cisco Press (Pearson Education in India)

This book has details about various protocols including FC, iSCSI and Parallel SCSI. The approach taken is a layered one, in the sense that every networking layer of the OSI model is taken up and the Storage protocols applicable to that layer are discussed. For example, SCSI Parallel Interface, Ethernet and FC are the protocols discussed at the physical layer. The Network, Transport, Session, Presentation and Application layers are dealt with similarly. The respective protocols are mapped to these layers and some details of each protocol are given. If you are working on protocols, this will definitely be a good first book to read before you actually go and read up the standard. (A prospect not many relish, I would say!!!) Though published by Cisco Press, this is not a Cisco-specific book.

Availability: Seeing fewer copies of this now. Check in the Cisco section of any technical book shop.

Cost: Rs. 435/-

There was a book titled “Building SANs with Brocade”, which was a Brocade switch-specific book. I don’t think this is available nowadays. I haven’t seen it in any bookshops. Also, given that products keep evolving fast, how applicable this book would be to the latest Brocade products needs to be verified. Some IBM Redbooks may be available and these are useful if you are working on IBM products. I have seen books on IBM SVC and IBM Data Protection Strategies. Of course, you can always check out IBM Redbooks on the IBM Redbooks site.

The books I am going to list are not available in India but may be worth procuring for your company library if you are working in that particular area.

1. Fibre Channel Switched Fabric by Robert Kembel

2. Shared Data Clusters by Dilip M Ranade

3. Highly Available Storage for Windows Servers by Paul Massiglia

4. Storage Security by John Chirillo

If you have read any other book which is a good reference for any Storage technology, please do leave a comment with the book name. That will benefit everyone.

One final tip before I sign off. While I have given the cost of each book, you should be able to get at least a 10% discount on that cost in most of the stores. So don’t buy from some big-name book shop which does not give a discount. I buy mostly from Book Paradise or Sapna Book House in Jayanagar, Bangalore and I get discounts ranging from 10% all the way to 22%. So if you save some money based on this tip and want to share a part of those savings, let me know and I will mail you my address 🙂