Storage Gateways and FUSE


The cloud brought cheap storage, and it also brought durability. Storing your data in the cloud is therefore cost effective, and you don’t need to worry much about losing it. (Amazon S3, for example, is designed for 11 nines of durability, that is 99.999999999%. For all practical purposes, this means you will not lose your data.) It is only natural that people have started using cloud storage services, and the storage services of the different vendors now hold trillions of objects.

Cloud Storage is based on Object Storage. This is different from the standard Block and File storage we are used to. With an object store, we fetch an object from the cloud using REST APIs, which is quite different from reading and writing a file on your disk. There are other differences as well between an object store and Block/File based storage devices. For example, an object is typically written and replaced as a whole rather than updated in place.
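To make the difference concrete, here is a minimal sketch in Python. `ToyObjectStore` is a hypothetical in-memory stand-in, not a real cloud SDK; it only mimics the typical PUT/GET behaviour where an object is read and written as a whole. The same small change is then made to an ordinary file using seek and write.

```python
import os
import tempfile


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real SDK)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # A PUT stores the whole object under its key, replacing any old one.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # A GET returns the object's bytes by key.
        return self._objects[key]


def update_object(store: ToyObjectStore, key: str, offset: int, patch: bytes) -> None:
    """Change a few bytes of an object: download, modify, upload again."""
    data = bytearray(store.get(key))
    data[offset:offset + len(patch)] = patch
    store.put(key, bytes(data))


def update_file(path: str, offset: int, patch: bytes) -> None:
    """Change a few bytes of a file in place with seek and write."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)


# Object store: the whole object travels both ways for a 5-byte change.
store = ToyObjectStore()
store.put("reports/q1.txt", b"hello world")
update_object(store, "reports/q1.txt", 6, b"cloud")
object_result = store.get("reports/q1.txt")

# File: only the changed bytes are written.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "q1.txt")
    with open(path, "wb") as f:
        f.write(b"hello world")
    update_file(path, 6, b"cloud")
    with open(path, "rb") as f:
        file_result = f.read()
```

Both paths end with the same content, but the object path had to move the full object in each direction to get there.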

While Cloud Storage is cheap, users are more comfortable with a filesystem interface. Is there a way to treat Cloud Storage as if it were a filesystem? That would mean the user just reads and writes a file and does not need to use a REST API to fetch or store an object. If this can be done, users will find Cloud Storage easier to use, and they can reduce their storage costs by moving data to it. This is possible by using Storage Gateways.

Storage Gateways have Cloud Storage as their backend but expose a filesystem to the users. The users deal with files, whereas the Storage Gateway stores these files as objects in Cloud Storage. This background processing is transparent to the user. A Storage Gateway can expose a filesystem, a block device or a virtual tape library to the user. AWS, for example, has Storage Gateways which expose a filesystem, Storage Gateways which expose a block device (an iSCSI device) and a Storage Gateway which exposes a Virtual Tape Library (VTL). All of these use S3 as their backend to store the data.
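The file-to-object translation described above can be sketched as follows. This is a toy model under stated assumptions, not how any vendor's gateway is implemented: `ToyFileGateway` is a hypothetical class, the backend is a plain dict standing in for a bucket, and directories are emulated by treating `/` in the object key as a separator.

```python
class ToyFileGateway:
    """Exposes file-style calls and stores each file as one object.

    Hypothetical sketch: the backend dict stands in for a cloud bucket.
    """

    def __init__(self, backend: dict):
        self._backend = backend

    def _key(self, path: str) -> str:
        # Map a file path such as "/docs/a.txt" to the object key "docs/a.txt".
        return path.lstrip("/")

    def write_file(self, path: str, data: bytes) -> None:
        self._backend[self._key(path)] = bytes(data)

    def read_file(self, path: str) -> bytes:
        return self._backend[self._key(path)]

    def list_dir(self, path: str) -> list:
        # The object namespace is flat, so a directory listing is
        # emulated by filtering keys on a common prefix.
        base = self._key(path)
        if base and not base.endswith("/"):
            base += "/"
        names = set()
        for key in self._backend:
            if key.startswith(base):
                names.add(key[len(base):].split("/", 1)[0])
        return sorted(names)


bucket = {}
gateway = ToyFileGateway(bucket)
gateway.write_file("/docs/a.txt", b"alpha")
gateway.write_file("/docs/sub/b.txt", b"beta")
gateway.write_file("/top.txt", b"top")

docs_listing = gateway.list_dir("/docs")
root_listing = gateway.list_dir("/")
stored_keys = sorted(bucket)
```

The user of `gateway` only ever sees paths and file contents; the keys in `bucket` are what actually lands in the object store.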

The question uppermost in your mind will be this: if the storage is in the Cloud and you are using it as your primary storage, will performance not suffer? It is a very pertinent question. Accessing the Cloud is definitely not as fast as accessing a disk drive in your data center. To address this, Storage Gateways have local disks on which they cache the recently accessed files. This helps bolster the performance of the gateways. Other than AWS, Avere Systems is another company which makes Cloud based NAS filers. ( http://www.averesystems.com/ )
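The caching idea can be illustrated with a small read-through cache. This is only a sketch: the least-recently-used eviction policy, the `CachingReader` name and the tiny capacity are assumptions made for the example, and real gateways may use different policies and cache on disk rather than in memory.

```python
from collections import OrderedDict


class CachingReader:
    """Read-through cache in front of a slow backend (hypothetical sketch)."""

    def __init__(self, fetch, capacity: int):
        self._fetch = fetch            # callable that reads from the backend
        self._capacity = capacity      # maximum number of cached entries
        self._cache = OrderedDict()    # ordered oldest to most recently used
        self.backend_reads = 0         # how often we had to go to the cloud

    def read(self, key: str) -> bytes:
        if key in self._cache:
            # Cache hit: mark as most recently used and serve locally.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: fetch from the backend and remember the result.
        data = self._fetch(key)
        self.backend_reads += 1
        self._cache[key] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return data


# A dict stands in for the cloud backend.
cloud = {"a": b"A", "b": b"B", "c": b"C"}
reader = CachingReader(cloud.__getitem__, capacity=2)

requests = ["a", "b", "a", "a", "c", "a"]
results = [reader.read(key) for key in requests]
```

In this run, six reads are served with three trips to the backend, because the repeated reads of `a` are answered from the cache.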

Azure now offers a FUSE adapter for BLOB storage (BLOB storage is the object store of Azure). Once you install this FUSE adapter on a Linux system, you can mount a BLOB container onto that system. After that, you can access the blobs as if they were files in your filesystem. You don’t need to use the REST APIs. The advantage of this over Storage Gateways is that Storage Gateways are generally virtual appliances. For example, in the case of AWS Storage Gateway, you need VMware on premises because AWS Storage Gateway is a virtual appliance which runs on VMware ESXi. In the case of FUSE, you don’t need any additional device. Once you have the driver installed, you can start accessing the object storage as normal files.
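Once a container is mounted, application code needs nothing cloud-specific. The sketch below uses only ordinary Python file calls; the helper names are hypothetical, and a temporary directory stands in for the mount point so that the example runs anywhere. The mount step itself is not shown, since its options depend on the adapter version.

```python
import tempfile
from pathlib import Path


def list_mounted_files(mount_point: Path) -> list:
    """List file names under the mount point with a normal directory scan."""
    return sorted(p.name for p in mount_point.iterdir() if p.is_file())


def read_mounted_file(mount_point: Path, name: str) -> bytes:
    """Read a file under the mount point with a normal file read."""
    return (mount_point / name).read_bytes()


# Stand-in for a mounted container; on a real system this would be the
# directory where the FUSE adapter mounted the BLOB container.
with tempfile.TemporaryDirectory() as tmp:
    mount_point = Path(tmp)
    (mount_point / "notes.txt").write_bytes(b"stored as a blob")
    names = list_mounted_files(mount_point)
    content = read_mounted_file(mount_point, "notes.txt")
```

The same functions would work unchanged against a real mount point, which is the whole appeal of the FUSE approach.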

Of course, the FUSE adapter of Azure is at an early stage and hence has limitations. Not all filesystem calls have been implemented, so you need to be careful when you use it.

You can check this Azure article for more details on the FUSE adapter: https://azure.microsoft.com/en-us/blog/linux-fuse-adapter-for-blob-storage/