23 August 2017

Clouds have buckets but also need baskets

Sometimes supposedly simple things can be slightly tricky. This story relates to a data sharing exercise for the cloud workshop last year.

Alice wants to share a 150 GB file with Bob. It is too large to send by email, too large even to share through "free" cloud storage or to conveniently put on a memory stick and send by mail.
(Baskets, image credit Jeremy Kemp)

Ordinarily she'd get a piece of online storage somewhere, upload the file to it, and send Bob a link. But let's suppose we do it the other way: Bob gets some temporary storage and gives Alice access rights to it; Alice then uploads her file to that space, and Bob retrieves it and gets rid of the space. Easy, yes?

If they were both scientists with grid access, it'd be fairly easy to find a piece of space and share files, but let's suppose Alice can't use the grid, or they are not in the same VO, or they want to share privately.

This sounds like a use case for the cloud, then? So Bob gets a piece of cloud storage, a bucket. There's a blob store with keys that can be shared, but it requires a client to talk to it. There might be a browser plugin, but Bob doesn't want to ask Alice to install anything; she should just be able to use her browser. OK, but how about a publicly writable space (with a non-guessable name)? Nope, there's nothing that can be made world writable - probably a Good Thing™ in the Grand Scheme of Things, but it makes Bob's life a bit more difficult. Dropbox can create a shared space, but Bob would have to "unlock" more space, and he doesn't want to pay for a whole month or take out a subscription that he'd have to remember to cancel.

OK, so Bob next looks at cloud compute, and gets a virtual machine in the cloud with enough disk space. He sets up Apache on it. While he fiddles with this, he reminds himself how iptables works, so he can protect the machine and test in a protected environment (and obviously he's careful not to block off his ssh connection!). He looks at DAV but it's too complicated. How about a normal upload? Bob reminds himself how the HTML works. Yes, that might work; except it doesn't - a quick search reveals it needs CGI, so Bob needs to find a CGI script. There are modules for Perl, PHP and Python, but the example scripts need a bit of tweaking to make them look right. Obviously he also needs to build and install the module dependencies that are not part of the standard operating system packages. Meanwhile, because it is a compute resource, Bob is also paying for its idle time, and it idles a bit because this exercise is not Bob's highest priority.
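To give a flavour of what the receiving end has to do, here is a minimal sketch using only the Python standard library. It is not the CGI script Bob ended up with: it swaps the form upload for a plain HTTP PUT, which avoids parsing multipart form data, and it streams the body to disk in chunks so a 150 GB file never has to fit in memory. The names UPLOAD_DIR, safe_name, save_stream and serve, and port 8080, are made up for illustration, and the sketch does no authentication at all.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlsplit

UPLOAD_DIR = "uploads"   # hypothetical target directory
CHUNK = 1024 * 1024      # copy 1 MiB at a time


def safe_name(request_path):
    """Return the last path component of the request, or None if unusable."""
    name = os.path.basename(unquote(urlsplit(request_path).path))
    return None if name in ("", ".", "..") else name


def save_stream(src, dst_path, length, chunk_size=CHUNK):
    """Copy up to `length` bytes from src to dst_path; return bytes written."""
    written = 0
    with open(dst_path, "wb") as dst:
        while written < length:
            block = src.read(min(chunk_size, length - written))
            if not block:  # sender stopped early
                break
            dst.write(block)
            written += len(block)
    return written


class UploadHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        name = safe_name(self.path)
        try:
            length = int(self.headers["Content-Length"])
        except (TypeError, ValueError):
            length = None
        if name is None or length is None or length < 0:
            self.send_error(400, "need a file name and a Content-Length")
            return
        os.makedirs(UPLOAD_DIR, exist_ok=True)
        written = save_stream(self.rfile, os.path.join(UPLOAD_DIR, name), length)
        # 201 only if the whole announced body arrived
        self.send_response(201 if written == length else 400)
        self.end_headers()


def serve(port=8080):
    """Run the upload server until interrupted."""
    HTTPServer(("", port), UploadHandler).serve_forever()
```

Calling serve() starts the listener. A sender would then need something that can issue a PUT, such as a few lines of JavaScript in a page or a command line tool, which is exactly the kind of extra step Bob was trying to spare Alice.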

There is no certificate on the cloud host, so Bob has to use HTTP rather than HTTPS. This is supposed to be a short exercise and the data is not sensitive, so that is probably OK. However, it does need a bit of protection because it is a writable resource, Bob thinks, so he reminds himself how BasicAuth works and sets up a temporary password that his friend will remember, and one for himself, to test, which he can remove once his testing is done (so he doesn't expose Alice's password during testing). He gradually opens the virtual machine's firewall and is annoyed it doesn't work, until he remembers the cloud platform has its own firewall, too; it takes him half a minute to locate the network security rules and add port 80. He then tests from home, too, and finally mails the hostname, username, and password to Alice.
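For the curious, this is roughly what a BasicAuth check involves. The browser sends an Authorization header containing "Basic" followed by the base64 encoding of "user:password". That is an encoding and not encryption, which is why it matters that Bob is on plain HTTP. The sketch below is an illustration in Python and not how Apache implements it; the function names and the accounts dictionary are invented for the example.

```python
import base64
import hmac


def parse_basic_auth(header):
    """Return (user, password) from an Authorization header, or None."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def is_authorized(header, accounts):
    """Check the header against a dict of user -> password (hypothetical store)."""
    creds = parse_basic_auth(header)
    if creds is None:
        return False
    user, password = creds
    expected = accounts.get(user)
    if expected is None:
        return False
    # constant-time comparison of the two passwords
    return hmac.compare_digest(password.encode(), expected.encode())
```

Keeping a separate test account, as Bob does, maps onto having two entries in the accounts dictionary and deleting one when testing is finished.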

The point of this exercise was to see how Alice and Bob would share data when they don't have a shared data platform, and the recipient wishes to pay the (temporary) storage costs. It is related to our exercise in data sharing through the use of cloud resources. Shouldn't it have been easier than writing to a portable hard drive?