
Backup to the compute cloud

William Brogden discusses the trends that have made cloud computing possible and the wide range of distributed computing configurations it covers.

Many computer generations ago, Sun Microsystems adopted the slogan "The network is the computer." There have been numerous cycles of computer buzzwords since then, but they are mostly just elaborations on that idea. "Grid computing" has been a popular buzzword in recent years, applied to a wide range of distributed computing configurations, as I discussed in an earlier article. Of the three categories of grid computing discussed in that article, cloud computing is most like a generalization of the "Space" concept as implemented in JavaSpaces.

The basic idea is that a user can get practically any computing task done with minimum investment in hardware or software by contracting with a service offered in the "cloud." The term is intentionally nebulous to convey the fact that the user does not care where or how the work is done, only that the results are correct. Services are used and paid for only as needed, a feature especially appreciated in businesses with occasional large peak loads.

Many trends have come together to create the possibility of cloud computing. I think the following play a major role:

  • Virtualization of applications and operating systems tends to remove user concern about the underlying hardware and improves scalability.
  • High speed networks remove concern about communication costs and delays, and permit location of hardware where electricity and cooling costs are minimal.
  • Huge capacity/commodity storage devices, cheap generic computers and open source operating systems allow companies offering services to expand rapidly.
  • RESTful web service design concepts simplify communication and client architecture.
  • Big web enterprises like Google, Yahoo and Amazon have discovered that capabilities originally developed for internal support can be sold in the cloud.

Amazon in the Cloud
To experiment with the cloud, I chose Amazon Web Services for two simple reasons: I already have an Amazon account, and I found ample support for Java developers. In addition to Java, Amazon has developer support for a wide variety of languages including Ruby, Silverlight, Python, PHP, Perl, ColdFusion, Visual Basic, and Erlang.

Setting up an Amazon Web Services account is as simple as establishing a user name, password, and credit card for billing. With the basic account established you can then apply for a unique access key id and secret key. The access key is used to locate resources you own and the secret key is used for authentication.
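To make the roles of the two keys concrete, here is a minimal sketch of HMAC-based request signing in the style of S3's original REST authentication (later known as Signature Version 2). The credentials, bucket name, and object name are placeholders, the string to sign is reduced to a bare GET with no extra headers, and the class and method names are made up for illustration. Toolkits such as JetS3t do this work for you.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class S3RequestSigner {

    /** Returns Base64(HMAC-SHA1(secretKey, stringToSign)). */
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    /**
     * Builds the Authorization header value. The access key id is sent in the
     * clear so the service can look up the account; the secret key is only
     * used locally to compute the signature.
     */
    static String authorizationHeader(String accessKeyId, String secretKey, String stringToSign) {
        return "AWS " + accessKeyId + ":" + sign(secretKey, stringToSign);
    }

    public static void main(String[] args) {
        // Placeholder credentials, not real keys.
        String accessKeyId = "EXAMPLEACCESSKEYID";
        String secretKey = "example-secret-key";

        // Simplified string to sign: verb, Content-MD5 (empty), Content-Type (empty),
        // date, then the resource path. Bucket and object names are hypothetical.
        String stringToSign = "GET\n\n\nTue, 27 Mar 2007 19:36:42 +0000\n/my-backup-bucket/notes.txt";

        System.out.println("Authorization: " + authorizationHeader(accessKeyId, secretKey, stringToSign));
    }
}
```

The service repeats the same computation with its own copy of the secret key, so a request is accepted only if both sides arrive at the same signature.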

Amazon is rolling out many fascinating services, such as the "Elastic Compute Cloud," that are well beyond the scope of this article. To get started, I chose to sign up just for the Simple Storage Service, or S3 for short.

Amazon Simple Storage Service
As tornado season rolls around in Texas every year, I tend to think about off-site backup. Getting backup CDs to the safe deposit vault at the bank is a bit of a hassle, so I don't do it often enough. Backing up more frequently to Amazon's Simple Storage Service (S3) was an obvious choice for my first cloud project.

Amazon S3 system characteristics important for backup include:

  • Buckets. Stored objects reside in locations called buckets which are named following a URL style naming convention.
  • Addressing. Objects can be addressed by combining the S3 service URL with the bucket and object name resulting in a REST style HTTP compatible URL.
  • Access Control. Access (public, private, read-only) to a bucket or object is minutely controlled by an Access Control List (ACL) and request authentication.
  • Metadata. Stored objects have attached metadata useful for management, such as the MD5 digest and creation date.
  • Reliability. The S3 Service Level Agreement commits to 99.9% availability.
  • Low Cost. Storage costs $0.15 per gigabyte per month, uploading costs $0.10 per gigabyte, and downloading costs $0.17 per gigabyte, with decreasing rates at terabyte levels.
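As an illustration of the addressing scheme, the sketch below combines the S3 service URL with a bucket and object name to form a path-style URL. The bucket and object names are hypothetical, the class and method names are made up for this example, and the percent-encoding helper is a simplified one written here rather than anything taken from a toolkit.

```java
import java.nio.charset.StandardCharsets;

final class S3ObjectUrl {

    private static final String ENDPOINT = "https://s3.amazonaws.com";

    /** Combines the service URL, bucket name, and object name into one URL. */
    static String pathStyleUrl(String bucket, String objectName) {
        return ENDPOINT + "/" + bucket + "/" + encode(objectName);
    }

    /** Percent-encodes everything except unreserved characters and '/'. */
    static String encode(String objectName) {
        StringBuilder out = new StringBuilder();
        for (byte b : objectName.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved || c == '/') {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints https://s3.amazonaws.com/my-backup-bucket/2008/tax%20return.pdf
        System.out.println(pathStyleUrl("my-backup-bucket", "2008/tax return.pdf"));
    }
}
```

Whether a plain HTTP GET on such a URL succeeds depends on the access control settings: a publicly readable object can be fetched directly, while a private one needs an authenticated request.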

The JetS3t Cockpit application
The most popular Java-based open source toolkit for working with the S3 service is called JetS3t. The download includes a desktop application using Swing graphics called "Cockpit." Running Cockpit is probably the best way to get acquainted with the S3 service, since it can guide you through creating S3 buckets, uploading files, and managing them.

The Cockpit user interface is a typical Java Swing presentation listing your buckets and providing for managing the contents. The Access Control List provides for detailed control over who is allowed to read or modify the contents of a bucket or a specific file. File upload options include setting the encryption and access control settings.

Cockpit supports a "drag and drop" interface for file backup: with a bucket selected, dragging a selected list of files to the dialog starts the upload process. You can also use a typical Java Swing file directory dialog to select files for backup, but for some reason I found this dialog to be very slow when listing directories. As an example of file transfer speed, I was able to send a total of 312MB in seven files in approximately 34 minutes. This was with the simplest form of encryption; higher levels of encryption will take longer. When downloading files from S3, Cockpit uses the metadata attached to each file to determine the encryption used and will require the encryption key.
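To put the transfer figure in perspective, the following back-of-the-envelope sketch turns 312MB in 34 minutes into a rate and combines it with the S3 prices listed earlier. It assumes 1GB is counted as 1024MB and that a larger backup would move at the same rate. The 10GB backup size is a made-up example, as are the class and method names.

```java
final class BackupEstimate {

    // Prices quoted in the article, in US dollars per gigabyte.
    static final double STORAGE_PER_GB_MONTH = 0.15;
    static final double UPLOAD_PER_GB = 0.10;

    // Assumption: one gigabyte is counted as 1024 megabytes.
    static final double MB_PER_GB = 1024.0;

    static double megabytesPerSecond(double megabytes, double minutes) {
        return megabytes / (minutes * 60.0);
    }

    static double uploadCost(double megabytes) {
        return megabytes / MB_PER_GB * UPLOAD_PER_GB;
    }

    static double monthlyStorageCost(double megabytes) {
        return megabytes / MB_PER_GB * STORAGE_PER_GB_MONTH;
    }

    static double hoursToUpload(double megabytes, double mbPerSecond) {
        return megabytes / mbPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double rate = megabytesPerSecond(312, 34);
        System.out.printf("Measured rate: %.3f MB/s (%.2f Mbit/s)%n", rate, rate * 8);
        System.out.printf("Upload cost for 312MB: $%.3f%n", uploadCost(312));
        System.out.printf("Storage cost for 312MB: $%.3f per month%n", monthlyStorageCost(312));

        // Hypothetical larger backup of 10GB at the same measured rate.
        double tenGigabytes = 10 * MB_PER_GB;
        System.out.printf("10GB at that rate: %.1f hours%n", hoursToUpload(tenGigabytes, rate));
    }
}
```

The measured rate works out to roughly 0.15MB per second, so at these prices time is a bigger constraint than money: the 312MB upload costs about three cents, while a 10GB backup at the same rate would take most of a day.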

A custom backup solution
I experimented with using the JetS3t toolkit to build a custom backup program. What I wanted was a Java program that would run in the background on my desktop, back up files that are copied into a particular directory, and then delete the local copies. When I had a problem with request authentication, the Amazon developer forum pointed me in the right direction, and my custom solution is now working.
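The overall shape of such a program can be sketched as follows. This is not the program described above, only an outline of the idea: one pass over a drop directory that uploads each file and deletes the local copy only after the upload succeeds. The Uploader interface is a hypothetical stand-in for the real upload call (in a JetS3t-based program, the code that stores the file in a bucket), and all names here are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class DropFolderBackup {

    /** Hypothetical stand-in for whatever actually sends a file to S3. */
    interface Uploader {
        void upload(Path file) throws IOException;
    }

    /**
     * Uploads every regular file in dropDir. A file is deleted locally only
     * after its upload completes without error. Returns the uploaded names.
     */
    static List<String> backupOnce(Path dropDir, Uploader uploader) throws IOException {
        // Snapshot the directory first so deletions do not disturb the listing.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(dropDir)) {
            for (Path file : listing) {
                if (Files.isRegularFile(file)) {
                    candidates.add(file);
                }
            }
        }

        List<String> uploaded = new ArrayList<>();
        for (Path file : candidates) {
            try {
                uploader.upload(file);
            } catch (IOException e) {
                // Keep the local copy so the next pass can retry it.
                System.err.println("Upload failed, keeping " + file + ": " + e.getMessage());
                continue;
            }
            Files.delete(file);
            uploaded.add(file.getFileName().toString());
        }
        return uploaded;
    }

    public static void main(String[] args) throws IOException {
        Path dropDir = Files.createTempDirectory("s3-drop");
        Files.write(dropDir.resolve("notes.txt"), "hello".getBytes());

        // Demo uploader that only prints; a real one would talk to S3.
        Uploader printing = file -> System.out.println("would upload " + file.getFileName());
        System.out.println("uploaded: " + backupOnce(dropDir, printing));
    }
}
```

To run in the background, a method like backupOnce could be called on a timer, for example from a ScheduledExecutorService. A real program would also need to avoid picking up files that are still being copied into the directory.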

Things are not always sunny in the cloud because human systems are fallible. The S3 service itself has gone offline a couple of times in the last year, and of course there are many things that can go wrong with your Internet connection. Still, when tornado season rolls around again I am going to feel better knowing that I have extra backups in the cloud.

Amazon Web Services starting page

Introduction to Amazon Web Services for Java developers

JetS3t open-source Java toolkit home page

A Wikipedia article on cloud computing
