Cloud data warehouse guide: Using Redshift, rival platforms
An online ticketing service that manages tens of thousands of transactions daily took pressure off its Oracle production database and saved on software licensing when it deployed a cloud data warehouse.
The company, Etix, headquartered in Raleigh, N.C., wanted a separate data warehouse for analyzing data because queries against the production database would slow it down. It could have stuck with on-premises Oracle Corp. software for the data warehouse as well, but that would have cost substantially more than the cloud-based deployment it ended up with, according to Daniel Heacock, senior business systems analyst for Etix.
"We've been working with Oracle since the beginning of our company, and I already feel like we're paying way too much," Heacock said.
Today, Etix syncs about 700 GB of Oracle data with Amazon Redshift, where it is compressed to about a third of its original size. Amazon Web Services (AWS) said Redshift can scale automatically to petabytes and will compete on price with big IT vendors, including Oracle.
The product was first launched at Amazon's re:Invent user conference in November 2012 and is in open public beta. Etix first spun up one three-year reserved instance of Redshift in October, at an annualized price of $1,999 that includes both up-front fees and hourly charges, according to Heacock.
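An annualized figure like the $1,999 quoted here combines a one-time up-front fee with ongoing hourly charges. One plausible way to compute it is sketched below: spread the up-front fee evenly over the contract term and add a full year of hourly charges. The function name and the example fee and rate are hypothetical, not Amazon's published prices, and the sketch assumes the reserved node runs all 8,760 hours of a non-leap year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def annualized_reserved_cost(upfront_fee: float, hourly_rate: float, term_years: int) -> float:
    """Hypothetical helper: spread the up-front fee over the term, then add one year of hourly charges."""
    if term_years <= 0:
        raise ValueError("term_years must be positive")
    return upfront_fee / term_years + hourly_rate * HOURS_PER_YEAR


# Illustrative inputs only; these are NOT actual Redshift prices.
print(f"${annualized_reserved_cost(3000, 0.10, 3):,.2f}")
```

Spreading the up-front fee evenly is a simplification; it ignores the time value of money, which a finance team might prefer to include.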
Etix did not investigate the pricing of an Oracle Data Warehouse specifically, but the company paid approximately $500,000 for its on-premises Oracle Enterprise Edition database, Heacock said.
CloudBeam software from Attunity Inc. is needed to convert data into a form Redshift can use and to manage the replication of transactions to the Amazon cloud. Attunity is one of several Amazon data integration partners, which also include FlyData Inc. (formerly Hapyrus), Informatica Corp., SnapLogic Inc. and Talend.
Data integration can be one of the biggest cost "gotchas" in the cloud. But even with an additional $1,200 a month in Attunity subscription licensing fees (another $14,400 per year), the company saved an estimated $80,000 in development time alone by integrating Oracle with Redshift using CloudBeam instead of doing the integration work in-house, Heacock said.
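The trade-off in this paragraph is easy to check with a little arithmetic. The sketch below uses only the article's figures ($1,200 a month for CloudBeam and an estimated $80,000 in development time saved). As a simplifying assumption the article does not make, it treats the $80,000 as a one-time saving set against flat subscription fees. It then asks how many years of fees it would take to add up to that saving. The Redshift price is left out because Etix pays it whichever integration route it takes.

```python
# Figures quoted in the article (USD).
CLOUDBEAM_MONTHLY = 1_200  # Attunity CloudBeam subscription per month
DEV_TIME_SAVED = 80_000    # estimated one-time saving vs. in-house integration

cloudbeam_annual = CLOUDBEAM_MONTHLY * 12

# Years of subscription fees needed to equal the one-time saving
# (assumption: fees stay flat and in-house work would have no ongoing cost).
years_to_match_saving = DEV_TIME_SAVED / cloudbeam_annual

print(f"CloudBeam per year: ${cloudbeam_annual:,}")
print(f"Years of fees equal to the development saving: {years_to_match_saving:.1f}")
```

Under these assumptions the subscription would take more than five years to cost as much as the development work it replaced.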
Amazon Redshift updates boost performance
While the Redshift cloud data warehouse proved to be the cheaper alternative, its first incarnation didn't outperform the on-premises Oracle database. On a single previous-generation node, queries could take up to 40 seconds, Heacock said.
Last week, however, Amazon introduced new node types for Redshift, one of which boosts performance using solid-state drives (SSDs). The new Dense Compute nodes are aimed primarily at customers with less than 500 GB of data in their data warehouse, and deployments can start at 160 GB and scale up from there.
Heacock said he spun up three dw2.large Dense Compute nodes, and queries took five seconds. On-premises query performance depends on the hardware behind the system, but Heacock said this is comparable to on-premises database performance.
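A rough storage-only check shows how the quoted figures fit together. The sketch below estimates the minimum number of 160 GB dw2.large nodes needed to hold a data set after compression. The function name is hypothetical. The sketch assumes the roughly 3:1 compression Etix reports and ignores any headroom Redshift needs for temporary query space, so it gives a lower bound rather than a sizing recommendation.

```python
import math


def min_nodes_for_storage(raw_gb: float, compression_ratio: float, node_capacity_gb: float) -> int:
    """Hypothetical helper: smallest node count whose combined capacity holds the compressed data (no headroom)."""
    if compression_ratio <= 0 or node_capacity_gb <= 0:
        raise ValueError("compression_ratio and node_capacity_gb must be positive")
    compressed_gb = raw_gb / compression_ratio
    return math.ceil(compressed_gb / node_capacity_gb)


# Etix's figures: about 700 GB raw, compressed to about a third, on 160 GB nodes.
print(min_nodes_for_storage(700, 3, 160))
```

By this estimate, roughly 233 GB of compressed data would fit on two nodes. The three nodes Heacock chose therefore leave spare capacity while adding CPU and memory.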
The three nodes, on three-year reserved instance contracts, will cost $2,640 per year, about 32% more than the previous configuration, but with three times the CPU and memory.
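The 32% figure follows from the two annualized prices quoted in the article. The sketch below recomputes it. Assuming the "three times the CPU and memory" claim holds, it also estimates the cost per unit of compute relative to the old node. The variable names are illustrative.

```python
OLD_ANNUAL = 2_640 - 641   # 1,999: one previous-generation node, 3-year reserved (from the article)
NEW_ANNUAL = 2_640         # three dw2.large nodes, 3-year reserved (from the article)
NEW_NODES = 3
RESOURCE_MULTIPLE = 3      # "three times the CPU and memory" (from the article)

increase_pct = (NEW_ANNUAL - OLD_ANNUAL) / OLD_ANNUAL * 100
per_node = NEW_ANNUAL / NEW_NODES

# Annual cost per unit of CPU/memory, relative to the old node (1.0 = unchanged).
relative_unit_cost = NEW_ANNUAL / (OLD_ANNUAL * RESOURCE_MULTIPLE)

print(f"Price increase: {increase_pct:.1f}%")
print(f"Annualized cost per dw2.large node: ${per_node:,.0f}")
print(f"Cost per unit of CPU and memory vs. before: {relative_unit_cost:.2f}x")
```

In other words, Etix pays roughly 32% more per year but only about 44% as much per unit of CPU and memory as before.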
Much of the lower per-node price of the new Redshift nodes comes from their drastically reduced storage, but Heacock said that for a data warehouse of Etix's size, much of the original node's storage was unnecessary.
"We are gaining three times the memory and CPU resources, and almost four times the elastic compute units," he said. "Since the 76% storage loss is not an issue for us, this is a no-brainer."
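The resource figures in this quote can be cross-checked against the node specifications in the pricing section. The sketch assumes Etix's previous-generation node was a single dw1.xlarge with 2 TB (taken as 2,000 GB) of storage. That is consistent with the specs listed in this article but is not stated outright. Elastic compute units are left out because the article does not list them per node.

```python
# Previous-generation node: ASSUMED to be one dw1.xlarge (2 vCPUs, 15 GB RAM, 2 TB).
OLD = {"vcpu": 2, "memory_gb": 15, "storage_gb": 2_000}
# New configuration: three dw2.large nodes (2 vCPUs, 15 GB RAM, 160 GB SSD each).
NODE = {"vcpu": 2, "memory_gb": 15, "storage_gb": 160}
COUNT = 3

# Total resources across the three new nodes.
new_total = {key: value * COUNT for key, value in NODE.items()}

cpu_multiple = new_total["vcpu"] / OLD["vcpu"]
memory_multiple = new_total["memory_gb"] / OLD["memory_gb"]
storage_loss = 1 - new_total["storage_gb"] / OLD["storage_gb"]

print(f"CPU: {cpu_multiple:.0f}x  memory: {memory_multiple:.0f}x  storage reduction: {storage_loss:.0%}")
```

Three 160 GB nodes give 480 GB, or 24% of 2,000 GB, which matches the 76% storage loss Heacock describes.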
Redshift pricing varies with performance needs
On-demand pricing for the entry-level dw2.large Dense Compute node, which comes with two virtual CPUs, 15 GB of memory and 160 GB of SSD storage, starts at $0.25 per hour. The larger dw2.8xlarge Dense Compute node, which comes with 32 vCPUs, 244 GB of memory and 2.56 TB of SSD storage, starts at $4.80 an hour.
If performance isn't as critical or storage density is the priority, larger-capacity Dense Storage nodes scale up to a petabyte of compressed data. The dw1.xlarge Dense Storage node, which comes with two vCPUs, 15 GB of memory and a 2 TB hard drive, starts at $0.85 per hour for on-demand deployments. The dw1.8xlarge, which has 16 vCPUs, 120 GB of memory and a 16 TB hard drive, is priced at $6.80 per hour on demand.
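To compare the four node types on a common footing, the sketch below turns the quoted hourly on-demand prices into two figures: a full-year cost and a cost per terabyte of raw node storage per year. It assumes a node runs around the clock for 8,760 hours and uses decimal terabytes (160 GB = 0.16 TB). The helper names are illustrative. Reserved-instance contracts like the ones Etix signed cost less than these on-demand figures.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a node runs around the clock in a non-leap year

# On-demand specs and prices as quoted in this article (storage in decimal TB).
NODES = {
    "dw2.large":   {"vcpu": 2,  "memory_gb": 15,  "storage_tb": 0.16, "usd_per_hour": 0.25},
    "dw2.8xlarge": {"vcpu": 32, "memory_gb": 244, "storage_tb": 2.56, "usd_per_hour": 4.80},
    "dw1.xlarge":  {"vcpu": 2,  "memory_gb": 15,  "storage_tb": 2.0,  "usd_per_hour": 0.85},
    "dw1.8xlarge": {"vcpu": 16, "memory_gb": 120, "storage_tb": 16.0, "usd_per_hour": 6.80},
}


def annual_on_demand(node: str) -> float:
    """Cost in USD of running one node on demand for a full year."""
    return NODES[node]["usd_per_hour"] * HOURS_PER_YEAR


def usd_per_tb_year(node: str) -> float:
    """On-demand annual cost divided by the node's raw storage capacity."""
    return annual_on_demand(node) / NODES[node]["storage_tb"]


for name in NODES:
    print(f"{name:12} ${annual_on_demand(name):>10,.0f}/yr  ${usd_per_tb_year(name):>8,.0f} per TB-year")
```

On this measure the two Dense Storage nodes both come out at about $3,723 per TB-year. The Dense Compute nodes cost roughly $13,700 to $16,400 per TB-year, which reflects the trade-off between SSD performance and storage density.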
Redshift competes with offerings such as the Pivotal Labs Greenplum Database, Hewlett-Packard Co.'s Vertica and Teradata's Active Enterprise Data Warehouse. Typically, enterprises pay between $19,000 and $25,000 per terabyte of data per year with traditional data warehouses, according to statistics gathered by analyst firm ITG in June 2011. Some prices are higher still: this week, chat representatives on an authorized reseller website* quoted a list price of $37,000 per terabyte for a Greenplum Database license.
Oracle did not comment as of press time.
*Statement changed after initial publication.