Connecting cloud data sources with the OData API

The Open Data Protocol could be the key to linking on-premises enterprise data and cloud sources.

Integrating semi-structured data is one of the primary challenges of a burgeoning collection of Web 2.0 APIs. Whether the data originates in a consumer application or in enterprise Web services, it is necessary to find the least-common data denominator to enable on-premise or cloud-based services to understand one another.

The RESTful Open Data Protocol -- better known as the OData API -- has the potential to interconnect cloud-based enterprise Software as a Service offerings and Platform as a Service projects, as its doing now for big data services like the entire Netflix movie catalog and the Windows Azure Marketplace DataMarket.

Structured data sources -- such as relational databases, spreadsheets and files containing comma-separated values -- rely on the ubiquitous Open Database Connectivity (ODBC) data-access API, which Microsoft adapted from the SQLAccess Group's Call Level Interface (CLI) and released in 1992. Sun Microsystems released the Java Database Connectivity (JDBC) v1 API in 1997 and later added it to the Java Standard Edition. A JDBC-to-ODBC bridge enables JDBC connections to ODBC-accessible databases. ODBC and JDBC APIs enable processing SQL SELECT queries, as well as INSERT, UPDATE and DELETE statements, against tabular data and can execute stored procedures. Microsoft's OLE DB and ActiveX Data Objects (ADO) began to supplement ODBC in late 1996 as members of the Microsoft Data Access Components (MDAC.) ODBC and JDBC, however, remain the lingua franca of structured data connectivity for client/server environments.

The remarkable growth of the quantity of semi-structured data, mostly Web-based HTML and XHTML documents, created a demand for a Web-friendly, cloud-compatible data access API analogous to ODBC/JDBC. In 2002, Dave Weiner released the Really Simple Syndication (RSS) 2.0 API, which he derived from Netscape's RDF Site Summary and Rich Site Summary APIs. Wikipedia describes RSS as a "a family of Web feed formats used to publish frequently updated works -- such as blog entries, news headlines, audio, and video -- in a standardized format."

In 1993, Sam Ruby set up a wiki to discuss refinements to RSS, which attracted a large and vocal group of Web application developers and content publishers. Wiki members published Atom v0.2 and v0.3 in 2003; Google adopted Atom as the syndication format for Blogger, Google News and Gmail. In 2004, the Internet Engineering Task Force (IETF) formed the AtomPub working group -- co-chaired by Tim Bray and Paul Hoffman -- to standardize the Atom format. In late 2005, IETF issued a Proposed Standard for the Atom Syndication Format v1.0 as IETF RFC 4287. IETF issued the Proposed Standard for the Atom Publishing Protocol (AtomPub) as a Proposed RFC 5023 in October 2007. Google's GData format is based on Atom and AtomPub v1.0.

Pablo Castro, a Microsoft data architect, proposed a set of Web data service extensions to AtomPub called Codename "Astoria" in an "Accessing Data Services in the Cloud" session at the MIX 07 conference held in Las Vegas during April 2007. Design goals for these services were:

  • Web friendly, just plain HTTP
  • Uniform patterns for varying schemas
  • Focus on data, not formats
  • Stay high-level, abstract the store

One of Astoria's key features was the ability to access any data element -- called an entity -- by a Uniform Resource Identifier (URI), as well as related entities by navigating a graph of associations. Microsoft's Entity Data Model v1 defined the available entities, including data types and associations. The Astoria team also specified URI-compatible query options to enable filtering, sorting, paging and navigating.

Initially, Astoria supported plain XML (POX), RDF+XML, and JavaScript Object Notation (JSON) formats. The Astoria team began investigating AtomPub and Web3S to replace POX and RDF+XML, and settled on AtomPub as the default and JSON as an alternative format for AJAX applications in February 2008. Microsoft launched Astoria as the ADO.NET Data Services Framework Beta 1, together with the ADO.NET Entity Framework Beta 1, as elements of .NET 3.5 Beta 1 and Visual Studio 2008 SP1 in May 2008. Astoria's name changed from ADO.NET Data Services to Windows Communication Foundation (WCF) Data Services in November 2009 during Microsoft's Professional Developers Conference 2009. Microsoft renamed ADO.NET Data Services' formats to OData in early 2010.

So what exactly is OData
The Open Data Protocol is a REST-based protocol for performing create, read, update and delete (CRUD) operations on collections of Web-accessible resources configured as "data services." Microsoft established the OData.org website in March 2010 and published the current [MS-ODATA]: Open Data Protocol (OData) Specification in May 2011.

According to the OData FAQ, "OData may be submitted to a standards body at some point in the future based on feedback from the community. In the meantime, the Open Data Protocol specification is available under the Microsoft Open Specification Promise (OSP), allowing third parties, including open source projects, to build Data Services for any runtime as well as clients to consume such services."

Most popular browsers display OData feeds directly. A URI for the Order Details table of the SDK's Sample Read-Only Northwind Service returns the AtomPub representation of the first Order_Details item shown in Figure 1.

Figure 1. IE 9 displaying an OData feed of the first of 50 Order Details records from the Northwind sample database. (Requires turning off Feed Reading view)

It's evident from Figure 1 that OData is a very verbose protocol with a substantial amount of XML overhead. The first 50 Order Details entries are 65,536 bytes in length. The following URI for the Order Details table of the SDK's Sample Read-Only Northwind Service provider with a $format=json query option appended returns the following JSON representation of the first Order_Details item:

{ "d" : { "results": [ { "__metadata": { "uri": "https://odata.sqlazurelabs.com/OData.svc/v0.1/jc650b4zaf/Northwind/Order_Details(OrderID=10248,ProductID=11)", "type": "Northwind.Order_Details" }, "Discount": 0, "OrderID": 10248, "ProductID": 11, "Quantity": 12, "UnitPrice": "14.0000", "Order": { "__deferred": { "uri": "https://odata.sqlazurelabs.com/OData.svc/v0.1/jc650b4zaf/Northwind/Order_Details(OrderID=10248,ProductID=11)/Order" } }, "Product": { "__deferred": { "uri": "https://odata.sqlazurelabs.com/OData.svc/v0.1/jc650b4zaf/Northwind/Order_Details(OrderID=10248,ProductID=11)/Product" } } }, … }

The first 50 JSON entries consume 29,859 bytes and are 45.6% the size of the AtomPub version. Queries against the SQL Azure Labs' OData Service return a maximum of 50 entries. If the data source has more than 50 rows, a $skiptoken specifies the item to skip past to start the successive data page. Microsoft announced the availability of data.js, a JavaScript library for OData and HTML 5 features such as local storage, under the MIT license on June 29

WCF Data Services is a .NET client (Consumer) library for OData that's an optional data source for Visual Studio LightSwitch applications. The Windows Azure SDK's Microsoft.Windows.Azure.StorageClient library is another .NET client library for OData that presents data sources as ADO.NET collections. As of June 2011, the  provides links to additional client libraries for Silverlight 4, PHP, Java, Ruby, Objective-C (iPhone), Android, Windows Phone 7, .NET Compact Framework 3.5, Joomla and Drupal.

Current OData Producers include SAP NetWeaver 2.0 Gateway, IBM Web Sphere, GeoREST, Microsoft Excel (via PowerPivot), SharePoint 2010 and SharePoint Online (see Figure 2), SQL Azure, Windows Azure Table Storage, Windows Azure Marketplace DataMarket, SQL Server Reporting Services and Dynamics CRM. The OData SDK also provides links to server libraries, which enable creating OData Producers. As of June 2011, the Producer libraries supported .NET Framework 3.5 SP1, .NET Framework 4.0, odata4j (Java), OData on Rails (a prerelease for Ruby) and MySQL OData.

Figure 2. IE 9 displaying an OData feed of the first of 1,000 Order Details records from the Northwind sample database imported into items of a SharePoint Online list. (SharePoint throttles queries to a maximum of 1,000 entries.)

Despite its current Microsoft centricity, OData shows promise of gaining substantial mindshare in other development environments, especially Java and Ruby, for which both Consumer and Producer libraries are available now. If you need to interconnect with enterprise data from a variety of on-premises and cloud-based sources, give OData a test drive. There's no question that OData is rapidly becoming ODBC for the Web.

Roger Jennings is a data-oriented .NET developer and writer, the principal consultant of OakLeaf Systems and curator of the OakLeaf Systems blog. He's also the author of 30+ books on the Windows Azure Platform, Microsoft operating systems (Windows NT and 2000 Server), databases (SQL Azure, SQL Server and Access), .NET data access, Web services and InfoPath 2003. His books have more than 1.25 million English copies in print and have been translated into 20+ languages.

Dig Deeper on Cloud APIs and integration