Microsoft cloud service lets citizen developers crunch big data

Microsoft’s "Data Explorer" cloud service puts coding power into the hands citizen developers. It remains to be seen how deep into big data users get.

Microsoft is devoting substantial resources to developing codenamed apps and APIs targeting a new breed of IT teams who spelunk actionable business information from massive files, or big data. According to Gartner Research, these “citizen developers” will build at least 25% of new business applications by 2014.

Citizen developers differ from traditional programmers in that they’re end users who “create new business applications for consumption by others using development and runtime environments sanctioned by corporate IT.” Gartner calls this process “end-user application development,” or EUAD.

Workgroup and department members often turn to citizen developers to analyze or help them analyze structured information from on-premises and cloud-based Web server log files, semi-structured data from social websites like Twitter and Facebook, as well as hordes of unstructured files, such as Word documents and Excel worksheets.

Historically, IT staffs have written ad-hoc Microsoft Office applications with Excel or Access macros to manage structured data. Microsoft designed Visual Studio LightSwitch as a forms-over-data alternative for EUAD that provides simple data modeling features and easy deployment as on-premises or Windows Azure Web applications.

Microsoft’s LINQ Pack, LINQ to HPC and Project “Daytona” as well as Microsoft Research’s forthcoming open source Excel DataScope app were designed to make unstructured big data analytics in Windows Azure accessible to citizen developers. However, in November 2011, Microsoft announced plans to not release LINQ to HPC to production in favor of Apache Hadoop on Windows Azure, now in invitation-only private preview. To obtain a private preview, you must apply to Microsoft’s team for an invitation to test-drive the platform and its application programming interface (API). It remains to be seen if citizen developers will be capable of populating Hadoop clusters and programming MapReduce analytics.

Image goes here

Figure 1. (Click to enlarge.) 
This .NET Windows form app summarizes a stream of Twitter data, including buzz (daily tweet counts) and calculated sentiment (positive and negative tweet tone) about Windows 8, as well as estimated reliability of the sentiment calculations, from a data feed provided by the Codename “Social Analytics” API.

New Microsoft products for EUAD
The SQL Azure Labs team released in late 2011 private previews of Codename “Social Analytics”, “Data Explorer” and “Data Transfer,” all of which target EUAD. Social Analytics delivers a real-time stream consisting mostly of Twitter tweets, retweets, replies and direct messages. It also includes a few Facebook “likes,” posts and comments, as well as occasional StackOverflow questions and answers.

The Windows Azure Marketplace DataMarket currently supplies two live OData streams that incorporate sentiment data about Windows 8 or Bill Gates. Microsoft's Social Analytics experimental cloud uses a Social Analytics’ API and a downloadable Visual Studio 2010 SP1 Windows application that retrieves, displays and summarizes data from the Windows 8 data feed (see Figure 1). This app required almost 500 lines of C# code to generate, display and save the summary data as a comma-separated value (CSV) text file. It took me about a day to code and test and is probably too complex for most citizen developers to program.

Image goes here Figure 2. (Click to enlarge.)
Selecting a resource from the list at the left displays 12 tool icons with galleries for more Filters as well as Insert Column and Split Column Transform tools. ContentItems is a table included in the Codename “Data Analytics” VancouverWindows8 data feed from the Windows Azure Marketplace

Simplifying cloud-based mashups
Microsoft touts Codename “Data Explorer” (DE) as a way for ordinary PC users to automatically discover data available to download from the Windows Azure Marketplace; enrich data by combining it in mashups with related data from the Marketplace, Web, databases and other data types; and publish results from cloud-based workspaces stored in Windows Azure. DE also is an easily approachable, composable extract-transform-load (ETL) tool that provides many of the capabilities of SQL Server Integration Services (SSIS) without the long learning curve. DE provides a set of tools to manipulate data resources in the sequence you specify (Figure 2).

Data Explorer lets you emulate a complex set of procedural operations on tabular data, such as those needed to display source data and aggregate the daily buzz and sentiment values shown in Figure 1, by applying tools to resources. ContentItemTypes is an enumeration resource that the Lookup Column tool uses to translate numeric ContentTypeId values to readable ContentTypeName values in the second column of Figure 2’s ContentItems table display.

Image goes here

Figure 3. (Click to enlarge.)
The Daily Summary resource delivers a row for each of the 58 days that Social Analytics data was available with a Tweet Count column from Daily Items, Tones Positive and Reliability Pos columns from the Daily Positives table, as well as Tones Negative and Reliabilty Neg columns from Daily Negatives table.

The Tones resource provides a similar Lookup Column for ToneValues in the fifth column. Daily Items is the initial table for the DailySummary aggregation resource. Daily Summary has Published On and Tweet Count columns and a row for December 27, 2011 and the preceding 58 days. Merging aggregated DailyPositives and DailyNegatives table resources with the equivalent of an SQL left outer join on the Published On column creates a table with Published On, Tweet Count, Tones Positive, Tones Negative, Reliability Pos and Reliability Neg columns, as shown in Figure 3.

Writing formulas with graphical builder UIs
DE has a full-blown formula-based programming language that’s based on Microsoft’s M (for Modeling) language, a component of the ill-fated Oslo repository database and the Quadrant query and visualization tool. However, most DE users won’t need to write M code because DE’s user interface includes graphical builders for the fx expressions that appear at the top of tables. The following appears at the top of the table in Figure 3:

fx = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})

To expose complete multi-line formulas, click the v-shaped icon at the right of the first line.

Following are the formulas for all actions that define the Daily Summary and Merge resources:

shared #"Daily Summary" = let

    #"Daily Summary" = Table.Join(Merge,{"PublishedOn"},Table.PrefixColumns(DailyNegatives, "Rt2"),{"Rt2.PublishedOn"},JoinKind.LeftOuter),

    HiddenColumns = Table.RemoveColumns(#"Daily Summary",{"Rt.PublishedOn", "Rt2.PublishedOn"}),

    ReorderedColumns = Table.ReorderColumns(HiddenColumns,{"PublishedOn", "Tweet Count", "Rt.Positives", "Rt2.Negatives", "Rt.ReliabilityPos", "Rt2.ReliabilityNeg"}),

    RenamedColumns = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})



shared Merge = Table.Join(DailyItems,{"PublishedOn"},Table.PrefixColumns(DailyPositives, "Rt"),{"Rt.PublishedOn"},JoinKind.LeftOuter);

Image goes here

Figure 4. (Click to enlarge.)
All tools have graphical formula builders, which appear when you click the Edit button at the left of the formula. This early version of the graphical UI for the Merge builder requires that you select PublishedOn in one of the lists and add a prefix to the right table’s column names.

Figure 4 shows the builder UI for the Merge resource’s action that generates the formula for the left outer join between DailyItems and DailyPositives tables on the PublishedOn field. As noted in my post about problems discovered with values and merging tables in Data Explorer, this builder UI is far less than intuitive and was undergoing usability improvement by the DE team at press time

Publishing your big data mashup  
The DE app comes in two flavors: a Desktop Client that runs on your LAN and a Shared Workspace in Windows Azure that opens after you register with your DE invitation code. The Desktop Client and Shared Workspace are capable of sharing saved snapshots of mashups at a particular date and time. However, a problem with the Shared Workspace prevented saving snapshots to SQL Azure at press time. The DE team is working on a fix for this problem.

Image goes here

Figure 5. (Click to enlarge.) 
Shared Workspaces for all users or only those specified enable a download of CSV files of resources, Excel workbooks, Data Explorer mashups and OData feeds for use or as resources for other Data Explorer mashups.

You can open my public Shared Workspace with live data, as shown in Figure 5, and download the Data Explorer Mashup with live data from the codename “Social Analytics” to use as a starting point in creating the advanced mashup with summary data. You’ll need an Account Key, which you receive when you use your invitation to sign up for DE, to download the Social Analytics data.

Give Codename “Data Explorer” a test drive with this 41-step, illustrated tutorial on creating a mashup. I’m sure you’ll agree that the DE team has a killer big data tool in progress.


Roger Jennings is a data-oriented .NET developer and writer, a Windows Azure MVP, principal consultant of OakLeaf Systems and curator of the OakLeaf Systems blog. He's also the author of 30+ books on the Windows Azure Platform, Microsoft operating systems (Windows NT and 2000 Server), databases (SQL Azure, SQL Server and Access), .NET data access, Web services and InfoPath 2003. His books have more than 1.25 million English copies in print and have been translated into 20+ languages.

Next Steps

Low code/no code app makes waves on iTunes

Dig Deeper on Cloud application development