Microsoft is devoting substantial resources to developing codenamed apps and APIs targeting a new breed of IT teams who spelunk actionable business information from massive files, or big data. According to Gartner Research, these “citizen developers” will build at least 25% of new business applications by 2014.
Citizen developers differ from traditional programmers in that they’re end users who “create new business applications for consumption by others using development and runtime environments sanctioned by corporate IT.” Gartner calls this process “end-user application development,” or EUAD.
Workgroup and department members often turn to citizen developers to analyze or help them analyze structured information from on-premises and cloud-based Web server log files, semi-structured data from social websites like Twitter and Facebook, as well as hordes of unstructured files, such as Word documents and Excel worksheets.
Historically, IT staffs have written ad-hoc Microsoft Office applications with Excel or Access macros to manage structured data. Microsoft designed
Requires Free Membership to View
Microsoft’s LINQ Pack, LINQ to HPC and Project “Daytona” as well as Microsoft Research’s forthcoming open source Excel DataScope app were designed to make unstructured big data analytics in Windows Azure accessible to citizen developers. However, in November 2011, Microsoft announced plans to not release LINQ to HPC to production in favor of Apache Hadoop on Windows Azure, now in invitation-only private preview. To obtain a private preview, you must apply to Microsoft’s team for an invitation to test-drive the platform and its application programming interface (API). It remains to be seen if citizen developers will be capable of populating Hadoop clusters and programming MapReduce analytics.
Figure 1. (Click to enlarge.)
This .NET Windows form app summarizes a stream of Twitter data, including buzz (daily tweet counts)
and calculated sentiment (positive and negative tweet tone) about Windows 8, as well as estimated
reliability of the sentiment calculations, from a data feed provided by the Codename “Social
Analytics” API.
New Microsoft products for EUAD
The SQL Azure Labs team released in late 2011 private previews of Codename
“Social Analytics”, “Data
Explorer” and “Data
Transfer,” all of which target EUAD. Social Analytics delivers a real-time stream consisting
mostly of Twitter tweets, retweets, replies and direct messages. It also includes a few Facebook
“likes,” posts and comments, as well as occasional StackOverflow questions and answers.
The Windows Azure Marketplace DataMarket currently supplies two live OData streams that incorporate sentiment data about Windows 8 or Bill Gates. Microsoft's Social Analytics experimental cloud uses a Social Analytics’ API and a downloadable Visual Studio 2010 SP1 Windows application that retrieves, displays and summarizes data from the Windows 8 data feed (see Figure 1). This app required almost 500 lines of C# code to generate, display and save the summary data as a comma-separated value (CSV) text file. It took me about a day to code and test and is probably too complex for most citizen developers to program.
Selecting a resource from the list at the left displays 12 tool icons with galleries for more Filters as well as Insert Column and Split Column Transform tools. ContentItems is a table included in the Codename “Data Analytics” VancouverWindows8 data feed from the Windows Azure Marketplace
Simplifying cloud-based mashups
Microsoft touts Codename “Data
Explorer” (DE) as a way for ordinary PC users to automatically discover data available to
download from the Windows Azure Marketplace; enrich data by combining it in mashups with related
data from the Marketplace, Web, databases and other data types; and publish results from
cloud-based workspaces stored in Windows Azure. DE also is an easily approachable, composable
extract-transform-load (ETL) tool that provides many of the capabilities of SQL Server Integration
Services (SSIS) without the long learning curve. DE provides a set of tools to manipulate data
resources in the sequence you specify (Figure 2).
Data Explorer lets you emulate a complex set of procedural operations on tabular data, such as those needed to display source data and aggregate the daily buzz and sentiment values shown in Figure 1, by applying tools to resources. ContentItemTypes is an enumeration resource that the Lookup Column tool uses to translate numeric ContentTypeId values to readable ContentTypeName values in the second column of Figure 2’s ContentItems table display.
Figure 3. (Click to enlarge.)
The Daily Summary resource delivers a row for each of the 58 days that Social Analytics data was
available with a Tweet Count column from Daily Items, Tones Positive and Reliability Pos columns
from the Daily Positives table, as well as Tones Negative and Reliabilty Neg columns from Daily
Negatives table.
The Tones resource provides a similar Lookup Column for ToneValues in the fifth column. Daily Items is the initial table for the DailySummary aggregation resource. Daily Summary has Published On and Tweet Count columns and a row for December 27, 2011 and the preceding 58 days. Merging aggregated DailyPositives and DailyNegatives table resources with the equivalent of an SQL left outer join on the Published On column creates a table with Published On, Tweet Count, Tones Positive, Tones Negative, Reliability Pos and Reliability Neg columns, as shown in Figure 3.
Writing formulas with graphical builder UIs
DE has a full-blown formula-based
programming language that’s based on Microsoft’s M (for Modeling) language, a component of the
ill-fated Oslo
repository database and the Quadrant query and visualization tool. However, most DE users won’t
need to write M code because DE’s user interface includes graphical builders for the fx
expressions that appear at the top of tables. The following appears at the top of the table in
Figure 3:
fx = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})
To expose complete multi-line formulas, click the v-shaped icon at the right of the first line.
Following are the formulas for all actions that define the Daily Summary and Merge resources:
shared #"Daily Summary" = let
#"Daily Summary" = Table.Join(Merge,{"PublishedOn"},Table.PrefixColumns(DailyNegatives, "Rt2"),{"Rt2.PublishedOn"},JoinKind.LeftOuter),
HiddenColumns = Table.RemoveColumns(#"Daily Summary",{"Rt.PublishedOn", "Rt2.PublishedOn"}),
ReorderedColumns = Table.ReorderColumns(HiddenColumns,{"PublishedOn", "Tweet Count", "Rt.Positives", "Rt2.Negatives", "Rt.ReliabilityPos", "Rt2.ReliabilityNeg"}),
RenamedColumns = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})
in
RenamedColumns;
shared
Merge =
Table.Join(DailyItems,{"PublishedOn"},Table.PrefixColumns(DailyPositives,
"Rt"),{"Rt.PublishedOn"},JoinKind.LeftOuter);
Figure 4. (Click to enlarge.)
All tools have graphical formula builders, which appear when you click the Edit button at the left
of the formula. This early version of the graphical UI for the Merge builder requires that you
select PublishedOn in one of the lists and add a prefix to the right table’s column names.
Figure 4 shows the builder UI for the Merge resource’s action that generates the formula for the
left outer join between DailyItems and DailyPositives tables on the PublishedOn field. As noted in
my post about problems
discovered with values and merging tables in Data Explorer, this builder UI is far less than
intuitive and was undergoing usability improvement by the DE team at press time
Publishing your big data mashup
The DE app comes in two flavors: a Desktop Client that runs on your LAN and a Shared Workspace in
Windows Azure that opens after you register with your DE invitation code. The Desktop Client and
Shared Workspace are capable of sharing saved snapshots of mashups at a particular date and time.
However, a problem with the Shared Workspace prevented saving snapshots to SQL Azure at press time.
The DE team is working on a fix for this problem.
Figure 5. (Click to enlarge.)
Shared Workspaces for all users or only those specified enable a download of CSV files of
resources, Excel workbooks, Data Explorer mashups and OData feeds for use or as resources for other
Data Explorer mashups.
You can open my public Shared Workspace with live data, as shown in Figure 5, and download the Data Explorer Mashup with live data from the codename “Social Analytics” to use as a starting point in creating the advanced mashup with summary data. You’ll need an Account Key, which you receive when you use your invitation to sign up for DE, to download the Social Analytics data.
Give Codename “Data Explorer” a test drive with this 41-step, illustrated tutorial on creating a mashup. I’m sure you’ll agree that the DE team has a killer big data tool in progress.
ABOUT THE AUTHOR:
Roger Jennings is a data-oriented .NET developer and writer, a Windows Azure MVP, principal
consultant of OakLeaf Systems and curator of the OakLeaf Systems blog. He's also the author of 30+ books on the
Windows Azure Platform, Microsoft operating systems (Windows NT and 2000 Server), databases (SQL
Azure, SQL Server and Access), .NET data access, Web services and InfoPath 2003. His books have
more than 1.25 million English copies in print and have been translated into 20+ languages.
This was first published in January 2012
Cloud Computing Strategies for the CIO

Join the conversationComment
Share
Comments
Results
Contribute to the conversation