Monday, August 29, 2011
Sunday, August 21, 2011
Microsoft Azure platform is supporting Hadoop implementations, SQL Server interoperability drivers for Hadoop has also been announced, Microsoft Research is trying to develop project Dryad, which are all movement towards building capabilities to support unstructured data. Graph databases are one of the prominently used database types in the world of unstructured data. Many would argue why relational database cannot be used to achieve the same what graph databases are used for ? Here is one of the answers for the same. Graph databases apply graph theory and once you understand the same you would find the reason why RDBMS cannot cater what Graph Databases can. Neo4j is one of the leaders in this area. Industry leaders like Google also have their own implementation of graph database know as Pregel.
sones GraphDB is one of the graph databases of choice for Microsoft professionals, as it is developed using .NET and is easily supported on Azure platform. Huge volume of unstructured data needs flexible compute and storage platforms like Azure cloud platform, and as it is using .NET framework behind the scenes, it is ideal to be hosted on Azure platform. You can access the technical datasheet from here and below is the architecture diagram of the same. Interestingly this brings a new query language for DB professionals, GraphQL !!
Sunday, August 14, 2011
SQL Server Denali is adding the following new features in the DB Engine to support storage and management of unstructured data:
1) Lots of performance and scale work in Full-Text Search!
2) Customizable NEAR in FTS
3) The ability to search only within document properties instead of the full document
4) Semantic Similarity Search between documents. This provides you the ability to answer questions such as: "Find documents that talk about the same thing as this other document!"
5) Better scalability and performance for FileStream data, including the ability to store the data in multiple containers
6) Full Win 32 application compatibility for unstructured data stored in a new table called FILETABLE. You create a Filetable and can drag and drop your documents into the database and run your favorite Windows applications on them (e.g., Office, Windows Explorer).
Tuesday, August 09, 2011
MS BI and Hadoop Integration using Hadoop Connectors for SQL Server and Parallel Data Warehouse to analyze structured and unstructured dataI'm reading: MS BI and Hadoop Integration using Hadoop Connectors for SQL Server and Parallel Data Warehouse to analyze structured and unstructured dataTweet this !
Still the interoperability facilitated by this connector, would empower SQL Server to extract data of interest from this ocean of data hosted in Hadoop environments, making MS BI stack even more powerful. Database Engines, ETLs as well as OLAP Engines would see bigger challenges than ever when clients start using Hadoop as a source for SQL Server, but my viewpoint is that it would mostly work other way round. These connectors are opening a door to the possibility where SQL Server based databases as well as data warehouses can/would be used in combination with Hadoop and MapReduce, effectively creating new opportunities for the entire ecosystem of database community from clients to technicians.
Sunday, August 07, 2011
Columnar Databases and SQL Server Denali : Marathon towards being world's fastest analytical databaseI'm reading: Columnar Databases and SQL Server Denali : Marathon towards being world's fastest analytical databaseTweet this !
Wednesday, August 03, 2011
When it comes to geospatial reporting, the first challenge is to associate two entities together – Data and Geography. Geography is usually represented on a map, and associating data to a map requires a geographical element in your data to associate the two entities with each other. Reporting geospatial data is not something new, but reporting it intelligently requires some reasonable effort, and in this Analyzer recipe, we would take a look at what is the difference between reporting geospatial data and reporting the same in an intelligent manner.
Analyzer has two fundamental reporting controls related to this discussion – Intelligent Map and Pivot Table. To create a geospatial report, I used the “Reseller Sales Amount” measure as the data, the “Geography” hierarchy of the Geography dimension from the AdventureWorks cube, and added the same to the Intelligent Map control. Side-by-side I added a Pivot Table control and added the same entities to it. Effortlessly I created a geospatial report with a lot of built-in features provided out-of-box. The Intelligent Map control consists of a reasonable number of different maps including the world map, which I have used in my report. Many additional maps are available free of charge on the Internet. Strategy Companion has a list of some of the web sites where you can find these maps, which use the Shapefile format (.shp extension) created by ESRI, a well-known GIS company.
The first question that may come to mind is why is the pivot table added to the report, when a map is already there? The Intelligent Map control actually is quite intelligent as we will see through the course of our demo. The first point of intelligence is that, just from the Geography hierarchy, this control has associated all the locations correctly on the map. When you would hover over a particular area, you can see the associated data value in the tool tip. The feature that makes me happy is that it provides an out-of-box drill-down feature. The reason for having the pivot table is that if the user intends to figure out the point of analysis, all information would be required at a glance and the user cannot be expected to hover everywhere. So the pivot table acts as the data coordinates for the geospatial representation of the report.
To take this to the next level, double-click the report to drill down to the next level in the hierarchy of the selected area. The difference in color shows the performance of the area and the same can be measured from the scale shown on the report. Analysts would generally use the very intuitive and visual approach of figuring out the area of interest based on the varying shades of colors (you can also choose to use several different colors such as red, yellow, and green) and then get into the numeric details. The pivot table is also capable of drilling into the data and you can get all the details from there. You may also choose to expose one region’s details on the map while simultaneously showing the details for another region on the pivot table.
Pivot tables (also commonly called grids) are generally used for slicing and dicing of data, and geospatial representation of the data is used for distribution analysis of data over a selected geography. One problem from the above report is that you would not find names represented on some areas, also some areas might be very small from a geography perspective. In the pivot table you would find a long list for State-Province under United States. So how do you associate these two? The answer is “Manually”, as there is no association between these two parts of the report. Ideally after drilling down the geography, data distribution has become large, so geospatial representation is convenient for users to select area to start slicing-dicing. But this needs both types of the report components to work in harmony.
With this comes the challenge of usability, interactivity and intelligence all at the same time. The Intelligent Map control is capable of addressing these challenges. In the below screenshot you can see that this control can be set to support slicing-and-dicing data as the primary objective. Also the scope of actions on this control can be specified, which gives the flexibility to associate the actions performed on this control on different parts of the report.
After configuring this control, check out the report. Drill down on “United States”, and you would find that not only the map has gotten drilled down to the lower level, but the pivot table also works in harmony with the selected geography. I selected “Colorado” and on the grid the same got selected readily. Users do not need to scroll long lists to locate the area which they selected on the map for analysis. With an interactive Intelligent Map and Pivot Table, both capable of drill-down and drill-through features, and capable of working in harmony, users almost have a gadget in the form of a report, to perform slicing-and-dicing driven by geospatial analysis.
From a Microsoft BI products perspective, the ingredients I would need to create this geo-spatial recipe are: SSRS Bing Maps control, Grids (SSRS Tablix / PPS Analytical Grid) with drill-down and drill through enabled and connected using Sharepoint webparts. And still making them work in the same harmony as shown above would not be as effortless as this. Of course, each platform has its own advantages and limitations. To explore what more Analyzer has to offer compared with other reporting tools, you can download an evaluation version of Analyzer from here.
Tuesday, August 02, 2011
Semantic Web, Unstructured Data and technologies that store, process, analyze, warehouse such data and extract intelligence out of the same is the new challenge that is at the horizon of the IT industry. Few front line IT majors have tsunami sized data generated everyday due to the virtue of popularity of social media, and such organizations like Google, Yahoo, Facebook, etc have already started wrestling with these challenges. The benefits of processing unstructured data, and driving your business based on the extracted intelligence are very clear from the example of these companies where they within a duration less than a decade, their advertising revenues are worth billions and still skyrocketing. All standard business house have lots of unstructured data like emails, discussion forums, corporate blogs, recorded chat conversations with clients etc. Organizations generally have aspirations to build a knowledge base for all the different areas of business functions, but when they start seeking consulting on the strategy to implement the same, they find themselves in a whirlpool of processes and financial burdens. Unstructured data has the potential to generate data for such knowledge base.
Still the question in your mind would be, what has this to do with SSIS and Azure? Regular applications of SSIS are known to everyone, and more than that professionals would be knowing that SSIS is not supported on Azure cloud platform, which might be the motivation of reading this post as the subject line reflects that is might be supported now. SSIS is an in-memory processing architecture, and implementing the same of shared / dedicated cloud environments has its own challenges. I am also sure that SSIS Team must be on its way to bring it to the crowds in the time to come. But my interest in on the future application of SSIS on unstructured data, and hosting it on Azure cloud platform.