A friend of mine shared a very good article about CoSort acceleration with big data, written by David Friedland. I would like to pass it on to readers who are actively learning from this blog.
There are a number of business intelligence tools available today that can transform raw data into meaningful information. Because this process can be complex and involve large volumes of data, however, it makes sense to use the right technologies at each step in the process: tools and techniques that combine well to deliver the fastest, most accurate results for business decision making, and make the process of metadata management and report design simpler and more efficient.
Founders of modern data warehousing like Ralph Kimball have long recommended the external preparation of big transaction data outside the database and ETL tool layer, in sequential (flat) files. This approach removes a regular, resource-draining overhead from systems designed for queries and integrations, and provides a central, platform-independent way to exchange and manage data. The same approach for business intelligence (BI) tools, called data franchising, has been espoused for years by experts like Richard Sherman for logically related reasons (see "Data Franchising" on Information Management). In addition to better performance, Sherman points out that pre-staging data to be visualized avoids the inherent metadata complexity, redundancy and synchronization problems of performing transformations in the BI layer.
One of the best examples of BI optimization through technology combination is the use of IRI CoSort to franchise, or prepare, high volumes of transactional data for data visualization outside MicroStrategy’s Business Intelligence platform. MicroStrategy’s platform delivers visual information on cumulative financial performance, strategy management and business intelligence, typically for analytic purposes. CoSort’s Sort Control Language (SortCL) data transformation program simultaneously filters, sorts, joins, aggregates, segments and reformats data into subset CSV and other popular ‘feed’ files outside the BI layer.
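To make the idea of preparing a "feed" file outside the BI layer more concrete, here is a minimal sketch in Python. It is not SortCL and does not use any CoSort API; it only illustrates the same kinds of steps (filter, join, aggregate, sort, reformat to CSV) on tiny in-memory data. The field names, sample rows, `min_amount` threshold and the `prepare_feed` function are all hypothetical and chosen purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample inputs standing in for large transactional flat files.
TRANSACTIONS = """cust_id,amount
C1,120.50
C2,15.00
C1,80.00
C3,42.25
C2,5.00
"""

CUSTOMERS = """cust_id,region
C1,East
C2,West
C3,East
"""


def prepare_feed(transactions_text, customers_text, min_amount=10.0):
    """Filter, join, aggregate and sort; return CSV text a BI tool could load."""
    # Build a lookup table for the join (customer id -> region).
    region_by_customer = {
        row["cust_id"]: row["region"]
        for row in csv.DictReader(io.StringIO(customers_text))
    }

    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(transactions_text)):
        amount = float(row["amount"])
        if amount < min_amount:  # filter: drop small transactions
            continue
        region = region_by_customer.get(row["cust_id"])  # join on cust_id
        if region is None:  # skip transactions with no matching customer
            continue
        totals[region] += amount  # aggregate: sum per region

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total_amount"])
    for region in sorted(totals):  # sort: order output rows by region
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


if __name__ == "__main__":
    print(prepare_feed(TRANSACTIONS, CUSTOMERS), end="")
```

The point of the sketch is only the division of labor: the heavy transformations happen once, up front, and the reporting tool reads a small, already-summarized file.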
As you would expect, staging data in flat files with CoSort, especially within a modern, metadata-managing, data-federating GUI like the IRI Workbench built on Eclipse, avoids the data integrity and reconciliation issues Sherman describes. Moreover, when high volumes of data are prepared in advance with CoSort, MicroStrategy users can get their business intelligence information in roughly half the time it would otherwise take using MicroStrategy alone.
A simple laptop-based benchmark running MicroStrategy under Windows XP SP3 on an Intel® Core™ CPU M380 @2.53GHz with 3 GB of DDR3 memory bears this out. When the necessary data transformations of sorting, joining, and aggregation were performed in MicroStrategy to create a report based on source data in 100MB and 50MB flat files, the entire job took 193 seconds.
Accelerating MicroStrategy with SortCL is straightforward, especially if you are familiar with Eclipse, or at least the structure of your source data. The IRI Workbench GUI for CoSort users is a convenient, graphical environment for discovering, defining, managing, and transforming data in flat files and/or data in RDBMS tables with SortCL. But whether you define the SortCL data definitions or job specifications in text files or with the GUI, the self-documenting nature of the 4GL makes it easy to create and maintain both forms of metadata. The free GUI merely automates many of these tasks.
The same transforms, run in a single CoSort Sort Control Language (SortCL) program, took 23 seconds on the same laptop, with the results available to multiple applications. In this case, the MicroStrategy user who created the same report (which took 193 seconds above) needed only another 77 seconds to run it on the prepared data, meaning that the start-to-finish time using CoSort was only 100 seconds.
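For readers who want to check the numbers, the short Python snippet below simply redoes the arithmetic from the benchmark figures quoted in this post (193, 23 and 77 seconds). The variable names are my own; nothing here is measured, it only combines the reported timings.

```python
# Timings reported in the benchmark above, in seconds.
MICROSTRATEGY_ONLY_SECONDS = 193       # transforms + report inside MicroStrategy
SORTCL_PREP_SECONDS = 23               # transforms done up front in SortCL
REPORT_ON_PREPARED_DATA_SECONDS = 77   # MicroStrategy report on the prepared data

# Start-to-finish time when the data is prepared outside the BI layer.
combined_seconds = SORTCL_PREP_SECONDS + REPORT_ON_PREPARED_DATA_SECONDS

# Absolute and relative savings compared with MicroStrategy alone.
saved_seconds = MICROSTRATEGY_ONLY_SECONDS - combined_seconds
speedup = MICROSTRATEGY_ONLY_SECONDS / combined_seconds
reduction_pct = 100 * saved_seconds / MICROSTRATEGY_ONLY_SECONDS

print(f"combined: {combined_seconds} s")
print(f"saved: {saved_seconds} s")
print(f"speedup: {speedup:.2f}x")
print(f"reduction: {reduction_pct:.1f}%")
```

In other words, the combined run takes a little over half the original time on this one laptop test, which is a single data point and not a general guarantee.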
For more information on using CoSort for standalone or accelerated business intelligence, please see the Business Intelligence section under Solutions on the IRI site (http://www.iri.com/solutions/business-intelligence) and the other articles in the BI section of the IRI blog site.