Organizations often need to analyze massive amounts of data. This task can quickly become a logistical nightmare when solving a business problem requires using data stored in very different formats, such as video files, PDFs, and machine data, at the same time. Scry-Collatio is designed to relieve this pressure by carefully collecting, organizing, refining, and munging such data into one accessible virtual location. Key features of Scry-Collatio are described below.


Most business problems require substantial amounts of data whose attributes have qualitative aspects. An incomplete understanding of these aspects often leads to distorted data models and, in turn, to erroneous results. Our data scientists therefore work with clients, combining domain expertise with database engineering, to design a central, coherent data model that represents the business problem more accurately. The Scry-Collatio platform provides a graphical user interface (GUI) and visualization functionality that let experts view raw data, improve its quality, and transform it into such a comprehensive data model.


Through the Scry-Collatio platform, the entire journey from disparate data sources to the comprehensive data model is automated and codified in a secure environment. Data elements are labeled so that any new incoming data is automatically assigned the appropriate security entitlements. The platform guarantees proper documentation of the corresponding data structures for governance and audit needs, and it ensures that any dataset can be viewed, or improved upon, only by those who are entitled to do so.
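As a minimal sketch of how label-driven entitlements can work in general (not Scry-Collatio's actual implementation), the example below assumes a hypothetical `LABEL_POLICY` that maps each label to the roles allowed to view data carrying it. An incoming element receives only the roles permitted by every one of its labels, and anything unlabeled or carrying an unknown label is denied by default. The names `DataElement`, `assign_entitlements`, and `can_view` are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical label policy: each label lists the roles allowed to view data carrying it.
LABEL_POLICY = {
    "pii": {"privacy_officer", "auditor"},
    "financial": {"finance_analyst", "auditor"},
}


@dataclass
class DataElement:
    name: str
    labels: set
    entitled_roles: set = field(default_factory=set)


def assign_entitlements(element, policy):
    """Derive an incoming element's entitlements from its labels.

    A role is entitled only if every label on the element allows it, so the
    most restrictive label wins. Unlabeled elements and unknown labels grant
    no access (deny by default).
    """
    allowed = None
    for label in element.labels:
        roles = policy.get(label, set())
        allowed = set(roles) if allowed is None else allowed & roles
    element.entitled_roles = allowed or set()
    return element


def can_view(element, role):
    """True if `role` is among the element's derived entitlements."""
    return role in element.entitled_roles


payroll = assign_entitlements(DataElement("payroll", {"financial", "pii"}), LABEL_POLICY)
print(sorted(payroll.entitled_roles))  # ['auditor']
```

Intersecting the roles of all labels is one conservative design choice; a real policy engine might instead use explicit precedence rules.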


After the data has been processed by Scry-Collatio, it can be used for descriptive, predictive, and prescriptive statistical computations; machine learning; natural language processing; and information retrieval. These tasks can be performed either manually or through Scry-Jidoka. Whether users need a simple visualization or actionable insights, the refined data supports both.


Our computational platform supports distributed and parallel processing and combines our proprietary algorithms with open-source software. Our proprietary algorithms are designed to maximize in-memory computation, reducing both computation time and the number of data transfers across the memory hierarchy (i.e., among RAM, solid-state drives, and hard disks).


Scry-Collatio is designed to handle the logistical challenge of managing data of varying volume, variety, velocity, and veracity, and of making that data ready for analysis. The platform supports almost all data formats, including raw files; relational and non-relational databases; streaming audio, video, and image data; machine data; social networking data; email; JSON; XML; healthcare data; and various forms of text and presentation data. Using hypergraphs and related structures, Scry-Collatio lets users move seamlessly among these formats. In addition, Scry-Collatio includes a library of more than 25 web crawlers and scrapers for collecting data from public and private clouds and from enterprise networks.
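To illustrate why a hypergraph suits data spread across formats, here is a minimal sketch (an assumption about the general idea, not the platform's internal structure). Unlike an ordinary edge, a hyperedge can join any number of nodes, so a single edge can tie together every record that refers to the same real-world entity, whatever its source format. Nodes here are hypothetical `(format, identifier)` pairs, and the class and method names are illustrative.

```python
from collections import defaultdict


class Hypergraph:
    """Minimal hypergraph: a hyperedge may join any number of nodes."""

    def __init__(self):
        self.edges = {}                    # edge id -> frozenset of node ids
        self.incidence = defaultdict(set)  # node id -> ids of edges touching it

    def add_edge(self, edge_id, nodes):
        if edge_id in self.edges:
            raise ValueError(f"duplicate edge id: {edge_id}")
        self.edges[edge_id] = frozenset(nodes)
        for node in self.edges[edge_id]:
            self.incidence[node].add(edge_id)

    def related(self, node):
        """Every node sharing at least one hyperedge with `node`, excluding itself."""
        result = set()
        for edge_id in self.incidence.get(node, ()):
            result |= self.edges[edge_id]
        result.discard(node)
        return result


# Nodes are (format, identifier) pairs drawn from heterogeneous sources.
g = Hypergraph()
g.add_edge("customer-42", [("sql", "customers/42"), ("email", "msg-1001"),
                           ("video", "support-call.mp4")])
g.add_edge("contract-7", [("pdf", "contract-7.pdf"), ("sql", "customers/42")])

print(sorted(g.related(("email", "msg-1001"))))
# [('sql', 'customers/42'), ('video', 'support-call.mp4')]
```

Starting from an email, a user reaches the matching database row and the support-call video in one step. From the database row, the contract PDF is reachable as well.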


We import privacy- and security-related settings from the sources where the data originated, so that permissions already in place are maintained. Furthermore, we create “persistent data structures” to ensure that these entitlements carry over to subsequent versions of the data unless the client modifies them. Finally, we apply strict role-based access control to all data imported into our platform, which allows custom restrictions to be imposed in addition to the existing ones; this keeps the data secure and compliant with regulations.
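The following is a minimal sketch assuming "persistent data structures" is used in its usual computer-science sense, where every change produces a new version and earlier versions remain intact. Each hypothetical `DatasetVersion` carries the roles imported from the source system (`source_roles`) and any platform-side restrictions (`custom_denied`). Changing the data copies those entitlements forward unchanged, while changing entitlements requires an explicit call. This illustrates the idea only and does not describe Scry-Collatio's internals.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    version: int
    rows: tuple
    source_roles: frozenset                  # roles inherited from the original source
    custom_denied: frozenset = frozenset()   # extra restrictions added on the platform
    parent: Optional["DatasetVersion"] = None

    def with_rows(self, rows):
        """New version with new data; entitlements are carried over unchanged."""
        return replace(self, version=self.version + 1, rows=tuple(rows), parent=self)

    def with_entitlements(self, source_roles=None, custom_denied=None):
        """Explicit client change to entitlements, recorded as a new version."""
        return replace(
            self,
            version=self.version + 1,
            source_roles=self.source_roles if source_roles is None else frozenset(source_roles),
            custom_denied=self.custom_denied if custom_denied is None else frozenset(custom_denied),
            parent=self,
        )

    def can_read(self, role):
        """Role-based check: allowed at the source and not restricted on the platform."""
        return role in self.source_roles and role not in self.custom_denied


v1 = DatasetVersion(1, (("alice", 100),), frozenset({"analyst", "auditor"}))
v2 = v1.with_rows([("alice", 100), ("bob", 250)])   # inherits v1's entitlements
v3 = v2.with_entitlements(custom_denied={"analyst"})  # client adds a restriction
```

Because the dataclass is frozen, `v1` and `v2` stay readable with their original entitlements after `v3` is created, which provides an audit trail of who could see what at each version.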


Once data has been loaded and collated, Scry-Collatio displays potential problems and noise that may exist in it. For this purpose, the platform uses built-in transformation algorithms and produces outputs such as pie charts, bar charts, two-dimensional plots, and node-and-edge graphs. These depictions typically describe the dataset mapping, the distribution of data quality, and the relationships among various objects. After studying these outputs, users can modify the original data to reduce the noise and resolve the problems identified.
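As a rough illustration of the kind of per-column summary that could feed a data-quality bar chart, the sketch below counts missing values and type inconsistencies in each column of a list of records. The rule that `None` and blank strings count as missing, and that values deviating from a column's majority type count as likely noise, is an assumption made for this example. The function names are hypothetical.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile_column(values):
    """Summarize one column: row count, missing values, and type consistency."""
    present = [v for v in values if not is_missing(v)]
    types = Counter(type(v).__name__ for v in present)
    dominant = types.most_common(1)[0][0] if types else None
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "dominant_type": dominant,
        # Values whose type differs from the column's majority type are likely noise.
        "type_mismatches": sum(n for t, n in types.items() if t != dominant),
    }


def profile_table(rows):
    """Profile every column seen in a list of dict records (absent keys count as missing)."""
    columns = dict.fromkeys(key for row in rows for key in row)
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": "unknown", "email": ""},
    {"id": 3, "age": 41},
]
report = profile_table(rows)
print(report["age"])
# {'count': 3, 'missing': 0, 'dominant_type': 'int', 'type_mismatches': 1}
```

Here the report flags the string `"unknown"` in an otherwise integer `age` column and two missing `email` values. These are the kinds of issues a user might then correct in the original data.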


Scry-Collatio provides multiuser collaboration that allows (a) consistent simultaneous editing of a dataset, (b) sharing of previously created scripts and transformations, (c) creating new versions of a dataset and saving them in a “persistent manner” for all entitled users, and (d) transforming a dataset individually and saving the result for personal use. As a result, the platform reduces the time that multiple users in an organization spend repeating the same data transformations.
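One common way to provide the four capabilities above is sketched here, with the caveat that this is an assumed design rather than a description of the platform. The sketch uses optimistic concurrency for consistent simultaneous edits, a registry of shared transformations, an append-only history of shared versions, and per-user private copies. The class `CollaborativeDataset` and its methods are hypothetical.

```python
import threading


class CollaborativeDataset:
    """Shared, append-only version history plus per-user private copies."""

    def __init__(self, rows, entitled_users):
        self._lock = threading.Lock()
        self._versions = [tuple(rows)]   # shared history; old versions never change
        self._entitled = set(entitled_users)
        self._private = {}               # user -> privately transformed rows
        self.transforms = {}             # name -> reusable transformation function

    def _check(self, user):
        if user not in self._entitled:
            raise PermissionError(f"{user} is not entitled to this dataset")

    def register_transform(self, name, fn):
        """Share a transformation so other users can reuse it instead of rewriting it."""
        self.transforms[name] = fn

    def latest(self, user):
        """Return (version number, rows) of the newest shared version."""
        self._check(user)
        with self._lock:
            return len(self._versions) - 1, self._versions[-1]

    def commit(self, user, transform_name, base_version):
        """Apply a shared transform and publish the result as a new shared version.

        Optimistic concurrency: the commit is rejected if another user has
        published a version since `base_version`, so edits never silently
        overwrite each other.
        """
        self._check(user)
        with self._lock:
            current = len(self._versions) - 1
            if base_version != current:
                raise RuntimeError(f"stale base version {base_version}; latest is {current}")
            new_rows = tuple(self.transforms[transform_name](list(self._versions[-1])))
            self._versions.append(new_rows)
            return current + 1

    def apply_private(self, user, transform_name):
        """Transform a private copy that only `user` sees; shared versions are untouched."""
        self._check(user)
        with self._lock:
            base = self._private.get(user, self._versions[-1])
        self._private[user] = tuple(self.transforms[transform_name](list(base)))
        return self._private[user]


ds = CollaborativeDataset([3, 1, 2], entitled_users={"ana", "ben"})
ds.register_transform("sort", sorted)
ds.register_transform("double", lambda rows: [r * 2 for r in rows])
new_version = ds.commit("ana", "sort", base_version=0)   # shared version 1: (1, 2, 3)
private_rows = ds.apply_private("ben", "double")          # only Ben sees (2, 4, 6)
```

If Ben had tried to commit against version 0 after Ana's commit, the call would raise `RuntimeError`, and he would reload the latest version and retry rather than overwrite her work.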