Introducing Data Quality Monitor

· 7 min read
Cynepia Product Marketing

Background

In the era of language models and advanced artificial intelligence applications, the need for reliable and accurate data has never been greater. A comprehensive data and analytics platform has become a non-negotiable need for a company of a certain size that wants to achieve the goals and benefits of these formidable new capabilities. The inability to access data and metadata seamlessly in a single pane is a major source of frustration when carrying out data-driven digital transformation. Cobbled-together point solutions, often sold as best of breed, have only added to the challenge of integrating them within one's data platform architecture. A comprehensive data and analytics platform is therefore one of the key elements of success with data-driven digital transformation.

Problem

In addition to platform challenges, data teams face a variety of challenges in ensuring the quality of the data products they build and make available to downstream users throughout the life cycle of each data asset or product. These data assets are often accumulated from hundreds of upstream sources, including source databases, SaaS systems accessed via API, cloud storage and more. The dynamic nature of the data itself, along with its movement through a variety of systems, has made troubleshooting data issues almost impossible, leading to longer downtimes, frustrated data teams and loss of trust in data products.

One of the key challenges in troubleshooting such issues is the lack of visibility into data changes, which are often caused by upstream changes at source systems or somewhere along the transformation journey. Effective visibility provides a better baseline reference profile for every data asset, and a mechanism to run specific tests (both syntactic and semantic) on new incoming data helps data teams react faster to an impending issue.
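To illustrate the idea of a baseline reference profile, here is a minimal sketch in plain Python that compares a new batch against a stored profile. It is not the Xceed implementation: the `profile` and `drift_report` functions, the thresholds and the sample `amount` column are all assumptions made up for this example.

```python
from statistics import mean


def profile(rows, column):
    """Summarize one column: row count, null fraction, mean of non-null values."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "null_fraction": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "mean": mean(non_null) if non_null else None,
    }


def drift_report(baseline, incoming, max_null_increase=0.05, max_mean_shift=0.25):
    """Return a list of issues; an empty list means no drift was detected.

    The thresholds are hypothetical defaults, not recommended values.
    """
    issues = []
    if incoming["null_fraction"] - baseline["null_fraction"] > max_null_increase:
        issues.append("null fraction increased")
    if baseline["mean"] not in (None, 0) and incoming["mean"] is not None:
        shift = abs(incoming["mean"] - baseline["mean"]) / abs(baseline["mean"])
        if shift > max_mean_shift:
            issues.append("mean shifted")
    return issues


# Hypothetical sample data: a trusted historical batch and a new arrival.
baseline_rows = [{"amount": 100.0}, {"amount": 110.0}, {"amount": 90.0}, {"amount": 100.0}]
incoming_rows = [{"amount": 200.0}, {"amount": None}, {"amount": 220.0}, {"amount": None}]

baseline = profile(baseline_rows, "amount")
incoming = profile(incoming_rows, "amount")
issues = drift_report(baseline, incoming)
print(issues)  # ['null fraction increased', 'mean shifted']
```

Storing the profile once and comparing each new batch against it is what lets a check flag an upstream change at arrival time, rather than after a downstream dashboard breaks.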

Solution

Today we are introducing Xceed Dataset Monitors, right within Xceed Data Catalog, to help data teams regain control over their data challenges. Data engineers can now set data quality monitors for all incoming data and ensure that the necessary checks and tests are carried out every time new data arrives. Data teams can create monitors using an easy-to-use GUI right from the dataset details page. They can also create multiple suites, one for each downstream data product affected (for example, dashboards created by a downstream analyst, or data used by a downstream data science team for an ML model).

Real-time monitoring that keeps all stakeholders informed reduces downtime in the event of upstream changes and ensures that trust in the end data products is never broken.

The Data Quality Monitoring Dashboard enables data teams to track trends over time at both the dataset level and the individual test level. This helps spot repeatedly unreliable tables and columns, so that stakeholder teams can prioritize and take effective action to improve overall quality.
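As a rough illustration of the kind of trend analysis described above, the sketch below computes a per-run pass rate and counts failures per test from a small run history. The `run_history` structure and the test names are hypothetical and chosen only for this example; they do not reflect how Xceed stores run results.

```python
from collections import Counter

# Hypothetical run history: one entry per suite run, mapping test name to pass/fail.
run_history = [
    {"run": 1, "results": {"min_rows": True, "order_id_not_null": True, "amount_in_range": False}},
    {"run": 2, "results": {"min_rows": True, "order_id_not_null": False, "amount_in_range": False}},
    {"run": 3, "results": {"min_rows": True, "order_id_not_null": True, "amount_in_range": False}},
]


def suite_pass_rates(history):
    """Fraction of tests that passed in each run (the suite-level trend)."""
    return [sum(r["results"].values()) / len(r["results"]) for r in history]


def failure_counts(history):
    """Number of failed runs per test (the test-level trend)."""
    counts = Counter()
    for run in history:
        for name, passed in run["results"].items():
            if not passed:
                counts[name] += 1
    return counts


rates = suite_pass_rates(run_history)
worst = failure_counts(run_history).most_common(1)
print(worst)  # [('amount_in_range', 3)]
```

A test that fails in every run, such as `amount_in_range` here, is the kind of consistently unreliable column a team would want to prioritize.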

In summary, some of the key benefits of our approach to data observability and monitoring are:

  1. Monitoring runs inline with data arrival, which is critical to reducing actual downtime.

  2. A no-code interface built right into the data catalog lowers the bar to adding or modifying data quality tests and monitoring rules.

  3. The integrated approach means you don't need another out-of-band data observability or monitoring tool.

  4. A single interface brings all data users together. Keep everyone informed in real time as data is refreshed.

  5. A 360-degree view of all data artifacts and operations from within a single application interface. Data teams can now monitor datasets and columns with consistent issues.

Key Features

  1. Cynepia Data Quality Monitors are engine independent: they work with all supported engines, including Spark and Pandas.

  2. Leverages the existing data profile for the dataset, thereby optimizing compute usage.

  3. Support for an exhaustive list of monitor rules at both the dataset level and the column level.

  4. Support for multiple notification channels, including in-app notifications, Slack and email.

  5. Run history with data quality metric trends, to monitor trends at the overall suite level and at the individual monitor/test level.
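To make the distinction between dataset-level and column-level monitor rules concrete, here is a minimal sketch in plain Python. It is not the Xceed API: the `Monitor` class, the rule helpers (`row_count_at_least`, `column_not_null`, `column_in_range`) and `run_suite` are hypothetical names invented for illustration, and the final step only collects the failed tests that a notification channel would report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[Dict]


@dataclass
class Monitor:
    name: str
    level: str  # "dataset" or "column"
    check: Callable[[Rows], bool]


def row_count_at_least(minimum):
    """Dataset-level rule: the batch must contain at least `minimum` rows."""
    return lambda rows: len(rows) >= minimum


def column_not_null(column):
    """Column-level rule: no row may have a missing value in `column`."""
    return lambda rows: all(r.get(column) is not None for r in rows)


def column_in_range(column, low, high):
    """Column-level rule: every non-null value must lie in [low, high]."""
    return lambda rows: all(
        low <= r[column] <= high for r in rows if r.get(column) is not None
    )


def run_suite(monitors, rows):
    """Run every monitor against the batch and map its name to pass/fail."""
    return {m.name: m.check(rows) for m in monitors}


suite = [
    Monitor("min_rows", "dataset", row_count_at_least(3)),
    Monitor("order_id_not_null", "column", column_not_null("order_id")),
    Monitor("amount_in_range", "column", column_in_range("amount", 0, 10000)),
]

# Hypothetical incoming batch with one negative amount and one missing id.
batch = [
    {"order_id": 1, "amount": 50.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 20.0},
]

results = run_suite(suite, batch)
failed = [name for name, ok in results.items() if not ok]
print(failed)  # ['order_id_not_null', 'amount_in_range']
```

Because each rule is just a function over rows, the same suite definition could in principle be evaluated by different engines, which is the spirit of the engine-independence point above.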

How It Works

To create a Data Quality Monitoring Suite, you first need to define a Monitoring Suite from the dataset details page in your Data Catalog. Defining a Monitoring Suite for a dataset is a three-step process, as shown below:

  1. Create a new Monitoring Suite

Create a New Monitoring Suite

  2. Add individual monitors/tests to the suite

Add Tests to the suite

  3. Add a list of Slack channels/users to notify on every run

Add channels/users to Notify

Click Finish to create the monitoring suite. You have now successfully created a new data quality monitoring suite. To trigger a fresh run manually, click Run from the Existing tab.

Run Tests

Once the run completes, the results are available in the Run History tab, as seen below:

Run History

About Xceed Analytics

Xceed Analytics is an AI-powered, comprehensive enterprise data platform that unifies all your data, analytics and AI use cases and products under a single platform. A comprehensive data and analytics platform is vital to the success of the business transformation journey as we ride the new wave of artificial intelligence and take advantage of this promising technology.

Benefits of a Comprehensive Data & AI Platform

There are numerous benefits of a comprehensive end-to-end Data and AI Platform:

  1. Central repository for all data, workflows and models.

  2. Seamlessly discover, manage data quality for and govern all your data products and artifacts through a single pane.

  3. Remove data silos and keep every stakeholder engaged and notified.

  4. Accelerate deriving value from your most valuable asset: data.

  5. Enables enterprises to cut and optimize costs via a no-integration stack. You no longer need to stitch together individual services from multiple vendors.

  6. The simplicity of the overall architecture helps streamline the overall data and analytics process.

Technical Capabilities

Key data tools in the Xceed Data and Analytics Platform include:

  1. Versioned, governed and fully integrated data lake based on open standards such as Apache Parquet.

  2. Unified abstraction for all data producers. Supports multiple OLAP and compute engines:

    • DuckDB, Apache Spark, Pandas, Ray

  3. All common access methods supported. Access, configure and monitor with your preferred access method:

    • SQL, dataframe, CLI or Python SDK

  4. No-code data integration. Supports most common databases, cloud storage services and SaaS applications.

  5. Integrated Data Catalog with extensive data discovery, governance and data quality test features.

  6. Xceed SQL Workbench enables analysts to carry out exploratory analysis via a visual interface. Supported engines include DuckDB, Apache Drill and Apache Spark.

  7. Xceed Workflows provides a no/low-code interface for data transformation pipelines. Supported engines include Apache Spark, DuckDB and Apache Drill for SQL, and Pandas and PySpark for dataframes.

  8. Xceed AutoML enables onboarding of everyday ML use cases across classification, regression and forecasting.

  9. Xceed Business Intelligence & Reporting provides all common dashboarding features to build beautiful data stories and dashboards.

  10. Xceed Notifications ensures all stakeholders are notified.

  11. Xceed Model Registry is home to all ML models.

  12. Xceed Python SDK/CLI lets data users work with Xceed Analytics through the Xceed APIs and command-line interface, as an alternative to the user interface.

  13. Microservices architecture enables scalability while providing seamless integration.

For more details on the Xceed Analytics architecture, refer to our Architecture page.

About Cynepia Technologies

Cynepia Technologies provides a comprehensive end-to-end data stack to help enterprises organize, connect and make sense of their data, stay connected with their insights, make faster, real-time decisions and ultimately grow their business.

To learn more about Cynepia and Xceed Analytics, visit our website

For a demo or product inquiry, write to us at Product Marketing.

