Columns
Lists the columns in the step and enables you to review univariate and multivariate statistics. By default, the univariate column profile is shown.
Click the dropdown in the top-right corner to choose the column statistics option. The following options are supported:
- Column Profile
- MultiVariate Analysis
Univariate Column Information
- Use the search option under the SELECT COLUMN section to select the column you want to profile. Once you select a column, a Univariate Profile Card is shown in the right panel. Each card shows a variety of descriptive statistics based on the column type.
For Numerical Columns, the card is shown as below:
[Numeric Column Profile Image]
Numeric Column Field Statistics
- Datatype
- Number Of Records
- Is unique flag
- Number of Distinct Values
- The Percentage of the distinct values
- Count of Not Null Values
- Cardinality
- Number of zeros (count)
- Percentage of zeros
- Minimum Value in the column
- Maximum Value in the column
- Average Value in the column
- Value that appears most often (Mode)
- Sum of all the values in the column
- Mean Absolute Deviation (MAD)
- Standard Deviation: the amount of variation or dispersion of a set of values
- Variance
- Coefficient of Variation (CV): a statistical measure of the relative dispersion of data points around the mean, calculated as the standard deviation divided by the mean
- Interquartile Range (IQR)
- Skewness of the column
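To make the statistics above concrete, the following is a minimal Python sketch that computes most of them for a small column. It is an illustration only and does not reproduce how the product calculates its profile. The function name `numeric_profile` and the result keys are hypothetical, and several choices are assumptions: percentages are taken over the total number of records, the unique flag means that no non-null value repeats, variance and standard deviation use the population formulas, MAD is measured around the mean, and quartiles use the default method of Python's `statistics.quantiles`. The product may define any of these differently.

```python
import math
import statistics


def numeric_profile(values):
    """Compute descriptive statistics for a numeric column.

    `values` is a list of numbers in which None marks a missing (null) entry.
    All definitions here are assumptions made for illustration.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-null values")

    records = len(values)
    n = len(present)
    distinct = len(set(present))
    zeros = sum(1 for v in present if v == 0)

    mean = statistics.fmean(present)
    # Population variance: average squared distance from the mean.
    variance = sum((v - mean) ** 2 for v in present) / n
    std_dev = math.sqrt(variance)
    # Mean absolute deviation around the mean (assumed meaning of MAD).
    mad = sum(abs(v - mean) for v in present) / n

    # statistics.quantiles needs at least two data points.
    if n >= 2:
        q1, _median, q3 = statistics.quantiles(present, n=4)
        iqr = q3 - q1
    else:
        iqr = 0.0

    # Moment-based skewness: third central moment over std_dev cubed.
    third_moment = sum((v - mean) ** 3 for v in present) / n
    skewness = third_moment / std_dev ** 3 if std_dev else 0.0

    return {
        "records": records,
        "not_null": n,
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / records,
        "is_unique": distinct == n,  # assumption: no repeated non-null value
        "zeros": zeros,
        "zeros_pct": 100.0 * zeros / records,
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "mode": statistics.mode(present),
        "sum": sum(present),
        "mad": mad,
        "std_dev": std_dev,
        "variance": variance,
        # CV is undefined when the mean is zero.
        "cv": std_dev / mean if mean else None,
        "iqr": iqr,
        "skewness": skewness,
    }


# Example column with one missing value.
profile = numeric_profile([0, 2, 4, 4, None, 10])
for name, value in profile.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this example, indicates a longer tail toward larger values; a CV close to or above 1 indicates that the spread is large relative to the mean.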
For Categorical Columns (Text/String), the card is shown as below:
[Categorical Column Profile Image]
Categorical Column Field Statistics
- Datatype of the column
- Number of Records
- Is unique flag
- Percentage of Unique Values
- Number of Distinct Values
- Percentage of Distinct Values
- Value that appears most often (Mode)
- Memory Usage
- Cardinality
- Count of Not Null Values
Along with these statistics, categorical columns offer an option named Uniques that lists all the values in the column. Inside Uniques, you will see each value and the number of times it appears. You can also sort the values in ascending or descending order.
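The kind of information the Uniques view presents can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: the function name `uniques` is hypothetical, null entries are represented by None and excluded from the counts, and sorting is done on the values themselves rather than on their counts.

```python
from collections import Counter


def uniques(values, descending=False):
    """Return (value, count) pairs for a categorical column, sorted by value.

    None entries are treated as nulls and skipped (an assumption).
    """
    counts = Counter(v for v in values if v is not None)
    return sorted(counts.items(), key=lambda pair: pair[0], reverse=descending)


colors = ["red", "blue", "red", None, "green", "red"]

# Ascending order by value.
for value, count in uniques(colors):
    print(value, count)

# Descending order by value.
for value, count in uniques(colors, descending=True):
    print(value, count)

# The distinct count and the mode follow from the same tally.
tally = Counter(v for v in colors if v is not None)
print("distinct values:", len(tally))
print("mode:", tally.most_common(1)[0][0])
```

In this example column, "red" appears three times and the other two values once each, so the column has three distinct values and "red" is the mode.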