Introduction to correlation plots: 3 ways to discover data relationships

Oct 11, 2021 | Blockchain Services, Cyber Security Services, Data Science

These correlation methods are part of the ADS feature type system. If you are new to it, you might want to check out the post How feature types improve your data science workflow. In summary, the feature type system helps codify what your data represents, which makes the time-consuming validation of each new dataset faster and more reliable. It also has tools to compute custom statistics, visualize information, check accuracy via validator and warning systems, and select columns based on feature types.

Computing correlations

The EDA features in ADS speed up your analysis by providing methods to compute different types of correlations. You can choose among several correlation techniques depending on your use case. Each technique comes as a pair of methods: one returns a dataframe with the correlation information, and its partner generates a plot.

Which correlation technique you use depends on the type of data that you are working with. When using these correlation techniques, you will need to slice your dataframe so that the calculation uses only the appropriate feature types. Here’s a summary of the different correlation techniques and the data they use:

pearson: The Pearson correlation coefficient is a normalized measure of the covariance between two sets of data. In essence, it measures the linear correlation between the datasets and ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). This method is used when both datasets consist of continuous values.
correlation_ratio: The correlation ratio measures the extent to which a distribution is spread out within individual categories relative to the spread of the entire population. It ranges from 0 (the categories explain none of the spread) to 1 (the categories explain all of it). This metric is used to compare a categorical variable to a continuous one.
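
To make the two definitions above concrete, here is a minimal sketch of both statistics in plain Python. It is not the ADS implementation: the function names and the sample data are hypothetical and chosen only for illustration. This sketch returns the correlation ratio as eta (the square root of the between-category share of the variance); the text above does not state whether ADS reports eta or its square.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson coefficient: covariance normalized by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a continuous variable."""
    # Group the continuous values by their category label.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    overall_mean = sum(values) / len(values)
    # Spread of the category means around the overall mean, weighted by group size.
    between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    # Spread of every value around the overall mean.
    total = sum((v - overall_mean) ** 2 for v in values)
    return math.sqrt(between / total) if total else 0.0


# Hypothetical sample data, for illustration only.
heights = [150, 160, 170, 180]
weights = [50, 60, 70, 80]
print(pearson(heights, weights))

labels = ["a", "a", "b", "b"]
scores = [1, 1, 3, 3]
print(correlation_ratio(labels, scores))
```

In the second example the values do not vary within either category, so the categories account for all of the spread and the ratio sits at the top of its range. If each category held the same mix of values, the ratio would drop to zero.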