# DataGenX

My e-Notes about Cloud, K8s, OpenShift, DataScience, Machine Learning, Python, Data Analytics, DataStage, DWH and ETL Concepts

## Friday, 12 May 2017

Continued from 'Measuring Data Similarity or Dissimilarity #1' and 'Measuring Data Similarity or Dissimilarity #2'.

### 3. For Ordinal Attributes:

An ordinal attribute is an attribute whose possible values have a meaningful order or ranking among them, but the magnitude of the difference between successive values is not known. Ordinal values are the same as categorical values, but with an order.

For example, the values of a "Performance" column could be: Best, Better, Good, Average, Below Average, Bad.

These are categorical values with an order or rank, which is why they are called ordinal values. Ordinal attributes can also be derived by discretizing numeric attributes, that is, by splitting the value range into a finite number of ordered categories.
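As a rough illustration of discretization, the sketch below maps a numeric score to an ordered category. The function name `discretize`, the cut points, and the labels are all hypothetical choices made for this example; they are not taken from any particular dataset or library.

```python
import bisect

# Hypothetical thresholds and labels, chosen only for illustration.
# Labels are listed from lowest to highest category.
CUT_POINTS = [40, 60, 80]
LABELS = ["Bad", "Average", "Good", "Best"]


def discretize(value, cut_points=CUT_POINTS, labels=LABELS):
    """Return the ordered category that a numeric value falls into.

    cut_points must be sorted in ascending order, and labels must have
    exactly one more entry than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have len(cut_points) + 1 entries")
    # bisect_right counts how many cut points are <= value,
    # which is the index of the matching label.
    return labels[bisect.bisect_right(cut_points, value)]
```

With these assumed thresholds, a score of 35 falls into "Bad" and a score of 85 falls into "Best". A value equal to a cut point goes to the higher category; that boundary rule is a design choice, not a requirement.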

To calculate similarity or dissimilarity, we assign a rank to each of these categories: an attribute f with N possible states gets the ranks 1, 2, 3, ..., N.

#### How to Calculate Similarity or Dissimilarity:

1. Assign a rank R_if to each category of attribute f, which has N possible states.
2. Normalize the rank to the range [0.0, 1.0] so that each attribute has equal weight. The normalized rank R_in can be calculated as

$$R_{in} = \frac{R_{if} - 1}{N - 1}$$

3. Similarity or dissimilarity can now be calculated with any distance measure (see 'Measuring Data Similarity or Dissimilarity #2').
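The three steps above can be sketched in Python as follows. This is a minimal sketch under some assumptions: the categories are ranked from "Bad" (rank 1) to "Best" (rank N), the helper names are hypothetical, and the distance in step 3 is taken to be the absolute difference of the normalized ranks (the Manhattan distance on a single attribute). Any other distance measure could be swapped in.

```python
# Categories listed from lowest to highest; the direction is an assumption.
PERFORMANCE_ORDER = ["Bad", "Below Average", "Average", "Good", "Better", "Best"]


def normalized_rank(value, ordered_categories):
    """Steps 1 and 2: rank the category, then scale the rank to [0.0, 1.0]."""
    n = len(ordered_categories)
    if n < 2:
        raise ValueError("need at least two categories to normalize")
    rank = ordered_categories.index(value) + 1  # step 1: rank in 1..N
    return (rank - 1) / (n - 1)                 # step 2: (R_if - 1) / (N - 1)


def ordinal_dissimilarity(a, b, ordered_categories):
    """Step 3: absolute difference of the two normalized ranks."""
    return abs(normalized_rank(a, ordered_categories)
               - normalized_rank(b, ordered_categories))


if __name__ == "__main__":
    for left, right in [("Best", "Bad"), ("Good", "Average"), ("Good", "Good")]:
        d = ordinal_dissimilarity(left, right, PERFORMANCE_ORDER)
        print(f"{left} vs {right}: dissimilarity = {d:.2f}")
```

For example, with six categories "Good" has rank 4, so its normalized rank is (4 - 1) / (6 - 1) = 0.6. A similarity score can then be taken as 1 minus the dissimilarity.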

## Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies' positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.
The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.