By Udayan Banerjee
May 8, 2014 12:23 AM EDT
Outlier = legitimate data point that’s far away from the mean or median in a distribution
Anomaly = illegitimate data point that’s generated by a different process than whatever generated the rest of the data
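To make the distinction concrete, here is a minimal sketch in Python of the first half of the problem: finding points that sit far from the median. It uses the modified z-score based on the median absolute deviation (MAD). The function name `far_from_median`, the cutoff of 3.5 and the sample readings are assumptions chosen for illustration and do not come from the post. Note what the code cannot do: it only flags candidates. Whether a flagged point is an outlier (legitimate) or an anomaly (produced by a different process) still has to be judged from knowledge of how the data was generated.

```python
from statistics import median


def far_from_median(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a common convention,
    used here as an assumption rather than a rule.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical, so the score is
        # undefined; flag nothing rather than divide by zero.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical readings: six similar values and one far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
print(far_from_median(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value pulls the mean and inflates the standard deviation, which can hide the very point you are trying to find.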
Ravi Parikh has written a very interesting blog post on this topic, “Garbage In, Garbage Out: How Anomalies Can Wreck Your Data”. The post goes deeper into anomalies and shows how to detect them with appropriate visualization techniques. One of his examples uses a visualization to detect election fraud.
Do read the full post!
Do you have the capability to assess data quality? Or even suggest appropriate analysis visualizations to help distinguish between Anomalies and Outliers? … Vijay Ghei