

Difference between Anomalies and Outliers

Outlier = legitimate data point that’s far away from the mean or median in a distribution

Anomaly = illegitimate data point that’s generated by a different process than whatever generated the rest of the data
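To make the two definitions concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from the post referenced here: the sample response times, the `is_plausible` rule, and the 3.5 cutoff on a median-based (modified) z-score are all assumptions chosen for the example. The point it tries to show is that a distance-from-the-median test can flag unusual values, but deciding whether a value came from a different process usually needs a separate, domain-specific check.

```python
import statistics


def robust_z_scores(values):
    """Modified z-scores: distance from the median, scaled by the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be scored as "far".
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def classify(values, is_plausible, threshold=3.5):
    """Label values as 'anomaly' or 'outlier'; unremarkable values are omitted.

    is_plausible is a hypothetical domain rule: a value that fails it could
    not have come from the process being measured, so it is an anomaly.
    A plausible value that is merely far from the median is an outlier.
    """
    labelled = []
    for value, score in zip(values, robust_z_scores(values)):
        if not is_plausible(value):
            labelled.append((value, "anomaly"))
        elif abs(score) > threshold:
            labelled.append((value, "outlier"))
    return labelled


# Assumed sample: response times in milliseconds. 480 is a genuinely slow
# request; -1 is a sentinel written by a failed logger (a different process).
response_times_ms = [102, 98, 105, 110, 95, 101, 99, 480, -1]

labels = classify(response_times_ms, is_plausible=lambda ms: ms >= 0)
for value, label in labels:
    print(value, label)
```

In this sketch both 480 and -1 sit far from the median, so a purely statistical test would treat them alike. Only the plausibility rule separates the legitimate slow request from the logging artifact, which is why the rule has to come from knowledge of how the data was generated.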

Ravi Parikh has written a very interesting blog post on this topic: Garbage In, Garbage Out: How Anomalies Can Wreck Your Data. The post discusses anomalies in more detail and shows how to detect them with appropriate visualization techniques. He gives an example of detecting election fraud through the following visualization:

[Figure: anomalies_election_fraud]

Do read the full post!

Interesting Question

Do you have the capability to assess data quality? Or even suggest appropriate analysis visualizations to help distinguish between Anomalies and Outliers? … Vijay Ghei

 



More Stories By Udayan Banerjee

Udayan Banerjee is CTO at NIIT Technologies Ltd, an IT industry veteran with more than 30 years' experience. He blogs at http://setandbma.wordpress.com.
The blog focuses on emerging technologies such as cloud computing, mobile computing, and social media (aka Web 2.0). It also covers agile methodology and trends in architecture. It is a world view seen through the lens of a software service provider based in Bangalore and serving clients across the world. The focus is mostly on...

  • Keep the hype out and project a realistic picture
  • Uncover trends not very apparent
  • Draw conclusions from real-life experience
  • Point out fallacies and discrepancies when I see them
  • Talk about trends which I find interesting