Removing outliers #22

Closed
opened 2025-11-02 00:01:39 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @grofte on GitHub (Jun 14, 2021).

Hello! Great content =]

But are you sure you want to remove outliers before feature engineering? E.g. if a feature has a power law distribution (as many do), then you would have outliers that are no longer outliers once you take the log of the feature.
Maybe you could add a warning or something. It makes sense to deal with outliers before your feature store, but I wouldn't want to remove any outliers before having performed a thorough EDA. Now that I think about it, the same goes for dealing with missing values. Of course, we are talking MLOps, so you might have meant that one should follow this guide once they have a model they are happy with, but what you have created seems more all-encompassing than that.
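
A rough sketch of the concern, not taken from the lesson itself: the data below is synthetic, a log-normal sample stands in for a heavy-tailed feature, and `iqr_outlier_count` is a hypothetical helper that applies Tukey's 1.5 × IQR rule. It only compares how many points get flagged before and after a log transform.

```python
import math
import random
import statistics


def iqr_outlier_count(values, k=1.5):
    """Count points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


# Assumption: a synthetic, strictly positive, heavy-tailed feature.
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# The same feature after the log transform one might apply during
# feature engineering.
logged = [math.log(v) for v in raw]

raw_outliers = iqr_outlier_count(raw)
log_outliers = iqr_outlier_count(logged)
print(f"flagged on raw scale: {raw_outliers}")
print(f"flagged on log scale: {log_outliers}")
```

On data like this, far more points should be flagged on the raw scale than on the log scale, so dropping them before the transform would throw away rows that look unremarkable afterwards.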

Just a thought. Feel free to close this issue whenever you want.

Author
Owner

@GokuMohandas commented on GitHub (Jun 14, 2021):

@grofte This is a great point and I've added a note to make this a bit clearer. And you're right, it's definitely not a linear guide and I had quite a bit of trouble writing this section because EDA and transformations are back-and-forth processes. But for the sake of the lesson, I had to place them in separate lessons with caveat notes everywhere.


Reference: github-starred/Made-With-ML#22