mirror of
https://github.com/GokuMohandas/Made-With-ML.git
synced 2026-03-09 07:12:37 -05:00
Which kind of model is better for keyword-set classification? #14
Originally created by @guotong1988 on GitHub (Dec 26, 2019).
There is a similar, well-known task called text classification.
But I want to find a kind of model whose input is a set of keywords, where the keywords do not come from a sentence.
For example:
Another example:
Thank you.
@GokuMohandas commented on GitHub (Jan 3, 2020):
Hey @guotong1988, you'll first want to gather enough data for the types of entities (fruit, vegetable, etc.) that you care about. You can start from an off-the-shelf set of embeddings (ex. GloVe): these are common tokens, and because such embeddings were learned from large, generic datasets, the vectors for entities in the same class will likely already be clustered together.
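To make that concrete, here is a minimal sketch of one way to classify a keyword set with pretrained-style embeddings: mean-pool the keyword vectors (so order does not matter) and pick the nearest class centroid. The tiny 3-d vectors, the `EMBEDDINGS` table, the training sets, and all function names are made up for illustration; in practice you would load real GloVe vectors and probably train a proper classifier on top.

```python
from math import sqrt

# Toy 3-d vectors standing in for pretrained embeddings.
# ASSUMPTION: these values are invented, they are not real GloVe vectors.
EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "pear": [0.85, 0.15, 0.05],
    "carrot": [0.1, 0.9, 0.0],
    "spinach": [0.2, 0.8, 0.1],
    "onion": [0.15, 0.85, 0.05],
}


def embed_set(keywords):
    """Mean-pool the vectors of the known keywords (order-invariant)."""
    vectors = [EMBEDDINGS[k] for k in keywords if k in EMBEDDINGS]
    if not vectors:
        raise ValueError("no keyword has an embedding")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def fit_centroids(labeled_sets):
    """Average the pooled vectors of each class: label -> centroid."""
    grouped = {}
    for keywords, label in labeled_sets:
        grouped.setdefault(label, []).append(embed_set(keywords))
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(keywords, centroids):
    """Return the label whose centroid is most similar to the pooled query."""
    query = embed_set(keywords)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Hypothetical training data: (keyword set, label) pairs.
TRAIN = [
    ({"apple", "banana"}, "fruit"),
    ({"carrot", "spinach"}, "vegetable"),
]
CENTROIDS = fit_centroids(TRAIN)
```

With these toy values, `predict({"pear"}, CENTROIDS)` picks "fruit" even though "pear" never appears in the training sets, which is the benefit of starting from embeddings that are already clustered by class.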
In the second example, where you have labels like "Chinese fruit", you'll want to treat this as a multilabel classification problem (ex. the output is [0, 1, 1, 0] instead of a single class such as [0, 1, 0, 0]). Alternatively, you can just add more classes, such as "fruit" and "Chinese fruit", but your model will start confusing them because the classes overlap heavily. You can also train two separate models, one to predict "fruit" and one to predict "Chinese", from the set of keywords, but this assumes that every example has both kinds of label.
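A target like [0, 1, 1, 0], where several labels can be active at once, can be sketched as follows. The label vocabulary and its order, the 0.5 threshold, and the function names are assumptions chosen so that "Chinese fruit" maps to [0, 1, 1, 0]; the logits in the usage note are invented, not model output.

```python
from math import exp

# ASSUMPTION: a hypothetical label vocabulary; the order is arbitrary.
LABELS = ["vegetable", "fruit", "chinese", "american"]


def to_multi_hot(active, labels=LABELS):
    """Encode a set of active labels as a 0/1 vector (one slot per label)."""
    return [1 if label in active else 0 for label in labels]


def sigmoid(z):
    """Squash a raw score into (0, 1), independently for each label."""
    return 1.0 / (1.0 + exp(-z))


def decode(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose independent probability clears the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]
```

For example, `to_multi_hot({"chinese", "fruit"})` gives `[0, 1, 1, 0]`, and `decode([-2.0, 3.0, 1.5, -0.5])` gives `["fruit", "chinese"]`. The key design choice is one sigmoid per label rather than a single softmax over all labels, so that "fruit" and "chinese" do not compete with each other.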
Hope that helps.