Is using a corpus allowed?

Hi! I’m trying to develop a model that uses corpus data to perform sentiment analysis on the reviews.

Is it illegal? On the one hand, it’s external data, which is forbidden. On the other hand, when we talk about external data, we mean training data obtained from anything other than the given data. But a corpus is not really training data; it’s more like a dictionary.

My corpus is like a dictionary: it contains correct spellings and some sentiment information. It consists of ‘words’ rather than ‘phrases’.
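
For illustration only, here is a minimal sketch of the kind of word-level lexicon described above. The words, the scores, and the `score_review` helper are all made-up assumptions, not the actual corpus or any particular package's API.

```python
# Hypothetical word-level lexicon: word -> sentiment score.
# The entries and scores are invented for illustration.
SENTIMENT_LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
}


def score_review(text, lexicon=SENTIMENT_LEXICON):
    """Sum the lexicon scores of the words in a review.

    Words that are not in the lexicon contribute 0.
    """
    words = text.lower().split()
    # Strip surrounding punctuation so that "great!" matches "great".
    cleaned = [w.strip(".,!?;:\"'()") for w in words]
    return sum(lexicon.get(w, 0.0) for w in cleaned)
```

Nothing here is learned from labelled reviews: the lexicon is a fixed lookup table of individual words, which is the sense in which it behaves more like a dictionary than like training data.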

Thanks!

Hey @Realdeo, as long as the corpus is part of an open source package and freely available to all competitors, that is totally fine.

Thanks for asking.

May I know which part of the rules says so? You see, I’m a paranoid guy, in case you’re wondering =p