After the exploratory analysis, it's time to search for insights on product reviews!
The aim here is to find product features / product characteristics mentioned within reviews. Here is my approach:
>> select product with most reviews
>> data engineering (text into numbers)
>> split the data set (train and test)
>> train the model (Logistic Regression)
>> plot SHAP values
>> re-run the model after removing adjectives (nltk part-of-speech tag equal to "JJ")
Conclusions:
- The word "love" is the strongest predictor of a review's rating
- After removing adjectives, the second strongest word is "price"
- Product characteristics >> "easy to use", "great price", and "battery life"
- Impact of important dates >> "black friday"
Check code here >> https://github.com/zecakpm/NLP-product_reviews/blob/main/reviews_amzn_log_reg.ipynb