🎬 IMDB Movie Reviews Sentiment Analysis

This project performs sentiment analysis on the IMDB Dataset of 50,000 movie reviews. The dataset is labeled with binary sentiments: positive or negative.

We go through a full machine learning pipeline using Natural Language Processing (NLP) techniques to classify review sentiment.

Dataset

File: IMDB Dataset.csv
Columns:
- review: Text content of a user review
- sentiment: Label (positive or negative)
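
As a rough illustration of the layout above, the file can be loaded and checked with pandas. This is a minimal sketch, not the notebook's own code: the helper name `load_reviews` and the two in-memory sample rows are assumptions, used so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_reviews(source):
    """Read the reviews CSV and confirm the two expected columns exist.

    `load_reviews` is a hypothetical helper, not part of the project.
    """
    df = pd.read_csv(source)
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Tiny in-memory stand-in with the same two columns as the real file.
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful little film.<br /><br />Loved it.",positive\n'
    '"Dull, slow and far too long.",negative\n'
)

df = load_reviews(sample)
print(df.shape)
print(df["sentiment"].value_counts())
```

With the real data, the call would be `load_reviews("IMDB Dataset.csv")`.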

Project Steps

Data Loading
- Read and display the structure of the dataset.
Data Preprocessing
- Remove HTML tags, punctuation, stopwords
- Tokenization, lowercasing, and stemming
Exploratory Data Analysis
- Sentiment distribution
- Word clouds for positive and negative reviews
- Review length analysis
Text Vectorization
- Using TF-IDF for numerical feature extraction
Model Training
- Trained a Logistic Regression model
- Achieved roughly 85% accuracy on test data
Evaluation
- Classification report
- Confusion matrix visualization
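
The preprocessing step listed above could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's code: the function name `clean_review` is hypothetical, whitespace splitting stands in for a proper tokenizer, and a small hand-written stopword set stands in for a full stopword list so the snippet runs without downloading any NLTK corpora.

```python
import re
import string

from nltk.stem import PorterStemmer

# Assumption: a tiny stand-in stopword set; a real run would use a full list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "was", "this", "of", "to", "in", "i"}

TAG_RE = re.compile(r"<[^>]+>")  # matches HTML tags such as <br />
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))
stemmer = PorterStemmer()


def clean_review(text):
    """Strip HTML tags, lowercase, drop punctuation and stopwords, then stem."""
    text = TAG_RE.sub(" ", text)
    text = text.lower().translate(PUNCT_TABLE)
    # Whitespace split is a simplification of real tokenization.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(stemmer.stem(tok) for tok in tokens)


cleaned = clean_review("This movie was AMAZING!<br /><br />I loved the acting.")
print(cleaned)
```

Applied to the whole dataset, this would be something like `df["review"].apply(clean_review)`.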

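The vectorization, training, and evaluation steps can be sketched with scikit-learn as follows. The toy reviews, the 75/25 split, `random_state=42`, and the use of `make_pipeline` are all assumptions made for illustration; the notebook's actual settings are not stated here, and the numbers this toy run prints say nothing about the accuracy reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up toy corpus standing in for the cleaned reviews.
texts = [
    "great film loved it", "wonderful acting and story", "brilliant and moving",
    "best movie this year", "truly enjoyable experience", "loved the cast",
    "a delightful surprise", "fantastic direction",
    "terrible plot", "boring and slow", "worst movie ever", "awful acting",
    "waste of time", "dull and predictable", "hated the ending", "poorly written",
]
labels = ["positive"] * 8 + ["negative"] * 8

# Assumed split: 25% held out, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF features feed a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
cm = confusion_matrix(y_test, y_pred, labels=["negative", "positive"])
print(cm)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is fitted on the training split only, which avoids leaking test data into the features.
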
Visualizations

- Sentiment distribution bar plot
- Histogram of review lengths
- Confusion matrix heatmap

Requirements

pip install pandas numpy matplotlib seaborn scikit-learn nltk
