Sentiment Analysis on Twitter Data Using Machine Learning Techniques
Overview:
Developed an end-to-end sentiment analysis pipeline that classifies tweets as positive, negative, or neutral. The project combined text preprocessing, feature extraction, and machine learning models to surface public opinion and trends.
Key Contributions:
- Data Preprocessing: Implemented robust text preprocessing steps, including lowercasing, tokenization, lemmatization, and the removal of mentions, URLs, emojis, punctuation, stopwords, and hashtags, ensuring clean and standardized data for analysis.
- Feature Extraction: Utilized two feature extraction methods:
  - Bag-of-Words (BoW) to capture word frequency.
  - GloVe Embeddings for semantic understanding by representing words as dense vectors.
- Model Development: Trained and evaluated three classifiers:
  - Logistic Regression using Scikit-learn.
  - Support Vector Machine (SVM) for high-dimensional feature separation.
  - Long Short-Term Memory (LSTM) neural networks built with PyTorch for capturing sequential patterns (word order) in text data.
- Model Evaluation: Assessed model performance using F1 score and accuracy across multiple datasets. The GloVe-LSTM combination performed best among the feature and model combinations tested.
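The preprocessing and Bag-of-Words steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the stopword list, the regular expressions, and the function names (`preprocess`, `build_vocab`, `bow_vector`) are all assumptions made for the example. Lemmatization is left out because it needs an external resource such as NLTK's WordNet data, and hashtags are dropped as whole tokens, which is one possible reading of "removal of hashtags".

```python
import re
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK ships a fuller one).
STOPWORDS = {"a", "an", "the", "i", "is", "are", "to", "of", "and", "in", "it", "this", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7f]")  # crude stand-in for emoji removal
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet):
    """Lowercase a tweet, strip noise, and return a list of tokens."""
    text = tweet.lower()
    # URLs, mentions, and hashtags go first, while their punctuation is intact.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, NON_ASCII_RE):
        text = pattern.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(tokens, vocab):
    """Return word counts for one tweet; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return [counts[tok] for tok in vocab]


tweets = [
    "@user I LOVE this phone!!! 😍 https://t.co/abc #happy",
    "The battery is so bad, so bad...",
]
token_lists = [preprocess(t) for t in tweets]
vocab = build_vocab(token_lists)

print(token_lists[0])                     # ['love', 'phone']
print(vocab)                              # {'bad': 0, 'battery': 1, 'love': 2, 'phone': 3}
print(bow_vector(token_lists[1], vocab))  # [2, 1, 0, 0]
```

Each tweet becomes a fixed-length count vector that a linear classifier such as Logistic Regression or an SVM can consume directly. Word order is discarded, which is the gap the embedding-plus-LSTM approach is meant to cover.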
Results:
- The GloVe-LSTM model outperformed the traditional classifiers (Logistic Regression and SVM) in accuracy and consistency.
- Demonstrated the value of dense word embeddings and deep learning architectures for handling unstructured text data.
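For reference, a GloVe-LSTM classifier of the kind described here might look roughly like the PyTorch sketch below. It is a hedged illustration only: the class name `SentimentLSTM`, the hidden size, the single LSTM layer, and the frozen embeddings are assumptions, and a random matrix stands in for real GloVe vectors. Padding and packed sequences are not handled.

```python
import torch
import torch.nn as nn


class SentimentLSTM(nn.Module):
    """Embedding -> LSTM -> linear layer over the final hidden state."""

    def __init__(self, embedding_matrix, hidden_size=64, num_classes=3):
        super().__init__()
        # freeze=True keeps the pretrained vectors fixed during training.
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        _, (last_hidden, _) = self.lstm(embedded)   # (num_layers, batch, hidden)
        return self.classifier(last_hidden[-1])     # (batch, num_classes) logits


# Random stand-in for a GloVe matrix: 100 "words", 50 dimensions each.
torch.manual_seed(0)
fake_glove = torch.randn(100, 50)

model = SentimentLSTM(fake_glove)
batch = torch.randint(0, 100, (4, 12))  # 4 tweets, 12 token ids each
logits = model(batch)
print(logits.shape)  # torch.Size([4, 3])
```

The three output logits correspond to the positive, negative, and neutral classes. In training they would typically be passed to `nn.CrossEntropyLoss` together with integer class labels.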
Skills and Technologies:
- Programming & Libraries: Python, PyTorch, Scikit-learn, NLTK
- Machine Learning: Logistic Regression, SVM, LSTM
- Natural Language Processing: Text preprocessing, feature extraction (BoW, GloVe embeddings)
- Evaluation Metrics: F1 Score, Precision, Recall, Accuracy
- Visualization: Matplotlib, Seaborn for sentiment distribution and model performance insights
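To make the evaluation metrics listed above concrete, here is a small pure-Python sketch of how precision, recall, F1, and accuracy are computed for a three-class problem. The labels and predictions are made up for illustration, the function names are hypothetical, and macro averaging is an assumption: the write-up does not say which averaging the project used. In practice Scikit-learn's metrics module covers the same calculations.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Compute one-vs-rest precision, recall, and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores, weighting every class equally."""
    return sum(precision_recall_f1(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

print(precision_recall_f1(y_true, y_pred, "pos"))  # (0.5, 0.5, 0.5)
print(round(accuracy(y_true, y_pred), 3))          # 0.667
print(round(macro_f1(y_true, y_pred, ["pos", "neg", "neu"]), 3))  # 0.656
```

Macro F1 treats the three classes equally regardless of size, so it can differ noticeably from accuracy when one sentiment dominates the dataset.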
Impact:
This project highlights expertise in natural language processing, machine learning, and deep learning. It showcases the ability to extract meaningful insights from unstructured text data, making it directly relevant to data-driven roles in analytics and AI.