Building an Aspect Based Sentiment Analysis System

Introduction

In the fast-evolving world of Natural Language Processing (NLP), the field of sentiment analysis has expanded beyond simple positive or negative classifications to more nuanced interpretations that consider specific elements of content. Aspect-based sentiment analysis (ABSA) is at the forefront of this advancement, focusing on understanding the sentiment related to specific aspects within a text. In this blog post, I delve into our project on building an ABSA system using advanced machine learning techniques and the IMDB movie review dataset.

What is Aspect Based Sentiment Analysis?

Aspect Based Sentiment Analysis (ABSA) is a sub-discipline of sentiment analysis that not only discerns the general sentiment (positive, negative, neutral) expressed in text but also links this sentiment to specific aspects or features discussed in the text. For example, a movie review might say, “The storyline was captivating, but the cinematography was underwhelming.” Here, ABSA would identify ‘storyline’ and ‘cinematography’ as aspects and associate positive sentiment with the former and negative with the latter.

Why Focus on Movie Reviews?

Movie reviews are particularly challenging and rich for sentiment analysis. They contain a complex mix of emotions and viewpoints, often discussing multiple elements like acting, directing, screenplay, and music. This makes movie reviews an ideal candidate for developing and refining ABSA systems.

Our Approach

Data Set and Preprocessing

We utilized the IMDB movie review dataset, a popular resource for training sentiment analysis models that includes 50,000 reviews labeled as positive or negative. Our first step involved preprocessing this data, which included removing HTML tags, URLs, and non-standard characters, followed by sentence clause extraction and text lemmatization to refine the text for further analysis.

Models Used

We employed two main models for our analysis:

Convolutional Neural Networks (CNNs): Known for their ability to detect spatial hierarchies in data, CNNs were used to parse the reviews for general sentiments. They performed well, achieving an accuracy of 89%.
Long Short-Term Memory Networks (LSTMs): These are adept at capturing long-term dependencies in text data, crucial for reviews where context plays a significant role. The LSTM model showed an accuracy of 87%.

Aspect Extraction and Sentiment Classification

For extracting aspects, we used Latent Dirichlet Allocation (LDA) to identify prevalent topics within the reviews. We then classified these aspects into categories such as actors, plot, music, and director, using our trained models to assign sentiment scores.

Challenges and Innovations

One of the primary challenges was the integration of aspect extraction with sentiment analysis. Aspect extraction required pinpointing specific phrases within large text blocks and determining their sentiment contextually. We tackled this by combining LDA for aspect identification with neural networks for sentiment classification, which proved effective but computationally intensive.

Experiments and Findings

We conducted several experiments to compare model performance and the impact of different tokenization methods on these models. Our findings showed that while both CNN and LSTM are capable of performing sentiment analysis effectively, CNNs slightly outperformed LSTMs in our tests.

Conclusion

The project demonstrated the potential of using advanced machine learning techniques for aspect-based sentiment analysis. Although there were challenges, particularly in integrating different NLP techniques effectively, the outcomes highlighted the nuanced capabilities of ABSA in dissecting and understanding complex sentiment expressions within movie reviews.

Future Work

Looking ahead, we aim to refine our models to better handle the nuances of aspect identification and to explore the integration of more sophisticated tokenization techniques, such as those offered by BERT, to enhance our system’s understanding of context and sentiment.

We invite feedback and discussions on this project, so please share your thoughts and comments below!

Link to the Report

Link to the Code