In supervised machine learning, you usually have an input X, which goes into your prediction function to get your Y^. You can then compare your prediction with the true value Y. This gives you your cost which you use to update the parameters θ.
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.
So, let's start sentiment analysis using Logistic Regression
We will be using the sample twitter data set for this exercise.
Given a tweet, or some text, we can represent it as a vector of dimension V, where V corresponds to our vocabulary size. For example: If you had the tweet “I am learning sentiment analysis”, then you would put a 1 in the corresponding index for any word in the tweet, and a 0 otherwise. As we can see, as V gets larger, the vector becomes more sparse. Furthermore, we end up having many more features and end up training θ V parameters. This could result in larger training time and large prediction time. …