Predicting the Stock Market: Sentiment Analysis

Charlie | January 15th 2018

Twitter is a vast ocean of data. Users ‘tweet’ up to 140 characters to instantly inform the world of their opinion on any topic of their choice. This is often something trivial, such as the quality of their breakfast. However, Twitter should not be underestimated as a knowledge source: news stories often break on Twitter before traditional media outlets can report them. This real-time property of Twitter, along with the number of people using it, can be very useful for gauging general public opinion on a topic almost instantly.

The price of a stock is based on the most basic economic law: supply and demand. Its value on the market depends on how much people are willing to pay for it. For example, if a company hires a new CEO with a brilliant reputation for bringing value to companies, the price will go up because more people expect the new CEO to improve the company. So if we assume that stock price depends on the reputation and actions of the company, finding out about that reputation and those actions before anyone else should give an advantage when predicting the future value of the stock.

The aim of this project was to take advantage of Twitter’s aggregated source of real-time data in order to extract knowledge about companies. These insights could then be used to predict the rise or fall of a company’s stock. In order to succeed, we needed to correctly execute three steps:

Correctly mine relevant data.
Correctly extract knowledge from the data.
Use machine learning to correctly predict the effect of the knowledge on stock price.

To obtain the information required, we mined data from both the Twitter and Google Finance APIs. Parsing the data from Twitter and applying pattern recognition enabled us to extract only the tweets related to the companies of interest.
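As a rough illustration of the filtering step described above, the sketch below matches tweet text against one regular expression per company. The tickers, the patterns, and the function names are all assumptions made for this example; the post does not say which companies or matching rules the project actually used.

```python
import re

# Hypothetical patterns: the tickers and company names below are
# illustrative assumptions, not the ones used in the project.
COMPANY_PATTERNS = {
    "AAPL": re.compile(r"\$AAPL\b|\bapple inc\b", re.IGNORECASE),
    "MSFT": re.compile(r"\$MSFT\b|\bmicrosoft\b", re.IGNORECASE),
}


def match_companies(tweet_text):
    """Return the tickers whose pattern appears in the tweet text."""
    return [
        ticker
        for ticker, pattern in COMPANY_PATTERNS.items()
        if pattern.search(tweet_text)
    ]


def filter_tweets(tweets):
    """Group tweets by matching ticker, dropping tweets that match nothing."""
    matched = {ticker: [] for ticker in COMPANY_PATTERNS}
    for text in tweets:
        for ticker in match_companies(text):
            matched[ticker].append(text)
    return matched
```

Matching on a cashtag such as `$AAPL` or a full company name, rather than a bare word like "apple", is one simple way to cut down on tweets about breakfast.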

We then ran a sentiment analysis algorithm on the tweets using SentiWordNet’s sentiment API, which automatically generated a numerical value for the positivity of each tweet. Sentiment analysis is an area of natural language processing, which is itself a field of artificial intelligence. It looks at how computers can understand and extract opinion from human language.
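To give a feel for lexicon-based scoring, here is a minimal sketch. SentiWordNet assigns positivity and negativity scores to WordNet entries; to keep the example self-contained, a tiny hand-made lexicon with made-up scores stands in for it. The averaging rule (mean of positive minus negative over the words found in the lexicon) is an assumption for illustration, not necessarily the rule the project used.

```python
import re

# Toy stand-in for SentiWordNet: word -> (positive score, negative score).
# These words and values are invented for the example.
TOY_LEXICON = {
    "brilliant": (0.75, 0.0),
    "great": (0.75, 0.0),
    "improve": (0.5, 0.0),
    "terrible": (0.0, 0.75),
    "lawsuit": (0.0, 0.5),
}


def tweet_sentiment(text, lexicon=TOY_LEXICON):
    """Return a score in [-1, 1]: mean of (positive - negative) over known words.

    Tweets with no word in the lexicon are treated as neutral (0.0).
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)
```

For example, `tweet_sentiment("Brilliant new CEO, great news")` averages the scores for "brilliant" and "great" and ignores the other words.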

We were then equipped with aggregated sentiment values for each company and information on how the stock price was changing. To compare these trends, we implemented the spike-triggered average (STA), an analysis technique commonly used in computational neuroscience. STA examines the portion of a recorded continuous signal that precedes each event to see if there is a correlation between the signal and the event. We used it to study the correlation between the stock price and positive or negative tweets.
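The core of the technique can be sketched in a few lines: cut out the window of the signal that precedes each event, then average those windows position by position. The post does not say which series played the role of the signal and which the event, so the sketch stays generic; treating a sentiment series as the signal and notable price moves as events would be one possible reading. The function name and parameters are hypothetical.

```python
def spike_triggered_average(signal, event_indices, window):
    """Average the `window` samples of `signal` that precede each event.

    signal        -- list of numbers sampled at regular intervals
    event_indices -- positions in `signal` at which an event occurred
    window        -- number of samples to look back before each event

    Events without a full window of history are skipped. Returns a list of
    length `window`, or an empty list if no event has enough history.
    """
    segments = [
        signal[i - window:i]
        for i in event_indices
        if window <= i <= len(signal)
    ]
    if not segments:
        return []
    # zip(*segments) groups the samples that sit at the same offset
    # before each event, so each column is averaged separately.
    return [sum(column) / len(segments) for column in zip(*segments)]
```

If the averaged window shows a consistent shape, such as a rise in the signal just before events, that suggests the signal carries information about the event. A flat average suggests it does not.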

If you would like to find out more about this project, please contact us.