Data and AI for Trading
Abstract
In this thesis, my objective was to explore how data science and artificial intelligence can be applied to make price predictions in the financial market. Methods include sentiment, technical, and fundamental analysis, feeding these types of data into machine learning and deep learning models such as random forest, gradient boosting machine, support vector machine, gated recurrent unit, and long short-term memory networks. It should be noted that successful methods within this application of artificial intelligence are kept extremely secret for obvious reasons, and that the field is populated with a lot of fabricated information that has not been tested, verified, or used to earn actual money on the real market.

Therefore, delving into this master's thesis was like jumping into a jungle, not knowing what to look for, which data to use, which methods to use, what "good" looks like within this application, or how to formulate the problem. It should also be noted that while most projects cannot be planned from start to finish without any changes, this project particularly emphasizes experimenting and making decisions based on the outcomes.

Price data was gathered from various financial markets, including bonds, commodities, forex, and indices. Fundamental data was collected from forexfactory.com, and sentiment data was collected from The Global Database of Events, Language, and Tone (GDELT), a project that focuses on documenting and analyzing news into an archive. At first, I had no idea where to gather data, how to gather it, or how far back I could get it. A lot of time was spent on literature review, planning, and programming to retrieve data.

My initial findings surprised me. I found that, at least with the methods and data I have tried so far, the models are not sufficient to predict the financial markets. They are not even good enough to surpass a monkey guessing whether a price will go up or down, which yields 50% accuracy.
I do not believe that doing more of the same (for example, gathering data on more assets) or using more advanced models will help, because I have over time tried the approach of more data and more complex models without any luck. However, roughly six weeks before the deadline of this thesis, I was able to make some progress.