
Improving Financial Sentiment Analysis with Machine Learning and Proxy Servers

Written by Vladimir Fomenko, the founder & CEO of Infatica.io, a global peer-to-business proxy network offering robust networking solutions at both enterprise and individual levels.

In trading, the facts that the trader learns before others can give them an advantage and bring money-making opportunities. The biggest success comes to traders who know how to:

  • Accumulate information, and
  • Interpret it.

A critical subset of these facts is sentiment data — information about how people react to a given product, event, idea, and so on. The essential categories here are “perceived positively” and “perceived negatively”.

Until recently, sentiment data wasn’t quantifiable: it was not possible to measure human sentiment precisely. With the advent of natural language processing and machine learning, however, this task has finally become attainable.

In this article, we’ll explore how you can make use of sentiment analysis and web scraping to be successful in trading.

Overview of sentiment analysis

It is hard for anyone, even seasoned professionals, to stay up to date with every piece of news and every rumor. Yet that information often affects the prices of individual securities and the market as a whole. Here’s a typical example:

  • Because of the coronavirus outbreak, the government of Country A decides to hold meetings online instead of in-person.
  • Video Conferencing Software B is one of the most popular video conferencing solutions on the market, so the markets are expecting Software B to acquire a plethora of new users.
  • Software B’s rise in popularity drives its stock price up.

This example reflects Zoom stock’s recent rise.

Analyzing data like this, including news and rumors, can be automated: special software could parse the news and tell the trader, “Buy Zoom stock.”

Here is the kind of news flow such software would need to digest:

Tesla’s stock jumped 2.5% after Tencent said it amassed a 5% stake in the electric car maker. Ocwen jumped 12% premarket after disclosing it reached a deal with New York regulators that will end third-party monitoring of its business within the next three weeks. In addition, restrictions on buying mortgage-servicing rights may get eased. Cara Therapeutics’s shares surged 16% premarket, after the biotech company reported positive results in a trial of a treatment for uremic pruritus.

Let’s take another example: Elon Musk’s infamous tweet, “Tesla stock price is too high imo”.

The tweet caused Tesla’s stock price to drop; it fell dramatically on the 1st of May.

The technology that makes sentiment analysis possible is called natural language processing (NLP for short). NLP algorithms are built to analyze the meaning behind articles or news written in natural languages (i.e., human languages such as English or Spanish).

It takes a lot of effort to build and implement NLP algorithms; however, they offer plenty of advantages:

  • NLP algorithms have a superfast reaction time: they execute commands in mere milliseconds and work 24/7.
  • They also offer scalability: given enough computing resources, their “expertise” can be applied to every source of financial data.

How does sentiment analysis work?

Every text (no matter its size) has a certain attitude: positive, negative, or neutral. Sentiment analysis aims to determine the attitude of a given text (in most cases, of individual phrases and sentences) by splitting it into individual words (called tokens), scoring each token’s attitude, and aggregating those scores into the overall “angle” of the target text.
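To make the idea concrete, here is a minimal sketch of that tokenize-score-aggregate loop in Python. The two tiny word lists are invented for illustration only; real systems use far larger lexicons.

# A toy illustration of the tokenize-score-aggregate principle.
# The two word lists below are invented for demonstration only.
POSITIVE = {"great", "surge", "gain", "beat", "strong"}
NEGATIVE = {"fall", "drop", "miss", "weak", "loss"}

def naive_sentiment(text: str) -> str:
    tokens = text.lower().split()                                   # 1. split the text into tokens
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)  # 2. score each token
    if score > 0:                                                   # 3. aggregate into an overall label
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("Strong earnings beat expectations"))   # positive
print(naive_sentiment("Shares drop after weak guidance"))     # negative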

This principle may sound abstract, so let’s try the technology ourselves.

The Python programming language has an NLP-focused library called NLTK (Natural Language Toolkit). This website features an interactive implementation of NLTK’s sentiment analysis algorithm. Try inputting different sentences to see how the algorithm reacts.

Let’s check these sentences (a short script that reproduces these checks locally follows the list):

  • “This project is a great tool for processing raw data.” The algorithm classifies this text as positive.
  • “This project will change the tech landscape.” The algorithm classifies this text as neutral.
  • “This project didn’t live up to its potential.” The algorithm classifies this text as negative.
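If you prefer to run the same checks locally, here is a small sketch using NLTK’s bundled VADER analyzer (assuming NLTK is installed); the exact scores may differ slightly from the interactive demo’s output.

# Reproducing the checks with NLTK's bundled VADER sentiment analyzer.
# Requires: pip install nltk (the lexicon is downloaded on first use).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

sentences = [
    "This project is a great tool for processing raw data.",
    "This project will change the tech landscape.",
    "This project didn't live up to its potential.",
]
for sentence in sentences:
    # The "compound" score runs from -1 (most negative) to +1 (most positive).
    print(f"{sia.polarity_scores(sentence)['compound']:+.2f}  {sentence}")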

Shortcomings of sentiment analysis algorithms

In the paragraphs above, we passed sentences with rather simple meanings to the interactive prompt: a word like “great” often sets the sentiment of the whole sentence on its own. But what if the sentence were more difficult? Let’s take a look.

For instance, take the phrase “The industry has seen higher days.” The algorithm classifies this text as neutral, even though a human reader understands it as negative. This example shows that non-machine-learning NLP algorithms have a tough time parsing implicit meanings:

  • Nuanced phrases,
  • Idioms,
  • Metaphors, etc.

Enhancing sentiment analysis with machine learning

That is why we need machine learning. Developers can train an algorithm on countless examples to make it “understand” the text’s context. This is how it goes:

  • Collect a dataset that focuses on financial sentiment texts.
  • Mark up each text’s sentiment.
  • Build a sentiment analysis model that is optimized for “financial language”.

The basis of a machine learning algorithm is the immense amount of information it trains on: in our case, the algorithm would analyze news headlines and social media posts, looking for correlations between texts and the meanings behind them. Given enough training material, the code “learns” (hence the name, machine learning) the context surrounding a given text.
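As a rough illustration of the three steps above, here is a sketch using scikit-learn; the handful of labeled headlines is an invented placeholder standing in for a real, much larger financial sentiment dataset.

# A minimal sketch of the three-step workflow with scikit-learn.
# The labeled headlines are invented placeholders for a real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

headlines = [                                                   # step 1: collected texts
    "Company beats quarterly earnings expectations",
    "Regulator opens investigation into the company",
    "Stock upgraded to buy by major bank",
    "Company misses revenue targets, shares slide",
]
labels = ["positive", "negative", "positive", "negative"]       # step 2: manual markup

model = make_pipeline(TfidfVectorizer(), LogisticRegression())  # step 3: train a model
model.fit(headlines, labels)

print(model.predict(["Analysts downgrade the stock after weak results"]))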

David Wallach, creator of various financial data scrapers, echoes the shortcomings of traditional (non-deep learning) algorithms:

One main objective of this project is to classify the sentiment of companies based on verified user’s tweets as well as articles published by reputable sources. Using current (free) text based sentiment analysis packages such as nltk, textblob, and others, I was unable to achieve decent sentiment analysis with regards to investing.

For example, a tweet would say Amazon is a buy, you must invest now and these libraries would classify it as negative or neutral sentiment. This is due to the training sets these classifiers were built on. For this reason, I decided to write a script that takes in the json representation of the database downloaded from the Firebase console (using export to JSON option) and lets you manually classify each sentence.
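Wallach’s script itself isn’t shown here, but a manual labeling tool along those lines might look like the following hypothetical sketch; the file name and JSON structure are assumptions, not his actual code.

# Hypothetical sketch of a manual labeling script; the file name and
# JSON structure are assumptions, not Wallach's actual code.
import json

with open("firebase_export.json", encoding="utf-8") as f:   # exported via "export to JSON"
    records = json.load(f)

labeled = []
for record in records.values():                             # assumes {id: {"text": ...}} entries
    text = record.get("text", "")
    label = input(f"Sentiment for {text!r} [positive/negative/neutral]: ").strip().lower()
    labeled.append({"text": text, "label": label})

with open("labeled_sentences.json", "w", encoding="utf-8") as f:
    json.dump(labeled, f, indent=2)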

So data is crucial to the sentiment analysis workflow. But where does this data come from?

Overview of web scraping

There is a method that helps gather information for sentiment analysis: web scraping.

It is the process of extracting and organizing data from websites.

How does web scraping work?

The method is reliable because all websites organize their data in the same way. Every website component — text, link, image, dynamic functionality, and so on — belongs to its respective category, denoted by standardized HTML tags.

A web scraper can navigate these elements with ease, locating and saving the data you need to gather.
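As a simple illustration, here is a sketch of a scraper built with requests and BeautifulSoup; the URL and the “h3 a” selector are placeholders, since every site uses its own markup.

# A sketch of extracting headlines with requests and BeautifulSoup.
# The URL and the "h3 a" selector are placeholders.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/markets/news"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Headlines usually sit inside heading or anchor tags with predictable markup.
for tag in soup.select("h3 a"):
    print(tag.get_text(strip=True))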

NLP applications in FinTech

So how exactly do we scrape data? There is software for it. For instance, Stocker, a tool for scraping financial data, follows these steps (a simplified sketch of such a pipeline appears after the list):

  • It creates Google queries, grabbing the latest articles that focus on a particular company.
  • Then, it parses the articles for information, trying to detect whether important pieces of information are positive or negative.
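Stocker’s own code isn’t reproduced here, but a simplified pipeline in the same spirit might look like this sketch, which fetches a few placeholder article URLs and scores their text with VADER.

# Not Stocker's actual code: a simplified pipeline in the same spirit.
# The article URLs are placeholders for links found via a news search.
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

article_urls = [
    "https://example.com/news/zoom-earnings",
    "https://example.com/news/zoom-outlook",
]

for url in article_urls:
    html = requests.get(url, timeout=10).text
    paragraphs = BeautifulSoup(html, "html.parser").find_all("p")
    text = " ".join(p.get_text(strip=True) for p in paragraphs)
    score = sia.polarity_scores(text)["compound"]   # > 0 leans positive, < 0 negative
    print(f"{score:+.2f}  {url}")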

Sentiment analysis is useful in other sectors as well:

Credit score analysis. A tool named LenddoScore can process a loan applicant’s available online data, such as social media profiles, browsing behavior, browsing history, and other markers. The software then rates the borrower’s creditworthiness.

Contract analysis. JP Morgan has implemented a plethora of machine learning algorithms for numerous tasks. The company tested an NLP algorithm designed for contract analysis — and it managed to save 360,000 man-hours in a year.

Customer service. Chatbots powered by NLP algorithms can answer the basic questions that customers ask over and over, freeing human agents from repeating the same replies.

Using proxies to ensure that your analysis runs successfully

However, some websites block web scraping for various reasons. Here’s a typical example: a price aggregator collects price data from multiple e-commerce businesses. Once this data is published on the aggregator’s website, potential customers will see that Vendor M offers the best price. To prevent this, other vendors may disallow scraping of their websites altogether.

So use advanced web scraping tools rather than the cheapest bots; otherwise, the information you trade on might not be reliable.

How do you scrape sites that do not allow it? By working around their anti-bot systems, for example with proxies.

Out of all the numerous proxy types, residential proxies are the optimal solution: as their name suggests, they make your scraper appear as a real user, a resident of the country you select. This lets you pass the anti-scraping systems on most websites.
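With the requests library, routing a scraper through a proxy is a small change; the host, port, and credentials below are placeholders for your provider’s values.

# Routing scraper traffic through a (residential) proxy with requests.
# The proxy host, port, and credentials are placeholders for your provider's values.
import requests

proxies = {
    "http": "http://username:password@proxy.example.com:8080",
    "https": "http://username:password@proxy.example.com:8080",
}

response = requests.get("https://example.com/markets/news",
                        proxies=proxies, timeout=10)
print(response.status_code)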

Conclusion

I am sure that using proxy servers for sentiment analysis can improve your performance as a trader. Please comment if you have ever tried it.
