Understanding our data
We all come across large amounts of unstructured data every day, such as emails, social media posts or product reviews, and these often contain valuable information that we could use. They can offer insight into people's sentiment, which is worth analyzing to learn how they generally perceive our products, our services or even us. While researching this article I came across several sentiment analyses that others had prepared.
Let me show you what a sentiment analysis is. A good example is the 2020 US election and the perception of the two candidates, Donald Trump and Joe Biden. People analyzed tweets (Twitter being really popular in the US) that mentioned either Donald Trump or Joe Biden and classified each tweet's sentiment as positive, negative, neutral or mixed. Based on this sentiment analysis and the geographical location the tweets were sent from, we could see which state preferred which candidate. It is important to note that the external validity of such an analysis is questionable, as we cannot be sure that either side was represented properly.
To do such an analysis I would suggest using Amazon Comprehend. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text, and it requires no machine learning experience.
Besides the sentiment analysis described above, Amazon Comprehend has the following really useful and interesting features:
- Entity Detection,
- Key Phrases Extraction,
- Language Detection,
- Topic Modelling on Large Document Collection, etc.
- +1 Amazon Comprehend Medical
Amazon Comprehend can analyze a collection of documents and other text files (such as social media posts) and automatically organize them by relevant terms or topics. These topics are valuable because they can be used to provide personalized content to customers, or richer search and navigation.
Amazon Comprehend Medical identifies medical information, such as a health condition or medication, and determines how the items relate to each other (e.g., what kind of medicine is taken and how frequently). It can also provide context to analysts about a detected disease, such as whether a patient has tested positive or negative.
As the picture shows, Amazon Comprehend automatically extracts the key phrases, entities, sentiments, etc., and this data is then used for the analysis. In the Master’s program I am attending we have been using R for data analysis, so I will show you how to use these features in R.
First we need to set up the AWS credentials:
# XYaccessKeys.csv is the CSV downloaded from AWS containing your access and secret keys
keyTable <- read.csv("XYaccessKeys.csv", header = TRUE)
AWS_ACCESS_KEY_ID <- as.character(keyTable$Access.key.ID)
AWS_SECRET_ACCESS_KEY <- as.character(keyTable$Secret.access.key)
# activate the credentials for this R session
Sys.setenv("AWS_ACCESS_KEY_ID" = AWS_ACCESS_KEY_ID,
           "AWS_SECRET_ACCESS_KEY" = AWS_SECRET_ACCESS_KEY,
           "AWS_DEFAULT_REGION" = "eu-west-1")
# load the required package (assumed here to be aws.comprehend, which provides detect_sentiment() and detect_language())
library("aws.comprehend")
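Before calling the service, it can help to confirm that the three environment variables really ended up in the session. The following is only a minimal sketch: `missing_aws_vars()` is a hypothetical helper I made up for illustration, not a function from any AWS package, and it uses nothing beyond base R.

```r
# Hypothetical helper (not part of any AWS package): return the names of the
# required environment variables that are still empty in this R session.
missing_aws_vars <- function(vars = c("AWS_ACCESS_KEY_ID",
                                      "AWS_SECRET_ACCESS_KEY",
                                      "AWS_DEFAULT_REGION")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Report anything that is missing before the first request is sent.
missing <- missing_aws_vars()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
} else {
  message("All AWS environment variables are set.")
}
```

If the helper reports a missing variable, the column names read from the CSV are the first thing I would check.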
Let’s look at some use cases, with R code for several features:
“It is not reprehensible for anyone to sneeze anywhere. Peasants sneeze and so do police superintendents, and sometimes even privy councillors. All men sneeze. Tchervyakov was not in the least confused, he wiped his face with his handkerchief, and like a polite man, looked round to see whether he had disturbed any one by his sneezing.”
To find out whether the text above (a snippet from one of Anton Chekhov’s short stories) is positive, negative or neutral, see the R code below.
detect_sentiment("It is not reprehensible for anyone to sneeze anywhere. Peasants sneeze and so do police superintendents, and sometimes even privy councillors. All men sneeze. Tchervyakov was not in the least confused, he wiped his face with his handkerchief, and like a polite man, looked round to see whether he had disturbed any one by his sneezing.")
In this case we will get the following result:
The result is quite straightforward. The overall sentiment of the text is negative, and the output also gives a score for each of the mixed, negative, neutral and positive labels. In our case the negative score is above 58%.
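To read such a result programmatically, we can pick the label with the highest score. The sketch below works on a placeholder data frame: the column names (`Index`, `Sentiment`, `Mixed`, `Negative`, `Neutral`, `Positive`) are my assumption about what the package returns, the numbers are made up for illustration and are not the actual output for the Chekhov snippet, and `top_sentiment()` is a hypothetical helper.

```r
# Placeholder standing in for a detect_sentiment() result.
# Column names are an assumption; the scores are made-up illustration values.
result <- data.frame(
  Index = 0,
  Sentiment = "NEGATIVE",
  Mixed = 0.05, Negative = 0.60, Neutral = 0.30, Positive = 0.05,
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label with the highest score and that score,
# looking only at the first row of the result.
top_sentiment <- function(res) {
  score_cols <- c("Mixed", "Negative", "Neutral", "Positive")
  scores <- unlist(res[1, score_cols])
  best <- names(scores)[which.max(scores)]
  list(label = best, score = unname(scores[best]))
}

top <- top_sentiment(result)
cat(sprintf("%s (%.0f%%)\n", top$label, 100 * top$score))
```

With a real result, the same helper could be applied to the data frame returned by the call above, provided the column names match.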
We have the following three sentences in Russian, Tamil and Swiss German. Now let’s see whether Amazon Comprehend recognizes the language.
- Минэкономразвития предлагает мобилизовать деньги бизнеса, банков, населения и ЦБ
detect_language("Минэкономразвития предлагает мобилизовать деньги бизнеса, банков, населения и ЦБ")
- கொரோனா வைரஸால் பாதிக்கப்படுவோரின் எண்ணிக்கை இந்தியாவில் குறைந்து கொண்டிருக்கிறது.
detect_language("கொரோனா வைரஸால் பாதிக்கப்படுவோரின் எண்ணிக்கை இந்தியாவில் குறைந்து கொண்டிருக்கிறது.")
- Schwyzerdütsch isch ä Sommelbezeichnig fyr diejenige alemannische Dialekte, wu in dr Schwyz un in Liechtestai gsproche wärre.
detect_language("Schwyzerdütsch isch ä Sommelbezeichnig fyr diejenige alemannische Dialekte, wu in dr Schwyz un in Liechtestai gsproche wärre.")
All in all, Amazon Comprehend successfully recognized all three languages, even Swiss German. Good job! It was least certain about the Swiss German text, while it reported a 100% match for Tamil.
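When a text gets more than one candidate language, comparing the scores by hand becomes tedious. As a rough sketch, the snippet below keeps only the highest-scoring candidate per text. The columns `Index`, `LanguageCode` and `Score` are my assumption about the shape of the result, the values are made up for illustration, and `best_language()` is a hypothetical helper.

```r
# Placeholder standing in for language detection results on two texts.
# Column names are an assumption; the values are made-up illustration values.
candidates <- data.frame(
  Index = c(0, 0, 1),
  LanguageCode = c("de", "nl", "ta"),
  Score = c(0.80, 0.15, 0.99),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the highest-scoring language for each text.
best_language <- function(df) {
  # Sort by text index, then by descending score within each text.
  ordered <- df[order(df$Index, -df$Score), ]
  # The first row of each index is now its top candidate.
  best <- ordered[!duplicated(ordered$Index), ]
  rownames(best) <- NULL
  best
}

best <- best_language(candidates)
print(best)
```

A low top score, as with the Swiss German sentence, is a hint to treat the detected language with some caution.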
I hope you liked the brief summary of the features of Amazon Comprehend with some examples at the end.
Thanks for reading!