The Google NLP API Meets Ruby

At RailsConf I’ll be giving a talk on Natural Language Processing. As part of my preparation for this talk, I’ve been reading a bunch of the history of Natural Language Processing, and I’ve been experimenting with Google’s Natural Language API. The Ruby Gem for this API is alpha, but it works well enough to do basic experimentation.

Syntax Analysis

One of the things that intrigue me most about NLP is syntax analysis. Doing static analysis on English is tricky. Even determining the part of speech can be hard. For example, in the sentence “I’m leaving work,” the word “work” is a noun: I’m leaving a physical place. But in the sentence “I work on my talk,” “work” is a verb. Words such as “very” can be either adverbs or adjectives depending on where they are placed in a sentence.

Despite these challenges, knowing the part of speech can be useful. At last week’s Seattle Ruby Brigade meeting we worked on a text generator that we’ll eventually use for a chat bot. We used simple Markov chains for our bots, but sometimes that resulted in grammatically incorrect sentences. If we had been able to ensure that each sentence had a verb and a subject, the generated text might have been better.
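For readers who haven’t seen the Markov chain approach mentioned above, here is a minimal sketch of the idea. It is not the code we wrote at the meeting; the method names `build_chain` and `generate` and the tiny corpus are made up for illustration. Each word maps to the words that followed it in the training text, and generation is a random walk over that map.

```ruby
# Build a first-order chain: each word maps to the words seen right after it.
def build_chain(text)
  chain = Hash.new { |hash, key| hash[key] = [] }
  text.split.each_cons(2) { |current, following| chain[current] << following }
  chain
end

# Walk the chain from a start word, picking a random follower at each step.
# Stops early when the current word has no known followers.
def generate(chain, start, max_words: 8, rng: Random.new)
  words = [start]
  while words.size < max_words
    followers = chain.fetch(words.last, [])
    break if followers.empty?
    words << followers.sample(random: rng)
  end
  words.join(" ")
end

# Made-up corpus, just large enough to show branching.
corpus = "the cat plays with the toy and the cat sleeps"
chain = build_chain(corpus)
puts generate(chain, "the")
```

Because the walk only looks at the previous word, nothing stops it from producing a fragment with no verb, which is the problem described above.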

The Natural Language API breaks input text into tokens (words and punctuation) and then provides information about each token. Here’s some basic code that uses the Natural Language API to identify the part of speech of each word in the input.

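require "google/cloud/language"

# Create a client for the Natural Language API.
language = Google::Cloud::Language.new

# The text to analyze is the first command-line argument.
content = ARGV[0]

# Wrap the text in a document and request syntax analysis.
document = language.document content
syntax = document.syntax

# Print each token's text and its part-of-speech tag.
syntax.tokens.each do |token|
  puts "Word: #{token.text_span.text} #{token.part_of_speech.tag}"
end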

I ran this code against the sentence “The cat plays.” and got this output.

Word: The DET
Word: cat NOUN
Word: plays VERB
Word: . PUNCT

The API documentation includes an enum that maps these tags to labels we are more familiar with. Running the code against a slightly longer sentence, “The cat plays with the toy.” gives me this.

Word: The DET
Word: cat NOUN
Word: plays VERB
Word: with ADP
Word: the DET
Word: toy NOUN
Word: . PUNCT

In both examples, the API identifies “cat” as a noun and “plays” as a verb. “The” is identified as a determiner; you may know this as an article. In the longer sentence, “with” is identified as an “Adposition (preposition and postposition)”.
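If you’d rather print readable names than raw tags, a small lookup table is enough. This is a sketch, not part of the gem: `TAG_NAMES` and `friendly_tag` are names I made up, and the table only covers the five tags that appear in the outputs above.

```ruby
# Partial lookup table covering only the tags seen in the example outputs.
# The API's enum defines more tags than these; they are omitted here.
TAG_NAMES = {
  "DET"   => "Determiner",
  "NOUN"  => "Noun",
  "VERB"  => "Verb",
  "ADP"   => "Adposition (preposition and postposition)",
  "PUNCT" => "Punctuation"
}.freeze

# Hypothetical helper: returns a readable name, or the raw tag if unknown.
# to_s lets it accept either a String or a Symbol.
def friendly_tag(tag)
  TAG_NAMES.fetch(tag.to_s, tag.to_s)
end

# Word/tag pairs copied by hand from the longer example output.
tagged = [["The", "DET"], ["cat", "NOUN"], ["plays", "VERB"], ["with", "ADP"],
          ["the", "DET"], ["toy", "NOUN"], [".", "PUNCT"]]

tagged.each do |word, tag|
  puts "#{word}: #{friendly_tag(tag)}"
end
```

Falling back to the raw tag means an unfamiliar tag still prints something useful instead of raising an error.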

The NLP API can also identify the role that a specific word is playing in a sentence by using the “label” property of the token.

require "google/cloud/language"

language = Google::Cloud::Language.new

content = ARGV[0]

document = language.document content
syntax = document.syntax

syntax.tokens.each do |token|
  puts "Word: #{token.text_span.text} #{token.label}"
end

And here are the results of running the longer sentence from above through the API.

Word: The DET
Word: cat NSUBJ
Word: plays ROOT
Word: with PREP
Word: the DET
Word: toy POBJ
Word: . P

Cat is identified as the subject, plays as the root of the sentence, and “with the toy” as a prepositional phrase (preposition, determiner, prepositional object).
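Tags and labels together are enough for the check I wished we had for the chat bot: does a sentence have a subject and a verb? Here is a rough sketch that works on plain data rather than calling the API. The `Token` struct and `subject_and_verb?` are hypothetical names, the first sentence is copied by hand from the two outputs above, and the tags and labels on the fragment are my guess at what the API would return, not real output.

```ruby
# A plain stand-in for an API token: text, part-of-speech tag, dependency label.
Token = Struct.new(:text, :tag, :label)

# True when at least one token is labeled as a subject and one is tagged as a verb.
def subject_and_verb?(tokens)
  has_subject = tokens.any? { |t| t.label == "NSUBJ" }
  has_verb    = tokens.any? { |t| t.tag == "VERB" }
  has_subject && has_verb
end

# Copied by hand from the part-of-speech and label outputs above.
sentence = [
  Token.new("The",   "DET",   "DET"),
  Token.new("cat",   "NOUN",  "NSUBJ"),
  Token.new("plays", "VERB",  "ROOT"),
  Token.new("with",  "ADP",   "PREP"),
  Token.new("the",   "DET",   "DET"),
  Token.new("toy",   "NOUN",  "POBJ"),
  Token.new(".",     "PUNCT", "P")
]

# Assumed values for the fragment "The toy." (not real API output).
fragment = [
  Token.new("The", "DET",   "DET"),
  Token.new("toy", "NOUN",  "ROOT"),
  Token.new(".",   "PUNCT", "P")
]

puts subject_and_verb?(sentence) # prints true
puts subject_and_verb?(fragment) # prints false
```

A generator could run a check like this on each candidate sentence and throw away the ones that fail.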

I’ve been enjoying playing around with the API just to learn it and see where the edges are. Here’s a diagram I made using the graph gem and the information returned from the syntax analysis call.

Diagram of the sentence The cat plays with the toy.
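A diagram like this is a graph with one edge from each word to the word it depends on. The sketch below is not the code behind the image: it skips the graph gem and prints Graphviz DOT text directly. The head index for each word is written in by hand, following the structure described above, rather than read from the API, and `to_dot` is a made-up name.

```ruby
# [word, label, index of the head word]. Head indices are hand-written
# assumptions; the root points at itself.
TOKENS = [
  ["The",   "DET",   1],
  ["cat",   "NSUBJ", 2],
  ["plays", "ROOT",  2],
  ["with",  "PREP",  2],
  ["the",   "DET",   5],
  ["toy",   "POBJ",  3],
  [".",     "P",     2]
].freeze

# Build DOT source: one node per token, one labeled edge from head to dependent.
def to_dot(tokens)
  lines = ["digraph sentence {"]
  tokens.each_with_index do |(word, _label, _head), i|
    lines << %(  n#{i} [label="#{word}"];)
  end
  tokens.each_with_index do |(_word, label, head), i|
    next if head == i # skip the root's self-reference
    lines << %(  n#{head} -> n#{i} [label="#{label}"];)
  end
  lines << "}"
  lines.join("\n")
end

puts to_dot(TOKENS)
```

Saving the output to a file and running it through Graphviz’s `dot` command should give a tree rooted at “plays”.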

In my RailsConf talk, I’ll show other methods of sentence diagramming and go into more detail about what all these grammar terms mean for those who have forgotten middle school grammar.

Sentiment Analysis

At RubyConf in San Antonio, I gave a talk entitled Stupid Ideas for Many Computers. In that talk I did very hacky sentiment analysis on tweets by assigning values to various emoji, extracting the emoji from tweets, and adding the whole thing up. It was an incredibly stupid idea, but that was the purpose of the talk.

I’ll be reprising this code at RailsConf, but this time I’ll be using proper sentiment analysis. The code is similar to the syntax analysis code above.

require "google/cloud/language"

language = Google::Cloud::Language.new

content = ARGV[0]

document = language.document content
sentiment = document.sentiment

puts sentiment.score
puts sentiment.magnitude

Score is a number between -1 (negative sentiment) and 1 (positive sentiment). The magnitude is a measure of “how much” the message was negative or positive. I ran some tweets through the sentiment analyzer.

I cherry-picked this one because I was confident the sentiment would be positive. I got a 0.7 sentiment score and a magnitude of 1.5. So “a lot of pretty darn positive” in rough English. I also tried the tweet that the Seattle Ruby Brigade sends out to remind us about meetings.

The sentiment for this was 0.1, so almost neutral. And the magnitude was 1.3. Together that is approximately “pretty strongly neutral”.
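My “rough English” readings can be turned into code. This is only a sketch: the cutoffs (0.25 for direction, 1.0 for strength) are numbers I picked for illustration and are not defined by the API, and `describe_sentiment` is a made-up name.

```ruby
# Turn a score and magnitude into a rough English phrase.
# The thresholds are illustrative assumptions, not values defined by the API.
def describe_sentiment(score, magnitude)
  direction =
    if score >= 0.25
      "positive"
    elsif score <= -0.25
      "negative"
    else
      "neutral"
    end
  strength = magnitude >= 1.0 ? "strongly" : "mildly"
  "#{strength} #{direction}"
end

# The two score/magnitude pairs reported above.
puts describe_sentiment(0.7, 1.5) # prints "strongly positive"
puts describe_sentiment(0.1, 1.3) # prints "strongly neutral"
```

Keeping the two numbers separate matters: a score near zero with a high magnitude is not the same thing as a score near zero with a low one.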

Conclusion

If you liked these examples, I encourage you to try out the Cloud Natural Language API library and experiment with all the different types of analysis it supports. If you are at RailsConf, you can see more examples in my talk or stop by the Google Cloud booth to try it out in a codelab.