Natural Language Processing: Smarter conversations using Spacy

Patrick Rotzetter

In the previous six articles we illustrated the use of the Google and AWS NLP APIs. We also experimented with the spaCy library ( https://spacy.io/ ) to extract entities and nouns from different documents, and showed how to improve the model using spaCy's pattern matching functions. Finally, we trained the model with new entities and demonstrated how to match CVs to job profiles.

Let us now dig a bit deeper into some linguistic features of spaCy and how they can be used to improve virtual conversations. The same techniques can be used for mail processing, more advanced chatbots or virtual assistants. They can also serve as an underlying technique for voice assistants.

Let us say we are an online shop for personal computers and we would like to allow our customers to send us requests to order computers. Such requests can arrive through a chatbot on the site or by mail.

Let us assume we receive the following input from a potential client:

Hello
I would like to order a notebook with 16GB and 256 GB disk, I would like to spend less than 1000 Francs, what would be the options
Thanks a lot
Patrick

As we have shown in earlier articles, let us import the required Python libraries and process the text through the spaCy pipeline. Nothing new, but good to repeat:

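# import required libraries
import spacy
from spacy import displacy
from spacy.matcher import PhraseMatcher
from spacy.pipeline import EntityRuler  # assumed: the import list is cut off after "spacy.pipeline"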

Once this is all done, let us start with named entity recognition as usual.

# print text entities detected
print([(ent.text, ent.label_) for ent in docMail.ents])

We can also visualize the result directly in the text with highlighted entities.

Named entities using Spacy NER

The default model does not seem to detect ‘notebook’ and ‘disk’ as entities, but it identifies the sender as a person and the RAM and disk sizes as quantities. This is a good start, but still far from a practical solution. So let us add some domain-specific entities that will help us later on.

# add domain specific entities and add to the pipeline
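
One way to add such domain-specific entities is spaCy's rule-based EntityRuler. The sketch below is an assumption and not necessarily the code used for this article: the labels PRODUCT and COMPONENT and the four patterns are purely illustrative, and a blank English pipeline is used so that no trained model has to be downloaded.

```python
import spacy
from spacy.pipeline import EntityRuler

# A blank English pipeline: tokenizer only, no trained NER component.
nlp = spacy.blank("en")

# Hypothetical labels and token patterns for the computer-shop domain.
patterns = [
    {"label": "PRODUCT", "pattern": [{"LOWER": "notebook"}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": "laptop"}]},
    {"label": "COMPONENT", "pattern": [{"LOWER": "disk"}]},
    {"label": "COMPONENT", "pattern": [{"LOWER": "ram"}]},
]

ruler = EntityRuler(nlp)
ruler.add_patterns(patterns)

# Apply the ruler directly to a tokenized document.
doc = ruler(nlp.make_doc("I would like to order a notebook with 16GB and 256 GB disk"))
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Here the ruler is called directly on the document. In a full pipeline it would instead be registered as a component, with nlp.add_pipe(ruler) in spaCy 2.x or nlp.add_pipe("entity_ruler") in spaCy 3.x, so that it runs together with the statistical entity recognizer.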

Now the results look a bit better:

# process the mail again with added entities
docMail = nlp(text)

Sometimes it is not enough to match only entities; for example, we have defined the RAM size as 16 GB. So let us see how to detect the memory size automatically:

matcher = PhraseMatcher(nlp.vocab)
terms = ["16 GB", "256 GB"]

Quite cool: it detected the patterns and matched the text related to memory size. Unfortunately, we do not know what each match refers to, so we need to start a different kind of analysis.
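
A fixed list of terms only finds sizes that were written down in advance. As a complement, a plain regular expression can pick up any number followed by a storage unit. This is a minimal sketch that uses Python's re module instead of spaCy's matchers; the names SIZE_PATTERN and find_sizes are illustrative and not part of any library.

```python
import re

# An integer, an optional space, then a size unit, e.g. "16GB" or "256 GB".
SIZE_PATTERN = re.compile(r"\b(\d+)\s?(MB|GB|TB)\b", re.IGNORECASE)


def find_sizes(text):
    """Return (amount, unit) pairs for every size mention found in the text."""
    return [(int(amount), unit.upper()) for amount, unit in SIZE_PATTERN.findall(text)]


mail = "I would like to order a notebook with 16GB and 256 GB disk"
sizes = find_sizes(mail)
print(sizes)
```

Like the phrase matcher, this only finds the quantities. It still says nothing about whether a size belongs to the memory or to the disk, which is why the dependency analysis is needed.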

One of the key strengths of spaCy is its set of linguistic and predictive features. Indeed, spaCy is able to predict which tag or label most likely applies in a specific context.

Let us start by displaying the result of part-of-speech tagging and dependency analysis. As we can see below, the code is pretty simple:

displacy.render(docMail, style="dep", minify=True)

spaCy dependency tree

The result is quite impressive: it shows the predicted tag for each word and the dependency tree with the associated dependency labels. For example, ‘I’ is a pronoun and is the subject of the verb ‘like’.

Let us detect the numerical modifiers, as we will need them to identify the required memory size:

for token in docMail:
    if token.dep_ == 'nummod':
        print(f"Numerical modifier: {token.text} --> object: {token.head}")

This is again quite cool: we can associate quantities with different words in the text.

spaCy provides all the tagging required to find the action verbs. We want to know, for example, whether the customer wants to order something or is just interested in some information. Let us iterate through all tokens in the text and search for an open clausal complement (see https://spacy.io/api/annotation#pos-tagging for all possible dependency tags):

verbs = set()
for possible_verb in docMail:
    # compare the string labels (dep_, pos_) so that no symbol imports are needed
    if possible_verb.dep_ == "xcomp" and possible_verb.head.pos_ == "VERB":
        verbs.add(possible_verb)

We have now identified ‘spend’ and ‘order’ as possible actions in the text. We can do the same to find the objects or items in the text that the client refers to.

Let us find possible items in the text using the dependency tag ‘dobj’, which marks the direct object of a verb:

items = set()
for possible_item in docMail:
    # a direct object whose head is a verb is a candidate item
    if possible_item.dep_ == "dobj" and possible_item.head.pos_ == "VERB":
        items.add(possible_item)

‘Francs’ and ‘notebook’ have been found. Now we can think of using word similarities to find what kind of item the client is referring to. We could also use other techniques, but let us try a simple way for now. We will compare the similarity between each identified object and the word ‘laptop’. The word ‘notebook’ is much closer to ‘laptop’ than ‘Francs’ is.

orderobject = nlp("laptop")
for sub in items:
    print(sub.similarity(orderobject))
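
By default, spaCy's similarity is the cosine similarity between word vectors. To make the number more concrete, here is a small self-contained sketch of that computation. The three-dimensional vectors are invented for illustration only; real word vectors have hundreds of dimensions and come from the loaded model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, NOT taken from any spaCy model.
toy_vectors = {
    "laptop": [0.9, 0.8, 0.1],
    "notebook": [0.8, 0.9, 0.2],
    "Francs": [0.1, 0.2, 0.9],
}

# Score every other word against "laptop".
scores = {
    word: cosine_similarity(vector, toy_vectors["laptop"])
    for word, vector in toy_vectors.items()
    if word != "laptop"
}
for word, score in scores.items():
    print(f"{word}: {score:.2f}")
```

Vectors that point in nearly the same direction score close to 1, unrelated ones score much lower. With spaCy itself, meaningful scores require a model that ships word vectors, such as the medium or large English models.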

Finally, putting it all together, we can think of automatically detecting the required action verb using a heuristic. Let us assume that if the similarity is at least 80%, then we have found the right verb. We then search for the direct object of that verb. That could look like this:

orderword = nlp("order")
for verb in verbs:
    # keep only verbs that are close enough to "order"
    if verb.similarity(orderword) >= 0.8:
        for v in verb.children:
            if v.dep_ == "dobj":
                print(v.text)

For this experiment we used the following:

  • Google Colab executing a Python 3 notebook
  • Python 3.6.9
  • spaCy 2.2.4

Source: https://chatbotslife.com/natural-language-processing-smarter-conversations-using-spacy-c725e810695
