Natural Language Processing: Smarter Conversations Using Spacy

In the previous six articles, we illustrated the use of the Google and AWS NLP APIs. We also experimented with the spaCy library to extract entities and nouns from different documents, and showed how to improve the model using spaCy's pattern-matching functions (https://spacy.io/). Finally, we trained the model with new entities and demonstrated how to match CVs to job profiles.

Let us now dig a bit deeper into some of spaCy's linguistic features and how they can be used to improve virtual conversations. The same techniques apply to mail processing, more advanced chatbots and virtual assistants, and can also serve as an underlying technique for voice assistants.

Let us say we run an online shop for personal computers and want to let our customers send us order requests, either through a chatbot on our site or by mail.

Let us assume we receive the following input from a potential client:

Hello
I would like to order a notebook with 16GB and 256 GB disk, I would like to spend less than 1000 Francs, what would be the options
Thanks a lot
Patrick

As we have shown in earlier articles, let us import the required Python libraries and process the text through the spaCy pipeline. Nothing new, but good to repeat:

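# import required libraries
import spacy
from spacy.pipeline import EntityRuler
from spacy.matcher import PhraseMatcher
from spacy.symbols import VERB, dobj, xcomp
from spacy import displacy

# install the large pre-trained English model directly from spaCy
# (the leading "!" runs a shell command in Colab/Jupyter)
!python -m spacy download en_core_web_lg

# load the model and process the customer mail through the standard spaCy pipeline
nlp = spacy.load("en_core_web_lg")
text = """Hello
I would like to order a notebook with 16GB and 256 GB disk, I would like to spend less than 1000 Francs, what would be the options
Thanks a lot
Patrick"""
docMail = nlp(text)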

Once this is all done, let us start with named entity recognition as usual.

# print the entities detected in the text
for ent in docMail.ents:
    print(ent.text, ent.label_)

Output:
16GB QUANTITY
256 GB QUANTITY
less than 1000 Francs MONEY
Patrick PERSON

We can also visualize the result directly in the text, with the entities highlighted, using displacy.render(docMail, style="ent").

Named entities using Spacy NER

The default model does not detect ‘notebook’ or ‘disk’ as entities. It does identify the sender as a person, the RAM and disk sizes as quantities, and the budget as money. This is a good start, but it is still far from a practical solution, so let us add some domain-specific entities that will help us later on.

# add domain-specific entity patterns and add the ruler to the pipeline
# (spaCy v2 API; in v3: ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True}), then ruler.add_patterns(patterns))
patterns = [
    {"label": "COMPUTER", "pattern": [{"lower": "notebook"}]},
    {"label": "CURRENCY", "pattern": [{"lower": "francs"}]},
    {"label": "PART", "pattern": [{"lower": "disk"}]},
]
ruler = EntityRuler(nlp, patterns=patterns, overwrite_ents=True)
nlp.add_pipe(ruler)

Now the results look a bit better

# process the mail again with the added entities
docMail = nlp(text)
for ent in docMail.ents:
    # print the entity text and its label
    print(ent.text, ent.label_)

Output:
notebook COMPUTER
16GB QUANTITY
256 GB QUANTITY
disk PART
Francs CURRENCY
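The second run no longer lists ‘less than 1000 Francs’ as MONEY. This is the effect of overwrite_ents=True: a rule match replaces any entity from the statistical model that it overlaps. The toy model below is a simplified sketch, not spaCy's actual EntityRuler code. It only handles single-token, case-insensitive patterns, and the names apply_ruler, overlaps and the sample tokens are illustrative. It shows both settings side by side.

```python
from typing import Dict, List, Tuple

# (start token, end token exclusive, label), loosely modelled on a spaCy Span
Ent = Tuple[int, int, str]


def overlaps(a: Ent, b: Ent) -> bool:
    """True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def apply_ruler(tokens: List[str], ents: List[Ent], patterns: Dict[str, str],
                overwrite: bool = True) -> List[Ent]:
    """Toy entity ruler with single-token, case-insensitive patterns.

    overwrite=True: a rule match removes any existing entity it overlaps.
    overwrite=False: the existing entity wins and the overlapping match is dropped.
    """
    matches = [(i, i + 1, patterns[tok.lower()])
               for i, tok in enumerate(tokens) if tok.lower() in patterns]
    if overwrite:
        kept = [e for e in ents if not any(overlaps(e, m) for m in matches)]
        return sorted(kept + matches)
    added = [m for m in matches if not any(overlaps(m, e) for e in ents)]
    return sorted(ents + added)


tokens = ["order", "a", "notebook", "for", "less", "than", "1000", "Francs"]
model_ents = [(4, 8, "MONEY")]  # hypothetical entity from the statistical NER
rules = {"notebook": "COMPUTER", "francs": "CURRENCY"}
print(apply_ruler(tokens, model_ents, rules))
# [(2, 3, 'COMPUTER'), (7, 8, 'CURRENCY')]
print(apply_ruler(tokens, model_ents, rules, overwrite=False))
# [(2, 3, 'COMPUTER'), (4, 8, 'MONEY')]
```

With overwriting switched off, the broader MONEY span survives and the CURRENCY rule is silently skipped. Which behaviour is preferable depends on whether the domain rules should be trusted more than the statistical model.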

Sometimes matching entities is not enough. For example, the customer asked for 16 GB of RAM, but the model only labels it as a generic QUANTITY. So let us see how to detect memory sizes automatically with a PhraseMatcher:

matcher = PhraseMatcher(nlp.vocab)
terms = ["16 GB", "256 GB"]
# only run nlp.make_doc (the tokenizer) to speed things up
patterns = [nlp.make_doc(t) for t in terms]
# spaCy v2 signature; in v3 this becomes matcher.add("MEMORY", patterns)
matcher.add("MEMORY", None, *patterns)
doc = nlp(text)
matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]
    print(span.text)

Output:
16GB
256 GB

Quite cool: the matcher found both memory sizes. It also caught ‘16GB’, which matches the pattern ‘16 GB’ because the tokenizer splits it into the tokens ‘16’ and ‘GB’. Unfortunately, we still do not know what each size refers to, so we need a different kind of analysis.
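The phrase matcher only finds the exact sizes listed in terms, so a request for ‘32 GB’ would slip through. A more general option is sketched here with Python's standard re module rather than spaCy. It assumes a size is written as an integer followed by GB or TB, with or without a space. SIZE_PATTERN and find_sizes are illustrative names. Inside spaCy, a token-based Matcher pattern such as [{"LIKE_NUM": True}, {"LOWER": "gb"}] would be the closer equivalent.

```python
import re
from typing import List, Tuple

# Assumption: a size is an integer followed by GB or TB, optionally separated by one space.
SIZE_PATTERN = re.compile(r"\b(\d+)\s?(GB|TB)\b", re.IGNORECASE)


def find_sizes(text: str) -> List[Tuple[int, str]]:
    """Return (amount, unit) pairs for every size mention, with the unit upper-cased."""
    return [(int(amount), unit.upper()) for amount, unit in SIZE_PATTERN.findall(text)]


mail = ("I would like to order a notebook with 16GB and 256 GB disk, "
        "I would like to spend less than 1000 Francs")
print(find_sizes(mail))  # [(16, 'GB'), (256, 'GB')]
```

Like the phrase matcher, this only finds the sizes. It still says nothing about whether a size refers to RAM or to the disk.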

One of spaCy's key strengths is its linguistic annotation, which is predictive: spaCy's statistical models predict which tag or label most likely applies in a given context.

Let us start by displaying the results of part-of-speech tagging and dependency parsing. As we can see below, the code is pretty simple:

displacy.render(docMail, style="dep", minify=True)

spaCy dependency tree

The result is quite impressive: it shows the predicted part-of-speech tag for each word and the dependency tree with its dependency labels. For example, ‘I’ is tagged as a pronoun and is the subject (nsubj) of the verb ‘like’.

Let us detect the numerical modifiers (dependency label nummod), as we will need them to identify the required memory size:

for token in docMail:
    if token.dep_ == "nummod":
        print(f"Numerical modifier: {token.text} --> object: {token.head}")

Output:
Numerical modifier: 16 --> object: GB
Numerical modifier: 256 --> object: disk
Numerical modifier: 1000 --> object: Francs

Again, this is quite useful: each number is now linked to the word it modifies.
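The nummod relation alone only links ‘16’ to ‘GB’, which still does not say which item needs 16 GB. Because every token points to its head, one option is to keep walking up the tree until we reach a known item. The sketch below uses a hand-written parse of a short phrase (illustrative, not spaCy's output) and hypothetical names (Tok, ancestors, attach_quantity). In spaCy itself, the same walk is available through token.ancestors, and, as in the sketch, the root token is its own head.

```python
from typing import Iterator, List, NamedTuple, Optional, Set


class Tok(NamedTuple):
    text: str
    dep: str
    head: int  # index of the head token; the root points to itself, as in spaCy


def ancestors(tokens: List[Tok], i: int) -> Iterator[Tok]:
    """Yield the heads of token i, one level at a time, up to the root."""
    while tokens[i].head != i:
        i = tokens[i].head
        yield tokens[i]


def attach_quantity(tokens: List[Tok], i: int, items: Set[str]) -> Optional[str]:
    """Return the nearest ancestor of token i whose lowercase text is a known item."""
    for anc in ancestors(tokens, i):
        if anc.text.lower() in items:
            return anc.text
    return None


# Hand-written parse of "order a notebook with 16 GB" (illustrative only)
parse = [
    Tok("order", "ROOT", 0),
    Tok("a", "det", 2),
    Tok("notebook", "dobj", 0),
    Tok("with", "prep", 2),
    Tok("16", "nummod", 5),
    Tok("GB", "pobj", 3),
]
print(attach_quantity(parse, 4, {"notebook", "disk"}))  # notebook
```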

spaCy also provides the tagging needed to find action verbs. For example, we want to know whether the customer wants to order something or is just asking for information. Let us iterate through all tokens in the text and look for open clausal complements (dependency label xcomp) whose head is a verb. The full list of dependency labels is at https://spacy.io/api/annotation#pos-tagging.

verbs = set()
for possible_verb in docMail:
    # open clausal complement (xcomp) whose head is a verb, e.g. "order" in "would like to order"
    if possible_verb.dep == xcomp and possible_verb.head.pos == VERB:
        verbs.add(possible_verb)
print(verbs)

Output:
{spend, order}

We have now identified ‘spend’ and ‘order’ as possible actions in the text. We can do the same to find the objects or items the client refers to.

Let us find possible items in the text using the dependency tag ‘dobj’ for direct objects of a verb.

items = set()
for possible_item in docMail:
    # direct object (dobj) of a verb
    if possible_item.dep == dobj and possible_item.head.pos == VERB:
        items.add(possible_item)
print(items)

Output:
{Francs, notebook}
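The two loops above collect actions and objects separately. Pairing them, so that each object is attached to the verb that governs it (for example ‘notebook’ as the direct object of ‘order’), tells the chatbot what is being ordered. Here is a minimal, spaCy-free sketch of that pairing over a hand-written parse of ‘I would like to order a notebook’. The tags and heads are illustrative, not spaCy output, and Tok and actions_with_objects are hypothetical names. In spaCy, the object lookup corresponds to checking child.dep == dobj over verb.children.

```python
from typing import Dict, List, NamedTuple


class Tok(NamedTuple):
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the head token; the root points to itself


def actions_with_objects(tokens: List[Tok]) -> Dict[str, List[str]]:
    """Map every xcomp verb whose head is a verb to the direct objects it governs."""
    actions = [i for i, t in enumerate(tokens)
               if t.dep == "xcomp" and tokens[t.head].pos == "VERB"]
    result: Dict[str, List[str]] = {tokens[i].text: [] for i in actions}
    for t in tokens:
        if t.dep == "dobj" and t.head in actions:
            result[tokens[t.head].text].append(t.text)
    return result


# Hand-written parse of "I would like to order a notebook" (illustrative only)
parse = [
    Tok("I", "PRON", "nsubj", 2),
    Tok("would", "AUX", "aux", 2),
    Tok("like", "VERB", "ROOT", 2),
    Tok("to", "PART", "aux", 4),
    Tok("order", "VERB", "xcomp", 2),
    Tok("a", "DET", "det", 6),
    Tok("notebook", "NOUN", "dobj", 4),
]
print(actions_with_objects(parse))  # {'order': ['notebook']}
```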

‘Francs’ and ‘notebook’ have been found. Now we can use word similarity to work out what kind of item the client is referring to. Other techniques would also work, but let us try a simple approach for now: we compare the similarity between each identified object and the word ‘laptop’. As expected, ‘notebook’ is much closer to ‘laptop’ than ‘Francs’ is.

orderobject = nlp("laptop")
for sub in items:
    print(sub.similarity(orderobject))

Output:
0.0015887124852857469
0.8021939809276627
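For tokens and documents with word vectors, spaCy's similarity() is by default the cosine similarity of their vectors, and a Doc's vector defaults to the average of its token vectors. This is also why the example loads en_core_web_lg: the small English models ship without static word vectors. The sketch below computes cosine similarity in plain Python on made-up three-dimensional vectors. The real vectors in en_core_web_lg have 300 dimensions, and the values here are purely illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up vectors, purely for illustration
laptop = [0.9, 0.1, 0.0]
notebook = [0.8, 0.3, 0.1]
francs = [0.0, 0.1, 0.9]
print(round(cosine_similarity(notebook, laptop), 2))  # 0.96
print(round(cosine_similarity(francs, laptop), 2))    # 0.01
```

Cosine similarity ignores vector length and only compares direction, so words used in similar contexts score close to 1 regardless of how frequent they are.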

Finally, putting it all together, we can detect the requested action automatically with a simple heuristic. If a verb's similarity to the word ‘order’ is at least 0.8, we assume we have found the ordering verb, and we then look for that verb's direct object. That could look like this:

orderword = nlp("order")
for verb in verbs:
    if verb.similarity(orderword) >= 0.8:
        for child in verb.children:
            if child.dep == dobj:
                print(child.text)

Output:
notebook
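As written, the loop prints the objects of every verb that passes the 0.8 threshold, so two similar verbs could both trigger an order. One possible refinement keeps only the best-scoring verb and returns None when nothing reaches the threshold. This gives the chatbot a clear point to fall back to, for example by asking a clarifying question. It is sketched here in plain Python, with hypothetical scores and hypothetical helper names (select_action, objects_for).

```python
from typing import Dict, List, Optional


def select_action(scores: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the candidate verb with the highest score, or None if none reaches the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def objects_for(action: Optional[str], dobjs: Dict[str, List[str]]) -> List[str]:
    """Look up the direct objects of the selected verb; empty if no action was recognised."""
    if action is None:
        return []
    return dobjs.get(action, [])


# Hypothetical similarity scores against "order" and a verb -> direct-object mapping
scores = {"order": 1.0, "spend": 0.4}
dobjs = {"order": ["notebook"], "spend": ["Francs"]}
action = select_action(scores)
print(action, objects_for(action, dobjs))  # order ['notebook']
print(select_action({"spend": 0.4}))      # None
```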

For this experiment we used the following:

Google Colab executing a Python 3 notebook
Python 3.6.9
Spacy 2.2.4


Source: https://chatbotslife.com/natural-language-processing-smarter-conversations-using-spacy-c725e810695
