
Jina.ai with Rasa


For the Jina.ai search in the Story Chatbot to answer a search query, we need to do the following:

  • Create a Jina project using cookiecutter
  • Download the data from Kaggle
  • Index the data using the Jina CLI
  • Start the Jina search process, which uses the indexed data
Figure: jina-box search widget

Create a Jina project using cookiecutter

Create a project folder called story_project.

mkdir story_project
cd story_project

Create and activate a new virtual environment for Jina.

python3 -m venv env/search_story_env
source env/search_story_env/bin/activate


Install cookiecutter and generate the project from the Jina template, entering the following values when prompted:

pip install -U cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina

project_name: Search Stories (non-default)
jina_version: 0.5.5
project_slug: search_stories
task_type: nlp (non-default)
index_type: strings (non-default)
public_port: 65482
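If you prefer to skip the interactive prompts, cookiecutter's Python API accepts the same answers through extra_context together with no_input=True. This sketch assumes the prompt names above are the template's variable names and that nlp and strings are valid choices for task_type and index_type; generate_project is a hypothetical helper, not part of the generated project.

```python
from typing import Dict

# Template and answers taken from the prompt values listed above.
TEMPLATE = "gh:jina-ai/cookiecutter-jina"
EXTRA_CONTEXT: Dict[str, str] = {
    "project_name": "Search Stories",
    "jina_version": "0.5.5",
    "project_slug": "search_stories",
    "task_type": "nlp",
    "index_type": "strings",
    "public_port": "65482",
}


def generate_project(output_dir: str = ".") -> str:
    """Render the template without prompting; returns the generated project path."""
    # Imported lazily so this module loads even if cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE,
        no_input=True,                # do not prompt the user
        extra_context=EXTRA_CONTEXT,  # use these answers instead of the defaults
        output_dir=output_dir,
    )
```

Calling generate_project() should then create the project folder in the current directory; it needs network access to fetch the template from GitHub.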

Navigate to the generated project folder (named after project_slug).

cd search_stories

Add kaggle to the auto-generated requirements.txt. The Jina dependencies are already listed there, and because nlp was selected as the task type during the cookiecutter prompts, the project is also pre-populated with the torch and transformers dependencies.

Install the requirements.

pip install -r requirements.txt

Download story data from Kaggle

kaggle datasets download -d shubchat/1002-short-stories-from-project-guttenberg

Set up your Kaggle API credentials file, kaggle.json (by default at ~/.kaggle/kaggle.json), for the above command to work.

The stories originally come from Project Gutenberg (https://www.gutenberg.org/).
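The Kaggle CLI reads its API token from kaggle.json and warns if the file is readable by other users. The sketch below, with hypothetical helper names, checks that the file holds a JSON object with a username and a key, and writes it with owner-only permissions (chmod 600). The username and key themselves come from your Kaggle account settings.

```python
import json
import os
import stat
from pathlib import Path

# Default location where the Kaggle CLI looks for credentials.
DEFAULT_KAGGLE_JSON = Path.home() / ".kaggle" / "kaggle.json"


def credentials_are_valid(raw: str) -> bool:
    """True if raw is a JSON object with non-empty "username" and "key" fields."""
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and bool(data.get("username")) and bool(data.get("key"))


def write_credentials(username: str, key: str, path: Path = DEFAULT_KAGGLE_JSON) -> Path:
    """Write kaggle.json and restrict it to the owner (read/write only)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return path
```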

Index the data using Jina cli

Index the story data with Jina using the commands below.

Optionally, set the maximum number of documents to index (default is 500):
export MAX_DOCS=1100
python app.py index
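For illustration only, here is one way a script can honour MAX_DOCS with a default of 500. The function names are hypothetical and this is not necessarily how the generated app.py does it. The variable has to be exported under the name MAX_DOCS, without a leading $, for a child process such as python app.py to see it.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, Mapping, Optional

DEFAULT_MAX_DOCS = 500  # default stated in this tutorial


def read_max_docs(env: Optional[Mapping[str, str]] = None) -> int:
    """Read MAX_DOCS from the environment, falling back to the default."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("MAX_DOCS", ""))
    except ValueError:  # unset or not an integer
        return DEFAULT_MAX_DOCS
    return value if value > 0 else DEFAULT_MAX_DOCS


def limit_docs(docs: Iterable[str], max_docs: int) -> Iterator[str]:
    """Yield at most max_docs documents without reading the rest."""
    return islice(docs, max_docs)
```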

Running python app.py index invokes code in app.py that loads the flows/index.yml file.

Figure: loading index.yml in app.py
Figure: index.yml

Flows are Jina's high-level abstraction for tasks such as indexing and searching.

Built-in executors are part of jina-hub and provide a flexible way to add algorithms. A Flow can contain many executors, either from jina-hub or your own custom executors. This is similar to Rasa pipelines, which make it easy to select algorithms.
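To make the idea concrete, the toy class below chains executors so that each one's output feeds the next, much like the components of a Rasa pipeline. ToyFlow is hypothetical and is not Jina's Flow API; a real Flow is configured in YAML and does considerably more.

```python
from typing import Any, Callable, List, Tuple

Executor = Callable[[Any], Any]


class ToyFlow:
    """Hypothetical stand-in for a Flow: runs named executors in order."""

    def __init__(self) -> None:
        self._executors: List[Tuple[str, Executor]] = []

    def add(self, name: str, executor: Executor) -> "ToyFlow":
        """Append an executor; returns self so calls can be chained."""
        self._executors.append((name, executor))
        return self

    def run(self, data: Any) -> Any:
        """Pass data through every executor, feeding each output to the next."""
        for _name, executor in self._executors:
            data = executor(data)
        return data

    @property
    def names(self) -> List[str]:
        return [name for name, _ in self._executors]
```

For example, ToyFlow().add("crafter", split_fn).add("encoder", encode_fn) mirrors the order of the first two executors in index.yml, where split_fn and encode_fn are whatever functions you plug in.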

The first executor in index.yml is the crafter, which splits the text into sentence chunks based on punctuation.

The second executor is the encoder, which uses TransformerTorchEncoder with the distilbert-base-cased model. TransformerTorchEncoder is a wrapper around the PyTorch version of Hugging Face transformers.
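DistilBERT returns one vector per token (768 dimensions for distilbert-base-cased), so an encoder has to pool them into a single vector per chunk. Masked mean pooling, sketched below in plain Python, is one common choice: padding tokens (mask 0) are ignored and the rest are averaged. Whether TransformerTorchEncoder pools this way by default is an assumption here.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the token vectors whose mask value is 1 into one chunk vector."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept (non-padding) tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```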

The third and fourth executors in index.yml are chunk_indexer and doc_indexer, which take the data produced by the encoder and save it to Jina-format index files.
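One plausible way to picture the split between the two indexers: one store keeps chunk embeddings with a pointer to their parent document, the other keeps the documents themselves so that search results can be returned as whole stories. The in-memory ToyIndexes class below is a hypothetical illustration, not Jina's on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyIndexes:
    """Hypothetical in-memory stand-ins for a chunk index and a document index."""

    # chunk_id -> (embedding, parent doc_id)
    chunks: Dict[int, Tuple[List[float], int]] = field(default_factory=dict)
    # doc_id -> original document text
    docs: Dict[int, str] = field(default_factory=dict)

    def add_document(self, doc_id: int, text: str, chunk_embeddings: List[List[float]]) -> None:
        """Store the document once and each of its chunk embeddings separately."""
        self.docs[doc_id] = text
        for emb in chunk_embeddings:
            chunk_id = len(self.chunks)  # sequential chunk ids
            self.chunks[chunk_id] = (emb, doc_id)

    def parent_of(self, chunk_id: int) -> str:
        """Look up the full document that a matched chunk belongs to."""
        _, doc_id = self.chunks[chunk_id]
        return self.docs[doc_id]
```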

Jina Search process

Now that the data is indexed, the next step is to start the search process, which reads the indexed data. Run the command below. The search Flow uses query.yml, which contains the same executors as indexing with the addition of a ranker.

python app.py search
Figure: query.yml for Jina search
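The ranker's role can be pictured as follows: score every indexed chunk against the query embedding, keep each document's best chunk score, and return the top_k documents. This sketch assumes cosine similarity and max aggregation; the ranker configured in query.yml may score differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs for the top_k documents, best first."""
    best: Dict[str, float] = {}
    for doc_id, vec in chunks:
        score = cosine(query_vec, vec)
        # Keep only the highest-scoring chunk per document.
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]
```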

The search command starts the search process and exposes it as a REST API that accepts POST requests, for example:

POST http://localhost:65482/api/search

Request:
{"top_k":5,"mode":"search","data":["adventure stories"]}
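The same request can be sent from Python using only the standard library. This sketch assumes the endpoint is http://localhost:65482/api/search (the public_port chosen earlier) and that the search Flow is already running; build_search_request and search are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

SEARCH_URL = "http://localhost:65482/api/search"  # assumed endpoint on public_port


def build_search_request(query: str, top_k: int = 5, url: str = SEARCH_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the example above."""
    payload: Dict[str, Any] = {"top_k": top_k, "mode": "search", "data": [query]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, top_k: int = 5) -> Dict[str, Any]:
    """Send the request to the running search Flow and decode the JSON reply."""
    with urllib.request.urlopen(build_search_request(query, top_k), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, search("adventure stories") sends the request shown above and returns the decoded JSON response.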

The top 5 results are returned as JSON, because top_k is set to 5 in the request.

The jina-box widget can also be used to view the results. Jina search is now ready to use.

Source: https://chatbotslife.com/jina-ai-with-rasa-1e81a8b869cc
