#python #neural search #Jina #deep learning #Linux #bash #shell

Grok the Linux shell, grok Jina

Published Sep 23, 2021 by Alex C-G


I’ve been working with Jina Flows and Executors a lot recently. I’ve been enjoying it, and now I’m starting to grok why.

Coding in Jina is very much like using a bash shell. (Or zsh. Or Korn. Take your pick.)

Using the shell

In the shell you work with:

You use a package manager to install a command (if it isn’t already built-in). Then you can run commands and chain them together. For example, let’s say you installed Jina and want to add it to your requirements.txt:

yay -S ripgrep # yay is Arch Linux's rough equivalent of apt
pip freeze | rg "jina" >> requirements.txt

I’m using ripgrep/rg instead of grep to illustrate the package management bit. (It’s like grep but way faster.)

Or something I do quite often: Run a Jina search, grab the resulting JSON, format it nicely, and yank it onto my clipboard (yy is a simple alias I’ll put at the end of this post.)

curl --request POST -d '{"top_k":100,"mode":"search","data":["aliens and monsters"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/search' | jq | yy
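If you'd rather stay in Python than pipe curl into jq, here's a rough equivalent using only the standard library. Treat it as a sketch: the function names are made up for illustration, the URL and payload are copied from the curl command, and I'm assuming the endpoint answers with JSON.

```python
import json
from urllib import request

# Assumption: same endpoint as the curl command above.
SEARCH_URL = "http://0.0.0.0:45678/search"


def build_search_request(query, top_k=100, url=SEARCH_URL):
    """Build (but don't send) the same POST request as the curl command."""
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pretty(raw_json):
    """Re-indent a JSON string, roughly what piping through jq does."""
    return json.dumps(json.loads(raw_json), indent=2)


def search(query, top_k=100):
    """Send the request and return the pretty-printed response body."""
    with request.urlopen(build_search_request(query, top_k)) as response:
        return pretty(response.read().decode("utf-8"))
```

With the Flow up and listening on that port, print(search("aliens and monsters")) should get you the same nicely formatted JSON, minus the clipboard step.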

Using Jina

These concepts are reflected in Jina’s design pattern:

An example

Let’s say I’m a hopeless romantic (you know I am). If I were going to index every one of Shakespeare’s lines that said “love” I’d do it like this in bash:

I mean I have them all memorized, but y’know

for filename in /shakespeare/*.txt; do
  cat "$filename" | grep "love" >> love_lines.txt # (Yes, I could use grep directly but I wanna show off piping, mom)
done

Christ, I hate coding in bash.
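For what it's worth, the same grep-style filter is only a few lines of plain Python. This is a sketch, not Jina: the function name is hypothetical, I'm assuming the plays sit as .txt files in one folder, and it does a dumb case-sensitive substring match, just like the grep above.

```python
from pathlib import Path


def collect_matching_lines(folder, needle):
    """Return every line containing `needle` from the .txt files in `folder`."""
    matches = []
    # sorted() keeps the output order stable from run to run
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if needle in line:  # case-sensitive substring match, like plain grep
                    matches.append(line.rstrip("\n"))
    return matches
```

Calling collect_matching_lines("shakespeare", "love") and writing the result to love_lines.txt gets you roughly what the bash loop produces.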

And (more or less) like this in Jina:

# Create a doc for each of Shakey baby's works
docs = DocumentArray(from_files("shakespeare/*.txt"))

# Create simple Flow (parentheses let the chained calls span several lines)
flow = (
    Flow()
    .add(uses="jinahub+docker://Sentencizer")              # break down into sentences
    .add(uses="jinahub+docker://TransformerTorchEncoder")  # encode into vectors
    .add(uses="jinahub+docker://SimpleIndexer")            # build index
)

# Create a query Document
query_doc = Document(text="love")

# Open the Flow so its Executors are running while we send requests
with flow:
    # Index the Documents
    flow.index(inputs=docs)

    # Run the search Flow with the query and store matches
    matches = flow.search(inputs=query_doc, return_results=True)

# See the matches
print(matches)

I skipped the imports and maybe a few bits of config you may want to add, but you get the idea:

What’s the difference then?

yy and pp

Okay, I promised I’d explain yy. If you use Vim, you may get the reference. I use these aliases to shuttle data between my clipboard and command line:

Here are the aliases, along with gcpp which I use a lot:

alias pp='xclip -o -selection clipboard'
alias yy='xclip -selection clipboard'
alias gcpp='git clone "$(xclip -o -selection clipboard)"'


*****

© 2018-2024, Alex Cureton-Griffiths