Published Sep 23, 2021 by Alex C-G
I’ve been working with Jina Flows and Executors a lot recently. I’ve been enjoying it, and now I’m starting to grok why.
Coding in Jina is very much like using a bash shell. (Or zsh. Or korn. Or take your pick.)
In the shell you work with:
Commands: sed, grep, head, rm
Pipes and redirects: |, >, &>
Package managers: apt, yum, pamac
You use a package manager to install a command (if it isn’t already built-in). Then you can run commands and chain them together. For example, let’s say you installed Jina and want to add it to your requirements.txt:
yay -S ripgrep # yay is apt for Arch Linux
pip freeze | rg "jina" >> requirements.txt
I’m using ripgrep/rg instead of grep to illustrate the package management bit. (It’s like grep but way faster.)
Or something I do quite often: Run a Jina search, grab the resulting JSON, format it nicely, and yank it onto my clipboard (yy is a simple alias I’ll put at the end of this post.)
curl --request POST -d '{"top_k":100,"mode":"search","data":["aliens and monsters"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/search' | jq | yy
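If you’d rather script that than poke at it from the shell, roughly the same request can be built in Python with nothing but the standard library. This is only a sketch: the endpoint and payload are copied from the curl command above, while the names build_search_request and search are made up for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl command; point it at wherever your Flow listens.
SEARCH_URL = "http://0.0.0.0:45678/search"


def build_search_request(query, top_k=100, url=SEARCH_URL):
    """Build (but don't send) the same POST request as the curl one-liner."""
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=100, url=SEARCH_URL):
    """Send the request and return the JSON response pretty-printed, like `| jq`."""
    with urllib.request.urlopen(build_search_request(query, top_k, url)) as resp:
        return json.dumps(json.load(resp), indent=2)
```

With a Flow actually running at that address, print(search("aliens and monsters")) should give you about what curl piped through jq does. Getting it onto the clipboard is still a job for the shell.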
These concepts are reflected in Jina’s design pattern:
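Executors are the commands: just as grep is a tool that does one thing and does it well, so too are TransformerTorchEncoder, CLIPImageEncoder, SimpleIndexer, etc.
Flows are the pipes: you chain Executors together with .add(): Flow().add(uses=CLIPImageEncoder).add(uses=SimpleIndexer).
Jina Hub is the package manager: instead of writing TransformerTorchEncoder yourself, you simply download (or pull) it: Flow().add(uses="jinahub+docker://CLIPImageEncoder").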
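To make the pipe analogy concrete, here is a toy version of the chaining idea in plain Python. It is emphatically not how Jina implements a Flow: ToyFlow, grep and head are hypothetical names invented for this sketch, and the steps are ordinary functions rather than Executors.

```python
class ToyFlow:
    """A toy stand-in for a Flow: not Jina's implementation, just the chaining idea."""

    def __init__(self):
        self.steps = []

    def add(self, step):
        # Returning self is what makes .add(...).add(...) chaining possible.
        self.steps.append(step)
        return self

    def run(self, data):
        # Feed each step's output into the next, like `a | b | c` in the shell.
        for step in self.steps:
            data = step(data)
        return data


def grep(pattern):
    """Return a step that keeps only the lines containing `pattern`."""
    return lambda lines: [line for line in lines if pattern in line]


def head(n):
    """Return a step that keeps only the first `n` lines."""
    return lambda lines: lines[:n]


flow = ToyFlow().add(grep("jina")).add(head(1))
print(flow.run(["numpy", "jina", "requests", "jina again"]))  # ['jina']
```

Each step does one small thing, and .add() decides the order they run in, which is about all a pipe does too.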
Let’s say I’m a hopeless romantic (you know I am). If I were going to index every one of Shakespeare’s lines that said “love” I’d do it like this in bash:
I mean I have them all memorized, but y’know
for filename in /shakespeare/*.txt; do
    cat "$filename" | grep "love" >> love_lines.txt # (Yes, I could use grep directly, but I wanna show off piping, mom)
done
Christ, I hate coding in bash.
And (more or less) like this in Jina:
# Create a doc for each of Shakey baby's works
docs = DocumentArray(from_files("shakespeare/*.txt"))
# Create simple Flow (the parentheses let the chained calls span several lines)
flow = (
    Flow()
    .add(uses="jinahub+docker://Sentencizer")  # break down into sentences
    .add(uses="jinahub+docker://TransformerTorchEncoder")  # encode into vectors
    .add(uses="jinahub+docker://SimpleIndexer")  # build index
)
# Create a query Document
query_doc = Document(text="love")
# A Flow has to be opened before it can take requests
with flow:
    # Index the Documents
    flow.index(inputs=docs)
    # Run the search Flow and store matches
    matches = flow.search(inputs=query_doc, return_results=True)
# See the matches
print(matches)
I skipped the imports and maybe a few bits of config you may want to add, but you get the idea:
Sentencizer, TransformerTorchEncoder, and SimpleIndexer are the commands. They do one thing and do it (hopefully) well.
With jinahub+docker:// we install them with the “package manager”.
.add is just like using | to pass the output of shell commands from one to another.
grep would miss any mention of loving, romance, heart, etc. TransformerTorchEncoder would see the connection.
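The keyword half of that comparison is easy to check in plain Python. The sample lines below are made up for illustration (they are not Shakespeare quotes), and the sketch only covers the literal-match side; the semantic side needs a real encoder.

```python
# Made-up sample lines, just to show what a literal match catches.
lines = [
    "I love thee",
    "She is loving and kind",
    "My heart is thine",
    "A tale of romance",
]

# Equivalent of `grep "love"`: keep lines containing the literal substring.
keyword_hits = [line for line in lines if "love" in line]
missed = [line for line in lines if line not in keyword_hits]

print(keyword_hits)  # ['I love thee']
print(missed)        # the other three lines, "loving" included
```

Note that even “loving” slips through, because the substring “love” never appears in it.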
yy and pp
Okay, I promised I’d explain yy. If you use Vim, you may get the reference. I use these aliases to shuttle data between my clipboard and command line:
ls | yy - pipes the output of ls to the clipboard.
git clone `pp` - clones a git repo with the URL I just copied from my browser. (Those backticks pass the output of pp directly to the command via shell magic, a.k.a. command substitution.)
Here are the aliases, along with gcpp which I use a lot:
alias pp='xclip -o -selection clipboard'
alias yy='xclip -selection clipboard'
alias gcpp='git clone `xclip -o -selection clipboard`'