Gravio Blog
June 30, 2024

[Tutorial] How to take a screenshot on your Mac, send it to a local multimodal AI (LLaVA/Ollama), and trigger an API

In this blog post, we learn how to take a screenshot on a Mac, send that screenshot to a local AI (in this case LLaVA running on Ollama), and trigger an API.

The screen is one of the most important interfaces between a computer and the person operating it. In this tutorial, we learn how to use Gravio, an edge computing platform, to take a screenshot, send it to a locally installed AI (Ollama / LLaVA) for evaluation, and then send the result to a REST API (Requestbin.com).

For this tutorial you will need:

Approach

  • We will use an Action to take a screenshot. The Action triggers the screenshot from the command line and saves it into a folder (in this case, the Desktop)
  • Gravio reads the screenshot file and sends the image, along with a prompt, to a local Ollama installation
  • Gravio uses the AI's reply to trigger an API, in our case by posting it to a Requestbin endpoint

Installing Gravio

Download and install Gravio Studio from the App Store

After Gravio Studio is successfully installed, click on “Proceed without Login”:

Click on the Gravio HubKit download button

Download the Gravio HubKit for your OS (in this case, macOS)

Install Gravio HubKit by dragging it to your Applications folder

You will be asked to enter your password. Once it is installed, Gravio HubKit runs as a background process in the menu bar:

Deploying your free Gravio License

Connect to your local Gravio instance from your Gravio Studio

You will see this screen

“No License” means you need to put a license file on your node. There are two ways of achieving this: either you obtain a license file from someone at Asteria and upload it via Gravio Studio, or you use the web interface to deploy it. The latter is the more common way. To use it, click on “Register” on the Gravio website and follow the instructions to create your Gravio account.

Once you have created your Gravio account, click on the cogwheel to open the web interface of your local Gravio server

You will have to create a new password for your local Gravio server:

In the “Initial Settings” tab, click on “Obtain a license via Internet” and log in using your Gravio online account details.

After a successful license deployment, your Gravio will look like this

Setting up the action that creates the Screenshots

Open the Actions window and create a new action

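Add an Exec component and enter the following command, replacing the path with the folder and file name where you would like to store the PNG: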

screencapture /followed/by/the/folder/you/like/to/put/the/png/file.png

On recent versions of macOS, you will notice that the screenshot does not contain any windows, just the background image. This is for privacy reasons: you first need to allow Gravio to take screenshots. To do so, open “Privacy & Security” in your System Settings and click on “Screen & System Audio Recording”:

Under Screen & System Audio Recording, click the + sign and add Gravio HubKit from your Applications folder

You will have to enter your username and password.

When prompted, choose to quit and reopen Gravio HubKit, then run the Action by pressing the Play button on the top right. A screenshot file will be stored in the folder that you have chosen.
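If you would like to test the capture step outside Gravio first, the same command can be called from a short script. The following is only a minimal sketch: the function names are made up for this example, and it assumes a Mac where the program running the script has been granted the screen recording permission described above.

```python
import os
import subprocess


def build_screencapture_command(output_path: str) -> list:
    """Return the argument list for the macOS screencapture tool."""
    # Expand "~" so paths like ~/Desktop/screenshot.png work as expected.
    return ["screencapture", os.path.expanduser(output_path)]


def take_screenshot(output_path: str) -> str:
    """Run screencapture (macOS only) and return the path of the PNG file."""
    command = build_screencapture_command(output_path)
    # check=True raises CalledProcessError if screencapture exits with an error.
    subprocess.run(command, check=True)
    return command[-1]
```

Calling take_screenshot("~/Desktop/screenshot.png") should leave a PNG on the Desktop, just like the Exec component does.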

How to Set Up Ollama with Llava on your Mac to do local VQA image processing

Download and install Ollama from the Ollama website following their instructions

Once installed, run the command “ollama run llava” in a terminal. The first run downloads the llava model, which is likely to take a while.

Once the model is available, search for the llava multimodal documentation to extract the example:

By default, the API streams the result. However, you can send the same JSON document from Gravio with an added "stream": false parameter to query your local Ollama server and get the result in one response:

Note that the image is base64-encoded.
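For reference, here is a rough sketch of the same request made from a script instead of Gravio's HTTP request component. It assumes Ollama is listening on its default local address (http://localhost:11434) and that the llava model has already been downloaded. The function names are hypothetical. Ollama's API names the flag that turns off streaming "stream".

```python
import json
import urllib.request

# Assumption: Ollama's default local address and its generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(prompt: str, image_b64: str, model: str = "llava") -> str:
    """Return the JSON body for a non-streaming Ollama generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,        # ask for one JSON reply instead of a stream
        "images": [image_b64],  # base64-encoded image data as plain strings
    }
    return json.dumps(payload)


def query_ollama(prompt: str, image_b64: str, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the raw JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_llava_request(prompt, image_b64).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Local vision models can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.read().decode("utf-8")
```

Only query_ollama needs a running server. build_llava_request can be used on its own to check what the JSON document looks like before pasting it into Gravio.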

To use your own image, you can use the “Read File” component and the BASE64 encoding function.
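Outside Gravio, the equivalent of “Read File” followed by BASE64 encoding might look like the sketch below. The function name is made up for this example.

```python
import base64
from pathlib import Path


def encode_image_base64(path: str) -> str:
    """Read a file and return its contents as a base64 text string."""
    data = Path(path).expanduser().read_bytes()
    # b64encode returns bytes; decode to str so it can be placed in JSON.
    return base64.b64encode(data).decode("ascii")
```

The returned string is what goes into the image field of the JSON document sent to Ollama.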

To see the result, you need to enable the “debug” flag on the top left of the component before hitting the “Play” button. To edit the prompt, write the prompt you would like to send to the AI within the JSON document. Be careful when using double quotes inside the prompt: you may have to escape them using a backslash \.
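To see what an escaped prompt looks like, you can let a JSON library do the escaping and copy the result. This small sketch uses a made-up prompt. The model and stream fields mirror the request described earlier.

```python
import json

# A prompt that contains double quotes.
prompt = 'Is the word "Error" visible on this screen?'

# json.dumps escapes the inner double quotes with backslashes.
body = json.dumps({"model": "llava", "prompt": prompt, "stream": False})
print(body)
# {"model": "llava", "prompt": "Is the word \"Error\" visible on this screen?", "stream": false}
```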

To extract just the message, we use Gravio’s JSONPath function:

JSONPath("$.response", cv.Payload)
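The JSONPath expression above picks the top-level “response” field out of the reply. In a script, the same extraction could be sketched as follows. The sample reply is invented for illustration and is not real model output.

```python
import json


def extract_response(raw_body: str) -> str:
    """Return the text of the top-level "response" field of an Ollama reply."""
    return json.loads(raw_body)["response"]


# Illustrative reply only; a real reply contains additional fields.
sample = '{"model": "llava", "response": "A desktop with a browser window.", "done": true}'
print(extract_response(sample))
```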

Sending the AI Reply to a REST API

In this last step of the tutorial, we send the reply of the AI to a Requestbin API. For this purpose, open the Requestbin website and click on “create public bin instead”:

You will get an API endpoint and an interface to see incoming data:

Back in Gravio, add that endpoint to another HTTP request component, along with the JSONPath extraction of the message:

Now, if you hit the play button, the endpoint will show the result of the local AI:
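For comparison, posting the extracted message to the endpoint from a script could be sketched like this. The endpoint URL is whatever Requestbin gave you. The "message" key and the function names are assumptions made for this example, not something Gravio or Requestbin requires.

```python
import json
import urllib.request


def build_result_request(endpoint_url: str, ai_reply: str) -> urllib.request.Request:
    """Prepare a JSON POST request that carries the AI reply."""
    body = json.dumps({"message": ai_reply}).encode("utf-8")  # hypothetical key
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_result(endpoint_url: str, ai_reply: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_result_request(endpoint_url, ai_reply)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

After calling send_result with your own endpoint URL, the posted JSON should appear in the Requestbin interface.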

Note that, at the time of writing, the AI is not yet very accurate, and/or the prompts need to be more specific. However, we can expect that in the future the AI will be able to describe more accurately what it sees on screen.

Of course, you can also send the image to more powerful, cloud-based APIs, such as OpenAI. For this, however, you will need a paid Gravio account to get access to those components. Ask our team at getstarted@gravio.com to explore your options.

That’s it! All you need to do now is to combine both actions, the screenshot taking and the Ollama query action, put them into one and off you go taking screenshots and processing them with local VQA!
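Conceptually, the combined action is a chain of three steps: capture, ask the model, deliver the reply. The sketch below shows only that chaining. The three steps are passed in as functions, and all names are hypothetical, so you can plug in whatever capture, query, and delivery code you prefer.

```python
from typing import Callable


def run_pipeline(
    capture: Callable[[], bytes],
    ask_model: Callable[[bytes], str],
    deliver: Callable[[str], None],
) -> str:
    """Run the three steps in order and return the model's reply."""
    image = capture()         # step 1: take the screenshot
    reply = ask_model(image)  # step 2: send the image and prompt to the local AI
    deliver(reply)            # step 3: pass the reply on, e.g. to an API
    return reply


# Stand-in steps so the chain can be tried without a Mac or an Ollama server.
delivered = []
result = run_pipeline(
    capture=lambda: b"png-bytes",
    ask_model=lambda image: "saw %d bytes" % len(image),
    deliver=delivered.append,
)
```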

Next Steps

You can now trigger this action using a time trigger, a schedule, or a sensor / API input. Needless to say, the AI reply can trigger many other things, not only an API. You could send it to a messenger or by e-mail, or insert it into a text file or database. Your imagination is the only limit.

Let us know if you have any questions. Join us on our Slack at https://link.gravio.com/slack or write to us at getstarted@gravio.com

We’re happy to help and explore more ideas together with you!
