One of the most important interfaces between a person and a computer is the screen. In this tutorial, you will learn how to use Gravio, an edge computing platform, to take a screenshot, send it to a locally installed AI (Ollama running the LLaVA model) for evaluation, and then send the result to a REST API (Requestbin.com).
Download and install Gravio Studio from the App Store
After Gravio Studio is successfully installed, click on “Proceed without Login”:
Click on the Gravio HubKit download button
Download the Gravio HubKit for your OS (in this tutorial, macOS)
Install Gravio HubKit by dragging it to your Applications folder
You will be asked to enter your password. Once it’s installed, Gravio HubKit runs as a background process in the menu bar:
Connect to your local Gravio instance from your Gravio Studio
You will see this screen
“No License” means you need to put a license file on your node. There are two ways to do this: either you get a license file from someone at Asteria and upload it via Gravio Studio, or you use the web interface to deploy it. The latter is the more common way. Click on “Register” on the Gravio website and follow the instructions to create your Gravio account.
Once you have created your Gravio account, click on the cogwheel to open the web interface of your local Gravio server
You will have to create a new password for your local Gravio server:
In the “Initial Settings” tab, click on “Obtain a license via Internet” and log in using your Gravio online account details.
After a successful license deployment, your Gravio will look like this
Open the Actions window and create a new action
Use the Exec component and write
screencapture /followed/by/the/folder/you/like/to/put/the/png/file.png
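If you want to try the capture step outside Gravio first, the following is a minimal Python sketch of the same command. It is not part of Gravio: the function names are made up for illustration, and the -x option (which silences the shutter sound) is an addition that the Exec command above does not use.

```python
import subprocess
from pathlib import Path


def build_screencapture_command(target: Path) -> list[str]:
    """Return the argument list for a silent full-screen capture to `target`."""
    # -x suppresses the shutter sound; the final argument is the output file.
    return ["screencapture", "-x", str(target)]


def take_screenshot(target: Path) -> Path:
    """Run screencapture (macOS only) and return the path of the file it wrote."""
    # Create the destination folder first so the command has somewhere to write.
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_screencapture_command(target), check=True)
    return target
```

Calling take_screenshot(Path.home() / "Screenshots" / "screen.png") on a Mac should write the PNG to that (hypothetical) folder, subject to the same screen recording permission described in this tutorial.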
On recent macOS versions, you will notice that the captured file does not contain any windows, just the background image. This is a privacy protection: you need to allow Gravio to take screenshots. To do so, open “Privacy & Security” in System Settings and click on “Screen & System Audio Recording”:
Under Screen & System Audio Recording, click the + sign and add Gravio HubKit from your Applications folder
You will have to enter your username and password.
Click “Quit & Reopen” when prompted, then run the action by pressing the Play button at the top right. A screenshot file will be stored in the folder that you have chosen.
Download and install Ollama from the Ollama website following their instructions
Once installed, run the command “ollama run llava” to download and start the LLaVA model. The download is likely to take a while.
Once the model is available, look up the LLaVA multimodal example in the documentation:
By default, the API streams the result. However, you can send the same JSON document from Gravio with an added "stream": false parameter to query your local Ollama server and get the result in a single response:
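For reference, here is a rough Python sketch of the same non-streaming request, handy for testing the Ollama server independently of Gravio. It assumes Ollama is listening on its default local address (http://localhost:11434) and uses the field name "stream" from Ollama's generate API; the function names and the prompt are illustrative only.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(prompt: str, image_b64: str, model: str = "llava") -> dict:
    """Build the JSON document for a single, non-streamed answer."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # list of Base64-encoded images
        "stream": False,        # ask for one complete response instead of chunks
    }


def query_ollama(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload to the local Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With streaming disabled, the whole answer arrives in one JSON object, which is what makes a single extraction step possible later on.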
Note that the image is Base64-encoded.
To use your own image, use the “Read File” component and the BASE64 encoding function.
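What the “Read File” plus Base64 step does can be sketched in a few lines of Python. This only illustrates the encoding itself, not how Gravio implements it, and the function name is hypothetical.

```python
import base64
from pathlib import Path


def encode_image_base64(image_path: Path) -> str:
    """Read a file as raw bytes and return its Base64 text representation."""
    raw_bytes = image_path.read_bytes()
    # b64encode returns bytes; decode to a plain string for use in JSON.
    return base64.b64encode(raw_bytes).decode("ascii")
```

The resulting string is what goes into the "images" list of the JSON document sent to Ollama.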
To see the result, enable the “debug” flag at the top left of the component before hitting the “Play” button. To edit the prompt, write the prompt you would like to send to the AI inside the JSON document. Be careful with double quotes inside the prompt: they have to be escaped with a backslash (\).
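To see what a correctly escaped prompt looks like, you can let a JSON library do the escaping and copy the result. The sketch below uses Python's standard json module; the function name and the prompt text are examples only.

```python
import json


def build_prompt_json(prompt: str) -> str:
    """Serialize a prompt into JSON text, escaping double quotes automatically."""
    # json.dumps turns each " inside the prompt into \" in the output.
    return json.dumps({"prompt": prompt})
```

For instance, passing a prompt that mentions a window titled "Inbox" yields JSON text in which the quotes around Inbox are each preceded by a backslash, which is the form you would type by hand in the Gravio component.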
To extract just the message, we use Gravio’s JSONPath function:
JSONPath("$.response", cv.Payload)
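The path $.response simply selects the top-level "response" field of the reply. As a rough Python equivalent, assuming the reply body is the JSON object Ollama returns when streaming is disabled (the function name is made up):

```python
import json


def extract_response(reply_body: str) -> str:
    """Return the value of the top-level "response" field from a JSON reply."""
    reply = json.loads(reply_body)
    return reply["response"]
```

Given a reply body such as {"model": "llava", "response": "A desktop.", "done": true} (an invented sample, not real model output), this returns only the text of the answer.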
In this last step of the tutorial, we send the AI’s reply to a Requestbin API. For this purpose, open the Requestbin website and click on “create public bin instead”:
You will get an API endpoint and an interface to see incoming data:
Back in Gravio, add that endpoint to another HTTP request component, along with the JSONPath extraction of the message:
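Outside Gravio, the same request could be sketched in Python as follows. The endpoint URL is whatever Requestbin gave you; the JSON field name "message" and the function names are assumptions chosen for this example, not something Requestbin or Gravio requires.

```python
import json
import urllib.request


def build_result_request(endpoint_url: str, message: str) -> urllib.request.Request:
    """Prepare a JSON POST request carrying the AI's reply."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_result(endpoint_url: str, message: str) -> int:
    """Send the reply to the endpoint and return the HTTP status code."""
    with urllib.request.urlopen(build_result_request(endpoint_url, message)) as reply:
        return reply.status
```

Separating the request construction from the sending makes it easy to inspect the body before anything leaves your machine.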
Now, if you hit the play button, the endpoint will show the result of the local AI:
Note that, at the time of writing, the AI is not yet very accurate, and the prompts may need to be more specific. However, we can expect that in the future the AI will be able to describe more accurately what it sees on screen.
Of course, you can also send the image to more powerful, cloud-based APIs, like OpenAI. For this, however, you will need a paid Gravio account to get access to those components. Ask our team at getstarted@gravio.com to explore your options.
That’s it! All you need to do now is combine both actions, the screenshot action and the Ollama query action, into one, and off you go taking screenshots and processing them with local visual question answering (VQA)!
You can now trigger this action using a time trigger, a schedule, or a sensor / API input. Needless to say, the AI reply can trigger many other things, not only an API call. You could send it to a messenger or by e-mail, or insert it into a text file or database. Your imagination is the only limit.
Let us know if you have any questions. Join us on our Slack at https://link.gravio.com/slack or write to us at getstarted@gravio.com
We’re happy to help and explore more ideas together with you!