This article covers how to build a local Generative AI Assistant that combines open-source technologies with Gravio into a Visual Question and Answer (VQA) computer vision solution that runs entirely on-premise, without internet connectivity.
tl;dr: we will use Gravio, Ollama and a Large Language Model (LLM) called Large Language and Vision Assistant (LLaVA) to build and deploy a local AI Assistant application that performs VQA.
Download the Ollama installer using the link provided on their website, or you can click here.
Run the installer and follow the steps accordingly.
Upon successful installation, we can check whether Ollama was installed properly. This can be verified in two different ways:
To check using the System Tray, open your System Tray in Windows and you should see the Ollama icon.
To check using Command Prompt, open the Start Menu and type “cmd” and then press Enter.
In the command window, type “ollama” and press Enter; you should see a list of available commands and how to use them. If the list appears, Ollama has been installed.
In this step, we will download and run the LLaVA model locally on our machine. You can download any LLM available from Ollama’s library, and you can find out more about the available models here.
Background of LLaVA: Large Language and Vision Assistant (LLaVA) is a pre-trained, end-to-end multimodal model that connects a vision encoder with a large language model for general-purpose visual and language understanding. You can find out more about this model on their official website.
Navigate to the LLaVA model on Ollama’s website, or you can click here. Then, on the page, copy the command “ollama run llava”.
Then, paste the command into the same command window and press Enter. If this is your first time installing and running LLaVA, you should expect a download progress bar. Wait for the download and installation process to complete. Once it is complete, you should be able to interact with the model directly, and we are ready to move on to building the rest of the solution.
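Optionally, you can confirm that Ollama’s HTTP API (the same API Gravio will call later) is responding before building anything in Gravio. Here is a minimal sketch in Python, assuming Ollama is listening on its default port 11434, that the requests library is installed, and that test.jpg is a placeholder image of your own:

import base64
import requests

# Read a local test image and base64-encode it, as Ollama's /api/generate endpoint expects
with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "llava",                      # the model we pulled with "ollama run llava"
    "prompt": "What is in this picture?",  # example question; change it to anything you like
    "images": [image_b64],
    "stream": False,                       # return one JSON object instead of a token stream
}

# Ollama serves its REST API on localhost:11434 by default
r = requests.post("http://localhost:11434/api/generate", json=payload)
print(r.json()["response"])                # the model's answer is in the "response" field

If this prints a sensible description of your image, the model and the API are working.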
We will need to create a folder in a location of your choice on your machine. This will serve as the folder that Gravio watches for new images. In my case, I have created a folder called “ImageUpload”.
P.S. You can also do this with an ONVIF camera if you have one!
Once you have created a Folder in your desired location, run Gravio HubKit and Gravio Studio. In Gravio Studio, access your localhost HubKit Node and go to the Device List page.
In the Device List, add a “Camera”. In the options for Camera Type, select Folder from the drop-down menu and then specify the file path of the folder you have just created. For example: C:\your\path\here. Then, you can change the settings according to your preference and save.
We have to add a Layer in Gravio Studio so that the platform knows to pick up files from the folder. In the Device tab, select the “+” sign and Add Layer. Scroll down to “Camera” and save it. Then, bind the folder you have saved to the Layer and ensure that it is toggled on.
We have successfully enabled our folder as an input to Gravio, and the platform will now watch it for file uploads.
It is time to create our Action so that we can build the next part of the solution. We will only need four steps in our Action. The components are: File Read, HTTP Request, Line Message and FileWrite.
Add a new Action by selecting the “+” on the Actions page and name it something meaningful, like LLaVAAPI in my case.
File Read Component
Let’s set up the File Read component by setting the file name to be read to the latest image file in the ImageUpload folder. We can do this in the Pre-Mappings section like this:
cp.FileName = tv.Data
HTTP Request
Set the HTTP Request component according to the API configuration of Ollama and LLaVA. You can refer to the Ollama LLaVA documentation on how to use the API here. First, we can set the properties:
And we can leave the rest as default.
We also have to set the body so that the payload is correct. To do that, use the Pre-Mappings section:
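For reference, the body that the pre-mappings need to assemble follows Ollama’s /api/generate request format, roughly like this (the prompt text is only an example, and the image must be supplied as a base64-encoded string from the File Read step):

{
  "model": "llava",
  "prompt": "What is in this picture?",
  "images": ["<base64-encoded image>"],
  "stream": false
}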
Line Message
There are two main mappings we need to consider when using the Line Message component for this particular workflow.
Note: You will need your Line Token in order to use this component. You can get it here.
Set the Line Message component up like this:
The other fields can be left empty; the main settings are in the Pre-Mappings and Post-Mappings.
cp.Message = JSONPath("$.response", cv.Payload)
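To see why this works: when “stream” is set to false, Ollama’s /api/generate endpoint returns a single JSON object whose “response” field contains the model’s answer, roughly like this (values are illustrative):

{
  "model": "llava",
  "created_at": "2024-01-02T15:04:05Z",
  "response": "The image shows ...",
  "done": true
}

The JSONPath expression “$.response” therefore extracts just the answer text from the payload so it can be sent as the Line message.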
Now to the final step, FileWrite.
FileWrite
We can set the FileWrite properties to the file name that we would like to write to. In this case, I have just used a simple text file, ai-results.txt. As for the contents to write into the text file, set it up in the Pre-Mappings like this:
cv.Payload = DateFormat(Now(), "02 Jan 06 15:04") + " LlaVa Response: " + cv.Payload + "\r"
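Assuming DateFormat follows Go-style reference-time layouts (which the “02 Jan 06 15:04” pattern suggests), each run appends a timestamped line to ai-results.txt that looks roughly like this (the answer text is illustrative):

02 Jan 24 15:04 LlaVa Response: The image shows a cat sitting on a sofa.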
We are done! We can move on to building our final part, the Trigger.
Configuring the Trigger is straightforward. Go back to the Triggers tab and select Add Event Trigger.
Next, configure the Area, Layer and Device by setting them to the ImageUpload folder that we created in step 1.
Then, in “Actions” select the Action we have just created, in this case, LLaVAAPI.
Select “Save” and we are done! The application has been built and everything should be automated.
Prepare your own images, or look for some online that suit your specific test scenario. They can show anything that you would like to ask the AI about.
Change the prompt before testing the application. To change the prompt sent to the AI, edit the LLaVAAPI Action that we built: in the Pre-Mappings of the HTTP Request, the body contains a prompt key. Amend it to the question that you would like to ask the AI.
Save or drop an image into the ImageUpload folder or the folder that you mapped in Gravio Studio in Part 2, step 1. I used this image from Google Search:
Watch the magic happen! Gravio should pick the image up, make an HTTP request with the image and the prompt, then get a response from the LLaVA LLM. I have tried it with a few images, so here’s an example of the result.
We can also view the results in the text file that we created as the last step of our Action. Navigate to the folder where HubKit stores its data and open the text file. On Windows, this is the file path:
C:\ProgramData\HubKit\actmgr\data
I have also logged results from earlier tests in Japanese, and it works just as well. You can try it with your preferred language and let us know whether it works too!
That’s it! We have officially built a versatile VQA solution, and there are many use cases you can apply it to almost out of the box. Note that Line requires internet connectivity; for a fully offline setup, you can remove that step and simply log the results to a text file, CSV, or even a SQL database. Let us know what you would like to create and we can show you in the next tutorial! See you soon!
If you want to build a solution without coding and with a very short development timeframe, you can use Gravio as the middleware to connect to various services: in this case, Ollama, Line and writing to a text file. We have successfully built a local VQA application that ingests images from a folder and returns a response from an AI.
If you have any questions or need help, don’t hesitate to get in touch with us either on gravio.com (chat in the bottom right corner) or on https://link.gravio.com/slack - We’re excited to see what you will build with this technology!