A picture is worth a thousand words
Well, when it comes to the new AI models that recognize what is in a photo, that is definitely true.
Although a lot of the attention goes to generative AI, we have also seen use cases where AI is used to recognize what is in a picture. You can imagine this being useful in the following scenarios:
- blind people who carry a device that looks around and describes the view in front of them, which they cannot see themselves;
- e-commerce art shops that generate automatic descriptions for the art they sell;
- a smart pet door that recognizes if a cat brings in little birds (we just had that again);
- a museum that has an app to automatically describe art after you take a picture of it.
Some AI tools we discovered to generate descriptions of pictures
When I started my search for a tool I came across astica.
This company offers a few interesting demos and use cases. What I like is that you can combine a few APIs to recognize the picture and then create speech too, although this can also be done with other tools.
They have different solutions for generating descriptions, and it seems their GPT-S model is the one that generates the longer descriptions. However, it has one drawback: it is slow!
Another one I came across is Jina.ai’s SceneXplain. This is a visual storytelling API and tool.
It works in a similar way to astica. You can feed it a picture and it starts describing what’s in there.
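To make the "describe the picture, then turn it into speech" idea a bit more concrete, here is a minimal sketch of how such a chain could be wired up. It deliberately does not call astica or SceneXplain: the `describe` and `speak` functions are hypothetical stand-ins that you would replace with calls to whichever service you pick, and the fake versions only exist so the example runs without a network connection or API key.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures, NOT the real astica or SceneXplain APIs:
# a describer turns image bytes into text, a speaker turns text into audio bytes.
DescribeFn = Callable[[bytes], str]
SpeakFn = Callable[[str], bytes]


@dataclass
class NarrationResult:
    description: str
    audio: bytes


def narrate_image(image: bytes, describe: DescribeFn, speak: SpeakFn) -> NarrationResult:
    """Chain a description service into a text-to-speech service."""
    description = describe(image).strip()
    if not description:
        raise ValueError("the description service returned no text")
    return NarrationResult(description=description, audio=speak(description))


# Fake services (assumptions for illustration only) so the sketch is runnable.
def fake_describe(image: bytes) -> str:
    return f"An image of {len(image)} bytes."


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


result = narrate_image(b"\x89PNG", fake_describe, fake_speak)
print(result.description)
```

Keeping the two steps behind plain functions means you can swap one tool for another, or try a faster but shorter description model, without touching the rest of the app.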
As with all the tools there are two important things:
- the application is key: how can people actually use all this great technology?
- the cost of using it.
The second point is important: AI gets pretty pricey in the end when you do a lot of interactions. There are examples of Auto-GPT tasks, for instance, going haywire and getting stuck in loops that consume AI credits. Not something you want.
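One simple way to protect yourself against a runaway loop is to put a hard budget around every AI call. The sketch below is just one possible approach and is not how Auto-GPT or any of the tools above handle it: `CreditBudget`, `BudgetExceeded` and `run_agent_loop` are names made up for this example, and the "credits" are abstract numbers you would map onto your provider's real pricing.

```python
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when the next call would go over the configured budget."""


class CreditBudget:
    # Hypothetical helper: tracks abstract credits and the number of calls made.
    def __init__(self, max_credits: int, max_calls: int) -> None:
        self.max_credits = max_credits
        self.max_calls = max_calls
        self.spent = 0
        self.calls = 0

    def charge(self, cost: int) -> None:
        """Reserve credits for one call, or refuse before any money is spent."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded(f"call limit of {self.max_calls} reached")
        if self.spent + cost > self.max_credits:
            raise BudgetExceeded(f"credit limit of {self.max_credits} reached")
        self.calls += 1
        self.spent += cost


def run_agent_loop(step: Callable[[], bool], budget: CreditBudget, cost_per_step: int) -> int:
    """Run step() until it reports it is done; return the number of steps taken."""
    steps = 0
    while True:
        budget.charge(cost_per_step)  # raises before the step that would overspend
        steps += 1
        if step():
            return steps


# A task that never finishes, standing in for an agent stuck in a loop.
budget = CreditBudget(max_credits=10, max_calls=100)
try:
    run_agent_loop(lambda: False, budget, cost_per_step=3)
except BudgetExceeded as exc:
    print(f"stopped after {budget.calls} calls: {exc}")
```

The check happens before each call rather than after it, so a task that is stuck stops at the limit instead of going one call over.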