Google Agentic Era Hackathon: Our AI Use Case – Live API in Retail

Introduction

AI applications, and agents in particular, offer exciting possibilities for reimagining the way we interact with technology. However, the rapid development of prototypes often contrasts with the complex task of transforming them into production-ready solutions. Reliable and trustworthy interaction is crucial, especially given the challenges associated with the “black box” behavior of Large Language Models (LLMs).

The Google Agentic Era Hackathon: A Platform for Innovation

On March 18, 2025, Google invited developer teams from across Europe to the Agentic Era Hackathon. The event took place both remotely and on-site in London, Paris, Madrid, Milan, and Tel Aviv. Individuals and teams took on the challenge of developing an agent based on the Agent Starter Pack.

The Agent Starter Pack: A Good Starting Point for Developing AI Agents

The Agent Starter Pack for Google Cloud provides a collection of templates for generative AI agents that can serve as a starting point for development and deployment in various environments. It offers assistance for common challenges such as deployment, operation, evaluation, customization, and monitoring.

Our Project: Exploring Intelligent Interaction in Retail

Our team utilized the Agent Starter Pack to develop an agent based on Google’s Multimodal Live API, which can process video, audio, and text in real time.

Our Idea:
An in-store terminal that enables a new kind of customer interaction in department stores. Here’s an example:

  • Customer: “I’m looking for a women’s jacket.”
  • Agent: “We have a selection of jackets. What will you be using it for?”
  • Customer: “For everyday wear, but it should also be water-resistant since I occasionally go hiking.”
  • Agent: “In that case, we recommend an outdoor jacket. You can find them on the 3rd floor in the outdoor clothing section, directly to the left of the escalator.”

Hackathon Team codeitlabs
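A dialogue like the one above maps naturally onto function calling: the model decides mid-conversation when to query a store-directory tool and folds the result into its spoken answer. The following is a minimal sketch of such a tool; the catalog data, function name, and declaration shape are illustrative assumptions, not the actual hackathon code.

```python
# Hypothetical store-directory tool an agent could invoke via function
# calling. All section data here is made up for illustration.

STORE_DIRECTORY = {
    "outdoor clothing": {"floor": 3, "hint": "directly to the left of the escalator"},
    "women's jackets": {"floor": 2, "hint": "next to the fitting rooms"},
}

def find_section(section: str) -> dict:
    """Return floor and walking directions for a store section."""
    entry = STORE_DIRECTORY.get(section.lower())
    if entry is None:
        return {"found": False}
    return {"found": True, **entry}

# A matching declaration tells the model when and how to call the tool.
FIND_SECTION_DECLARATION = {
    "name": "find_section",
    "description": "Look up the floor and directions for a store section.",
    "parameters": {
        "type": "OBJECT",
        "properties": {"section": {"type": "STRING"}},
        "required": ["section"],
    },
}
```

In this pattern, the model only ever sees the declaration; the application executes `find_section` locally and streams the result back into the session.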

Product Recognition: An Interesting Possibility

Another interesting feature we implemented is product recognition. Customers hold a product in front of the terminal, and the AI identifies it – not just as “colorful Adidas running shoes,” but by its exact SKU. This could, for example, let a customer order an item online when the store no longer has it in stock in the desired size.
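At its core, this kind of SKU lookup is a nearest-neighbor search in embedding space: each catalog image is embedded once, a camera frame is embedded at query time, and the closest vector wins. The sketch below shows only that ranking step; in the real pipeline the vectors would come from an embedding model, while here we use tiny made-up vectors and SKUs so the logic is self-contained.

```python
import math

# Toy product index: SKU -> precomputed image embedding (illustrative values).
PRODUCT_INDEX = {
    "ADI-RUN-0417": [0.9, 0.1, 0.4],   # colorful running shoe
    "ADI-RUN-0552": [0.8, 0.3, 0.5],   # similar shoe, different model
    "OUT-JKT-1203": [0.1, 0.9, 0.2],   # outdoor jacket
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_sku(frame_embedding):
    """Return the SKU whose stored embedding is closest to the camera frame."""
    return max(PRODUCT_INDEX, key=lambda sku: cosine(frame_embedding, PRODUCT_INDEX[sku]))
```

At catalog scale, the linear scan would be replaced by an approximate nearest-neighbor index, but the similarity metric stays the same.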

Technology in Detail

For the implementation, we used the following Google models and APIs (https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models):

  • Base Model: Gemini 2.0 Flash
  • Dialogue Management: Live API for real-time streaming
  • Knowledge Base: Vector indexing with text-embedding-004, Function Calling
  • Product Recognition: Vectorization of product images with multimodalembedding@001, Function Calling
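The components in the list above might be wired together through a single Live API session configuration that names the model's tools and sets its role. The field names below (response_modalities, system_instruction, tools) follow the Live API conventions, but treat the exact shape as an assumption, and the two tool names as hypothetical placeholders for our in-store functions.

```python
# Hypothetical session config combining dialogue, retrieval, and product
# recognition via function calling. Tool names are illustrative.
LIVE_CONFIG = {
    "response_modalities": ["AUDIO"],
    "system_instruction": (
        "You are an in-store assistant. Use find_section to give directions "
        "and identify_sku to recognize products shown to the camera."
    ),
    "tools": [{
        "function_declarations": [
            {
                "name": "find_section",
                "description": "Look up the floor and directions for a store section.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {"section": {"type": "STRING"}},
                    "required": ["section"],
                },
            },
            {
                "name": "identify_sku",
                "description": "Identify the SKU of a product held up to the camera.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {"frame_id": {"type": "STRING"}},
                    "required": ["frame_id"],
                },
            },
        ]
    }],
}
```

Keeping the declarations in one config makes it easy to see, in one place, everything the model is allowed to do on the terminal.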

Conclusion and Outlook

The Google Agentic Era Hackathon was an interesting experience. We have been following the development of the Agent Starter Pack for some time and were excited to work with the Multimodal Live API for the first time. The potential of this technology is significant. Our AI use case – Live API in retail, and in particular image recognition in live interactions – may be transferable to other areas. We are excited about the future development and will follow it closely.

Interested in AI Agents?

If you would like to learn more about our work in the field of AI agents, please do not hesitate to contact us for a consultation.