AI Usage

03/02/2025

This post covers running AI APIs, compares search with AI (should we use AI or Google?), and looks at RAG and RAGgraphs for more accurate responses.

With growing usage, the API key becomes a centerpiece: the AI does the heavy lifting, while the application supports it with the pipeline and the processing of outputs. AI APIs come at a fee, which changed last week with DeepSeek. Running the model locally removes that cost, as the API endpoint for local DeepSeek R1 shows. To try out an AI API, Aimlapi can be used; Artificial Analysis offers a great comparison of models.

A great application would be a voice assistant that can book appointments (Python Advanced AI Voice Assistant - Frontend & Backend).

There have been open-source applications on HuggingFace before. However, they have been specialized towards certain tasks, whether generating pictures or holding conversations, as in my Google Colab notebook with BlenderBot and audioldm2-large (Runtime -> Run all). Mind the licence types for commercial applications: BlenderBot, for example, is for non-commercial use only.

Jupyter or Colab is great to share and work in, supported by Gemini in the sidebar. VS Code is likewise great to work in and adds the debug option, so that code can be executed line by line; the support there comes from GitHub Copilot. VS Code with Copilot is also well suited for VBA via xlwings. Not to forget a good prompt, which is all we need: with the rise of multimodal AI, I would like to highlight Chain-of-Thought and Multimodal CoT prompting (Prompt Engineering Guide).

Using Search vs. AI

Search engines are at the center of daily life, so one might weigh them against AI assistants.
Search engines themselves have started incorporating AI outputs, which brings the best of both worlds together.

In content gap analysis, ChatGPT search wins; for local queries, Google performs better, according to Enge (2024). Alternative comparisons appear in Rojo-Echeburúa (2024) and Koetsier (2024). For work, a study by Nielsen (2023) indicates using AI rather than Google, resulting in 158% higher productivity.

Underlying models

Depending on the underlying model, AI branches into many applications. Transformers are a great competitor to GANs, RNNs, and VAEs.

Transformers

Transformers are specialized neural network architectures for sequence modeling tasks and one of the main ways of implementing generative AI. They process all parts of a sequence simultaneously and understand the interplay of words. A key article is "Attention Is All You Need." They possess three key features: positional encoding, attention, and self-attention. The words are parsed as embeddings, run through an encoder, followed by a decoder, and concluded by the output. Alternatives to transformers are GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders); encoders were preceded by RNNs (Recurrent Neural Networks). The transformer's main shortcoming is its computational cost.

The self-attention mechanism is a key component of Transformer models, allowing them to weigh different words in a sequence based on their contextual importance. It enables a model to dynamically focus on relevant words when processing each token, regardless of their position. This is achieved using query, key, and value matrices, which help compute attention scores and determine how much influence each word should have on another. Self-attention allows models to capture long-range dependencies and contextual relationships efficiently. A fundamental article is Vaswani et al. (2017): "Attention Is All You Need."
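As a concrete illustration, the scaled dot-product attention at the heart of this mechanism can be sketched in plain Python. This is a minimal sketch: the learned linear projections for queries, keys, and values are taken as the identity for brevity, and the toy token embeddings are made up.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for a single head.

    queries, keys, values: lists of vectors (lists of floats),
    one per token. Returns one output vector per query token.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Attention scores: q . k / sqrt(d_k) against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy example: three 2-dimensional token embeddings attending to each other
# (in self-attention, Q, K, and V are projections of the same tokens).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
```

Each output vector is a convex combination of the value vectors, with weights determined by how strongly the query token attends to each key token.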


RAG

Retrieval-Augmented Generation (RAG) reduces the issue of AI hallucination: given a question, the AI retrieves the most suitable material from a pre-defined set of trusted sources.
"Retrieval-augmented generation (RAG) is an AI technique for improving the quality of LLM-generated responses by including trusted sources of knowledge, outside of the original training set, to improve the accuracy of the LLM's output. Implementing RAG in an LLM-based question answering system has benefits: 1) assurance that an LLM has access to the most current, reliable facts, 2) reduce hallucination rates, and 3) provide source attribution to increase user trust in the output." (Expert.ai)
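A minimal sketch of the retrieve-then-generate idea, using word overlap as a stand-in for real embedding-based retrieval and a prompt template in place of an actual LLM call (all names here are illustrative):

```python
def retrieve(query, documents):
    """Return the document sharing the most words with the query
    (a stand-in for embedding-based retrieval)."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def rag_prompt(query, documents):
    # Augment the prompt with the retrieved context; in a real system
    # this prompt would be sent to an LLM via an AI API.
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = [
    "Gerald Ford challenged Jimmy Carter in the 1976 election.",
    "Transformers process all tokens of a sequence in parallel.",
]
prompt = rag_prompt("Who challenged Jimmy Carter?", docs)
```

Because the generator only sees the retrieved context, its answer can be attributed to a trusted source, which is exactly the benefit the quote above describes.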

RAGgraph

An interesting application of RAG is the RAGgraph. Similar concepts can be found in knowledge graphs, Bayesian networks, and potentially structural equation models.

As the example from Neo4j below shows, the graph correctly elicits Gerald Ford as the challenger of Jimmy Carter.

Source: Neo4j

For my thesis, the Poisson-Gamma model has applications in reliability and risk. The characteristic of my particular model was the dependencies between parameters.

Visualization of my thesis with RAGgraph.

Bi-Encoders, Cross-Encoders, and ColBERT


RAG often consists of 2 components (Abbas, 2024):

  1. Retrieval module: retrieve relevant documents
  2. Generation module: generate the final output with a generative model (like GPT).

Several strategies are applied to the text retrieval task.

Strategy A: Bi-Encoder model processes the query and document separately, converting them into vector representations using two neural networks, often with a shared architecture. These embeddings are then compared through a similarity function, like dot product or cosine similarity, to retrieve relevant documents.
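A toy sketch of bi-encoder retrieval, assuming the query and document embeddings were already produced separately by some encoder (the vectors and names below are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed embeddings: the query and each document are
# encoded *independently* -- the defining property of a bi-encoder.
query_vec = [0.9, 0.1]
doc_vecs = {"doc_a": [0.8, 0.2], "doc_b": [0.1, 0.9]}

# Rank documents by similarity to the query.
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                reverse=True)
```

Because document embeddings can be pre-computed and indexed once, only the query needs encoding at search time, which is what makes bi-encoders fast at scale.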

Strategy B: Cross-Encoder model (also known as a reranking model) processes the query and document jointly in a single transformer. The interaction happens at a deeper level than in bi-encoders, allowing full attention between the query and the document. Given a query-document pair, the model returns a similarity score.

Strategy C: ColBERT combines the strengths of bi-encoders and cross-encoders. It operates on token-level embeddings and, for each query token, computes the maximum similarity over the document tokens.
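The MaxSim scoring that ColBERT uses for this late interaction can be sketched as follows (toy, hand-made token embeddings; a real ColBERT produces contextual BERT embeddings per token):

```python
def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: for every query-token embedding,
    take its maximum similarity over all document-token embeddings,
    then sum those maxima into the document score."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token-level embeddings (assumed pre-normalized for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc   = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
score = maxsim_score(query, doc)
```

Document token embeddings can still be pre-computed as in a bi-encoder, while the token-by-token matching recovers some of the precision of a cross-encoder.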

    • Bi-Encoders: Efficient for large-scale retrieval, faster

    • Cross-Encoders: Precise for relevance assessment, slower

    • Hybrid models (like ColBERT): Balancing efficiency and precision

In NLP, an embedding space is the vector space spanned by base vectors representing the context and meaning that words carry: one dimension might capture a word's positivity/negativity, another its importance/unimportance. The concept is similar to vector spaces in linear algebra spanned by eigenvectors, principal components (with rotation), or structural equations.
Further, similar words lie close together in the embedding space and have a small angle between them, meaning the cosine is near 1; this is referred to as cosine similarity.

For more news on AI, I highly recommend subscribing to the TLDR Tech Newsletter (my referral) & MIT Sloan Review.

Rafael Schwarzenegger © All rights reserved 2021