SPEECH SUMMARY
1. The LLM Landscape
- LLMs sit within the nested fields of AI, machine learning, deep learning, and generative AI.
- Models like GPT, Gemini (Google), and others contribute to this landscape, which is rapidly evolving.
- Gemini is Google’s multimodal LLM able to process and generate text and images with very long context lengths.
- Gemini can be accessed via Google AI Studio, Google Cloud (Vertex AI), or through the related open-weights Gemma model family.
2. Common Problems with LLMs
- While easy to call via API, LLMs require careful pre- and post-processing.
- Hallucinations: LLMs generate false or outdated information.
- Poor calculation abilities: LLMs often get arithmetic and other computations wrong.
- Outputs may vary in format and quality (free text, JSON, malformed data).
- Costs can escalate quickly with large or frequent queries.
- High latency: response times can be slow for large inputs such as video.
- Difficult to objectively measure output quality.
- Risk of disclosing personal or harmful content without filters.
3. Solutions and Best Practices
- Use LLM frameworks (e.g., LangChain, LangChain4j, Firebase Genkit, Semantic Kernel) to handle orchestration and processing pipelines.
- Grounding LLM responses by augmenting input with:
- Google Search results for up-to-date info.
- Google Maps for location data.
- Vertex AI Search for private/custom data.
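The grounding idea above boils down to augmenting the prompt with retrieved context before the model sees it. A minimal sketch, where `run_search` is a hypothetical stand-in for a real search backend (Google Search, Vertex AI Search) and the snippets are placeholder data:

```python
# Grounding sketch: prepend fresh search results to the prompt so the
# model answers from current sources rather than stale training data.

def run_search(query: str) -> list[str]:
    # Hypothetical: a real implementation would call a search API here.
    return ["Snippet A about the query.", "Snippet B about the query."]

def build_grounded_prompt(question: str) -> str:
    snippets = run_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources are insufficient, say so.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt("Who won the 2024 election?")
print(prompt)
```

The grounded prompt is then sent to the model as usual; only the prompt-construction step changes.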
- Employ code execution tools to handle calculation queries by generating and running Python code for accurate results.
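A sketch of that code-execution pattern, with the model's output hard-coded as an assumption (a real application would receive `generated` from the LLM and run it in a proper sandbox):

```python
# Code-execution pattern: ask the LLM to emit Python for the calculation,
# then execute that code for an exact result instead of trusting model math.

def run_generated_code(code: str) -> object:
    # Expose only a few whitelisted builtins; real systems need a sandbox.
    safe_builtins = {"sum": sum, "range": range, "min": min, "max": max}
    namespace: dict = {}
    exec(code, {"__builtins__": safe_builtins}, namespace)
    return namespace.get("result")

# Assumed model output for "what is the sum of squares from 1 to 10?":
generated = "result = sum(i * i for i in range(1, 11))"
print(run_generated_code(generated))  # → 385
```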
- Apply Retrieval Augmented Generation (RAG):
- Chunk and embed documents into vector databases.
- At query time, embed the question and retrieve similar documents for grounding responses.
- Frameworks like LangChain simplify building RAG pipelines.
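The retrieval half of a RAG pipeline can be sketched end to end with toy components: a bag-of-words "embedding" stands in for a real embedding model, and a plain in-memory list stands in for a vector database:

```python
# Minimal RAG retrieval sketch: embed documents, embed the question,
# return the most similar documents to ground the final prompt.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: term frequencies. A real pipeline would call an
    # embedding model (e.g. through LangChain) here instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Gemini supports very long context windows.",
    "RAG grounds answers in retrieved documents.",
    "Function calling lets the model use external APIs.",
]
index = [(doc, embed(doc)) for doc in documents]  # the "vector database"

def retrieve(question: str, k: int = 1) -> list[str]:
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("How does RAG ground answers in documents?"))
```

The retrieved documents are then pasted into the prompt exactly as in the grounding step.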
- Use function calling to integrate external APIs (weather, stocks, etc.) that LLMs call automatically to provide fresh data.
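The application side of function calling reduces to a dispatch table: the model replies with a structured call (tool name plus arguments), the app executes it and feeds the result back. A sketch with the model's reply hard-coded as an assumption and a hypothetical `get_weather` tool:

```python
# Function-calling sketch: register tools, dispatch the model's structured
# call, and return the result as fresh context for the next model turn.
import json

def get_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather API call.
    return f"18°C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

# Assumed shape of a model reply when configured for function calling:
model_reply = '{"tool": "get_weather", "arguments": {"city": "Krakow"}}'

call = json.loads(model_reply)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # fed back to the model so it can compose the final answer
```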
- To structure outputs, specify desired schemas (e.g., with Pydantic) to get reliable, parseable JSON responses.
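A stdlib-only sketch of that schema enforcement (Pydantic expresses the same checks declaratively); the JSON string below stands in for a model response:

```python
# Validate LLM output against a fixed schema instead of trusting free text.
import json
from dataclasses import dataclass

@dataclass
class Talk:
    title: str
    duration_minutes: int

def parse_talk(raw: str) -> Talk:
    data = json.loads(raw)       # raises ValueError on malformed JSON
    talk = Talk(**data)          # raises TypeError on missing/extra keys
    if not isinstance(talk.duration_minutes, int):
        raise ValueError("duration_minutes must be an int")
    return talk

# Assumed model response:
raw = '{"title": "Avoid common LLM pitfalls", "duration_minutes": 30}'
print(parse_talk(raw))
```

Rejecting malformed output early keeps downstream code free of ad-hoc parsing.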
- Manage expensive calls via context caching (reuse context without resending) and batch generation (send multiple requests simultaneously for cost savings and efficiency).
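Provider-side context caching stores a prompt prefix so it is not re-sent and re-processed on every call; the underlying idea can be illustrated client-side as a response cache keyed by a prompt hash, where `call_llm` is a hypothetical stand-in for a real client:

```python
# Caching sketch: repeat prompts are served from the cache, so the
# expensive model call runs only once per distinct prompt.
import hashlib

_cache: dict[str, str] = {}
calls = 0

def call_llm(prompt: str) -> str:
    global calls
    calls += 1  # count how often the "expensive" call actually runs
    return f"answer to: {prompt}"

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]

cached_call("summarize the talk")
cached_call("summarize the talk")  # cache hit; no second model call
print(calls)  # → 1
```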
- Monitor performance and latency using tools like LangTrace to pinpoint slow operations.
- Evaluate output quality with frameworks including Vertex AI Gen AI Evaluation, DeepEval, Ragas, Promptfoo, and TruLens.
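At its core an evaluation harness scores model answers against references; a sketch with a crude keyword-overlap metric and assumed answer/reference pairs (the frameworks above provide far richer metrics such as faithfulness and relevancy):

```python
# Tiny evaluation-harness sketch: score each (answer, reference) pair.
def overlap_score(answer: str, reference: str) -> float:
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / len(r) if r else 0.0

cases = [  # (model answer, reference answer) — assumed example data
    ("grounding adds fresh context to prompts",
     "grounding adds fresh search context to prompts"),
    ("the sky is green",
     "RAG retrieves relevant documents"),
]
scores = [overlap_score(ans, ref) for ans, ref in cases]
print(scores)  # high score for the faithful answer, 0.0 for the bad one
```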
- Implement security frameworks (e.g., LLM Guard, Promptfoo, Guardrails) to:
- Filter harmful or unwanted content.
- Enforce input/output rules (block code, ban topics, enforce language).
- Anonymize sensitive user data before sending it to the LLM, and de-anonymize the responses.
- Chain multiple scanners for layered security.
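The anonymize/de-anonymize roundtrip can be sketched with a simple regex scanner for email addresses; a production setup would use a framework such as LLM Guard with scanners for many PII types:

```python
# PII roundtrip sketch: replace emails with placeholder tokens before the
# prompt leaves your system, then restore them in the model's response.
import re

def anonymize(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    def repl(m: re.Match) -> str:
        token = f"<EMAIL_{len(mapping)}>"
        mapping[token] = m.group(0)
        return token
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, text), mapping

def deanonymize(text: str, mapping: dict[str, str]) -> str:
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

safe, mapping = anonymize("Contact jan.kowalski@example.com for slides.")
print(safe)  # the LLM only ever sees the placeholder token
print(deanonymize(safe, mapping))
```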
Actionable Items / Tasks:
- Explore and integrate LLM frameworks like LangChain to implement complex workflows involving pre- and post-processing.
- Implement grounding via search APIs or private data ingestion for accurate, current LLM responses.
- Use code execution tools to ensure accuracy in calculation-based queries.
- Develop or use RAG pipelines with vector embeddings and semantic search to build knowledge-grounded LLM applications.
- Enable function calling in your applications to augment LLM with real-time APIs.
- Define schemas for LLM outputs to enforce structured and reliable data formats.
- Utilize context caching and batch generation strategies to optimize cost and speed.
- Use monitoring tools such as LangTrace to troubleshoot and optimize LLM call performance.
- Adopt evaluation frameworks to quantitatively assess LLM output quality and relevancy.
- Integrate security and safety controls, including anonymization, content filtering, and scanning, to prevent privacy leaks and harmful content delivery.
- Review the speaker’s GitHub repo and provided example code for LangChain, grounding, evaluation, and security implementations to accelerate development.
Avoid common LLM pitfalls
15:20 - 15:50, 27th of May (Tuesday) 2025 / DEV AI & DATA STAGE
It’s easy to generate content with a Large Language Model (LLM), but the output often suffers from hallucinations (fake content), outdated information (not based on the latest data), and reliance on public data only (no private data). Additionally, the output format can be chaotic, often containing harmful content or personally identifiable information (PII), and using a large context window can become expensive—making LLMs less than ideal for real-world applications.
In this talk, we’ll begin with a quick overview of the latest advancements in LLMs. We’ll then explore various techniques to overcome common LLM challenges: grounding and Retrieval-Augmented Generation (RAG) to enhance prompts with relevant data; function calling to provide LLMs with more recent information; batching and context caching to control costs; frameworks for evaluating and security testing your LLMs and more!
By the end of this session, you’ll have a solid understanding of how LLMs can fail and what you can do to address these issues.