Expanding my knowledge graph
One of my favorite courses at Carnegie Mellon was Graph Theory. I liked it so much I got a job as a research assistant running simulations and building a horrifyingly ugly UI in Java (meh, it was 2004, and it was the first and only time I built a desktop app).
GraphDBs and use cases like semantic search didn't come up for me in my first ~15 years in tech, but in recent years I've been hearing about them, and about "knowledge graphs," regularly.
A customer I'm working with is currently building out a knowledge graph (KG), and we're looking at how to incorporate it into an AI assistant that has access to their unstructured data in SharePoint.
Overview
Neo4j, a leading graph database, launched vector similarity search (VSS) in August 2023. VSS is the technique semantic search uses to find document chunks similar to a (vectorized) input query. Bringing VSS inside the graph database makes it easier to leverage KGs to surface relevant structured data in a RAG design.
That unlocks use cases like:
- Decision support for medical practitioners by making it easier to understand how symptoms, diseases, and treatments relate to one another
- Translation and sentiment analysis, where KGs provide context and disambiguate words
- Learning Plans personalized to a user’s learning history and the relationships between different concepts
- Creative writing assistant that leverages the relationships and attributes of characters to generate potential plots and character developments
Although here in Los Angeles that last one is not what people want to talk about, so let's keep it moving...
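To make the VSS-in-a-graph idea concrete, here's a rough sketch in Neo4j 5.x Cypher. The index name, node labels, relationship type, and embedding dimension are all illustrative assumptions, not from a real schema:

```cypher
// Create a vector index over Chunk nodes (names and dimensions are hypothetical)
CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};

// Find the 5 chunks most similar to a query embedding,
// then hop the graph to pull in related structured context
CALL db.index.vector.queryNodes('chunk_embeddings', 5, $queryEmbedding)
YIELD node, score
MATCH (node)-[:MENTIONS]->(e:Entity)
RETURN node.text, score, collect(e.name) AS relatedEntities;
```

The second query is the interesting part: after the similarity lookup you can traverse relationships to enrich the retrieved chunk, which is exactly what a plain vector database can't do.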
Research
This week I was excited to read a paper about using KGs to reduce hallucinations. Don't get too excited - the methodology proposed ended up being less effective than a traditional RAG approach.
So I guess I learned what not to do, which is also valuable!
The author (research linked here) proposed creating a knowledge graph from unstructured data and then using that as the source for a RAG implementation. It seemed overcomplicated to me, and I also noticed the author didn't try (or hasn't yet tried) multiple variations of the methodology.
My takeaway: if you have unstructured data, continue to embed it and store it in a vector database; if you have structured data and want to make it easier to understand the relationships between rows and tables, set up a knowledge graph.
That’s not to say that knowledge graphs aren’t useful in AI applications - there’s a lot of research happening. Here’s a summary if you want to dive in.
- An overview of research happening to build Large Graph Models (Nov 2023)
- An Iterative Reading-then-Reasoning (IRR) framework called StructGPT for solving question-answering tasks over structured data: "Experimental results on 8 datasets show that our approach can boost the zero-shot performance of LLMs by a large margin, and achieve comparable performance as full-data supervised-tuning methods"
- Using graphs to improve reasoning capabilities in LLMs
My Experience
I spent some time generating GQL tuples and learning about Cypher, the query language used by Neo4j and other graph databases, but wasn't able to experiment this week.
I did find that using the new gpt-4o model to generate test data consistently had issues: ChatGPT regularly returned test data that didn't adhere to standard GQL format.
Not to fret - a simple follow-up prompt, "validate the resulting tuples follow the standard for GQL and fix any issues," helped.
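A cheap programmatic check is another option alongside the follow-up prompt. Here's a minimal, hypothetical sketch that sanity-checks LLM-generated triples before they go anywhere near a database; a real pipeline would use a proper GQL/Cypher parser, this just catches obviously malformed shapes:

```python
import re

# Hypothetical validator: expects triples like ("Alice", "KNOWS", "Bob").
# The identifier rule below is an assumption, not part of the GQL standard.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_ ]*$")

def validate_triples(triples):
    """Return (valid, issues) for a list of (subject, predicate, object) tuples."""
    valid, issues = [], []
    for i, t in enumerate(triples):
        if not (isinstance(t, tuple) and len(t) == 3):
            issues.append(f"row {i}: expected a 3-tuple, got {t!r}")
            continue
        bad = [p for p in t if not (isinstance(p, str) and IDENTIFIER.match(p))]
        if bad:
            issues.append(f"row {i}: malformed fields {bad!r}")
        else:
            valid.append(t)
    return valid, issues

valid, issues = validate_triples([
    ("Alice", "KNOWS", "Bob"),   # well-formed
    ("Bob", "WORKS_AT"),         # missing object
    ("Carol", "LIKES", ""),      # empty object
])
```

Anything flagged in `issues` can be bounced back to the model with a repair prompt, which is the same loop I did by hand.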
A good reminder that just as we edit our essays, QA our code, and check our answers before we submit a test, so should our AI agents check their output.
That's all for now. I hope you're enjoying these musings and they inspire you to learn and do your own experiments. -MQ