Expanding my knowledge graph

A crop from DALL-E "two people studying a whiteboard filled with drawings inspired by graph theory. they are dressed casually and have their hands in their pockets. The scene captures them in a spacious, well-lit room, engaging deeply with the complex material on the board."

One of my favorite courses at Carnegie Mellon was Graph Theory. I liked it so much I got a job as a research assistant running simulations and building a horrifyingly ugly UI with Java (meh, it was 2004 and the first and only time I built a desktop app).

GraphDBs and use cases like semantic search didn’t come up for me in my first ~15 years in tech, but in recent years I’ve been hearing about them and “knowledge graphs” regularly.

A customer I’m working with is currently building out a knowledge graph (KG), and we’re looking at how to incorporate it into an AI assistant that has access to their unstructured data in SharePoint.

📝
A knowledge graph is a representation of facts and information based on connections across entities. It consists of nodes and edges that represent data entities and their relationships. For example, a person node “Taylor Swift” and an album node “The Tortured Poets Department” are connected by a “released” relationship. This is a great (technical) overview of Knowledge Graphs if you want to dive in.
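To make the node-and-edge idea concrete, here is a minimal sketch in Python that stores facts as (subject, relationship, object) triples. The `Triple` type, the `neighbors` helper and the two `is_a` facts are names and data I made up for illustration; none of this is a graph database API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relationship: str
    obj: str


# A tiny knowledge graph: each triple is one edge between two nodes.
# The "is_a" edges are illustrative additions that label each node's type.
graph = [
    Triple("Taylor Swift", "released", "The Tortured Poets Department"),
    Triple("Taylor Swift", "is_a", "Person"),
    Triple("The Tortured Poets Department", "is_a", "Album"),
]


def neighbors(graph: list[Triple], subject: str, relationship: str) -> list[str]:
    """Return every node reachable from `subject` via `relationship`."""
    return [
        t.obj
        for t in graph
        if t.subject == subject and t.relationship == relationship
    ]


print(neighbors(graph, "Taylor Swift", "released"))
```

A real graph database adds indexing, a query language and persistence on top of this, but the underlying shape of the data is the same.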

Overview

Neo4j, a leading graph database, launched vector similarity search (VSS) in August 2023. VSS is the technique semantic search uses to find the document chunks most similar to a (vectorized) input query. Having VSS inside the graph database makes it easier to leverage KGs to find relevant structured data in a RAG design.
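For a feel of what "most similar" means, here is a rough sketch of similarity search using cosine similarity, one common choice of measure. This is a brute-force toy, not how Neo4j implements it (production systems use an index rather than comparing against every vector). The chunk ids, the 3-dimensional vectors and the `top_k` helper are all made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk_id: cosine_similarity(query, chunks[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings"; real ones come from an embedding model
# and typically have hundreds or thousands of dimensions.
chunks = {
    "album_review": [0.9, 0.1, 0.0],
    "tour_dates": [0.7, 0.3, 0.1],
    "tax_form": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, chunks))
```

The chunks whose vectors point in nearly the same direction as the query rank first, and the unrelated one drops out.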

Use cases for KGs in a RAG design include:

  • Decision support for medical practitioners, making it easier to understand how symptoms, diseases and treatments relate to one another
  • Translation and sentiment analysis, where KGs provide context and disambiguate words
  • Learning plans personalized to a user’s learning history and the relationships between different concepts
  • Creative writing assistants that leverage the relationships and attributes of characters to generate potential plots and character developments

Although here in Los Angeles that last one is not what people want to talk about, so let's keep it moving...

Research

This week I was excited to read a paper about using KGs to reduce hallucinations. Don’t get too excited - the methodology proposed ended up being less effective than a traditional RAG approach.

So I guess I learned what not to do, which is also valuable!

The author (research linked here) proposed creating a knowledge graph from unstructured data and then using that as the source for a RAG implementation. It seemed overcomplicated to me, and I also noticed the author hasn’t (or hasn’t yet) tried multiple versions of the methodology.

My takeaway - if you have unstructured data, continue to embed it and store it in a vector database; if you have structured data and want to make it easier to understand the relationships between rows and tables, set up a knowledge graph.
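As a rough illustration of the structured-data half of that takeaway, here is a small sketch that turns a foreign key between two tables into explicit graph edges. The table layout, column names and `rows_to_triples` helper are hypothetical, invented for this example.

```python
# Hypothetical rows from two related tables, linked by albums.artist_id.
artists = [{"id": 1, "name": "Taylor Swift"}]
albums = [{"id": 10, "title": "The Tortured Poets Department", "artist_id": 1}]


def rows_to_triples(
    artists: list[dict], albums: list[dict]
) -> list[tuple[str, str, str]]:
    """Turn each foreign key into an explicit (artist, 'released', album) edge."""
    names_by_id = {row["id"]: row["name"] for row in artists}
    triples = []
    for album in albums:
        artist_name = names_by_id.get(album["artist_id"])
        if artist_name is not None:  # skip albums whose artist row is missing
            triples.append((artist_name, "released", album["title"]))
    return triples


print(rows_to_triples(artists, albums))
```

The relationship that was implicit in a join becomes a named edge you can traverse directly.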

That’s not to say that knowledge graphs aren’t useful in AI applications - there’s a lot of research happening. Here’s a summary if you want to dive in.

My Experience

I spent some time generating GQL tuples and learning about Cypher, a query language for GraphDBs, but wasn’t able to experiment this week.

I did find that the new gpt-4o model consistently had issues generating test data. ChatGPT regularly returned test data that didn’t adhere to the standard GQL format.

Not to fret: a simple follow-up prompt, “validate that the resulting tuples follow the standard for GQL and fix any issues”, helped.

A good reminder that just as we edit our essays, QA our code and check our answers before we submit a test, so should our AI agents.
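The same kind of check can also be done in code before trusting generated data. Below is a minimal sketch that assumes a much simpler format than the real GQL standard: each generated item should be a triple of three non-empty strings. The `validate_triples` helper and the sample data are hypothetical and only illustrate the validate-then-fix idea.

```python
def validate_triples(
    candidates: list,
) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Split generated items into well-formed triples and error messages."""
    valid, errors = [], []
    for i, item in enumerate(candidates):
        if not isinstance(item, (list, tuple)) or len(item) != 3:
            errors.append(f"item {i}: expected 3 fields, got {item!r}")
        elif not all(isinstance(field, str) and field.strip() for field in item):
            errors.append(f"item {i}: every field must be a non-empty string")
        else:
            valid.append(tuple(item))
    return valid, errors


# Made-up model output: one good triple and two malformed ones.
generated = [
    ("Taylor Swift", "released", "The Tortured Poets Department"),
    ("Taylor Swift", "released"),     # missing the object
    ("Taylor Swift", "", "Album X"),  # empty relationship
]
valid, errors = validate_triples(generated)
print(valid)
print(errors)
```

The error messages could then be pasted into the follow-up prompt so the model knows exactly which items to fix.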


That's all for now. I hope you're enjoying these musings and they inspire you to learn and do your own experiments. -MQ