What is a Knowledge Graph?

A knowledge graph is a structured representation of real-world entities and the relationships between them, stored as nodes and edges that machines can read, query, and reason over. It connects facts into a network so that information can be retrieved, linked, and inferred rather than treated as isolated text.

A knowledge graph is a way of organizing information as a network of entities (the things in the world, such as people, places, products, or concepts) and the relationships that connect them. Instead of storing facts in isolated tables or documents, a knowledge graph links them together so that a statement like "Paris is the capital of France" is represented as a structured triple: a subject (Paris), a predicate (is the capital of), and an object (France). This structure lets software traverse connections, follow chains of meaning, and surface answers that go beyond keyword matching.
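
The triple structure described above can be sketched in a few lines of plain Python. This is only an illustration, not how a production graph store works: the `Triple` type and the `match` helper are hypothetical names, and the two facts beyond "Paris is the capital of France" are added just to give the traversal something to follow.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The first fact comes from the text; the other two are added for illustration.
TRIPLES = [
    Triple("Paris", "is the capital of", "France"),
    Triple("France", "is located in", "Europe"),
    Triple("Berlin", "is the capital of", "Germany"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# A single lookup: which city is the capital of France?
capitals = match(TRIPLES, predicate="is the capital of", obj="France")

# A two-step traversal: Paris -> France -> Europe.
country = match(TRIPLES, subject="Paris", predicate="is the capital of")[0].obj
continent = match(TRIPLES, subject=country, predicate="is located in")[0].obj

print(capitals[0].subject, country, continent)
```

The second query is the part keyword matching cannot do: no single fact mentions both Paris and Europe, yet following two edges connects them.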

How a knowledge graph works

At its core, a knowledge graph is built from facts expressed in a formal data model, most commonly the Resource Description Framework (RDF), which stores subject-predicate-object triples, or the property-graph model used by databases such as Neo4j, in which both nodes and edges can carry properties. Each triple or edge asserts a fact, for example (Marie Curie, discovered, Radium). Nodes carry types and properties (e.g., the type "Person"), while edges name the relationship between two nodes. A schema or ontology defines which kinds of entities and relations are allowed, which lets the graph enforce consistency and support basic reasoning.
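
As a rough sketch of how a schema can enforce consistency, the example below models typed nodes and checks every new edge against a small table of allowed relations. The `Node`, `Edge`, `Graph`, and `SCHEMA` names are assumptions made for illustration, as are the "Element" type and the `born_in` relation; real systems usually express such constraints in an ontology or constraint language rather than a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    type: str  # e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


# Hypothetical mini-ontology: relation -> (required source type, required target type).
SCHEMA = {
    "discovered": ("Person", "Element"),
    "born_in": ("Person", "Place"),
}


class Graph:
    def __init__(self, schema):
        self.schema = schema
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_edge(self, source, relation, target):
        """Add an edge only if the schema allows it between these node types."""
        if relation not in self.schema:
            raise ValueError(f"unknown relation: {relation!r}")
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both endpoints must be added as nodes first")
        expected = self.schema[relation]
        actual = (self.nodes[source].type, self.nodes[target].type)
        if actual != expected:
            raise ValueError(f"{relation!r} expects {expected}, got {actual}")
        self.edges.append(Edge(source, relation, target))


graph = Graph(SCHEMA)
graph.add_node(Node("Marie Curie", "Person"))
graph.add_node(Node("Radium", "Element", {"symbol": "Ra"}))
graph.add_edge("Marie Curie", "discovered", "Radium")  # accepted by the schema

try:
    graph.add_edge("Radium", "discovered", "Marie Curie")  # wrong direction
    rejected = False
except ValueError:
    rejected = True

print(graph.edges, rejected)
```

Rejecting the reversed edge at write time is the simplest form of the consistency check described above; inference rules would build on the same type information.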

Construction is typically a pipeline of extraction and integration. Source documents, databases, or web pages are parsed, named entities are recognized, candidate relationships are extracted (often with machine-learning models), and the results are resolved against an existing graph to merge duplicates. Query languages such as SPARQL (for RDF graphs) or Cypher (for property graphs) then let applications ask graph-shaped questions, such as "which scientists worked at institutions funded by X and published on Y?", which would require several joins across tables in a relational database.
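
The integration and query steps can be sketched as follows, assuming the extraction step has already produced candidate triples. Everything here is invented placeholder data: the names, the relation labels, and the `ALIASES` table, which stands in for a real entity-resolution model. The query is written as plain Python rather than SPARQL or Cypher so the example stays self-contained.

```python
from collections import defaultdict

# Candidate triples as an upstream extractor might emit them (placeholder data).
CANDIDATES = [
    ("Dr. A. Smith", "worked_at", "Inst. One"),
    ("Alice Smith", "worked_at", "Institute One"),  # same fact, different surface forms
    ("Institute One", "funded_by", "Funder X"),
    ("Alice Smith", "published_on", "Topic Y"),
    ("Bob Jones", "worked_at", "Institute Two"),
    ("Institute Two", "funded_by", "Funder Z"),
    ("Bob Jones", "published_on", "Topic Y"),
]

# Hypothetical alias table standing in for an entity-resolution step.
ALIASES = {"Dr. A. Smith": "Alice Smith", "Inst. One": "Institute One"}


def canonical(name):
    """Map a surface form to its canonical entity name."""
    return ALIASES.get(name, name)


def integrate(candidates):
    """Resolve names to canonical entities; the set drops duplicate facts."""
    return {(canonical(s), p, canonical(o)) for s, p, o in candidates}


def scientists_funded_and_published(triples, funder, topic):
    """Who worked at an institution funded by `funder` and published on `topic`?"""
    index = defaultdict(set)  # (subject, predicate) -> set of objects
    for s, p, o in triples:
        index[(s, p)].add(o)
    people = {s for s, p, _ in triples if p == "worked_at"}
    return sorted(
        person for person in people
        if topic in index[(person, "published_on")]
        and any(
            funder in index[(inst, "funded_by")]
            for inst in index[(person, "worked_at")]
        )
    )


graph = integrate(CANDIDATES)
result = scientists_funded_and_published(graph, "Funder X", "Topic Y")
print(len(graph), result)
```

The two spellings of the first fact collapse into one triple after resolution, and the query follows person, institution, and funder edges directly instead of joining three tables.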

Why it matters

Knowledge graphs give AI systems a shared, explicit substrate of facts to draw on. Search engines use them to power direct-answer panels; recommendation systems use them to find related items through shared attributes; and large language models use them as a source of grounded, up-to-date information through techniques such as retrieval-augmented generation. By making relationships first-class, a knowledge graph also makes provenance and context traceable, which is critical in domains like healthcare, finance, and enterprise knowledge management where hallucination and stale data are real risks.

Key types

  • Open / public knowledge graphs: large, general-purpose graphs such as Wikidata and DBpedia, built from public sources and open to reuse. Google's Knowledge Graph is similarly broad and draws partly on public sources, but it is proprietary; it is used to enrich search and assistants.
  • Enterprise knowledge graphs: private graphs that unify a company's internal data (customers, products, contracts, assets) for analytics, compliance, and AI applications.
  • Domain knowledge graphs: focused graphs in a specific field, such as biomedicine (e.g., UMLS, Gene Ontology) or materials science, where vocabulary control and curation matter more than breadth.
  • Multimodal knowledge graphs: extensions that link text nodes to images, video, or audio, enabling cross-modal retrieval and reasoning.

For all its variants, a knowledge graph's defining feature is that relationships are as queryable as the things they connect — turning scattered facts into a navigable, machine-readable map of a domain.

Frequently Asked Questions

What is the difference between a knowledge graph and a database?
A traditional relational database stores data in tables with predefined schemas and relies on joins to connect records. A knowledge graph stores data as a network of entities and relationships, where the connections themselves are first-class and can be traversed directly. This makes knowledge graphs more flexible for highly connected, evolving, or semantically rich data.

How is a knowledge graph different from a large language model?
A large language model (LLM) is a neural network that learns statistical patterns from text and generates fluent responses, but it does not store facts in a structured, queryable form. A knowledge graph is an explicit, curated store of facts and relationships. The two are often combined: the graph supplies verified, up-to-date facts while the LLM handles natural-language understanding and generation.

What is retrieval-augmented generation (RAG) and how does it use a knowledge graph?
Retrieval-augmented generation is a pattern in which a model first retrieves relevant information from an external source and then generates an answer conditioned on that information. A knowledge graph can serve as the retrieval source, allowing the system to pull specific entities and relationships rather than raw text passages, which improves precision and makes the grounding of each claim inspectable.
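
The retrieval half of this pattern can be sketched as below. It is a deliberately naive illustration: entities are matched by substring, the `retrieve` and `build_prompt` helpers and the source labels are hypothetical, and the call to an actual language model is left out because it depends on whichever model API is in use.

```python
# Each fact carries a source label so every claim in the prompt stays traceable.
# The source labels are placeholders, not real references.
FACTS = [
    ("Marie Curie", "discovered", "Radium", "source-1"),
    ("Marie Curie", "discovered", "Polonium", "source-1"),
    ("Paris", "is the capital of", "France", "source-2"),
]


def retrieve(question, facts):
    """Return the facts whose subject or object is mentioned in the question."""
    q = question.lower()
    return [f for f in facts if f[0].lower() in q or f[2].lower() in q]


def build_prompt(question, facts):
    """Format the retrieved facts as grounding context for a language model."""
    lines = [f"- {s} {p} {o} [{src}]" for s, p, o, src in facts]
    return (
        "Answer using only these facts:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


question = "What did Marie Curie discover?"
context = retrieve(question, FACTS)
prompt = build_prompt(question, context)
print(prompt)
```

Because the context is a list of discrete facts with source labels rather than raw passages, each statement in the generated answer can be checked against the specific triple that supports it.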

Do small teams need a knowledge graph, or is it only for large companies?
Public knowledge graphs such as Wikidata and DBpedia are freely available, and lightweight graph databases make it practical for small teams to build focused graphs for specific projects. The investment pays off only when the data is genuinely relational and the team needs to query connections directly; for simple structured data, a spreadsheet or relational database is usually sufficient.