
Category: Knowledge Graphs

  • Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization

    Ryan C. Barron, Maksim E. Eren, Olga M. Serafimova, Cynthia Matuszek, Boian S. Alexandrov

    May 9, 2025 (v2)

    arXiv | PDF


    The paper presents a jurisdiction-specific legal AI system that combines Retrieval-Augmented Generation (RAG), Vector Stores (VS), Knowledge Graphs (KG), and Hierarchical Non-Negative Matrix Factorization (HNMFk) to improve legal information retrieval and reduce LLM hallucinations. The system was built and tested on New Mexico’s legal corpus: 265 constitutional provisions, 28,251 statutory sections, 5,727 Supreme Court cases, and 10,072 Court of Appeals cases, all scraped from Justia.

    The core innovation is using HNMFk (via the T-ELF library) to automatically discover latent topic clusters within legal documents and then integrating those topics into a Neo4j knowledge graph alongside citation links and metadata. When a user asks a legal question, the system performs both semantic vector search and knowledge graph traversal, then feeds the combined results to an LLM for grounded, citation-backed answers. In evaluations against GPT-4o, Claude 3 Opus, Gemini Pro, and Nemotron-70B, the system provided more accurate, reproducible, and citation-specific answers — particularly for quantitative queries (e.g., counting cases mentioning “habeas corpus”) and citation pattern queries where general-purpose LLMs either refused to answer or hallucinated fake case names.

    Retrieval-Augmented Generation (RAG) is a technique where an LLM doesn’t rely solely on its training data to answer questions. Instead, it first retrieves relevant documents from an external database, then generates an answer grounded in those documents. This reduces hallucinations because the model cites real sources rather than guessing. Think of it like an open-book exam versus a closed-book exam.
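
    The retrieve-then-generate loop can be sketched in a few lines of Python. The corpus, document names, and keyword-overlap scoring below are invented stand-ins for illustration only; the paper's system uses vector search and graph traversal instead.

```python
# Minimal RAG sketch: retrieve relevant documents, then ground the
# prompt in them. Retrieval here is naive keyword overlap, a stand-in
# for real vector search.

CORPUS = {
    "NMSA 41-5-1": "Medical malpractice claims require review by a panel.",
    "State v. Doe": "Habeas corpus petition granted on due process grounds.",
    "NM Const. art. II": "No person shall be deprived of liberty without due process.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by how many query words they share."""
    q_words = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc_id: len(q_words & set(CORPUS[doc_id].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt: sources first, then the question."""
    sources = "\n".join(f"[{d}] {CORPUS[d]}" for d in retrieve(query))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"

print(build_prompt("habeas corpus petition"))
```

    The generated answer can then cite the retrieved document IDs directly, which is what makes the answers verifiable.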

    Vector Store (VS) / Vector Database is a database that stores text as high-dimensional numerical vectors (embeddings) rather than raw strings. When you search, your query is also converted to a vector, and the database finds the most semantically similar documents using distance metrics (cosine similarity). This means “due process violations” can match documents about “constitutional rights infringements” even without shared keywords. The paper uses Milvus as its vector database and OpenAI’s text-embedding-ada-002 for generating embeddings.
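
    The similarity computation itself is simple. A toy illustration with made-up three-dimensional vectors standing in for real 1536-dimensional ada-002 embeddings:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings": semantically close texts get nearby vectors,
# regardless of shared keywords.
due_process      = [0.9, 0.1, 0.2]
rights_infringed = [0.8, 0.2, 0.3]
parking_rules    = [0.1, 0.9, 0.1]

print(cosine(due_process, rights_infringed))  # high (similar meaning)
print(cosine(due_process, parking_rules))     # low (unrelated)
```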

    Knowledge Graph (KG) is a structured database where information is stored as nodes (entities) connected by edges (relationships), forming triplets like (Case A) –[cites]–> (Statute B). Unlike vector stores that find similar documents, knowledge graphs can traverse explicit relationships — e.g., “find all cases that cite this statute and were decided after 2010.” The paper uses Neo4j, the most widely-used graph database.
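
    A minimal sketch of that kind of traversal over hand-written triplets (the case names, dates, and statute numbers below are illustrative, not drawn from the paper's graph):

```python
# A knowledge graph as (subject, relation, object) triplets,
# with case metadata stored alongside.
triplets = [
    ("Case A", "cites", "NMSA 41-5-1"),
    ("Case B", "cites", "NMSA 41-5-1"),
    ("Case C", "cites", "NMSA 52-1-9"),
]
decided = {"Case A": 2008, "Case B": 2015, "Case C": 2012}

def cases_citing(statute: str, after: int) -> list[str]:
    """Traverse explicit edges: cases citing `statute` decided after `after`."""
    return [s for s, rel, o in triplets
            if rel == "cites" and o == statute and decided[s] > after]

print(cases_citing("NMSA 41-5-1", after=2010))  # ['Case B']
```

    A vector store could not answer this reliably, because "decided after 2010" is a structural constraint, not a semantic one.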

    Non-Negative Matrix Factorization (NMF) is a dimensionality reduction technique that decomposes a large matrix into two smaller matrices where all values are non-negative (zero or positive). For text, you start with a TF-IDF matrix (documents x terms) and decompose it into: (1) a topics x terms matrix (what words define each topic) and (2) a documents x topics matrix (which topics each document belongs to). The non-negativity constraint makes results interpretable — each topic is an additive combination of words, not a mix of positive and negative weights.
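
    The decomposition can be demonstrated with scikit-learn's NMF on a toy document-term matrix (this assumes scikit-learn for illustration; the paper itself uses T-ELF, not scikit-learn):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy document-term matrix (4 documents x 5 terms), standing in for TF-IDF.
# Rows 0-1 lean on one vocabulary block, rows 2-3 on another.
X = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 1, 0, 0],
    [0, 0, 3, 2, 2],
    [0, 1, 2, 3, 2],
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)   # documents x topics: topic mix per document
H = model.components_        # topics x terms: word profile per topic

print(W.shape, H.shape)      # (4, 2) (2, 5)
print(W.argmax(axis=1))      # dominant topic per document
```

    Every entry of W and H is non-negative, which is why each topic reads as an additive bundle of words.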

    TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that reflects how important a word is to a document within a corpus. Words that appear frequently in one document but rarely across all documents get high scores. “Estoppel” appearing in 50 of 10,000 cases would score high; “the” appearing everywhere scores near zero. It’s the input matrix that NMF decomposes.
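
    The statistic is simple enough to compute by hand. A minimal sketch using the classic tf × log(N/df) formulation (one of several common TF-IDF variants):

```python
from math import log

# A tiny three-document corpus for illustration.
docs = [
    "the court granted the estoppel claim",
    "the defendant appealed the ruling",
    "the statute bars the claim",
]

def tf_idf(term: str, doc: str) -> float:
    """tf-idf = (term frequency in doc) * log(N / number of docs with term)."""
    words = doc.split()
    tf = words.count(term) / len(words)
    df = sum(term in d.split() for d in docs)
    return tf * log(len(docs) / df)

print(tf_idf("estoppel", docs[0]))  # rare term -> high score
print(tf_idf("the", docs[0]))       # appears in every doc -> 0.0
```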

    Hierarchical NMF with Automatic Model Selection (HNMFk) is an extension of NMF that: (1) automatically determines the optimal number of topics k using bootstrap resampling and silhouette scores (rather than requiring you to guess), and (2) applies NMF recursively — first finding broad topics, then decomposing each into subtopics, creating a tree structure. The paper uses the T-ELF library (Tensor Extraction of Latent Features) developed at Los Alamos National Laboratory. Maximum decomposition depth was set to 2, with a minimum of 100 documents per cluster to continue decomposing.
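
    A much-simplified sketch of the recursion, using plain silhouette-based model selection in place of T-ELF's bootstrap resampling, so this illustrates the idea of the hierarchy, not the library's actual algorithm:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics import silhouette_score

def pick_k(X, k_range=(2, 3, 4)):
    """Return the k whose hard topic assignments are best separated, or None."""
    scores = {}
    for k in k_range:
        W = NMF(n_components=k, init="nndsvda", random_state=0,
                max_iter=500).fit_transform(X)
        labels = W.argmax(axis=1)
        if len(set(labels)) > 1:
            scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get) if scores else None

def hnmf(X, depth=0, max_depth=2, min_docs=10):
    """Split into topics, then recurse into each topic's documents."""
    k = pick_k(X) if depth < max_depth and len(X) >= min_docs else None
    if k is None:
        return {"n_docs": len(X)}          # leaf cluster: stop decomposing
    labels = NMF(n_components=k, init="nndsvda", random_state=0,
                 max_iter=500).fit_transform(X).argmax(axis=1)
    return {"n_docs": len(X),
            "children": [hnmf(X[labels == t], depth + 1, max_depth, min_docs)
                         for t in range(k)]}

# Three synthetic "topic blocks" of 10 documents over 6 terms each.
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson(lam, size=(10, 6))
               for lam in ([5, 5, 0, 0, 0, 0],
                           [0, 0, 5, 5, 0, 0],
                           [0, 0, 0, 0, 5, 5])]).astype(float)
tree = hnmf(X)
print(tree["n_docs"], len(tree["children"]))
```

    The max_depth and min_docs parameters mirror the paper's settings (depth 2, minimum 100 documents per cluster), scaled down for the toy data.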

    Silhouette Score is a metric measuring how well-separated clusters are. It ranges from -1 to 1: high values mean data points are well-matched to their own cluster and poorly matched to neighboring clusters. It is used here to determine the optimal number of topics at each level of the hierarchy.
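
    For intuition, scikit-learn's silhouette_score on two well-separated toy clusters, scored once with correct labels and once with deliberately scrambled ones:

```python
from sklearn.metrics import silhouette_score

# Two tight, well-separated 2-D clusters.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the true grouping
bad_labels  = [0, 1, 0, 1, 0, 1]   # mixes the two clusters together

print(silhouette_score(points, good_labels))  # high, near 1
print(silhouette_score(points, bad_labels))   # near zero or negative
```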

    Neo4j is a graph database that stores data natively as nodes and relationships (not in tables). Cypher is its query language, filling the role that SQL fills for relational databases. Example: MATCH (c:Case)-[:CITES]->(s:Statute) WHERE s.id = 'NMSA 41-5-1' RETURN c finds all cases citing a specific statute.

    ROUGE-L is a metric for evaluating text summaries by measuring the longest common subsequence (LCS) between generated and reference text. Higher ROUGE-L means the generated text preserves more of the reference’s sentence structure. Used here to evaluate AI-generated legal answers against expert reference answers.
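
    The LCS at the heart of the metric is a short dynamic program. This sketch reports the F1 variant of ROUGE-L; implementations differ in how they weight precision against recall:

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

ref = "the court granted the motion to dismiss"
cand = "the court granted the motion"
print(rouge_l_f1(cand, ref))  # 5/6 ~ 0.833: high overlap, shorter answer
```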

    NLI (Natural Language Inference) Entailment is a classification task in which a model determines whether one text (the hypothesis) logically follows from another (the premise). Labels: entailment (follows), contradiction (conflicts), or neutral (unrelated). It is used here to check whether AI-generated answers are logically supported by reference legal texts.

    FactCC and SummaC are evaluation metrics for factual consistency. FactCC fine-tunes a model on labeled correct/incorrect summaries to detect factual errors; SummaC aggregates entailment scores across sentence pairs. Both check whether generated text stays faithful to the source documents, which is critical in legal contexts where a fabricated citation could lead to sanctions.

    LEGAL-BERT is a version of BERT pre-trained on legal text (court opinions, legislation, contracts) rather than general web text. It better understands legal language nuances like “estoppel,” “res judicata,” and “habeas corpus.” Referenced as a baseline for domain-specific embeddings.

    LexGLUE is a benchmark dataset for legal NLP with 7 tasks spanning contracts, court opinions, and legislation. Referenced as one of the standard evaluation frameworks for legal AI systems.

  • Capturing Legal Reasoning Paths from Facts to Law in Court Judgments using Knowledge Graphs

    Ryoma Kondo, Riona Matsuoka, Takahiro Yoshida, Kazuyuki Yamasawa, Ryohei Hisano
    August 24, 2025
    PDF

    The paper tackles legal reasoning by building a knowledge graph from 648 Japanese administrative court decisions that makes the hidden reasoning path machine-readable. The system uses large language models to extract the key components of legal reasoning (factual findings, the legal provisions cited, and how the court applied those provisions to the facts) and connects them through a purpose-built legal ontology. The result is a structured graph in which you can trace the logical steps from a fact to the legal norm it triggers to the outcome it produces. In retrieval tests, the system outperformed standard LLM baselines at finding the correct legal provisions given a set of facts, meaning the knowledge graph adds genuine precision beyond what a general-purpose AI can achieve alone.

    Knowledge Graph (KG) is a database that stores information as a network of entities and relationships rather than rows and columns. In a legal context, entities might be facts, court decisions, legal provisions, parties, and the relationships between them capture how they connect (e.g., “Fact A triggers Provision B which leads to Outcome C”). Knowledge graphs make implicit relationships explicit and queryable.

    Legal Reasoning Path is the structured logical chain a court follows from factual findings to a legal conclusion: facts → applicable legal norm → application of the norm to the facts → decision. In most court opinions this path is written as prose and must be inferred by a human reader. This paper’s core contribution is extracting and storing these paths as structured data.
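
    A minimal sketch of what a stored path looks like once extracted: each node points to the next via a typed edge, so the chain can be recovered mechanically. The node labels here are invented for illustration, not taken from the paper's data:

```python
# Each node maps to (relation, next node), encoding one step of the chain
# facts -> applicable norm -> application -> decision.
edges = {
    "Fact: permit denied without a hearing":
        ("triggers", "Norm: hearing requirement"),
    "Norm: hearing requirement":
        ("applied_to", "Application: denial was procedurally defective"),
    "Application: denial was procedurally defective":
        ("leads_to", "Outcome: decision revoked"),
}

def trace(start: str) -> list[str]:
    """Follow edges node-by-node to recover the full reasoning chain."""
    path = [start]
    while path[-1] in edges:
        _relation, nxt = edges[path[-1]]
        path.append(nxt)
    return path

for step in trace("Fact: permit denied without a hearing"):
    print(step)
```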

    Ontology is a formal specification of concepts and relationships within a domain — essentially a vocabulary with rules. A legal ontology defines what entities exist in legal reasoning (facts, norms, parties, outcomes) and how they can relate to each other. It constrains the knowledge graph so that extracted information follows a consistent structure across all cases.
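
    One way to see what the constraint buys: a triplet is admitted into the graph only if its typed pattern appears in the schema. The types and relations below are illustrative, not the paper's actual ontology:

```python
# Schema of allowed (subject type, relation, object type) patterns.
SCHEMA = {
    ("Fact", "triggers", "Norm"),
    ("Norm", "leads_to", "Outcome"),
    ("Case", "contains", "Fact"),
}

def is_valid(subj_type: str, relation: str, obj_type: str) -> bool:
    """A triplet is admissible only if its typed pattern is in the schema."""
    return (subj_type, relation, obj_type) in SCHEMA

print(is_valid("Fact", "triggers", "Norm"))  # True: matches the schema
print(is_valid("Fact", "leads_to", "Norm"))  # False: wrong relation for types
```

    Rejecting nonconforming extractions is what keeps the structure consistent across all 648 decisions.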

    Expert Annotation means having human domain experts (here, legal professionals) manually label examples to create a “gold standard” dataset for evaluating the system’s accuracy. The annotated examples serve as the benchmark: if the system’s extracted reasoning paths match what the experts identified, the system is considered accurate.