Building knowledge graphs from CNC machine manuals
A typical Fanuc CNC manual runs 2,000 pages. Inside those pages live thousands of parameters, hundreds of alarm codes, dozens of procedures, and a web of cross-references that would make a cartographer weep. We turned a mixed corpus of CNC documentation into 779 structured, searchable knowledge units. Here is how that works, and why it matters.
What a knowledge unit actually is
A knowledge unit (KU) is a single, self-contained fact extracted from documentation. It is not a paragraph or a page; it is a structured piece of knowledge with a type, a claim, conditions, limitations, a confidence score, and source references.
For example, here is a procedure KU extracted from lathe programming documentation:
Example KU — Type: procedure
Claim: A CNC lathe program header requires a sequence of mandatory G-codes: G18 (XZ plane), work offset (e.g. G54), movement mode, metric G71 or imperial G70 system, and absolute dimensioning G90.
Conditions: CNC lathe programming, Fanuc-compatible controllers
Confidence: 0.95
This is not a copy of the manual text. It is a distilled, typed, validated piece of knowledge that a search engine or AI assistant can reason about.
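As a rough illustration, the fields listed above map naturally onto a small record type. This is a minimal sketch: the class and field names (`KnowledgeUnit`, `ku_type`, `sources`, and so on) are hypothetical and are not the pipeline's actual schema. It encodes the lathe header example, which lists no limitations or source references, so those fields stay empty.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """One self-contained fact. Field names are illustrative assumptions."""
    ku_type: str                                          # e.g. "procedure", "common_mistake"
    claim: str                                            # the distilled statement itself
    conditions: list[str] = field(default_factory=list)  # when the claim applies
    limitations: list[str] = field(default_factory=list) # when it does not
    confidence: float = 0.0                               # 0.0 to 1.0
    sources: list[str] = field(default_factory=list)     # page or section references

    def __post_init__(self) -> None:
        # Reject confidence values outside the unit interval.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


lathe_header = KnowledgeUnit(
    ku_type="procedure",
    claim=(
        "A CNC lathe program header requires a sequence of mandatory G-codes: "
        "G18 (XZ plane), work offset (e.g. G54), movement mode, metric G71 or "
        "imperial G70 system, and absolute dimensioning G90."
    ),
    conditions=["CNC lathe programming", "Fanuc-compatible controllers"],
    confidence=0.95,
)
```

Keeping conditions and limitations as separate lists, rather than folding them into the claim text, is what lets a downstream system filter KUs by context.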
The extraction pipeline
Building a knowledge graph from CNC documentation is a multi-stage process. Each stage adds structure to what starts as a flat collection of pages.
Domain ontology construction
Before touching a single document, we build a domain-specific extraction ontology. For CNC machining, this includes entity types (tools, parameters, alarm codes, G-codes), relationship types (causes, requires, conflicts-with), and knowledge unit types (procedure, common_mistake, decision_rule, safety_rule, specification).
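One way to picture such an ontology is as a set of enumerations plus constraints on which entity types each relationship may connect. The sketch below uses the types named above; the `ALLOWED` signatures and the `is_valid` helper are hypothetical, chosen only to show how an ontology can reject malformed extractions.

```python
from enum import Enum


class EntityType(Enum):
    TOOL = "tool"
    PARAMETER = "parameter"
    ALARM_CODE = "alarm_code"
    G_CODE = "g_code"


class RelationType(Enum):
    CAUSES = "causes"
    REQUIRES = "requires"
    CONFLICTS_WITH = "conflicts_with"


class KUType(Enum):
    PROCEDURE = "procedure"
    COMMON_MISTAKE = "common_mistake"
    DECISION_RULE = "decision_rule"
    SAFETY_RULE = "safety_rule"
    SPECIFICATION = "specification"


# Hypothetical domain/range constraints: (subject type, object type) pairs per relation.
ALLOWED = {
    RelationType.CAUSES: {(EntityType.PARAMETER, EntityType.ALARM_CODE)},
    RelationType.REQUIRES: {
        (EntityType.G_CODE, EntityType.PARAMETER),
        (EntityType.TOOL, EntityType.PARAMETER),
    },
    RelationType.CONFLICTS_WITH: {(EntityType.G_CODE, EntityType.G_CODE)},
}


def is_valid(subj: EntityType, rel: RelationType, obj: EntityType) -> bool:
    """Return True if the ontology permits this (subject, relation, object) shape."""
    return (subj, obj) in ALLOWED.get(rel, set())
```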
Visual detection
Every page is analyzed for visual structure: tables, diagrams, labeled figures, code blocks, headers, and procedure formatting. This is not OCR — it is layout understanding. A DocLayNet-trained model identifies what each region of the page represents.
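The layout model's output is a list of labeled regions per page. DocLayNet defines 11 region classes, listed below. This sketch routes detected regions to downstream parsers. The `Region` type and the parser names in `ROUTES` are hypothetical, and no actual model call is shown.

```python
from dataclasses import dataclass

# The 11 region classes defined by the DocLayNet dataset.
DOCLAYNET_LABELS = {
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
}

# Hypothetical routing table: which downstream parser handles each class.
ROUTES = {
    "Table": "table_parser",
    "Picture": "diagram_parser",
    "List-item": "procedure_parser",
    "Section-header": "outline_builder",
    "Title": "outline_builder",
    "Text": "text_parser",
    "Caption": "text_parser",
    "Formula": "text_parser",
    "Footnote": "text_parser",
}


@dataclass
class Region:
    page: int
    label: str                                  # class predicted by the layout model
    bbox: tuple[float, float, float, float]     # x0, y0, x1, y1 in page coordinates


def route(regions: list[Region]) -> dict[str, list[Region]]:
    """Group regions by target parser; page headers and footers are dropped."""
    grouped: dict[str, list[Region]] = {}
    for region in regions:
        if region.label not in DOCLAYNET_LABELS:
            raise ValueError(f"unknown layout label: {region.label}")
        target = ROUTES.get(region.label)
        if target is None:  # Page-header / Page-footer carry no domain knowledge
            continue
        grouped.setdefault(target, []).append(region)
    return grouped
```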
Structural parsing
Tables become typed rows and columns. Diagrams become labeled entities. Procedures become ordered steps. Cross-references become explicit links. The document transitions from a visual artifact to a structured data source.
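To make "tables become typed rows" concrete, here is a minimal sketch that turns raw table cells into typed records. It assumes a hypothetical three-column alarm table (No. | Message | Remedy). The cell contents are placeholders echoing this article's Alarm 410 example, not text copied from a Fanuc manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRow:
    number: int     # alarm number, converted from a string cell to int
    message: str
    remedy: str


EXPECTED_HEADER = ["No.", "Message", "Remedy"]  # hypothetical column layout


def parse_alarm_table(rows: list[list[str]]) -> list[AlarmRow]:
    """Convert raw cells (first row = header) into typed AlarmRow records."""
    header, *body = rows
    if [h.strip() for h in header] != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    records = []
    for cells in body:
        # Unpacking raises ValueError if a row has the wrong number of cells.
        number, message, remedy = (c.strip() for c in cells)
        records.append(AlarmRow(int(number), message, remedy))
    return records


raw = [
    ["No.", "Message", "Remedy"],
    ["410", "Servo error", "(remedy text from the manual)"],
]
alarms = parse_alarm_table(raw)
```

Validating the header before parsing means a misdetected table fails loudly instead of producing silently mislabeled records.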
Entity and relationship extraction
Using the domain ontology, the system identifies entities (G18, Alarm 410, Spindle Speed), relationships (G18 selects XZ plane, Alarm 410 caused by servo error), and facts (maximum spindle speed is 12,000 RPM under condition X).
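The article does not say which extraction technique the system uses. As a stand-in, the simple rule-based sketch below shows the shape of the output: a set of entities plus (subject, relation, object) triples. The regular expressions, including the "selects" pattern, are hypothetical illustrations.

```python
import re

G_CODE = re.compile(r"\bG\d{1,3}\b")
ALARM = re.compile(r"\bAlarm\s+(\d+)\b", re.IGNORECASE)
# Hypothetical surface pattern for one relation: "<G-code> selects [the] <... plane>".
SELECTS = re.compile(r"\b(G\d{1,3})\s+selects\s+(?:the\s+)?([\w ]+?plane)\b")


def extract(sentence: str) -> tuple[set[str], list[tuple[str, str, str]]]:
    """Return entities and relation triples found in one sentence."""
    entities = set(G_CODE.findall(sentence))
    entities |= {f"Alarm {n}" for n in ALARM.findall(sentence)}
    triples = [(g, "selects", obj) for g, obj in SELECTS.findall(sentence)]
    return entities, triples


ents, triples = extract("G18 selects the XZ plane; Alarm 410 may follow.")
# ents == {"G18", "Alarm 410"}
# triples == [("G18", "selects", "XZ plane")]
```

Hand-written patterns like these break on paraphrase, which is one reason an ontology-guided extractor is needed for a 2,000-page corpus.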
Knowledge unit assembly and validation
Extracted facts are assembled into typed knowledge units, each carrying a confidence score and a source reference and each cross-validated against other KUs. For values extracted from diagrams, the Visual Evidence Gate provides a second proof channel.
Each stage of the pipeline adds structure to raw document content.
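The article does not describe how the Visual Evidence Gate combines its two channels. The sketch below is one plausible rule, stated as an assumption. When a value read from text and the same value read from a diagram agree within a tolerance, their confidences are combined with a noisy-OR, which assumes the channels are independent. When they disagree, the KU is flagged. The names `Evidence` and `gate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    value: float        # numeric value as extracted, e.g. a spindle speed in RPM
    confidence: float   # the extractor's own confidence, 0..1


def gate(text: Evidence, visual: Optional[Evidence], rel_tol: float = 0.01) -> tuple[float, str]:
    """Combine a text-extracted value with an optional diagram-extracted value."""
    if visual is None:
        return text.confidence, "single_source"
    scale = max(abs(text.value), abs(visual.value))
    if abs(text.value - visual.value) <= rel_tol * scale:
        # Channels agree: noisy-OR, i.e. 1 - P(both channels wrong).
        combined = 1 - (1 - text.confidence) * (1 - visual.confidence)
        return combined, "confirmed"
    # Channels disagree: keep the weaker confidence and flag for review.
    return min(text.confidence, visual.confidence), "conflict"
```

For example, a 12,000 RPM value read at 0.8 confidence from text and confirmed at 0.7 from a diagram would come out at 0.94.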
What 779 KUs look like in practice
The CNC knowledge graph we built from a mixed corpus of Fanuc, Haas, and general machining documentation produced 1,140 raw KUs. After type-aware quality filtering, 779 remained — each meeting minimum confidence thresholds for its type.
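"Type-aware" filtering means the minimum confidence depends on the KU's type. The article does not publish its thresholds, so the values below are hypothetical. They reflect one reasonable design choice: safety-critical warnings need more evidence than advisory tips.

```python
# Hypothetical per-type minimum confidence; the article's actual thresholds are not published.
THRESHOLDS = {
    "warning": 0.90,    # safety-critical: demand the most evidence
    "procedure": 0.80,
    "claim": 0.75,
    "tip": 0.60,        # advisory: a lower bar is tolerable
}
DEFAULT_THRESHOLD = 0.70


def keep(ku_type: str, confidence: float) -> bool:
    """Apply the threshold for this KU's type, falling back to a default."""
    return confidence >= THRESHOLDS.get(ku_type, DEFAULT_THRESHOLD)


def filter_kus(kus: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Keep (type, confidence) pairs that clear their type-specific bar."""
    return [ku for ku in kus if keep(*ku)]
```

A single global threshold would either let marginal warnings through or discard useful tips. Per-type thresholds avoid that trade-off.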
The breakdown tells an interesting story about what CNC documentation actually contains:
Knowledge Unit Distribution
225 procedures — step-by-step workflows for setup, calibration, and maintenance
192 claims — factual assertions about machine behavior and capabilities
139 common mistakes — errors with symptoms and corrections
88 decision rules — if-then conditions for operational choices
77 tips — best practices from experienced operators
26 definitions — precise technical terminology
15 warnings — safety-critical information
12 causal relationships — cause-and-effect chains
The surprising finding is the density of "common mistakes." In this corpus, 139 of the 779 KUs (nearly 18%) describe things that go wrong and how to fix them. This is the knowledge that matters most on the shop floor, and it is precisely the knowledge that is hardest to find with traditional search.
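The shares can be recomputed directly from the counts in the distribution list. Note that the eight listed types sum to 774, so five of the 779 retained KUs are not itemized in the breakdown as given.

```python
counts = {
    "procedure": 225, "claim": 192, "common_mistake": 139, "decision_rule": 88,
    "tip": 77, "definition": 26, "warning": 15, "causal": 12,
}
TOTAL_KEPT = 779  # KUs remaining after type-aware filtering

listed = sum(counts.values())
for ku_type, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{ku_type:15s} {n:4d}  {100 * n / TOTAL_KEPT:5.1f}%")
print(f"listed: {listed} of {TOTAL_KEPT}")
```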
Why this matters operationally
A knowledge graph is not a fancier search index. It is a queryable representation of expertise. When a technician asks "how do I reduce chatter in machining?", the system does not find pages that contain the word "chatter." It finds the relevant decision rules, common mistakes, and procedures — typed, validated, and linked — and synthesizes a coherent answer with source citations.
The difference between searching a manual and querying a knowledge graph is the difference between looking for a word in a dictionary and asking an expert a question.
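The chatter example above can be sketched as concept-level retrieval. Query words are mapped to graph concepts, and matching KUs are ranked by type and then by confidence. Everything here is a hypothetical illustration: the `KU` type, the synonym map, the type priorities, and the sample claims. Synthesizing the final answer is out of scope. Note that the decision rule is found even though its text never contains the word "chatter".

```python
from dataclasses import dataclass


@dataclass
class KU:
    ku_type: str
    claim: str
    concepts: frozenset[str]   # graph concepts this KU is linked to
    confidence: float
    source: str                # citation carried through to the answer


# Hypothetical synonym map from query words to graph concepts.
CONCEPTS = {"chatter": "vibration", "vibration": "vibration", "rattle": "vibration"}
# Assumed ranking: actionable types first.
TYPE_PRIORITY = {"decision_rule": 0, "common_mistake": 1, "procedure": 2}


def query(text: str, kus: list[KU]) -> list[KU]:
    """Return KUs linked to the query's concepts, best types first."""
    words = {w.strip("?.,!").lower() for w in text.split()}
    wanted = {CONCEPTS[w] for w in words if w in CONCEPTS}
    hits = [ku for ku in kus if ku.concepts & wanted]
    return sorted(hits, key=lambda ku: (TYPE_PRIORITY.get(ku.ku_type, 9), -ku.confidence))


# Illustrative placeholder KUs, not taken from the actual graph.
kus = [
    KU("procedure", "Shorten tool stick-out before re-running the cut.",
       frozenset({"vibration"}), 0.90, "source-A"),
    KU("decision_rule", "If vibration marks appear, change spindle speed.",
       frozenset({"vibration"}), 0.85, "source-B"),
    KU("tip", "Check coolant concentration weekly.",
       frozenset({"coolant"}), 0.95, "source-C"),
]
answer = query("how do I reduce chatter in machining?", kus)
for ku in answer:
    print(f"[{ku.ku_type}] {ku.claim} ({ku.source})")
```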
You can try this yourself on our live demo, which runs on exactly this CNC knowledge graph with 779 indexed KUs.