Named Entity Recognition (NER) is a fundamental task within Natural Language Understanding (NLU) that aims to locate and classify “named entities” in unstructured text into predefined categories. These categories typically include names of people, organizations, locations, expressions of time, quantities, monetary values, and percentages. NER systems identify these entity mentions in a text and label each with a semantic class, transforming unstructured text into structured, actionable data.
Context: Relation to LLMs and Generative Engine Optimization (GEO)
NER is a crucial Preprocessing step that structures data for Large Language Models (LLMs) and is vital for improving Information Retrieval (IR) in search engines.
- Knowledge Graph Construction: NER is a primary component used to build and populate Knowledge Graphs. By identifying entities (e.g., “Apple,” “Tim Cook,” “Cupertino”), NER links them to a database of real-world facts. This structured knowledge is then used by LLMs during Inference to ground their answers in factual data, improving Relevance and preventing Hallucination.
- Semantic Search and Query Analysis: In Neural Search, NER helps a search engine understand the core subjects of a user’s query. For example, in the query “Eiffel Tower tickets in June,” NER identifies “Eiffel Tower” (Location/Landmark) and “June” (Date). The search system can then use these structured entities to filter results and prioritize pages that match those specific entities, leading to higher quality Generative Snippets.
- LLM Input Structuring: While modern LLMs are capable of performing NER implicitly, explicit NER is often run on long source documents (especially in Retrieval-Augmented Generation (RAG) pipelines) to create metadata. This metadata helps the Vector Search component quickly retrieve chunks of text containing the required entities.
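To make the metadata idea concrete, here is a minimal sketch of tagging document chunks with entities and filtering retrieval candidates by entity. The gazetteer lookup is a deliberate simplification standing in for a trained NER model, and the function names (`extract_entities`, `index_chunks`, `filter_by_entity`) are hypothetical, not from any specific RAG library.

```python
# Toy sketch: attach entity metadata to chunks, then filter candidates by entity.
# The gazetteer stands in for a real NER model (a hypothetical simplification).

GAZETTEER = {
    "apple": ("Apple", "ORG"),
    "tim cook": ("Tim Cook", "PER"),
    "cupertino": ("Cupertino", "LOC"),
}

def extract_entities(text: str) -> set[tuple[str, str]]:
    """Return (surface form, label) pairs found by exact gazetteer match."""
    lowered = text.lower()
    return {ent for key, ent in GAZETTEER.items() if key in lowered}

def index_chunks(chunks: list[str]) -> list[dict]:
    """Attach entity metadata to each chunk, as a RAG pipeline might at index time."""
    return [{"text": c, "entities": extract_entities(c)} for c in chunks]

def filter_by_entity(index: list[dict], entity_name: str) -> list[str]:
    """Keep only chunks whose metadata mentions the requested entity."""
    return [rec["text"] for rec in index
            if any(name == entity_name for name, _ in rec["entities"])]

chunks = [
    "Tim Cook announced new products in Cupertino.",
    "The recipe calls for two cups of diced apple.",
]
index = index_chunks(chunks)
print(filter_by_entity(index, "Tim Cook"))
```

In a production pipeline this entity filter would typically run alongside, not instead of, vector similarity: the entity metadata narrows the candidate set before or after the embedding search ranks it.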
How NER Works
NER is a sequence-labeling task, where the system assigns a label to every Token in the input sentence.
Consider the sentence: “Sundar Pichai visited London last week.”
| Token | NER Tag | Category |
|---|---|---|
| Sundar | B-PER | Person (Beginning) |
| Pichai | I-PER | Person (Inside) |
| visited | O | Outside (not an entity) |
| London | B-LOC | Location (Beginning) |
| last | B-TIM | Time (Beginning) |
| week | I-TIM | Time (Inside) |
- B (Beginning): Marks the start of a multi-word entity.
- I (Inside): Marks a word inside a multi-word entity.
- O (Outside): Marks a word that is not part of any named entity.
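Decoding BIO tags back into complete entities is a standard chunking step. The sketch below (the helper name `bio_to_spans` is my own) groups B-/I- tagged tokens from the example sentence into spans, treating “last week” as a single multi-word Time entity.

```python
# Sketch: decode BIO tags into (entity text, type) spans via standard BIO chunking.

def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group B-/I- tagged tokens into complete entity spans."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity begins
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)           # continue the open entity
        else:                               # "O" (or a malformed I-) closes any open span
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:                             # flush an entity that ends the sentence
        spans.append((" ".join(current), current_type))
    return spans

tokens = ["Sundar", "Pichai", "visited", "London", "last", "week"]
tags   = ["B-PER", "I-PER", "O", "B-LOC", "B-TIM", "I-TIM"]
print(bio_to_spans(tokens, tags))
# → [('Sundar Pichai', 'PER'), ('London', 'LOC'), ('last week', 'TIM')]
```

The B/I distinction is what lets the decoder separate two adjacent entities of the same type: a fresh B- tag closes the previous span even when no O token intervenes.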
Modern NER is typically handled by large Transformer Architecture models like BERT, which use the surrounding Context Window to resolve ambiguous entities (e.g., correctly tagging “Apple” as an Organization when discussing stocks, or as a food item when discussing fruit).
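The principle of context-sensitive tagging can be illustrated with a deliberately crude heuristic. This is not how a Transformer works (BERT learns these cues from data rather than from hand-written word lists), and the cue sets below are invented for the example.

```python
# Minimal illustration of context-sensitive disambiguation: a hand-rolled
# heuristic, NOT how BERT does it -- a Transformer learns such cues from data.

ORG_CUES = {"stock", "shares", "ceo", "earnings"}    # assumed context words
FOOD_CUES = {"fruit", "pie", "eat", "juice"}

def disambiguate_apple(sentence: str) -> str:
    """Label 'Apple' by inspecting the surrounding context words."""
    words = set(sentence.lower().replace(".", "").split())
    if words & ORG_CUES:
        return "ORG"
    if words & FOOD_CUES:
        return "FOOD"
    return "UNKNOWN"

print(disambiguate_apple("Apple stock rose after earnings."))  # → ORG
print(disambiguate_apple("She ate an apple with her pie."))    # → FOOD
```

A Transformer generalizes far beyond fixed cue lists because its attention mechanism weighs every token in the context window, but the underlying idea is the same: the label of an ambiguous token is determined by its neighbors.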
Related Terms
- Natural Language Understanding (NLU): The broader field that includes NER.
- Knowledge Graph: The structured database populated by the entities found via NER.
- Tokenization: The initial step of breaking text into the units that NER tags.