Positive-Unlabeled (PU) Learning

Positive-Unlabeled (PU) Learning is a specialized area of machine learning, typically categorized as a semi-supervised technique. It applies when the training dataset contains a set of labeled positive (P) examples while the remaining, usually much larger, set of examples is unlabeled (U): a mix of positive and negative examples in which the negatives are never explicitly identified. The goal of PU learning is to train a high-performing classifier using only the known positive examples and the unlabeled data.
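To make the setup concrete, here is a minimal sketch (a hypothetical illustration using scikit-learn, with made-up sample sizes) that builds a PU dataset by hiding the labels of most positive examples:

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Fully labeled binary data: y = 1 (positive), y = 0 (negative).
X, y = make_classification(n_samples=2000, weights=[0.7], random_state=0)

# PU setting: only a fraction of positives keep their label; everything
# else becomes "unlabeled" (a mix of hidden positives and negatives).
label_frac = 0.3
pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=int(label_frac * len(pos_idx)), replace=False)

s = np.zeros_like(y)        # s = 1: known positive, s = 0: unlabeled
s[labeled] = 1

P = X[s == 1]               # labeled positives
U = X[s == 0]               # unlabeled mix of positives and negatives
```

The later examples in this entry only ever see P and U; the hidden labels exist solely to simulate the problem.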


Context: Relation to LLMs and Search

PU learning is highly relevant in data-scarce or data-collection-heavy areas of Generative Engine Optimization (GEO), particularly for tasks where collecting explicit negative examples is prohibitively difficult, expensive, or impractical.

  • Relevance Assessment (Implicit Feedback): In a search or Retrieval-Augmented Generation (RAG) system, positive feedback is easy to collect: a user clicking a search result or giving a Generative Snippet a thumbs-up. However, a user not clicking a result (an unlabeled example) does not necessarily mean it was irrelevant; they may simply have found what they needed in an earlier result. PU learning helps train the underlying Ranking Algorithm by distinguishing truly negative results hidden in the unlabeled set from relevant results that were merely ignored (see the sketch after this list).
  • Cold Start Problems: When an LLM application is first launched, there is very little explicitly labeled negative feedback. PU learning can quickly bootstrap the initial Fine-Tuning or Reward Model (RM) for Reinforcement Learning from Human Feedback (RLHF) by leveraging the plentiful positive examples and inferring the negative examples from the unlabeled data.
  • Data Curation: PU techniques are used to clean and curate training datasets for LLMs. If a desired type of data (positive) is known, PU learning can effectively identify and filter out “noisy” or undesirable examples (implied negative) from a vast, mixed corpus.
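As a hedged illustration of the implicit-feedback case above, the snippet below shows one plausible way to map click logs onto PU labels; the log format and field names are assumptions, not a fixed schema:

```python
# Hypothetical click-log rows: (query, doc_id, clicked).
# A click is a trusted positive; a non-click is NOT a negative --
# the user may simply have stopped reading, so it stays unlabeled.
click_log = [
    ("pu learning", "doc_1", True),
    ("pu learning", "doc_2", False),
    ("pu learning", "doc_3", False),
]

P_pairs = [(q, d) for q, d, clicked in click_log if clicked]      # labeled positives
U_pairs = [(q, d) for q, d, clicked in click_log if not clicked]  # unlabeled mix
```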

The Mechanics: Converting to Supervised Learning

The challenge in PU learning is that the classifier must learn to distinguish positive from negative without ever seeing a definitive negative label. Common approaches involve transforming the problem into a standard Supervised Learning task:

1. Two-Step Approach (Spy Technique)

  • Step 1: Identify Reliable Negatives (RN): A small fraction of the positive (P) examples are “planted” (spied) into the unlabeled (U) set. A classifier is then trained to distinguish between the P set and the U set. The examples from U that the classifier is very confident are not positive (and are not the planted spies) are designated as Reliable Negative (RN) examples.
  • Step 2: Final Classification: A final classifier (e.g., a Support Vector Machine (SVM) or a deep network) is trained using the original labeled Positive (P) data and the newly identified Reliable Negative (RN) data. A sketch of both steps follows below.
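Here is a minimal sketch of both steps, assuming the P and U arrays from the first snippet; logistic regression, the 15% spy fraction, and the 5% spy quantile are illustrative choices, not part of the technique's definition:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def spy_reliable_negatives(P, U, spy_frac=0.15, spy_quantile=0.05, seed=0):
    """Step 1 of the two-step approach: find Reliable Negatives via spies."""
    rng = np.random.default_rng(seed)

    # Plant a random fraction of P into U as "spies".
    spy_idx = rng.choice(len(P), size=max(1, int(spy_frac * len(P))), replace=False)
    spies = P[spy_idx]
    P_rest = np.delete(P, spy_idx, axis=0)
    U_plus_spies = np.vstack([U, spies])

    # Train a classifier to separate remaining P (label 1) from U + spies (label 0).
    X = np.vstack([P_rest, U_plus_spies])
    y = np.concatenate([np.ones(len(P_rest)), np.zeros(len(U_plus_spies))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Threshold: the score below which almost no spy falls.
    spy_scores = clf.predict_proba(spies)[:, 1]
    t = np.quantile(spy_scores, spy_quantile)

    # Unlabeled examples scoring below the spy threshold are Reliable Negatives.
    u_scores = clf.predict_proba(U)[:, 1]
    return U[u_scores < t]

# Step 2: train the final classifier on P vs. the Reliable Negatives.
RN = spy_reliable_negatives(P, U)
X2 = np.vstack([P, RN])
y2 = np.concatenate([np.ones(len(P)), np.zeros(len(RN))])
final_clf = LogisticRegression(max_iter=1000).fit(X2, y2)
```

The threshold is set so that nearly all spies score above it; anything in U scoring below that bar is very unlikely to be a disguised positive, which is exactly what makes it a Reliable Negative.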

2. Cost-Sensitive/Risk Minimization

  • Mechanism: This method modifies the standard loss function (e.g., Cross-Entropy Loss) to account for the fact that the unlabeled data (U) contains a mix of both positive and negative examples. Instead of mining explicit negatives, it minimizes an unbiased (or non-negative) estimate of the true risk, which requires estimating the class prior: the probability that a sample drawn from the unlabeled set is positive. This is a more complex but statistically rigorous approach.
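As one concrete instance, here is a minimal PyTorch sketch of the non-negative PU risk estimator of Kiryo et al. (2017); the sigmoid surrogate loss and the prior argument (the assumed class prior) are the key moving parts:

```python
import torch

def nn_pu_loss(scores_p, scores_u, prior):
    """Non-negative PU risk:
    R_pu = prior * R_p(+1) + max(0, R_u(-1) - prior * R_p(-1)),
    where R_x(t) is the mean surrogate loss for predicting label t on set x.
    """
    loss = lambda z, t: torch.sigmoid(-t * z)   # sigmoid surrogate loss
    r_p_pos = loss(scores_p, +1).mean()         # positives predicted positive
    r_p_neg = loss(scores_p, -1).mean()         # positives predicted negative
    r_u_neg = loss(scores_u, -1).mean()         # unlabeled treated as negative
    # The clamp keeps the implied negative-class risk from going below zero,
    # which stabilizes training with flexible models such as deep networks.
    return prior * r_p_pos + torch.clamp(r_u_neg - prior * r_p_neg, min=0.0)
```

Here scores_p and scores_u are the model's raw scores on the labeled positives and the unlabeled examples; the loss can be dropped into any standard training loop.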

PU Learning vs. Semi-Supervised Learning

While PU learning is technically a form of Semi-Supervised Learning (SSL), SSL generally implies having small sets of both labeled positive and labeled negative examples plus a large pool of unlabeled data. PU learning is more constrained: there are labeled positives, but explicit negative examples are entirely absent.


Related Terms

  • Supervised Learning (SL): The target state, as PU learning attempts to convert a difficult problem into a standard SL problem.
  • Reward Model (RM): A model often trained with PU techniques when human feedback is sparse or when only positive preferences are easy to collect.
  • Precision: A key search metric; by identifying reliable negatives within the unlabeled set, PU learning reduces the false positives a naively trained classifier would otherwise surface.
