Why 90% of LLMs Fail in TV Personalization

Paolo Cremonesi, CTO of ContentWise, benchmarks LLMs against industrial standards.

Large Language Models (LLMs) enhance OTT recommender systems by bridging the gap between collaborative filtering and semantic content understanding.

While traditional systems rely on user behavior, LLMs leverage deep semantic embeddings to analyze plot, actors, and metadata. However, for most broadcasters, a hybrid approach (integrating LLM-generated embeddings into traditional algorithms) remains the most cost-effective and accurate path to scalability.
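The hybrid blend described above can be sketched in a few lines. This is a minimal illustration, not ContentWise's actual implementation: the item names, score dictionaries, and the `alpha` weighting parameter are all hypothetical, and in practice the semantic scores would come from precomputed LLM embeddings.

```python
from typing import Dict

def hybrid_score(cf_scores: Dict[str, float],
                 semantic_scores: Dict[str, float],
                 alpha: float = 0.7) -> Dict[str, float]:
    """Blend collaborative-filtering scores with LLM-derived semantic scores.

    alpha weights the behavioral (CF) signal; (1 - alpha) weights the
    semantic signal. Items missing from either source default to 0.
    """
    items = set(cf_scores) | set(semantic_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
              + (1 - alpha) * semantic_scores.get(item, 0.0)
        for item in items
    }

# Toy example: one title known only behaviorally, one known only semantically.
cf = {"title_a": 0.9, "title_b": 0.4}
sem = {"title_b": 0.8, "title_c": 0.6}
blended = hybrid_score(cf, sem, alpha=0.7)
```

A linear blend like this keeps the cheap, scalable CF pipeline in the critical path while letting semantic similarity surface cold-start titles (here, `title_c`) that have no behavioral signal yet.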

The strategic outlook on LLMs in OTT

The emergence of Generative AI has created a paradox in the video industry: while the potential for semantic understanding is vast, the path to ROI is narrow. The core challenge lies in moving beyond the hype of Large Language Models to find where they truly outperform established industrial standards.


The evolution of video personalization

The context of today’s recommendation landscape is rooted in a lesson now more than a decade old. Since the $1M Netflix Prize concluded in 2009, the industry has learned that there is a significant divide between academic models and real-world industrial utility. As we transition from simple collaborative filtering to deep learning and now LLMs, the goal remains the same: scaling human-like content intuition across millions of users without sacrificing system performance or economic sustainability.

The goal isn’t to choose between LLMs and traditional Recommender Systems (RS). It is to orchestrate them. By pairing the semantic understanding of LLMs with the proven scalability of RS, you can build a discovery engine that is both human-like and production-ready.


The “efficiency gap” in generative AI

The most critical finding from the field is that 90% of recently published deep learning techniques fail to outperform traditional systems in production-grade environments.

Using an LLM out-of-the-box for direct recommendations often leads to a popularity bias no better than basic metrics.

Furthermore, the high computational cost of real-time LLM inference creates a sustainability gap that only hyperscalers can currently afford to bridge.
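The "basic metric" that an out-of-the-box LLM often fails to beat is a plain popularity baseline. A minimal sketch, with hypothetical user/title data: recommend the globally most-watched titles to every user, with no personalization at all.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def popularity_baseline(interactions: Iterable[Tuple[str, str]],
                        n: int = 3) -> List[str]:
    """Recommend the globally most-watched titles to every user.

    No personalization: just raw view counts across the whole catalog.
    This is the baseline a popularity-biased LLM tends to replicate.
    """
    counts = Counter(item for _user, item in interactions)
    return [item for item, _ in counts.most_common(n)]

# Toy interaction log: (user, title) pairs.
views = [("u1", "hit_show"), ("u2", "hit_show"), ("u3", "hit_show"),
         ("u1", "drama"), ("u2", "drama"), ("u3", "niche_doc")]
top2 = popularity_baseline(views, n=2)  # ['hit_show', 'drama']
```

If an LLM's raw recommendations correlate strongly with this list, the expensive inference is adding cost, not accuracy.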

“The future is hybrid: LLMs adding explainability and semantic depth on top of scalable, data-driven recommender systems.”

Prof. Paolo Cremonesi

Key technical benchmarks for media leaders

💬 The hybrid advantage

The most efficient use of LLMs is in batch mode to perform feature engineering and metadata enrichment, rather than real-time recommendation.

🎭 Semantic embeddings

LLMs excel at creating embeddings: mathematical representations of the semantic meaning of content and the mood of a user.

💸 The scalability hurdle

Fine-tuning a proprietary LLM is only economically sustainable for hyperscalers like YouTube; for average OTT players, the computational cost outweighs the quality gains.

⚖️ Bias risks

LLMs often carry inherent cultural and popularity biases that can degrade the diversity of recommendations if not balanced by traditional behavioral data.
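The batch-mode pattern behind the first two points can be sketched as follows. This is an illustrative example only: the catalog titles and three-dimensional vectors are toy stand-ins for embeddings an LLM API would produce offline, so that serving time involves only a cheap nearest-neighbor lookup.

```python
import math
from typing import Dict, List

# Hypothetical output of an offline batch job: each title mapped to an
# embedding vector (real systems use hundreds of dimensions).
catalog_embeddings: Dict[str, List[float]] = {
    "noir_thriller": [0.9, 0.1, 0.0],
    "crime_drama":   [0.8, 0.2, 0.1],
    "kids_cartoon":  [0.0, 0.1, 0.9],
}

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(title: str, k: int = 1) -> List[str]:
    """Return the k catalog titles closest to `title` in embedding space."""
    query = catalog_embeddings[title]
    scored = [(t, cosine(query, v))
              for t, v in catalog_embeddings.items() if t != title]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in scored[:k]]

neighbors = most_similar("noir_thriller")  # ['crime_drama']
```

Because the embeddings are precomputed, no LLM inference happens at recommendation time, which is exactly what makes the hybrid approach economically viable for non-hyperscalers.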

[01:50 min] Defining Success: What actually constitutes the "best" recommender system in a commercial environment?

[04:15 min] The Modern Recommender Pipeline: Mapping where LLMs fit into feature engineering and user interaction.

[06:40 min] Implementation Scenarios: A comparison of "out-of-the-box" usage vs. fine-tuned hybrid systems.

[09:20 min] The Economics of AI: Why building a proprietary LLM is a game reserved for the world's largest tech entities.

[12:15 min] Accuracy Benchmarks: Data-backed results comparing LLMs to traditional popularity-based baselines.

Paolo Cremonesi

A visionary academic and technology entrepreneur, Paolo is co-founder and CTO of ContentWise, where he brings deep expertise in data science, AI algorithms, and computing. As a professor of Computer Science Engineering at Politecnico di Milano, he leads innovative research in quantum computing and recommender systems.

Paolo holds a Ph.D. in computer science and a Master’s degree in aerospace engineering, and has authored over 200 papers and several patents, marking him as a prominent innovator in data science and AI.