© 2026 ContentWise. All rights reserved.
Why most LLMs fail in real-world TV personalization
ContentWise has spent more than 20 years benchmarking recommender systems, comparing every significant academic and commercial approach against industrial performance criteria. When the Netflix Prize winner was announced in 2010, ContentWise published its finding that the solution was useless for production. Netflix confirmed it two years later.
This paper applies the same benchmarking discipline to LLMs. The findings are specific, evidence-based, and directly actionable for operators building or upgrading recommendation infrastructure today.
First, you can use the LLM out of the box, without fine-tuning, or you can fine-tune it on your content catalog and user behavioral data, adjusting its weights so that it models your user base more closely. Fine-tuning comes at an additional cost.
Second, you can use the LLM alone, or you can use it together with a traditional recommender system. These two dimensions give you four combinations.
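As a minimal sketch, the four combinations can be enumerated as the product of the two choices. The labels and the function name below are illustrative assumptions, not terminology from the paper.

```python
from itertools import product

# Hypothetical labels for the two dimensions described above.
TUNING = ("out-of-the-box", "fine-tuned")
PAIRING = ("LLM alone", "LLM + traditional recommender")


def deployment_scenarios():
    """Return every (tuning, pairing) combination as a list of tuples."""
    return list(product(TUNING, PAIRING))


if __name__ == "__main__":
    # Print one line per deployment scenario.
    for tuning, pairing in deployment_scenarios():
        print(f"{tuning}, {pairing}")
```

Each tuple pairs one tuning choice with one pairing choice, so two options on each dimension yield four scenarios.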
95%
of deep learning LLM-based recommendation techniques fail in production
50+
models benchmarked across 4 deployment scenarios
20 years
of R&D in recommendation systems and deep learning
Coming June 2027! Seats are limited and reserved for a select group of peers. Save the date now!
(Or email info@contentwise.com for more information.)