How to Interpret MaxDiff Results

A practical guide to reading MaxDiff output. Learn how to interpret utility scores, identify priority tiers, compare segments, and present findings to stakeholders.

What MaxDiff Output Looks Like

MaxDiff analysis produces a utility score for each item in your list. Once rescaled, each score represents the item's share of total preference, and the scores are ratio-scaled: if Item A scores 12 and Item B scores 4, respondents prefer A three times as much as B. That precision is what separates MaxDiff from rating scales, where most items cluster near the top of the scale.

Understanding how to read these scores, spot meaningful patterns, and translate them into decisions is where the method's real value sits.

Reading Utility Scores

The Basics

Scores are typically rescaled to sum to 100 across all items. Each score represents the percentage of total preference that item captures. Here's an example from a 15-item feature prioritization study:

Rank  Feature                  Score
1     Real-time collaboration  12.8
2     Offline access           10.1
3     Custom reporting          8.4
4     API access                7.9
5     Mobile app                7.2
6     SSO/SAML                  6.5
7     Slack integration         5.8
8     Dark mode                 5.1
9     Custom branding           4.6
10    Gantt charts              4.2
11    Time tracking             3.8
12    Resource allocation       3.5
13    Guest access              2.9
14    Custom fields             2.4
15    Emoji reactions           1.8

What You Can Say

  • Real-time collaboration is about 7x as preferred as emoji reactions (12.8 / 1.8 ≈ 7.1).
  • The top 3 features capture 31.3% of total preference (12.8 + 10.1 + 8.4).
  • The bottom 5 features account for only 14.4% combined (3.8 + 3.5 + 2.9 + 2.4 + 1.8).

These are ratio-scale statements. You can't make them with Likert data, where the same features might all score between 3.8 and 4.5 on a 5-point scale.
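
The arithmetic behind statements like these is easy to script. The following is a minimal Python sketch, not output from any particular MaxDiff tool. It uses the scores from the example table, and the variable names are illustrative.

```python
# Scores copied from the example table, keyed by feature name.
scores = {
    "Real-time collaboration": 12.8,
    "Offline access": 10.1,
    "Custom reporting": 8.4,
    "API access": 7.9,
    "Mobile app": 7.2,
    "SSO/SAML": 6.5,
    "Slack integration": 5.8,
    "Dark mode": 5.1,
    "Custom branding": 4.6,
    "Gantt charts": 4.2,
    "Time tracking": 3.8,
    "Resource allocation": 3.5,
    "Guest access": 2.9,
    "Custom fields": 2.4,
    "Emoji reactions": 1.8,
}

# Sort from most to least preferred so slices map onto "top N" / "bottom N".
values = sorted(scores.values(), reverse=True)

top_vs_bottom = values[0] / values[-1]  # ratio of the best item to the worst
top3_share = sum(values[:3])            # combined score of the top three items
bottom5_share = sum(values[-5:])        # combined score of the bottom five items

print(f"Top vs. bottom ratio: {top_vs_bottom:.1f}x")
print(f"Top 3 combined: {top3_share:.1f}")
print(f"Bottom 5 combined: {bottom5_share:.1f}")
```

Computing the figures from the score table, rather than by hand, keeps the headline numbers in a report consistent with the underlying data.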

What You Can't Say

MaxDiff scores are relative. They don't tell you whether any item is "important" in an absolute sense. If all 15 features are niche and unexciting, the top-ranked one still scores 12.8. The score means it's the most preferred, not that customers urgently want it.

To address this limitation, consider anchored MaxDiff, which adds a threshold question ("Would you actually pay for / use this feature?") that separates genuinely desired items from items that are merely "best of the set."

Identifying Priority Tiers

Raw utility scores give you a continuous ranking, but business decisions usually need discrete categories: "build this," "consider this," "skip this."

Look for Natural Breaks

Plot the scores as a bar chart sorted by value. You'll often see visible gaps where clusters form:

  • Tier 1 (must-build): Items 1-3 in the example above. Clear separation from the rest, collectively capturing 31% of preference.
  • Tier 2 (should-build): Items 4-7. Solid scores but noticeably lower than Tier 1.
  • Tier 3 (nice-to-have): Items 8-12. Moderate scores, no single item stands out.
  • Tier 4 (deprioritize): Items 13-15. Minimal preference, safe to defer.

The break points depend on your specific data distribution. Sometimes there's a clear cliff between items 5 and 6. Sometimes the curve is smooth and you need to set thresholds based on business logic ("top quartile gets resourced this quarter").
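
If you'd rather scan for gaps numerically than by eye, one rough approach is to compute the drop between each pair of adjacent items and flag the large ones. This is a sketch under stated assumptions: the scores come from the example table, and the `MIN_GAP` threshold is a hypothetical choice, not a standard.

```python
# Example scores in rank order (rank 1 first).
values = [12.8, 10.1, 8.4, 7.9, 7.2, 6.5, 5.8, 5.1,
          4.6, 4.2, 3.8, 3.5, 2.9, 2.4, 1.8]

# Drop from each item to the next one down; gaps[i] sits just below rank i + 1.
# Rounding removes floating-point noise such as 2.7000000000000011.
gaps = [round(a - b, 1) for a, b in zip(values, values[1:])]

MIN_GAP = 1.0  # hypothetical threshold for calling a drop a "natural break"

# Ranks after which a candidate tier boundary falls.
break_after_ranks = [i + 1 for i, gap in enumerate(gaps) if gap >= MIN_GAP]

for rank in break_after_ranks:
    print(f"Drop of {gaps[rank - 1]} points after rank {rank}")
```

On the example scores, the only large drops sit at the very top of the list and the rest of the curve is fairly smooth. That is the situation where thresholds based on business logic end up doing most of the work.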

Use the Ratio Test

For any pair of items, check the ratio. If Item A is 2x or more preferred than Item B, there's meaningful separation. If they're within 1.2x of each other (e.g., 5.8 vs. 5.1), they're effectively tied for practical purposes.
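
As a sketch, the rule of thumb can be wrapped in a small helper. The function name and the "in between" label for ratios that fall between the two thresholds are assumptions added for illustration. The 1.2x and 2x cutoffs come from the text above.

```python
def ratio_test(score_a: float, score_b: float,
               tie_ratio: float = 1.2, clear_ratio: float = 2.0) -> str:
    """Label a pair of rescaled scores using the rule-of-thumb ratios."""
    if score_a <= 0 or score_b <= 0:
        raise ValueError("the ratio test needs positive rescaled scores")
    # Always divide the larger score by the smaller so the ratio is >= 1.
    ratio = max(score_a, score_b) / min(score_a, score_b)
    if ratio >= clear_ratio:
        return "meaningful separation"
    if ratio <= tie_ratio:
        return "effectively tied"
    return "in between"  # hypothetical label: the rule of thumb is silent here


print(ratio_test(5.8, 5.1))   # Slack integration vs. dark mode
print(ratio_test(12.8, 5.8))  # Real-time collaboration vs. Slack integration
```

Pairs that land in the middle band need judgment or a formal test rather than a rule of thumb.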

Segment Comparisons

MaxDiff's analytical value multiplies when you compare scores across segments. The same item list often produces dramatically different priorities for different audiences.

How to Compare

Run HB estimation separately for each segment, then compare the item rankings side by side. Look for:

  • Items that rank high for all segments: These are table stakes. Everyone wants them, so they're foundational features, not differentiators.
  • Items that rank high for one segment but low for another: These are segment-specific opportunities. API access might rank #2 for developers and #14 for marketing teams.
  • Items where segments directly conflict: Rare but actionable. If enterprise buyers prioritize compliance features and SMB buyers rank those same features last, you can't serve both with the same product positioning.
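
A side-by-side comparison can be sketched in a few lines. Everything in this example is hypothetical: the segment scores are invented for illustration, and the `RANK_GAP` cutoff for calling an item divergent is an arbitrary choice.

```python
# Hypothetical rescaled scores for two segments (illustrative only).
developers = {"API access": 14.0, "SSO/SAML": 9.0,
              "Mobile app": 6.0, "Custom branding": 3.0}
marketing = {"API access": 2.5, "SSO/SAML": 5.0,
             "Mobile app": 11.0, "Custom branding": 12.0}


def ranks(scores: dict) -> dict:
    """Map each item to its rank within a segment (1 = most preferred)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


dev_rank = ranks(developers)
mkt_rank = ranks(marketing)

RANK_GAP = 2  # hypothetical cutoff for flagging an item as segment-specific

# Items whose rank differs between segments by at least RANK_GAP places.
divergent = [item for item in developers
             if abs(dev_rank[item] - mkt_rank[item]) >= RANK_GAP]

for item in developers:
    flag = "diverges" if item in divergent else "similar rank"
    print(f"{item}: developers #{dev_rank[item]}, marketing #{mkt_rank[item]} ({flag})")
```

Rank gaps are only a first pass for spotting candidates. They say nothing about whether the underlying score differences are statistically reliable.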

Statistical Testing

Don't assume that a rank difference equals a meaningful difference. If Feature X scores 8.4 in Segment A and 7.1 in Segment B, those scores might not be statistically distinguishable once you account for their confidence intervals. Most HB software reports respondent-level utilities or standard errors that let you quantify that uncertainty. Use them to test whether between-segment differences are real before making product decisions based on segment splits.
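
One simple way to run such a check is a two-sided z-test on the difference between two segment means. This is a sketch, not the only valid approach. It assumes the segments are independent samples and that a normal approximation is reasonable. The standard errors (0.6 and 0.7) are hypothetical values chosen for illustration, and the function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def segment_difference_z(mean_a: float, se_a: float,
                         mean_b: float, se_b: float) -> tuple:
    """Two-sided z-test for the difference between two independent segment means."""
    # Standard error of the difference between independent estimates.
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Feature X: 8.4 in Segment A, 7.1 in Segment B; standard errors are hypothetical.
z, p = segment_difference_z(8.4, 0.6, 7.1, 0.7)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these illustrative standard errors, the 1.3-point gap does not clear a conventional 0.05 threshold. If your software gives you posterior draws, comparing the draws directly is an alternative to this approximation.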

Presenting MaxDiff Results to Stakeholders

Lead With the Priority Ranking

A horizontal bar chart sorted by score is the most intuitive visualization. Color-code the tiers (green for must-build, yellow for consider, gray for defer). This single visual answers the core question: "What should we focus on?"

Show Segment Contrasts

A side-by-side bar chart or heat map comparing segment rankings gives stakeholders the "so what" beyond the overall ranking. "Enterprise wants API access and SSO. SMB wants mobile and offline. Our product roadmap should reflect that split."

Use Ratios, Not Raw Scores

"Collaboration is 7x more preferred than emoji reactions" is more impactful and easier to remember than "collaboration scored 12.8 and emoji reactions scored 1.8." Translate the data into language that matches how your stakeholders make decisions.

Avoid Over-Precision

Don't present MaxDiff scores to two decimal places and treat the ranking as a rigid hierarchy. The difference between rank 7 (5.8) and rank 8 (5.1) may not be meaningful. Present tiers and clear priority groups, not a 15-item ranked list where every position matters.

Common Interpretation Mistakes

  1. Treating relative as absolute. The top-scored item isn't necessarily "important." It's the most preferred in your list. If the list contains weak items, the winner is the least weak.

  2. Ignoring the bottom of the list. The lowest-scoring items tell you what to stop investing in. That's often more actionable than the top of the list.

  3. Comparing scores across studies. A score of 12.8 in one study and 12.8 in another doesn't mean the same thing. MaxDiff scores are relative to the item list. Different items, different study, different scores.

  4. Reporting without context. Always pair MaxDiff scores with segment data and, if possible, behavioral data (usage, purchase history). Stated preference and actual behavior don't always align.

Frequently Asked Questions

Can I compare MaxDiff scores to Likert importance ratings?

Not directly. They measure different things on different scales. You can run both in the same survey (on different items) and compare directionally, but the numbers aren't on the same metric.

What's a "good" MaxDiff score?

There's no universal benchmark. Scores are relative to your item list and study design. Focus on the spread between items (is there meaningful separation?) and the ratio between top and bottom items (a wide ratio means clear differentiation; a narrow ratio means no strong preferences).

Should I report raw utilities or rescaled scores?

Rescaled scores (summing to 100) are easier for non-technical audiences to understand and compare. Raw utilities are more useful for advanced analysis. Report rescaled scores to stakeholders and keep raw utilities for your analysis files.
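
To make the relationship between the two concrete, here is a minimal sketch of one simple rescaling convention: exponentiate the raw logit-scale utilities and normalize so they sum to 100. Estimation packages differ in the exact transformation they apply, so treat this as an illustration and check your tool's documentation. The raw utilities and the function name are hypothetical.

```python
from math import exp


def rescale_to_100(raw_utilities: dict) -> dict:
    """Convert raw logit-scale utilities into shares that sum to 100."""
    # Subtracting the maximum before exponentiating avoids overflow
    # and does not change the resulting shares.
    top = max(raw_utilities.values())
    weights = {item: exp(u - top) for item, u in raw_utilities.items()}
    total = sum(weights.values())
    return {item: 100 * w / total for item, w in weights.items()}


# Hypothetical raw utilities (illustrative only).
raw = {"Item A": 1.2, "Item B": 0.1, "Item C": -1.3}
rescaled = rescale_to_100(raw)

for item, share in rescaled.items():
    print(f"{item}: {share:.1f}")
```

Under this convention, raw utilities can be negative and have no natural zero, while the rescaled shares are all positive. That is why ratio statements belong to the rescaled scores, not the raw ones.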

