How to Design a MaxDiff Survey
The Design Decisions That Matter
A MaxDiff survey has fewer moving parts than a conjoint study, but the design decisions still make or break your results. The item list, the number of items per set, the number of sets, and the experimental design all affect data quality, respondent experience, and what you can learn from the analysis.
Getting these right takes about an hour of upfront planning. Getting them wrong wastes your entire sample budget.
Step 1: Build Your Item List
How Many Items
Target 10-30 items. Below 10, a simple drag-and-drop ranking gives you roughly the same information with less setup. Above 30, the survey gets long because each item needs to appear at least 3 times across all sets. If you start with 40+ candidates, run a qualitative pre-screen (stakeholder interviews, customer feedback analysis, or a quick open-ended survey) to cut the list before building the MaxDiff.
Writing Good Items
Each item should be:
- Self-contained. Respondents need to understand it without context. "Real-time collaboration" works. "The thing we discussed in the Q3 roadmap meeting" doesn't.
- At the same abstraction level. Don't mix broad categories ("Better design") with specific features ("Add dark mode toggle in settings"). Respondents can't meaningfully compare items at different scales.
- Distinct from other items. Two items that sound similar ("Easy to use" and "Simple interface") will split votes and both score lower than they should. Merge them or keep the more specific one.
- Written in consumer language. Use the words your customers use, not internal product jargon. "Save drafts automatically" beats "Enable autosave persistence."
A Practical Test
Read your item list to someone outside the project. If they ask "What does this mean?" or "How is this different from that one?", revise. Ambiguous items produce ambiguous data.
Step 2: Set Items Per Set
Show 4-5 items per set. This is the number of options respondents see on each screen.
- 4 items per set is the most common default. It works well across all item counts and keeps the cognitive task simple.
- 5 items per set extracts slightly more information per task. Use it when you have 20+ items and want to keep the total number of sets manageable.
- 3 items per set is technically valid but inefficient. Each task still yields one "best" and one "worst" choice, but from fewer items, so you need more screens to give every item the same number of appearances.
- 6+ items per set adds cognitive load without proportional data gain. Respondents start scanning rather than evaluating.
When in doubt, use 4.
Step 3: Calculate the Number of Sets
Use this formula:
Minimum sets = (total items x minimum appearances) / items per set, rounded up to the next whole number
"Minimum appearances" should be 3 for reliable estimation. More appearances produce tighter estimates, but each additional set adds respondent time.
| Total Items | Items Per Set | Min Appearances | Sets Needed |
|---|---|---|---|
| 12 | 4 | 3 | 9 |
| 15 | 4 | 3 | 12 (11.25 rounded up) |
| 20 | 4 | 3 | 15 |
| 20 | 5 | 3 | 12 |
| 25 | 5 | 3 | 15 |
| 30 | 5 | 3 | 18 |
Keep total sets at or below 20. Beyond that, respondent fatigue degrades data quality. If your calculation requires more than 20 sets, either reduce the item count or accept 2 appearances per item (less ideal but workable with larger samples).
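The formula and the 20-set cap can be sketched in a few lines of Python. The function name `sets_needed` and its parameters are illustrative assumptions, not part of any survey platform's API. The result is rounded up because a fraction of a set cannot be fielded.

```python
def sets_needed(total_items, items_per_set, min_appearances=3, max_sets=20):
    """Return (sets, within_cap) for a MaxDiff design.

    sets       -- minimum number of sets, rounded up to a whole set
    within_cap -- True if the design stays at or below max_sets
    """
    slots = total_items * min_appearances
    # Integer ceiling division avoids floating-point rounding surprises.
    sets = (slots + items_per_set - 1) // items_per_set
    return sets, sets <= max_sets


# Reproduce the combinations from the table above.
for items, per_set in [(12, 4), (15, 4), (20, 4), (20, 5), (25, 5), (30, 5)]:
    sets, ok = sets_needed(items, per_set)
    note = "" if ok else " (over the 20-set cap)"
    print(f"{items} items, {per_set} per set: {sets} sets{note}")
```

If `within_cap` comes back `False`, the options are the ones described above: trim the item list, show 5 items per set, or drop to 2 appearances per item with a larger sample.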
Step 4: Generate the Experimental Design
The experimental design assigns specific items to specific sets. It controls which items appear together and how often each item shows up. Don't create this manually.
Good designs meet three criteria:
- Level balance: Every item appears the same number of times across all sets.
- Pair balance: Every pair of items appears together roughly the same number of times. No two items should always appear in the same set (which would confound their effects) or never appear together.
- Positional balance: Items appear in different positions within the set (top, middle, bottom) to avoid position bias.
Most MaxDiff-capable platforms (including Quali-Fi) generate balanced or near-balanced incomplete block designs automatically. After generation, spot-check two things: does every item appear the intended number of times (3 in a minimum design), and does any pair of items appear together more than twice? If an item's count is off or a pair is over-represented, regenerate.
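To make the spot-check concrete, the sketch below builds a design with a simple greedy heuristic and then counts item and pair frequencies. It is only an illustration: it is not the algorithm any particular platform uses, the names `generate_design` and `check_design` are hypothetical, and a platform-generated design should still be preferred for a real study.

```python
import random
from collections import Counter
from itertools import combinations


def generate_design(n_items, items_per_set, n_sets, seed=0):
    """Greedy sketch: prefer the least-shown items, then the least-paired ones."""
    rng = random.Random(seed)
    shown = [0] * n_items      # appearances per item so far
    together = Counter()       # co-appearances per unordered pair
    design = []
    for _ in range(n_sets):
        chosen = []
        while len(chosen) < items_per_set:
            def cost(i):
                overlap = sum(together[frozenset((i, j))] for j in chosen)
                return (shown[i], overlap, rng.random())  # random tie-break
            candidates = [i for i in range(n_items) if i not in chosen]
            chosen.append(min(candidates, key=cost))
        for i in chosen:
            shown[i] += 1
        for pair in combinations(chosen, 2):
            together[frozenset(pair)] += 1
        rng.shuffle(chosen)    # vary the on-screen position of each item
        design.append(chosen)
    return design


def check_design(design):
    """Count how often each item and each pair of items appears."""
    shown = Counter(item for s in design for item in s)
    pairs = Counter(frozenset(p) for s in design for p in combinations(s, 2))
    return shown, pairs


design = generate_design(n_items=12, items_per_set=4, n_sets=9)
shown, pairs = check_design(design)
print("distinct appearance counts:", sorted(set(shown.values())))
print("most frequent pair appears together", max(pairs.values()), "times")
```

`check_design` works on any design expressed as a list of sets, so it can also be pointed at a design exported from a survey platform. Pairs that never appear together are absent from `pairs`, so comparing `len(pairs)` with the total number of possible pairs shows how many are missing.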
Step 5: Choose the Task Framing
The question stem shapes how respondents interpret the task. Common framings:
| Research Goal | Task Wording |
|---|---|
| Feature importance | "Which is most/least important to you?" |
| Message appeal | "Which is most/least appealing?" |
| Purchase influence | "Which would most/least influence your decision?" |
| Brand perception | "Which best/least describes [Brand]?" |
| Employee priorities | "Which matters most/least in your workplace?" |
Match the framing to your research question. If you're prioritizing features for a product roadmap, "most important / least important" is clearest. If you're testing advertising claims, "most compelling / least compelling" gets closer to what you're actually measuring.
Step 6: Embed Within a Larger Survey
MaxDiff rarely runs as a standalone survey. Typical survey flow:
- Screening questions (confirm eligibility, quotas)
- Context-setting questions (usage behavior, current satisfaction)
- MaxDiff exercise (the core prioritization task)
- Follow-up questions (why they chose their top pick, demographics, firmographics)
Place the MaxDiff exercise early-to-middle in the survey, before respondent fatigue sets in. Avoid placing it after a long battery of Likert or open-ended questions.
Total survey time should stay under 12 minutes. The MaxDiff portion takes 3-5 minutes for a 15-20 item study, leaving 7-9 minutes for supporting questions.
Step 7: Soft Launch and Quality Check
Field 30-50 responses before full launch. Check:
- Completion rate: Below 70% signals the survey is too long or confusing.
- Median completion time: Under 2 minutes for the full survey suggests speeders. Over 20 minutes suggests confusion.
- Item frequency: Verify each item appears the expected number of times in the collected data. Design errors sometimes surface only in live data.
- Extreme patterns: Flag respondents who always pick the first item as "best" and last item as "worst" (position bias), or who complete the MaxDiff section in under 30 seconds.
Fix issues before committing to full sample.
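The checks above can be scripted against an export of the soft-launch data. The sketch below assumes a hypothetical record layout (`id`, `completed`, `total_sec`, `maxdiff_sec`, and a `tasks` list holding the items shown in display order plus the best and worst picks); real exports will differ by platform, so treat the field names as placeholders. The thresholds are the ones listed above.

```python
from collections import Counter
from statistics import median


def soft_launch_report(responses, items, expected_appearances=3):
    """Summarize soft-launch quality checks for a list of response records."""
    completed = [r for r in responses if r["completed"]]
    rate = len(completed) / len(responses) if responses else 0.0
    med = median(r["total_sec"] for r in completed) if completed else None
    report = {
        "completion_rate": rate,
        "completion_ok": rate >= 0.70,                       # below 70% is a warning
        "median_total_sec": med,
        "timing_ok": med is not None and 120 <= med <= 1200,  # 2 to 20 minutes
        "frequency_errors": [],     # respondents whose item counts are off
        "flagged_respondents": [],  # position pattern or MaxDiff speeders
    }
    for r in completed:
        tasks = r["tasks"]
        shown = Counter(item for t in tasks for item in t["shown"])
        if any(shown[item] != expected_appearances for item in items):
            report["frequency_errors"].append(r["id"])
        first_best_last_worst = bool(tasks) and all(
            t["best"] == t["shown"][0] and t["worst"] == t["shown"][-1]
            for t in tasks
        )
        if first_best_last_worst or r["maxdiff_sec"] < 30:
            report["flagged_respondents"].append(r["id"])
    return report


# Toy data: 3 items, 2 appearances each, far smaller than a real study.
responses = [
    {"id": "r1", "completed": True, "total_sec": 400, "maxdiff_sec": 90,
     "tasks": [{"shown": ["A", "B"], "best": "A", "worst": "B"},
               {"shown": ["B", "C"], "best": "C", "worst": "B"},
               {"shown": ["C", "A"], "best": "A", "worst": "C"}]},
    {"id": "r2", "completed": True, "total_sec": 300, "maxdiff_sec": 20,
     "tasks": [{"shown": ["A", "B"], "best": "B", "worst": "A"},
               {"shown": ["B", "C"], "best": "B", "worst": "C"},
               {"shown": ["C", "A"], "best": "A", "worst": "C"}]},
    {"id": "r3", "completed": False, "total_sec": 45, "maxdiff_sec": 0,
     "tasks": []},
]
report = soft_launch_report(responses, items=["A", "B", "C"],
                            expected_appearances=2)
print(report)
```

In this toy data, `r2` is flagged for finishing the MaxDiff section in under 30 seconds, and the completion rate of two in three falls below the 70% threshold.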
Quick Design Checklist
- 10-30 items, all at the same abstraction level
- 4-5 items per set
- 9-18 sets for typical designs, never more than 20 (each item appears 3+ times)
- Balanced experimental design (auto-generated)
- Task framing matches your research question
- MaxDiff placed early-to-middle in the survey, total survey under 12 minutes
- Soft launch with 30-50 respondents before full field
Frequently Asked Questions
Can I randomize which items each respondent sees?
In standard MaxDiff, every respondent sees every item. The randomization is in which items appear together in each set, not in which items are included. "Sparse" designs where respondents see only a subset of items exist but require larger samples and more complex analysis.
What if two of my items are very similar?
Merge them. Similar items split votes and both score lower than a combined version would. If you're genuinely unsure whether they're distinct, run a quick pilot with both and check if their scores are nearly identical.
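For a quick pilot comparison, a rough count-based score is often enough: the number of times an item was picked best minus the number of times it was picked worst, divided by the number of times it was shown. The sketch below assumes pilot tasks pooled across respondents in a hypothetical `shown` / `best` / `worst` layout, and the `tolerance` value is an arbitrary placeholder. It is a sanity check, not a substitute for the model-based scores a MaxDiff platform reports.

```python
from collections import Counter


def count_scores(tasks):
    """Best-minus-worst counts per item, scaled by times shown (range -1 to 1)."""
    shown, best, worst = Counter(), Counter(), Counter()
    for t in tasks:
        shown.update(t["shown"])
        best[t["best"]] += 1
        worst[t["worst"]] += 1
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}


def near_identical(scores, a, b, tolerance=0.05):
    """True if two items score within `tolerance` of each other."""
    return abs(scores[a] - scores[b]) <= tolerance


# Toy pilot data; item labels are illustrative.
pilot_tasks = [
    {"shown": ["Easy to use", "Simple interface", "Low price"],
     "best": "Easy to use", "worst": "Low price"},
    {"shown": ["Easy to use", "Simple interface", "Low price"],
     "best": "Simple interface", "worst": "Low price"},
]
scores = count_scores(pilot_tasks)
print(scores)
print(near_identical(scores, "Easy to use", "Simple interface"))
```

With a 30-50 person pilot the counts are noisy, so a small gap between two items is weak evidence that they are distinct.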
Can I include images alongside text items?
Yes, and it helps when items are visual (package designs, logos, ad concepts). Just ensure image quality and load time are consistent across items so no item gets an advantage from better presentation.