Data Science Quiz for Data Analysts: Are You Ready to Switch to DS?

| Reading Time: 3 minutes

Authored & Published by
Nahush Gowda, senior technical content specialist with 6+ years of experience creating data and technology-focused content in the ed-tech space.

Quick Summary

This data science readiness quiz has 25 questions across three tiers and five domains, built specifically for working data analysts considering the switch to data science.

Questions test real decision-making judgment rather than tool syntax: the kind of thinking that separates analyst mode from data scientist mode in interviews and on the job.

The five domains covered are Statistics and Experimentation, Python and Tooling, Machine Learning, Problem Framing, and Causal Reasoning.

Use your domain scores, not just your total, to identify exactly where to focus your preparation before making the switch.


Most data analysts who want to move into data science don’t have a skills problem. They have a blind spot problem. You know SQL, you understand your domain, and you’ve probably built more dashboards than you can count. But data science asks for something different, and the gap is rarely where people expect it to be.

This data science quiz is built specifically for working analysts considering the switch. It doesn’t test tool syntax or definitions you could look up in ten seconds. Every question reflects a real decision point that separates analyst thinking from data scientist thinking, the kind of judgment that comes up in DS interviews and in the first few months on the job.

The quiz covers five domains: Statistics and Experimentation, Python and Tooling, Machine Learning, Problem Framing, and Causal Reasoning. Questions are structured across three tiers of increasing difficulty, so you’ll get a sense not just of what you know, but of exactly where your thinking starts to shift from analyst to data scientist.

If you’re still weighing whether to make the switch at all, start with our full Data Analyst to Data Scientist guide first. This quiz assumes you’ve made that call and want to know how ready you actually are.

How to Use This Quiz

There are 25 questions split across three tiers. Tier 1 covers foundations that strong data analysts should already have. Tier 2 sits in the transition zone where the real gaps tend to show up. Tier 3 tests the kind of judgment that is expected of data scientists but rarely of analysts.

For each question, commit to an answer before checking the correct one and the explanation of why it matters. Keep a tally of your correct answers per domain as you go, because you’ll need those numbers for the gap analysis at the end. A notepad or sticky note works fine for this.

Most analysts ace Tier 1, slow down in Tier 2, and hit their real gaps in Tier 3. That’s exactly what the quiz is designed to show you.


Tier 1: Analyst Foundations (Q1 to Q8)

These are the skills a strong, senior data analyst should have solid footing in. If you’re dropping points here, that’s where to start before thinking about the transition at all.

Q1 (Stats Q)

You have a dataset with 10,000 rows and you want to know if the average order value differs between two customer segments. Which test do you use?




✓ Correct Answer: B (a two-sample t-test)

Why this matters: Choosing the right statistical test for the right data type is a baseline expectation in data science. Chi-square is for categorical variables, ANOVA is for three or more groups, and linear regression would work but is overkill for a simple two-group mean comparison. A two-sample t-test is the clean, direct answer here.
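For reference, the test is a one-liner with scipy. The data below is synthetic, and Welch’s variant (`equal_var=False`) is used as the safer default since equal variances aren’t guaranteed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical order values for two customer segments
segment_a = rng.normal(loc=52.0, scale=12.0, size=5000)
segment_b = rng.normal(loc=54.0, scale=12.0, size=5000)

# Two-sample t-test (Welch's variant does not assume equal variances)
t_stat, p_value = stats.ttest_ind(segment_a, segment_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With a real 2-unit difference in means and 5,000 rows per group, the test has more than enough power to detect it.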

Q2 (Python Q)

You need to calculate the 7-day rolling average of daily revenue in a pandas DataFrame. What’s the right approach?




✓ Correct Answer: B (use pandas’ built-in .rolling() method)

Why this matters: Pandas has a .rolling() method built exactly for this. Using a for loop instead shows unfamiliarity with the library and will be painfully slow on real datasets. This is a basic Python fluency question that comes up in almost every DS technical screen.
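A minimal sketch (the column names and revenue figures are hypothetical):

```python
import pandas as pd

# Hypothetical daily revenue data
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "revenue": [100, 120, 90, 110, 130, 95, 105,
                115, 125, 100, 140, 110, 120, 130],
})

# .rolling(window=7) creates a 7-day window; .mean() averages within it.
# The first 6 rows are NaN because a full window isn't available yet.
df["rev_7d_avg"] = df["revenue"].rolling(window=7).mean()
print(df.tail(3))
```

The same pattern generalizes to rolling sums, medians, and standard deviations by swapping the aggregation method.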

Q3 (Stats Q)

Your dataset has a column with about 8% missing values. What’s the right first step before deciding how to handle them?




✓ Correct Answer: C (investigate why the values are missing before choosing a fix)

Why this matters: Missing data falls into three types: missing completely at random, missing at random, and missing not at random. If users with missing income data are systematically lower earners who skipped the field, imputing with the mean will bias your model in a predictable direction. Jumping to a fix before understanding the mechanism is a common and costly mistake.
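A quick way to probe the mechanism is to compare the rest of the row for missing versus non-missing values. A sketch with hypothetical data, where a large difference suggests the data is not missing completely at random:

```python
import numpy as np
import pandas as pd

# Hypothetical data where income is missing more often for younger users
df = pd.DataFrame({
    "age":    [22, 25, 31, 38, 45, 52, 58, 63, 24, 29],
    "income": [np.nan, np.nan, 48_000, 55_000, 62_000,
               71_000, 76_000, 80_000, np.nan, 41_000],
})

# Mean age for rows with income present (False) vs. missing (True)
summary = df.groupby(df["income"].isna())["age"].mean()
print(summary)
```

Here the missing-income rows skew much younger, which is exactly the kind of pattern that rules out naive mean imputation.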

Q4 (Framing Q)

A stakeholder asks you to “look into why sales dropped last month.” What do you do first?




✓ Correct Answer: D (clarify the metric, the baseline, and the downstream decision before querying)

Why this matters: Jumping to the data before defining the question is the single most common reason analysts produce analysis that doesn’t drive a decision. A data scientist’s job starts before the SQL query. Clarifying the metric, the baseline, and the downstream decision takes five minutes and saves hours of work in the wrong direction.

Q5 (Python Q)

What is the main advantage of using vectorized operations in pandas over row-wise iteration with iterrows()?




✓ Correct Answer: C (vectorized operations run orders of magnitude faster)

Why this matters: On a million-row dataset, iterrows() can take minutes. The equivalent vectorized operation takes seconds. This is not a trivia question. DS work routinely involves datasets at this scale, and writing inefficient code in a technical interview or on the job signals that you haven’t crossed over from spreadsheet-era thinking yet.
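To see the difference concretely, here is a rough benchmark sketch. Exact timings will vary by machine, and the data is synthetic:

```python
import time

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": np.random.rand(50_000),
    "qty": np.random.randint(1, 10, size=50_000),
})

# Row-wise loop: one Python-level iteration per row
t0 = time.perf_counter()
loop_total = [row["price"] * row["qty"] for _, row in df.iterrows()]
loop_time = time.perf_counter() - t0

# Vectorized: a single expression that runs in optimized C under the hood
t0 = time.perf_counter()
vec_total = df["price"] * df["qty"]
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.2f}s, vectorized: {vec_time:.4f}s")
```

Both produce identical results; only the vectorized version scales to real dataset sizes.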

Q6 (ML Q)

What is the purpose of splitting your dataset into training and test sets before building a model?




✓ Correct Answer: B (to estimate how the model will perform on unseen data)

Why this matters: A model that scores 99% on training data and 60% on test data is not a good model. It has memorized patterns that don’t exist in the real world. The train/test split is how you catch this before shipping something broken to production.

Q7 (Causal Q)

You find that cities with more hospitals have higher death rates. Does this mean hospitals are causing deaths?




✓ Correct Answer: B (no; a confounder, population size, drives both)

Why this matters: Confounding variables are everywhere in real data. A third variable, in this case population size, is driving both the number of hospitals and the number of deaths. This is one of the first causal reasoning tests interviewers use because it’s simple on the surface and exposes whether you instinctively distinguish correlation from causation.

Q8 (Stats Q)

You’re comparing conversion rates across five different marketing channels. Why is running five separate t-tests a problem?




✓ Correct Answer: C (multiple comparisons inflate the false-positive rate)

Why this matters: At a 5% significance threshold, every 20 tests you run produce about one false positive by pure chance, and comparing five channels pairwise already means ten tests. The correct approach for multiple comparisons is ANOVA for overall differences, followed by post-hoc tests with corrections like Bonferroni if you need pairwise results. This is a common trap in real A/B testing environments.
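The arithmetic behind the trap is easy to verify directly:

```python
# Probability of at least one false positive when running k independent
# tests, each at alpha = 0.05 (the family-wise error rate, FWER)
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    bonferroni = alpha / k  # Bonferroni-corrected per-test threshold
    print(f"{k:>2} tests: FWER = {fwer:.1%}, corrected alpha = {bonferroni:.4f}")
```

Even at five tests the family-wise error rate is already about 23%, far above the nominal 5%.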


Tier 2: The Transition Zone (Q9 to Q18)

This is where the analyst-to-DS gap actually lives. These questions mix deeper statistical judgment, applied machine learning, and the problem framing skills that DS roles demand but analyst roles rarely develop. Most candidates start losing points somewhere in this tier.

Q9 (Stats Q)

You run an A/B test for 3 days. The treatment group shows a 12% lift in Day-1 retention with p = 0.04. Your PM wants to ship it. What do you do?




✓ Correct Answer: B (push back; verify test duration, sample size, and power first)

Why this matters: A p-value is not a standalone shipping signal. You need adequate test duration, sufficient sample size pre-calculated from your minimum detectable effect, and awareness of novelty effects. Data scientists are expected to push back on premature calls like this. That pushback is part of the job description.

Q10 (ML Q)

Your churn model achieves 95% accuracy on a dataset where 95% of users don’t churn. Why is this result meaningless?




✓ Correct Answer: C (a model that always predicts “no churn” scores the same)

Why this matters: Class imbalance makes accuracy a deceptive metric. When classes are skewed, you need precision, recall, F1, or AUC-ROC instead. Recognizing this trap and knowing which metric to use instead is one of the most commonly tested ML concepts in DS interviews.
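You can verify the trap with a few lines of plain Python, using synthetic labels and a “model” that never predicts churn:

```python
# 95 non-churners (0) and 5 churners (1); the model always predicts 0
actual = [0] * 95 + [1] * 5
predicted = [0] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_pos / sum(actual)  # fraction of real churners caught

print(f"accuracy = {accuracy:.0%}, recall = {recall:.0%}")  # 95% accuracy, 0% recall
```

The model catches zero churners yet reports 95% accuracy, which is why recall, precision, F1, or AUC-ROC are the metrics to reach for here.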

Q11 (Framing Q)

A product manager asks you to build a model to predict which users will upgrade to premium. What’s the most important question to settle before writing a single line of code?




✓ Correct Answer: B (what action the business will take on the predictions)

Why this matters: A model without a downstream action is just an intellectual exercise. Before building anything, you need to know what happens when the model identifies a high-propensity user. Is there an email trigger? An in-app prompt? A sales follow-up? If no action exists or none is planned, the model has no business value regardless of its accuracy.

Q12 (Python Q)

You need to find all users who appear in table A but not in table B. How do you do this in pandas?




✓ Correct Answer: B (a left merge with an indicator, keeping only the left-only rows)

Why this matters: This is the pandas equivalent of a LEFT ANTI JOIN in SQL, a pattern analysts use constantly. Naive approaches, like comparing the two ID columns directly, fail because you can’t compare Series of different lengths element-wise. Knowing how to translate your SQL intuition cleanly into pandas is a core part of the tooling shift this transition requires.
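The idiom looks like this in pandas (the user_id tables are hypothetical; merge’s indicator flag tags each row with its source):

```python
import pandas as pd

table_a = pd.DataFrame({"user_id": [1, 2, 3, 4, 5]})
table_b = pd.DataFrame({"user_id": [2, 4]})

# indicator=True adds a _merge column marking each row as
# "left_only", "right_only", or "both"; keeping "left_only"
# rows gives users in A but not in B (an anti join).
merged = table_a.merge(table_b, on="user_id", how="left", indicator=True)
only_in_a = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_a)  # user_ids 1, 3, 5
```

An alternative is `table_a[~table_a["user_id"].isin(table_b["user_id"])]`, which reads closer to a SQL `NOT IN`.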

Q13 (ML Q)

You’re evaluating two fraud detection models. Model A has higher precision and Model B has higher recall. Which do you choose?




✓ Correct Answer: C (it depends on the relative business cost of false positives and false negatives)

Why this matters: This is a judgment question, not a math question. Interviewers use it to test whether you understand that metric selection is a business decision, not a technical one. Candidates who default to a blended score like F1 are outsourcing the judgment to a formula instead of thinking through what the business actually cares about.

Q14 (Causal Q)

Users who use Feature X have 3x higher retention than users who don’t. Your stakeholder wants to push all users to use Feature X. What’s your concern?




✓ Correct Answer: C (correlation driven by self-selection, not causation)

Why this matters: This is one of the most common causal inference traps in product analytics. Engaged users use more features. That does not mean forcing feature adoption will retain disengaged users. You need a randomized experiment or a quasi-experimental design to make that causal claim safely.

Q15 (Stats Q)

What does it mean practically when you say a result is statistically significant at the 5% level?




✓ Correct Answer: C (the result is unlikely under the null hypothesis, which says nothing about practical importance)

Why this matters: Statistical significance and practical significance are different things. A 0.001% improvement in conversion can be statistically significant on a large enough dataset but completely irrelevant to the business. Data scientists are expected to report both the p-value and the effect size, and to distinguish between the two clearly when presenting findings.

Q16 (ML Q)

Your model trains with 98% accuracy but tests at 61% on held-out data. What is happening and what do you do?




✓ Correct Answer: C (overfitting; regularize, cross-validate, or simplify the model)

Why this matters: A large gap between training and test performance is the textbook definition of overfitting. The model has learned noise in the training set rather than real signal. Every ML practitioner needs to be able to diagnose this and name the fixes. Regularization (L1/L2), cross-validation, pruning, and simpler architectures are all valid responses depending on the model type.
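A small sketch of the same failure mode, using polynomial fits on synthetic data: the high-degree model memorizes the training points, so its training error collapses while its test error does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 10)

errs = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errs[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The degree-9 polynomial passes through all ten training points almost exactly (near-zero training error), which is memorization, not learning.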

Q17 (Framing Q)

A stakeholder disagrees with your analysis and says your recommendation is wrong. How do you handle it?




✓ Correct Answer: C (walk through your reasoning and engage with their evidence)

Why this matters: Data scientists are expected to defend their work without being defensive, and to update their views when presented with good evidence. Simply deferring undermines your value. Tweaking the analysis to match what someone wants to hear is p-hacking. This is a behavioral question that DS interviews include explicitly, and interviewers watch closely for which instinct you reach for first.

Q18 (Causal Q)

You want to measure the impact of a sales training program on revenue. You compare revenue of employees who attended versus those who didn’t. What is the biggest threat to this analysis?




✓ Correct Answer: B (self-selection bias)

Why this matters: Self-selection bias is the main threat in any voluntary treatment setting. High performers self-select into training. The fix is randomized assignment to training groups, or a quasi-experimental design like difference-in-differences if randomization isn’t possible. This is exactly the kind of real-world causal reasoning DS roles demand.


Tier 3: Thinking Like a Data Scientist (Q19 to Q25)

These questions test pure DS judgment, the kind that comes from operating with ambiguity, defending assumptions under pressure, and making decisions when the data doesn’t give you a clean answer. This is where most candidates hit their real ceiling.

Q19 (Causal Q)

Your team finds that users who receive push notifications have 20% higher daily active usage. Leadership wants to increase push notification frequency immediately. What is the first question you raise?




✓ Correct Answer: C (whether the relationship is causal or driven by self-selection)

Why this matters: Engaged users opt into notifications and are also more active. Scaling up notifications without establishing a causal link risks annoying users with no engagement gain, and potentially increasing opt-out rates. Before acting on an observational correlation at scale, you need experimental evidence. This is causal thinking applied directly to a real product decision.

Q20 (Stats Q)

You’re designing an experiment to test a new recommendation algorithm with 7-day revenue per user as the primary metric. What is the most important design decision to make before you launch the test?




✓ Correct Answer: C (pre-calculate the minimum detectable effect and required sample size)

Why this matters: Running a test without pre-calculating your minimum detectable effect and required sample size leads to underpowered studies that can’t detect real effects, or worse, stopping early when results look promising. Pre-registration of your design is foundational experiment discipline and one of the areas DS interviewers probe most consistently.
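As an illustration, here is a standard closed-form approximation for the per-group sample size of a two-proportion test. The z-values baked in assume alpha = 0.05 (two-sided) and 80% power; treat this as a sketch, not a substitute for a proper power calculation:

```python
import math

def sample_size_per_group(p_base, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group sample size for a two-proportion test.

    p_base: baseline conversion rate
    mde:    minimum detectable effect, in absolute percentage points
    z_alpha, z_beta: z-scores for alpha = 0.05 two-sided and 80% power
    """
    p1, p2 = p_base, p_base + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# e.g. baseline 10% conversion, want to detect a 1-point absolute lift
print(sample_size_per_group(0.10, 0.01))  # roughly 15,000 users per group
```

Numbers like this are exactly why a 3-day test with a promising p-value is usually underpowered: the required sample size has to be fixed before launch, not inferred after.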

Q21 (ML Q)

You include tomorrow’s sales figures as a feature in a model that predicts whether a customer will churn today. Your model achieves near-perfect accuracy. What is the problem?




✓ Correct Answer: C (data leakage; the feature won’t be available at prediction time)

Why this matters: Data leakage is one of the most dangerous and common ML pitfalls in production environments. When a feature contains information that would not be available at the time the model makes a real prediction, the model learns a shortcut that only exists in the training data. The result looks extraordinary in evaluation and fails completely when deployed.

Q22 (Framing Q)

You finish a six-week analysis and present a recommendation that would save the company $2M annually. The VP disagrees and says the approach won’t work operationally. What do you do?




✓ Correct Answer: C (engage with the operational concerns and update the recommendation where they hold)

Why this matters: Senior data scientists are expected to be decision partners, not just analysts who hand off results. That means knowing when to hold your ground and when to genuinely update your view based on new information. Dropping the recommendation outright is pure deference, dismissing the objection ignores valid domain knowledge, and reworking the numbers until they agree is p-hacking under pressure. Engaging with the operational concerns and revising where they hold up is what intellectual integrity looks like in practice.

Q23 (Causal Q)

You cannot run a randomized experiment because the product team says it’s too risky to expose half the user base to an untested feature. What approach do you take to estimate the causal effect instead?




✓ Correct Answer: C (a quasi-experimental design such as difference-in-differences)

Why this matters: Randomized experiments are the gold standard but they’re not always available. Knowing quasi-experimental methods is what separates a data scientist from someone who can only work in ideal conditions. Difference-in-differences, regression discontinuity, and propensity score matching are all standard tools for estimating causal effects from observational data when experiments aren’t possible.

Q24 (Framing Q)

A new feature launches and a week later DAU goes up 8%. The product team credits the feature and wants to announce the win. What questions do you ask before supporting that narrative?




✓ Correct Answer: C (ask whether seasonality, concurrent changes, or novelty effects could explain the lift)

Why this matters: Without a controlled experiment, you cannot attribute that DAU lift to the feature with confidence. Seasonality, concurrent changes, and novelty effects are all plausible alternative explanations. A data scientist’s job is to protect the organization from overconfident narratives, especially ones that might drive bad resource decisions downstream. This is what being a trusted decision partner actually means.

Q25 (ML Q)

You deploy a churn prediction model that performed well in testing. Three months later, its precision drops significantly. You haven’t changed anything. What is the most likely explanation?




✓ Correct Answer: C (model drift; user behavior has shifted since training)

Why this matters: Model drift is one of the most practical and underappreciated challenges in production ML. A model trained on last year’s user behavior will degrade as behavior evolves. Data scientists are expected to build monitoring pipelines that catch drift early, and to have a retraining strategy ready before it becomes a problem. This is the difference between building models and owning them.
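One common way to catch drift is the Population Stability Index (PSI) over the model’s input or score distributions. A minimal sketch with hypothetical binned distributions:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions
    (each a list of bin proportions summing to 1). Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

# Score distribution at training time vs. three months later (hypothetical)
train_dist = [0.10, 0.20, 0.40, 0.20, 0.10]
live_dist  = [0.05, 0.10, 0.30, 0.30, 0.25]
print(f"PSI = {psi(train_dist, live_dist):.3f}")  # well above the 0.25 drift threshold
```

A scheduled job that computes PSI on incoming data and alerts past a threshold is a simple first version of the monitoring pipeline described above.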


How to Interpret Your Score

Add up your correct answers across all 25 questions. Find your tier below, then move to the domain tally section to get a more specific picture of where to focus your preparation.

| Score | Tier | What It Means |
| --- | --- | --- |
| 21 to 25 | Thinking Like a Data Scientist | Your fundamentals are strong and your judgment is close to interview-ready. Focus the rest of your prep on end-to-end projects and behavioral rounds. |
| 15 to 20 | In the Transition Zone | You have a solid foundation but specific gaps are showing up, most likely in Tier 2 and Tier 3. Use the domain breakdown below to find out which areas to target. |
| 9 to 14 | Strong Analyst, DS Gap Exists | Your analytical instincts are good but the technical and causal reasoning depth needs deliberate work. Budget three to five focused months on the phased roadmap. |
| 0 to 8 | Start Here First | The transition is absolutely possible, but start with Python and statistics fundamentals before touching machine learning. The hub guide has a clear phased roadmap for exactly this starting point. |

Domain Gap Analysis: Finding Your Specific Weak Spots

Your overall score tells you where you are on the transition journey. Your domain scores tell you what to do about it. Go back through the quiz and count how many questions you got right in each domain using the domain tags next to each question number.

| Domain | Questions | Your Score |
| --- | --- | --- |
| Statistics and Experimentation | Q1, Q3, Q8, Q9, Q15, Q20 | __ / 6 |
| Python and Tooling | Q2, Q5, Q12 | __ / 3 |
| Machine Learning | Q6, Q10, Q13, Q16, Q21, Q25 | __ / 6 |
| Problem Framing | Q4, Q11, Q17, Q22, Q24 | __ / 5 |
| Causal Reasoning | Q7, Q14, Q18, Q19, Q23 | __ / 5 |

What to Do With Your Domain Scores

If you scored below 60% in Statistics and Experimentation, this is the most important gap to close before anything else. The majority of DS interview questions in product and analytics roles test experiment design, p-value interpretation, and statistical validity. The practical focus here is not memorizing formulas but learning to judge when a result is trustworthy enough to act on. Work through a statistics and experimentation curriculum and practice designing experiments from scratch with pre-specified sample sizes.

If you scored below 60% in Python and Tooling, this is a fixable gap and usually the fastest one to close. The barrier here is not conceptual but syntactic. You already understand what groupby, filter, and pivot do. You just need to learn the pandas way to do it. Spend four to six weeks doing your existing analyst work entirely in Python notebooks, using Git to version your work. The goal is fluency, not expertise.

If you scored below 60% in Machine Learning, the issue is usually not the algorithms themselves but the judgment around them. Knowing that Random Forest exists is not enough. You need to be able to say why you chose it, what trade-offs it introduces, how you evaluated it, and what could go wrong. Focus on building one complete end-to-end project rather than surveying many algorithms shallowly.

If you scored below 60% in Problem Framing, this is the gap that is hardest to close through reading alone and the one that matters most in senior DS interviews. Practice is the only real fix. Take an ambiguous business question each week and spend 20 minutes writing out how you would clarify it, define success, propose an approach, and identify risks, before touching any data.

If you scored below 60% in Causal Reasoning, you are in good company. This is the most underprepared area across DS candidates coming from analytics backgrounds, and the one that interviewers consistently say separates strong candidates from great ones. Start by learning to spot confounders and selection bias in your own past analysis work. Then study the basics of experiment design: what randomization actually achieves, and when observational approaches can substitute for it.


Ready to Turn Your Score Into a Plan?

Now you know your score, your tier, and your domain gaps. The full Data Analyst to Data Scientist guide maps out a phased roadmap that goes from Python fundamentals all the way through interview preparation, with each phase aligned directly to the domain gaps this quiz surfaces.

Read the Full Transition Guide
