Data Science Quiz for Data Analysts: Are You Ready to Switch to DS?

| Reading Time: 3 minutes

Authored & Published by
Nahush Gowda, senior technical content specialist with 6+ years of experience creating data and technology-focused content in the ed-tech space.

Quick Summary

This data science readiness quiz has 25 questions across three tiers and five domains, built specifically for working data analysts considering the switch to data science.

Questions test real decision-making judgment rather than tool syntax: the kind of thinking that separates analyst mode from data scientist mode in interviews and on the job.

The five domains covered are Statistics and Experimentation, Python and Tooling, Machine Learning, Problem Framing, and Causal Reasoning.

Use your domain scores, not just your total, to identify exactly where to focus your preparation before making the switch.


Most data analysts who want to move into data science don’t have a skills problem. They have a blind spot problem. You know SQL, you understand your domain, and you’ve probably built more dashboards than you can count. But data science asks for something different, and the gap is rarely where people expect it to be.

This data science quiz is built specifically for working analysts considering the switch. It doesn’t test tool syntax or definitions you could look up in ten seconds. Every question reflects a real decision point that separates analyst thinking from data scientist thinking, the kind of judgment that comes up in DS interviews and in the first few months on the job.

The quiz covers five domains: Statistics and Experimentation, Python and Tooling, Machine Learning, Problem Framing, and Causal Reasoning. Questions are structured across three tiers of increasing difficulty, so you’ll get a sense not just of what you know, but of exactly where your thinking starts to shift from analyst to data scientist.

If you’re still weighing whether to make the switch at all, start with our full Data Analyst to Data Scientist guide first. This quiz assumes you’ve made that call and want to know how ready you actually are.

How to Use This Quiz

There are 25 questions split across three tiers. Tier 1 covers foundations that strong data analysts should already have. Tier 2 sits in the transition zone where the real gaps tend to show up. Tier 3 tests the kind of judgment that is expected of data scientists but rarely of analysts.

For each question, commit to an answer before checking the correct one and the explanation of why it matters. Keep a tally of your correct answers per domain as you go, because you’ll need those numbers for the gap analysis at the end. A notepad or sticky note works fine for this.

Most analysts ace Tier 1, slow down in Tier 2, and hit their real gaps in Tier 3. That’s exactly what the quiz is designed to show you.


Tier 1: Analyst Foundations (Q1 to Q8)

These are the skills a strong, senior data analyst should have solid footing in. If you’re dropping points here, that’s where to start before thinking about the transition at all.

Q1 (Stats Q)

You have a dataset with 10,000 rows and you want to know if the average order value differs between two customer segments. Which test do you use?




✓ Correct Answer: B (a two-sample t-test)

Why this matters: Choosing the right statistical test for the right data type is a baseline expectation in data science. Chi-square is for categorical variables, ANOVA is for three or more groups, and linear regression would work but is overkill for a simple two-group mean comparison. A two-sample t-test is the clean, direct answer here.
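For reference, the test is a one-liner with scipy. The data below is synthetic, and Welch’s variant (`equal_var=False`) is used as the safer default since equal variances aren’t guaranteed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical order values for two customer segments
segment_a = rng.normal(loc=52.0, scale=12.0, size=5000)
segment_b = rng.normal(loc=54.0, scale=12.0, size=5000)

# Two-sample t-test (Welch's variant does not assume equal variances)
t_stat, p_value = stats.ttest_ind(segment_a, segment_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With a real 2-unit difference in means and 5,000 rows per group, the test has more than enough power to detect it.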

Q2 (Python Q)

You need to calculate the 7-day rolling average of daily revenue in a pandas DataFrame. What’s the right approach?




✓ Correct Answer: B (use pandas’ built-in .rolling() method)

Why this matters: Pandas has a .rolling() method built exactly for this. Using a for loop instead shows unfamiliarity with the library and will be painfully slow on real datasets. This is a basic Python fluency question that comes up in almost every DS technical screen.
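A minimal sketch (the column names and revenue figures are hypothetical):

```python
import pandas as pd

# Hypothetical daily revenue data
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "revenue": [100, 120, 90, 110, 130, 95, 105,
                115, 125, 100, 140, 110, 120, 130],
})

# .rolling(window=7) creates a 7-day window; .mean() averages within it.
# The first 6 rows are NaN because a full window isn't available yet.
df["rev_7d_avg"] = df["revenue"].rolling(window=7).mean()
print(df.tail(3))
```

The same pattern generalizes to rolling sums, medians, and standard deviations by swapping the aggregation method.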

Q3 (Stats Q)

Your dataset has a column with about 8% missing values. What’s the right first step before deciding how to handle them?




✓ Correct Answer: C (investigate why the values are missing before choosing a fix)

Why this matters: Missing data falls into three types: missing completely at random, missing at random, and missing not at random. If users with missing income data are systematically lower earners who skipped the field, imputing with the mean will bias your model in a predictable direction. Jumping to a fix before understanding the mechanism is a common and costly mistake.
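A quick way to probe the mechanism is to compare the rest of the row for missing versus non-missing values. A sketch with hypothetical data, where a large difference suggests the data is not missing completely at random:

```python
import numpy as np
import pandas as pd

# Hypothetical data where income is missing more often for younger users
df = pd.DataFrame({
    "age":    [22, 25, 31, 38, 45, 52, 58, 63, 24, 29],
    "income": [np.nan, np.nan, 48_000, 55_000, 62_000,
               71_000, 76_000, 80_000, np.nan, 41_000],
})

# Mean age for rows with income present (False) vs. missing (True)
summary = df.groupby(df["income"].isna())["age"].mean()
print(summary)
```

Here the missing-income rows skew much younger, which is exactly the kind of pattern that rules out naive mean imputation.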

Q4 (Framing Q)

A stakeholder asks you to “look into why sales dropped last month.” What do you do first?




✓ Correct Answer: D (clarify the metric, the baseline, and the downstream decision before querying)

Why this matters: Jumping to the data before defining the question is the single most common reason analysts produce analysis that doesn’t drive a decision. A data scientist’s job starts before the SQL query. Clarifying the metric, the baseline, and the downstream decision takes five minutes and saves hours of work in the wrong direction.

Q5 (Python Q)

What is the main advantage of using vectorized operations in pandas over row-wise iteration with iterrows()?




✓ Correct Answer: C (vectorized operations run orders of magnitude faster)

Why this matters: On a million-row dataset, iterrows() can take minutes. The equivalent vectorized operation takes seconds. This is not a trivia question. DS work routinely involves datasets at this scale, and writing inefficient code in a technical interview or on the job signals that you haven’t crossed over from spreadsheet-era thinking yet.
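To see the difference concretely, here is a rough benchmark sketch. Exact timings will vary by machine, and the data is synthetic:

```python
import time

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": np.random.rand(50_000),
    "qty": np.random.randint(1, 10, size=50_000),
})

# Row-wise loop: one Python-level iteration per row
t0 = time.perf_counter()
loop_total = [row["price"] * row["qty"] for _, row in df.iterrows()]
loop_time = time.perf_counter() - t0

# Vectorized: a single expression that runs in optimized C under the hood
t0 = time.perf_counter()
vec_total = df["price"] * df["qty"]
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.2f}s, vectorized: {vec_time:.4f}s")
```

Both produce identical results; only the vectorized version scales to real dataset sizes.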

Q6 (ML Q)

What is the purpose of splitting your dataset into training and test sets before building a model?




✓ Correct Answer: B (to estimate how the model will perform on unseen data)

Why this matters: A model that scores 99% on training data and 60% on test data is not a good model. It has memorized patterns that don’t exist in the real world. The train/test split is how you catch this before shipping something broken to production.

Q7 (Causal Q)

You find that cities with more hospitals have higher death rates. Does this mean hospitals are causing deaths?




✓ Correct Answer: B (no; a confounder, population size, drives both)

Why this matters: Confounding variables are everywhere in real data. A third variable, in this case population size, is driving both the number of hospitals and the number of deaths. This is one of the first causal reasoning tests interviewers use because it’s simple on the surface and exposes whether you instinctively distinguish correlation from causation.

Q8 (Stats Q)

You’re comparing conversion rates across five different marketing channels. Why is running five separate t-tests a problem?




✓ Correct Answer: C (multiple comparisons inflate the false-positive rate)

Why this matters: At a 5% significance threshold, every 20 tests you run produce about one false positive by pure chance, and comparing five channels pairwise already means ten tests. The correct approach for multiple comparisons is ANOVA for overall differences, followed by post-hoc tests with corrections like Bonferroni if you need pairwise results. This is a common trap in real A/B testing environments.
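The arithmetic behind the trap is easy to verify directly:

```python
# Probability of at least one false positive when running k independent
# tests, each at alpha = 0.05 (the family-wise error rate, FWER)
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    bonferroni = alpha / k  # Bonferroni-corrected per-test threshold
    print(f"{k:>2} tests: FWER = {fwer:.1%}, corrected alpha = {bonferroni:.4f}")
```

Even at five tests the family-wise error rate is already about 23%, far above the nominal 5%.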


Tier 2: The Transition Zone (Q9 to Q18)

This is where the analyst-to-DS gap actually lives. These questions mix deeper statistical judgment, applied machine learning, and the problem framing skills that DS roles demand but analyst roles rarely develop. Most candidates start losing points somewhere in this tier.

Q9 (Stats Q)

You run an A/B test for 3 days. The treatment group shows a 12% lift in Day-1 retention with p = 0.04. Your PM wants to ship it. What do you do?




✓ Correct Answer: B (push back; verify test duration, sample size, and power first)

Why this matters: A p-value is not a standalone shipping signal. You need adequate test duration, sufficient sample size pre-calculated from your minimum detectable effect, and awareness of novelty effects. Data scientists are expected to push back on premature calls like this. That pushback is part of the job description.

Q10 (ML Q)

Your churn model achieves 95% accuracy on a dataset where 95% of users don’t churn. Why is this result meaningless?




✓ Correct Answer: C (a model that always predicts “no churn” scores the same)

Why this matters: Class imbalance makes accuracy a deceptive metric. When classes are skewed, you need precision, recall, F1, or AUC-ROC instead. Recognizing this trap and knowing which metric to use instead is one of the most commonly tested ML concepts in DS interviews.
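You can verify the trap with a few lines of plain Python, using synthetic labels and a “model” that never predicts churn:

```python
# 95 non-churners (0) and 5 churners (1); the model always predicts 0
actual = [0] * 95 + [1] * 5
predicted = [0] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_pos / sum(actual)  # fraction of real churners caught

print(f"accuracy = {accuracy:.0%}, recall = {recall:.0%}")  # 95% accuracy, 0% recall
```

The model catches zero churners yet reports 95% accuracy, which is why recall, precision, F1, or AUC-ROC are the metrics to reach for here.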

Q11 (Framing Q)

A product manager asks you to build a model to predict which users will upgrade to premium. What’s the most important question to settle before writing a single line of code?




✓ Correct Answer: B (what action the business will take on the predictions)

Why this matters: A model without a downstream action is just an intellectual exercise. Before building anything, you need to know what happens when the model identifies a high-propensity user. Is there an email trigger? An in-app prompt? A sales follow-up? If no action exists or none is planned, the model has no business value regardless of its accuracy.

Q12 (Python Q)

You need to find all users who appear in table A but not in table B. How do you do this in pandas?




✓ Correct Answer: B (a left merge with an indicator, keeping only the left-only rows)

Why this matters: This is the pandas equivalent of a LEFT ANTI JOIN in SQL, a pattern analysts use constantly. Naive approaches, like comparing the two ID columns directly, fail because you can’t compare Series of different lengths element-wise. Knowing how to translate your SQL intuition cleanly into pandas is a core part of the tooling shift this transition requires.
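The idiom looks like this in pandas (the user_id tables are hypothetical; merge’s indicator flag tags each row with its source):

```python
import pandas as pd

table_a = pd.DataFrame({"user_id": [1, 2, 3, 4, 5]})
table_b = pd.DataFrame({"user_id": [2, 4]})

# indicator=True adds a _merge column marking each row as
# "left_only", "right_only", or "both"; keeping "left_only"
# rows gives users in A but not in B (an anti join).
merged = table_a.merge(table_b, on="user_id", how="left", indicator=True)
only_in_a = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_a)  # user_ids 1, 3, 5
```

An alternative is `table_a[~table_a["user_id"].isin(table_b["user_id"])]`, which reads closer to a SQL `NOT IN`.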

Q13 (ML Q)

You’re evaluating two fraud detection models. Model A has higher precision and Model B has higher recall. Which do you choose?




✓ Correct Answer: C (it depends on the relative business cost of false positives and false negatives)

Why this matters: This is a judgment question, not a math question. Interviewers use it to test whether you understand that metric selection is a business decision, not a technical one. Candidates who default to a blended score like F1 are outsourcing the judgment to a formula instead of thinking through what the business actually cares about.

Q14 (Causal Q)

Users who use Feature X have 3x higher retention than users who don’t. Your stakeholder wants to push all users to use Feature X. What’s your concern?




✓ Correct Answer: C (correlation driven by self-selection, not causation)

Why this matters: This is one of the most common causal inference traps in product analytics. Engaged users use more features. That does not mean forcing feature adoption will retain disengaged users. You need a randomized experiment or a quasi-experimental design to make that causal claim safely.

Q15 (Stats Q)

What does it mean practically when you say a result is statistically significant at the 5% level?




✓ Correct Answer: C (the result is unlikely under the null hypothesis, which says nothing about practical importance)

Why this matters: Statistical significance and practical significance are different things. A 0.001% improvement in conversion can be statistically significant on a large enough dataset but completely irrelevant to the business. Data scientists are expected to report both the p-value and the effect size, and to distinguish between the two clearly when presenting findings.

Q16 (ML Q)

Your model trains with 98% accuracy but tests at 61% on held-out data. What is happening and what do you do?




✓ Correct Answer: C (overfitting; regularize, cross-validate, or simplify the model)

Why this matters: A large gap between training and test performance is the textbook definition of overfitting. The model has learned noise in the training set rather than real signal. Every ML practitioner needs to be able to diagnose this and name the fixes. Regularization (L1/L2), cross-validation, pruning, and simpler architectures are all valid responses depending on the model type.
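A small sketch of the same failure mode, using polynomial fits on synthetic data: the high-degree model memorizes the training points, so its training error collapses while its test error does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 10)

errs = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errs[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The degree-9 polynomial passes through all ten training points almost exactly (near-zero training error), which is memorization, not learning.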

Q17 (Framing Q)

A stakeholder disagrees with your analysis and says your recommendation is wrong. How do you handle it?




✓ Correct Answer: C (walk through your reasoning and engage with their evidence)

Why this matters: Data scientists are expected to defend their work without being defensive, and to update their views when presented with good evidence. Simply deferring undermines your value. Tweaking the analysis to match what someone wants to hear is p-hacking. This is a behavioral question that DS interviews include explicitly, and interviewers watch closely for which instinct you reach for first.

Q18 (Causal Q)

You want to measure the impact of a sales training program on revenue. You compare revenue of employees who attended versus those who didn’t. What is the biggest threat to this analysis?




✓ Correct Answer: B (self-selection bias)

Why this matters: Self-selection bias is the main threat in any voluntary treatment setting. High performers self-select into training. The fix is randomized assignment to training groups, or a quasi-experimental design like difference-in-differences if randomization isn’t possible. This is exactly the kind of real-world causal reasoning DS roles demand.


Tier 3: Thinking Like a Data Scientist (Q19 to Q25)

These questions test pure DS judgment, the kind that comes from operating with ambiguity, defending assumptions under pressure, and making decisions when the data doesn’t give you a clean answer. This is where most candidates hit their real ceiling.

Q19 (Causal Q)

Your team finds that users who receive push notifications have 20% higher daily active usage. Leadership wants to increase push notification frequency immediately. What is the first question you raise?




✓ Correct Answer: C (whether the relationship is causal or driven by self-selection)

Why this matters: Engaged users opt into notifications and are also more active. Scaling up notifications without establishing a causal link risks annoying users with no engagement gain, and potentially increasing opt-out rates. Before acting on an observational correlation at scale, you need experimental evidence. This is causal thinking applied directly to a real product decision.

Q20 (Stats Q)

You’re designing an experiment to test a new recommendation algorithm with 7-day revenue per user as the primary metric. What is the most important design decision to make before you launch the test?




✓ Correct Answer: C (pre-calculate the minimum detectable effect and required sample size)

Why this matters: Running a test without pre-calculating your minimum detectable effect and required sample size leads to underpowered studies that can’t detect real effects, or worse, stopping early when results look promising. Pre-registration of your design is foundational experiment discipline and one of the areas DS interviewers probe most consistently.
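As an illustration, here is a standard closed-form approximation for the per-group sample size of a two-proportion test. The z-values baked in assume alpha = 0.05 (two-sided) and 80% power; treat this as a sketch, not a substitute for a proper power calculation:

```python
import math

def sample_size_per_group(p_base, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group sample size for a two-proportion test.

    p_base: baseline conversion rate
    mde:    minimum detectable effect, in absolute percentage points
    z_alpha, z_beta: z-scores for alpha = 0.05 two-sided and 80% power
    """
    p1, p2 = p_base, p_base + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# e.g. baseline 10% conversion, want to detect a 1-point absolute lift
print(sample_size_per_group(0.10, 0.01))  # roughly 15,000 users per group
```

Numbers like this are exactly why a 3-day test with a promising p-value is usually underpowered: the required sample size has to be fixed before launch, not inferred after.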

Q21 (ML Q)

You include tomorrow’s sales figures as a feature in a model that predicts whether a customer will churn today. Your model achieves near-perfect accuracy. What is the problem?




✓ Correct Answer: C (data leakage; the feature won’t be available at prediction time)

Why this matters: Data leakage is one of the most dangerous and common ML pitfalls in production environments. When a feature contains information that would not be available at the time the model makes a real prediction, the model learns a shortcut that only exists in the training data. The result looks extraordinary in evaluation and fails completely when deployed.

Q22 (Framing Q)

You finish a six-week analysis and present a recommendation that would save the company $2M annually. The VP disagrees and says the approach won’t work operationally. What do you do?




✓ Correct Answer: C (engage with the operational concerns and update the recommendation where they hold)

Why this matters: Senior data scientists are expected to be decision partners, not just analysts who hand off results. That means knowing when to hold your ground and when to genuinely update your view based on new information. Dropping the recommendation outright is pure deference, dismissing the objection ignores valid domain knowledge, and reworking the numbers until they agree is p-hacking under pressure. Engaging with the operational concerns and revising where they hold up is what intellectual integrity looks like in practice.

Q23 (Causal Q)

You cannot run a randomized experiment because the product team says it’s too risky to expose half the user base to an untested feature. What approach do you take to estimate the causal effect instead?




✓ Correct Answer: C (a quasi-experimental design such as difference-in-differences)

Why this matters: Randomized experiments are the gold standard but they’re not always available. Knowing quasi-experimental methods is what separates a data scientist from someone who can only work in ideal conditions. Difference-in-differences, regression discontinuity, and propensity score matching are all standard tools for estimating causal effects from observational data when experiments aren’t possible.

Q24 (Framing Q)

A new feature launches and a week later DAU goes up 8%. The product team credits the feature and wants to announce the win. What questions do you ask before supporting that narrative?




✓ Correct Answer: C (ask whether seasonality, concurrent changes, or novelty effects could explain the lift)

Why this matters: Without a controlled experiment, you cannot attribute that DAU lift to the feature with confidence. Seasonality, concurrent changes, and novelty effects are all plausible alternative explanations. A data scientist’s job is to protect the organization from overconfident narratives, especially ones that might drive bad resource decisions downstream. This is what being a trusted decision partner actually means.

Q25 (ML Q)

You deploy a churn prediction model that performed well in testing. Three months later, its precision drops significantly. You haven’t changed anything. What is the most likely explanation?




✓ Correct Answer: C (model drift; user behavior has shifted since training)

Why this matters: Model drift is one of the most practical and underappreciated challenges in production ML. A model trained on last year’s user behavior will degrade as behavior evolves. Data scientists are expected to build monitoring pipelines that catch drift early, and to have a retraining strategy ready before it becomes a problem. This is the difference between building models and owning them.
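One common way to catch drift is the Population Stability Index (PSI) over the model’s input or score distributions. A minimal sketch with hypothetical binned distributions:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions
    (each a list of bin proportions summing to 1). Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

# Score distribution at training time vs. three months later (hypothetical)
train_dist = [0.10, 0.20, 0.40, 0.20, 0.10]
live_dist  = [0.05, 0.10, 0.30, 0.30, 0.25]
print(f"PSI = {psi(train_dist, live_dist):.3f}")  # well above the 0.25 drift threshold
```

A scheduled job that computes PSI on incoming data and alerts past a threshold is a simple first version of the monitoring pipeline described above.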


How to Interpret Your Score

Add up your correct answers across all 25 questions. Find your tier below, then move to the domain tally section to get a more specific picture of where to focus your preparation.

| Score | Tier | What It Means |
| --- | --- | --- |
| 21 to 25 | Thinking Like a Data Scientist | Your fundamentals are strong and your judgment is close to interview-ready. Focus the rest of your prep on end-to-end projects and behavioral rounds. |
| 15 to 20 | In the Transition Zone | You have a solid foundation but specific gaps are showing up, most likely in Tier 2 and Tier 3. Use the domain breakdown below to find out which areas to target. |
| 9 to 14 | Strong Analyst, DS Gap Exists | Your analytical instincts are good but the technical and causal reasoning depth needs deliberate work. Budget three to five focused months on the phased roadmap. |
| 0 to 8 | Start Here First | The transition is absolutely possible, but start with Python and statistics fundamentals before touching machine learning. The hub guide has a clear phased roadmap for exactly this starting point. |

Domain Gap Analysis: Finding Your Specific Weak Spots

Your overall score tells you where you are on the transition journey. Your domain scores tell you what to do about it. Go back through the quiz and count how many questions you got right in each domain using the domain tags next to each question number.

| Domain | Questions | Your Score |
| --- | --- | --- |
| Statistics and Experimentation | Q1, Q3, Q8, Q9, Q15, Q20 | __ / 6 |
| Python and Tooling | Q2, Q5, Q12 | __ / 3 |
| Machine Learning | Q6, Q10, Q13, Q16, Q21, Q25 | __ / 6 |
| Problem Framing | Q4, Q11, Q17, Q22, Q24 | __ / 5 |
| Causal Reasoning | Q7, Q14, Q18, Q19, Q23 | __ / 5 |

What to Do With Your Domain Scores

If you scored below 60% in Statistics and Experimentation, this is the most important gap to close before anything else. The majority of DS interview questions in product and analytics roles test experiment design, p-value interpretation, and statistical validity. The practical focus here is not memorizing formulas but learning to judge when a result is trustworthy enough to act on. Work through a statistics and experimentation curriculum and practice designing experiments from scratch with pre-specified sample sizes.

If you scored below 60% in Python and Tooling, this is a fixable gap and usually the fastest one to close. The barrier here is not conceptual but syntactic. You already understand what groupby, filter, and pivot do. You just need to learn the pandas way to do it. Spend four to six weeks doing your existing analyst work entirely in Python notebooks, using Git to version your work. The goal is fluency, not expertise.

If you scored below 60% in Machine Learning, the issue is usually not the algorithms themselves but the judgment around them. Knowing that Random Forest exists is not enough. You need to be able to say why you chose it, what trade-offs it introduces, how you evaluated it, and what could go wrong. Focus on building one complete end-to-end project rather than surveying many algorithms shallowly.

If you scored below 60% in Problem Framing, this is the gap that is hardest to close through reading alone and the one that matters most in senior DS interviews. Practice is the only real fix. Take an ambiguous business question each week and spend 20 minutes writing out how you would clarify it, define success, propose an approach, and identify risks, before touching any data.

If you scored below 60% in Causal Reasoning, you are in good company. This is the most underprepared area across DS candidates coming from analytics backgrounds, and the one that interviewers consistently say separates strong candidates from great ones. Start by learning to spot confounders and selection bias in your own past analysis work. Then study the basics of experiment design: what randomization actually achieves, and when observational approaches can substitute for it.


Ready to Turn Your Score Into a Plan?

Now you know your score, your tier, and your domain gaps. The full Data Analyst to Data Scientist guide maps out a phased roadmap that goes from Python fundamentals all the way through interview preparation, with each phase aligned directly to the domain gaps this quiz surfaces.

Read the Full Transition Guide
