Businesses are increasingly data-driven: they rely on data to make decisions, take action, and back up their conclusions. Data analysis is the process of collecting and organizing data in order to draw useful conclusions from it, using analytical and logical reasoning.
One of the tools used for data analysis is Pandas. Pandas is an open-source, high-level Python library used for practical, real-world data analysis. It is one of the most popular data-wrangling packages, and it works well with many other data science modules in the Python ecosystem.
In this article, we will learn and understand what Pandas is and how you can add a new column in an existing data frame.
Method 1: Declaring a new list as a column
Method 2: Using DataFrame.insert()
Method 3: Using the Dataframe.assign() method
Method 4: Using the dictionary data structure
Pandas provides powerful and flexible data structures that make data manipulation and analysis easy. A data frame is one of these structures. Data frames store data in a rectangular grid that is easy to inspect during analysis. Each row of the grid corresponds to one record (observation), while each column is a vector containing the data for a specific variable.
In Pandas, a DataFrame is a two-dimensional, heterogeneous, tabular data structure with labeled rows and columns (axes). Put simply, it has three components: data, rows, and columns.
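To make the three components concrete, here is a minimal sketch. The frame df_demo, its row labels, and its two columns are hypothetical and exist only for illustration; they are not the article's df.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the three components.
df_demo = pd.DataFrame(
    {"age": [63, 37], "patient_name": ["Alice", "Mark"]},
    index=["row_a", "row_b"],
)

row_labels = list(df_demo.index)        # the row labels
column_labels = list(df_demo.columns)   # the column labels
data = df_demo.to_numpy().tolist()      # the underlying values, row by row

print(row_labels)
print(column_labels)
print(data)

# "Heterogeneous" means each column can have its own data type.
print(df_demo.dtypes)
```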
Consider the following data frame called df. It contains 14 columns.
You want to add another column called patient_name. There are multiple ways that you can perform this action. We’ll be covering four ways in the following sections.
First, you create a list that contains the required information (names of patients). Then, you assign that list to a new column name (patient_name) on the data frame (df) by using the = operator. The list must contain exactly one value per row of the data frame.
# Creating a list of names
names = ["Alice", "Mark", "John", "Bob", "David"]
# Creating the patient_name in the df data frame
df["patient_name"] = names
# Observe the result
df.head()
Result:
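As a self-contained sketch of this method, the code below uses a small stand-in frame that reproduces only an age column (an assumption; the article's df has 14 columns). It also shows what happens when the list length does not match the number of rows. The column name too_short is hypothetical.

```python
import pandas as pd

# Stand-in for the article's df: only an age column is reproduced here.
df = pd.DataFrame({"age": [63, 37, 41, 56, 57]})

names = ["Alice", "Mark", "John", "Bob", "David"]
df["patient_name"] = names  # the new column is appended as the last column

print(df.columns.tolist())  # ['age', 'patient_name']

# A list whose length differs from the number of rows is rejected.
try:
    df["too_short"] = ["Alice", "Mark"]
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)  # ValueError
```

With this approach the new column always lands at the end of the frame.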
DataFrame.insert() gives you the flexibility to add the new column at any position in the existing data frame.
The syntax is as follows:
df.insert(column_position, column_name, column_values)
Considering the df data frame, you can add the patient_name column at position 1. Positions are counted from zero, so the new column becomes the second column, directly after the age column.
# Creating a list of names
names = ["Alice", "Mark", "John", "Bob", "David"]
# Using DataFrame.insert() to add the patient_name column
# Adding this column in position 1
df.insert(1, "patient_name", names)
# Observe the result
df.head()
Note: You can use column_position to add the column at any position you prefer in the data frame. For example, if you want to add it at position 3, the code will be: df.insert(3, "patient_name", names)
Result:
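As a rough, self-contained sketch of the insert() call, the code below uses a stand-in frame with an age column and a hypothetical placeholder column named other_column (both are assumptions, not the article's 14-column df). It shows that insert() changes the frame in place and refuses to insert a column name that already exists.

```python
import pandas as pd

# Stand-in frame; other_column is a hypothetical placeholder column.
df = pd.DataFrame({"age": [63, 37, 41, 56, 57], "other_column": [0, 0, 0, 0, 0]})

names = ["Alice", "Mark", "John", "Bob", "David"]

# insert() modifies df in place and returns None.
result = df.insert(1, "patient_name", names)

print(result)               # None
print(df.columns.tolist())  # ['age', 'patient_name', 'other_column']

# Inserting a column name that already exists raises ValueError
# (unless allow_duplicates=True is passed).
try:
    df.insert(0, "patient_name", names)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(type(duplicate_error).__name__)  # ValueError
```

Because insert() returns None, writing df = df.insert(...) would replace df with None, so the call is used on its own line.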
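The DataFrame.assign() method returns a new data frame that contains the existing columns plus the new one. Here, the column name patient_name is passed as a keyword argument, and the list of values is given as its value. Because assign() does not modify the original data frame, the result is stored back in df.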
# Creating a list of names
names = ["Alice", "Mark", "John", "Bob", "David"]
# Using assign() to create the patient_name column
# Column name must be equated with the corresponding list of values
df = df.assign(patient_name = names)
# Observe the result
df.head()
Result:
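A minimal sketch of this behavior, assuming a stand-in frame with only an age column. The result is stored under a hypothetical name, df_with_names, to show that assign() leaves the original frame untouched.

```python
import pandas as pd

# Stand-in for the article's df: only an age column is reproduced here.
df = pd.DataFrame({"age": [63, 37, 41, 56, 57]})

names = ["Alice", "Mark", "John", "Bob", "David"]

# assign() returns a new DataFrame; df itself is not changed.
df_with_names = df.assign(patient_name=names)

print(df.columns.tolist())             # ['age']
print(df_with_names.columns.tolist())  # ['age', 'patient_name']
```

Reassigning the result to df, as the article's code does, is what makes the new column appear in df.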
You can use a Python dictionary (key-value pairs) to add a new column to an existing data frame. In this method, the keys hold the values for the new column, and the dictionary values hold the matching entries of an existing column.
# Creating a dictionary
# {key: value}
# key contains values of the new column
# values contain inputs of an existing column
# Example
# key represents the new values for the patient_name column
# value represents the age column which is an existing column
names = {"Alice": 63, "Mark": 37, "John": 41, "Bob": 56, "David": 57}
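# Creating the patient_name column in the df data frame from the dictionary keys
# (the keys are converted to a list explicitly, because assigning the dictionary
# itself may be aligned on the row index in some pandas versions)
df["patient_name"] = list(names)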
# Observe the result
df.head()
Alternatively, you can use the map function to add a new column in the df dataframe. This can be performed using the following code:
# Creating a dictionary, where keys represent age as mentioned in the data frame
nameDict = {63: "Alice",
37: "Mark",
41: "John",
56: "Bob",
57: "David"
}
# Using the map function to add new column in the pandas data frame
df["patient_name"] = df["Age"].map(nameDict)
# Observe the result
df.head()
Result:
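The map-based approach can be sketched end to end as follows. The stand-in frame has only an Age column, and its last value (44) is a hypothetical extra row added to show what happens when a value has no entry in the dictionary.

```python
import pandas as pd

# Stand-in frame; the last age (44) is a hypothetical row with no dictionary entry.
df = pd.DataFrame({"Age": [63, 37, 41, 56, 57, 44]})

name_dict = {63: "Alice", 37: "Mark", 41: "John", 56: "Bob", 57: "David"}

# Each Age value is looked up in the dictionary; missing keys become NaN.
df["patient_name"] = df["Age"].map(name_dict)

print(df["patient_name"].tolist()[:5])      # ['Alice', 'Mark', 'John', 'Bob', 'David']
print(pd.isna(df["patient_name"].iloc[5]))  # True
```

Because the lookup is by value, two rows with the same age would receive the same name, so this approach suits columns whose values identify a row uniquely.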
Here are a few practice questions on Pandas and data frames:
Question 1: How do I create a data frame using Pandas in Python?
You can use the following code to create a DataFrame using Pandas in Python:
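A minimal sketch, assuming pandas is installed and imported as pd. The column names sName and sAge follow the studentRecords example used elsewhere in this article; the variable name student_records is just a Python-style spelling of it.

```python
import pandas as pd

# Build a DataFrame from a dictionary: keys become column names,
# and each list becomes that column's values.
student_records = pd.DataFrame({"sName": ["Alice", "Bob"], "sAge": [20, 32]})

print(student_records.shape)             # (2, 2)
print(student_records.columns.tolist())  # ['sName', 'sAge']
```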
Question 2: Can I add the values to a new column without creating a separate list?
The values that you add to a column need a structure, and a list is the usual way to provide it. A separately named list is not strictly required, because a list can also be written directly where it is used, as in the example below. Still, creating the list first and then passing it to the function usually keeps the code easier to read.
Example: studentRecords = pd.DataFrame({"sName": ["Alice", "Bob"], "sAge": [20, 32]})
Question 3: Can I rewrite the values of a column once created?
Yes, you can overwrite the values of a column by assigning new values to the existing column name.
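As a short sketch, assuming a three-row stand-in frame: the replacement names below are hypothetical and only show two common ways of overwriting, the whole column at once or selected rows through .loc.

```python
import pandas as pd

# Stand-in frame with an existing patient_name column.
df = pd.DataFrame({"age": [63, 37, 41], "patient_name": ["Alice", "Mark", "John"]})

# Overwrite the whole column by assigning to the existing column name.
df["patient_name"] = ["Ann", "Max", "Joe"]  # hypothetical replacement values

# Overwrite only the rows that match a condition.
df.loc[df["age"] == 37, "patient_name"] = "Mark"

print(df["patient_name"].tolist())  # ['Ann', 'Mark', 'Joe']
```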
-------
Article contributed by Problem Setters Official