In coding interviews, software engineers are often given a problem statement that requires parsing complex inputs. This is where Java programmers can stand out: the String.split method makes it easy to parse complex strings using regular-expression delimiters. This article discusses the different use cases of the split method in detail.
When a string holds several pieces of information, we need to parse it to extract each piece individually. A typical example is reading data from a file line by line and splitting each line into fields.
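As a minimal sketch of that use case, the program below reads comma-separated records line by line and splits each line into fields. The class name LineParser, the method parseRecords, and the "name,age" sample data are hypothetical; a StringReader stands in for a real file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineParser {
    // Hypothetical helper: parses "name,age" records, one record per line.
    static List<String[]> parseRecords(String fileContents) {
        List<String[]> records = new ArrayList<>();
        // StringReader stands in for a file; a reader from Files.newBufferedReader(path) is read the same way.
        try (BufferedReader reader = new BufferedReader(new StringReader(fileContents))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                records.add(line.split(",")); // one array of fields per line
            }
        } catch (IOException e) {
            // BufferedReader declares IOException, so rethrow it unchecked
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parseRecords("alice,30\nbob,25\n")) {
            System.out.println("name=" + fields[0] + ", age=" + fields[1]);
        }
    }
}
```

Each line becomes its own String[], so fields[0] and fields[1] can be used directly once the line has been split.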
The split method divides a string into substrings around matches of the given regular expression (the delimiter) and returns those substrings as an array of strings.
Each element of the returned array is a substring that ends either at a match of the given expression or at the end of the string. The substrings appear in the same order as in the original string. If the delimiter does not match anywhere, the result is an array with a single element: the original string.
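The rules above can be checked with a short program. It also shows one detail worth knowing: split(regex) keeps empty strings between adjacent delimiters but discards trailing empty strings. The class SplitEdgeCases and its helper show are illustrative names, not part of any library.

```java
import java.util.Arrays;

public class SplitEdgeCases {
    // Returns the split result in printable form, e.g. "[a, b]"
    static String show(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        // The delimiter never matches: the array holds only the original string
        System.out.println(show("hello", ","));  // [hello]
        // Empty strings between adjacent delimiters are kept...
        System.out.println(show("a,,b", ","));   // [a, , b]
        // ...but trailing empty strings are discarded by split(regex)
        System.out.println(show("a,b,,", ","));  // [a, b]
    }
}
```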
Strings in Java can be parsed using the split method.
Following are the points to be considered while using the split method:
In Java, delimiters are the characters that separate a string into tokens, and any character can serve as one. Because split interprets its argument as a regular expression, using it effectively requires a little knowledge of regular expressions as well. Let’s take an example.
Suppose we want to split a string on the space character.
String text = "This is simple text";
String[] result = text.split(" ");
for (String word : result) {
    System.out.println(word);
}
Output:
This
is
simple
text
Now, let’s change our previous example a little bit.
String text = "This  is   simple text";
We want to extract the words from this text with split, just as in the previous example. But here, text.split(" ") won’t give the expected result: the text contains consecutive spaces, and splitting on a single space produces an empty string between every pair of adjacent spaces. So the question is: how should we treat consecutive delimiters?
For this example, we want consecutive spaces to be treated as one:
String text = "This  is   simple text";
String[] result = text.split("[ ]+");
for (String word : result) {
    System.out.println(word);
}
In the above example, we used [ ]+ as the delimiter. The square brackets form a character class, a regular-expression construct that matches any one of the characters inside it (here, just the space). The plus sign means "one or more occurrences", so a run of consecutive spaces is treated as a single delimiter.
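To make the difference concrete, the sketch below compares split(" ") with split("[ ]+") on a string containing runs of spaces. It also shows a caveat: a delimiter at the very start of the string still yields an empty first element, so trimming the input first is a common precaution. The class and method names are illustrative only.

```java
import java.util.Arrays;

public class ConsecutiveSpaces {
    static String[] splitOnSingleSpace(String text) {
        return text.split(" ");     // every single space is a delimiter
    }

    static String[] splitOnSpaceRuns(String text) {
        return text.split("[ ]+");  // a run of one or more spaces is one delimiter
    }

    public static void main(String[] args) {
        String text = "This  is   simple text"; // two spaces, then three spaces
        System.out.println(Arrays.toString(splitOnSingleSpace(text))); // [This, , is, , , simple, text]
        System.out.println(Arrays.toString(splitOnSpaceRuns(text)));   // [This, is, simple, text]

        // A leading delimiter still produces an empty first element, so trim first
        String padded = "  This is";
        System.out.println(Arrays.toString(splitOnSpaceRuns(padded)));        // [, This, is]
        System.out.println(Arrays.toString(splitOnSpaceRuns(padded.trim()))); // [This, is]
    }
}
```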
Say we want to split a string on "|" or ".". The obvious approach is to pass that character directly to split. But because these characters have special meanings in regular expressions, this will not yield the result we want.
String text = "This|is|simple|text";
String[] result = text.split("|");
for (String word : result) {
    System.out.println(word);
}
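This program does not split on the pipe character. Instead (on Java 8 and later), it prints every character of the string, including each |, on its own line. The reason is that | is the regex alternation operator: on its own it means "empty string or empty string", which matches the empty position between every pair of characters.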
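To solve this problem, we have to escape the special character with a backslash so that the regex engine treats it literally. Since the backslash is also the escape character in Java string literals, the regex \| is written as "\\|" in Java source.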
String text = "This|is|simple|text";
String[] result = text.split("\\|");
for (String word : result) {
    System.out.println(word);
}
Is there any other way to solve this problem? Yes: we can put the special character inside square brackets. Inside a character class, | and . lose their special meaning and simply match themselves. Let’s see the sample code for better understanding:
String text = "This|is|simple|text";
String[] result = text.split("[|]");
for (String word : result) {
    System.out.println(word);
}
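A third option, not covered above, is java.util.regex.Pattern.quote, which wraps any literal string so that every character in it is matched literally. The sketch below uses it; the class QuotedDelimiter and the helper splitLiteral are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedDelimiter {
    // Pattern.quote turns a literal string into a regex that matches it exactly,
    // so metacharacters such as | and . need no manual escaping
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("This|is|simple|text", "|"))); // [This, is, simple, text]
        System.out.println(Arrays.toString(splitLiteral("192.168.0.1", ".")));          // [192, 168, 0, 1]
    }
}
```

This is convenient when the delimiter comes from a variable rather than being hard-coded, because the caller does not need to know which characters are special.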
Suppose we have a string made up of English letters separated by commas, periods, and question marks, and we want to split it on all three of these characters.
String text = "This,,,is?not.simple.text";
String[] result = text.split("[,?.]+");
for (String word : result) {
    System.out.println(word);
}
Output:
This
is
not
simple
text
So, we can split a string on several different delimiter characters by putting all of them inside square brackets ([]). Here, we used [,?.]+ as the delimiter: inside the brackets, ? and . are treated as literal characters, and the plus sign (+) indicates that consecutive delimiters should be treated as one.
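A character class matches any one of its characters, which is different from splitting on a multi-character sequence. The sketch below contrasts the two on the same input; the class CharClassVsSequence and its helper show are illustrative names.

```java
import java.util.Arrays;

public class CharClassVsSequence {
    // Returns the split result in printable form, e.g. "[a, b]"
    static String show(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        String text = "red, green,blue";
        // Character class: splits on EACH comma or space, so ", " leaves an empty string in between
        System.out.println(show(text, "[, ]"));  // [red, , green, blue]
        // Literal sequence: splits only where a comma is immediately followed by a space
        System.out.println(show(text, ", "));    // [red, green,blue]
        // Character class with +: any run of commas and spaces is a single delimiter
        System.out.println(show(text, "[, ]+")); // [red, green, blue]
    }
}
```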
So far, we have seen what split() does and the various ways in which delimiters can be treated. Now, let’s look at the two forms of the split method, both of which take a regular expression:
String[] split(String regex): equivalent to calling split(regex, 0).
String[] split(String regex, int limit)
Regex: the delimiting regular expression.
Limit: controls how many times the pattern is applied, and therefore the maximum length of the resulting array.
Let’s see some examples with limit parameters.
If we set the limit to 0, the pattern is applied as many times as possible, and the resulting array contains all the substrings separated by the delimiter; trailing empty strings, if any, are discarded. Have a look at the example:
String text = "This is a simple text";
String[] result = text.split(" ", 0);
for (String word : result) {
    System.out.println(word);
}
Output:
This
is
a
simple
text
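If we provide a positive limit n, the pattern is applied at most n - 1 times. The array therefore has at most n elements, and its last element contains all of the input after the last matched delimiter.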
String text = "This is a simple text";
String[] result = text.split(" ", 2);
for (String word : result) {
    System.out.println(word);
}
Output:
This
is a simple text
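The examples above cover a zero and a positive limit. A negative limit also exists: it applies the pattern as many times as possible, like 0, but keeps trailing empty strings. This matters, for example, when a line ends with empty fields. The sketch below compares the three cases; SplitLimit and show are illustrative names.

```java
import java.util.Arrays;

public class SplitLimit {
    // Returns the split result in printable form, e.g. "[a, b]"
    static String show(String text, String regex, int limit) {
        return Arrays.toString(text.split(regex, limit));
    }

    public static void main(String[] args) {
        String line = "a,b,,"; // two trailing empty fields
        System.out.println(show(line, ",", 0));  // [a, b]       trailing empty strings removed
        System.out.println(show(line, ",", -1)); // [a, b, , ]   trailing empty strings kept
        System.out.println(show(line, ",", 2));  // [a, b,,]     at most 2 elements
    }
}
```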
Two exceptions commonly come up when using the String.split() method in Java: PatternSyntaxException and NullPointerException. Let’s discuss when each of them occurs.
If the delimiter is not a syntactically valid regular expression, split throws a PatternSyntaxException. Let’s check the example below.
String text = "This is a sim\\ple text";
String[] result = text.split("\\");
for (String word : result) {
    System.out.println(word);
}
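The problem is that the backslash is itself the regex escape character (it is what we use to escape special characters such as "." or "|"). The Java literal "\\" produces a single backslash, which the regex engine sees as an incomplete escape sequence.
To split on a literal backslash, the regex must escape it as \\, and in a Java string literal each of those two backslashes must be escaped again, giving "\\\\". Have a look at the following program: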
String text = "This is a sim\\ple text";
String[] result = text.split("\\\\");
for (String word : result) {
    System.out.println(word);
}
Output:
This is a sim
ple text
The split method does not accept a null argument; passing null throws java.lang.NullPointerException.
String text = "This is a simple text";
String[] result = text.split(null);
for (String word : result) {
    System.out.println(word);
}
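When the delimiter is not hard-coded (for example, when it comes from user input or a configuration file), both exceptions can be handled up front. The sketch below is one possible approach, not a standard API: the class SafeSplit and its helper trySplit are hypothetical, and returning Optional.empty() is an assumed design choice.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.PatternSyntaxException;

public class SafeSplit {
    // Hypothetical helper: returns Optional.empty() instead of throwing
    // when the input or delimiter is null or the delimiter is not a valid regex.
    static Optional<String[]> trySplit(String text, String regex) {
        if (text == null || regex == null) {
            return Optional.empty(); // split(null) would throw NullPointerException
        }
        try {
            return Optional.of(text.split(regex));
        } catch (PatternSyntaxException e) {
            // getDescription() explains what is wrong with the pattern
            System.err.println("Invalid delimiter: " + e.getDescription());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String text = "This is a sim\\ple text";
        System.out.println(trySplit(text, "\\").isPresent()); // false: a lone backslash is not a valid regex
        System.out.println(trySplit(text, null).isPresent()); // false: null delimiter
        trySplit(text, "\\\\").ifPresent(parts -> System.out.println(Arrays.toString(parts))); // [This is a sim, ple text]
    }
}
```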
Question 1: Between String.split and StringTokenizer, which is better?
Answer: StringTokenizer is generally faster, while String.split is more flexible and more convenient. String.split and the java.util.regex package can incur significant overhead from regular-expression processing, which makes them slower. StringTokenizer does not use java.util.regex (it only matches individual delimiter characters), so it usually performs better. On the other hand, String.split accepts full regular expressions and returns an array of results, which is more convenient to use than StringTokenizer. Note also that the StringTokenizer Javadoc describes it as a legacy class whose use is discouraged in new code.
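For comparison, here is a sketch of the same tokenizing task done both ways. StringTokenizer takes a set of delimiter characters rather than a regex and never returns empty tokens, so the two approaches differ when the input starts with a delimiter. The class TokenizerVsSplit and the helper tokenize are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // Collects every token; each character in delimiterChars is a delimiter
    static List<String> tokenize(String text, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiterChars);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "This,,,is?not.simple.text";
        System.out.println(tokenize(text, ",?."));                 // [This, is, not, simple, text]
        System.out.println(Arrays.toString(text.split("[,?.]+"))); // [This, is, not, simple, text]

        // The results differ when the input starts with a delimiter
        String leading = ",This,is";
        System.out.println(tokenize(leading, ","));                // [This, is]
        System.out.println(Arrays.toString(leading.split(",")));   // [, This, is]
    }
}
```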
Question 2: What is the best application of the string split method?
Answer: A common application is checking official email addresses: we can split the address on the @ symbol and then validate both the user name and the domain name in the resulting array. Split is also handy for parsing data from a file line by line.
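Below is a minimal sketch of the email idea. The rules it checks (exactly one @, a non-empty user name, a domain containing an inner dot, and a match against an expected company domain) are hypothetical and deliberately loose; real address syntax is far more complex, so treat this as an illustration of split, not as a complete validator. EmailSplit, looksLikeEmail, isOfficialEmail, and example.com are illustrative names.

```java
public class EmailSplit {
    // Hypothetical, deliberately loose check: exactly one '@', a non-empty user name,
    // and a domain with an inner dot.
    static boolean looksLikeEmail(String address) {
        String[] parts = address.split("@", -1); // -1 keeps empty parts, e.g. "jane@" -> [jane, ]
        if (parts.length != 2) {
            return false; // no '@' at all, or more than one
        }
        String user = parts[0];
        String domain = parts[1];
        return !user.isEmpty()
                && domain.contains(".")
                && !domain.startsWith(".")
                && !domain.endsWith(".");
    }

    // Hypothetical "official address" check: the domain must equal the expected company domain
    static boolean isOfficialEmail(String address, String companyDomain) {
        return looksLikeEmail(address)
                && address.split("@")[1].equalsIgnoreCase(companyDomain);
    }

    public static void main(String[] args) {
        System.out.println(isOfficialEmail("jane@example.com", "example.com")); // true
        System.out.println(isOfficialEmail("jane@gmail.com", "example.com"));   // false
        System.out.println(looksLikeEmail("jane@"));                            // false
        System.out.println(looksLikeEmail("a@b@example.com"));                  // false
    }
}
```

Passing -1 as the limit keeps an empty domain as its own element, so an input like "jane@" still produces two parts and is rejected by the domain check rather than slipping through.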
---------
Article contributed by Problem Setters Official