How To Filter Rows In R

In this article, we filter rows from a dataframe in the R programming language using the dplyr package.

Method 1: Subset or filter a row using filter()

To filter or subset rows, we use the filter() function.

Syntax:

filter(dataframe,condition)

Here, dataframe is the input dataframe, and condition is a logical expression: only the rows for which it is TRUE are kept.

Example: R program to filter the data frame

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Keep only the rows where department is "sales"
    print(filter(data, department == "sales"))

Method 2: Filter dataframe with multiple conditions

We again use the filter() function, this time specifying several conditions in the same call.

Syntax:

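filter(dataframe, condition1, condition2, ..., condition n)

Here, dataframe is the input dataframe and the conditions are used to filter its rows. Conditions passed as separate arguments must all hold; they can also be combined into a single expression with & (AND) and | (OR).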
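As a minimal sketch of the syntax above: when several conditions are passed to filter() as separate, comma-separated arguments, dplyr combines them with a logical AND, so the two calls below should return the same rows. The data frame and the names `with_commas` and `with_and` are illustrative assumptions, not part of the original examples.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Conditions given as separate arguments are combined with AND
with_commas <- filter(data, department == "sales", salary > 27000)

# The same filter written with an explicit & operator
with_and <- filter(data, department == "sales" & salary > 27000)

print(with_commas)
print(identical(with_commas, with_and))
```

The comma form tends to read better when every condition must hold; an explicit & or | is needed as soon as conditions are mixed.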

Example: R program to filter rows with multiple conditions

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Keep rows where department is "sales" AND salary is above 27000
    print(filter(data, department == "sales" & salary > 27000))

Example: Filter rows using the OR operator

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Keep rows where department is "IT" OR salary is above 27000
    print(filter(data, department == "IT" | salary > 27000))

Example: R program to filter using and, or

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Keep rows that are ("sales" AND salary above 27000) OR salary below 5000
    print(filter(data, department == "sales" & salary > 27000 | salary < 5000))
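In R, & is evaluated before |, so a condition of the form a & b | c is read as (a & b) | c. The following is a minimal sketch, on an illustrative data frame, of how parentheses change the result when the OR is grouped first. The threshold 30000 is an assumption chosen only so that the two groupings return different rows.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Default precedence: & binds tighter than |
default_grouping <- filter(data, department == "sales" & salary > 27000 | salary < 30000)

# Explicit parentheses that spell out the default reading
explicit_default <- filter(data, (department == "sales" & salary > 27000) | salary < 30000)

# Grouping the OR first restricts the result to the "sales" rows
or_first <- filter(data, department == "sales" & (salary > 27000 | salary < 30000))

print(default_grouping)
print(or_first)
```

Writing the parentheses out, even when they match the default, makes the intended grouping clear to the reader.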

Method 3: Using slice_head() function

This function returns the first n rows of the dataframe.

Syntax:

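dataframe %>% slice_head(n = k)

where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe to the function as its first argument, and k is the number of rows to return. Pass the value by name (n = ...), since slice_head() expects n as a named argument.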

Example: R program that uses slice_head() to filter rows

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # First 3 rows
    data %>% slice_head(n = 3)
    print("==========================")

    # First 5 rows
    data %>% slice_head(n = 5)
    print("==========================")

    # First row only
    data %>% slice_head(n = 1)

Method 4: Using slice_tail() function

This function returns the last n rows of the dataframe.

Syntax:

dataframe %>% slice_tail(n = k)

where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe to the function as its first argument, and k is the number of rows to return, counted from the end. Pass the value by name (n = ...).
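For readers who know base R, here is a minimal sketch on an illustrative data frame: on an ungrouped data frame, slice_head() and slice_tail() should select the same rows as head() and tail(). The names `first_three` and `last_three` are assumptions made for this sketch.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# First and last three rows with dplyr
first_three <- data %>% slice_head(n = 3)
last_three <- data %>% slice_tail(n = 3)

print(first_three$id)
print(last_three$id)

# Compare the selected ids with base R head() and tail()
print(all(first_three$id == head(data, 3)$id))
print(all(last_three$id == tail(data, 3)$id))
```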

Example: R program to filter last rows by using slice_tail() method

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Last 3 rows
    data %>% slice_tail(n = 3)
    print("==========================")

    # Last 5 rows
    data %>% slice_tail(n = 5)
    print("==========================")

    # Last row only
    data %>% slice_tail(n = 1)

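Method 5: Using top_n() function

This function returns the n rows with the largest values of a column. If no column is given through the wt argument, the last column of the dataframe is used.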

Syntax:

dataframe %>% top_n(n = 5)
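The following is a minimal sketch, on an illustrative data frame, of how top_n() picks rows: it ranks by the column given in its wt argument and keeps the selected rows in their original order. The dplyr documentation marks top_n() as superseded by slice_max() and slice_min(), so the sketch also shows a slice_max() call that should select the same rows. The names `top_by_salary` and `same_rows` are assumptions made for this sketch.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Rows with the three highest salaries, ranking column named explicitly
top_by_salary <- data %>% top_n(n = 3, wt = salary)

# The same selection with the newer slice_max() function
same_rows <- data %>% slice_max(salary, n = 3)

print(top_by_salary)
print(same_rows)

# Both calls should pick the same set of ids
print(setequal(top_by_salary$id, same_rows$id))
```

Naming the ranking column explicitly avoids relying on the position of the last column in the data frame.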

Example: R program that filters rows using the top_n() function

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00, 78900.00,
                                  25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # wt is omitted, so top_n() ranks by the last column (salary)
    data %>% top_n(n = 3)
    print("==========================")
    data %>% top_n(n = 5)
    print("==========================")
    data %>% top_n(n = 1)

Method 6: Using slice_sample() function

Here, we filter rows using the slice_sample() function, which returns n rows chosen at random.

Syntax:

dataframe %>% slice_sample(n = k)

where k is the number of rows to draw.
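Because slice_sample() draws rows at random, each run can return different rows. The following is a minimal sketch, assuming an illustrative data frame and an arbitrary seed value of 42, showing that calling base R's set.seed() before sampling makes the draw repeatable.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Fix the random seed (42 is an arbitrary choice), then sample 3 rows
set.seed(42)
first_draw <- data %>% slice_sample(n = 3)

# Resetting the same seed reproduces the same sample
set.seed(42)
second_draw <- data %>% slice_sample(n = 3)

print(first_draw)
print(identical(first_draw, second_draw))
```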

Example: R program to filter rows using slice_sample() function

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # 3 random rows (the result differs from run to run)
    data %>% slice_sample(n = 3)
    print("==========================")

    # 5 random rows
    data %>% slice_sample(n = 5)
    print("==========================")

    # 1 random row
    data %>% slice_sample(n = 1)

Method 7: Using slice_max() function

This function returns the n rows of the dataframe with the largest values in a given column.

Syntax:

dataframe %>% slice_max(column, n = k)

where dataframe is the input dataframe, column is the column whose values are used to rank the rows, and k is the number of rows with the largest values to return.

Example: R program to filter using slice_max() function

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Rows with the 3 highest salaries
    print(data %>% slice_max(salary, n = 3))
    print("==========================")

    # Rank by the department column (string comparison)
    print(data %>% slice_max(department, n = 5))
    print("==========================")

Method 8: Using slice_min() function

This function returns the n rows of the dataframe with the smallest values in a given column.

Syntax:

dataframe %>% slice_min(column, n = k)

where dataframe is the input dataframe, column is the column whose values are used to rank the rows, and k is the number of rows with the smallest values to return.
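The following is a minimal sketch of how ties are handled, using an illustrative data frame in which the lowest salary (25000) occurs twice. By default slice_min() and slice_max() keep tied rows, so the result can contain more than n rows; setting with_ties = FALSE caps the result at n rows. The names `lowest_with_ties` and `lowest_no_ties` are assumptions made for this sketch.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Default behaviour: both rows that share the lowest salary are returned
lowest_with_ties <- data %>% slice_min(salary, n = 1)

# with_ties = FALSE returns exactly one row
lowest_no_ties <- data %>% slice_min(salary, n = 1, with_ties = FALSE)

print(lowest_with_ties)
print(lowest_no_ties)
```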

Example: R program to filter using slice_min()

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Rows with the 3 lowest salaries
    print(data %>% slice_min(salary, n = 3))
    print("==========================")

    # Rank by the department column (string comparison)
    print(data %>% slice_min(department, n = 5))
    print("==========================")

Method 9: Using sample_frac() function

The sample_frac() function selects a random fraction of the rows of a data frame (or table). The first parameter is the data frame, and the second parameter is the fraction of rows to select.

Syntax:

sample_frac(dataframe, n)

where dataframe is the input dataframe and n is the fraction of rows to return (for example, 0.2 for 20% of the rows).
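The following is a minimal sketch, assuming an illustrative 8-row data frame and an arbitrary seed value of 1: a fraction of 0.5 should return 4 rows. The dplyr documentation marks sample_frac() as superseded by slice_sample(), so the equivalent call with the prop argument is shown as well. The names `half_old` and `half_new` are assumptions made for this sketch.

```r
library(dplyr)

# Illustrative data frame (same shape as the one used in this article)
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance", "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Arbitrary seed so that the sketch is repeatable
set.seed(1)

# Half of the rows with sample_frac()
half_old <- sample_frac(data, 0.5)

# Half of the rows with the newer slice_sample(prop = ...)
half_new <- data %>% slice_sample(prop = 0.5)

print(nrow(half_old))
print(nrow(half_new))
```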

Example: R program to filter data using sample_frac() function

R

    library(dplyr)

    # Sample data frame
    data <- data.frame(id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
                       department = c('IT', 'sales', 'finance', 'IT', 'finance',
                                      'sales', 'HR', 'HR'),
                       salary = c(34500.00, 560890.78, 67000.78, 25000.00,
                                  78900.00, 25000.00, 45000.00, 90000))
    print(data)
    print("==========================")

    # Random 20% of the rows
    print(sample_frac(data, 0.2))
    print("==========================")

    # Random 40% of the rows
    print(sample_frac(data, 0.4))
    print("==========================")

    # Random 70% of the rows
    print(sample_frac(data, 0.7))
    print("==========================")
