Analyze and filter data with AWK

AWK is a scripting language for manipulating text data and generating reports. This tutorial walks through a Bash script that uses AWK to filter the rows of a delimited file by the value in a chosen column.

Prerequisites

You should have:

  1. A Linux system with the Bash shell
  2. Familiarity with Linux and Bash scripting
  3. An understanding of AWK’s basic syntax and operations

Bash Script Creation

Start by creating a new bash script. Open a terminal window, navigate to your desired directory, and create a new file:

$ touch analyze_with_awk.sh

Then, make the file executable:

$ chmod +x analyze_with_awk.sh

Bash Script Structure

Open the script in your preferred text editor:

$ nano analyze_with_awk.sh

First, specify the interpreter directive (shebang) on the first line of the script. It tells the system to run the script with the Bash shell:

#!/bin/bash

Argument Parsing

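The script accepts three options: an input file -i, a delimiter -d, and a column -c to filter on. Use the getopts shell builtin to parse them. The leading colon in the option string switches getopts to silent error reporting. Without it, the `:` branch below is never reached and $OPTARG is empty in the `\?` branch, so the error messages would not name the offending option:

while getopts ":i:d:c:" opt; do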
  case ${opt} in
    i )
      inputFile=$OPTARG
      ;;
    d )
      delimiter=$OPTARG
      ;;
    c )
      column=$OPTARG
      ;;
    \? )
      echo "Invalid option: $OPTARG" 1>&2
      exit 1
      ;;
    : )
      echo "Invalid option: $OPTARG requires an argument" 1>&2
      exit 1
      ;;
  esac
done
shift $((OPTIND -1))
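
The getopts loop accepts the options but does not check that all three were actually supplied. The following is a minimal sketch of such a check, not part of the original script. The function name validate_args and its messages are hypothetical, and it assumes the column should be a positive integer.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): checks the three
# values that the getopts loop is expected to have set.
validate_args() {
  local inputFile=$1 delimiter=$2 column=$3

  # All three options are treated as required here (an assumption).
  if [[ -z $inputFile || -z $delimiter || -z $column ]]; then
    echo "Usage: $0 -i <file> -d <delimiter> -c <column>" >&2
    return 1
  fi

  # The input file must exist and be readable.
  if [[ ! -r $inputFile ]]; then
    echo "Cannot read input file: $inputFile" >&2
    return 1
  fi

  # AWK field numbers start at 1, so require a positive integer.
  if ! [[ $column =~ ^[1-9][0-9]*$ ]]; then
    echo "Column must be a positive integer: $column" >&2
    return 1
  fi
}
```

In the script, it could be called right after the shift line as: validate_args "$inputFile" "$delimiter" "$column" || exit 1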

AWK Command

With the input parameters parsed, you can build the AWK command. Suppose you want to print all lines where the value in the specified column is greater than 50. Construct the AWK command as follows:

awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}' "$inputFile"

In this command:

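  • -v col="$column" assigns the value of the shell variable $column to the AWK variable col. Inside the AWK program, $col then refers to the field whose number is stored in col.
  • -F"$delimiter" sets the input field separator to the specified delimiter.
  • {if($col > 50) print $0} is the AWK program. It prints the whole line ($0) when the value in the selected column is greater than 50.
  • "$inputFile" is the input file to process. The quotes keep file names that contain spaces intact.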
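
To see the filter in action without the rest of the script, the sketch below runs the same AWK command on a few invented rows. The sample data and the helper name sample_data are made up for illustration.

```shell
#!/bin/bash

# Invented sample rows: name,score
sample_data() {
  printf '%s\n' 'alice,72' 'bob,45' 'carol,50' 'dave,91'
}

column=2
delimiter=','

# $col inside AWK means "the field whose number is stored in col",
# so with col=2 the comparison is made on the score.
result=$(sample_data | awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}')
echo "$result"
# prints:
# alice,72
# dave,91
```

The row carol,50 is dropped because the comparison is strictly greater than 50.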

Complete Bash Script

The complete Bash script, analyze_with_awk.sh, should look as follows:

#!/bin/bash

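# The leading ':' enables silent error reporting, so the \? and : branches
# receive the offending option name in OPTARG.
while getopts ":i:d:c:" opt; do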
  case ${opt} in
    i )
      inputFile=$OPTARG
      ;;
    d )
      delimiter=$OPTARG
      ;;
    c )
      column=$OPTARG
      ;;
    \? )
      echo "Invalid option: $OPTARG" 1>&2
      exit 1
      ;;
    : )
      echo "Invalid option: $OPTARG requires an argument" 1>&2
      exit 1
      ;;
  esac
done
shift $((OPTIND -1))

awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}' "$inputFile"

Execution

Execute the script by providing necessary options:

$ ./analyze_with_awk.sh -i input.txt -d , -c 2

In the command above, -i input.txt specifies the input file, -d , sets the delimiter to comma, and -c 2 indicates that the script should analyze the second column.
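
Delimiters other than a comma often need quoting on the command line. The sketch below illustrates this with a pipe and a tab. The function filter_rows is a hypothetical stand-in for the script's final awk line so that the example runs on its own, and the sample rows are invented.

```shell
#!/bin/bash

# Hypothetical stand-in for the script's awk line; reads rows from stdin.
filter_rows() {
  local delimiter=$1 column=$2
  awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}'
}

# Pipe-delimited rows: '|' must be quoted, otherwise the shell
# interprets it as a pipeline operator.
printf '%s\n' 'north|61' 'south|12' | filter_rows '|' 2
# prints: north|61

# Tab-delimited rows: $'\t' is Bash syntax for a literal tab character.
printf 'east\t7\nwest\t88\n' | filter_rows $'\t' 2
# prints: west<TAB>88
```

The same quoting applies when calling the script itself, for example: ./analyze_with_awk.sh -i input.txt -d '|' -c 2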

Summary

You have created a Bash script that uses AWK to filter a delimited file by the value in a chosen column. You have also seen how to parse command-line options in Bash with getopts and pass them into an AWK command. You can adapt the script to your needs, for example by changing the threshold or the comparison.
