AWK is a scripting language used for manipulating data and generating reports. This tutorial guides you through the process of creating a bash script that leverages the power of AWK for data analysis and filtering.
Prerequisites
You should have:
- A Linux system with bash shell
- Familiarity with Linux and Bash scripting
- An understanding of AWK’s basic syntax and operations
Bash Script Creation
Start by creating a new bash script. Open a terminal window, navigate to your desired directory, and create a new file:
$ touch analyze_with_awk.sh
Then, make the file executable:
$ chmod +x analyze_with_awk.sh
Bash Script Structure
Open the script in your preferred text editor:
$ nano analyze_with_awk.sh
First, add the interpreter directive (shebang) at the top of the script. It tells the system to run the script with the Bash shell:
#!/bin/bash
Argument Parsing
You’ll want to provide custom options for the script user. Consider a case where a user can provide an input file (-i), a delimiter (-d), and a column (-c) for data filtering. Use the getopts shell builtin to parse command-line arguments:
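# The leading ':' in the option string enables silent error reporting,
# which is what makes the '\?' and ':' cases below receive $OPTARG.
while getopts ":i:d:c:" opt; do
    case ${opt} in
        i )
            inputFile=$OPTARG
            ;;
        d )
            delimiter=$OPTARG
            ;;
        c )
            column=$OPTARG
            ;;
        \? )
            echo "Invalid option: -$OPTARG" 1>&2
            exit 1
            ;;
        : )
            echo "Option -$OPTARG requires an argument" 1>&2
            exit 1
            ;;
    esac
done
# Drop the parsed options so that "$@" holds only the remaining arguments.
shift $((OPTIND - 1))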
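The getopts loop stores whatever the user passes, but it does not check that all three options were given or that they are usable. A minimal sketch of such a check follows; the function name validate_args and its messages are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): returns 0 when the
# parsed options look usable, and non-zero with a message otherwise.
validate_args() {
    local file=$1 delim=$2 col=$3

    # All three options are required.
    if [[ -z $file || -z $delim || -z $col ]]; then
        echo "Usage: $0 -i <file> -d <delimiter> -c <column>" >&2
        return 1
    fi

    # The input file must exist and be readable.
    if [[ ! -r $file ]]; then
        echo "Cannot read input file: $file" >&2
        return 1
    fi

    # AWK field numbers start at 1, so require a positive integer.
    if [[ ! $col =~ ^[1-9][0-9]*$ ]]; then
        echo "Column must be a positive integer: $col" >&2
        return 1
    fi
}
```

In the script, a call such as validate_args "$inputFile" "$delimiter" "$column" || exit 1 could be placed right after the getopts loop.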
AWK Command
With the input parameters parsed, you can build the AWK command. Suppose you want to print all lines where the value in the specified column is greater than 50. Construct the AWK command as follows:
awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}' "$inputFile"
In this command:
- -v col="$column" allows you to use the shell variable $column inside the AWK script.
- -F"$delimiter" sets the input field separator to the specified delimiter.
- {if($col > 50) print $0} is the AWK script that prints a line if the column value is greater than 50.
- "$inputFile" is the input file to process.
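To see the filter in action, here is a small sketch that runs it on inline sample data. The sample rows and variable names are made up for illustration. It also shows one thing to watch for: when a field is not numeric, such as a header cell, AWK compares it with 50 as a string, so a header row can slip through. Adding 0 to the field forces a numeric comparison.

```shell
#!/bin/bash
# Hypothetical sample data: a header row plus three records.
sample=$(printf 'name,score\nalice,42\nbob,75\ncarol,60\n')

# The tutorial's filter in pattern form. The header field "score" is not
# numeric, so it is compared as a string and the header line is kept.
plain=$(printf '%s\n' "$sample" | awk -v col=2 -F',' '$col > 50')

# Adding 0 forces a numeric comparison: "score" becomes 0 and is dropped.
numeric=$(printf '%s\n' "$sample" | awk -v col=2 -F',' '$col + 0 > 50')

printf 'plain:\n%s\n\nnumeric:\n%s\n' "$plain" "$numeric"
```

The pattern form '$col > 50' is equivalent to '{if($col > 50) print $0}', because printing the line is AWK's default action.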
Complete Bash Script
The complete Bash script, analyze_with_awk.sh, should look as follows:
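#!/bin/bash

# Parse -i <file>, -d <delimiter>, and -c <column>.
# The leading ':' enables silent error reporting for the last two cases.
while getopts ":i:d:c:" opt; do
    case ${opt} in
        i )
            inputFile=$OPTARG
            ;;
        d )
            delimiter=$OPTARG
            ;;
        c )
            column=$OPTARG
            ;;
        \? )
            echo "Invalid option: -$OPTARG" 1>&2
            exit 1
            ;;
        : )
            echo "Option -$OPTARG requires an argument" 1>&2
            exit 1
            ;;
    esac
done
shift $((OPTIND - 1))

# Print every line whose value in the chosen column is greater than 50.
awk -v col="$column" -F"$delimiter" '{if($col > 50) print $0}' "$inputFile"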
Execution
Execute the script by providing necessary options:
$ ./analyze_with_awk.sh -i input.txt -d , -c 2
In the command above, -i input.txt specifies the input file, -d , sets the delimiter to comma, and -c 2 indicates that the script should analyze the second column.
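As a rough end-to-end illustration, the sketch below runs the same options against made-up sample files. So that it runs on its own, it uses a hypothetical function named analyze as a stand-in for ./analyze_with_awk.sh; the file names and rows are assumptions as well. It also shows that a delimiter such as | has to be quoted, because the shell would otherwise treat it as a pipe.

```shell
#!/bin/bash
# Hypothetical stand-in for ./analyze_with_awk.sh, so the sketch is runnable
# without the script file. It accepts the same -i, -d, and -c options.
analyze() {
    local OPTIND=1 opt inputFile delimiter column
    while getopts ":i:d:c:" opt; do
        case $opt in
            i) inputFile=$OPTARG ;;
            d) delimiter=$OPTARG ;;
            c) column=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    awk -v col="$column" -F"$delimiter" '$col > 50' "$inputFile"
}

# Made-up sample files in a temporary directory.
workdir=$(mktemp -d)
printf 'widget,42\ngadget,75\ngizmo,60\n' > "$workdir/input.txt"
printf 'widget|42\ngadget|75\n' > "$workdir/pipes.txt"

# Comma-separated input, filtering on the second column.
comma_out=$(analyze -i "$workdir/input.txt" -d , -c 2)

# The pipe character is special to the shell, so it is quoted here.
pipe_out=$(analyze -i "$workdir/pipes.txt" -d '|' -c 2)

printf '%s\n---\n%s\n' "$comma_out" "$pipe_out"
rm -rf "$workdir"
```

With this sample data, the first call keeps the gadget and gizmo rows and the second keeps only the gadget row.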
Summary
You have created a Bash script that uses AWK for data analysis and filtering. You’ve also learned how to parse command-line arguments in Bash and execute AWK commands with custom options within a Bash script. This script provides a flexible and powerful tool for data analysis, and you can modify it according to your needs.
Reference
- getopts: Bash manual
- awk: GNU AWK User’s Guide