How to count occurrences of text in a file? The Next CEO of Stack OverflowCount duplicated words in a text fileuniq --count command is yields incorrect result?How to compare two (vague) file lists and print the duplicates?How to count occurrences of each character?How do I count text lines?How long it will take to sort uniq a 62GB file?How does uniq work?Count number of data points in fileWrong sorting a text fileHow to count number of partial occurrences of a string in a file
Is it okay to majorly distort historical facts while writing a fiction story?
is it ok to reduce charging current for li ion 18650 battery?
How to invert MapIndexed on a ragged structure? How to construct a tree from rules?
How to get from Geneva Airport to Metabief?
Would this house-rule that treats advantage as a +1 to the roll instead (and disadvantage as -1) and allows them to stack be balanced?
Inappropriate reference requests from Journal reviewers
What is the difference between 翼 and 翅膀?
Is it my responsibility to learn a new technology in my own time my employer wants to implement?
Received an invoice from my ex-employer billing me for training; how to handle?
What happened in Rome, when the western empire "fell"?
What was the first Unix version to run on a microcomputer?
How did people program for Consoles with multiple CPUs?
Rotate a column
Make solar eclipses exceedingly rare, but still have new moons
Chain wire methods together in Lightning Web Components
unclear about Dynamic Binding
Is it possible to replace duplicates of a character with one character using tr
Why isn't acceleration always zero whenever velocity is zero, such as the moment a ball bounces off a wall?
Circle x^2 + y^2 = n! doesn't hit any lattice points for any n except for 0, 1, 2 and 6 or does it?
Newlines in BSD sed vs gsed
Can we say or write : "No, it'sn't"?
Why, when going from special to general relativity, do we just replace partial derivatives with covariant derivatives?
Is a distribution that is normal, but highly skewed considered Gaussian?
Why do remote US companies require working in the US?
How to count occurrences of text in a file?
The Next CEO of Stack OverflowCount duplicated words in a text fileuniq --count command is yields incorrect result?How to compare two (vague) file lists and print the duplicates?How to count occurrences of each character?How do I count text lines?How long it will take to sort uniq a 62GB file?How does uniq work?Count number of data points in fileWrong sorting a text fileHow to count number of partial occurrences of a string in a file
I have a log file sorted by IP addresses,
I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Possibly listing the number of occurrences next to an ip, such as:
5.135.134.16 count: 5
13.57.220.172: count 30
18.206.226 count:2
and so on.
Here’s a sample of the log:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
command-line bash sort uniq
add a comment |
I have a log file sorted by IP addresses,
I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Possibly listing the number of occurrences next to an ip, such as:
5.135.134.16 count: 5
13.57.220.172: count 30
18.206.226 count:2
and so on.
Here’s a sample of the log:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
command-line bash sort uniq
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Related
– Julien Lopez
20 hours ago
add a comment |
I have a log file sorted by IP addresses,
I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Possibly listing the number of occurrences next to an ip, such as:
5.135.134.16 count: 5
13.57.220.172: count 30
18.206.226 count:2
and so on.
Here’s a sample of the log:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
command-line bash sort uniq
I have a log file sorted by IP addresses,
I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Possibly listing the number of occurrences next to an ip, such as:
5.135.134.16 count: 5
13.57.220.172: count 30
18.206.226 count:2
and so on.
Here’s a sample of the log:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
command-line bash sort uniq
command-line bash sort uniq
edited yesterday
dessert
25.3k673107
25.3k673107
asked yesterday
j0hj0h
6,5521657120
6,5521657120
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Related
– Julien Lopez
20 hours ago
add a comment |
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Related
– Julien Lopez
20 hours ago
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Related
– Julien Lopez
20 hours ago
Related
– Julien Lopez
20 hours ago
add a comment |
7 Answers
7
active
oldest
votes
You can use grep and uniq for the list of addresses, loop over them and grep again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %dn' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*' outputs every character from the beginning (^) until the first space of each line, uniq removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %dn' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
10
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions usinguniq -corawkonly need to read the file once,
– David
yesterday
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
yesterday
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
yesterday
add a comment |
You can use cut and uniq tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation :
cut -d ' ' -f1: extract first field (ip address)uniq -c: report repeated lines and display the number of occurences
New contributor
Mikael Flora is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
6
One could usesed, e.g.sed -E 's/ *(S*) *(S*)/2 count: 1/'to get the output exactly like OP wanted.
– dessert
yesterday
1
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily usesort file | cut ....in case you're not sure if the file is already sorted.
– Guntram Blohm
yesterday
add a comment |
If you don't specifically require the given output format, then I would recommend the already posted cut + uniq based answer
If you really need the given output format, a single-pass way to do it in Awk would be
awk 'c[$1]++ ENDfor(i in c) print i, "count: " c[i]' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c) would be:
awk '
NR==1 last=$1
$1 != last print last, "count: " c[last]; last = $1
c[$1]++
END print last, "count: " c[last]
'
Ex.
$ awk 'NR==1 last=$1 $1 != last print last, "count: " c[last]; last = $1 c[$1]++ ENDprint last, "count: " c[last]' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
yesterday
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
yesterday
Ah, yes, I see.
– Peter A. Schneider
yesterday
add a comment |
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk 'print $1' "$IN_FILE" | sort -u)
do
echo -en "$IPtcount: "
grep -c "$IP" "$IN_FILE"
done
- replace
file.logwith the actual file name. - the command substitution expression
$(awk 'print $1' "$IN_FILE" | sort -u)will provide a list of the unique values of the first column. - then
grep -cwill count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk 'print $1' "$IN_FILE" | sort -u); do echo -en "$IPtcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
1
Preferprintf...
– D. Ben Knoble
yesterday
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
yesterday
add a comment |
Some Perl:
$ perl -lae '$k$F[0]++; } print "$_ count: $k$_" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k$F[0]++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The improve this answer
add a comment print "$_ count: $k$_" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k$F[0]++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The
Some Perl:
$ perl -lae '$k$F[0]++; print "$_ count: $k$_" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k$F[0]++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The improve this answer
add a comment print "$_ count: $k$_" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k$F[0]++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The
Some Perl:
$ perl -lae '$k$F[0]++; print "$_ count: $k$_" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k$F[0]++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The { is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k$_).
And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>)
@fields = split(/ /, $line);
$ip = $fields[0];
$counts$ip++;
foreach $ip (keys(%counts))
print "$ip count: $counts$ipn"
' < log
answered yesterday
terdon♦terdon
67.4k13139222
67.4k13139222
add a comment |
add a comment |
Maybe this is not what the OP want; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using uniq command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N compares no more than N characters in lines
-c will prefix lines by the number of occurrences
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
add a comment |
Maybe this is not what the OP want; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using uniq command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N compares no more than N characters in lines
-c will prefix lines by the number of occurrences
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
add a comment |
Maybe this is not what the OP want; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using uniq command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N compares no more than N characters in lines
-c will prefix lines by the number of occurrences
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Maybe this is not what the OP want; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using uniq command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N compares no more than N characters in lines
-c will prefix lines by the number of occurrences
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
answered yesterday
Y. PradhanY. Pradhan
311
311
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Y. Pradhan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
add a comment |
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP ` - - [`. But in theory the address could be up to 8 characters shorter than the maximum so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
21 hours ago
add a comment |
Cut -f1 -d- my.log | sort | uniq -c
Explanation: Take the first field of my.log splitting on ‘-‘ and sort it. Uniq needs sorted input. ‘-c’ tells it to count occurrences.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
Cut -f1 -d- my.log | sort | uniq -c
Explanation: Take the first field of my.log splitting on ‘-‘ and sort it. Uniq needs sorted input. ‘-c’ tells it to count occurrences.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
Cut -f1 -d- my.log | sort | uniq -c
Explanation: Take the first field of my.log splitting on ‘-‘ and sort it. Uniq needs sorted input. ‘-c’ tells it to count occurrences.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Cut -f1 -d- my.log | sort | uniq -c
Explanation: Take the first field of my.log splitting on ‘-‘ and sort it. Uniq needs sorted input. ‘-c’ tells it to count occurrences.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
answered 2 hours ago
PhDPhD
101
101
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
PhD is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
add a comment |
Thanks for contributing an answer to Ask Ubuntu!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1129521%2fhow-to-count-occurrences-of-text-in-a-file%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
With “bash”, do you mean the plain shell or the command line in general?
– dessert
yesterday
Do you have any database software available to use?
– SpacePhoenix
yesterday
Related
– Julien Lopez
20 hours ago