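For simple single-line data, the uniq command is usually enough to remove duplicates (typically after sort, since uniq only collapses adjacent duplicate lines). When processing real log files, however, we often need to deduplicate on a specific column to get the result we want. Suppose we have a log file access.log with the following contents: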
10.xxx.xxx.xxx - - [22/Sep/2016:11:14:07 +0800] "POST /A HTTP/1.1" 200 14535 "-" "-"
10.xxx.xxx.xxx - - [21/Sep/2016:13:03:25 +0800] "POST /B HTTP/1.1" 200 11741 "-" "-"
10.xxx.xxx.xxx - - [24/Sep/2016:11:15:07 +0800] "POST /A HTTP/1.1" 200 14535 "-" "-"
10.xxx.xxx.xxx - - [10/Sep/2016:14:17:44 +0800] "GET /C HTTP/1.1" 200 1295 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
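We want to deduplicate on the endpoint path (/A, /B, /C), which is the seventh whitespace-separated field of each line. The command is: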
awk '{ a[$7]=$0 }END{for(i in a)print a[i]}' access.log
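This gives the result below. Because each later line overwrites the array entry stored for the same path, the last occurrence of each path is the one kept. The order in which for (i in a) visits the keys is not specified, so the output order may differ between awk implementations.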
10.xxx.xxx.xxx - - [21/Sep/2016:13:03:25 +0800] "POST /B HTTP/1.1" 200 11741 "-" "-"
10.xxx.xxx.xxx - - [24/Sep/2016:11:15:07 +0800] "POST /A HTTP/1.1" 200 14535 "-" "-"
10.xxx.xxx.xxx - - [10/Sep/2016:14:17:44 +0800] "GET /C HTTP/1.1" 200 1295 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
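If the first occurrence of each path is wanted instead, and the input order should be preserved, a common awk idiom can be used as a variation on the command above. The following is a minimal sketch, not part of the original command: the temporary file and the names log, seen, and first_by_path are assumptions chosen for illustration, and the sample data simply repeats the four lines of access.log shown earlier.

```shell
# Recreate the sample log in a temporary file (hypothetical stand-in for access.log).
log=$(mktemp)
cat > "$log" <<'EOF'
10.xxx.xxx.xxx - - [22/Sep/2016:11:14:07 +0800] "POST /A HTTP/1.1" 200 14535 "-" "-"
10.xxx.xxx.xxx - - [21/Sep/2016:13:03:25 +0800] "POST /B HTTP/1.1" 200 11741 "-" "-"
10.xxx.xxx.xxx - - [24/Sep/2016:11:15:07 +0800] "POST /A HTTP/1.1" 200 14535 "-" "-"
10.xxx.xxx.xxx - - [10/Sep/2016:14:17:44 +0800] "GET /C HTTP/1.1" 200 1295 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
EOF

# seen[$7]++ evaluates to 0 the first time a path appears and to a positive
# number afterwards, so !seen[$7]++ is true only for the first line of each
# path. A true pattern with no action prints the line, in input order.
first_by_path=$(awk '!seen[$7]++' "$log")
printf '%s\n' "$first_by_path"

# Clean up the temporary file.
rm -f "$log"
```

With this variant the /A line dated 22/Sep is kept rather than the one dated 24/Sep, and the lines come out in the order /A, /B, /C, because nothing is buffered until the END block.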