[Python Comparison] Log Processing 2 | 润乾 (Raqsoft)
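Task: each log record spans a variable number of lines, and every line belonging to a record carries the same marker that identifies it as part of that record.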
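To make the task concrete, here is a minimal sketch of what such a log might look like and how consecutive lines can be gathered into records. The sample lines are invented for illustration (the original file is not shown). The assumption, inferred from the code that follows in this post, is that fields are tab-separated `key:value` pairs and that the first field on each line acts as the shared marker.

```python
from itertools import groupby

# Hypothetical sample: two records, each spread over a different number of lines.
# Every line of a record starts with the same marker (here the userid field).
sample_lines = [
    "userid:1001\tgender:F\tage:28",
    "userid:1001\tsalary:5000",
    "userid:1001\tprovince:Hebei\tmusicid:m42",
    "userid:1002\tgender:M",
    "userid:1002\twatch_time:35",
]

def marker(line):
    """Return the first tab-separated field, used as the record marker."""
    return line.split("\t")[0]

# groupby only merges adjacent lines with the same marker, which suits
# a log where the lines of one record are written consecutively.
records = [(key, list(lines)) for key, lines in groupby(sample_lines, key=marker)]

for key, lines in records:
    print(key, "->", len(lines), "line(s)")
```

Grouping only adjacent lines avoids sorting and keeps the records in file order.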
Python
import pandas as pd

log_file = 'e://txt//indefinite _info.txt'
# Read each physical line as one value in column 0 (no header row)
log_info = pd.read_csv(log_file, header=None)
# Group lines by their leading marker (the first tab-separated field), keeping file order
log_g = log_info.groupby(log_info[0].apply(lambda x: x.split("\t")[0]), sort=False)
columns = ["userid", "gender", "age", "salary", "province", "musicid", "watch_time", "time"]
# One list of values per output column
df_dic = {}
for c in columns:
    df_dic[c] = []
for index, group in log_g:
    rec_dic = {}
    # Merge all lines of the record into a single list of key:value fields
    rec = group.values.flatten()
    rec = '\t'.join(rec).split("\t")
    for r in rec:
        v = r.split(":")
        rec_dic[v[0]] = v[1]
    # Fill every column, using None where the record has no such field
    for col in columns:
        if col not in rec_dic.keys():
            df_dic[col].append(None)
        else:
            df_dic[col].append(rec_dic[col])
df = pd.DataFrame(df_dic)
print(df)
esProc (集算器)
  | A | |
1 | e://txt//indefinite _info.txt | |
2 | =file(A1).import@s() | |
3 | [userid,gender,age,salary,province,musicid,watch_time,time] | |
4 | =A2.group@o(_1.array("\t")(1)) | |
5 | =A4.(~.(_1.array("\t")).conj().id().align(A3,~.array("\:")(1)).(~.array("\:")(2))).conj() | |
6 | =create(${A3.concat@c()}).record(A5) | |
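esProc's merge-style grouping (group@o, which groups adjacent lines that share a key) and its alignment operation (align, which arranges fields in the order of a reference list) make tidying up this kind of log short and straightforward.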
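For readers coming from Python, the alignment step can be approximated as follows. This is only a sketch of the idea, not esProc's implementation: the helper name `align_fields` and the sample fields are hypothetical, and fields are assumed to be `key:value` strings.

```python
COLUMNS = ["userid", "gender", "age", "salary", "province",
           "musicid", "watch_time", "time"]

def align_fields(fields, columns=COLUMNS):
    """Place key:value fields into a fixed column order, None where missing."""
    values = {}
    for field in fields:
        # partition splits on the first ':' only, so a value that itself
        # contains ':' (a clock time, for instance) is kept whole.
        key, _, value = field.partition(":")
        values[key] = value
    return [values.get(col) for col in columns]

# Hypothetical fields of one record, already merged from its lines.
# The repeated userid marker collapses into a single entry.
row = align_fields(["userid:1001", "age:28", "time:10:15:30", "userid:1001"])
print(row)
```

Splitting on the first colon only is a deliberate difference from the `split(":")` call in the Python listing above, which would cut a value such as `10:15:30` short.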