UNSW-NB15 was created by the Australian Centre for Cyber Security (ACCS).
I. Feature Descriptions
These features are described in the UNSW-NB15_features.csv file.
No. | Name | Type | Description |
---|---|---|---|
1 | srcip | nominal | Source IP address |
2 | sport | integer | Source port number |
3 | dstip | nominal | Destination IP address |
4 | dsport | integer | Destination port number |
5 | proto | nominal | Transaction protocol |
6 | state | nominal | Indicates the state and its dependent protocol, e.g. ACC, CLO, CON, ECO, ECR, FIN, INT, MAS, PAR, REQ, RST, TST, TXD, URH, URN, and (-) if the state is not used |
7 | dur | float | Record total duration |
8 | sbytes | integer | Source to destination transaction bytes |
9 | dbytes | integer | Destination to source transaction bytes |
10 | sttl | integer | Source to destination time to live value |
11 | dttl | integer | Destination to source time to live value |
12 | sloss | integer | Source packets retransmitted or dropped |
13 | dloss | integer | Destination packets retransmitted or dropped |
14 | service | nominal | http, ftp, smtp, ssh, dns, ftp-data, irc, and (-) if the service is not much used |
15 | Sload | float | Source bits per second |
16 | Dload | float | Destination bits per second |
17 | Spkts | integer | Source to destination packet count |
18 | Dpkts | integer | Destination to source packet count |
19 | swin | integer | Source TCP window advertisement value |
20 | dwin | integer | Destination TCP window advertisement value |
21 | stcpb | integer | Source TCP base sequence number |
22 | dtcpb | integer | Destination TCP base sequence number |
23 | smeansz | integer | Mean of the flow packet size transmitted by the src |
24 | dmeansz | integer | Mean of the flow packet size transmitted by the dst |
25 | trans_depth | integer | Represents the pipelined depth into the connection of the http request/response transaction |
26 | res_bdy_len | integer | Actual uncompressed content size of the data transferred from the server's http service |
27 | Sjit | float | Source jitter (ms) |
28 | Djit | float | Destination jitter (ms) |
29 | Stime | timestamp | Record start time |
30 | Ltime | timestamp | Record last time |
31 | Sintpkt | float | Source interpacket arrival time (ms) |
32 | Dintpkt | float | Destination interpacket arrival time (ms) |
33 | tcprtt | float | TCP connection setup round-trip time, the sum of 'synack' and 'ackdat' |
34 | synack | float | TCP connection setup time, the time between the SYN and the SYN_ACK packets |
35 | ackdat | float | TCP connection setup time, the time between the SYN_ACK and the ACK packets |
36 | is_sm_ips_ports | binary | If the source (1) and destination (3) IP addresses are equal and the port numbers (2)(4) are equal, this variable takes the value 1, else 0 |
37 | ct_state_ttl | integer | No. for each state (6) according to a specific range of values for source/destination time to live (10) (11) |
38 | ct_flw_http_mthd | integer | No. of flows that have methods such as Get and Post in the http service |
39 | is_ftp_login | binary | If the ftp session is accessed by user and password then 1, else 0 |
40 | ct_ftp_cmd | integer | No. of flows that have a command in the ftp session |
41 | ct_srv_src | integer | No. of connections that contain the same service (14) and source address (1) in 100 connections according to the last time (30) |
42 | ct_srv_dst | integer | No. of connections that contain the same service (14) and destination address (3) in 100 connections according to the last time (30) |
43 | ct_dst_ltm | integer | No. of connections of the same destination address (3) in 100 connections according to the last time (30) |
44 | ct_src_ ltm | integer | No. of connections of the same source address (1) in 100 connections according to the last time (30) |
45 | ct_src_dport_ltm | integer | No. of connections of the same source address (1) and the destination port (4) in 100 connections according to the last time (30) |
46 | ct_dst_sport_ltm | integer | No. of connections of the same destination address (3) and the source port (2) in 100 connections according to the last time (30) |
47 | ct_dst_src_ltm | integer | No. of connections of the same source (1) and destination (3) address in 100 connections according to the last time (30) |
48 | attack_cat | nominal | The name of each attack category. This data set has nine categories: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms |
49 | Label | binary | 0 for normal and 1 for attack records |
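
Several of the features above are derived from other columns. The following is a minimal sketch of three of them (features 33, 36 and 41), not the dataset authors' implementation. The function names are hypothetical, and the reading of "in 100 connections according to the last time" as a sliding window over the current record plus the 99 preceding ones, ordered by `Ltime`, is an assumption.

```python
from collections import deque


def is_sm_ips_ports(srcip, sport, dstip, dsport):
    """Feature 36: 1 if both the IP addresses and the port numbers match, else 0."""
    return int(srcip == dstip and sport == dsport)


def tcprtt(synack, ackdat):
    """Feature 33: TCP setup round-trip time as the sum of features 34 and 35."""
    return synack + ackdat


def ct_srv_src(records, window=100):
    """Feature 41 (assumed reading): for each record, count how many of the
    last `window` records, ordered by Ltime and including the record itself,
    share its service and source address."""
    ordered = sorted(records, key=lambda r: r["Ltime"])
    recent = deque(maxlen=window)  # sliding window of (service, srcip) keys
    counts = []
    for rec in ordered:
        key = (rec["service"], rec["srcip"])
        recent.append(key)
        counts.append(sum(1 for k in recent if k == key))
    return counts


# Made-up flows for illustration only.
flows = [
    {"Ltime": 1, "service": "http", "srcip": "10.0.0.1"},
    {"Ltime": 2, "service": "dns", "srcip": "10.0.0.1"},
    {"Ltime": 3, "service": "http", "srcip": "10.0.0.1"},
    {"Ltime": 4, "service": "http", "srcip": "10.0.0.2"},
]
print(ct_srv_src(flows))                                 # [1, 1, 2, 1]
print(is_sm_ips_ports("10.0.0.1", 80, "10.0.0.1", 80))   # 1
```

The other `ct_*` counters (42 to 47) would follow the same pattern with a different key, for example `(service, dstip)` for `ct_srv_dst`.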
II. Train & Test
A partition from this dataset is configured as a training set and testing set, namely, [UNSW_NB15_training-set.csv](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/a part of training and testing set/UNSW_NB15_training-set.csv) and [UNSW_NB15_testing-set.csv](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/a part of training and testing set/UNSW_NB15_testing-set.csv) respectively.
The training set contains 175,341 records and the testing set contains 82,332 records, covering both attack and normal types. Figures 1 and 2 show the testbed configuration of the dataset and the feature-creation method of UNSW-NB15, respectively.
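
A minimal sketch of loading one partition with pandas and checking its size against the record counts quoted above. The helper names `load_split` and `row_count_matches` are hypothetical, and the column names in the in-memory sample are assumptions chosen so that the sketch runs without the real files.

```python
import io

import pandas as pd

# Record counts quoted in the text above.
EXPECTED_ROWS = {"train": 175_341, "test": 82_332}


def load_split(source):
    """Read one partition; `source` may be a path or an open file-like object."""
    return pd.read_csv(source)


def row_count_matches(df, split):
    """Compare the number of rows with the count stated for `split`."""
    return len(df) == EXPECTED_ROWS[split]


# Tiny in-memory stand-in (made-up rows, assumed column names).
sample = io.StringIO(
    "id,dur,proto,attack_cat,label\n"
    "1,0.12,tcp,Normal,0\n"
    "2,0.65,udp,Generic,1\n"
)
df_sample = load_split(sample)
print(df_sample.shape)                        # (2, 5)
print(row_count_matches(df_sample, "train"))  # False
```

With the real files downloaded locally, the same helper would be called as `df_train = load_split("UNSW_NB15_training-set.csv")` and `df_test = load_split("UNSW_NB15_testing-set.csv")`.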
1 Feature types
List the columns with `df_train.columns`.
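
A minimal sketch of inspecting feature types with pandas. The small `df_train` built here is a hypothetical stand-in with made-up rows. On the real training set the same calls would separate the nominal columns (such as `proto` and `service`) from the numeric ones.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the real df_train.
df_train = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp"],
    "service": ["http", "dns", "-"],
    "dur": [0.12, 0.65, 1.62],
    "sbytes": [258, 734, 364],
    "label": [0, 1, 0],
})

# Column names and the dtype pandas inferred for each column.
print(df_train.columns.tolist())
print(df_train.dtypes)

# Numeric columns are integer or float; everything else is treated as nominal.
numeric_cols = df_train.select_dtypes(include=["number"]).columns.tolist()
nominal_cols = df_train.select_dtypes(exclude=["number"]).columns.tolist()
print(numeric_cols)  # ['dur', 'sbytes', 'label']
print(nominal_cols)  # ['proto', 'service']
```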
2 Class statistics
ID | Type | Count | Train (after drop_duplicates) | Test (after drop_duplicates) |
---|---|---|---|---|
0 | Normal | 93000 | 56000(51890) | 37000(34206) |
1 | Generic | 58871 | 40000(4181) | 18871(3657) |
2 | Exploits | 44525 | 33393(19844) | 11132(7609) |
3 | Fuzzers | 24246 | 18184(16150) | 6062(4838) |
4 | DoS | 16353 | 12264(3806) | 4089(1718) |
5 | Reconnaissance | 13987 | 10491(7522) | 3496(2703) |
6 | Analysis | 2677 | 2000(1594) | 677(446) |
7 | Backdoor | 2329 | 1746(1535) | 583(346) |
8 | Shellcode | 1511 | 1133(1091) | 378(378) |
9 | Worms | 174 | 130(127) | 44(44) |
- | Total | 257673 | 175341 | 82332 |
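
Per-class counts like those in the table, before and after `drop_duplicates`, can be sketched as follows. This is an illustration, not the exact procedure used to produce the table. The helper `class_counts` is hypothetical, and dropping an `id` column before deduplication is an assumption: a running identifier would otherwise make every row unique.

```python
import pandas as pd


def class_counts(df, label_col="attack_cat", id_col="id"):
    """Per-class record counts before and after dropping duplicate rows."""
    # Drop the running id (if present) so identical flows compare as equal.
    features = df.drop(columns=[id_col], errors="ignore")
    raw = features[label_col].value_counts()
    dedup = features.drop_duplicates()[label_col].value_counts()
    out = pd.DataFrame({"count": raw, "dedup": dedup})
    return out.fillna(0).astype(int).sort_values("count", ascending=False)


# Made-up rows for illustration only.
df_demo = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "dur": [0.1, 0.1, 0.2, 0.4, 0.3, 0.3],
    "attack_cat": ["Normal", "Normal", "Normal", "Normal", "Generic", "Generic"],
})
stats = class_counts(df_demo)
print(stats)
```

In this toy frame, `Normal` has 4 records (3 after deduplication) and `Generic` has 2 (1 after deduplication).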
3 Abnormal values
Check for `NaN`, duplicated, and `Inf` values.
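
A minimal sketch of the three checks with pandas and NumPy. The helper `anomaly_report` is hypothetical and the demo rows are made up. `Inf` is only looked for in numeric columns.

```python
import numpy as np
import pandas as pd


def anomaly_report(df):
    """Count NaN cells, fully duplicated rows, and +/-Inf cells."""
    numeric = df.select_dtypes(include=["number"])
    return {
        "nan": int(df.isna().sum().sum()),
        "duplicated": int(df.duplicated().sum()),
        "inf": int(np.isinf(numeric.to_numpy(dtype=float)).sum()),
    }


# Made-up rows: one NaN, one Inf, and the last row repeats the first.
df_demo = pd.DataFrame({
    "dur": [0.1, np.nan, np.inf, 0.1],
    "sbytes": [10, 20, 30, 10],
    "proto": ["tcp", "udp", "tcp", "tcp"],
})
report = anomaly_report(df_demo)
print(report)  # {'nan': 1, 'duplicated': 1, 'inf': 1}
```

If any of the counts is non-zero, typical follow-ups are `df.drop_duplicates()` for repeated rows and `df.replace([np.inf, -np.inf], np.nan)` followed by `dropna()` or imputation for the rest.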