Constructing an Advanced Query

Modified on Tue, 05 Dec 2023 at 01:07 AM

The Advanced Query feature within the Samurai XDR application uses Microsoft's Kusto Query Language (KQL).  In this article we discuss the basics of KQL, the logic of a query and provide some examples to get you started.

 

What is KQL?

In short, KQL is as it states, a Query Language.

The "K" in KQL (Kusto) is named after Jacques Cousteau, the infamous ocean explorer! Just like Jacques's exploration into the depths of the oceans, finding previously unknown volcanic basins,  KQL provides you the ability to explore the expanse of your telemetry data.  

 

Why and when use KQL?

Of course, you are not going to find any volcanic basins or new species of dolphin in your data, however in the cybersecurity context it will allow you to find actionable information.  Use of KQL will allow you to investigate your data to answer simple questions such as 'is my log source generating data' through to tracing the sequence of events in a breach. You may be familiar with the term Threat Hunting, effectively searching for malicious, suspicious or nefarious activity - whether that be proactive via determining a hypothesis through to hunts based on Indicators of Compromise (IOCs) and Indicators of Attack (IOAs). In essence, using KQL helps you answer the following questions:

  1. Does X exist
  2. Where does X exist?
  3. Why does X exist?
  4. How to respond?

 

KQL Logic

A typical query is structured to search, locate information and produce results.

The structure may include:

  1. What? table to query
  2. Pipe (|) for command separation
  3. Filter
  4. Order data
  5. Modify Columns in results

Lets walk through some simple examples to understand the logic.

 

Find events between two hosts

1 events
2 | where src_ip == "10.170.236.50" and dest_ip == "10.179.236.106"
  1. The first step in this query outlines what to query, in this example it is the "events" table.  By default Samurai XDR always queries the "events" table. 
  2. The pipe ( | ) command is always used for command separation. 
  3. We then use a 'where' operator to filter within the query for the source ip address (src_ip) of  "10.170.236.50" and a destination IP address (dst_ip) of "10.179.236.106"

example1.png

 

When looking at the results of this query, you will be presented with associated Fields based on the query which allows you to narrow down your search. KQL query statements work like a funnel, starting with a large data set and passing it through multiple operators until it is filtered, summarized or rearranged as required.

By selecting a Favorite or Other Fields you can start to narrow down your results to your requirements. Alternatively you may choose to include the fields within the query itself or use the Project operatorto include specific column fields within your result.

Refer to Advanced Query Functionality for information on the feature within the Samurai XDR application.

 

 

Search for events with source IP 10.170.236.50 and display a table with a few selected columns

mceclip0.png The pipe ( | ) command is always used for command separation

1 events
2 |where src_ip=="10.170.236.50"
3 |order by timestamp
4 |project timestamp,action,src_ip,src_port,dest_ip,dest_port
  1. Query the 'events' table
  2. Filter events using the where operator for source IP address "10.170.236.50"
  3. Use the order operator to order results by timestamp
  4. Use the project operator to include the column fields "timestamp, action, src_ip, src_port, dest_ip, dest_port"

example2.png

 

The simple examples above make use of common operators, use the KQL quick reference guide for more info on operators which includes a comprehensive list with definitions.

 

 

Complex Examples

Lets now walk through some more complex examples.

 

Frequency of Events

A common requirement is to find the frequency of occurrence of an event.  For instance, in this example we are reviewing Amazon VPC Flow logs and finding which destination IP addresses are receiving the most connections to port numbers below 1024.

events
where  product == "VPC Flow Logs"
    and toint(dest_port) < 1024
| summarize connections = count() by dest_ip
| order by connections 

 

What is also worth noting in this example is that we first need to convert the destination port number to an integer type (as it is a text field) before checking if it is a low-numbered port (less than 1024).  For more information on the data types used in the data lake, you can refer to the article on event field data types.

This query produces output which looks something like this:


 

The result helps us to see which IP addresses are accepting the most connections to privileged ports.

 

 

Querying raw logs

Advanced Query isn't only able to query logs from sources which originate from supported integrations.  The fact that logs from any kind of syslog source can be ingested into Samurai XDR makes it possible to query the raw content of these logs using Advanced Query.

In the example below, we are taking authentication logs from host, and querying them for failed authentication attempts.  We are able to use the extend operator and extract function to create our own fields from the log lines, parsing them using regular expressions.

events
where host == "10.1.1.1"
        and  (raw   contains "Invalid"or raw contains "fail")
        and raw !contains "connect"
extend message = substring(raw, 16)
extend src_host = extract("([A-Za-z0-9\\-]+) .+"1, message)
extend msg_info = extract("\\[[0-9]+\\]\\: (.+)"1, message)
extend app_src = extract("[A-Za-z0-9\\-]+ ([a-zA-Z0-9\\-]+).+"1, message)
extend user = extract(" ([A-Za-z0-9\\-]+) from "1, msg_info)
extend src_ip = extract("from ([0-9a-f\\.\\:]+)"1, msg_info)
project   timestamp, 
            host,
            src_host,
            app_src,
            user,
            src_ip ,
            msg_info
summarize attempts = count() by src_ip
order by attempts

 

Once we have extracted the fields we want, we can then go on to perform more operations.  In this case we are summarizing the logs by counting the failed authentication attempts by source IP address, and ordering the list so that the IP address with the most failed attempts is listed first.  In this case, this helps us to find potential brute force attackers who are trying to guess passwords through brute force tactics.

 

Tips!

Be Specific when constructing queries!

Used correctly, Advanced Query can perform sophisticated queries matching against a data set measured in terabytes within seconds! However poorly constructed queries can cause problems, cause dreaded browser slowdowns, or even trigger a query time-out when exceeding the maximum allowed query wait time. The more specific you are with your query, the quicker you are able to will get to the Result.

mceclip0.png Refer to the Microsoft documentation Query Limits for further information on limitations.

Lets look at an example:

  • Do not run a query with no criteria (for example simply 'events') for a time period exceeding a few minutes. Whilst this might be tempting to view all events, this will match ALL results (alerts and events) in your Samurai XDR tenant, delivering a sub optimal experience - results for such a query could potentially be measured in Gigabytes or at times Terabytes!

Instead, try to be as specific as possible:

  • If you are querying activity for a specific source host, add a where statement specifically asking for results from a specific source:
events 
 where src == "172.21.33.99"

    •  Example:
      • # matching results: 9 100 000 events/alerts
      • Approx time to completion: Partial results in 45 seconds.
  • If you are researching activity related to a specific source type, extend the query to specifically ask for results matching a type:
events 
 where src == "172.21.33.99" and type == "WEBPROXY"

    • Example:
      • # matching results: 3 700 000 events/alerts
      • Approx time to completion: Full Results in 40 seconds.
  • If you are looking for specific fields, extend the query to specifically project specific fields: 
events 
| where src == "172.21.33.99" and type == "WEBPROXY" 
| project timestamp, src, url

    • Example:
    • # matching results: 3 700 000 events/alerts
    • Approx time to completion: Full Results in 7 seconds

 

What Now?

As you may have realized from reading this article, Advanced Query is powerful tool - only limited by your own understanding of KQL and also in determining what questions or hypothesis against your data you may have. We recommend you start by writing a few simple queries and review the Microsoft documentation. If you need a reminder of usage in the Samurai XDR application, be sure to review Advanced Query Functionality.