
Advanced Phishing Detections in Microsoft Threat Protection, Early Steps into KQL

By Jarrell Pulsford, SOC Analyst at Bridewell Consulting.

If you have any basic experience in IT security, you’re likely to have heard of phishing. It is one of the longest-standing, most effective and easiest-to-execute attacker techniques there is. Although there are usually various detections in place, a few emails will often still manage to slip through, hoping to catch out unwary recipients.

I am a Cyber Security Analyst at Bridewell Consulting, and I have recently been getting to grips with using the Azure platform for security information and event management. As an additional necessity I have been learning Microsoft’s own Kusto Query Language, or KQL. As a proactive SOC we are always looking for ways to improve our detections, because if we cannot detect something, we cannot respond to it.

I was recently tasked with looking into a way to expand upon our existing phishing detections, and this led me to a great blog by Stuart Gregg on the matter.

Read it here

Although Stuart provides a great initial query, I found that running it against our logs produced huge numbers of false positives. I then decided to start adding some context around known events and senders as suggested, and found the overall query quickly becoming more complex.

As someone still getting to grips with KQL (and query languages in general), there was a large amount of troubleshooting involved until I was able to produce the results I wanted.

My hope with this blog is to clearly explain my logical steps for each addition, so that others in my position can take away some basic KQL knowledge and apply this to their own advanced detection. I will provide screenshots of every addition and aim to bullet point through the KQL. If you have established KQL knowledge some of these steps and operators may be overexplained, but I hope that the resulting query is still of some inspirational use.

To begin with, here’s the provided initial framework:

This query searches the EmailEvents table for any email containing more than zero URLs, then searches the EmailUrlInfo table for any matches on the specified URL (censored in this case), and joins the two result sets on their mutual NetworkMessageId to ensure they come from the same email. I will explain the join operator further on. The final part establishes which columns to display in the results and how to order them.
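Reconstructed from the description above, the initial framework looks roughly like this. The hunted storage URL is a placeholder here, as it was censored in the original screenshot:

```kusto
// Sketch of the initial framework: emails with URLs, joined to URL details.
EmailEvents
| where UrlCount > 0
| join (
    EmailUrlInfo
    | where Url contains "<suspicious-storage-url>"   // censored in the original
) on NetworkMessageId                                 // same email in both tables
| project Subject, Url, SenderMailFromAddress, RecipientEmailAddress, PhishFilterVerdict, Timestamp
| order by Timestamp desc
```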

Although this was a great starting off point, the vast number of false positive results returned from running the query indicated that I would need to make some additions.

The first things that initially seemed worth adding were:

  • Exclusions of any results that were already detected by the phish filter anyway
  • Adding a list of approved senders
  • Adding more interesting potential phishing URLs

The earliest additions started to look something like this:

This query now contains:


  • PhishFilterVerdict != "Phish" // Where the PhishFilterVerdict column of the EmailEvents table does not contain the result 'Phish'
  • SenderFromDomain !in ( ) // Where the SenderFromDomain column does not contain any of the domains added between the brackets.
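Put together, those first additions give a query along these lines. The domain and URL lists are placeholders, censored as in my screenshots:

```kusto
// Earliest additions: exclude existing phish verdicts and approved senders.
EmailEvents
| where UrlCount > 0
    and PhishFilterVerdict != "Phish"        // skip anything the phish filter already caught
    and SenderFromDomain !in ("", "", "")    // approved domains list (censored)
| join (
    EmailUrlInfo
    | where Url has_any ("<storage-url-1>", "<storage-url-2>")  // suspicious storage services (censored)
) on NetworkMessageId
| project Subject, Url, SenderMailFromAddress, RecipientEmailAddress, PhishFilterVerdict, Timestamp
| order by Timestamp desc
```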

I built a small approved senders list from known domains in my inbox and added it to the query. I have had to censor the contents, but they would look something like ("", "", "") and so on.

I also then added the URLs of just a few more storage services known to be associated with Phishing.

Running this back over 7 days still didn’t yield ideal results, however: there were still 672 emails returned, and skimming through a few hundred showed me that there were still far too many false positives.

My next logical step to try and narrow this down a bit was to increase the list of approved senders and domains. I needed to be able to obtain a list of a substantial size from somewhere, rather than manually reviewing my inbox.

I obtained this quickly by heading over to the Office 365 Security and Compliance Center, then to the anti-spam settings, which contain an extensive, established list of domains and addresses inside the anti-spam policy.

Expanding the Allowed Email Addresses and Domains policy showed the allowed senders and allowed domains lists. I was able to export all of these into Notepad++, reformat the list and then add them to the aforementioned place in the query.

Next, I realised it would be more useful to have a much more reliable, high-confidence list of phishing URLs to search for. I knew that users submitted suspected phishes to the Microsoft 365 Security Center for approval of actions such as deletion or URL blocking, and that there must be a table of these submissions somewhere. I consulted a colleague with more experience of the various Azure platform tables as to where to look, and he came back with the appropriate table and the following query.

This query works as follows:

  • let reportedPhish = // The 'let' statement binds a name to an expression. Here it binds 'reportedPhish' to the results of the following query so that we can reference those results later
  • AlertInfo | where Title == "Email reported by user as malware or phish" // Returns matches in the Title column of the AlertInfo table for "Email reported by user as malware or phish"
  • | join AlertEvidence on AlertId // The join operator merges the rows of two separate tables based on matching values in two specified columns. In this case it matches AlertInfo rows with AlertEvidence rows on their mutual AlertId column values
  • | where EntityType == "Url" // Where the value in the EntityType column is Url
  • | summarize count() by RemoteUrl; // Summarizes everything returned so far by the values in the RemoteUrl column and their volume. This final result is bound to 'reportedPhish'
  • EmailUrlInfo | join kind=inner reportedPhish on $left.Url == $right.RemoteUrl // This uses the join operator to merge the established reportedPhish results with matching results in the Url column of the EmailUrlInfo table
  • | summarize count() by Url | sort by count_ desc | limit 40 // This summarizes the final results of the Url column by their volume, sorts them by that count in descending order and limits the results to 40
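Assembled from the bullets above, the query my colleague provided reads like this (the screenshot is reconstructed here as text):

```kusto
// Top user-reported phishing URLs, counted by how often they appear in email.
let reportedPhish = AlertInfo
| where Title == "Email reported by user as malware or phish"
| join AlertEvidence on AlertId        // pull the evidence rows for each alert
| where EntityType == "Url"            // we only want URL evidence
| summarize count() by RemoteUrl;      // one row per reported URL
EmailUrlInfo
| join kind=inner reportedPhish on $left.Url == $right.RemoteUrl
| summarize count() by Url             // how often each reported URL was emailed
| sort by count_ desc
| limit 40
```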

The end result meant that I now have a list of the top 40 user submitted phishing URLs, over a time period of my choice. In this case, I took the top 40 of the past 30 days. This provided a much more targeted and high confidence list, although I would advise manually reviewing these URL submissions to account for user error.

After taking these additional values to filter by and adding them to the query, it started to look like this:

To keep things looking a bit tidier, and to be able to easily update the user-submitted phishes list, I’ve assigned some dynamic values with the let statement that we can then reference later in the query.
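As a sketch, those bindings sit at the top of the query and look like this. The list contents are censored placeholders:

```kusto
// Named lists declared once, then referenced throughout the query.
let usersubmittedphishes = dynamic(["<reported-url-1>", "<reported-url-2>"]);  // placeholders
let allowedsenders = dynamic([]);    // censored
let alloweddomains = dynamic([]);    // censored
```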

By this point I could run the query and it returned some pretty high-confidence phishes based on user submissions, ignoring emails from our trusted domains list. Good stuff, but there were still some drawbacks. The query relied on us receiving genuine phishing URLs from our userbase with minimal errors, and on those submissions being regularly reviewed and manually appended to the query.

What would be even better is if I could pull in some extremely high-confidence, verified URLs from a regularly updated, external source. Well, luckily KQL has a great lookups feature: the ‘externaldata’ operator.

The externaldata operator does pretty much what it says on the tin. It returns a table from an external storage location, and lets us define our own schema.

The website PhishTank is, in their own words, ‘a free community site where anyone can submit, verify, track and share phishing data.’ It’s a great source of verified, high-confidence phishing URLs, and helpfully they have a page to help developers incorporate their data.

They provide all their data as a downloadable csv file, which is updated hourly. This made for a great final addition to the query.

Before I started joining the URL column from the external CSV onto the existing query, I needed to make sure I was parsing out the fields correctly.

Here, I have:

  • Named the columns from the CSV and defined their data types; in this case both are strings. You may name the columns anything you like here. The column we need is the ‘url’ column, which I’ve renamed with a capital U in order to use the join operator later on.
  • Referenced the external file via its URL.
  • Ignored the first line, as it contains the column headers.
  • Used the distinct operator to return only the Url column; we won’t need the phish ID.
  • Limited the results to just the first 10; I’m only testing that the data extraction works, so we don’t need the full 15,000+ list.
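The steps above can be sketched as the following test query. The storage URL is a placeholder, as it was censored in the screenshot:

```kusto
// Pull PhishTank's hourly CSV and check the extraction works.
externaldata (phish_id: string, Url: string)   // our own column names and types
    [@"<phishtank-csv-url>"]                   // censored in the original
    with (ignoreFirstRecord=true)              // skip the header row
| distinct Url                                 // keep only the Url column
| limit 10                                     // just testing; no need for the full 15,000+ list
```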

As you can see from the bottom half of the screenshot, this has returned the verified phishing URLs in a single column. Now I just needed to join these results into our existing query.

So finally, after combining all the additional aspects, I had something that looked like this.

For easier reading, I’ve split the full query into two boxes. The first is the main addition, which works as follows:

  • I’ve bound a name to the various expressions used throughout. For ease I’ve labelled them based on their purpose.
  • ‘getUnreportedURLs’ is pulling out all emails containing a URL that are not already classified as phishes, this time without narrowing it down to our submitted URLs.
  • ‘getPhishtankCSV’ is our externaldata, pulling out the validated URLs.
  • ‘getUnreportedPhish’ then simply joins the result of query 1 and 2 on their URL outputs to see if we get a match.
  • ‘excludeUsersubmittedPhish’ is simply the query we wrote earlier to check through the same emails for the ‘usersubmittedphishes’
  • Finally, the union operator takes the output of getUnreportedPhish and excludeUsersubmittedPhish (if there is any) and combines them into the same results table.

After these steps, I now had a high-confidence, regularly updated hunting query that will pick up anything the phish filter may have missed. I hope this final query and blog provide a useful template for your own SOC hunting, and some elaboration on basic KQL. I will include the query as a template framework below. You could leave this simply as a saved hunting query in MTP for analysts to run manually, or look into automating some actions based on these results. Stuart Gregg, as referenced, covers using Logic Apps to trigger an incident in Microsoft Sentinel, and automated purging, in part 2 of his blog here:

Final Query:

let usersubmittedphishes = dynamic([]);
let allowedsenders = dynamic([]);
let alloweddomains = dynamic([]);
let getUnreportedURLs = EmailEvents
| where UrlCount > 0 and PhishFilterVerdict != "Phish" and SenderFromDomain !in (alloweddomains) and SenderMailFromAddress !in (allowedsenders)
| join EmailUrlInfo
on NetworkMessageId
| project Subject, Url, SenderMailFromAddress, SenderFromAddress, SenderMailFromDomain, RecipientEmailAddress, PhishFilterVerdict, Timestamp
| order by Timestamp desc;
let getPhishtankCSV = externaldata (phish_id: string, Url: string) [""] with (ignoreFirstRecord=true)
| distinct Url;
let getUnreportedPhish = getUnreportedURLs
| join getPhishtankCSV
on Url;
let excludeUsersubmittedPhish = EmailEvents
| where UrlCount > 0 and PhishFilterVerdict != "Phish" and SenderFromDomain !in (alloweddomains) and SenderMailFromAddress !in (allowedsenders)
| join (
    EmailUrlInfo
    | where Url has_any (usersubmittedphishes)
)
on NetworkMessageId
| project Subject, Url, SenderMailFromAddress, SenderFromAddress, SenderMailFromDomain, RecipientEmailAddress, PhishFilterVerdict, Timestamp
| order by Timestamp desc;
union (getUnreportedPhish), (excludeUsersubmittedPhish)

For more information on our services, an informal chat or anything else please do not hesitate to reach out to us on the following channels:
