Hunt.io SQL Documentation
Welcome to the Hunt.io SQL Documentation for the web portal. This guide explains how to use the read-only SQL-like interface in the Hunt App at https://app.hunt.io for threat hunting and security event analysis. (For API-specific details, see Hunt API Documentation).
1. Introduction
The Hunt.io SQL interface empowers threat hunters and analysts to explore a massive repository of security events in a way that best suits their investigative needs. Using this read-only SQL-like language you can select fields, filter results, group data, and leverage functions (such as uniq(), min(), and max()) to analyze events—from malware probes and certificate anomalies to open directory listings and phishing alerts.
2. Hunt SQL Concepts
-
Perform Threat Hunting: Query and analyze events quickly to detect patterns or anomalies.
-
Analyze Historical Data: Use relative time expressions to filter events and aggregation functions to determine activity periods.
-
Monitor Specific Indicators: Focus on key indicators such as phishing URLs, certificate issues, or unusual file names.
-
Aggregate Data: Group and summarize results to reveal trends or count unique events.
-
Query by IP, ASN, and Hosting Company: Filter events based on specific IP addresses, IP ranges, ASNs, or hosting company domains.
3. Using the Web Interface
-
Accessing SQL: Click the SQL option in the left navigation bar.
-
Samples Dropdown: Select sample queries from the dropdown (top right of the query box) to get started.
-
Run Query: Click Run Query to execute your query. A spinner indicates processing, and results appear under the Results tab.
-
Format Query: Use the Format Query button to automatically reformat your query for readability.
-
Copy and Clear: Use the copy icon to quickly copy your query, or click the trash can icon to clear the query box.
-
Autocomplete: Autocomplete assists in finding specific field names as you type.
-
API Access: For further details, refer to Hunt API Documentation.
4. SQL Query Syntax Overview
The interface supports a read-only SQL-like subset:
- SELECT Clause:
Specify the fields to return.SELECT ip, port FROM http - FROM Clause:
Specify the dataset (e.g.,http,malware,certificates). - WHERE Clause:
Filter records using supported operators. Use==for equality andgtfor "greater than":SELECT ip, port FROM http WHERE timestamp.day gt '2024-08-01' - GROUP BY Clause:
Group results by one or more fields. - Aggregate Functions:
Use functions such asuniq(),min(), andmax()for aggregations.
Note: Only
SELECTqueries are permitted. Data modifications (INSERT/UPDATE/DELETE) are not allowed.
5. Field Properties and Allowed Operators
Each field in a dataset includes metadata that indicates:
- Sortable: Fields that can be used in
ORDER BYclauses. - Comparable: Fields that support operators such as
==andgt. - Regex Support:
Many string fields (e.g.,hostnamein thehttpdataset orheader.raw) support regex via theRLIKEoperator, while simpler pattern matching can be performed withLIKE.
6. Datasets Overview
| Dataset | Description | Key Fields / Notes |
|---|---|---|
| http | Web scanning data for HTTP/HTTPS services. | Banners, status codes, headers, timestamps. |
| protocol | Raw protocol scanning data from internet probes. | IP, port, TTL, fingerprint. |
| ssh | Scan data from hosts with exposed SSH services. | Host, port, banner, versions, key values. |
| malware | Information on Command and Control (C2) servers tracked by Hunt. | Malware name, ASN information, IP, hostname, scan details. |
| certificates | TLS/SSL certificate events with full issuer and subject details. | Subject common name, issuer details, timestamps. (JA4X fields are defined here but are currently not accessible via SQL.) |
| open_directories | Data on exposed open directories, including hosts, file names, and directory structures. | Hosts, file names, directory structures, file URLs. |
| open_directory_filenames | File data from open directories that were not downloaded via AttackCapture. | Full file names and URLs for pivoting. |
| phishing | Records of phishing sites and URLs tracked by proprietary scanners. | URLs, timestamps, verdicts. |
| honeypot | Data from devices scanning Hunt honeypots, including probe details. | Probe details, IP, HTTP requests. |
| crawler | Data obtained by Hunt URL crawlers; useful for discovering URLs not yet linked to phishing domains. | Discovered URLs, status, errors. |
| urlx | An experimental dataset—a collection of URLs for reconnaissance. | Experimental; used for recon; subject to change. |
7. Time Stamps and Relative Time Analysis
Time stamps are essential for analyzing event data. Every dataset includes time stamp fields (e.g., timestamp, timestamp.day, timestamp.hour, timestamp.week) that allow both absolute and relative filtering.
Default Time Range Settings:
- For most tables, if no explicit time filter is specified, the system defaults to the past 30 days.
- For the certificates and http tables, the default is the past 7 days.
You can override these defaults by specifying explicit time conditions. Querying data beyond these ranges is subject to your data quota; please contact support if you require extended historical data or special permissions.
Relative Time Expressions:
Use NOW with allowed units (SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, YEAR) to dynamically filter recent events:
SELECT ip, port, timestamp
FROM http
WHERE timestamp > NOW - 1 DAY
Custom Time Ranges:
Combine conditions to set a custom range:
SELECT ip, port, timestamp
FROM malware
WHERE timestamp > NOW - 7 DAY
AND timestamp <= NOW - 1 DAY
Time Aggregation:
Aggregate time data to determine the first and last occurrence of events:
SELECT ip, min(timestamp) AS first_active, max(timestamp) AS last_active
FROM malware
GROUP BY ip
ORDER BY first_active
LIMIT 10
8. Example Queries
Basic Examples
- Select All Malware Probes:
SELECT * FROM malware - Query by Individual IP:
SELECT * FROM malware WHERE ip == '192.168.1.1' - Query by IP Range:
SELECT * FROM http WHERE ip BETWEEN '1.0.0.0' AND '1.0.0.255' - Query by IP Range (with included values):
SELECT *
FROM http
WHERE ip >= '1.0.0.0' and ip <= '1.0.0.255'
9. Querying by IP, ASN, and Hosting Company
- By Individual IP:
SELECT * FROM malware WHERE ip == '192.168.1.1' - By IP Range:
SELECT * FROM malware WHERE ip LIKE '192.168.1.%' - By ASN: (Using a numeric ASN as an example)
SELECT * FROM malware WHERE asn.number == 12345 - By Hosting Company:
If available, filter events by hosting company domains.
10. Advanced Query Examples and Nuances
Aggregation and Grouping
- Query A: Malware Distribution by Name
SELECT malware.name, count(*) AS events FROM malware WHERE timestamp.day gt '2024-07-01' GROUP BY malware.name ORDER BY events DESC LIMIT 10 - Query B: Aggregate Malware by ASN (Unique Malware Names)
SELECT asn.name, uniq(malware.name) FROM malware WHERE timestamp.day gt '2024-08-01' GROUP BY asn.name - Query C: Grouping by Certificate Common Name
SELECT subject.common_name, min(timestamp) AS first_seen, max(timestamp) AS last_seen, count(*) AS cert_count FROM certificates GROUP BY subject.common_name ORDER BY cert_count DESC LIMIT 10
Regex (RLIKE) Examples
Demonstrate advanced pattern matching:
-
URL Pattern Matching in urlx Dataset:
SELECT * FROM urlx WHERE url RLIKE '^https:\/\/[0-9\.]+\/[a-z0-9]{16}\.php' AND timestamp.day gt '2024-11-01'Explanation:
^https:\/\/ensures the URL starts with "https://".[0-9\.]+matches one or more digits or dots (an IP-like pattern).\/[a-z0-9]{16}\.phpmatches a slash followed by exactly 16 lowercase alphanumeric characters, ending with ".php".
-
Matching Certificate Common Names in the certificates Dataset:
SELECT ip, port FROM certificates WHERE ja4x.full = '7022c563de38_7022c563de38_e73b053161df' AND subject.common_name RLIKE '^new[0-9]{2}$' AND port = '3389' GROUP BY ip, portExplanation:
^newensures the certificate Common Name starts with "new".[0-9]{2}matches exactly two numeric digits (e.g., new01, new99).$enforces that the Common Name ends after the two digits, preventing partial matches.
-
Open Directories Example: Filtering PDF Files
SELECT file_name, file_url FROM open_directories WHERE file_name RLIKE '.*\.pdf$' LIMIT 10Explanation:
.*matches any characters preceding the file extension.\.pdfmatches the literal ".pdf" (with the dot escaped).$asserts that the match occurs at the end of the string.
Testing Unsupported Operations
- Arithmetic Expressions Not Supported:
The following query attempts to add 100 to theportfield, which is unsupported:This query returns an error. Use standard comparison operators (SELECT * FROM certificates WHERE port + 100 > 500==,gt) instead.
11. Additional Resources
- Hunt SQL Blog Post: https://hunt.io/blog/sql
- Hunt SQL Features: https://hunt.io/features/hunt-sql
- SQL Search API Documentation: https://apidocs.hunt.io
- Support: [email protected]
12. Change Log
- 11/01/2024: Initial SQL interface launch in the Hunt App.
- 12/15/2024: API documentation released.
- 01/10/2025: Updated SQL documentation with advanced query examples, relative time syntax, and URLx query enhancements.
- [Current Date]: Refined documentation with additional queries and improved user experience details.