The current dataset includes records for 2014–2024 and records for cyber events reported in January and August 2025, where "reported" refers to the publication date of the media source rather than the date the cyber event occurred.
Enhancements to the Cyber Events Database: We now use the Global Database of Events, Language, and Tone (GDELT) Project’s Web News NGrams 3.0 and Article List datasets to monitor global news and identify candidate cyber events. Beginning with data from January 2025, this identification process supplements our original data scraping approach. This allows us to:
- Expand our coverage
- Improve our tracking of non-English language sources
- Provide monthly updates with the previous month's cyber event records (with new data released on the second Wednesday of every month)
Additionally, we have several new variables, including:
- Identify events without GDELT (original_method) – An integer (0 or 1) indicating whether the event data was collected using the original web scraping method (pre-GDELT, before January 2025) or the GDELT-based method (January 2025 onward). This allows researchers to control for differences in data collection methods when analyzing trends across the entire dataset (2014–present) and to account for the shift to GDELT in 2025.
- Date of publication (reported_date) – The date, in DD-MM-YYYY format, on which the media source published the article identifying the relevant cyber event. This differs from event_date, which is the actual or estimated date the cyber event occurred.
- Severity measures for disruptive events (magnitude, duration, scope) – Qualitative and quantitative information that describes the magnitude, duration, and/or scope of the cyber event.
- Severity measures for exploitive events (ip, org_data, cust_data) – Qualitative and quantitative information describing the type (i.e., intellectual property, organizational, and/or customer) and amount of data compromised.
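As an illustration of how the new variables might be used together, the following minimal Python sketch parses reported_date, computes the gap between event_date and reported_date, and splits records by original_method. The sample records are invented for illustration and are not taken from the dataset. The sketch also assumes that event_date uses the same day-month-year format as reported_date, and it makes no claim about which value of original_method corresponds to which collection method; please confirm both against the codebook.

```python
from collections import defaultdict
from datetime import datetime

# Day-month-year, as documented for reported_date.
# Assumption: event_date is stored in the same format.
DATE_FORMAT = "%d-%m-%Y"


def parse_date(text):
    """Parse a day-month-year string into a date object."""
    return datetime.strptime(text, DATE_FORMAT).date()


def reporting_lag_days(record):
    """Days between when the event occurred and when it was reported."""
    reported = parse_date(record["reported_date"])
    occurred = parse_date(record["event_date"])
    return (reported - occurred).days


def group_by_method(records):
    """Split records on the original_method flag (0 or 1)."""
    groups = defaultdict(list)
    for record in records:
        groups[int(record["original_method"])].append(record)
    return dict(groups)


# Hypothetical records. The flag values are placeholders only; check the
# codebook for which value denotes which collection method.
sample = [
    {"event_date": "28-12-2024", "reported_date": "03-01-2025", "original_method": "0"},
    {"event_date": "10-08-2025", "reported_date": "12-08-2025", "original_method": "0"},
    {"event_date": "05-03-2019", "reported_date": "05-03-2019", "original_method": "1"},
]

lags = [reporting_lag_days(r) for r in sample]
groups = group_by_method(sample)
```

Analyzing each group separately, or including the flag as a control variable, is one way to keep the 2025 change in collection method from being mistaken for a change in the underlying event trend.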
Our newest data release includes records for cyber events reported in January and August 2025. Please note that our month variable uses the month in event_date, which is the actual or estimated date the cyber event occurred, not the month of publication. As a result, you will see many months in 2025 represented in the month column. However, we are still coding data for events reported from February through July 2025. Our team is working to backfill all of 2025 using our new data collection method and will do so incrementally over several weeks.
You can also expect a monthly update with the previous month's events coded and posted on the second Wednesday of the following month. For example, our Wednesday, October 8, 2025, update will include cyber events reported in September 2025, and so on.
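For readers who want to anticipate release dates programmatically, the schedule described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names second_wednesday and covered_month are our own and are not part of any published tooling.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0, so Wednesday is 2


def second_wednesday(year: int, month: int) -> date:
    """Return the second Wednesday of the given month."""
    first_of_month = date(year, month, 1)
    # Days from the 1st to the first Wednesday (0 if the 1st is a Wednesday).
    days_to_first_wed = (WEDNESDAY - first_of_month.weekday()) % 7
    return first_of_month + timedelta(days=days_to_first_wed + 7)


def covered_month(release: date) -> tuple[int, int]:
    """Return (year, month) of the reporting month a release covers,
    i.e. the calendar month before the release date."""
    if release.month == 1:
        return (release.year - 1, 12)
    return (release.year, release.month - 1)


release = second_wednesday(2025, 10)
print(release.isoformat(), covered_month(release))  # 2025-10-08 (2025, 9)
```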